Debugging Handle Leaks

This is all well documented stuff and I won’t go into details – it’s here mostly for self reference (3rd time I had to chase this down in google).

Steps are:

(1) Install WDK to integrate the WinDbg engine with VS (not strictly necessary, but very convenient).

(2) Attach to the debuggee via ‘User Mode’ transport:

image

(3) Continue execution, and break at the spot where the handle count is at ‘reference’ value.

(4) At the ‘Debugger Immediate Window’ type ‘!htrace -enable’

(5) Continue execution and break at a point where the handle count is supposed to be at reference value but isn’t.

(6) At the ‘Debugger Immediate Window’ type ‘!htrace -diff’.

 

The offending stack[s] should be visible at the debugger immediate window.  If you get garbage, there’s a good chance you’re debugging a 32bit process on a 64bit machine.

UseDebugLibraries and Wrong Defaults for VC++ Project Properties

Many of the projects I’m working on seem to have wrong default properties in Debug configuration.  For example, ‘Runtime Library’ is explicitly set to /MDd but defaults to /MD. ‘Basic Runtime Checks’ is explicitly set to /RTC1 but defaults to  none. ‘Optimization’ is explicitly set to /Od but defaults to /O2, and so on:

image

image

This recently caused us some trouble, and the investigation results are dumped below.

The direct reason is that these vcxproj’s are missing the ‘UseDebugLibraries’ element, under the ‘Configuration’ PropertyGroup: it should be set to true in Debug and false in Release.   A correct vcxproj should include some elements like –

<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
    <ConfigurationType>StaticLibrary</ConfigurationType>
    <UseDebugLibraries>true</UseDebugLibraries>
    <PlatformToolset>v120</PlatformToolset>
    <CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
    <ConfigurationType>StaticLibrary</ConfigurationType>
    <UseDebugLibraries>false</UseDebugLibraries>
    <PlatformToolset>v120</PlatformToolset>
    <CharacterSet>Unicode</CharacterSet>
</PropertyGroup>

Most ‘Configuration’ sub-elements (CharacterSet, ConfigurationType etc.) directly control import of custom property sheets, but UseDebugLibraries doesn’t. Instead, it is expected in various hooks around regular property sheets. For example, Microsoft.Cpp.mfcDynamic.props includes the following -

<ClCompile>
<RuntimeLibrary Condition="'$(UseDebugLibraries)' != 'true'">MultiThreadedDll</RuntimeLibrary>
<RuntimeLibrary Condition="'$(UseDebugLibraries)' == 'true'">MultiThreadedDebugDll</RuntimeLibrary>
</ClCompile>

Why UseDebugLibraries was missing from some libraries and present in others remained a mystery until I noticed that the younger libraries tended to have this element. Indeed, the real culprit is the migration from VS2008- (vcproj format) to VS2010+ (vcxproj/MSBuild format).  MS’s migration code just did not add this element. The generated projects are functional – they just explicitly set every individual compilation switch affected by UseDebugLibraries, which makes it overly verbose and a bit sensitive – especially in the presence of junior devs who tend to stick to defaults…

So every library you have which is 4Y+ old is susceptible to this migration bug, and I suggest you manually add UseDebugLibraries.  If you have a central prop sheet where you can control multiple projects – add it there.

Not much point in reporting this to MS, is there? The chances of a fix are practically zero, and the issue would get equal web-presence here.

Reading Specific Monitor Dimensions

Almost 2 years ago I wrote about the proper way of getting the EDID – and in particular the physical monitor size. I did leave a loose end:

I actually had to query the dimensions of a specific monitor (specified HMONITOR). This was an even nastier problem, and frankly I’m just not confident yet that I got it right. If I ever get to a code worth sharing – I’ll certainly share it here.

Several commenters requested the full solution, and two years later I noticed this is still the most highly viewed post on this blog. So while I am still uncertain of the solution, I’m dumping it here in the hope that it does more good than evil out there.

Bridging the HMONITOR and the HDEVINFO Universes

HMONITOR is the primary user mode handle to per-monitor information, dating back to GDI. This is how you specify your monitor of interest:  you can obtain an HMONITOR from a window or list them all and pick the one whose RECT matches a location of interest.

HDEVINFO is a handle to a device information set, the primary device-installation data type. This is what eventually allows you to read the per-monitor EDID and read – among others – the monitor physical dimensions.

I couldn’t find a direct way of obtaining one handle from the other. There are many description strings scattered along structs obtainable from these two data types, and the closest I have to a match are these two routes:

 

HMONITOR -> DISPLAY_DEVICE -> DeviceID

HDEVINFO -> SP_DEVINFO_DATA -> Instance

 

As an example, one of my monitors returns ‘DeviceID’ of:

MONITOR\GSM4B85\{4d36e96e-e325-11ce-bfc1-08002be10318}\0011

and ‘Instance’ of

DISPLAY\GSM4B85\5&273756F2&0&UID1048833

So DeviceID and Instance share a common substring.    There is probably more robust information in the last substrings (‘0011’, ‘5&273756F2&0&UID1048833’) but Device/Instance IDs are a mess, and I can’t for the life of me find a way to use this extra info.  I suspect (based on this 2010-2013 discussion) it was once possible but Windows 7 broke it.
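To make the matching rule concrete, here is a minimal portable sketch of the comparison, using std::string instead of CString. The names SecondSlashBlock and SameMonitorModel are made up for illustration, and comparing the second block of both IDs is a slightly stricter check than the substring search used in the code further down.

```cpp
#include <cassert>
#include <string>

// Returns the text between the first and second backslash, or an empty
// string when the input has fewer than two backslashes.
std::string SecondSlashBlock(const std::string& id)
{
    const std::string::size_type first = id.find('\\');
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type second = id.find('\\', first + 1);
    if (second == std::string::npos)
        return std::string();
    return id.substr(first + 1, second - first - 1);
}

// True when both IDs carry the same non-empty second block.
// (Hypothetical helper: it matches on the shared block only.)
bool SameMonitorModel(const std::string& deviceId, const std::string& instanceId)
{
    const std::string block = SecondSlashBlock(deviceId);
    return !block.empty() && block == SecondSlashBlock(instanceId);
}
```

With the two strings above, both calls to SecondSlashBlock yield GSM4B85. The shared block looks like a model code rather than a per-unit serial, so presumably two identical monitors would be indistinguishable by this rule.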

 

Teh Codez

Usual disclaimers apply more than ever – your mileage may seriously vary on this one. Please do tell me in the comments if it worked for you.

 

#include <atlstr.h>
#include <SetupApi.h>
#include <cfgmgr32.h>   // for MAX_DEVICE_ID_LEN
#pragma comment(lib, "setupapi.lib")

#define NAME_SIZE 128

const GUID GUID_CLASS_MONITOR = { 0x4d36e96e, 0xe325, 0x11ce, 0xbf, 0xc1, 0x08, 0x00, 0x2b, 0xe1, 0x03, 0x18 };

CString Get2ndSlashBlock(const CString& sIn)
{
	int FirstSlash = sIn.Find(_T('\\'));
	CString sOut = sIn.Right(sIn.GetLength() - FirstSlash - 1);
	FirstSlash = sOut.Find(_T('\\'));
	sOut = sOut.Left(FirstSlash);
	return sOut;
}

// Assumes hEDIDRegKey is valid
bool GetMonitorSizeFromEDID(const HKEY hEDIDRegKey, short& WidthMm, short& HeightMm)
{
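	TCHAR valueName[NAME_SIZE];
	BYTE EDIDdata[1024];

	for (DWORD i = 0; ; ++i)
	{
		// Both sizes are in/out parameters, so they must be reset before every call
		DWORD dwType = 0;
		DWORD valueNameLength = NAME_SIZE;
		DWORD edidSize = sizeof(EDIDdata);

		LONG retValue = RegEnumValue(hEDIDRegKey, i, valueName,
			&valueNameLength, NULL, &dwType,
			EDIDdata, // buffer
			&edidSize); // buffer size

		if (retValue == ERROR_MORE_DATA)
			continue; // this value does not fit our buffers, skip it

		if (retValue != ERROR_SUCCESS)
			break; // ERROR_NO_MORE_ITEMS or a real failure

		if (0 != _tcscmp(valueName, _T("EDID")))
			continue;

		if (edidSize < 69)
			continue; // too short to hold the size bytes at offsets 66..68

		// Bytes 66 and 67 hold the low 8 bits of width and height (in mm),
		// byte 68 holds the high 4 bits of each
		WidthMm = ((EDIDdata[68] & 0xF0) << 4) + EDIDdata[66];
		HeightMm = ((EDIDdata[68] & 0x0F) << 8) + EDIDdata[67];

		return true; // valid EDID found
	}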

	return false; // EDID not found
}

bool GetSizeForDevID(const CString& TargetDevID, short& WidthMm, short& HeightMm)
{
	HDEVINFO devInfo = SetupDiGetClassDevsEx(
		&GUID_CLASS_MONITOR, //class GUID
		NULL, //enumerator
		NULL, //HWND
		DIGCF_PRESENT | DIGCF_PROFILE, // Flags //DIGCF_ALLCLASSES|
		NULL, // device info, create a new one.
		NULL, // machine name, local machine
		NULL);// reserved

	// SetupDiGetClassDevsEx signals failure with INVALID_HANDLE_VALUE, not NULL
	if (INVALID_HANDLE_VALUE == devInfo)
		return false;

	bool bRes = false;

	SP_DEVINFO_DATA devInfoData;
	memset(&devInfoData, 0, sizeof(devInfoData));
	devInfoData.cbSize = sizeof(devInfoData);

	// SetupDiEnumDeviceInfo returns FALSE past the last device (or on any other
	// failure), which ends the loop. We also stop once a size was read.
	for (DWORD i = 0; !bRes && SetupDiEnumDeviceInfo(devInfo, i, &devInfoData); ++i)
	{
		TCHAR Instance[MAX_DEVICE_ID_LEN];
		if (!SetupDiGetDeviceInstanceId(devInfo, &devInfoData, Instance, MAX_DEVICE_ID_LEN, NULL))
			continue;

		CString sInstance(Instance);
		if (-1 == sInstance.Find(TargetDevID))
			continue;

		HKEY hEDIDRegKey = SetupDiOpenDevRegKey(devInfo, &devInfoData,
			DICS_FLAG_GLOBAL, 0, DIREG_DEV, KEY_READ);

		if (!hEDIDRegKey || (hEDIDRegKey == INVALID_HANDLE_VALUE))
			continue;

		bRes = GetMonitorSizeFromEDID(hEDIDRegKey, WidthMm, HeightMm);

		RegCloseKey(hEDIDRegKey);
	}
	SetupDiDestroyDeviceInfoList(devInfo);
	return bRes;
}

HMONITOR  g_hMonitor;

BOOL CALLBACK MyMonitorEnumProc(
	_In_  HMONITOR hMonitor,
	_In_  HDC hdcMonitor,
	_In_  LPRECT lprcMonitor,
	_In_  LPARAM dwData
	)

{
	// Use this function to identify the monitor of interest: MONITORINFO contains the Monitor RECT.
	MONITORINFOEX mi;
	mi.cbSize = sizeof(MONITORINFOEX);

	GetMonitorInfo(hMonitor, &mi);
	OutputDebugString(mi.szDevice);

	// For simplicity, we set the last monitor to be the one of interest
	g_hMonitor = hMonitor;

	return TRUE;
}

BOOL DisplayDeviceFromHMonitor(HMONITOR hMonitor, DISPLAY_DEVICE& ddMonOut)
{
	MONITORINFOEX mi;
	mi.cbSize = sizeof(MONITORINFOEX);
	if (!GetMonitorInfo(hMonitor, &mi))
		return FALSE;

	DISPLAY_DEVICE dd;
	ZeroMemory(&dd, sizeof(dd));
	dd.cb = sizeof(dd);

	// Walk the display adapters until we hit the one driving this monitor
	for (DWORD devIdx = 0; EnumDisplayDevices(NULL, devIdx, &dd, 0); ++devIdx)
	{
		if (0 != _tcscmp(dd.DeviceName, mi.szDevice))
			continue;

		// Take the first monitor attached to that adapter
		DISPLAY_DEVICE ddMon;
		ZeroMemory(&ddMon, sizeof(ddMon));
		ddMon.cb = sizeof(ddMon);

		if (EnumDisplayDevices(dd.DeviceName, 0, &ddMon, 0))
		{
			ddMonOut = ddMon;
			return TRUE;
		}
	}

	return FALSE;
}

int _tmain(int argc, _TCHAR* argv [])
{
	// Identify the HMONITOR of interest via the callback MyMonitorEnumProc
	EnumDisplayMonitors( NULL, NULL, MyMonitorEnumProc, NULL);

	DISPLAY_DEVICE ddMon;
	if (FALSE == DisplayDeviceFromHMonitor(g_hMonitor, ddMon))
		return 1;

	CString DeviceID;
	DeviceID.Format(_T("%s"), ddMon.DeviceID);
	DeviceID = Get2ndSlashBlock(DeviceID);

	short WidthMm, HeightMm;
	bool bFoundDevice = GetSizeForDevID(DeviceID, WidthMm, HeightMm);

	return !bFoundDevice;
}
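The only non-Windows part of the code above is the size decoding, so it can be tried in isolation. Below is a small portable sketch of that step; SizeMm and DecodeEdidSizeMm are names invented here, and the byte offsets are simply the ones used in GetMonitorSizeFromEDID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct SizeMm { int width; int height; };

// Decodes the physical image size from a raw EDID block, using the same
// offsets as GetMonitorSizeFromEDID: bytes 66 and 67 carry the low 8 bits
// of width and height, byte 68 carries the high 4 bits of each.
// Returns false when the buffer is missing or too short.
bool DecodeEdidSizeMm(const std::uint8_t* edid, std::size_t length, SizeMm& out)
{
    if (edid == nullptr || length < 69)
        return false;
    out.width  = ((edid[68] & 0xF0) << 4) | edid[66];
    out.height = ((edid[68] & 0x0F) << 8) | edid[67];
    return true;
}
```

For example, bytes 0x0C, 0x2A, 0x21 at offsets 66 to 68 decode to 524 x 298 mm.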

Blogging 101

This is post #101, which makes the previous post #100.

When I started all this I didn’t think I’d have 100 things to say.  Glad I was wrong, and hope to still have useful things to say for 100 more posts.

Thanks for sticking around!

Vector Deleting Destructor and Weak Linkage

Now that the discussions on weak linker symbols and vector deleting destructors are in place, it is time to discuss a fact that might seem esoteric but has far reaching implications. After that, it is time to ask for your help.

In VC++, Vector deleting destructors are defined with weak linkage at the translation unit that defined the class, and strong linkage at any translation unit that calls new[] on the class.

Say what?

The first part of this statement (v-d-dtors have weak linkage) was already demonstrated at the post on weak linkage – given any cpp file which defines a non trivial class, you can dumpbin its obj file and see for yourself.

Now some code to demonstrate the full statement:

 
//C.h
struct C
{
  virtual ~C();
};

//C.cpp
#include "C.h"
C::~C() {}

//D.h
struct D
{
  void Func();
};

//D.cpp
#include "D.h"
#include "C.h"
void D::Func()
{
  C* p = new C[42];
}

A dumpbin of C.obj shows:

017 00000000 UNDEF  notype ()    External     | ??3@YAXPAX@Z (void __cdecl operator delete(void *))
018 00000000 SECT4  notype ()    External     | ??1C@@UAE@XZ (public: virtual __thiscall C::~C(void))
019 00000000 SECT6  notype ()    External     | ??_GC@@UAEPAXI@Z (public: virtual void * __thiscall C::`scalar deleting destructor'(unsigned int))
01A 00000000 UNDEF  notype ()    WeakExternal | ??_EC@@UAEPAXI@Z (public: virtual void * __thiscall C::`vector deleting destructor'(unsigned int))

While a dumpbin of D.obj shows:

01D 00000000 UNDEF  notype ()    External     | ??_L@YGXPAXIHP6EX0@Z1@Z (void __stdcall `eh vector constructor iterator'(void *,unsigned int,int,void (__thiscall*)(void *),void (__thiscall*)(void *)))
01E 00000000 UNDEF  notype ()    External     | ??_M@YGXPAXIHP6EX0@Z@Z (void __stdcall `eh vector destructor iterator'(void *,unsigned int,int,void (__thiscall*)(void *)))
01F 00000000 UNDEF  notype ()    External     | ??2@YAPAXI@Z (void * __cdecl operator new(unsigned int))
020 00000000 UNDEF  notype ()    External     | ??3@YAXPAX@Z (void __cdecl operator delete(void *))
021 00000000 SECT8  notype ()    External     | ?Func@D@@QAEXXZ (public: void __thiscall D::Func(void))
022 00000000 UNDEF  notype ()    External     | ??1C@@UAE@XZ (public: virtual __thiscall C::~C(void))
023 00000000 SECT4  notype ()    External     | ??0C@@QAE@XZ (public: __thiscall C::C(void))
024 00000000 SECT6  notype ()    External     | ??_EC@@UAEPAXI@Z (public: virtual void * __thiscall C::`vector deleting destructor'(unsigned int))

What this means is that to successfully complete the linkage of C.obj, the linker must now load D.obj – because both contain implementations of the same function, but C defines a weak external implementation and D defines a strong external implementation (of a C method!).

Ok, that’s kinda weird, but why should I care?

Here’s why:

What happens when C.cpp and D.cpp are part of a static library?

Unlike executables (.exe or .dll), when processing a static lib the linker only loads obj files that are referenced, i.e., whose contents are needed for successful linkage. Once loaded, an obj file must have its contents successfully link (unless you’re building with /GL, but let’s ignore that here). Let’s expand the previous example a bit:

//main.cpp
#include "StaticLib\C.h"

int main(int, char**)
{
  C c;
  return 0;
}

//StaticLib\C.h
struct C
{
  virtual ~C();
};

//StaticLib\C.cpp
#include "C.h"
C::~C() {}

//StaticLib\D.h
struct D
{
  void Func();
};

//StaticLib\D.cpp
#include "D.h"
#include "C.h"

extern void SomeJunkImplementedElsewhere();
void D::Func()
{
  C* p = new C[42];
  SomeJunkImplementedElsewhere();
}

Can you already see what happens now?

Now for the program to successfully build you must satisfy D.cpp’s linkage – which means dragging in another library – although you never consumed D’s functionality in the first place.

I wish this was just a theoretical peculiarity. The solutions I’m working on consist of a complicated network of literally hundreds of static libraries, and time and time again we find ourselves forced to drag in weird dependencies that the code we actually run never uses.  It seems unbelievable, but almost all of these unexplainable dependencies boil down to this esoteric fact – vector deleting destructors have weak linkage at the point of class definition.

That was nice. Now go and report it.

I did. Over half a year ago.   The report was originally closed as ‘By Design’, and after an explicit request the following explanation from Karl Niu arrived:

To explain the “By Design” resolution, imagine that you have “new A[n]” and “delete[] pA” in different translation units. In such a case, the compiler needs to define the strong external in the translation unit containing the “new A[n]”.

Which I just don’t understand: the weak/strong debate is not over new[] or delete[], but rather over vector deleting destructors, which are not user-overridable in the first place. Wherever delete[] is overloaded, it should be able to fetch the vector-deleting-dtor from the translation unit that defined it – hopefully, the one that defined the class it’s deleting.   I tried to ask again, twice, and got no response for 6 months now.

Now, I regularly report many bugs at MS Connect, almost all of which never get resolved (which I can live with. I’m doing this mostly in hope of helping fellow devs googling their trouble) – but this one leaves me frustrated. It feels as if despite my best efforts I failed to clearly communicate the issue.    It seems like an esoteric technicality, yet it actively hinders decoupling – thereby damaging large software systems at the architecture level!

Why golly Ofek, that’s really bad. But what can I do?

You can either -

(1) Dig in and tell me in the comments where I’m wrong.  It was initially resolved as ‘by design’, and even got an explanation (sorta), so I might be missing some valid reason for this sorry state of affairs.

(2) Go to the bug page and upvote it.  This one really deserves attention from the VC++ team.

But I urge you to do either.  Thanks!


Executing Code Once Per Thread in an OpenMP Loop

Take this toy loop:

#pragma omp parallel for
  for (int i = 0; i < 1000000; ++i)
    DoStuff();

Now suppose you want to run some preparation code once per thread – say, SetThreadName, or SetThreadPriority or whatnot.  How would you go about that? If you code it before the loop the code would execute once, and if you code it inside the loop it would be executed 1000000 times.

Here’s a useful trick: private loop variables run their default constructor once per thread. Just package the action in some type’s constructor, and declare an object of that type as a private loop variable:

struct ThreadInit
{
  ThreadInit()
  {
    SetThreadName("OMP thread"); // assumed to be your own helper, not a Win32 API
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
  }
};

...
  ThreadInit ti;

#pragma omp parallel for private(ti)
  for (int i = 0; i < 1000000; ++i)
    DoStuff();

You can set a breakpoint in ThreadInit::ThreadInit() and watch it being executed exactly once in each thread. You can also enjoy the view at the threads window, as all OMP threads now have name and priority adjusted.

[Edit:] Improvement

The original object created before entering the parallel region – is just a dummy meant for duplication, but still runs its own constructor and destructor.  This might be benign, as in the example above (redundant set of thread properties), but in my real life situation I used the ThreadInit object to create a thread-specific heap - and an extra heap is not an acceptable overhead.

Here’s another trick: as the spec says, a default constructor is called for the object copies in the parallel region. Just create the dummy object with a non-default constructor, and make sure the real action happens only in the default one.   Here’s one way to do so (you can also code different ctors altogether):

struct ThreadInit
{
  ThreadInit(bool bDummy = false)
  {
    if(!bDummy)
    {
      SetThreadName("OMP thread"); // assumed to be your own helper, not a Win32 API
      SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
    }
  }
};

...
  ThreadInit ti(true);

#pragma omp parallel for private(ti)
  for (int i = 0; i < 1000000; ++i)
    DoStuff();
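For comparison, here is a sketch of a different and more conventional route to the same goal: split the combined ‘parallel for’ into a ‘parallel’ region with an ‘omp for’ inside it, and put the setup between the two. This is not the private-variable trick described above, and PerThreadSetup, DoStuff and RunLoop are stand-ins invented for the example. The counters are only there so the behaviour can be observed.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int>  g_initCalls(0);   // how many times the per-thread setup ran
std::atomic<long> g_iterations(0);  // how many loop bodies ran

void PerThreadSetup() { ++g_initCalls; }   // stand-in for naming the thread etc.
void DoStuff()        { ++g_iterations; }  // stand-in for the real work

void RunLoop(int n)
{
#pragma omp parallel
    {
        // Runs once in every thread of the team, before any iteration
        PerThreadSetup();

        // The iterations are then shared among the threads of that same team
#pragma omp for
        for (int i = 0; i < n; ++i)
            DoStuff();
    }
}
```

Build with OpenMP enabled (/openmp in VC++, -fopenmp in GCC and Clang). Without it the pragmas are ignored, the loop runs serially and the setup runs exactly once. Unlike the private object, this variant gives you no per-thread destructor call when the region ends.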

 

On Vector Deleting Destructors and some new/delete internals

A word is due on vector deleting destructors – previously mentioned as the only functions that got weakly bound by the linker. The usual disclaimers apply: everything that follows is my own investigation, in code and online. Nothing here is official in any way, and even if I did get something right, it is subject to change at any time.

 

 

While C’s malloc/free deal purely with memory management, C++’s new/delete do more: they construct/destruct the objects being allocated/deallocated.  (Preemptive nitpick: there are other differences, they are not the subject of this post). There is a small family of compiler generated functions that help achieve these additional tasks:  vector constructor, scalar deleting destructor, vector deleting destructor, and vector ctor/dtor iterators.

The following toy code will be used to illustrate:

struct Whatever
{
	Whatever()  {};
	~Whatever() {};
};

int main(int argc, char* argv[])
{
	Whatever* pW = new Whatever;
	delete pW;

	Whatever* arrW = new Whatever[10];
	delete[] arrW;

	return 0;
}

new

When ‘new Whatever’ is executed, two things happen:

1) Memory is allocated, by a call to operator-new (which unless overridden, is essentially a wrapper around plain old malloc),

2) Whatever’s constructor is called.

Proof by a glimpse into unoptimized disassembly:

image

delete

When ‘single’ delete  is called on a Whatever pointer, the opposite happens in reverse order: first Whatever’s destructor is called, then operator-delete (which by default is equivalent to ‘free’) frees the unpopulated memory.  In this case, however, the compiler does not call ~Whatever() and operator-delete directly, but rather generates and invokes a helper function that wraps these two calls. This helper is called scalar deleting destructor – which makes sense, since it destructs and deletes.   Some more disassembly screenshots:

image

image

Why is the new+construction inlined and the delete+destruction wrapped in a helper?  Beats me. I would have thought that the exact same inlining tradeoff (binary size vs. call overhead) applies for both cases.
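The two steps on each side can be spelled out by hand in portable C++. This is only a rough model of what the compiler emits, not the actual generated code. Tracked, ManualNew and ManualDelete are names invented for the sketch.

```cpp
#include <cassert>
#include <new>

int g_alive = 0; // number of currently constructed Tracked objects

struct Tracked
{
    Tracked()  { ++g_alive; }
    ~Tracked() { --g_alive; }
};

// Roughly what 'new Tracked' does: allocate, then construct in place.
Tracked* ManualNew()
{
    void* raw = ::operator new(sizeof(Tracked)); // step 1: raw memory
    try {
        return ::new (raw) Tracked;              // step 2: constructor
    } catch (...) {
        ::operator delete(raw);                  // a throwing ctor must not leak
        throw;
    }
}

// Roughly what a scalar deleting destructor does: destroy, then free.
void ManualDelete(Tracked* p)
{
    if (p == nullptr)
        return;               // deleting a null pointer is a no-op
    p->~Tracked();            // step 1: destructor
    ::operator delete(p);     // step 2: release the memory
}
```

ManualDelete is the part the compiler packages into the helper function, while the body of ManualNew is what you see inlined at the call site.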

new[]  / delete[]

When the vector versions new[] and delete[] are called, an additional layer is added, to address the need to iterate over the Whatever object slots, and construct/destruct them one at a time.

Enter ‘vector constructor iterator’ and ‘vector destructor iterator’. In detail:

1) a new[] statement translates into a call to an operator-new with size enough to hold all Whatever’s, then a call to ‘eh vector constructor iterator’ which is essentially a for-loop of Whatever::Whatever()’s in the designated array locations.

2) A delete[] statement translates into a single call to vector deleting destructor, which in turn calls ‘eh vector destructor iterator’ and then operator delete.

Being merciful, I won’t hurt your eyes with more disassembly. Just believe me or go dig in yourselves.
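If you prefer source code over disassembly, here is a hand-written model of what the two iterators appear to do. It is a sketch based on the description above plus the language rule that array elements are destroyed in reverse order, not a copy of the CRT implementation. Item, ConstructAll and DestroyAll are invented names, and the global counters only exist to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

int g_live = 0;      // objects currently constructed
int g_built = 0;     // constructions attempted so far
int g_throwAt = -1;  // attempt index that should throw, -1 for never

struct Item
{
    Item()  { if (g_built++ == g_throwAt) throw 1; ++g_live; }
    ~Item() { --g_live; }
};

// Model of an 'eh vector constructor iterator': construct count objects in
// place. If one constructor throws, destroy the objects already built, in
// reverse order, and rethrow.
void ConstructAll(void* memory, std::size_t count)
{
    Item* first = static_cast<Item*>(memory);
    std::size_t done = 0;
    try {
        for (; done < count; ++done)
            ::new (static_cast<void*>(first + done)) Item;
    } catch (...) {
        while (done > 0)
            first[--done].~Item();
        throw;
    }
}

// Model of a 'vector destructor iterator': destroy in reverse order.
void DestroyAll(void* memory, std::size_t count)
{
    Item* first = static_cast<Item*>(memory);
    while (count > 0)
        first[--count].~Item();
}
```

The try/catch in ConstructAll is the part that the ‘eh’ flavour adds: without exceptions the iterator can be a plain loop.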

Other findings, in no particular order

1) The ‘eh’ prefix in the vector ctor/dtor iterators stands for exception handling. If you compile with no C++ exceptions, a non-eh version of the iterators is emitted.  (This has nothing to do with std::nothrow, which controls the behaviour of operator new - a different stage of the object creation.)

2) The deleting destructors, both scalar and vector, are generated as hidden methods of the type Whatever.  All other helper functions (vector constructor, ctor/dtor iterators) are not.   Not sure why, but I suspect it has to do with a supposed need for weak linkage – more on that in a future post.

3) The compiler is smart enough to avoid generating and invoking unneeded helper functions. For example, comment out the coded ctor Whatever::Whatever(), and watch as the vector constructor call vanishes.

4) The vector deleting destructor is unique in that it has some built in flexibility. Raymond Chen spelled out pseudo code for it, which I shall shamelessly paste now:

void Whatever::vector deleting destructor(int flags)
{
  if (flags & 2) { // if vector destruct
    size_t* a = reinterpret_cast<size_t*>(this) - 1;
    size_t howmany = *a;
    vector destructor iterator(p, sizeof(Whatever),
      howmany, Whatever::~Whatever);
    if (flags & 1) { // if delete too
      operator delete(a);
    }
  } else { // else scalar destruct
    this->~Whatever(); // destruct one
    if (flags & 1) { // if delete too
      operator delete(this);
    }
  }
}

So from the vector deleting dtor’s viewpoint, memory deallocation is optional and the same function can serve as a scalar deleting dtor (when (flags & 2) == 0). In practice I have never seen a vector deleting destructor called with ‘flags’ different from 3 (i.e., vector, deleting destructor).  I can come up with somewhat contrived scenarios where this flexibility might be useful: say, a memory manager that wants to destroy objects but keep the memory for faster future usage. However, deleting dtors are accessible only to the compiler anyway, so the purpose of this flexibility is not clear to me.   Insights are very welcome.
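To play with the flag combinations, the pseudo code can be turned into a compilable model. Everything here is an assumption made for illustration: the element count stored in a size_t right before the array follows the pseudo code rather than any documented layout, and Thing, NewArray, DeletingDestructor, kFree and kVector are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

int g_destroyed = 0; // destructor calls observed
int g_freed = 0;     // deallocations observed

struct Thing
{
    int payload = 0;
    ~Thing() { ++g_destroyed; }
};
static_assert(alignof(Thing) <= alignof(std::size_t), "cookie layout assumes this");

enum : unsigned { kFree = 1, kVector = 2 }; // flag meanings taken from the pseudo code

// Allocates room for a count cookie plus count objects and constructs them.
Thing* NewArray(std::size_t count)
{
    void* raw = ::operator new(sizeof(std::size_t) + count * sizeof(Thing));
    std::size_t* cookie = static_cast<std::size_t*>(raw);
    *cookie = count;
    Thing* first = reinterpret_cast<Thing*>(cookie + 1);
    for (std::size_t i = 0; i < count; ++i)
        ::new (static_cast<void*>(first + i)) Thing;
    return first;
}

// Model of the flag-driven 'vector deleting destructor'.
void DeletingDestructor(Thing* p, unsigned flags)
{
    if (flags & kVector) {
        std::size_t* cookie = reinterpret_cast<std::size_t*>(p) - 1;
        for (std::size_t i = *cookie; i > 0; --i)
            p[i - 1].~Thing();                  // destroy in reverse order
        if (flags & kFree) { ::operator delete(cookie); ++g_freed; }
    } else {
        p->~Thing();                            // scalar path
        if (flags & kFree) { ::operator delete(p); ++g_freed; }
    }
}
```

Calling it with flags 3 mimics delete[], with flags 1 it mimics a scalar delete, and with flags 2 it destroys the elements but leaves the memory to the caller, which is the contrived memory-manager scenario mentioned above.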