The Case of the ‘X’ That Didn’t Kill the App

One of our MFC apps recently had a weird bug: occasionally a debug build would produce a binary in which the ‘X’ corner button closed the app window but not the app – the process would just sit idle indefinitely until killed from Task Manager. I found no similar cases online and figured the investigation was worth sharing.

The immediate cause was quick to isolate – it was in wincore.cpp, in CWnd::OnNcDestroy():

// WM_NCDESTROY is the absolute LAST message sent.
void CWnd::OnNcDestroy()
{
  // cleanup main and active windows
  CWinThread* pThread = AfxGetThread();
  if (pThread != NULL)
  {
    if (pThread->m_pMainWnd == this)
    {
      if (!afxContextIsDLL)
      {
        // shut down current thread if possible
        if (pThread != AfxGetApp() || AfxOleCanExitApp())
          AfxPostQuitMessage(0);
      }
      pThread->m_pMainWnd = NULL;
    }
    if (pThread->m_pActiveWnd == this)
      pThread->m_pActiveWnd = NULL;
  }
  // ... (rest of the function omitted)
}

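In the problematic builds, when OnNcDestroy was called for the main window of the main thread, afxContextIsDLL would always evaluate to true, so AfxPostQuitMessage was never called. With no quit message posted, the message loop never ended and the process kept running without a window.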

afxContextIsDLL is defined in afxwin.h:

#define afxContextIsDLL     AfxGetModuleState()->m_bDLL
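
To make the failure mode concrete, here is a minimal, portable sketch of the decision above. It is a toy model, not MFC code: ToyModuleState, ToyMsg and runToyMessageLoop are hypothetical names invented for illustration, and the queue only imitates a message pump.

```cpp
#include <deque>

// Toy stand-ins, NOT the real MFC types.
struct ToyModuleState { bool isDll; };   // models AFX_MODULE_STATE::m_bDLL

enum class ToyMsg { NcDestroy, Quit };

// Returns true if the loop ended because a Quit message arrived.
// Returns false if the queue drained without one; a real GetMessage
// loop would block at that point instead of returning.
bool runToyMessageLoop(const ToyModuleState& state) {
    std::deque<ToyMsg> queue{ToyMsg::NcDestroy};  // the main window is being destroyed
    while (!queue.empty()) {
        ToyMsg msg = queue.front();
        queue.pop_front();
        if (msg == ToyMsg::Quit)
            return true;
        // Mirrors the check in OnNcDestroy: only a non-DLL module posts the quit.
        if (msg == ToyMsg::NcDestroy && !state.isDll)
            queue.push_back(ToyMsg::Quit);        // models AfxPostQuitMessage(0)
    }
    return false;
}
```

With isDll set to false the loop ends normally. With isDll set to true no quit is ever queued, which corresponds to the windowless process that has to be killed by hand.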

The investigation that followed was not so quick, as any reference to MFC module states lifts the lid on a hefty can of worms. Most of the MODULE_STATE apparatus is implemented in afxstate.cpp, in particular:

AFX_MODULE_STATE* AFXAPI AfxGetModuleState()
{
  _AFX_THREAD_STATE* pState = _afxThreadState;
  ENSURE(pState);
  AFX_MODULE_STATE* pResult;
  if (pState->m_pModuleState != NULL)
  {
    // thread state's module state serves as override
    pResult = pState->m_pModuleState;
  }
  else
  {
    // otherwise, use global app state
    pResult = _afxBaseModuleState.GetData();
  }
  ENSURE(pResult != NULL);
  return pResult;
}
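
The lookup rule in this function (a per-thread override if one is set, otherwise the process-wide state) can be sketched in portable C++ as follows. This is a simplified model under my own assumptions: it uses a plain thread_local pointer instead of MFC's thread-local wrapper, and all the Toy/toy names are hypothetical.

```cpp
// Toy stand-in, NOT the real AFX_MODULE_STATE.
struct ToyModuleState { bool isDll; };

// Process-wide default, standing in for _afxBaseModuleState.
ToyModuleState g_baseState{false};

// Per-thread override, standing in for the thread state's m_pModuleState.
thread_local ToyModuleState* t_override = nullptr;

// Same rule as above: the thread's override wins, else the global state.
ToyModuleState* toyGetModuleState() {
    return t_override != nullptr ? t_override : &g_baseState;
}

// Installs a new override and returns the previous one so the caller
// can restore it later (loosely modelled on AfxSetModuleState).
ToyModuleState* toySetModuleState(ToyModuleState* next) {
    ToyModuleState* prev = t_override;
    t_override = next;
    return prev;
}
```

The point to take away is that the answer depends on hidden per-thread state, so two calls that look identical can return different module states depending on what ran earlier on that thread.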

_afxThreadState has static access, is a not-so-thin wrapper around some thread-local storage, and implements a non-trivial operator=(), so finding out who modifies it is not a simple matter of placing a data breakpoint. Walking through the code turned up several direct modification paths (there is in fact an AfxSetModuleState), but most of the action seemed to be in the construction of global THREAD_STATE objects, very early in the app lifecycle. Things got hairy.

On my way home a certain suspicion started to arise.

I arrived at the office the next day, and indeed:

[Screenshot: doublemodules]

The app linked against both debug and retail versions of the MFC runtime dll.

This probably entails thousands of violations of the One Definition Rule (ODR), and even worse – the C++ standard does not require such violations to be diagnosed!

Each of the two MFC DLLs maintains its own MODULE_STATE. During link, the linker sees two matches for AfxSetModuleState and AfxGetModuleState (one in each MFC version). Which m_bDLL is altered depends on which version of AfxSetModuleState is resolved. Which m_bDLL is seen by CWnd::OnNcDestroy() depends on which version of AfxGetModuleState() is resolved!

I paused the app and used the context operator to verify at the watch window:

{,,mfc80d.dll}(*(AfxGetModuleState())).m_bDLL                1

{,,mfc80.dll}(*(AfxGetModuleState())).m_bDLL                   0

The two MFC dll versions (debug and release) see different states of the module.
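
The effect can be reduced to a few lines of portable C++. In this sketch each loaded copy of the runtime is modelled as a separate object with its own flag; ToyRuntimeCopy and readBackThrough are hypothetical names, and the model deliberately says nothing about which real code path wrote which copy, since the investigation did not establish that.

```cpp
// Toy stand-in for one loaded copy of the MFC runtime; NOT a real MFC type.
struct ToyRuntimeCopy {
    bool isDll = false;                           // models this copy's m_bDLL
    void setIsDll(bool value) { isDll = value; }  // models a write through this copy
    bool getIsDll() const { return isDll; }       // models a read through this copy
};

// Writes the flag through one copy and reads it back through another.
bool readBackThrough(ToyRuntimeCopy& writer, const ToyRuntimeCopy& reader, bool value) {
    writer.setIsDll(value);
    return reader.getIsDll();
}
```

When writer and reader are the same copy, the value written is the value read. When they are different copies, the write is invisible to the reader, which is the mismatch the watch window shows.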

While the investigation so far did not reveal the explicit flow that generated the inconsistency, obviously the erroneous dependency on the release MFC dll (in debug builds) must be removed.

I rebuilt the solution with the /VERBOSE linker option. The output showed a seemingly innocent static lib, written over 3 years ago, that linked against mfc80.dll in debug builds for no apparent reason. A look at the lib's properties showed that, in debug builds only, it was marked to ignore all default libs… This erroneous switch was removed and the problem has not reappeared since.

Bottom Lines

If you experience the same symptoms, here’s what to try:

  1. Check the Modules window for dependencies on both debug and release versions of MFC or another runtime component.
  2. If any are found, use /VERBOSE to track down the unexpected dependency.
  3. This dependency might be explicit, or generated via /NODEFAULTLIB. Either way, remove it!
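
Step 1 can also be automated once you have a list of loaded module names (copied from the Modules window, for example). The sketch below is a heuristic of my own, not a Visual Studio feature: it assumes bare file names and assumes the debug DLL is named like the release one plus a trailing "d", as with mfc80.dll and mfc80d.dll. The function names are hypothetical, and unrelated DLLs that happen to fit the pattern will be reported too.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Lower-cases a module name so that comparisons ignore case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns (release, debug) name pairs, e.g. ("mfc80.dll", "mfc80d.dll").
// Heuristic: the debug name is the release stem plus a trailing 'd'.
std::vector<std::pair<std::string, std::string>>
findDebugReleasePairs(const std::vector<std::string>& modules) {
    std::set<std::string> names;
    for (const std::string& m : modules)
        names.insert(toLower(m));

    std::vector<std::pair<std::string, std::string>> pairs;
    const std::string ext = ".dll";
    for (const std::string& name : names) {
        // Skip anything that does not end in ".dll".
        if (name.size() <= ext.size() ||
            name.compare(name.size() - ext.size(), ext.size(), ext) != 0)
            continue;
        const std::string stem = name.substr(0, name.size() - ext.size());
        const std::string debugName = stem + "d" + ext;
        if (names.count(debugName) != 0)
            pairs.emplace_back(name, debugName);
    }
    return pairs;
}
```

Any pair it reports is only a candidate; /VERBOSE is still what tells you which input pulled in the unexpected library.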