Another Look at the VS2012 Auto Vectorizer

A while ago I did some experimenting with (then beta) VS2012. After these experiments our team migrated to the 2012 IDE but kept to the 2010 toolset. Since then much has happened: an official VS2012 launch + 4 updates, rather thorough documentation and quite a few online talks by the compiler team. It was high time to take another look at the 2012 toolset.

What I care about

Our scenario is somewhat unpleasant, but I suspect very typical in the enterprise world: we have a large (~800K LOC) C++ code base, with some legacy niches that date back 15+ years. The code is computationally intensive and sensitive to performance and yet extremely rich in branch logic. While C++ language advances are nice, what I care about are backend improvements – and specifically, auto vectorization.

Why you should care too

In the last decade or so, virtually 100% of the progress in the x86/x64 ISA was made in vector units on the processor*. The software side, however, is veeeery slow to catch up: to this day, making any use of SSE/AVX processor-treats requires non-standard, non-portable, hard, low level tweaks – that make economic sense only on specific market niches (say video editing & 3D). Moreover, as the years go by – even in these niches such costly tweaks make less and less sense, as this effort is probably better off invested in moving execution to the GPU.

If you think about this for a second – our industry is in a somewhat ridiculous state. For over 10 years, the leading general-purpose processor architecture is evolving in a direction that just isn’t useful to most software it runs!

This is exactly where auto-vectorization ought to come in.
 
To make use of all these virgin, luscious silicon fields, C++ compilers need to be extra clever. They must be able to reason about code – without any help from the code itself (well, not yet) – and automatically translate execution to SIMD units where possible. This is a tough job, and AFAIK until recently only Intel’s compiler was somewhat up to the task. (I passionately hate Intel’s compiler for different reasons, but that’s beside the point now.) Only towards VS2012 did MS decide to try to catch up and add decent vectorization support, and if this effort succeeds – I truly believe it can revolutionize the SW industry, no less.

So, I dedicated two afternoons to build one of our products in VS2012 latest public bits with /Qvec-report:2 and try to understand how much of the vectorization potential is fulfilled.

Well, are we there yet?

No.

Only a negligible percentage of loops was vectorized successfully. A random inspection of reported vectorization failures shows that almost all of them are not due to real syntactic reasons. I created connect reports for some of the bogus failures – here are the technical details.

1. Copy operations are pretty much never vectorized. Even MS internal STL code –

_OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
{    // copy _Val _Count times through [_Dest, ...)
   for (; 0 < _Count; --_Count, ++_Dest)
       *_Dest = _Val;
    return (_Dest);
}

– fails to vectorize, with reported reason being the generic 500. Even worse, the following snippet from MSDN itself:

void code_1300(int *A, int *B)
{
    // Code 1300 is emitted when the compiler detects that there is
    // no computation in the loop body.

    for (int i=0; i<1000; ++i)
    {
        A[i] = B[i]; // Do not vectorize, instead emit memcpy
    }
}

– fails to emit memcpy, as it supposedly should. Eric Brumer responds they are indeed working on this very problem.

2. Vectorization decisions by the compiler are extremely sensitive to unrelated code changes. Eric Brumer’s investigation – in the connect link – shows (IIUC) that vectorization decisions depend on previously made inlining decisions, which is what makes them highly fragile and undependable. The reported reasons for failures in these cases seem outright random. Again, they are working on it.

3. new-operator declaration syntax hides the fact that allocated buffers can be safely considered non-aliased. This might be for valid legal reasons, but comes at a formidable price: the entire alias-analysis done by the compiler is severely crippled, and major optimization opportunities (vectorization being just one) are thereby missed. Stephan Lavavej reports, quote, ‘the compiler back-end team has an active work item to investigate this’.

4. This one is just a quibble, really: turns out the report message ‘1106 Inner loop is already vectorized. Cannot also vectorize the outer loop’ – is misleading. Outer loops are never vectorized, regardless of whether the inner loop was vectorized.

5. About vectorizer report ‘1303 Too few loop iterations for vectorization to provide value’: this pretty much rules out any optimization to 2D/3D loops. Now, for one, Intel made a significant investment in short-vector-math-library (intrinsic to the compiler) to harvest potential speedups in just such cases. Second, there are quite a few hand-vectorized 3D libraries out there, so it seems others have also reached the conclusion this is a worthy optimization. So while I don’t have decisive quantitative data – I find this reported reason highly suspicious.
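For items 1 and 3 above there are plain-C++ workarounds one can try while waiting for compiler fixes. The sketch below is only that, a sketch: `copy_ints` and `add_arrays` are made-up names, both assume the buffers never overlap, and I have not verified that either one changes what /Qvec-report prints for any particular loop. `__restrict` is a compiler extension (MSVC, GCC and Clang accept it), not standard C++.

```cpp
#include <cstddef>
#include <cstring>

// Item 1: spell the copy out instead of hoping the loop is recognized as one.
// Assumes non-overlapping buffers, as memcpy requires.
inline void copy_ints(int* dst, const int* src, std::size_t count)
{
    // Same effect as: for (i = 0; i < count; ++i) dst[i] = src[i];
    std::memcpy(dst, src, count * sizeof(int));
}

// Item 3: promise the compiler that the three buffers never alias,
// which is exactly the fact that plain operator new does not convey.
inline void add_arrays(float* __restrict dst,
                       const float* __restrict a,
                       const float* __restrict b,
                       std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = a[i] + b[i];
}
```

The promise made by `__restrict` is entirely on the caller: passing overlapping buffers to `add_arrays` is undefined behaviour, so it only makes sense where ownership of the buffers is obvious.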

I came by more bumps and weird behaviour, but decided not to investigate any deeper than this.

Bottom Line

People infinitely smarter than me are investing tremendous effort in vectorization technology, and I’m sure it would grow to be impressive. That being said, marketing divisions inevitably work much faster than R&D – and it seems VS2012 is just the start of a ramp-up stage.

For us, VS upgrade is a major hassle. It involves cross team coordination, chasing third party vendors for new builds, and the unavoidable ironing of various wrinkles that come with any migration. I just can’t justify this hassle with any tangible added value for us**. We’ll most likely ‘go Vista’ on VS2012, and just skip it quietly.

I’m anxious to return and test the vectorizer again – but at least after some service-packs*** for VS2013 are out****.
__________________________________________________

* That’s not to say no other significant progress was made in the processor space itself – transistors got tiny, caches got enormous, memory controllers and graphic processors were integrated in, etc. etc. I am saying that practically all the architectural innovations in the last decade were SIMD extensions.

** Well, there is this undocumented goodie that holds some very tangible added value for us, but probably not enough. More on that some day.

*** I know! updates, updates.

**** Footnotes are fun. Just saying.

Posted in VC++ | Leave a comment

Find Where Types are Passed by Value

Say you’re working on a large code base, and you came across several instances where some type of non-negligible size was passed as argument by value – where it was more efficient to pass by const reference. Fixing a few occurrences is easy, but how can you efficiently list all places throughout the code where this happens?

Here’s a trick: define the type as aligned, and rebuild. The compiler would now shout exactly where the type is passed by value, since aligned types cannot be passed as such.

Double clicking every such error would get you immediately to where a fix is probably in order.
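A minimal sketch of the trick, assuming MSVC targeting 32-bit x86 (where, as far as I know, the resulting error is C2719). `BigType`, `ALIGN16` and `total` are made-up names, and the by-value signature is left commented out so that the snippet itself still builds:

```cpp
// Temporary, for the audit build only: force an alignment on the type.
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 alignas(16)   // other compilers may accept by-value passing anyway
#endif

struct ALIGN16 BigType
{
    double values[8];
};

// Uncommenting a by-value signature like this one is what the audit build
// is expected to flag:
// double total(BigType b);

// The fix the error leads you to: pass by const reference.
inline double total(const BigType& b)
{
    double sum = 0.0;
    for (int i = 0; i < 8; ++i)
        sum += b.values[i];
    return sum;
}
```

Remember to remove the alignment once the audit is done, since it also changes the size and layout of anything that contains the type.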

Posted in VC++ | 2 Comments

Discovering Which Projects Depend on a Project – II

In a previous post I shared a hack that enables detection of all projects that depend on a given one, either directly or indirectly. @Matt asks by mail if I can suggest a quick way to isolate only the direct dependencies.

Well as a matter of fact I can, but it would be even uglier than the original hack. First delete the project of interest from the solution:

Then note which of the other projects have changed. This can be as easy as noting a small ‘v’ beside them in the solution explorer, indicating a check-out:

Turns out that deleting a project (unlike, say, unloading it) chases it up in the references of all other solution sibling projects, and removes it from their references if present. This in turn causes a change to the project file, which can be easy to spot visually. Sibling projects which refer to the project indirectly still hold references to the intermediate projects – and so are left unchanged. Therefore this hack isolates only direct references.

Of course don’t forget to immediately undo these changes afterwards.

Altogether, these hacks are mighty hackish. If you find yourself caring about dependency management more than once or twice, just go get some tool.
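If you'd rather not touch the solution at all, the same information sits in the project files themselves. To my understanding, a direct reference shows up in the referencing vcxproj as a `ProjectReference` element whose `Include` attribute holds the path of the referenced project. Here is a rough sketch under that assumption. It is a different technique from the delete-and-watch hack above, the function name is mine, and the plain substring scan ignores XML comments and conditions:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if vcxprojText holds a <ProjectReference Include="..."> whose
// path ends with targetFileName (e.g. "Core.vcxproj"), compared case-insensitively.
inline bool references_project(const std::string& vcxprojText,
                               const std::string& targetFileName)
{
    const std::string tag = "<ProjectReference Include=\"";
    std::string::size_type pos = 0;
    while ((pos = vcxprojText.find(tag, pos)) != std::string::npos)
    {
        pos += tag.size();
        const std::string::size_type end = vcxprojText.find('"', pos);
        if (end == std::string::npos)
            break;
        const std::string path = vcxprojText.substr(pos, end - pos);
        if (path.size() < targetFileName.size())
            continue;

        // The file name must sit at the very end of the path...
        const std::size_t start = path.size() - targetFileName.size();
        bool sameName = true;
        for (std::size_t i = 0; i < targetFileName.size() && sameName; ++i)
        {
            const unsigned char lhs = static_cast<unsigned char>(path[start + i]);
            const unsigned char rhs = static_cast<unsigned char>(targetFileName[i]);
            sameName = std::tolower(lhs) == std::tolower(rhs);
        }
        // ...and be a whole path component, not the tail of a longer name.
        const bool atBoundary =
            start == 0 || path[start - 1] == '\\' || path[start - 1] == '/';
        if (sameName && atBoundary)
            return true;
    }
    return false;
}
```

Feeding it the contents of every vcxproj in the solution gives the direct dependents only, since indirect ones reference the intermediate projects instead.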

Posted in Visual Studio | Leave a comment

Discovering Which Projects Depend on One

I am working with several large-ish (100+ project) solutions – and at this scale, dependency management is a very real issue. While you can easily view (and set) the dependencies of a project by viewing its references, there is no obvious tool to answer the reverse question: which projects depend on a given one?

Obviously a hack is in order. Enter project dependencies – the predecessor of project references. Both appear in the project context menu (from the solution explorer):

In a nutshell, dependencies are stored per-solution while references are stored per-project, as they should. But that’s beside the point here. The point is: the dependencies display is smart enough to keep you from forming cyclic dependencies. When you click ‘Project Dependencies…’ you’d see something like this:

The checked boxes indicate projects that the current one (selected in the top combo) includes either in its references or dependencies. The greyed out boxes (marked in a red rectangle here) indicate projects that include the current one in a similar manner. Indeed, if you try and check a greyed out box – thereby adding it to the current project dependencies – you get:

So there you have it: the list of greyed out boxes is a poor man’s answer to the question – which projects depend on the current one.

Note two limitations:

  1. These dependencies are both direct and indirect. Distinguishing these still requires some manual extra work.
  2. This hack applies only to linker dependencies among projects, and is blind to dependency by header file inclusion. Generally speaking this amounts to dependency upon interfaces and not implementations (neglecting templates and other inlines), and so is a weaker form of dependency – but still one that might be of interest.

A few months ago I decided such hacks are no replacement for a proper tool, and started using CppDepend. It is not perfect, but I’m growing to like it. Maybe more on that in a future post – but in the meantime this hack should be useful to anyone working in large solutions like mine.

Posted in VC++, Visual Studio | 2 Comments

VS2012 Migration #3: autoexp and NoStepInto Replacements

In the past I blogged quite a few times about two immensely useful albeit mostly-unofficial debugger features: watch modification via autoexp.dat, and step-into modification via NoStepInto registry key. A long while ago I raised two suggestions at MS UserVoice, to invest in making these two semi-hacks into documented, supported features. The first suggestion got some traction, and is officially implemented in VS2012. The 2nd suggestion went mostly ignored – but nevertheless, there’s a new and better – though still undocumented – way to skip functions while stepping.

NatVis files

The Natvis (native-visualizers) file format is the shiny new replacement for autoexp.dat. It is well documented, and although still quite rough around the edges – bugs are accepted and treated, which means that for the first time it is actually supported. The new apparatus comes with several design advantages:

  1. It seems to be better isolated and not to crash the IDE so much,
  2. New visualizer debugging facilities are built in,
  3. Separate customized visualizers can be kept in separate files, allowing easier sharing (e.g., library writers can now distribute .natvis files with their libraries),
  4. Natvis files can be placed at per-user locations.

It isn’t that much fun rehashing the syntax – being official and all – but I will include here a custom mfc-containers natvis, similar to the autoexp section I shared a while back:

<?xml version="1.0" encoding="utf-8"?>
<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010">
  <!--from afxwin.h -->
  <Type Name="CArray&lt;*,*&gt;">
    <AlternativeType Name="CObArray"></AlternativeType>
    <AlternativeType Name="CByteArray"></AlternativeType>
    <AlternativeType Name="CDWordArray"></AlternativeType>
    <AlternativeType Name="CPtrArray"></AlternativeType>
    <AlternativeType Name="CStringArray"></AlternativeType>
    <AlternativeType Name="CWordArray"></AlternativeType>
    <AlternativeType Name="CUIntArray"></AlternativeType>
    <AlternativeType Name="CTypedPtrArray&lt;*,*&gt;"></AlternativeType>
    <DisplayString>{{size = {m_nSize}}}</DisplayString>
    <Expand>
      <Item Name="[size]">m_nSize</Item>
      <Item Name="[capacity]">m_nMaxSize</Item>
      <ArrayItems>
        <Size>m_nSize</Size>
        <ValuePointer>m_pData</ValuePointer>
      </ArrayItems>
    </Expand>
  </Type>

  <Type Name="CList&lt;*,*&gt;">
    <AlternativeType Name="CObList"></AlternativeType>
    <AlternativeType Name="CPtrList"></AlternativeType>
    <AlternativeType Name="CStringList"></AlternativeType>
    <AlternativeType Name="CTypedPtrList&lt;*,*&gt;"></AlternativeType>
    <DisplayString>{{Count = {m_nCount}}}</DisplayString>
    <Expand>
      <Item Name="Count">m_nCount</Item>
      <LinkedListItems>
        <Size>m_nCount</Size>
        <HeadPointer>m_pNodeHead</HeadPointer>
        <NextPointer>pNext</NextPointer>
        <ValueNode>data</ValueNode>
      </LinkedListItems>
    </Expand>
  </Type>
  
  <Type Name="CMap&lt;*,*,*,*&gt;::CAssoc">
    <AlternativeType Name="CMapPtrToWord::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapPtrToPtr::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapStringToOb::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapStringToPtr::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapStringToString::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapWordToOb::CAssoc"></AlternativeType>
    <AlternativeType Name="CMapWordToPtr::CAssoc"></AlternativeType>
    <AlternativeType Name="CTypedPtrMap&lt;*,*,*&gt;::CAssoc"></AlternativeType>
    <DisplayString>{{key={key}, value={value}}}</DisplayString>
  </Type>

  <Type Name="CMap&lt;*,*,*,*&gt;">
    <AlternativeType Name="CMapPtrToWord"></AlternativeType>
    <AlternativeType Name="CMapPtrToPtr"></AlternativeType>
    <AlternativeType Name="CMapStringToOb"></AlternativeType>
    <AlternativeType Name="CMapStringToPtr"></AlternativeType>
    <AlternativeType Name="CMapStringToString"></AlternativeType>
    <AlternativeType Name="CMapWordToOb"></AlternativeType>
    <AlternativeType Name="CMapWordToPtr"></AlternativeType>
    <AlternativeType Name="CTypedPtrMap&lt;*,*,*&gt;"></AlternativeType>
    <DisplayString Condition="(m_nHashTableSize &gt;= 0 &amp;&amp; m_nHashTableSize &lt;= 65535)">{{size={m_nHashTableSize}}}</DisplayString>
    <Expand>
      <Item Name="num bins">m_nHashTableSize</Item>
      <ArrayItems>
        <Size>m_nHashTableSize</Size>
        <ValuePointer>m_pHashTable</ValuePointer>
      </ArrayItems>
    </Expand>
  </Type>

  <Type Name="CMap&lt;*,*,*,*&gt;">
    <AlternativeType Name="CMapPtrToWord"></AlternativeType>
    <AlternativeType Name="CMapPtrToPtr"></AlternativeType>
    <AlternativeType Name="CMapStringToOb"></AlternativeType>
    <AlternativeType Name="CMapStringToPtr"></AlternativeType>
    <AlternativeType Name="CMapStringToString"></AlternativeType>
    <AlternativeType Name="CMapWordToOb"></AlternativeType>
    <AlternativeType Name="CMapWordToPtr"></AlternativeType>
    <AlternativeType Name="CTypedPtrMap&lt;*,*,*&gt;"></AlternativeType>
    <DisplayString>{{Hash table too large!}}</DisplayString>
  </Type>
  

  <Type Name="ATL::CAtlMap&lt;*,*,*,*&gt;">
    <AlternativeType Name="ATL::CMapToInterface&lt;*,*,*&gt;"/>
    <AlternativeType Name="ATL::CMapToAutoPtr&lt;*,*,*&gt;"/>
    <DisplayString>{{Count = {m_nElements}}}</DisplayString>
    <Expand>
      <Item Name="Count">m_nElements</Item>
      <ArrayItems>
        <Size>m_nBins</Size>
        <ValuePointer>m_ppBins</ValuePointer>
      </ArrayItems>
    </Expand>
  </Type>
  <Type Name="ATL::CAtlMap&lt;*,*,*,*&gt;::CNode">
    <DisplayString Condition="this==0">Empty bucket</DisplayString>
    <DisplayString Condition="this!=0">Hash table bucket</DisplayString>
  </Type>
</AutoVisualizer>

Visualizing Map is a bit tricky, and I didn’t take the time yet to look deep into it – but the file is hopefully useful as it is. To use, just save the text as, say, MfcContainers.natvis, either under %VSINSTALLDIR%\Common7\Packages\Debugger\Visualizers (requires admin access), or under %USERPROFILE%\My Documents\Visual Studio 2012\Visualizers\ .

NatStepFilter Files

– are the new and improved substitute for the NoStepInto registry key. While there are some online hints and traces, the natstepfilter spec is yet to be introduced into MSDN – or even the VC++ team blog. For now you can watch the format specification, along with some good comments, at the %VSINSTALLDIR%\Xml\Schemas\natstepfilter.xsd near you, or even better – inspect a small sample at %VSINSTALLDIR%\Common7\Packages\Debugger\Visualizers\default.natstepfilter.

The default.natstepfilter is implemented by Stephan T. Lavavej, and is very far from complete – both because of regex limitations and because of decisions not to set non-overridable limitations on users:

“Adding something to the default natstepfilter is a very aggressive move, because I don’t believe there’s an easy way for users to undo it (hacking the file requires admin access), and it may be surprising when the debugger just decides to skip stuff.”

I can think of several ways for users to override .natstepfilter directives (never mind stepping-into via assembly, how about setting a plain breakpoint in the function you wish to step into?) – and so I don’t agree with that decision. Still I hope the default rules would improve alongside the documentation. We mostly avoid STL, so I had no need to customize .natstepfilter’s yet – I’ll be sure to share such customizations if I do go there.

Caveat

Both improvements, natvis and natstepfilter files, do not work for debugging native/managed mixed code, which sadly renders them unusable for most of our code. While this behavior is documented – I would hardly say it is ‘by design’. It does seem to irritate many others, so there is hope – as Brad Sullivan writes that MS are-

“… working on making everything just work in a future release of Visual Studio.”

Posted in Debugging, VC++ | 3 Comments

VS2012 Migration #2: ‘Unspecified error’ Upon Solution Open

Solution: unbind and rebind all projects to source control. You can try and pin-point the offending projects, but why bother.

The sln file does change (new SCC binding fields were added) but is still workable from VS2010.

Posted in Visual Studio | 1 Comment

VS2012 Migration #1: ‘The Project '' has been renamed’ and Other Errors on Build

About half a year ago I started experimenting with VS2012. It was not a smooth migration by any means, and I finally got around to recording online some of the lessons I learnt along the way.

One of the first things that greeted me was a slew of weird and uninformative error messages, whenever a build was tried: ‘The project '' has been renamed’ on sln build, ‘a build is already in progress’ on individual project build, and ‘Object reference not set to an instance of an object’ – well, just occasionally.

Others have come across similar issues as well, but I believe more can be said than what was already noted. Bottom line, the issue is with native references to projects not contained in the current solution.

VS2010 did agree to try and build with such external references, and if a binary copy of them was available where VS2010 could find it – either the build would succeed or fail with surprising errors (e.g., due to debug/release version mismatch). VS2012 refuses to try and build in such a case, which is a very good thing. I only wish the error messages were more informative.

Moreover, VS2010 didn’t show the external references in the references tab. Not only does VS2012 show the missing projects, it (probably unintentionally) makes it easy to distinguish them by displaying them as the explicit path (as stored in the referencing vcxproj file), and not the referenced project name:

Thus, the solution to these error messages is to just scan the solution’s projects’ references, find the missing ones by looking for vcxproj suffixes, and either remove them (if they are redundant) or add them to the solution. If your solution is very large, it might be faster to isolate the offending projects by unloading part – say, half – of the projects at a time, and see if a build attempt still raises errors.

I really wish the reference UI (and dare I say, even the underlying mechanism) for native projects was as clear as for managed projects. But guess we’ll have to do with such hacks for now.

Posted in VC++ | Leave a comment

Entry Point Not Found, and other DLL Loading Problems

Occasionally I come across DLL load problems:

The verbosity of the error messages varies greatly. In their raw form these include at least the DLL name, but as various frameworks come into play (for the error message above, it’s .net) – native exceptions are caught and re-thrown, and more often than not helpful information is lost on the way.

Turns out there’s a built in way to get verbose windows-loader output: the Show Loader Snaps flag. The easiest way to mark it is with the GFlags utility, bundled with debugging tools for windows:

Under the hood, it merely adds a FLG_SHOW_LDR_SNAPS flag (0x00000002), to a DWORD value in the relevant IFEO registry key. This in turn causes Windows Loader to set the _ShowSnaps variable in the ntdll copy specific to the named process.
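For the curious, here is roughly what that registry tweak could look like when done by hand. This is a hedged sketch, not a description of what GFlags actually executes: I am assuming the DWORD value is the one named GlobalFlag under the per-image Image File Execution Options key, the function names are mine, and the code does nothing about registry redirection for 32-bit processes on 64-bit Windows. The Windows-only part needs admin rights and advapi32.lib.

```cpp
#include <cstdint>
#include <string>

// FLG_SHOW_LDR_SNAPS, with the value quoted in the text above.
const std::uint32_t kShowLoaderSnaps = 0x00000002;

// Returns the given flags with the loader-snaps bit switched on.
inline std::uint32_t with_loader_snaps(std::uint32_t globalFlag)
{
    return globalFlag | kShowLoaderSnaps;
}

// Builds the per-image IFEO key path (relative to HKEY_LOCAL_MACHINE).
inline std::wstring ifeo_key_path(const std::wstring& imageName)
{
    return L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
           L"Image File Execution Options\\" + imageName;
}

#ifdef _WIN32
#include <windows.h>

// Sets the bit in the image's GlobalFlag value, keeping any other flags intact.
inline bool enable_loader_snaps(const std::wstring& imageName)
{
    HKEY key = nullptr;
    if (RegCreateKeyExW(HKEY_LOCAL_MACHINE, ifeo_key_path(imageName).c_str(),
                        0, nullptr, REG_OPTION_NON_VOLATILE,
                        KEY_QUERY_VALUE | KEY_SET_VALUE,
                        nullptr, &key, nullptr) != ERROR_SUCCESS)
        return false;

    DWORD flags = 0;
    DWORD size = sizeof(flags);
    DWORD type = 0;
    // A missing or non-DWORD value is treated as "no flags set yet".
    if (RegQueryValueExW(key, L"GlobalFlag", nullptr, &type,
                         reinterpret_cast<LPBYTE>(&flags), &size) != ERROR_SUCCESS
        || type != REG_DWORD)
        flags = 0;

    flags = with_loader_snaps(flags);
    const LONG rc = RegSetValueExW(key, L"GlobalFlag", 0, REG_DWORD,
                                   reinterpret_cast<const BYTE*>(&flags),
                                   sizeof(flags));
    RegCloseKey(key);
    return rc == ERROR_SUCCESS;
}
#endif
```

A call such as `enable_loader_snaps(L"Strategist.exe")` would then affect the next launch of that image. In practice GFlags is the saner route, since it also knows how to clear the flag again.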

And now, behold the new and shiny loader trace (dumped to the debugger output window):

…    

2724:245c @ 11813487 – LdrpFindOrMapDll – RETURN: Status: 0x00000000

2724:245c @ 11813487 – LdrpLoadImportModule – RETURN: Status: 0x00000000

2724:245c @ 11813487 – LdrpLoadImportModule – RETURN: Status: 0x00000000

2724:245c @ 11813487 – LdrpLoadImportModule – RETURN: Status: 0x00000000

2724:245c @ 11813487 – LdrpSnapThunk – WARNING: Hint index 0x70a for procedure “?Revert@CStreamMemory@@UAGJXZ” in DLL “YaddaYadda.dll” is invalid

2724:245c @ 11813487 – LdrpSnapThunk – ERROR: Procedure “?Revert@CStreamMemory@@UAGJXZ” could not be located in DLL “YaddaYadda.dll”

First-chance exception at 0x77321d32 (ntdll.dll) in Strategist.exe: 0xC0000139: Entry Point Not Found.

Bam! There’s the offending DLL and the offending imported function, right there in the debugger.

Like many other useful features – it is documented, but very low on discoverability. Which is a fancy way of saying you can find it only if you already know exactly what you are looking for. I personally got around to it after digging around in ntdll assembly (just like Matt Pietrek, 14 years ago), trying to get to a string containing the name of an offending DLL.

The windows-copycat-opensource ReactOS source gives a nice view of the internal usage of this flag – called ShowSnaps in their source. The ‘snapping’ verb in this context refers to one of the actions performed by the loader: after rebasing the loaded DLL in the loading process memory space, the DLL’s exported function addresses are updated and must be copied to the importing process (or other dll) Import Address Table. This – in this context – is called snapping, and that’s where the extra tracing is hooked.

Posted in Debugging, Win32 | 18 Comments

Forcing Construction of Global Objects in Static Libraries

Suppose you have a global object whose constructor does useful stuff – say, registration somewhere or initialization of global resources. Suppose further this object isn’t directly accessed anywhere – you just need the functionality in its ctor. All is fine, until we add the last assumption: suppose this object lies in a static library. This seems to be a long lasting pain, ultimately arising from the old (‘broken’? let’s just say ‘outdated’) C++ compiler-linker model. The way the linker works is by repeatedly searching for implementations of yet-unresolved referenced symbols, and including only the obj files with such implementations – thereby dropping entirely obj files with no external references, such as the one containing the global object whose ctor you need to run. To make things concrete, take the following toy example:


//main.cpp
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	return 0;
}

//GlobalInLib.cpp – compile as static lib
#include <stdio.h>
#include <tchar.h>

struct UsefulCtor
{
	UsefulCtor()  { _tprintf(_T("ThereIsNoSpoon")); }
};

UsefulCtor MyGlobalObj;

Under normal linkage, MyGlobalObj would be ignored. You can verify this either by putting a breakpoint in its constructor and seeing that it is never hit, or by inspecting the output console window and seeing that it is empty. <Aside> An interesting discussion arose a while ago in MS forums on whether this behavior violates the standard. Here, einros writes:

The C++ standard, section 3.7.1, specifies: “If an object of static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused, […]”

But MS’ Holger Grund clarifies –

[Your quote of the standard] only holds if the corresponding translation unit is part of the program. In my definition and the one of at least four major toolchain implementators, it is not.

</Aside> Enter ‘Use Library Dependency Inputs’.

This arcane combo box in the project references dialog has the sole documented effect of enabling incremental linking for static libs, but the interesting part is how it does it:

When this property is set to Yes, the project system links in the .obj files for .libs produced by dependent projects, thus enabling incremental linking.

And indeed, setting this option to True causes construction of MyGlobalObj in the example above. Turns out you can force construction of globals in static libs after all.
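For context, the kind of object that gets silently dropped is typically a self-registering one. Below is a single-file sketch of that pattern with made-up names (`registry`, `Registrar`, `g_pluginRegistrar`). Within one translation unit of an executable the registration always runs. The problem described in this post only shows up once the registrar lives in a static library and nothing else in its .obj is referenced.

```cpp
#include <string>
#include <vector>

// A function-local static sidesteps initialization-order issues between files.
inline std::vector<std::string>& registry()
{
    static std::vector<std::string> names;
    return names;
}

// The "useful ctor": its only job is the side effect of registering a name.
struct Registrar
{
    explicit Registrar(const std::string& name) { registry().push_back(name); }
};

// Never referenced by name anywhere. Inside a static library, the linker would
// skip the .obj holding this object unless something forces it into the link.
static Registrar g_pluginRegistrar("MyPlugin");
```

By the time main() starts, `registry()` holds the single entry "MyPlugin". Move the last line into a static library and, under normal linkage, the registry stays empty.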


Addendum: Only after writing this post did I come across this excellent 2005->2012 thread, which mentions this setting as a solution. Still, this effect of the linker is all but undocumented, and qualifies as deserving-more-web-presence.

Posted in VC++ | 4 Comments

Quick Word to Fellow Hebrew Speaking Devs

It’s official – here’s some reference.

(R2L in the editor is admittedly a bitch, though.)

Posted in VC++ | Leave a comment