We recently hit some weird issues that turned out to stem from a failure to allocate a large contiguous chunk of heap memory. (Nailing the cause was exceptionally painful – maybe more on that in a future post.) The desired allocation was ~400M, and since machines today ship more or less by default with 2G–4G of RAM, there shouldn’t be any real justification for such allocations to fail. Or should there?
First of all, regardless of your available physical RAM, your real memory playground is 2G – the bottom half of your process’ address space, its user-mode portion. Yes, I’m well aware of the /3GB boot.ini switch, and trust me – you don’t want to go there in a 3D application; I was badly burnt by it already. PAE/AWE have downright hostile API sets too – you’ll just have to make do with 2G.
The real issue here is memory fragmentation.
An obvious solution would be migrating to Win64, and forgetting about fragmentation issues for the near century. Sadly, this was not a feasible option for us: we have a legacy stash of in-house 32-bit custom hardware drivers, and migrating those would be the absolute last resort.
Happily, a surprisingly short online search turned up quite a few constructive 32-bit directions. Here are some.
1. The Low Fragmentation Heap is a nice built-in feature, on by default since Vista. You should apply LFH to the CRT heap, retrieved by _get_heap_handle (just try the sample code). Even better – try applying it to all process heaps. There should be no reason not to apply this to all projects, except (screeeeeeeeeeeeech..) the magic doesn’t work on standard debug builds – which, well, err, makes it kinda useless.
2. HeapDeCommitFreeBlockThreshold is a magical registry value that is advertised to make a noticeable difference. It raises the threshold below which the heap holds on to freed blocks instead of decommitting their pages back to the OS. Keeping those pages under the heap manager’s jurisdiction can prevent page ‘theft’ for non-heap usage, thereby reducing some fragmentation factors.
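For reference, here is the shape of the tweak as I understand the documented location and value name – a DWORD (in bytes) under Session Manager. The 0x00040000 (256K) figure below is purely illustrative, not a recommendation; verify both key and value against the heap documentation before deploying:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager]
"HeapDeCommitFreeBlockThreshold"=dword:00040000
```

Note this is a machine-wide setting – it affects the heaps of every process, not just yours.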
3. Typically a lot of fragmentation (at the 100-Meg scale) is caused by sparse mapping of binary images into the process address space at load time.
In simpler English, say your process uses forty 1-Meg DLLs, and maps them into memory at regular 50-Meg intervals. They sparsely occupy just 40 Megs of your available 2G, yet leave no contiguous memory chunk larger than 49M!
To counter that, first map your virtual address usage. Until recently you’d have to use either vadump or direct code instrumentation, but since this summer there’s the incredible (as always) SysInternals tool VMMap. When you spot some DLLs that are just teasingly smiling at you from the middle of your address space, use editbin.exe with its /REBASE option to ruthlessly rebase them away.
4. Pre-designate a large heap (say 500M) at link time – via the linker’s /HEAP option, e.g. /HEAP:0x1F400000 – thus giving the heap a head start in the race for contiguous pages.
I decided to try the steps in order of increasing effort, and am overjoyed to say that (2), the decommit threshold, and (4), the link-time heap reservation, sufficed. We now successfully allocate 400M chunks.
We did peek into the process with VMMap, though, and it surfaced some interesting finds. For one, the Babylon translator, installed on all our development machines, has the CHUTZPAH to inject captlib.dll into the very middle of our precious address space.
My hunch says rebasing could indeed have the highest impact. We may have to try that too eventually – I hope to post some findings.