On _purecall and the Overhead(s) of Virtual Functions

A while ago a friend asked me whether pure virtual functions have higher overhead than regular virtual functions. At the time I answered that this cannot be – the pure/non-pure distinction is meaningful only at compile time, and non-existent at runtime. Only a (long) while later did I connect the dots, and understand what he meant then (sorry Mordy...)

Regular Overhead

Virtual calls are known to be more costly than calls that are resolved at compile time. Elan Ruskin measured ~50% difference – I measured a bit less, but the difference is certainly there. For functions that do real work the actual call overhead can be mostly neglected, but for functions that get called a lot in a scenario you’re struggling to optimize – you can get tangible results by eliminating virtual calls. It’s widely considered a good practice to use the added flexibility of virtual functions when you have a concrete reason and not just for the fun of it.

There are two reasons for the added call cost – the main one being the CPU instruction-prefetch mechanism. A regular call is resolved at compile time into something like –

call 0xabcd1234

– which the instruction caching apparatus (a.k.a trace cache) easily resolves ahead of time, and can tell where to continue the fetching from. However, a virtual call is compiled into something like –

call eax

Faced with the far harder task of predicting the contents of eax dozens of instructions in advance, the trace cache has to guess the target, and stalls whenever the guess is wrong.

The second potential extra cost is an extra dereferencing. The first DWORD_PTR of a C++ object (neglecting virtual inheritance) is a pointer to its virtual table – a table that is common to all instances of the same class. A call to a virtual function is resolved by first dereferencing that vtable pointer, and only then calling the function at a fixed offset from the vtable start. Maciej Sinilo tried to isolate this cost by comparing calls via explicit function pointers to calls via virtual functions, and it turns out the difference is practically non-measurable. (BTW, I didn’t check but I suspect part of the reason is the compiler’s ability to resolve that dereference at compile time, in many situations).
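As a rough illustration of the dispatch just described, here is a hand-rolled sketch of a vtable built from plain function pointers. It is not what any particular compiler emits; the names (Shape, VTable, call_sides and so on) are hypothetical and exist only to show the "load the table pointer, load the slot, call indirectly" sequence.

```cpp
#include <string>

// Hand-rolled stand-in for compiler-generated machinery.
// Every name in this sketch is hypothetical.
struct Shape;

struct VTable {
    std::string (*name)(const Shape*);  // slot 0
    int (*sides)(const Shape*);         // slot 1
};

struct Shape {
    const VTable* vfptr;  // first member: pointer to the per-class table
};

std::string triangle_name(const Shape*) { return "triangle"; }
int triangle_sides(const Shape*) { return 3; }
std::string square_name(const Shape*) { return "square"; }
int square_sides(const Shape*) { return 4; }

// One table per "class", shared by all of its instances.
const VTable triangle_vtable = { triangle_name, triangle_sides };
const VTable square_vtable = { square_name, square_sides };

// A "virtual call": dereference vfptr, pick the slot at a fixed offset,
// then make an indirect call through it.
int call_sides(const Shape* s) { return s->vfptr->sides(s); }
std::string call_name(const Shape* s) { return s->vfptr->name(s); }
```

With `Shape q{ &square_vtable };`, calling `call_sides(&q)` goes through the square table and yields 4, while the same call on an object pointing at `triangle_vtable` yields 3.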

Pure-Virtual Extra Overhead??

Well, not really. At least not directly. But the function ‘containers’ – classes that are used as interfaces – do come with some extra weight.

In short, an interface class by default has its own constructor and destructor – just like any other class. These are called just before/after a child class constructor/destructor – as you would expect in any hierarchy.

But wait… what? Why? If you didn’t provide an implementation for such a constructor, what can it possibly do?

What every compiler-generated constructor does: set up the class vtable pointer. In this case, it is a very short lived pointer – one that is immediately overwritten by the child ctor.

Take the following code:

class A
{
public:
	__declspec(noinline) A()
		{ _tprintf_s(_T(" A::A "));}
	virtual ~A()
		{ _tprintf_s(_T(" A::~A "));}

	virtual	void	f() = 0;
	virtual	void	g() = 0;
};

class B : public A
{
public:
	__declspec(noinline) B()
		{ _tprintf_s(_T(" B::B "));}
	virtual ~B()
		{ _tprintf_s(_T(" B::~B "));}

	virtual	void	f()
		{ _tprintf_s(_T(" B::f "));}
	virtual	void	g()
		{ _tprintf_s(_T(" B::g "));}
};

int _tmain(int argc, _TCHAR* argv[])
{
	B b;
	return 0;
}

The order of events in B’s construction is as follows:

(1) The child ctor, B::B() is called and immediately calls the parent ctor A::A().

(2) The parent ctor A::A() sets the object vfptr to point to the common A vtable –

xxx

(keep those xxx placeholders in mind – more on them soon). Watching the object state in VS at this point you see:

– emphasized are the instruction that populates the object vfptr, and the referenced vtable itself.

(3) B::B() continues, and modifies the object vfptr to point to the common B vtable:

B::f()

B::g()

and again a VS view:

If you were to call f() from ~~B::B()~~ A::A() [thanks Roman!], which is the sort of self-foot-shot C++ allows, you’d be using A’s vtable (which is the only one the object knows at that time), end up calling the nonexistent xxx and gloriously crash at runtime. Of course it’s never that direct, and it’s pretty much a consensus that you should never call a class’s virtuals from its own ctor/dtor.
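The same dispatch rule can be observed safely with a non-pure virtual, where there is a real function to land on instead of a crash. The sketch below uses hypothetical classes (Base, Derived) rather than the A/B pair above: the base ctor records which override it reaches, and a later call through a reference shows the normal behavior.

```cpp
#include <string>

// Hypothetical classes; a non-pure virtual is used so that the call made
// from the ctor is well defined instead of hitting a pure-virtual slot.
struct Base {
    std::string seen_in_ctor;

    // While Base::Base() runs the object is still "a Base", so this
    // virtual call resolves to Base::who, never to a derived override.
    Base() { seen_in_ctor = who(); }
    virtual ~Base() {}

    virtual std::string who() const { return "Base"; }
};

struct Derived : Base {
    std::string who() const override { return "Derived"; }
};

// Ordinary virtual dispatch on a fully constructed object.
std::string who_via_base(const Base& b) { return b.who(); }
```

For a `Derived d;`, `d.seen_in_ctor` holds "Base" while `who_via_base(d)` returns "Derived". Make `who()` pure and the call from the ctor no longer has anything valid to resolve to.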

Why all that hassle??

Frankly, I don’t know. It seems C++ goes out of its way to build and tear apart abstract-parent vtables, then exposes them briefly only during child construction/destruction, and then the community experts unanimously recommend never to use them. Herb Sutter does give 3 scenarios where you might consider using them – I find none of them convincing, and generally consider this to be one of the C++ semantic warts.

So, can this extra weight be mitigated?

Yes – at least in MS-specific ways. The direct way is by adding a __declspec(novtable) modifier to the abstract interface declaration. If you can guarantee that the interface class would never need any constructors/destructors (which can be tricky at times), it would be more readable to use __interface instead.

Beyond the direct saving of the extra ctor/dtor work, a happy side effect of novtable is that it eliminates all references to the modified interface vtable. The linker is then able to remove it from the binary altogether – thereby reducing the binary size and providing some extra boost. (When applied to a lot of interfaces, this can get tangible results!)
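A minimal sketch of how the modifier might be applied while keeping the code buildable elsewhere. The macro name INTERFACE_NOVTABLE and the ILogger/PrefixLogger classes are made up for illustration; the assumption is that on non-Microsoft compilers the macro simply expands to nothing.

```cpp
#include <string>

// Hypothetical portability macro: __declspec(novtable) is a Microsoft
// extension, so on other compilers this is assumed to expand to nothing.
#if defined(_MSC_VER)
#  define INTERFACE_NOVTABLE __declspec(novtable)
#else
#  define INTERFACE_NOVTABLE
#endif

// Pure interface: never instantiated on its own, so its ctor/dtor have no
// need to point the object at an ILogger vtable.
struct INTERFACE_NOVTABLE ILogger {
    virtual ~ILogger() {}
    virtual std::string format(const std::string& msg) const = 0;
};

// Concrete class: its ctor still installs the PrefixLogger vtable as usual.
struct PrefixLogger : ILogger {
    std::string format(const std::string& msg) const override {
        return "[log] " + msg;
    }
};

std::string format_through_interface(const ILogger& logger,
                                     const std::string& msg) {
    return logger.format(msg);
}
```

Calls through the interface behave the same with or without the modifier: `format_through_interface(PrefixLogger(), "hi")` returns "[log] hi". What changes is only the work done in the interface's own ctor/dtor.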

Bonus – so what is xxx, really?

Google xxx and see for yourself. I’ll wait until you return. It’d probably be a while.

Ok – this probably deserved its own post, as there still appears to be some online confusion regarding it. Apparently the ‘= 0’ pure virtual syntax is leading some to believe that the xxx entries are truly zeros. In fact MSDN columnist Paul DiLascia wrote sometime in 2000 that –

…the compiler still generates a vtable all of whose entries are NULL and still generates code to initialize the vtable in the constructor or destructor for A.

That may actually have been true then (I’m not even sure of that), but certainly isn’t now.

xxx is the address of the CRT function _purecall, which is essentially a debugging hook. You can control xxx’s value directly by overriding _purecall yourself, or alternatively use _set_purecall_handler to route into your own handler from within _purecall. You might consider doing so, e.g., to collect stack traces or minidumps in production code.
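A minimal sketch of the second option, assuming the Microsoft CRT. The handler body, the function names and the kHasPurecallHook flag are hypothetical; only _set_purecall_handler itself comes from the CRT, and on other toolchains the sketch deliberately does nothing.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical handler: a real one might write a minidump or capture a
// stack trace here before terminating.
void on_pure_call() {
    std::fputs("pure virtual function call\n", stderr);
    std::abort();
}

// True only when building against the Microsoft CRT, which is where
// _set_purecall_handler is available.
#if defined(_MSC_VER)
constexpr bool kHasPurecallHook = true;
#else
constexpr bool kHasPurecallHook = false;
#endif

// Installs the handler where supported; returns whether anything was done.
bool install_purecall_handler() {
#if defined(_MSC_VER)
    _set_purecall_handler(on_pure_call);  // declared in <stdlib.h>
    return true;
#else
    return false;  // assumption: no equivalent hook is wired up elsewhere
#endif
}
```

Calling `install_purecall_handler()` once, early in startup, should be enough, since the handler is registered process-wide.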

4 Responses to On _purecall and the Overhead(s) of Virtual Functions

Roman says:

June 25, 2010 at 10:25 pm

Looks like the title is a little misleading. If I got you correctly, there is no extra overhead to pure virtual functions (as opposed to regular virtual functions) – the extra overhead you are talking about is the one caused by having many “Interface” classes which are formed by using pure virtual functions. Well then, that’s expected :)

You mention that if a call to f() is made within B::B(), it might be problematic. I think you meant A::A(), as the dispatch within a constructor is indeed what you’d call static (vtable isn’t fully ready) and you’d get a pure virtual function call. like: http://cplusplus.co.il/2009/09/27/checking-file-signature/

Also, the xxx notation (purecall) is interesting. If you do give a body to a pure virtual function ( http://cplusplus.co.il/2009/08/22/pure-virtual-destructor/ ) you’d actually have a valid pointer there instead. One of the bugs in MS Outlook is a pure virtual function call (I’m sure you got that message at least once), guess they have set a handler for that purecall.

All in all, very interesting read. Thanks!

- Ofek Shilon says:
  
  June 26, 2010 at 2:38 pm
  
  Roman – thanks for the comment!
  You’re right about the A::A() typo, will be edited soon.
  Giving a body to a pure virtual would *not* give you a valid pointer in the class vtable – all it does is enable the function to be invoked statically. It also makes sense: marking a function as pure virtual instructs the compiler/linker not to assume it has a parent implementation, thereby forcing them to generate the class vtable with ‘_purecall’ in these slots. The implementation can be defined far away (code wise), and it is unreasonable to demand the linker to go back and revise vtables every time such an implementation is encountered.
  About the title being misleading – I just re-read and still think I was being accurate. There is, as it turns out, a hidden cost in using pure virtuals. It is incurred at class-instantiation time rather than at call time, but is nevertheless there. About this being expected – well, I did find it surprising that ctors of classes I never meant to instantiate (and whose methods I thus explicitly declared as pure virtual) are being called and are taking a (minor) performance toll. I dare hope future compilers would apply the novtable optimization automatically where it obviously generates identical semantics – which is the vast majority of cases.
  
brucedawson says:

December 4, 2013 at 7:30 pm

I would say that the main overhead to virtual functions is that they (usually) cannot be inlined. A simple accessor like “virtual int GetX() { return m_x; }” would be a single instruction if inlined, but as a virtual it requires fetching the v-table pointer, fetching an address from the table, and then indirectly calling to the function. Meanwhile the caller has to assume that all volatile registers are destroyed so in addition to the cost of the function being called there is also a cost in the caller function. The cost to call a simple accessor function could easily be 8-10 extra instructions in the caller, on top of the cache misses and code execution in the callee.

The costs of missed inlining are *highly* variable – maybe the function wouldn’t have been inlined if it hadn’t been virtual.

- Ofek Shilon says:
  
  December 4, 2013 at 7:59 pm
  
  I agree. That is however well known and not the subject of the post. Careless *interfaces* do have an overhead beyond that, that is seldom considered – and that is the post topic.