x86/x64 Library Numerical differences

There are many online sets of examples of 64 bit migration pitfalls, but I recently came across two that that appear not to be mentioned elsewhere.

First, downright compiler bugs.  We still have those and some raise their head only in 64.  (btw – my sincere apologies to Eric Brumer for venting out over him like that. He is not to blame for MS’s infuriating support policy).

Second and more importantly, different implementations of library math functions!

Here are two quick example results from VC++ 2013:

cos(0.37034934158424915),   on 32 gives 0.932200961410311 61, on 64 gives 0.93220096141031150.

cos(0.81476855148534799),   on 32 gives 0.686036806093662 47, on 64 gives 0.68603680609366235.

(on both cases, 32 was actually closer to the accurate results – but that’s probably a coincidence).

This is not the same as the compiler making different decisions on different platforms: the implementations of trigonometric functions were hand-crafted in assembly (at least in 32 bit), and each CRT version knowingly takes different code paths, based on exact platform and architecture (sometime based on run-time processor inspection).

These two examples are the bottom line of a several-day tedious debugging session.  This seemingly negligible difference manifested itself as a ~0.5% difference between results of a numerical optimization routine, in 32 and 64 bit VC++.

While not strictly a bug, this behaviour does make me uncomfortable in several aspects.

(1) Judging by some traces I compared during debugging, on ~95% of cases transcendental functions coincide exactly (to the last digit) on 32 and 64. Which makes one assume they were aiming for binary compatibility, and wonder whether the 5% difference is intentional.

(2) Stepping through the x64 implementation, it makes use of only vanilla SSE instructions, fully accessible to x86. There’s no technical reason limiting the implementations from coinciding.

(3) IEEE-754 had undergone a major overhaul in 2008, and the new version includes a much needed clause – still phrased as a recommendation. Quoting wikipedia:

…it recommends that language standards should provide a means to write reproducible programs (i.e., programs that will produce the same result in all implementations of a language), and describes what needs to be done to achieve reproducible results.

I was hoping /fp:precise would have such an effect, but apparently it doesn’t.  As far as I can say, today the only way of achieving such reproducibility is by hand crafting your own function implementations.

Or if, like me, you can live without the last digits of precision, you can just make do without them.  I now include the following code in every file that uses trig/inverse-trig functions.
[Edit: see a fix in a newer post.]
[Edit 2: Thanks to my colleague Nadav Ben Abir for the conversation that eventually led to this code.]

// TruncatedFuncs.h

#pragma once
#include <math.h>

inline double Truncate(double arg)
	__int64 &ResInt = reinterpret_cast<__int64&> (arg);
			ResInt &= 0xFFFFFFFFFFFFFFF8,  // set the final 3 bits to zero
			ResInt |= 4;   // estimate the middle of the truncated range
	double	&roundedRes = reinterpret_cast<double&> (ResInt);

	return roundedRes;

inline double MyCos(const double ang)
	return Truncate(cos(ang));

inline double MySin(const double ang)
	return Truncate(sin(ang));

inline double MyTan(const double ang)
	return Truncate(tan(ang));

inline double MyAcos(const double ang)
	return Truncate(acos(ang));

inline double MyAsin(const double ang)
	return Truncate(asin(ang));

inline double MyAtan2(const double y, const double x)
	return Truncate(atan2(y, x));

#define cos MyCos
#define sin MySin
#define tan MyTan
#define acos MyAcos
#define asin MyAsin
#define atan2 MyAtan2

This entry was posted in VC++. Bookmark the permalink.

3 Responses to x86/x64 Library Numerical differences

  1. Arieh Schneier says:

    I think you made a very small error in your function, the line that reads:
    ResInt |= 0x100; // estimate the middle of the truncated range
    Should either be:
    ResInt |= 0b100; // estimate the middle of the truncated range
    ResInt |= 0x4; // estimate the middle of the truncated range


  2. 64-bit programs and floating-point calculations: http://www.viva64.com/en/b/0074/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s