In a previous post a truncation scheme was suggested, to circumvent x86/x64 differences in math library implementations:
#pragma once #include <math.h> inline double Truncate(double arg) { __int64 &ResInt = reinterpret_cast<__int64&> (arg); ResInt &= 0xFFFFFFFFFFFFFFF8, // set the final 3 bits to zero ResInt |= 4; // estimate the middle of the truncated range double &roundedRes = reinterpret_cast (ResInt); return roundedRes; } inline double MyCos(const double ang) { return Truncate(cos(ang)); } #define cos MyCos ...
Since then, I accumulated some mileage with the scheme and have come to understand that line 8:
... ResInt |= 4; // estimate the middle of the truncated range ...
-is flawed.
Since we drop the last 3 bits (8 ulps) of accuracy, it seemed like a good to fill the missing bits with the mid-value of the truncation range – thereby lowering the max truncation error from 7 ulps (truncating b111 to b000) to 4 ulps (truncating b000 to b100).
However, this lowers the average error only if you assume that inputs to these functions are uniformly distributed.
In other words, in real code you are far, far more likely to take cosine of 0 than cosine of 0.00000000000000003, so your average error would be better off if you hold back the |=4 sophistication, and just stick to the 000 suffix.
Even worse, in the wise-ass |=4 version, taking cosine of 0 would give a value slightly larger than 1, thereby potentially causing more subtle numerical difficulties than those saved.
All in all, currently my code uses the simple version:
#pragma once #include <math.h> inline double Truncate(double arg) { __int64 &ResInt = reinterpret_cast<__int64&> (arg); ResInt &= 0xFFFFFFFFFFFFFFF8; // set the final 3 bits to zero double &roundedRes = reinterpret_cast (ResInt); return roundedRes; } inline double MyCos(const double ang) { return Truncate(cos(ang)); } #define cos MyCos ...