When Approximations Are More Accurate And Better Performing. Part 1

So we have been talking a lot recently about approximation approaches and what we can do with it, measuring error and some horrible template programming to support this all seamlessly in C++.

This is the post where we show why.
This post will show you how to generate the vertex positions on a circle, sphere, some curved path with greater accuracy and with lower performance cost than the standard algorithm - without breaking any specifications or altering the behavior of the compiler.

To begin with, we want to keep the problem simple as a proof of concept. So we will be generating a semi-circle. We want to be able to compare the different hardware supported precisions available to the programmer so we will use templates to reduce any human error. We want to see how the error scale based on the radius and position of the circle so we will make those parameters too. We also want to control the number of points being generated to increase or lower the precision.
That gives us this accurate implementation:

template <typename T, int numVerts>
std::array<std::pair<float, float>, numVerts> GenerateCircleVertices_A(T _radius, std::pair<T, T> _offset)
{
	std::array<std::pair<float, float>, numVerts> points;

	T theta = 0.f;
	T thetaStep = (T)(PI) / (T)numVerts;
	for (int i = 0; i < numVerts; i++)
	{
		T x = (_radius * cos(theta)) + _offset.first;
		T y = (_radius * sin(theta)) + _offset.second;
		points[i] = std::make_pair(x, y);
		theta += thetaStep;
	}

	return points;
}

Nothing too flashy there. Calculating the positions of the points for every point and offsetting them.

Next we want to write the approximated version. This will take all the same parameters but also take two approximations of the 'sin' and 'cos' functions

template <typename T, int numVerts, T(*approxSin)(T), T(*approxCos)(T)>
std::array<std::pair<float, float>, numVerts> GenerateCircleVertices_A(T _radius, std::pair<T, T> _offset)
{
	std::array<std::pair<float, float>, numVerts> points;

	T theta = 0.f;
	T thetaStep = (T)(PI) / (T)numVerts;
	for (int i = 0; i < numVerts; i++)
	{
		T x = (_radius * approxCos(theta)) + _offset.first;
		T y = (_radius * approxSin(theta)) + _offset.second;

		points[i] = std::make_pair(x, y);

		theta += thetaStep;
	}
	return points;
}

We have templated the types of the functions that can be passed in so that in our approximation functions we can work at a higher precision. However, you will notice that in our approximation we are still using the same type for the output positions, if we did not do this then would be introducing precision at the cost of memory which is not what we want to demo here and would change the shape of the function the user is expecting to call.

So what functions are we going to submit to replace the Sin/Cos in the algorithm?
In this example we have implemented some simple curve fit equations with Chebychev economisation to minimise error over the total range of data we care about. In this instance that is over the range of inputs 0 to PI.

Here is an example of the templated output for 'cos()' in the range 0-PI. Pretty hideious. But it works.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
template<typename T>
static constexpr T cos_approx_10(const T _x) noexcept
{
	return (( 1.00000000922939569214520361128961667     ) + 
		(-0.00000040038351959167077379411758820*(_x)) + 
		(-0.49999581902513079434413612034404650*(_x* _x)) + 
		(-0.00001868808381795402666814172321086*(_x* _x * _x)) + 
		( 0.04171125229782790544419412981369533*(_x* _x * _x * _x)) + 
		(-0.00006331374243904154216523727516375*(_x* _x * _x * _x * _x)) +
		(-0.00133230596002621454881920115553839*(_x* _x * _x * _x * _x * _x)) + 
		(-0.00003250491185282628451994405005543*(_x* _x * _x * _x * _x * _x * _x)) + 
		( 0.00003666795841889910768365487547804*(_x* _x * _x * _x * _x * _x * _x * _x)) + 
		(-0.00000258872188337465851184506469840*(_x* _x * _x * _x * _x * _x * _x * _x * _x)) + 
		(-0.00000000060839243653413992793179150*(_x* _x * _x * _x * _x * _x * _x * _x * _x * _x)));
}

To verify our approximation functions we output them at different levels of accuracy and test the total accuracy and performance. For the functions generated for this test here is the results from our back-end testing where we test our automated generated tables as well as function replacement.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
                          Name         Mean Err.       Median Err.        Total Err.          Runtime
                 Source Function       2.66282e-08        2.2331e-08       2.72673e-05               171
          CompileTimeTable_Float       3.17287e-07       3.08806e-07       0.000324902               106
     CompileTimeTable_LongDouble       3.15743e-07       3.11381e-07       0.000323321                80
              RunTimeTable_Float       6.48356e-08       3.79084e-08       6.63917e-05               101
         RunTimeTable_LongDouble       3.95529e-08                 0       4.05022e-05                79

      functionPoly_float_Order10       9.00522e-08       5.42778e-08       9.22134e-05                11
     functionPoly_double_Order10       1.78518e-09       1.87474e-09       1.82802e-06                11
 functionPoly_longdouble_Order10       1.78518e-09       1.87474e-09       1.82802e-06                11
       functionPoly_float_Order9       7.69145e-08       4.88219e-08       7.87604e-05                10
      functionPoly_double_Order9       1.78519e-09       1.87499e-09       1.82804e-06                10
  functionPoly_longdouble_Order9       1.78519e-09       1.87499e-09       1.82804e-06                10
       functionPoly_float_Order8       3.26524e-07       3.18897e-07        0.00033436                11
      functionPoly_double_Order8       3.14699e-07       3.29983e-07       0.000322252                11
  functionPoly_longdouble_Order8       3.14699e-07       3.29983e-07       0.000322252                10
       functionPoly_float_Order7        3.2993e-07       3.21493e-07       0.000337849                 9
      functionPoly_double_Order7       3.14701e-07       3.29942e-07       0.000322254                 9
  functionPoly_longdouble_Order7       3.14701e-07       3.29942e-07       0.000322254                 9
       functionPoly_float_Order6       3.61088e-05        3.7886e-05         0.0369754                 6
      functionPoly_double_Order6       3.61126e-05       3.78765e-05         0.0369793                 7
  functionPoly_longdouble_Order6       3.61126e-05       3.78765e-05         0.0369793                 6
       functionPoly_float_Order5       3.61177e-05        3.7886e-05         0.0369845                 6
      functionPoly_double_Order5       3.61127e-05       3.78748e-05         0.0369794                 5
  functionPoly_longdouble_Order5       3.61127e-05       3.78748e-05         0.0369794                 5
       functionPoly_float_Order4        0.00239204        0.00250672           2.44945                 5
      functionPoly_double_Order4        0.00239204        0.00250679           2.44945                 5
  functionPoly_longdouble_Order4        0.00239204        0.00250679           2.44945                 5
       functionPoly_float_Order3        0.00239204        0.00250686           2.44945                 4
      functionPoly_double_Order3        0.00239204        0.00250686           2.44945                 5
  functionPoly_longdouble_Order3        0.00239204        0.00250686           2.44945                 4

The first item in this list is the floating point precision implementation of 'cos', the next four lines are our different table generated results (See this post on how that works. Everything after the line break is our generated function replacements that we care about in this example.

You will see that the total error across the range represented in the "Total Err." column. This is the summed total difference between each sampled position when compared against the highest precision of the function we are replacing. So we can see that compared to the "long double" implementation of 'cos' the floating-point implementation incurs a total error of '2.72673e-05' in this range for a mean error of '2.66282e-08' at each sample point.

Where this becomes interesting is that our implementation at floating-point precision has quite close error - but our double and long double implementations have less total error. But a higher-precision having less error is probably not a surprise - what is a surprise is that our higher-precision implementation takes 1/16th of the time of the floating-point function we want to replace.
To put it simply - we have a function which has less error and less computational cost. Although our function does come with an additional near insignificant cost of a cast from double back to float.

In our initial implementation of this we expected the lower computational cost but the lower error was a surprise. It shouldn't have been though. As we have shown in the past error can be expressed in type and in function. Our example is increasing the error in the function but lowering the error in the type. So for any trivial function which has large number of errors from the type (such as anything which performs summed operations on floating-point types) we should in theory be able to beat it on error if the function is simply mappable at any precision and we do so in fewer operations.

So what happens when we apply this to our circle plotting problem?

We see that for relatively small offsets from zero we don't get much difference in the error of the two approaches but as the radius of the circle approaches numbers just a short distance from the origin we see that our approximation is handling it much better - leading to a much better overall accuracy.

Better Accuracy. Better Performance. In a real-world problem.

We are currently further exploring this approach with DSP processing, video media processing and computer graphics.

Source code for a lot of this stuff is available on my Github.