When Approximations Are More Accurate And Better Performing. Part 2

So if you read the last part of this work you might be wondering how the error could be lower in the approximated solution than in the proper implementation.

This comes down to simply exploiting the rules and expectations of the user. The user has chosen 'float' as the base accuracy for the circle, which is usually done for performance reasons: sin/cos in float is cheaper than sin/cos in double.

From this we have some base rules on which to build our case: the total error only has to be less than that of the floating-point implementation, and our approximation has to reduce the cost.

So we need a way to measure the error and a way to measure the run-time performance, so that we can compare them.
Run-time performance is easy: we simply run the algorithm and time how long it takes.
Error is a little trickier. In this example we defined the error as the sum of the distances of each resulting vertex from where the most accurate implementation would place it. Ideally, though, to build an approximation we want to understand where the error is coming from so we know what to change.
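A minimal sketch of that error measure follows. It assumes both vertex lists hold the same number of points in the same order and uses Euclidean distance as "the distance" (the post does not say which metric it uses); `TotalVertexError` is a hypothetical helper name, and the reference is taken in double as a stand-in for "the most accurate implementation".

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical helper: sums the Euclidean distance between each generated
// vertex and the vertex produced by the most accurate implementation.
template <std::size_t N>
double TotalVertexError(const std::array<std::pair<float, float>, N>& candidate,
                        const std::array<std::pair<double, double>, N>& reference)
{
    double total = 0.0;
    for (std::size_t i = 0; i < N; ++i)
    {
        const double dx = static_cast<double>(candidate[i].first) - reference[i].first;
        const double dy = static_cast<double>(candidate[i].second) - reference[i].second;
        total += std::hypot(dx, dy); // distance of this vertex from where it should be
    }
    return total;
}
```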

So let's look at the implementation again:

std::array<std::pair<float, float>, numVerts> points;

T theta = 0.f;
T thetaStep = (T)(PI) / (T)numVerts;
for (int i = 0; i < numVerts; i++)
{
	T x = (_radius * cos(theta)) + _offset.first;
	T y = (_radius * sin(theta)) + _offset.second;
	points[i] = std::make_pair(x, y);
	theta += thetaStep;
}

Where could the error be introduced here? I think it would be better if we split it into individual lines per operation.
For just calculating a single 'x' position:

    theta += thetaStep;
    T cosRes = cos(theta);
    T radiusScaled = cosRes * _radius;
    T x = radiusScaled + _offset.first;

We have four operations taking place, one on each line. Let's see how the error is induced for each line step.

  1. This step introduces error because thetaStep is stored at a fixed precision and is the result of dividing PI by the number of steps used to trace the curve. As the number of steps grows, thetaStep shrinks. This gives two problems. Firstly, the rounding error of every addition accumulates, so each subsequent position on the circle is less accurate than the last. Secondly, once thetaStep is smaller than half the gap between adjacent floating-point values at the current theta, the addition rounds straight back to theta, so theta stops advancing and we cannot trace the circle; when the step is only a little larger than that, the rounding can add noticeably more or less than it should, leading to further inaccuracy.
  2. The type of theta determines which overload of 'cos' is called. The C and C++ standards leave the accuracy of these functions implementation-defined, but common cmath implementations are typically accurate to within about one unit in the last place (ULP) for the type they work on. We still induce error into cosRes, as no finite precision can represent the infinite number of values between -1 and 1, but we can expect the result to be very close to the best possible one. As 'cosRes' and theta share the same type, no rounding or error is incurred by the value copy.
  3. In this step we multiply two floating-point numbers. The exact product generally needs more bits than the type holds, so it is rounded back to the representation we are using (on some hardware after being computed at a higher internal precision). Under IEEE 754 round-to-nearest this adds up to half a floating-point step (half a ULP) at the scale of the result.
  4. The addition behaves the same way as step 3: one more rounding of up to half a ULP, this time at the scale of the offset position.

Steps 2-4 are then repeated for the y component, and both results are cast to float to store in the vertex list, which is one more rounding whenever T is wider than float. Quite a lot of potential error, depending on the type!
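Both problems from step 1 can be reproduced directly. The sketch below assumes IEEE 754 single precision for float. `StepIsLost` checks whether an addition is rounded away entirely, and `AccumulatedThetaDrift` compares theta accumulated with repeated `+=` (as the loop does) against the same number of float steps multiplied out in double. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Adding a step smaller than half the gap between adjacent floats at 'theta'
// rounds straight back to 'theta', so the loop would stop advancing.
inline bool StepIsLost(float theta, float step)
{
    const float next = theta + step; // assignment forces rounding to float
    return next == theta;
}

// Accumulates theta the way the vertex loop does (theta += thetaStep) and
// returns how far the final float angle is from numSteps * step computed in double.
inline double AccumulatedThetaDrift(int numSteps, double totalAngle)
{
    const float stepF = static_cast<float>(totalAngle / numSteps);
    float theta = 0.0f;
    for (int i = 0; i < numSteps; ++i)
    {
        theta += stepF; // each addition rounds to the nearest float
    }
    const double expected = static_cast<double>(stepF) * numSteps;
    return std::fabs(static_cast<double>(theta) - expected);
}
```

For example, `StepIsLost(1.0f, 1e-8f)` is true because 1e-8 is below half the float spacing at 1.0 (about 6e-8), while a step of 1e-6f still registers.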

Represented in my pigeon math

How does this differ from our approximation, where we simply replace the calls to sin/cos?

approxerror.png

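Not very much at all when we are considering the same type! The approximation does add its own error, though: every multiplication and addition in the polynomial rounds its result, and those roundings accumulate, so the approximation as a whole is not held to the tight accuracy that library implementations of sin/cos typically target.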

So, when is it that we can use the approximation?
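When it follows these two rules:

  1. The total error of the approximation is less than the total error of the implementation it replaces.
  2. The approximation costs less to run than the implementation it replaces.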

When these two rules hold true then we can replace the function without worrying.

So, for our example of better performance and accuracy, we check that the approximation computed in 'double' and the accurate implementation computed in 'float' satisfy both rules, and then we are good to go.
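The two rules (less total error, lower cost) can be written as a simple check. In this sketch `Measurement` and `CanReplace` are hypothetical names, and it assumes error and runtime are measured the same way for both candidates. The example values are the "Total Err." and "Runtime" figures from the 'cos' results table in Part 1.

```cpp
#include <cassert>

// Measured properties of one implementation over the same input range.
struct Measurement
{
    double totalError; // summed error against the highest-precision reference
    double runtime;    // cost of one benchmark run, same units for both
};

// Rule 1: the approximation must have less total error.
// Rule 2: the approximation must cost less to run.
constexpr bool CanReplace(Measurement approx, Measurement original)
{
    return approx.totalError < original.totalError && approx.runtime < original.runtime;
}

// Figures from the 'cos' results table in Part 1.
constexpr Measurement sourceFloat{2.72673e-05, 171};
constexpr Measurement polyDoubleOrder10{1.82802e-06, 11};
constexpr Measurement polyFloatOrder10{9.22134e-05, 11};

static_assert(CanReplace(polyDoubleOrder10, sourceFloat), "double approximation meets both rules");
static_assert(!CanReplace(polyFloatOrder10, sourceFloat), "float approximation is cheaper but adds error");
```

The float version passes the cost rule but fails the error rule, which is why the interesting case is the double approximation replacing the float implementation.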

So that gives us this table of results for the built-in standard real number types:

table.png

So there really is only a limited area where we meet both of these rules, but it could be a beneficial one.

 

 

When Approximations Are More Accurate And Better Performing. Part 1

So we have been talking a lot recently about approximation approaches and what we can do with them, measuring error, and some horrible template programming to support all of this seamlessly in C++.

This is the post where we show why.
This post will show you how to generate the vertex positions on a circle, a sphere or some other curved path with greater accuracy and lower performance cost than the standard algorithm, without breaking any specifications or altering the behavior of the compiler.

To begin with, we want to keep the problem simple as a proof of concept, so we will be generating a semi-circle. We want to be able to compare the different hardware-supported precisions available to the programmer, so we will use templates to reduce any human error. We want to see how the error scales with the radius and position of the circle, so we will make those parameters too. We also want to control the number of points being generated to increase or lower the precision.
That gives us this accurate implementation:

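#include <array>
#include <cmath>
#include <utility>

// PI is not shown in the post; this definition is an assumption.
constexpr long double PI = 3.141592653589793238462643383279502884L;

template <typename T, int numVerts>
std::array<std::pair<float, float>, numVerts> GenerateCircleVertices_A(T _radius, std::pair<T, T> _offset)
{
	std::array<std::pair<float, float>, numVerts> points;

	T theta = 0;
	T thetaStep = (T)(PI) / (T)numVerts;	// numVerts steps across 0..PI: a semi-circle
	for (int i = 0; i < numVerts; i++)
	{
		// std:: selects the overload for T; an unqualified call may resolve to the double version.
		T x = (_radius * std::cos(theta)) + _offset.first;
		T y = (_radius * std::sin(theta)) + _offset.second;
		points[i] = std::make_pair(static_cast<float>(x), static_cast<float>(y));
		theta += thetaStep;
	}

	return points;
}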

Nothing too flashy there: we calculate the position of every point and offset it.

Next we want to write the approximated version. This will take all the same parameters, but will also take two approximations of the 'sin' and 'cos' functions as template parameters.

template <typename T, int numVerts, T(*approxSin)(T), T(*approxCos)(T)>
std::array<std::pair<float, float>, numVerts> GenerateCircleVertices_A(T _radius, std::pair<T, T> _offset)
{
	std::array<std::pair<float, float>, numVerts> points;

	T theta = 0.f;
	T thetaStep = (T)(PI) / (T)numVerts;
	for (int i = 0; i < numVerts; i++)
	{
		T x = (_radius * approxCos(theta)) + _offset.first;
		T y = (_radius * approxSin(theta)) + _offset.second;

		points[i] = std::make_pair(x, y);

		theta += thetaStep;
	}
	return points;
}
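To show how the function-pointer template parameters are filled in, here is a self-contained sketch. It repeats the approximated template so that it compiles on its own, assumes a `PI` constant (not shown in the post), and uses two hypothetical wrappers, `SinStandIn` and `CosStandIn`, which simply forward to `std::sin`/`std::cos` in double as placeholders for generated approximations such as `cos_approx_10<double>`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Assumed definition; the post uses PI without showing it.
constexpr long double PI = 3.141592653589793238462643383279502884L;

template <typename T, int numVerts, T(*approxSin)(T), T(*approxCos)(T)>
std::array<std::pair<float, float>, numVerts> GenerateCircleVertices_A(T _radius, std::pair<T, T> _offset)
{
    std::array<std::pair<float, float>, numVerts> points;
    T theta = 0;
    const T thetaStep = static_cast<T>(PI) / static_cast<T>(numVerts);
    for (int i = 0; i < numVerts; i++)
    {
        const T x = (_radius * approxCos(theta)) + _offset.first;
        const T y = (_radius * approxSin(theta)) + _offset.second;
        // Narrow to float so the output has the same shape as the accurate version.
        points[i] = std::make_pair(static_cast<float>(x), static_cast<float>(y));
        theta += thetaStep;
    }
    return points;
}

// Hypothetical stand-ins for generated approximations (signature T(T)).
double SinStandIn(double x) { return std::sin(x); }
double CosStandIn(double x) { return std::cos(x); }

// Works in double internally, but still returns float vertices.
inline std::array<std::pair<float, float>, 8> MakeSemiCircle()
{
    return GenerateCircleVertices_A<double, 8, &SinStandIn, &CosStandIn>(2.0, {1.0, 1.0});
}
```

Swapping the stand-ins for real approximations only changes the two function names in the template argument list; the caller still receives float vertices.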

We have templated the types of the functions that can be passed in so that our approximation functions can work at a higher precision. However, you will notice that the approximation still uses the same type for the output positions. If we did not do this we would be gaining precision at the cost of memory, which is not what we want to demo here, and it would change the shape of the function the user is expecting to call.

So what functions are we going to submit to replace sin/cos in the algorithm?
In this example we have implemented some simple curve-fit polynomials with Chebyshev economisation to minimise error over the total range of data we care about, which in this instance is the range of inputs 0 to PI.
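For readers unfamiliar with Chebyshev economisation, here is a minimal sketch of the idea; it is not the generator used for these results. It assumes the input has already been mapped onto [-1, 1] (the 0 to PI range used here would first be mapped with x' = (2x - PI) / PI). Given a power series of degree n, it subtracts a_n / 2^(n-1) times the Chebyshev polynomial T_n, which cancels the x^n term while adding at most |a_n| / 2^(n-1) of error on [-1, 1], because |T_n(x)| <= 1 there. `ChebyshevT`, `Economise`, `Evaluate` and `MaxCosError` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Poly = std::vector<double>; // coefficients, lowest power first

// Power-basis coefficients of the Chebyshev polynomial T_n,
// built with the recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x).
Poly ChebyshevT(std::size_t n)
{
    Poly prev{1.0};      // T_0
    if (n == 0) return prev;
    Poly curr{0.0, 1.0}; // T_1
    for (std::size_t k = 1; k < n; ++k)
    {
        Poly next(curr.size() + 1, 0.0);
        for (std::size_t i = 0; i < curr.size(); ++i) next[i + 1] += 2.0 * curr[i];
        for (std::size_t i = 0; i < prev.size(); ++i) next[i] -= prev[i];
        prev = curr;
        curr = next;
    }
    return curr;
}

// Drops the highest-order term of p by subtracting a multiple of T_n.
// On [-1, 1] this adds at most |a_n| / 2^(n-1) to the error.
Poly Economise(Poly p)
{
    assert(p.size() >= 2);
    const std::size_t n = p.size() - 1;
    const Poly t = ChebyshevT(n);
    const double scale = p[n] / t[n]; // t[n] == 2^(n-1)
    for (std::size_t i = 0; i <= n; ++i) p[i] -= scale * t[i];
    p.pop_back(); // the x^n coefficient is now zero
    return p;
}

// Horner's rule evaluation of p at x.
double Evaluate(const Poly& p, double x)
{
    double result = 0.0;
    for (std::size_t i = p.size(); i-- > 0;) result = result * x + p[i];
    return result;
}

// Largest |p(x) - cos(x)| over evenly spaced samples of [-1, 1].
double MaxCosError(const Poly& p)
{
    double worst = 0.0;
    for (int i = 0; i <= 2000; ++i)
    {
        const double x = -1.0 + i / 1000.0;
        worst = std::fmax(worst, std::fabs(Evaluate(p, x) - std::cos(x)));
    }
    return worst;
}
```

Applied to the Taylor series of cos up to x^8, one economisation step leaves a polynomial of degree 6 whose worst error on [-1, 1] stays below 1e-6, while simply truncating the series at x^6 leaves an error of about 2.5e-5 at x = 1.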

Here is an example of the templated output for 'cos()' in the range 0 to PI. Pretty hideous, but it works.

template<typename T>
static constexpr T cos_approx_10(const T _x) noexcept
{
	return (( 1.00000000922939569214520361128961667     ) + 
		(-0.00000040038351959167077379411758820*(_x)) + 
		(-0.49999581902513079434413612034404650*(_x* _x)) + 
		(-0.00001868808381795402666814172321086*(_x* _x * _x)) + 
		( 0.04171125229782790544419412981369533*(_x* _x * _x * _x)) + 
		(-0.00006331374243904154216523727516375*(_x* _x * _x * _x * _x)) +
		(-0.00133230596002621454881920115553839*(_x* _x * _x * _x * _x * _x)) + 
		(-0.00003250491185282628451994405005543*(_x* _x * _x * _x * _x * _x * _x)) + 
		( 0.00003666795841889910768365487547804*(_x* _x * _x * _x * _x * _x * _x * _x)) + 
		(-0.00000258872188337465851184506469840*(_x* _x * _x * _x * _x * _x * _x * _x * _x)) + 
		(-0.00000000060839243653413992793179150*(_x* _x * _x * _x * _x * _x * _x * _x * _x * _x)));
}

To verify our approximation functions we output them at different levels of accuracy and test their total accuracy and performance. For the functions generated for this test, here are the results from our back-end testing, where we test our automatically generated tables as well as the function replacements.

                          Name         Mean Err.       Median Err.        Total Err.          Runtime
                 Source Function       2.66282e-08        2.2331e-08       2.72673e-05               171
          CompileTimeTable_Float       3.17287e-07       3.08806e-07       0.000324902               106
     CompileTimeTable_LongDouble       3.15743e-07       3.11381e-07       0.000323321                80
              RunTimeTable_Float       6.48356e-08       3.79084e-08       6.63917e-05               101
         RunTimeTable_LongDouble       3.95529e-08                 0       4.05022e-05                79

      functionPoly_float_Order10       9.00522e-08       5.42778e-08       9.22134e-05                11
     functionPoly_double_Order10       1.78518e-09       1.87474e-09       1.82802e-06                11
 functionPoly_longdouble_Order10       1.78518e-09       1.87474e-09       1.82802e-06                11
       functionPoly_float_Order9       7.69145e-08       4.88219e-08       7.87604e-05                10
      functionPoly_double_Order9       1.78519e-09       1.87499e-09       1.82804e-06                10
  functionPoly_longdouble_Order9       1.78519e-09       1.87499e-09       1.82804e-06                10
       functionPoly_float_Order8       3.26524e-07       3.18897e-07        0.00033436                11
      functionPoly_double_Order8       3.14699e-07       3.29983e-07       0.000322252                11
  functionPoly_longdouble_Order8       3.14699e-07       3.29983e-07       0.000322252                10
       functionPoly_float_Order7        3.2993e-07       3.21493e-07       0.000337849                 9
      functionPoly_double_Order7       3.14701e-07       3.29942e-07       0.000322254                 9
  functionPoly_longdouble_Order7       3.14701e-07       3.29942e-07       0.000322254                 9
       functionPoly_float_Order6       3.61088e-05        3.7886e-05         0.0369754                 6
      functionPoly_double_Order6       3.61126e-05       3.78765e-05         0.0369793                 7
  functionPoly_longdouble_Order6       3.61126e-05       3.78765e-05         0.0369793                 6
       functionPoly_float_Order5       3.61177e-05        3.7886e-05         0.0369845                 6
      functionPoly_double_Order5       3.61127e-05       3.78748e-05         0.0369794                 5
  functionPoly_longdouble_Order5       3.61127e-05       3.78748e-05         0.0369794                 5
       functionPoly_float_Order4        0.00239204        0.00250672           2.44945                 5
      functionPoly_double_Order4        0.00239204        0.00250679           2.44945                 5
  functionPoly_longdouble_Order4        0.00239204        0.00250679           2.44945                 5
       functionPoly_float_Order3        0.00239204        0.00250686           2.44945                 4
      functionPoly_double_Order3        0.00239204        0.00250686           2.44945                 5
  functionPoly_longdouble_Order3        0.00239204        0.00250686           2.44945                 4

The first item in this list is the floating-point precision implementation of 'cos', and the next four lines are our different table-generated results (see this post on how that works). Everything after the line break is the generated function replacements that we care about in this example.

The total error across the range is shown in the "Total Err." column. This is the summed total difference between each sampled position and the highest-precision version of the function we are replacing. So we can see that, compared to the "long double" implementation of 'cos', the floating-point implementation incurs a total error of '2.72673e-05' over this range, for a mean error of '2.66282e-08' at each sample point.
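The three error columns can be reproduced with a sketch like the one below. It assumes evenly spaced samples over [0, PI] and uses the long double `std::cos` as the reference. The sample count is an assumption, although dividing the reported total by the reported mean (2.72673e-05 / 2.66282e-08) gives 1024, which suggests 1024 samples. `ErrorStats`, `Summarise` and `MeasureCosFloat` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ErrorStats
{
    double mean;
    double median;
    double total;
};

// Summarises absolute errors in the same three ways as the results table.
ErrorStats Summarise(std::vector<double> errors)
{
    assert(!errors.empty());
    ErrorStats stats{0.0, 0.0, 0.0};
    for (double e : errors) stats.total += e;
    stats.mean = stats.total / static_cast<double>(errors.size());
    const std::size_t mid = errors.size() / 2;
    std::nth_element(errors.begin(), errors.begin() + mid, errors.end());
    stats.median = errors[mid];
    if (errors.size() % 2 == 0)
    {
        // Even count: average the two middle values.
        const double lower = *std::max_element(errors.begin(), errors.begin() + mid);
        stats.median = 0.5 * (stats.median + lower);
    }
    return stats;
}

// Compares float std::cos against long double std::cos at 'samples' points in [0, PI).
ErrorStats MeasureCosFloat(std::size_t samples)
{
    const long double pi = 3.141592653589793238462643383279502884L;
    std::vector<double> errors;
    errors.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i)
    {
        const long double x = pi * static_cast<long double>(i) / static_cast<long double>(samples);
        const float approx = std::cos(static_cast<float>(x));
        const long double reference = std::cos(x);
        errors.push_back(static_cast<double>(std::fabs(approx - reference)));
    }
    return Summarise(errors);
}
```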

Where this becomes interesting is that our implementation at floating-point precision has error within the same order of magnitude as the source function, while our double and long double implementations have less total error. A higher-precision implementation having less error is probably not a surprise; what is a surprise is that our higher-precision implementation takes roughly 1/16th of the time of the floating-point function we want to replace (11 versus 171 in the "Runtime" column).
To put it simply, we have a function with less error and less computational cost. Our function does come with one additional, near-insignificant cost: the cast from double back to float.

In our initial implementation of this we expected the lower computational cost, but the lower error was a surprise. It shouldn't have been, though. As we have shown in the past, error can be expressed in the type and in the function. Our example increases the error in the function but lowers the error in the type. So for any trivial function which accumulates a large amount of error from its type (such as anything which performs summed operations on floating-point types), we should in theory be able to beat it on error, provided the function can be mapped at any precision and we do so in fewer operations.
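A small illustration of error that lives in the type rather than the function: the same repeated addition performed with a float accumulator and with a double accumulator. This is a sketch, not one of the benchmarked functions; the value 0.1f and the count of one million are arbitrary choices, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Adds 'value' to itself 'count' times in the accumulator type Acc.
template <typename Acc>
Acc RepeatedSum(float value, int count)
{
    Acc sum = 0;
    for (int i = 0; i < count; ++i)
    {
        sum += static_cast<Acc>(value); // each addition rounds to Acc
    }
    return sum;
}

// Error of a sum against the exact product count * value.
inline double SumError(double sum, float value, int count)
{
    // A 24-bit float times a count below 2^29 fits in double's 53 bits, so this is exact.
    return std::fabs(sum - static_cast<double>(value) * count);
}
```

The function being computed is identical in both cases; only the accumulator type changes, and the float version drifts by whole units over a million additions while the double version stays within a tiny fraction of one.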

So what happens when we apply this to our circle plotting problem?

We see that for circles close to the origin there is not much difference in the error of the two approaches, but as the radius and offset move even a short distance away from the origin our approximation handles it much better, leading to a much better overall accuracy.

Better Accuracy. Better Performance. In a real-world problem.

We are currently further exploring this approach with DSP processing, video media processing and computer graphics.

Source code for a lot of this stuff is available on my Github.

UNDERSTANDING ERROR AND APPROXIMATION: 7. Relaxation and Regularisation

So far in our approximation approaches we have been considering generating functions which match the input and output parameters of the function we want to optimise. In this post we will be considering two techniques which do not follow this approach so directly.

Regularisation
Regularisation is generally used in mathematical modelling to prevent overfitting, or to tackle modelling problems that are not well-posed. In this context an ill-posed problem is one that either:

  • Has no solution
  • Has a solution that is not unique
  • or, has a solution whose behaviour does not change continuously with the initial conditions.

When we are looking for an approximate function we are hoping that the function is ill-posed as we would ideally want the function solution to not be unique so we can find a better one!

In other articles we discussed the range of the function we are interested in, as well as the accuracy for each point in that range. In this section we are asking whether there are discrete points in the range that we care about more than others. For example, you might have some function in your application which takes a real number and returns it altered in some fixed but mathematically expensive way, but you only ever call it with the values 0 or 1. In this instance we probably don't want to pay for the costly calculation when the answer is always either f(0) or f(1). It might be cheaper to have a function which maps to those two results directly.

When we only care about those two results, we can generate a "regularised" function that matches those requirements exactly.

These regularised functions can often be cheaper than the actual function as they only have to care about a specific set of points. 
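A minimal sketch of the two-point case, assuming the application really does only ever call the function with 0 or 1. `ExpensiveTransform` is a hypothetical stand-in for the costly function. The replacement returns the two results computed once on first use and, as a design choice here, falls back to the full calculation for any other input rather than silently returning a wrong answer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an expensive, fixed mathematical transform.
inline double ExpensiveTransform(double x)
{
    double result = 0.0;
    for (int k = 1; k <= 1000; ++k)
    {
        result += std::sin(k * x) / k;
    }
    return result;
}

// Regularised replacement: exact at the only inputs we care about.
inline double RegularisedTransform(double x)
{
    static const double atZero = ExpensiveTransform(0.0); // computed once, on first use
    static const double atOne = ExpensiveTransform(1.0);
    if (x == 0.0) return atZero;
    if (x == 1.0) return atOne;
    return ExpensiveTransform(x); // outside the points we designed for
}
```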

Relaxation
The next topic in the same vein is relaxation. Relaxation in the mathematical sense, more formally the "linear programming relaxation" of an integer linear program, is a technique that takes a problem whose variables may only take integer values, allows them to take real values to simplify the problem, and then rounds the result back to integers. It is known for turning NP-hard integer programs into linear programs that can be solved in polynomial time (if that is of interest).

When we look at how this relaxation changes a problem we can see that it has the possibility to expand the total problem domain (by a range dependent on your rounding choice) and may not actually give the correct solution due to some constraints. Full details on the mathematics and proofs for this can be found on Wikipedia.

For our purposes, we are interested in its ability to take a function that gives discrete results and model it as a continuous function. This allows us to approach some integer problems which are technically "infeasible" in an approximate manner, although never optimally.

We can see approximate algorithms solving problems through this method if you look up the "Weighted Vertex Cover Problem" or similar shortest path or optimal distribution problems.
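The weighted vertex cover relaxation needs a linear-programming solver, so as a smaller stand-in, swapped in purely for illustration, here is the same idea applied to the 0/1 knapsack problem. Relaxing "take each item or not" to "take any fraction of each item" gives the fractional knapsack, whose optimum can be found greedily by value per unit weight. Rounding the one fractional item down to zero then gives a feasible integer answer, which may not be optimal. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Item
{
    double value;
    double weight; // assumed > 0
};

struct RelaxedResult
{
    double relaxedValue;     // optimum of the relaxed (fractional) problem
    double roundedValue;     // value after rounding the fractional item down to 0
    std::vector<bool> taken; // integer solution after rounding
};

// Solves the LP relaxation of 0/1 knapsack greedily, then rounds down.
RelaxedResult RelaxAndRound(const std::vector<Item>& items, double capacity)
{
    std::vector<std::size_t> order(items.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return items[a].value / items[a].weight > items[b].value / items[b].weight;
    });

    RelaxedResult result{0.0, 0.0, std::vector<bool>(items.size(), false)};
    double remaining = capacity;
    for (std::size_t idx : order)
    {
        const Item& item = items[idx];
        if (item.weight <= remaining)
        {
            // Whole item fits: same choice in the relaxed and rounded solutions.
            remaining -= item.weight;
            result.relaxedValue += item.value;
            result.roundedValue += item.value;
            result.taken[idx] = true;
        }
        else
        {
            // The relaxed problem takes a fraction of this item; rounding drops it.
            result.relaxedValue += item.value * (remaining / item.weight);
            break;
        }
    }
    return result;
}
```

With items (value, weight) of (60, 10), (100, 20) and (120, 30) and a capacity of 50, the relaxed optimum is 240 and the rounded answer is 160, while the true integer optimum is 220: the relaxation bounds the answer from above and rounding gives a feasible but suboptimal solution.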

The book "Algorithms to Live By" contains some very good examples and explanations in the chapter covering this type of relaxation and Lagrangian relaxation.

 

 

 

 

UNDERSTANDING ERROR AND APPROXIMATION: 6. Continuous Approximation

In the last few posts we have covered ways to measure the error and bounds in different functions and how that affects the way we view them when coming to approximate them. A lot of what we have been discussing has been in the area of "continuous functions". A continuous function is one where the results for neighbouring inputs flow into one another smoothly (or at least in a predictable fashion). For example, we know for the function "y = x" that the values between 'x=1' and 'x=2' will be smoothly interpolated between 1 and 2.

If this wasn't the case, if the function was discrete and only gave results for integer values, then when we sampled the function at 'x=1.5' there might not be a valid result and any result would be an error. Or the function could have discontinuities around this area where the results are totally unrelated to the surrounding results.

Identifying whether results are continuous or discrete is therefore an important factor in understanding the function we want to replace and its behavior.

If a function is continuous (and smooth enough to differentiate), a common numerical approximation method is to generate a second- or third-order Taylor polynomial to represent the curve (see examples of how this is done here). This gives us a polynomial which matches the curve across certain ranges under certain conditions.

graph2.png

Shown above is the continuous function sin(x) with different orders of Taylor series approximating it.
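The behaviour in the graph is easy to reproduce numerically. This sketch sums the Taylor series of sin about 0 up to a chosen order and measures the error against `std::sin`; `TaylorSin` and `TaylorSinError` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Taylor series of sin about 0, keeping terms up to x^order.
inline double TaylorSin(double x, int order)
{
    double term = x; // current term: (-1)^k x^(2k+1) / (2k+1)!
    double sum = 0.0;
    for (int n = 1; n <= order; n += 2)
    {
        sum += term;
        // Next odd-power term: multiply by -x^2 / ((n + 1)(n + 2)).
        term *= -x * x / ((n + 1) * (n + 2));
    }
    return sum;
}

inline double TaylorSinError(double x, int order)
{
    return std::fabs(TaylorSin(x, order) - std::sin(x));
}
```

Near the expansion point the third-order series is already accurate (error below 1e-7 at x = 0.1), but at x = 3 it is off by more than 1.6, and it takes terms up to x^15 to bring the error there back under 1e-6.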

Here is the graph of 'tan(x)'. In this example we cannot approximate the whole range of 0 to 2PI as there are discontinuities every 'PI' distance in x. To correctly approximate this curve we would need to split the curve into discrete sections of range PI and calculate from that. Essentially splitting a discontinuous function into n continuous chunks. In the case of tan(x) each chunk is a repeat of the last, so it is simply re-framing that needs to be done. But for more complex functions this can vary.
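Because each chunk of tan(x) repeats the last, re-framing amounts to mapping the input back into a single chunk before evaluating it. In this sketch `std::remainder(x, kPi)` returns x - n*PI for the nearest integer n, so the result lands in [-PI/2, PI/2]; whatever approximation is built for that one chunk can then serve every chunk. `ChunkTan` is a hypothetical placeholder that just calls `std::tan`, standing in for an approximation fitted on that range.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Maps x into [-PI/2, PI/2], the single continuous chunk we approximate.
inline double ReduceToPrincipalChunk(double x)
{
    return std::remainder(x, kPi);
}

// Placeholder for an approximation fitted only on [-PI/2, PI/2].
inline double ChunkTan(double reduced)
{
    return std::tan(reduced);
}

// Any input is first re-framed into the principal chunk.
inline double ReframedTan(double x)
{
    return ChunkTan(ReduceToPrincipalChunk(x));
}
```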

You may notice in the Taylor series example that the lower-order approximations quickly diverge. This happens as values get further away from the central reference point we used to build the series. For some complex functions you may want to chop the function into chunks to get better precision across certain ranges. This is a good thing to do when we only care about the values being directly output, but we have to be aware of the effect it has at the boundaries between the pieces.

If we take a look at the derivative of the combined curve, the discontinuities where we switch from one piece to another, which were previously near invisible in the values, become clearly obvious. Analysing the gradient changes at these points is important, as some uses of the function's results may rely on them, and in that case the resulting behaviour may be drastically different from that of the function we were replacing.
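A small worked example of the boundary problem, assuming sin(x) on [0, PI] is split into two cubic Taylor pieces, one expanded about 0 and one about PI, joined at PI/2. By symmetry the two pieces give exactly the same value at the join, yet their slopes there are about -0.234 and +0.234, where the true slope cos(PI/2) is 0. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Piece 1: cubic Taylor expansion of sin about 0, used on [0, PI/2).
inline double NearZero(double x) { return x - x * x * x / 6.0; }
inline double NearZeroSlope(double x) { return 1.0 - x * x / 2.0; }

// Piece 2: cubic Taylor expansion of sin about PI, used on [PI/2, PI].
inline double NearPi(double x) { const double u = kPi - x; return u - u * u * u / 6.0; }
inline double NearPiSlope(double x) { const double u = kPi - x; return -1.0 + u * u / 2.0; }

// The combined approximation whose values join up cleanly.
inline double PiecewiseSin(double x) { return x < kPi / 2 ? NearZero(x) : NearPi(x); }

// Jump in slope at the join between the two pieces (about 0.467).
inline double SlopeJumpAtJoin()
{
    const double join = kPi / 2;
    return NearPiSlope(join) - NearZeroSlope(join);
}
```

Anything that consumes the gradient of `PiecewiseSin`, such as a normal calculation or a physics step, would see a sudden flip at PI/2 that the original sin(x) never produces.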

This is where we need to recognise that even though the numerical error in the direct results is low, the error in the actual use-case can be large. At the end of the day, the error we really care about is how it affects the end result!

 

UNDERSTANDING ERROR AND APPROXIMATION: 5. Radius of Convergence

One of the main factors when you are looking to approximate a function is understanding what part of the function you are approximating. Approximation is inherently a trade-off, so when we are approximating a function we may want to only approximate a certain part of it, or may want to approximate one section of the input to a higher accuracy than another. 

But, before we can make any decisions on any of this we have to understand the full features of the function we want to approximate and a major feature of a lot of functions falls into the category of the "Radius of Convergence".

The simplest way to understand the "Radius of Convergence" is to consider $y = \sum_{x=1}^{\infty} 0.5^{x}$. As x gets larger, the term being added to the sum decreases. This causes the sum to converge to y = 1.

So if we were going to approximate this function by sampling it for various values of x, it would be quite wasteful for us to sample past x = 10, as by then the partial sum has already reached $1 - 2^{-10} \approx 0.9990$ and all the remaining terms together add less than 0.001. For our purposes, that is where this function has converged. (This is quite fun to play with on WolframAlpha.)
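Finding where sampling stops being useful can be automated. This sketch sums 0.5^x term by term from x = 1 and reports the first x at which the partial sum is within a chosen tolerance of the limit of 1. The tolerance is an assumption you would pick for your own accuracy target, and `PartialSum` and `SamplesNeeded` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Partial sum of 0.5^x for x = 1 .. lastX.
inline double PartialSum(int lastX)
{
    double sum = 0.0;
    double term = 0.5;
    for (int x = 1; x <= lastX; ++x)
    {
        sum += term;
        term *= 0.5;
    }
    return sum;
}

// First x at which the partial sum is within 'tolerance' of the limit 1.
// Assumes tolerance > 0.
inline int SamplesNeeded(double tolerance)
{
    int x = 1;
    while (1.0 - PartialSum(x) > tolerance)
    {
        ++x;
    }
    return x;
}
```

Because every term is a power of two, the partial sums are exact in double: `PartialSum(10)` is exactly 1 - 2^-10 = 0.9990234375, and `SamplesNeeded(1e-3)` returns 10.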

For a function which has convergence point (or points) it is important that we understand it so that we can increase the value of each point we sample and use in our own function generation. This gives us bounds and simplifies the work we have to do. Similar to how we can simplify intractable algorithms with priors, we can use this information to form our own "priors" in our generated functions.

 

 

UNDERSTANDING ERROR AND APPROXIMATION: 4. Math Max Error (Unit in Last Place)

This post is focused on the representation of real numbers in our application. Here we are looking at how they are represented as a finite number of bits and what considerations we must make in our application to be able to correctly predict the behavior to use them effectively.

Real Number Representation
As you probably already know if you are reading this, types such as float and double are represented in binary floating point. MSDN gives a very concise description of how they are represented on its page about the float type:

Floating-point numbers use the IEEE (Institute of Electrical and Electronics Engineers) format. Single-precision values with float type have 4 bytes, consisting of a sign bit, an 8-bit excess-127 binary exponent, and a 23-bit mantissa. The mantissa represents a number between 1.0 and 2.0. Since the high-order bit of the mantissa is always 1, it is not stored in the number. This representation gives a range of approximately 3.4E–38 to 3.4E+38 for type float.

The IEEE format they are talking about is IEEE 754 which covers in much greater detail the rules and behavior of float types but is a pretty heavy read (trust me, it's OK to just read the plot synopsis on this book).

The MSDN page misses the equation to calculate a number from the representation it describes. That equation is:

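$$ \text{value} = (-1)^{\text{sign}} \times 2^{\,\text{exponent} - 127} \times \left(1 + \frac{\text{mantissa}}{2^{23}}\right) $$

(This holds for normal numbers, where the stored 8-bit exponent is neither all zeros nor all ones.)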

With this representation we can see how numbers are encoded.

With this binary view in mind it is very visible that there is only a finite number of bits to switch and due to the nature of the exponential some of those switches will have varying degrees of change based on how large the number currently is.
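The fields and the equation can be checked in code. This sketch assumes `float` is a 32-bit IEEE 754 single-precision type (checked with a `static_assert`) and only handles normal numbers; zero, subnormals, infinities and NaNs use special exponent values that the simple formula does not cover. `FloatFields`, `Decode` and `Rebuild` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "assumes IEEE 754 single precision");

struct FloatFields
{
    std::uint32_t sign;     // 1 bit
    std::uint32_t exponent; // 8 bits, stored with a bias of 127
    std::uint32_t mantissa; // 23 bits, implicit leading 1 not stored
};

inline FloatFields Decode(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits); // well-defined way to read the bits
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// value = (-1)^sign * 2^(exponent - 127) * (1 + mantissa / 2^23), normal numbers only.
inline double Rebuild(const FloatFields& f)
{
    assert(f.exponent != 0 && f.exponent != 0xFFu);
    const double significand = 1.0 + f.mantissa / 8388608.0; // 2^23
    const double magnitude = std::ldexp(significand, static_cast<int>(f.exponent) - 127);
    return f.sign ? -magnitude : magnitude;
}
```

For example, -6.25 is -1.5625 x 2^2, so it decodes to sign 1, exponent 129 and mantissa 0.5625 x 2^23 = 4718592.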

This leads us to the main point of this article...

Unit in Last Place
So when we think about representing numbers, and the errors that creep into calculations, we have to consider the distance between the number currently being represented and the next representable number. Single-precision floating point (shown above) can represent very small fractions in the range 0-1, but becomes increasingly less accurate as the magnitude of the number grows, because the same 24 bits of significand have to cover a larger range. Above 2^23 (8,388,608) the gap between adjacent floats is 1.0, so when working in the high millions or tens of millions we lose the fractional part of the number entirely!
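The shrinking resolution can be measured with `std::nextafter`, which returns the next representable value in a given direction. This is a sketch assuming IEEE 754 single precision for float; `UlpAt` and `FractionLost` are hypothetical helper names.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Gap between 'value' and the next larger float: one unit in the last place (ULP).
inline float UlpAt(float value)
{
    return std::nextafter(value, INFINITY) - value;
}

// True once the gap between neighbouring floats reaches 1.0,
// i.e. no fractional part can survive at this magnitude.
inline bool FractionLost(float value)
{
    return UlpAt(value) >= 1.0f;
}
```

`UlpAt(1.0f)` equals `FLT_EPSILON` (about 1.19e-7), `UlpAt(1000.0f)` is 2^-14 (about 6.1e-5), and from 8388608.0f (2^23) upward the gap is at least 1.0, so adding 0.25f to 8388608.0f leaves it unchanged.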

This small error can be used with the equations shown in the last post on error propagation to show the cumulative error and accuracy loss as a function involving lots of floating point numbers progresses. This is an important factor to consider when dealing with long numerical functions.

Usage
So when we come to use floating point numbers in large calculations we can calculate how much error we expect to accumulate through the numbers being truncated to fit in the number of bits provided and the rounding of the numbers to the nearest representable floating point number.  This error must be of an acceptable level for the function we are trying to write, otherwise we may need to turn to alternative algorithms or more precise data types to represent the data. 

Understanding Error and Approximation: 3. Error Propagation

When dealing with error we not only need to know how to measure it, but also what the error is in a value produced by performing an operation on two values with differing levels of error.

For example, if we have a variable A which has an error of +/- 5 and a variable B which has an error of +/- 3, how would we calculate the accuracy of the result of A+B?

It turns out this can be solved by the 'Rules for Error Propagation'. These rules define how the combination of values of different uncertainty interact to produce a new value of uncertainty.

The main 'Rules of Error Propagation'.

The equations we are given to solve our problem are quite simple. We place our values of δA (+/- 5) and δB (+/- 3) into the first equation (as we are doing addition, 'A + B'), which assumes the two errors are independent and random:

$$ \delta(A + B) = \sqrt{(\delta A)^2 + (\delta B)^2} = \sqrt{5^2 + 3^2} = \sqrt{34} \approx 5.83 $$

This gives our new value a potential error of 5.83 which, as we expected, is greater than either of the inputs, as in this situation error can only grow due to missing information.

So we can see that with this method of approaching error it is relatively simple to encode this into your software to analyse values as they are passed through functions.
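As an example of encoding this, here is a minimal sketch of a value that carries its uncertainty through addition and multiplication. It assumes the uncertainties are independent and random, which is what the quadrature rules require. `Uncertain` is a hypothetical type name, and only the sum and product rules are shown.

```cpp
#include <cassert>
#include <cmath>

struct Uncertain
{
    double value;
    double error; // +/- uncertainty, assumed independent of other values
};

// Sum: absolute uncertainties add in quadrature.
inline Uncertain operator+(Uncertain a, Uncertain b)
{
    return {a.value + b.value, std::hypot(a.error, b.error)};
}

// Product: relative uncertainties add in quadrature (needs non-zero values).
inline Uncertain operator*(Uncertain a, Uncertain b)
{
    assert(a.value != 0.0 && b.value != 0.0);
    const double product = a.value * b.value;
    const double relative = std::hypot(a.error / a.value, b.error / b.value);
    return {product, std::fabs(product) * relative};
}
```

With A = 10 ± 5 and B = 20 ± 3 (values chosen arbitrarily), A + B comes out as 30 ± 5.83, matching the result above.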

Understanding Error and Approximation: 2. Absolute and Relative Error

The first topic in our guide to error and approximation in computing is absolute and relative error.

Error in a measurement or a calculation can easily be thought of as a simple thing, but when we are trying to be precise in our description of error and how it affects the validity of our results, we need to consider the context of the error that we are talking about. To do this we need to define the error.

Starting with Absolute Error. Absolute error is the total error. For most tools we use in the real-world to measure things they are often marked with +/- 1mm or similar to tell the user the absolute amount of error in measurement. In computing terms, we could say the absolute error in a floating point value that we have just assigned a real number is the maximum amount of rounding possible for a real number in that range. Quite simple and easy to understand.

Expanding on that we also have the Mean Absolute Error. As it sounds, this is a fairly straightforward extension of the absolute error: it is simply the average (mean) of the absolute errors between related results in a series, that is, the average value of |f(x) - g(x)| for all x in a series, where f is the correct function and g is some approximation. This is used when evaluating error on continuous data as opposed to single measurement results.

Finally we have Relative Error. This is calculated from the Absolute Error and the size of the value you are working with, and gives you the error in your measurement relative to the total outcome. For example, if we are measuring weight, our scales are accurate to ±1 kg and the thing we are weighing is 20 kg, then we have a Relative Error of 5%. But if we were weighing something that was 2000 kg then the error is 0.05%, which is much better. This is an important consideration when measuring error in software, as it directly links the precision of the data type you are using to the size of the data you are trying to store.
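The three definitions translate directly into code. This is a sketch with hypothetical function names; the relative error examples reuse the ±1 kg scale figures above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Absolute error: how far a result is from the correct value.
inline double AbsoluteError(double correct, double measured)
{
    return std::fabs(correct - measured);
}

// Relative error: absolute error as a fraction of the size of the quantity.
inline double RelativeError(double absoluteError, double magnitude)
{
    assert(magnitude != 0.0);
    return absoluteError / std::fabs(magnitude);
}

// Mean absolute error between a correct series f and an approximation g.
inline double MeanAbsoluteError(const std::vector<double>& f, const std::vector<double>& g)
{
    assert(f.size() == g.size() && !f.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i)
    {
        sum += AbsoluteError(f[i], g[i]);
    }
    return sum / static_cast<double>(f.size());
}
```

`RelativeError(1.0, 20.0)` gives 0.05 (5%) and `RelativeError(1.0, 2000.0)` gives 0.0005 (0.05%).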

That's about it on Absolute and Relative error. All quite simple but important to know the difference when reading or writing a paper so that you understand the context the error is being described in.

 

Understanding Error and Approximation: 1. Intro

Error and approximation can take many forms. It can be mathematical, numerical, algorithmic... pretty much any part of what we consider computer science has some level of error or approximation arising from the software, hardware or simply mistakes in our code.

When we are planning to take advantage of acceptable error, as I am, we must have a decent understanding of what we consider error and the forms it can take in our software.

TOPICS

  1. Introduction (This page)
  2. Absolute and Relative Error
  3. Error Propagation
  4. Math Max Error
  5. Radius of Convergence
  6. Continuous Approximations
  7. Regularisation and Relaxation

What if we want Hash Collisions?

In an earlier post we detailed a method to generate a function level cache which could be used to compare the input values to the function to past values in a cache so that we could return answers we had already calculated when the input values were near enough.

A large bottleneck in that method was that the input values and the cache had to be manually searched and compared, adding a significant overhead unless the function being cached was significantly complex. It also meant that the choice of what to keep in the cache was difficult and expensive to manage.

Luckily, this problem is very similar to normal cache behaviour, so we can solve it with some clever hashing. We only want to look in the cache where we expect to find our similar data. The problem with hash functions is that they usually try very hard to avoid collisions. In this circumstance we want collisions, but only when the input values to the hash function are similar.

For example, a simple hash function for a float value might be to reinterpret the float's binary representation as an integer. It is a 1:1 mapping of address index to value, nice and simple. If instead we were to add 0.5f to the float and cast it to an integer, effectively rounding to the nearest integer, we would have a hashing function which was accurate only to within one unit. This means that 3.1, 3.4 and 2.7 would all map to the same hash address: 3. This achieves an intentional cache collision, so we no longer have to search through the whole cache for values within one unit. We simply hash the input: if we have a value matching it in the cache we are good to go, and if not we can continue without the cache and add the result afterwards for the next similar query to read.
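The two hash functions described above might look like this. It is a sketch: `std::lround` is used instead of adding 0.5f and casting, because a plain cast truncates toward zero and so would round negative inputs the wrong way (for example, -0.7f + 0.5f casts to 0 rather than -1). `BitHash` and `RoundingHash` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Exact hash: reinterpret the float's bits, so only identical values collide.
inline unsigned int BitHash(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Colliding hash: every input that rounds to the same integer shares an address.
inline unsigned int RoundingHash(float value)
{
    // Negative results wrap around (unsigned conversion is modular),
    // which is still a valid, consistent hash.
    return static_cast<unsigned int>(std::lround(value));
}
```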

As you might expect, the implementation of this is relatively simple. In modern C++ we could implement it using an std::unordered_map, with the hash function either specialised for our type or passed in explicitly (see this example for an std::unordered_set).
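For comparison, here is a sketch of the standard-library route, assuming rounding to the nearest integer as in the example above. One detail is easy to miss: `std::unordered_map` only treats two keys as the same entry if its key-equality predicate says so, so a custom hash on its own is not enough. `RoundedHash`, `RoundedEqual` and `ApproxSqrtCache` are hypothetical names; both functors work on the rounded value so that they stay consistent with each other.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hash a float key by its nearest integer, so nearby inputs share a bucket.
struct RoundedHash
{
    std::size_t operator()(float key) const
    {
        return std::hash<long>{}(std::lround(key));
    }
};

// Keys compare equal when they round to the same integer. Without this,
// the default std::equal_to<float> would still tell 3.1f and 3.4f apart.
struct RoundedEqual
{
    bool operator()(float a, float b) const
    {
        return std::lround(a) == std::lround(b);
    }
};

// Maps an input to a previously computed sqrt of a nearby input.
using ApproxSqrtCache = std::unordered_map<float, float, RoundedHash, RoundedEqual>;
```

After storing `sqrt(3.1f)`, a `find(2.7f)` returns that stored value. Note that `std::lround` rounds halfway cases away from zero, which differs slightly from the add-0.5-and-truncate approach for negative inputs.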

However, we have our own requirements and constraints. Our map needs an explicit memory footprint, and some kind of ordering would be useful depending on the type of hashing we want to do: different functions have different relationships between their inputs and outputs, so some customisation is desirable. As this is being written as a test, being able to add debug output and control the flow explicitly, without the complexity of overriding standard library functions, is also a plus.

Therefore, it seems worthwhile to show a trivial implementation that can be expanded to meet our criteria and that expresses this idea more clearly. Also, it's fun.

So, to begin with, we have to think about the constraints on our container. If we want to match the behaviour of normal maps, then we need a type for the key and a type for the data it stores. In our case these will be the input type of the function we are caching and the type of its result, respectively. Next, we want to be able to specify the number of entries allowed in the cache so that we can tweak its size for performance. Finally, we want to be able to specify the hash function so that similar inputs are placed at the same address. This gives us a class definition that looks something like this:

template<typename Key, typename T, unsigned int Size, unsigned int(*hashFunction)(Key)>
class CollidingHashTable
{
....
};

A little long-winded, but it means we can declare a lot statically in the class definition.
(Side note: In this example we only hash a single input, for simplicity. It would be possible to use variadic templates to accept multiple inputs and simplify the work of the programmer using the class, but it's probably better to store all the inputs in a struct and use that struct's type as the key. There's no point making it harder than it needs to be if you are probably going to change it later...)
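As a sketch of the struct approach from the side note, assume a function with two float inputs that each only need to be accurate to the nearest integer. `PairInput`, `PairGridHash` and the 10 x 10 grid size are all hypothetical. The hash rounds each field and flattens the pair into a single table index; inputs outside the grid return an index one past the end so that a bounds check can reject them.

```cpp
// Hypothetical multi-input key: all of a function's inputs bundled into one struct.
struct PairInput
{
    float a;
    float b;
};

// Illustrative table layout: a 10 x 10 grid of rounded (a, b) cells.
constexpr unsigned int kGridWidth = 10;
constexpr unsigned int kGridHeight = 10;
constexpr unsigned int kGridSize = kGridWidth * kGridHeight;

// Rounds each field to the nearest integer and flattens (a, b) to a row-major index.
// Inputs outside [0, width - 0.5) x [0, height - 0.5) return kGridSize (out of range).
unsigned int PairGridHash(PairInput in)
{
    if (!(in.a >= 0.f && in.a < kGridWidth - 0.5f) ||
        !(in.b >= 0.f && in.b < kGridHeight - 0.5f))
    {
        return kGridSize;
    }
    unsigned int col = static_cast<unsigned int>(in.a + 0.5f);
    unsigned int row = static_cast<unsigned int>(in.b + 0.5f);
    return row * kGridWidth + col;
}
```

Using this with the class below would also mean adapting the parts that assume a numeric `Key`, such as the `HashEntry` constructor (which assigns -1) and `ToString` (which calls `std::to_string`).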

Now that we have a rough definition of the class, we need a definition for the structure that each entry will be held in.
In this step we need to think about what kind of cache behaviour we would like, as some of that behaviour requires information stored for each entry. In this example, we want frequently accessed cache entries to stay in the cache for as long as they are being used, and we want some measure of their use for debug purposes. For debugging, we probably also want to store the input value that generated the hash for that location.
As a result we have four fields in our map entry struct, two for usage and two for debug:

  • Storage Value : The value of the result of the function we are caching.
  • Hash Value: The value of the input to the function we are caching.
  • Hit Count: A running count of how many times this cache entry has been accessed.
  • Last Touch Index: The last time the cache entry was accessed (with the time being measured by a counter that increments every time the map performs an operation).

This gives code which looks like this:

	struct HashEntry
	{
		HashEntry() { hits = 0; hashValue = -1; storageValue = -1; lastTouchIndex = 0; }
		Key hashValue;	//The input (key) that produced this entry's hash.
		T storageValue;
		unsigned int hits;
		unsigned int lastTouchIndex;
	};

Next, we need to define how our map stores its data and control information. In this case we have opted to use an std::array of our HashEntry, with the size we pass in when the map is created. We also need to store an operation index to act as a timer for judging how old the information in the cache is, and an arbitrary threshold for the maximum time an entry can sit in the cache unused before it may be replaced.
With this extra information, our collision-prone map should now look something like this:

#include <array>

template<typename Key, typename T, unsigned int Size, unsigned int(*hashFunction)(Key)>
class CollidingHashTable
{
public:
	CollidingHashTable() : m_lastOperationIndex(0), m_staleThreshold(Size)
	{
	}
	struct HashEntry
	{
		HashEntry() { hits = 0; hashValue = -1; storageValue = -1; lastTouchIndex = 0; }
		Key hashValue;
		T storageValue;
		unsigned int hits;
		unsigned int lastTouchIndex;
	};

	std::array<HashEntry, Size> m_table;
	unsigned int				m_lastOperationIndex;
	unsigned int				m_staleThreshold;

};

To make this into a usable map class, we now have to add the insert and get functions.

Get is quite a trivial function: it takes the key, calculates its hash and, if an entry exists at that address, returns the stored value and updates the entry's hit count and last touch index.

Insert has a little more to handle. It must decide whether the value being passed in goes into an empty slot, replaces the value currently stored there because that value is stale, or is discarded because the slot is still in use. It must also handle invalid hash values. This is important because we supply our own hash function and map its result directly onto the indices of our table. The table could be changed to accept any returned hash value (as std::unordered_map does), but that adds more complexity.
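The staleness test inside Insert comes down to a single unsigned comparison. Isolated as a hypothetical helper (`IsStale` is not part of the original class), it looks like this. Because the operation counter is unsigned, the subtraction still yields the correct age after the counter wraps around.

```cpp
// An occupied slot may be overwritten once more than staleThreshold operations
// have passed since it was last touched. Unsigned arithmetic is modulo 2^N,
// so (now - lastTouch) is still the correct age after the counter wraps.
bool IsStale(unsigned int now, unsigned int lastTouch, unsigned int staleThreshold)
{
    return (now - lastTouch) > staleThreshold;
}
```

With the stale threshold set to Size (20 in the test program in this post), no entry can become stale during its ten insertions, which is why that run never reports a replacement.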

The code for these functions, with the rest, should look something like this:

#include <array>
#include <string>

template<typename Key, typename T, unsigned int Size, unsigned int(*hashFunction)(Key)>
class CollidingHashTable
{
public:
	CollidingHashTable() : m_lastOperationIndex(0), m_staleThreshold(Size)
	{
	}
	struct HashEntry
	{
		HashEntry() { hits = 0; hashValue = -1; storageValue = -1; lastTouchIndex = 0; }
		Key hashValue;
		T storageValue;
		unsigned int hits;
		unsigned int lastTouchIndex;
	};

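	//Ret: 1 -> Found (result is set), -1 -> Empty slot or out-of-range hash
	int Get(Key _hashValue, T& result)
	{
		m_lastOperationIndex++;

		unsigned int hash = hashFunction(_hashValue);
		//Reject out-of-range hashes before touching the table, as Insert does.
		if (hash >= Size || m_table[hash].hits == 0)
		{
			return -1;
		}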
		else
		{
			m_table[hash].hits++;
			result = m_table[hash].storageValue;
			m_table[hash].lastTouchIndex = m_lastOperationIndex;
			return 1;
		}
	}

	//Ret: 0-> Added, 1 -> Replaced, 2-> Occupied, -1 -> Error
	int Insert(Key _hashValue, T _storageValue)
	{
		//Increment the index of this operation and get the hash index.
		m_lastOperationIndex++;
		unsigned int hash = hashFunction(_hashValue);

		if (hash >= Size) //hash is unsigned, so only the upper bound can be exceeded.
		{
			return -1;
		}

		int returnVal = -1;

		//If the entry is empty or stale replace.
		if (m_table[hash].hits == 0)
		{
			returnVal =  0;
		}
		else if ((m_lastOperationIndex - m_table[hash].lastTouchIndex) > m_staleThreshold)
		{
			returnVal = 1;
		}
		else
		{
			return 2;
		}

		m_table[hash].storageValue = _storageValue;
		m_table[hash].hashValue = _hashValue;
		m_table[hash].hits = 1;
		m_table[hash].lastTouchIndex = m_lastOperationIndex;

		return returnVal;
	}

	std::string ToString()
	{
		std::string text;
		text.append("hashValue");
		text.append(", ");
		text.append("storageValue");
		text.append(", ");
		text.append("hits");
		text.append(", ");
		text.append("lastTouchIndex");
		text.append("\n");
		for (unsigned int i = 0; i < Size; i++)
		{
			text.append(std::to_string(m_table[i].hashValue));
			text.append(",");
			text.append(std::to_string(m_table[i].storageValue));
			text.append(",");
			text.append(std::to_string(m_table[i].hits));
			text.append(",");
			text.append(std::to_string(m_table[i].lastTouchIndex));
			text.append("\n");
		}

		return text;
	}

	std::array<HashEntry, Size> m_table;
	unsigned int				m_lastOperationIndex;
	unsigned int				m_staleThreshold;

};

With the class complete, we can create a map like this:

	CollidingHashTable< float, float, HASHTABLESIZE, SimpleHash> ourHashTable;

where 'SimpleHash' can be any hash function that takes the key type and returns an unsigned int table index.
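The test program in this post only calls Insert. The read path the table is designed for is: hash the input, return the stored result on a hit, and on a miss compute the real value and store it for the next similar query. Here is a self-contained sketch of that pattern; `MiniSqrtCache` is a hypothetical, cut-down stand-in for CollidingHashTable with one slot per rounded input and no staleness handling.

```cpp
#include <array>
#include <cmath>

// Hypothetical, cut-down stand-in for CollidingHashTable: one slot per rounded
// input, no staleness handling. Cached results are exact for some input within 0.5.
struct MiniSqrtCache
{
    static constexpr unsigned int kSize = 20;

    struct Slot
    {
        bool used = false;
        float value = 0.f;
    };

    std::array<Slot, kSize> slots{};
    unsigned int hits = 0;
    unsigned int misses = 0;

    float Sqrt(float x)
    {
        // Inputs the rounding hash cannot place fall straight through to std::sqrt.
        if (!(x >= 0.f && x < kSize - 0.5f))
        {
            ++misses;
            return std::sqrt(x);
        }
        unsigned int index = static_cast<unsigned int>(x + 0.5f);
        Slot& slot = slots[index];
        if (slot.used)
        {
            ++hits;          // "Get" succeeded: reuse the nearby answer.
            return slot.value;
        }
        ++misses;            // "Get" failed: compute, then "Insert" for later callers.
        slot.used = true;
        slot.value = std::sqrt(x);
        return slot.value;
    }
};
```

Calling `Sqrt(10.f)` and then `Sqrt(10.2f)` computes the square root once and returns `sqrt(10)` for both calls.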

As a simple example, here is the rounding hash function described above, mapping float inputs to the table indices 0 to 19.

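#define HASHTABLESIZE 20
//Rounds the input to the nearest integer so that nearby inputs collide on purpose.
//Inputs outside [0, HASHTABLESIZE - 0.5) return HASHTABLESIZE, an out-of-range index
//that Insert rejects (this also avoids converting negative floats to unsigned).
unsigned int SimpleHash(float _in)
{
	if (!(_in >= 0.f && _in < HASHTABLESIZE - 0.5f))
		return HASHTABLESIZE;
	return (unsigned int)(_in + 0.5f);
}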

And we can now test this by generating a cache of answers to the 'sqrt' function where it is accurate only to '+- 0.5f' of the input.

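#include <cmath>
#include <iostream>

//Computes sqrt exactly and offers the result to the cache; returns Insert's status code.
int SqrtAndStore(float _value, CollidingHashTable< float, float, HASHTABLESIZE, SimpleHash>& _table)
{
	float sqrtVal = std::sqrt(_value);
	int res = _table.Insert(_value, sqrtVal);
	return res;
}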

int main()
{
	CollidingHashTable< float, float, HASHTABLESIZE, SimpleHash> ourHashTable;
	float testValues[] = { 2.f, 2.5f, 3.5f, 3.7f, 10.f, 19.f, 10.01f, 10.9f, 11.f, 11.4f };
	for (int i = 0; i < 10; i++)
	{
		int res = SqrtAndStore(testValues[i], ourHashTable);
		if (res == 0)
			std::cout << "sqrt(" << testValues[i] << "): Added to empty hash index.\n";
		if (res == 1)
			std::cout << "sqrt(" << testValues[i] << "): replaced what was at hash index.\n";
		if (res == 2)
			std::cout << "sqrt(" << testValues[i] << "): hash was occupied so did nothing.\n";
		if (res == -1)
			std::cout << "sqrt(" << testValues[i] << "): encountered an error in the hashing process.\n";
	}
	std::cout << "\n" << ourHashTable.ToString();

	return 0;
}

This code, with its debug output, reports the result of each call to Insert and then prints the state of the cache, so we can see how it behaved over the run. It should give you output that looks like this:

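sqrt(2): Added to empty hash index.
sqrt(2.5): Added to empty hash index.
sqrt(3.5): Added to empty hash index.
sqrt(3.7): hash was occupied so did nothing.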
sqrt(10): Added to empty hash index.
sqrt(19): Added to empty hash index.
sqrt(10.01): hash was occupied so did nothing.
sqrt(10.9): Added to empty hash index.
sqrt(11): hash was occupied so did nothing.
sqrt(11.4): hash was occupied so did nothing.

hashValue, storageValue, hits, lastTouchIndex
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
2.000000,1.414214,1,1
2.500000,1.581139,1,2
3.500000,1.870829,1,3
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
10.000000,3.162278,1,5
10.900000,3.301515,1,8
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
-1.000000,-1.000000,0,0
19.000000,4.358899,1,6

As you can see from the resulting table, we have successfully mapped results to shared hash locations based on their input values, but the table has a lot of empty space. Ideally we would shape the size of the table and the hash function to the function being called. Another option would be to extend this class so that it can switch between hash functions under different conditions and reorder the information stored within. Alternatively, the cache could grow up to a maximum size at the cost of a little extra computation, or behave more like a standard map and keep an index into densely packed data, so that all occupied cache entries are stored without gaps and possibly get better cache locality (as long as the extra index doesn't outweigh the gain in a small table...). But that is all work for another day!
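One way to shape the hash to the function, sketched here as an assumption rather than something the original code does, is to give buckets a constant relative width instead of a constant absolute width. For sqrt, changing the input by a relative amount r changes the output by roughly r/2 relatively, so relative-width buckets keep the relative error of a cached result roughly constant while spending far fewer slots on large inputs. The sketch keeps the exponent and the top two mantissa bits of the float's IEEE-754 bit pattern; `RelativeBucket`, `RelativeBucketIndex` and `kKeptMantissaBits` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Mantissa bits kept: each power-of-two range is split into 2^kKeptMantissaBits buckets,
// so (for 2 bits) a bucket's width is between 12.5% and 25% of the values it holds.
constexpr int kKeptMantissaBits = 2;

// Hypothetical hash for positive, finite floats: keep the exponent and the top
// kKeptMantissaBits of the mantissa, discard the rest. For positive floats the bit
// pattern increases with the value, so each bucket is a contiguous range.
std::uint32_t RelativeBucket(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits >> (23 - kKeptMantissaBits);
}

// Shifts buckets so that minValue lands at table index 0 (requires value >= minValue).
std::uint32_t RelativeBucketIndex(float value, float minValue)
{
    return RelativeBucket(value) - RelativeBucket(minValue);
}
```

With two mantissa bits, 1.0 and 1.2 share a bucket, as do 100 and 110, while 1.0 to 2.0 spans four buckets.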

I hope this was helpful!

Additional Point: This method can be cheaply extended to converge on an average result for each range. Instead of ignoring insertion attempts at an occupied hash, count them and keep a running sum of all the storage values; then, when the entry is read from the cache, return the sum divided by the number of insertion attempts. This won't give you the average value of the function over the range covered by that cell, but it will give you a value weighted towards the most common insertion points, which is probably closer to the answer you are trying to get back.
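A sketch of that extension, assuming float results and one entry per hash slot: `AveragingEntry` is a hypothetical replacement for the `storageValue` and `hits` fields of `HashEntry`. Insert would call `Add` on every attempt that lands on the slot rather than only when it is empty, and Get would return `Mean()`.

```cpp
// Hypothetical averaging cache entry: every insertion attempt that lands on this
// slot adds to a running sum, and reads return the mean of all values seen so far.
struct AveragingEntry
{
    float sum = 0.f;
    unsigned int count = 0;

    void Add(float value)
    {
        sum += value;
        ++count;
    }

    bool HasValue() const { return count > 0; }

    // Callers must check HasValue() first; the mean of zero samples is undefined.
    float Mean() const { return sum / static_cast<float>(count); }
};
```

A float running sum loses precision once many values have been added, so a double accumulator or an incrementally updated mean may be safer for long-lived entries.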