0x5f400000: Understanding Fast Inverse Sqrt the Easy(ish) Way!
In this post, we attempt to unravel the mysteries of the “magic” constant found in the fast inverse sqrt method in an intuitive fashion.
 28 October, 2014 
 Algorithm, Article, C, Numerical Analysis 
 Tags : floating point, numerical analysis
Table of Contents
Converting between a Float and its Bits in memory
0x3F800000 – Generalized fast power method
Requested by averageisnothalfway: Differences between a1a and df
TL;DR – Start Here!
1. Let f2b be the float-to-bit function (that takes in a floating point number and outputs a 32-bit long representing its IEEE 754 encoding used on your everyday computer) and b2f be the bit-to-float function; then the fast inverse square root method can be rewritten as
y = b2f(0x5f3759df - (f2b(x) >> 1))
2. It turns out that b2f(α + t * f2b(x)) ≈ C(α) * x^t for some nonlinear function C(α) that doesn’t vary a lot with x, so it becomes an optimization game to find α such that C(α) ≈ 1.
3. We know that x^t = 1 when x = 1 no matter what t is, so if we want to find α such that b2f(α + t * f2b(x)) ≈ x^t, all we need to do is set α = (1 - t) * f2b(1), because then
b2f((1 - t) * f2b(1) + t * f2b(1)) = b2f(f2b(1)) = 1 = 1^t
so the approximation is exact at x = 1, which lets us make arbitrary fast power methods on demand.
4. ???
5. Profit!
1. Introduction
The first time I saw the magical fast inverse square root method, I was a freshman in college and the code pretty much killed my brain. I’d just learned about algorithms and data structures and I thought that binary search trees were like the coolest things ever. Turns out I didn’t know them well enough to tackle this problem back then.
WTF?
If you’re not familiar with the fast inverse square root method, here’s what it looks like in C (source: wikipedia)
float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}
[Note: The use of pointer aliasing here is illegal, as Maristic pointed out in the followup discussion on reddit. Use unions instead.]
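For reference, here is a sketch of the same routine with the type pun done through a union instead (my own rewrite, assuming 32-bit unsigned ints and IEEE 754 floats; the name Q_rsqrt_union is made up for illustration):

```c
#include <stdint.h>

/* Same algorithm as Q_rsqrt, but the float <-> bits conversion goes
 * through a union, avoiding the pointer-aliasing violation. */
float Q_rsqrt_union(float number)
{
    union { float f; uint32_t i; } conv = { .f = number };
    conv.i = 0x5f3759df - (conv.i >> 1);        /* the magic step */
    float y = conv.f;
    y = y * (1.5f - (number * 0.5f * y * y));   /* one Newton iteration */
    return y;
}
```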
Okay, so first you declare i, x2, y, and a constant threehalfs. Next, you set x2 to number * 0.5 and set y = number. So far so good!
Okay, next, i gets the dereferenced value of the float* pointer to y casted into a long* pointer instead?? Okay…
Next, i = 0x5f3759df - (i >> 1)??? Huh?
y = *(float*)&i????
y = y * ( threehalfs - ( x2 * y * y ) )??????
What. The. Fucking. Fuck.
Obsession
Before we go on, I should probably warn you that this code was a constant obsession of mine for the better part of my adult life. Because of this, a lot of what I’m about to say may sound preachy, cheesy, or even outright annoying. I’ll try to tone it down, but truth be told, I am proud of this work; I’ve come to see the fast inverse square root method as my baby and because of that, I’ve become quite possessive of it. As much as this is a post on the possible derivation of 0x5f3759df, this is also a memoir of my college life (minus the fun parts ;).
From the very beginning, I tried to wrap my head around this strange game of algebraic reasoning, and for the next 4 years, my academic life largely revolved around this mystery: I spent my winter break freshman year learning C and figured out that the “bit level” hack gave you the array of bits that your computer encoded floating point numbers as; sophomore year, I took a first course on numerical methods and understood how iterating the sequence
y_{n+1} = y_n * (3/2 - (x/2) * y_n^2)
ad infinitum will eventually converge to 1/√x; then, during the summer before my junior year, I worked through a lot of tedious algebra dealing with the quirks of IEEE 754 and found another bit-level hack to quickly compute the square root a la the spirit of the fast inverse square root method, but this time exploiting the distribution of floats and a convenient analogy to second-order Taylor expansions.
Indeed, my obsession with this method has given me an appetite for numerical analysis, programming languages, software verification, compilers, and even combinatorics and dynamical systems.
Without it, I would have never explored most of the areas of computer science that I currently consider my home.
But through it all, there was always a constant question at the back of my mind that I failed to answer time and time again: where did the 0x5f3759df come from?
I never got to a point where I could claim to have a satisfactory answer to that question, but I think I did come to discover something along the road: a different magical constant, 0x5f400000, that I believe lay at the very beginning of the history of this method, and it is the joy of discovering this constant that I would like to gift to my readers.
2. Switching Strategy
Towards the end of my college years, I realized that there was a big problem in the way I approached the problem of looking at 0x5f3759df. Most of the traditional research into the subject has been through the perspective that whoever wrote the algorithm went after the most optimal method possible. In the process of hunting for the jackpot, a lot of great intuition was left behind because those ideas weren’t efficient or effective enough. This is what led me down the path of looking for inefficient but elegant methods that could inspire the fast inverse square root method.
But first, let’s ramp up on some preliminaries. A lot of the work and papers on this matter jump immediately to the bit representation of floating point numbers; I will completely forgo that, instead relying on mathematical intuition to do the job. (This is in part why I didn’t become a mathematician :P)
Converting between a Float and its Bits in memory
Let’s first consider two functions: f2b, which stands for float-to-bits and returns the IEEE 754 bit array representation of the floating point number y, effectively performing *(long*)&y; and b2f, which stands for bits-to-float and takes a bit array and turns it back into a float. Note that b2f(f2b(x)) = x and vice versa, so that b2f = f2b^-1.
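A sketch of these two helpers in C (my own, using memcpy for the pun and uint32_t for the bits; the names f2b and b2f match the text):

```c
#include <stdint.h>
#include <string.h>

/* f2b: reinterpret a float's bytes as its IEEE 754 bit pattern. */
uint32_t f2b(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* b2f: the inverse of f2b -- reinterpret a bit pattern as a float. */
float b2f(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}
```

In particular, f2b(1.0f) comes out to 0x3f800000, a constant that will reappear shortly.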
It turns out that the magical fast inverse square root method is equivalent to
y = b2f(0x5f3759df - (f2b(x) >> 1))
followed by a single step of Newton’s method.
Looking at b2f(f2b(x) * t)
With the conversion functions out of the way, I next turned my attention to everything but the magic constant. Specifically, what are the effects of multiplying by some factor t inside the b2f? I fired up my favorite plotting library and looked at b2f(f2b(x) * t).
After a little bit of wrestling around with the plot, I realized that when plotting in log-log style, it has the same shape as x^t. This means that
b2f(f2b(x) * t) ≈ C * x^t
for some constant C.
I fitted a “regression” line (this isn’t a real regression however, don’t be misled) through the data onto C * x^t and found that the relative error is quite small, as can be seen in the figure below.

In fact, if you tick through the slider at the bottom of the figure, you’ll come to find that for any arbitrary t, it seems to be the case that
b2f(f2b(x) * t) ≈ C_t * x^t
for some constant C_t.
3. Homing In on 0x5F400000
Adding Alpha
Let’s now consider the following function:
b2f(α + t * f2b(x))
Notice here that we now have added an additive term α to the equation. Again, we can rewrite the fast inverse square root method in this form:
y = b2f(0x5f3759df + (-1/2) * f2b(x))
that is, the case α = 0x5f3759df and t = -1/2.
A natural question one might consider is the effect of varying α on the behavior of b2f(α + t * f2b(x)). It turns out that on the log-log graph, varying α does not change the slope of the line, but only its y-intercept. In other words, varying α will only affect the magnitude of the constant C, as we can see below.

This then suggests that t determines the degree of x while α determines its magnitude. Symbolically:
b2f(α + t * f2b(x)) ≈ C(α) * x^t
where C(α) is some constant that depends on α.
0x3F800000 – Generalized fast power method
We are now left at an interesting point in the puzzle. We have on one hand the knowledge that
b2f(α + t * f2b(x)) ≈ C(α) * x^t
for some function C, and we want to determine α such that
b2f(α + t * f2b(x)) ≈ x^t
Canceling out the x^t on either side, we are left with the functional equation
C(α) ≈ 1
So right now we need to find α such that C(α) ≈ 1. But x^t = 1 at x = 1 for any t, so the naive thing to do here is to demand that the approximation be exact at x = 1, that is, α + t * f2b(1) = f2b(1), which gives α = (1 - t) * f2b(1) and reduces our approximation to
x^t ≈ b2f((1 - t) * f2b(1) + t * f2b(x))
Let’s take a moment and appreciate this statement. Since f2b(1) = 0x3f800000, it says that for any exponent t (positive or negative, integer or fractional), we can approximate x^t with
b2f(t * f2b(x) + (1 - t) * 0x3f800000)
That’s mind blowing because we can immediately generate “fast power” methods using nothing but 0x3f800000!
float fast_pow(float x, float n)
{
	long bit = (*((long*)&x)) * n + 0x3f800000 * (1 - n);
	return *((float*)&bit);
}
Going back to inverse square roots, we have t = -1/2, so α = (1 - (-1/2)) * 0x3f800000 = (3/2) * 0x3f800000 = 0x5f400000, which is the namesake of this article.
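As a sanity check on fast_pow, here is a small aliasing-safe variant of the same arithmetic (my own sketch; fast_pow_u is a made-up name) together with its behavior on a few easy cases:

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic as fast_pow, but the bit pun goes through memcpy. */
float fast_pow_u(float x, float n)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* bits(x^n) ~ n * bits(x) + (1 - n) * bits(1.0f) */
    float blended = (float)bits * n + (float)0x3f800000 * (1.0f - n);
    bits = (uint32_t)blended;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

fast_pow_u(4.0f, 0.5f) lands exactly on 2.0 (the blend is exact when everything sits on exponent boundaries), while fast_pow_u(2.0f, 0.5f) gives 1.5 instead of 1.414..., i.e. a few percent of error, as expected from the derivation.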
4. Epilogue – The Hard Parts
Closing In On 0x5f3759df
After discovering the existence of 0x5f400000, things became much clearer. However, it was still quite off from SGI’s 0x5f3759df or Lomont’s 0x5f375a86, and so the question of how this constant came to be still nagged at the back of my head. Nevertheless, I am certain that 0x5f400000 appeared at least in some shape or form in the earlier versions of this numerical recipe.
Many others have previously discussed the possible origins of 0x5f3759df, and in the rest of this post, I will offer my own views on the matter.
For the most part, I believe that the discovery of 0x5f3759df was an iterative process: it starts with 0x5f40, then it gets refined to 0x5f375X, and finally someone must have decided on 0x5f3759df. As engineers are practitioners by nature, I don’t believe that anyone who worked on 0x5f3759df cared for its optimality; therefore, I believe that intuition and visualization played the most important part in its discovery.
In this section, I will first outline one path to refine 0x5f40 to 0x5f375X through intuition about the maximum error bounds of the recipe, followed by a discussion of the final discovery as a product of analyzing one step of Newton’s iteration.
Refinement – 0x5f376000
If we look at the graph of the relative error of the bit-hack approximation on a logarithmic x-axis below, you will notice a couple of peculiar things.
First and foremost, changing the value of the magic constant by small amounts seems to lift the relative error up or down. Next, the error is periodic in x with a multiplicative period of 4 (so that err(4x) = err(x)). Finally, the positions of the maximum and the minimum of the relative error are largely unchanged when we vary the magic constant.
We can use these observations to our advantage. Notice that the error for 0x5f400000 is completely negative. However, if we were to use a smaller magic constant, we could shift part of the error up into the positive zone, which reduces the maximum absolute error, as can be seen below.
We are trying to reduce the error bounds as much as possible (optimizing under the infinity norm, as we would say). In order to do so, we need to ensure that the maximum point plus the minimum point in the non-absolute relative error plot is as close to 0 as possible.
Here’s what we’re going to do. First, we’re going to zoom in on the section of the graph within the interval [1, 4], since the error is periodic. Next, we will look at the maximum point and the minimum point and look at their sum.

You will see that the optimal constant here lies somewhere close to 0x5f376000, which is about 1600 away from the SGI constant.
Newton Iteration – 0x5f375a1a
In our final trick tonight, we will refine this constant down further, to within 60 of SGI’s 0x5f3759df.
One of the important things to notice in the Quake code is that the second Newton iteration was commented away. This suggests that whoever worked on the latest version of the fast inverse square root method must have optimized the constant with the effects of the Newton iteration applied. Let’s look at how well the constants do with the help of an iteration.
Wow, using 0x5f3759df as the magic constant improved the accuracy nearly tenfold! This likely suggests that whoever worked on Quake’s version of the fast inverse square root method sought after 4 digits of accuracy. In order to achieve this with the vanilla constant 0x5f400000, they would need to run two Newton iterations.
Let’s dig in a little bit and try to find the magic constant that minimizes the maximum error after one Newton iteration.

Incidentally, this occurs when the rightmost two peaks in the above graph collide. We find that the magic constant 0x5f375a1a works best here, which is only 59 away from 0x5f3759df.
We can look at the mean error as well (the onenorm):

but as the bulk of the mass of the error is tucked under the two humps, the 1-norm/mean metric (and in fact, the first few reasonable norms) will tend to disregard the significance of the outer edges and their maximum bounds. Therefore, these norms will try to push the constant down as far as possible.
Finally, let’s look at the maximum relative error after two iterations:

This ended up giving us the same magic constant of 0x5f375a1a, which I guess makes sense.
Closing Thoughts
In this article, we’ve come across a couple of different constants: 0x5f400000, 0x5f376000, 0x5f375a1a. None of these is exactly the 0x5f3759df that we’ve been searching for, but the last one is really close, which leads me to believe that the discovery of 0x5f3759df went through a similar process: 1. intuition, and 2. refinement.
The dark art of fast reciprocal square roots has largely been done away with nowadays thanks to faster hardware intrinsics. If anything, this article contributes little technical value besides proposing a half-assed solution to a great puzzle. Nevertheless, I sincerely hope that, having made it this far, you’ve also found some value in this research. Good luck!
Requested by averageisnothalfway: Differences between a1a and df
