Add Me!Close Menu Navigation

Somewhat Fast Square Root

Written for a challenge from /r/dailyprogrammer, I present to you an algorithm to approximate square roots that is still unfortunately slower than the _mm_sqrt_ps primitive offered by Intel. The following article is largely inspired by the brilliantly written Fast InvSqrt algorithm from id software.

Disclaimer: I’m no expert, in fact I’m a lowly undergrad who is very prone to making mistakes, so take what I have written with a healthy dose of skepticism.

I’m aware of this wikipedia algorithm. It performs as accurately as the non-compensated version of qsqrt(x) exactly half of the time, but have the advantage of not having to use up an extra cycle on the FPU for the multiplication by \sqrt 2. Can you figure out the cases where the wikipedia method doesn’t perform as well?

1. To whet your appetite

I will explain why the scary looking code listing presented below approximates \sqrt n to 7 digits of precision. To see it in action, see (The reason that the relative errors are all 0 is because each of the relative error \delta_n < \epsilon_{{mach}} associated with n are subnormal, or underflows values representable by single precision floating point numbers.)

#define SQRT2 1.4142135623730951

float qsqrt(float x){
	// x = [0 eeeeeeeeeee mmmmmmmmmmmmmmmmmmmmmmm]
	int i = *(int*)&x;
	int j = (((i/2-0x1fc00000)+0x3f800000)&0x3ff800000)+((i&0x7fffff)/5*2);

	float y = (*(float*)&j) * ((i-0x3f800000)&(0x800000) ? SQRT2 : 1); // 2-3 sig figs of significance

	// two iterations of newton: y = y - (y^2 - x)/2y yields 2^-18 points of precision
	y = (y + x/y)/2; // this yields 2^-8 points of precision
	return (y + x/y)/2;

2. A brief introduction to the IEEE standard

Say we stored a number x as a single precision floating bit number; as a bit vector consisting of 32 bits (4 bytes), it will look something like

    \[\begin{tabular}{ c | c | c | c }   bit position & 31 & biased exponent - 30 to 23 & mantissa - 22 to 0 \\   Float(x) stored as & sign & e_7 e_6 e_5 e_4 e_3 e_2 e_1 e_0 & m_{22} m_{21} \hdots m_2 m_1 m_0 \\ \end{tabular}\]

The first 23 bits, from bit 0 to bit 22, are reserved for the mantissa, which represents the binary rational number 0.m such that

    \[0.m = \sum_{i=0}^{22} m_i \times \left(\frac{1}{2}\right)^{i+1} = \frac{m_0}{2} + \frac{m_1}{4} + \frac{m_2}{8} + \hdots + \frac{m_{21}}{2^{22}} + \frac{m_{22}}{2^{23}}\]

where each of m_i can either be the binary bit 0 or 1. You can easily show that \sum_{i=1}^{\infty} \left(\frac{1}{2}\right)^{i} = 1, which means that 0.m must be strictly less than 1.

Furthermore, the next 8 bits, bits 23 to 30, are reserved for the biased exponent, which represents the binary integer e+127 such that

    \[e+127 = \sum_{i=0}^{7} e_i \times 2^i = e_0 + 2e_1 + 4e_2 + \hdots + 2^6 e_6 + 2^{7} e_{7}\]

where each of e_i is also either 0 or 1. Now, the decimal x is represented as

    \[x = (1 + 0.m_{(x)}) \times 2^{(e_{(x)}+127) - 127}\]

Here, e_{(x)}+127 is notation for the biased exponent of the floating point number x, not the x^{th} bit within the exponent’s bit vector; similarly, 0.m_{(x)} is the notation for the mantissa of the floating point number x.

Let’s look at an example before moving on: Say we are given the following floating point number, and we are asked to find its decimal point equivalent.

    \[\begin{tabular}{ c | c | c | c }    & 31 & biased exponent - 30 to 23 & mantissa - 22 to 0 \\   ? & 0 & 00001111111 & 00110011001100110011001\\ \end{tabular}\]

We start by finding out what e_{(?)} + 127 is. From the above formula, we have

    \[e_{(?)} + 127 = \sum_{i=0}^{10} e_i \times 2^i = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 0 + 0 + 0 + 0 = 127\]

Note that e_0 in the above diagram is all the way at the right while e_{10} is at the leftmost.

Similarly, we need to figure out what 0.m_{(?)} is. We start off with the formula given above

    \begin{align*} 0.m_{(?)} &= \sum_{i=0}^{22} m_i \times \left(\frac{1}{2}\right)^{i+1} \\ &= \frac{m_0}{2} + \frac{m_1}{4} + \frac{m_2}{8} + \hdots + \frac{m_{21}}{2^{22}} + \frac{m_{22}}{2^{23}} \\ &= \frac{1}{2} + \frac{0}{4} + \frac{0}{8} + \frac{1}{16} + \frac{1}{32} + \hdots + \frac{0}{2^{22}} + \frac{0}{2^{23}} \\ &\approx 0.5999999046325684 \end{align*}

So then this means that

    \[? = (1 + 0.m_{(?)}) \times 2^{(e_{(?)}+127)-127} = (1 + 0.5999999046325684) \times 2^{(127) - 127} = 1.5999999046325684\]

3. How to square root

Taking the above definition of a floating point number, let’s see how the square root affects its components

    \[y = \sqrt{ x } = \sqrt{(1 + 0.m_{(x)}) \times 2^{(e_{(x)}+127) - 127}}\]

now, y itself can be stored as a floating point number

    \begin{align*} y &= (1 + 0.m_{(y)}) \times 2^{(e_{(y)}+127) - 127} \\ &= \sqrt{(1 + 0.m_{(x)}) \times 2^{(e_{(x)}+127) - 127}} \\ &= \sqrt{1 + 0.m_{(x)}} \times \sqrt{2^{(e_{(x)}+127) - 127}} \\ &= \sqrt{1 + 0.m_{(x)}} \times 2^\frac{(e_{(x)}+127) - 127}{2} \\ \end{align*}

If you’ve taken a physics course on E&M of particles and waves, you’ll probably remember having to use the following approximation

    \[\sqrt{1 + x} \approx 1 + \frac{x}{2}\]

This is taken from the first order taylor expansion of the square root function, let f(x) = \sqrt{x}

    \begin{align*} f(1 + \epsilon) &= f(1) + \epsilon f'(1) + \frac{\epsilon^2}{2} f''(1) + \mathbb{O}(\epsilon^3)\\ &= \sqrt 1 + \frac{\epsilon}{2 \sqrt 1} - \frac{\epsilon^2}{8 \sqrt 1^3} + \mathbb{O}(\epsilon^3) \\ &= 1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{8} + \mathbb{O}(\epsilon^3) \end{align*}

So we can simplify y further

    \begin{align*} y &= (1 + 0.m_{(y)}) \times 2^{(e_{(y)}+127) - 127} \\ &= \sqrt{1 + 0.m_{(x)}} \times 2^\frac{(e_{(x)}+127) - 127}{2} \\ &\approx \left(1 + \frac{0.m_{(x)}}{2}\right) \times 2^\frac{(e_{(x)}+127) - 127}{2} \end{align*}

By matching up the e‘s and m‘s, the above suggests that for y = \sqrt x

    \[0.m_{(y)} \approx \frac{0.m_{(x)}}{2}\]


    \[(e_{(y)} + 127) - 127 = \frac{(e_{(x)}+127) - 127}{2} \iff (e_{(y)} + 127) = \frac{(e_{(x)}+127) - 127}{2} + 127\]

That is, the mantissa bits (bit 0 to 22) of y should be approximately half that of the mantissa for x, and the “biased exponent” of y (e_{(y)} + 127) should be about half of (the biased exponent of x – 127), and all of that added together with 127, so this is what we will attempt to write.

4. Onto the code

In this section, we’ll try to fit the above logic into C from scratch, then figure out how to optimize it

We can’t access the bits of a floating point number conveniently in C, so we need to find a way to convert it into an underlying type where it is easy to work with the bits. You can’t just cast it to an int as the compiler will actually just truncate the number to an integer and discard all of the underlying information about how the number was originally represented.

Instead, we can put it in memory, and then dereference that location as an integer pointer to get the actual bits representing the floating point number.

    float x;
    int i = *(int*)&x; // this gives you the bit array representing x

Alternatively, you can also use the following

    float x;
        float x;
        int i;
    } u;
    u.x = x;
    int i = u.i; // this gives you the bit array representing x

On gcc4.7 default optimization level, both compiles to a single SSE mov instruction.

Once we figure out how to get the bit array of a floating point number into a variable i, we need to be able to isolate out the “biased exponent” bits (23 to 30) and the exponent bits. We can do this by creating a bit mask and bitwise anding the bit array with the mask to extract out the required parts

// exp is between 23 and 30
//      bits  10987654321098765432109876543210
#define EXP 0b01111111100000000000000000000000
#define MAN 0b00000000011111111111111111111111
#define BIAS (1<<23)

    int m_x = i&MAN;
    int e_x = (i&EXP) - BIAS; // this is the unbiased exponent

Next, we find an approximation for y as from above

    // m_y = m_x/2
    int m_y = m_x/2;
    // e_y = e_x/2, where e_y and e_x are the unbiased exponents
    // be careful, if e_x is odd, e_x/2 will get truncated
    int e_y = e_x/2;
    // however, we need to turn the unbiased exponent into the biased one, we do this by adding the bias
    e_y += BIAS

Next, we need to concatenate the exponent and the mantissa together, we can do this by oring the two together, which is just the addition operator

    int j = (e_y&EXP)+(m_y&MAN); 
    float y = *(float*)&j;

Finally, if the unbiased exponent for x is odd, then the division of the exponent in the previous piece of code will truncate the exponent by an error of 2^{\frac{1}{2}}, which is just \sqrt 2. Hence, we need to test to see if e_x is odd, and if it is, we need to multiply y by \sqrt 2

#define SQRT2 1.4142135623730951
    if (e_x & (1<<23)){
        y *= SQRT2;

    return y;

However, we can rewrite this more succintly. Convince yourself that the above code can be more succintly written as

    int j = (((i/2-BIAS/2)+BIAS)&EXP)+((i&MAN)/2);

5. Crude Error compensation

As we’ve seen previously, the first order taylor approximation of \sqrt{1 + 0.m_x} has an error that can be approximated by \frac{\epsilon^2}{8}. Now there’s no easy way of calculating \epsilon^2 in integer arithmetic, so what we do instead is bound \epsilon^2 to something reasonable. Since around any reasonable neighborhoods, 0.m^2 \approx \frac{0.m}{2} \mbox{ to } 0.m, we compromise and decide experimentally that \frac{\epsilon^2}{8} \approx \frac{\epsilon}{10}, which means that 1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{8} \approx 1 + \frac{2\epsilon}{5}. This is how we got the expression

    int j = (((i/2-BIAS/2)+BIAS)&EXP)+((i&MAN)*2/5);
Posted By Lee