Add Me!Close Menu Navigation

Scientific Computation

Binary representation of numbers and floating point precision arithmetic.

Once upon a time, I absolutely despised matlab. In my own spoiled preconceptions, I believed that matlab was designed without a single shred of care as to how ugly it is. Of course, I eventually had to finally write my first line of matlab and, upon discovering that I’m still in one piece, concluded that matlab wasn’t really that bad. Anyways, I’ve moved onto grander things now, things such as singing obnoxiously in public, browsing reddit for hours on end, and Python.

In the beginning, there was unary, which is the numerical basis you use when you count things using your fingers or using sticks.

  1. 1
  2. 11
  3. 111
  4. 1111
  5. etc

The problem is that there’s no efficient way of representing very large numbers using this system, since we’ll probably need to cut down all of the trees in the world in order to count the number of people there are in the world. (and then, since we lost count of the number of trees used so far, we start destroying more things until the entire earth burns in the vain attempts of humans trying to count using an unary basis! CAN YOU SEE IT? This was what the Mayans were prophesizing before they went over to the base 20 system) Ahem, anyways, there’s an interesting property of numerical basis that any basis larger than one will need a logarithmic number of digits to represent. Take for example, the “modern” base-10 (decimal) system. As everybody hopefully knows, (I’m sorry, I understand that this is as offensive to people that count in French as flashing YOLO (you only live once) to a group of cats) our system consist of the literals 0 through 9. In order to express a number below 10, you only need a single digit; two if you need a slightly bigger number below 100, and if you’re really feeling adventurous, 12978189 digits to express the largest known prime as of writing this piece.

Digits Smallest decimal number it represents
1 0
2 10
3 100
4 1000
5 10000
6 10^{5}
k 10^{k-1}

As we can see, the magnitude of the number versus the number of digits is exponential, then it must follow that the number of digits versus the magnitude of the number is the inverse function of the exponent function, which is the logarithm.

 \begin{array}{ccc} n(d) &=& 10^{d-1}\\ log_{10}(n) &=& log_{10}\left(10^{d-1}\right) \\ &=& d-1 \\ &\iff& \\ d(n) &=& log_{10}(n)+1 \end{array}

Where d(n) is the number of digits of n, and n(d) is the smallest decimal number with d digits.

Anyways, for some arbitrary reason, American scientists decided to use base-2 arithmetic in its first implementations of a digital computation, and everyone else in the world (except for Russia, who dabbled with the ternary, or base-3, system for a while) decided that, meh, might as well.

In order to count in base two, you do the same exact thing as you would in decimal, except you are only allowed to use two digits 0 and 1.

Decimal integer Binary
0 0
1 1
2 10
3 11
4 100
5 101
6 110
7 111
8 1000

Since you’re here, the chances are you probably already understand how to count using binary, so I won’t bore you further. The interesting bit comes from how to represent non-integer numbers. Now, in order to have parts of an integer, we must have the notion of division somewhere along the way. A half is what you get by dividing a unit into two pieces, a third from three pieces, two and a half from dividing five units into two pieces, and \pi… well, we’ll get to \pi later.

Now, remember those pesky division problems you had to do in elementary school? Well, if you bet your teacher that you were never going to need to use those “put the numerator inside the house and ask the denominator to knock” methods, it’s probably a good time to pay up now. Recall that in order to divide 5 into 2, we would need to do something like this

\begin{array}{cc}  &\ 0.4 \\  5 &)\overline{2.0} \\  &\ \underline{2.0} \\  &\ \ \ 0  \end{array}

Similarly for binary, we do the exact same thing, except we convert 2 and 5 into their binary forms 0b10 and 0b101

\begin{array}{cc}  &\ \ 0.01 \\  101 &)\overline{10.00} \\  &\ \ \underline{1.01} \\  &\ \ \ \ \ .110  \end{array}

The attentive reader should point out at this point that since 101 divides into 110, we can keep on going

\begin{array}{cc}  &\ \ 0.011 \\  101 &)\overline{10.00} \\  &\ \ \underline{1.01} \\  &\ \ \ \ \ .110 \\  &\ \ \ \ \ .\underline{101} \\  &\ \ \ \ \ \ \ \ \ \ \ \ \ \ 1000 \\  &\ \ \ \ \ \ \ \ \ \ \ \ \ \ ...  \end{array}

Then using the binary point system for non-whole numbers, we find that \frac{2}{5} is 0.011001100110011… in binary, whereas it is just 0.4 in decimal. Why is 0.4 a repeating binary-point number? This is actually quite simple. Since 10 has prime factors 2 and 5, then in decimal, all numbers with denominators that are exact multiples of only 1, 2 and 5 are finite decimal points, whereas in base 2, only numbers with denominators that are exact multiples of 2 are finite binary points. Since \frac{2}{5} has denominator 5, it is finite in base 10 but repeating in base 2. (Don’t worry if this is confusing, you won’t really need this in practice.

Alright, now we explain where the magical “floating” point system came from. First, imagine that you have a computer with infinite amount of memory; stop, because those don’t exist. So then, if we need potentially infinitely many digits in order to represent \frac{1}{3} in decimal, how do we represent all of those digits? Now some of you may propose to store all numbers as a division between the numerator and division, and by presuming integers to be precise to some finite precision, so will the point-numbers. This works great for numbers that can be expressed as a fraction, but not all numbers are rational. How do we do this for \pi? (*Actually, this is a fallacious argument often made to highlight why round-off in the fraction is still imprecise, however this problem also presents itself in floating point numbers. The real difference is that it takes much less computational power to work with the numbers, since comparing 2/5 and 4/10 should yield true, we need to be able to somehow normalize 2/5 and 4/10 to realize that they are indeed the same number, which could be quite expensive in bulk quantities)

Instead, the standard way of doing this is to compute the point-number to be precise to some number of digits, and then tack on an exponent. For example, in binary-point, we could round \frac{2}{5} to 56 digits of precision as 0.0110011001100110011001100110011001100110011001100110011 and write this as 00110011001100110011001100110011001100110011001100110011e-55 or 0.0110011001100110011001100110011001100110011001100110011e1. Notice how the exponent changes as the magnitude of the number changes, and hence it floats based on the order of magnitude of the number.

So, how do we represent \pi? Well, we refer to our previous idea of representing everything as a division of integers, and represent \pi as the fraction \frac{31415926535897932}{10^{16}} or \frac{1101111100111001001111001100101011101100100001101001100}{100011100001101111001001101111110000010000000000000000} and round the resulting number accordingly.

Alright, if you’ve made it this far, there’s probably one huge question that you’re dying to ask. What the hell is scientific computing? Have you ever hated math in grade school? I’ve memorized my share of multiplication tables and trig identities that I even convinced myself by the time I entered college that math sucks and I would never ever not even in a million years touch it ever again. Of course, I still had to do my core college math courses in the forms of linear algebra, multivariable calculus, differential equations, and more calculus and diffeq in mechanics, E&M, and waves and quantum. On the other hand, about English courses, do you know that there’s a difference between business English and normal English? Yes there is and for you to fully understand that topic, go to And just when I thought I had finally escaped it, it came back to me again in the form of stats, discrete math, and even the underpinning topic of theoretical CS, polynomial approximation algorithms for NP-Hard problem. During the course of undertaking what I imagined at first will be an extremely painful process, I found that I actually really really enjoyed math. And it finally clicked for me, I hated math in high school because there was absolutely no reason for me to learn it at the time (except for calculus, which I enjoyed if not only for the reason that it was new). But now, I understand why and when to use all these things that I have learned back in highschool.

Underlying the field of scientific computing is the ability to solve numerical problems to high precision, and to be able to understand when and why the algorithms we develop break down. (See Computational science) Why would you ever want to do this? Here’s a simple example. Where I go to school, we have a free newspaper stand right next to the dining halls. Every day, me and my roommate would flip open to the end of the OpEd section of the Daily Sun (our school newspaper) and do the sudoku. When we started this tradition, I have never actually sat through an entire sudoku before (attention span issues I guess, I still have those). Now, to ensure that I won’t be beaten, I could either keep on doing sudoku the old fashioned way and hope that I get better or I could attempt to develop an algorithm that can solve sudoku for me. Let’s assume that I chose the latter, I would probably google a bit online, and find some interesting forums telling me that sudoku is an instance of the exact covering problem, and I would have probably nodded my head knowingly and attempt to implement a solver for exact covering. Now, using dynamic programming, we can actually muster up quite a nice pseudopolynomial algorithm, but let’s assume that I was recently exposed to some interesting seminars on reducing known hard problems to integer programming, which can be approximated using some randomized linear programming… err, I seem to have gone completely offtopic. Anyways, the thing to take away is that even with scientific computing, it’s still impossible to solve sudoku in a reasonable amount of time. :)

Anyways, stick around for next time, when we learn how to analyze the effects of round-off errors during floating point arithmetic, and why subtraction came from hell.

Posted By Lee