There was recently a good article on scientific computing, defined loosely as the dark art, as it may have seemed to the uninitiated, of deriving solutions to equations, dynamical systems, or what-not that would have made your Mechanics professor scream in horror at the thought that they will need to somehow “solve” these systems. Of course, in the rich computer world today, almost any problem imaginable (exaggeration of course!) can already be solved by some existing tool. Hence more and more, the focus gets shifted from “how do I solve this differential equation” to “what do I ask google?”

My dad once told me of a glorious time “back in [his] days” when every respectable engineering institution would have a crash course on this dark art on top of Fortran. Of course, my dad is only 43, and that was only 19 years ago.

Even now, when computer science departments everywhere no longer believes in the necessity in forcing all of their graduates to have a basic grasp on numerical analysis, there is still some draw in the subject that either makes people deathly afraid of it or embrace it as their life ambition. I am personally deathly afraid of it. Even then, there are quite many cute gems in the field, and as such, I am still very much so attracted to the field.

Scientific computing is the all encompassing field involving the design and analysis of numerical methods. I intend to start a survey of some of the basic (but also most useful) tools such as methods that: solve linear and nonlinear systems of equations, interpolate data, compute integrals, and solve differential equations. We will often do this on problems for which there exists no “analytical” solution (in terms of the common transcendental functions that we’re all used to).

### 1. Safety First

In an ideal world, there would be a direct correspondence between numerical algorithms their implementation. Everything would work out of the box and there would be no need to worry that, even if you’ve implemented the on-paper algorithm correctly, it would somehow behave “differently”.

Of course, this isn’t the case. We’ve all heard of the age old saying that computers are finitary, and therefore it cannot represent all real numbers, specifically, there’s no way to represent irrationals, and in most of the cases, we do not fully represent all rationals either. Instead, we round numbers to a certain digit.

On the surface, this doesn’t seem too unfortunate. You probably did the same in your Chemistry lab report, to even more horrendous precision than what your computer will likely do for you. If you aced your Chemistry lab, then this will likely seem like a perfectly good scheme. But if you, like me, were troubled by the “sig fig” rules, then you are probably a little wary.

In fact, more often than not, you will not be bothered by the lack of a full spectrum of real numbers to choose from. However, when these little nasty “roundoff” errors are the culprit, they are often resolved through hours upon hours of debugging and general sense of hopelessness. To inherit a roundoff bug from someone else is like contracting the spanish flu: either you know what you’re doing and your system (with a little bit of agonizing) successfully resolve it, or you just won’t have any idea what you’re going to do (and die in the spanish flu metaphor). Think of this article as a vaccination against the roundoff bugs :)

### 2. A Modern Day Little Gauss Story

Suppose little Gauss lived in the modern age. little Gauss’ teacher wanted to surf the internet, so he assigned all of his students the following integral to evaluate:

Being the clever alter-ego of the boy who immediately saw , little Gauss constructed a sequence

and with a little bit of clever manipulation (integration by parts), he found that, using

and

Why, this is a simple recursive function! Little Gauss was absolutely thrilled, he has at his disposal a programmable calculator capable of python (because he’s Gauss, he can have whatever the fuck he wants), and he quickly coded up the recurrence:

**Python Code**

import math def s(k): if k == 0: return 1 - math.exp(-1) else: return 1 - k*s(k-1)

He verified the first few cases by hand, and upon finding his code satisfactory, he computed the 100th element of the sequence as `-1.1599285429663592e+141` and turned in the first few digits.

His teacher glanced at his solution, and knowing that there’s no way little Gauss could have done his school work with such proficiency, immediately declared it wrong. Upon repeated appeal, the teach finally relented and looked up the solution in his solution manual and, bewildered… again told little Gauss that he was WAAAAY off. Unfortunately for our tragic hero, this would not be a modern day retelling of a clever little boy who outsmarted his teacher. No, this is a tragic story of a clever little boy who succumbed to a fatal case of the roundoff bugs.

#### At a Brief Glance

In fact, had his teacher been a little bit more clever, he would have been able to immediately see why Gauss’ solution wouldn’t have worked. It’s immediately obvious that between and , is always positive.

Furthermore, if a function is positive inside an interval, and suppose is also a positive in side the same interval but is everywhere smaller than , then obviously the area under must be bigger than that of . Similarly, if is everywhere larger than , then the area under must also be larger than that of . Now suppose , then because inside , then inside , and

or

and in general

of course can’t be on the order of !

So what went wrong? Well, whenever you’re in trouble, just make a plot!

Hmm. Everything seems to be going fine until around .

It turns out that there was nothing wrong with little Gauss’ method and the integral is perfectly well-behaved. The culprit lies in the fact that can only be represented approximately on his calculator. As we know already, is irrational, and cannot be represented in finite amount of memory. is only represented approximately, slightly perturbed so that to the computer, we’re actually giving them a initial for that small perturbation (think of it as a really really really tiny number). Now, let’s see what Gauss’ calculator is computing once we unravel the recursion (we’ll use the notation to mean the calculated value of on the calculator):

Oh god! No wonder the method produced the wrong answer, the slight perturbation in the computed value of “propagates” throughout the computation and at the step, manifests itself as -factorial times that original perturbation . Even at , we will see around fudged into the calculation. For , the answer will have an additional factor of ! Little Gauss definitely should have learned about round-off errors.

### 3. Some Basics – Errors

Before we dig into the floating point encoding underlying most modern computing platforms, let’s talk about errors. Suppose that we’re computing the value of something and the true value of that computation is “stored” in a variable , but our computation is inexact, and in the end, we shall put that inexact result in a “hatted” version of the known result, that is .

Since there’s this notion of inexactness, we can talk about the “error” of our computation. There are two kinds of errors:

**Absolute Error**

This is just the difference between the true value of the computation and the inexact value . We shall define this as

note that ( subscript ) can be taken to mean the error of the computation of .

**Relative Error**

This the the ratio of the absolute error to the true value of the computation, or in other words

we read to mean the relative error in the computation of .

First, we can’t compute the absolute or relative errors, because if we can, then we would have know the true value of the computation already! Errors are purely analytic objects that can help us determine how well-behaving our computations are. Second, in the analysis of floating point roundoff, we will typically exclusively use relative error. This may seem counter-intuitive, but it has a few nice properties that simplifies error analysis, as we will see.

One more thing to add, if we allow to have any sign, then through some simple algebra, we will find that

#### Error Propagation

Suppose that, through some series of computations, that we have the computed values of and . Now, in the next instruction, we wish to compute the value of . Because, we have already the (potentially inexact) computed values and , then it seems natural that to compute . we would just add :

Now, suppose that and similarly for , then it seems that

now, if we were doing error analysis, then we would want the relative error of , which from our earlier notation we already called . To find this, we merely need to solve the above equation for :

distributing both sides, we get

canceling the from both sides, we end up with

which gives .

Wow, what a mouthful. This defies intuition, as you would expect error to accumulate additively. Of course, this is true of the absolute errors:

but this no longer holds when you consider the relative error.

So why use relative error at all for analysis? It turns out that while convenient here, it becomes less tractable when reasoning about roundoff. Whenever you do an addition operation in floating point, you accumulate a small bit of absolute error from that operation itself! This absolute error unfortunately depends on the value of the computations itself much like relative error does, so sooner or later, you’re going to need to start reasoning about relative error. Worry not, we will develop a systematic formula for reasoning about the propagation of relative error that will boil down to high school level algebra (and some calculus).

Now, we’ve done addition. What about subtraction? You should easily verify for yourself that

where the relative error is defined as

Let’s now derive the propagated relative error of multiplication:

again, solving for boils down to solving the equation above

assuming that neither nor are 0, we can divide out the to get

which tells us that

It is traditional to assume that the relative errors of your computations are small (), for otherwise, there’s no point in computing a value if you can’t even get one digit correct! Therefore, if is tiny, and is tiny, then is insignificant in comparison. Therefore, we typically discard “higher order” terms. In effect, we just say that

I’ll leave it to your to verify that division propagates as

#### Arbitrary Differentiable Function

The propagation schemes we’ve talked about so far are all fine and dandy, but if we already have and we want to compute for some arbitrary but [twice] differentiable function ?

Well, let’s just call with as the argument then!

now, we can do some algebra and get

but we can no longer use our typical algebraic tools to solve the above equation for , since could be anything! Whatever will we do?

If you’ve had a little bit of calculus, then the section heading should probably give the answer away. We’ve already reasoned earlier that we don’t care about second order error terms because they’re so insignificant. But then, we can express as a polynomial in the error term through taylor series centered around !

if is twice differentiable, then is bounded to some constant, then that second order term tends to , which we can disregard (with some skepticism of how nonlinear/curvy is around the neighborhood of ). Using the first order taylor approximation as the right hand side, we can rewrite the above equation as

which, as long as , gives

Now, the restriction might seem a bit unfair, but recall that if , then we won’t have any digits of accuracy, so in effect, the relative error is in fact .

#### Table of Error Propagation

Summarized, suppose that we want to run the computation , but we have inexact computed values and each with relative error and respectively, then the computation will also be inexact, and its relative error will be

### 4. Example

Suppose that we’ve computed with relative error and with no error. Furthermore, suppose that the true value of and . What will be the computed error of ?

We’re looking to compute

Now, we need to figure out a few things:

#### 1.

We can simplify this to , but even then, we’re still going to take a first order taylor expansion to get

Since we’re looking for the relative error, we need to factor out a out to get

Now, we’re just left with

#### 2.

From our table of error propagation above, we see that

so we’re just left with

#### 3.

From our table of error propagation, we see that

which finally simplifies our computed value as

so that the final relative error is . Note that the final relative error isn’t just , because we need to also take into account the error of computing . Also note that we are not working with above. To see this more concretely, we are essentially looking for in the following system

which gives the same solution .

This concludes our brief overview of error propagation. We’ll start on IEEE floating point encoding of rational numbers and how to avoid errors in computer arithmetic next time.