
Introduction to Scientific Computing: Error Propagation

The first part in a series surveying the design and analysis of various numerical methods, this post looks at error propagation.

There was recently a good article on scientific computing, defined loosely as the dark art (as it may seem to the uninitiated) of deriving solutions to equations, dynamical systems, or whatever else would have made your Mechanics professor scream in horror at the thought of having to somehow “solve” these systems. Of course, in the rich computing world of today, almost any problem imaginable (exaggeration, of course!) can already be solved by some existing tool. Hence, more and more, the focus shifts from “how do I solve this differential equation” to “what do I ask Google?”

My dad once told me of a glorious time “back in [his] days” when every respectable engineering institution would have a crash course on this dark art on top of Fortran. Of course, my dad is only 43, and that was only 19 years ago.

Even now, when computer science departments everywhere no longer believe in the necessity of forcing all of their graduates to have a basic grasp of numerical analysis, there is still some draw to the subject that makes people either deathly afraid of it or embrace it as their life's ambition. I am personally deathly afraid of it. Even so, there are quite a few cute gems in the field, and as such, I am still very much attracted to it.

Scientific computing is the all-encompassing field involving the design and analysis of numerical methods. I intend to start a survey of some of the basic (but also most useful) tools: methods that solve linear and nonlinear systems of equations, interpolate data, compute integrals, and solve differential equations. We will often do this for problems that have no “analytical” solution (in terms of the common transcendental functions that we’re all used to).

1. Safety First

In an ideal world, there would be a direct correspondence between numerical algorithms and their implementations. Everything would work out of the box, and there would be no need to worry that, even if you’ve implemented the on-paper algorithm correctly, it would somehow behave “differently”.

Of course, this isn’t the case. We’ve all heard the age-old saying that computers are finitary, and therefore they cannot represent all real numbers: there is no way to represent the irrationals, and in most cases we do not fully represent the rationals either. Instead, we round numbers to a certain number of digits.

On the surface, this doesn’t seem too unfortunate. You probably did the same in your Chemistry lab report, to even more horrendous precision than what your computer will likely do for you. If you aced your Chemistry lab, then this will likely seem like a perfectly good scheme. But if you, like me, were troubled by the “sig fig” rules, then you are probably a little wary.

In fact, more often than not, you will not be bothered by the lack of a full spectrum of real numbers to choose from. When these nasty little “roundoff” errors are the culprit, however, they are often resolved only through hours upon hours of debugging and a general sense of hopelessness. To inherit a roundoff bug from someone else is like contracting the Spanish flu: either you know what you’re doing and your system (with a little bit of agonizing) successfully resolves it, or you just won’t have any idea what to do (and you die, in the Spanish flu metaphor). Think of this article as a vaccination against roundoff bugs :)

2. A Modern Day Little Gauss Story

Suppose little Gauss lived in the modern age. Little Gauss’ teacher wanted to surf the internet, so he assigned all of his students the following integral to evaluate:

    \[\int_0^1 x^{100} e^{x-1} dx\]

Being the clever alter-ego of the boy who immediately saw \sum_{k=1}^n k = {n+1 \choose 2}, little Gauss constructed a sequence

    \[s_k = \int_0^1 x^k e^{x-1} dx\]

and with a little bit of clever manipulation (integration by parts), he found that, using u_k = x^k, dv = e^{x-1}dx, v = e^{x-1}

    \[s_k = \int_0^1 u_kdv = \left.u_kv\right|_0^1 - \int_0^1 vdu_k\]

    \[= \left. x^ke^{x-1}\right|_0^1 - k\int_0^1x^{k-1}e^{x-1}dx\]

    \[= \left. x^ke^{x-1}\right|_0^1 - ks_{k-1}\]

    \[= 1 - ks_{k-1}\]

and

    \[s_0 = \int_0^1 e^{x-1}dx = 1 - 1/e\]

Why, this is a simple recursive function! Little Gauss was absolutely thrilled: he had at his disposal a programmable calculator capable of running Python (because he’s Gauss, he can have whatever the fuck he wants), and he quickly coded up the recurrence:

Python Code

import math
def s(k):
    if k == 0:
        return 1 - math.exp(-1)   # s_0 = 1 - 1/e
    else:
        return 1 - k*s(k-1)       # s_k = 1 - k*s_{k-1}

He verified the first few cases by hand, and upon finding his code satisfactory, he computed the 100th element of the sequence as `-1.1599285429663592e+141` and turned in the first few digits.

His teacher glanced at his solution, and knowing that there’s no way little Gauss could have done his school work with such proficiency, immediately declared it wrong. Upon repeated appeal, the teacher finally relented, looked up the solution in his solution manual and, bewildered… again told little Gauss that he was WAAAAY off. Unfortunately for our tragic hero, this would not be a modern day retelling of a clever little boy who outsmarted his teacher. No, this is a tragic story of a clever little boy who succumbed to a fatal case of the roundoff bugs.

At a Brief Glance

In fact, had his teacher been a little bit more clever, he would have been able to immediately see why Gauss’ solution wouldn’t have worked. It’s immediately obvious that between x = 0 and x = 1, x^{100}e^{x-1} is always positive.

Furthermore, if a function f(x) is positive inside an interval, and g(x) is also positive inside the same interval but is everywhere smaller than f(x), then obviously the area under f(x) must be bigger than that under g(x). Similarly, if h(x) is everywhere larger than f(x), then the area under h(x) must also be larger than that under f(x). Now let f(x) = x^{100}e^{x-1}; because e^{-1} \le e^{x-1} \le 1 for 0 \le x \le 1, we have \frac{x^{100}}{e} \le x^{100}e^{x-1} \le x^{100} inside 0 \le x \le 1, and

    \[\int_0^1 \frac{x^{100}}{e} dx \le \int_0^1 x^{100}e^{x-1} dx \le \int_0^1 x^{100} dx\]

or

    \[\frac{1}{e(100 + 1)} \le s_{100} \le \frac{1}{100 + 1}\]

and in general

    \[\frac{1}{e(k+1)} \le s_k \le \frac{1}{k+1}\]
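
Numerically, the k = 100 case of this bound already pins s_{100} into a narrow positive band:

    \[0.0036 \approx \frac{1}{101e} \le s_{100} \le \frac{1}{101} \approx 0.0099\]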

Of course, s_{100} can’t be on the order of -10^{141}!

So what went wrong? Well, whenever you’re in trouble, just make a plot!
[Plot: Little Gauss' Sequence, the computed values of \hat s_k against k]

Hmm. Everything seems to be going fine until around k=17.

It turns out that there was nothing wrong with little Gauss’ method, and the integral is perfectly well-behaved. The culprit lies in the fact that s_0 can only be represented approximately on his calculator. As we already know, s_0 = 1 - e^{-1} is irrational and cannot be represented in a finite amount of memory. s_0 is only stored approximately, slightly perturbed, so that we’re actually giving the computer an initial \hat s_0 = s_0 + \epsilon, where \epsilon is that small perturbation (think of it as a really, really, really tiny number). Now, let’s see what Gauss’ calculator is computing once we unravel the recursion (we’ll use the notation \hat s_k to mean the calculated value of s_k on the calculator):

    \begin{align*} \hat s_0 &= s_0 + \epsilon \\ \hat s_1 &= 1 - \hat s_0 = s_1 - \epsilon \\ \hat s_2 &= 1 - 2\hat s_1 = s_2 + 2\epsilon \\ \hat s_3 &= 1 - 3\hat s_2 = s_3 - 6\epsilon \\ &\vdots\\ \hat s_{19} &= 1 - 19\hat s_{18} = s_{19} - 19!\epsilon \\ \hat s_k &= s_{k} \pm k!\epsilon \end{align*}

Oh god! No wonder the method produced the wrong answer: the slight perturbation in the computed value of \hat s_0 “propagates” throughout the computation and, at the k^{th} step, manifests itself as k-factorial times the original perturbation \epsilon. Even at k=19, we see around 10^{17}\epsilon fudged into the calculation. For k=100, the answer carries an additional error on the order of 10^{158}\epsilon! Little Gauss definitely should have learned about round-off errors.
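
To see the blow-up concretely, here is a minimal sketch (my own, not little Gauss’ calculator session) that runs the same forward recurrence in double precision and compares it against the bounds derived above; the sample values of k are arbitrary:

Python Code

import math

def s_hat(k):
    # little Gauss' forward recurrence, run in double precision
    v = 1 - math.exp(-1)          # \hat s_0 = s_0 + eps (s_0 is irrational)
    for i in range(1, k + 1):
        v = 1 - i * v             # each step scales the inherited error by i
    return v

# the true s_k must lie between 1/(e(k+1)) and 1/(k+1)
for k in (5, 10, 15, 17, 20, 25):
    lo, hi = 1 / (math.e * (k + 1)), 1 / (k + 1)
    print(k, s_hat(k), lo, hi)

By k = 17 or so, the computed value has already escaped the band, exactly as the plot shows.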

3. Some Basics – Errors

Before we dig into the floating point encoding underlying most modern computing platforms, let’s talk about errors. Suppose that we’re computing the value of something, and the true value of that computation is “stored” in a variable x, but our computation is inexact; we shall put the inexact result in a “hatted” version of the variable, that is, \hat{x}.

Since there’s this notion of inexactness, we can talk about the “error” of our computation. There are two kinds of errors:

Absolute Error
This is just the difference between the true value of the computation x and the inexact value \hat{x}. We shall define this as

    \[e_x = | x - \hat{x}|\]

note that e_x (e subscript x) can be taken to mean the error of the computation of x.
Relative Error
This is the ratio of the absolute error to the true value of the computation, or in other words

    \[\delta_x = \left| \frac{x - \hat{x}}{x} \right|\]

we read \delta_x to mean the relative error in the computation of x.

First, we can’t actually compute the absolute or relative errors, because if we could, then we would already know the true value of the computation! Errors are purely analytic objects that help us determine how well-behaved our computations are. Second, in the analysis of floating point roundoff, we will typically use relative error exclusively. This may seem counter-intuitive, but it has a few nice properties that simplify error analysis, as we will see.

One more thing to add: if we allow \delta_x to have any sign, then through some simple algebra, we find that

    \begin{align*} \delta_x &= \frac{\hat x - x}{x} \\ x\delta_x &= \hat x - x \\ \hat x &= x + x\delta_x  \\ &= x(1 + \delta_x) \end{align*}
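
As a tiny sanity check of these definitions, here is a sketch; the value 1/3 and its four-digit approximation are made up purely for illustration:

Python Code

x = 1 / 3
x_hat = 0.3333                        # a four-decimal-digit approximation of x

e_x = abs(x - x_hat)                  # absolute error
delta_x = abs((x - x_hat) / x)        # relative error
print(e_x, delta_x)                   # ~3.3e-05 and ~1e-04

delta_signed = (x_hat - x) / x        # signed relative error
print(x * (1 + delta_signed))         # recovers x_hat (up to roundoff)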

Error Propagation

Suppose that, through some series of computations, we have the computed values \hat x and \hat y. Now, in the next instruction, we wish to compute the value of x + y. Because we already have the (potentially inexact) computed values \hat x and \hat y, it seems natural that to compute \widehat{x + y}, we would just add \hat x + \hat{y}:

    \[\widehat{x + y} = \hat{x} + \hat{y}\]

Now, suppose that \hat{x} = x(1 + \delta_x) and similarly for \hat{y}, then it seems that

    \[\widehat{x+y} = (x + y)(1 + \delta_{x+y})  = x(1 + \delta_x) + y(1 + \delta_y)\]

now, if we were doing error analysis, then we would want the relative error of \widehat{x + y}, which from our earlier notation we already called \delta_{x+y}. To find this, we merely need to solve the above equation for \delta_{x+y}:

    \[(x + y)(1 + \delta_{x+y})  = x(1 + \delta_x) + y(1 + \delta_y)\]

distributing both sides, we get

    \[x + y + x\delta_{x+y} + y\delta_{x+y} = x + x\delta_x + y + y\delta_y\]

canceling the x+y from both sides, we end up with

    \[\delta_{x+y}(x+y) = x\delta_x + y\delta_y\]

which gives \delta_{x+y} = \frac{x\delta_x + y\delta_y}{x+y}.

Wow, what a mouthful. This defies intuition, as you would expect error to accumulate additively. Of course, this is true of the absolute errors:

    \[\widehat{x+y} = \hat{x} + \hat{y} = x + y + e_x + e_y = (x+y) + e_{x+y}\]

but this no longer holds when you consider the relative error.

So why use relative error at all for analysis? It turns out that absolute error, while convenient here, becomes less tractable when reasoning about roundoff. Whenever you do an addition in floating point, you accumulate a small bit of absolute error from that operation itself, and the size of that error depends on the value being computed, much as relative error does; so sooner or later, you’re going to need to start reasoning about relative error anyway. Worry not, we will develop a systematic formula for reasoning about the propagation of relative error that boils down to high school level algebra (and some calculus).
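
Before moving on, here is a quick numerical check of the addition rule; the values of x, y and the signed perturbations below are arbitrary, chosen only to exercise the formula:

Python Code

x, y = 3.0, 4.0
dx, dy = 1e-5, -2e-5                  # signed relative errors in x and y
x_hat, y_hat = x * (1 + dx), y * (1 + dy)

observed = ((x_hat + y_hat) - (x + y)) / (x + y)
predicted = (x * dx + y * dy) / (x + y)
print(observed, predicted)            # the two agree up to roundoff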

Now, we’ve done addition. What about subtraction? You should easily verify for yourself that

    \[\widehat{x-y} = (x-y)(1 + \delta_{x-y})\]

where the relative error \delta_{x-y} is defined as

    \[\delta_{x-y} = \frac{x\delta_x - y\delta_y}{x-y}\]

Let’s now derive the propagated relative error of multiplication:

    \[\widehat{x \times y} = (x\times y)(1+\delta_{x\times y}) = \hat{x} \times \hat{y} = x(1+\delta_x)\times y(1+\delta_y)\]

again, solving for \delta_{x\times y} boils down to solving the equation above

    \[(x\times y)(1+\delta_{x\times y}) = \hat{x} \times \hat{y} = x(1+\delta_x)\times y(1+\delta_y)\]

assuming that neither x nor y is 0, we can divide out the x\times y to get

    \[1+\delta_{x\times y} = (1+\delta_x)(1+\delta_y) = 1 + \delta_x + \delta_y + \delta_x\delta_y\]

which tells us that

    \[\delta_{x\times y} = \delta_x + \delta_y + \delta_x\delta_y\]

It is traditional to assume that the relative errors of your computations are small (\delta \ll 1), for otherwise there’s no point in computing the value at all: you can’t even get one digit correct! So if \delta_x is tiny and \delta_y is tiny, then \delta_x\delta_y is insignificant in comparison, and we typically discard such “higher order” \delta terms. In effect, we just say that

    \[\delta_{x\times y} \approx \delta_x + \delta_y\]

I’ll leave it to you to verify that division propagates as

    \[\delta_{x/y} \approx \delta_x - \delta_y\]
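
The same kind of spot check works for the first order multiplication and division rules (again, the numbers are arbitrary):

Python Code

x, y = 2.0, 5.0
dx, dy = 3e-6, -7e-6                  # signed relative errors
x_hat, y_hat = x * (1 + dx), y * (1 + dy)

mul_observed = (x_hat * y_hat - x * y) / (x * y)
div_observed = (x_hat / y_hat - x / y) / (x / y)
print(mul_observed, dx + dy)          # differ only by the tiny dx*dy term
print(div_observed, dx - dy)          # likewise, up to second order terms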

Arbitrary Differentiable Function

The propagation schemes we’ve talked about so far are all fine and dandy, but what if we already have \hat x = x(1 + \delta_x) and we want to compute f(x) for some arbitrary but [twice] differentiable function f(\cdot)?

Well, let’s just call f with \hat x as the argument then!

    \[\widehat{f(x)} = f(\hat x)\]

now, we can do some algebra and get

    \[f(x)(1 + \delta_{f(x)}) = f(x + x\delta_x) = f(x + e_x)\]

but we can no longer use our typical algebraic tools to solve the above equation for \delta_{f(x)}, since f(\cdot) could be anything! Whatever will we do?

If you’ve had a little bit of calculus, then the section heading probably gives the answer away. We’ve already reasoned that we don’t care about second order error terms because they’re so insignificant. So we can express f(x + e_x) as a polynomial in the error term through a Taylor series centered around x!

    \[f(x + e_x) = f(x) + f'(x)e_x + O(e_x^2f''(\xi))\]

if f(\cdot) is twice differentiable, then f''(\xi) is bounded by some constant, and that second order term is O(e_x^2), which we can disregard (with some skepticism about how nonlinear/curvy f is in the neighborhood of x). Using the first order Taylor approximation as the right hand side, we can rewrite the above equation as

    \[f(x) + f(x)\delta_{f(x)} = f(x) + f'(x)(x\delta_x)\]

which, as long as f(x) \ne 0, gives

    \[\delta_{f(x)} =\frac{f'(x)}{f(x)}x\delta_x\]

Now, the f(x)\ne 0 restriction might seem a bit unfair, but recall that if f(x) = 0, then we won’t have any digits of accuracy to speak of, so in effect the relative error is indeed \infty.
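
And here is a quick check of the function rule, taking f = exp so that f' = f (the particular x and \delta_x are arbitrary):

Python Code

import math

f, fprime = math.exp, math.exp        # f and its derivative
x, dx = 0.7, 1e-6
x_hat = x * (1 + dx)

observed = (f(x_hat) - f(x)) / f(x)
predicted = fprime(x) / f(x) * x * dx
print(observed, predicted)            # agree to first order in dx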

Table of Error Propagation

To summarize: suppose that we want to run the computation x \circ y, but we have inexact computed values \hat x and \hat y, with relative errors \delta_x and \delta_y respectively. Then the computation \widehat{x \circ y} will also be inexact, and its relative error will be

    \[\delta_{x + y} = \frac{x\delta_x + y\delta_y}{x + y}\]

    \[\delta_{x - y} = \frac{x\delta_x - y\delta_y}{x - y}\]

    \[\delta_{x \times y} \approx \delta_x + \delta_y\]

    \[\delta_{x /y} \approx \delta_x - \delta_y\]

    \[\delta_{f(x)} = \frac{f'(x)}{f(x)}x\delta_x\]
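
If you’d like to play with these rules numerically, here is a minimal sketch collecting the table as plain Python functions (the names are mine, not any standard API; the \times, / and f(x) rules are the first order approximations):

Python Code

def delta_add(x, dx, y, dy):
    return (x * dx + y * dy) / (x + y)

def delta_sub(x, dx, y, dy):
    return (x * dx - y * dy) / (x - y)

def delta_mul(dx, dy):
    return dx + dy                    # first order: the dx*dy term is dropped

def delta_div(dx, dy):
    return dx - dy                    # first order

def delta_fn(f, fprime, x, dx):
    return fprime(x) / f(x) * x * dx  # first order; requires f(x) != 0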

4. Example

Suppose that we’ve computed \hat x with relative error \delta_x and \hat y with no error. Furthermore, suppose that the true values are x = 1 and y = \frac{\pi}{2}. What will be the computed error of \cos(x + y)?

We’re looking to compute

    \begin{align*} \widehat{\cos(x+y)} &\approx \cos(\widehat{x+y})(1 + \delta_{\cos(x+y)})\\ &= \cos((x+y)(1 + \delta_{x+y}))(1 + \delta_{\cos(x+y)}) \end{align*}

Now, we need to figure out a few things:

1. \cos((x+y)(1 + \delta_{x+y}))

We can simplify this to \cos(x+y + e_{x+y}), but even then, we’re still going to take a first order Taylor expansion to get

    \[\cos(x+y + e_{x+y}) = \cos(x+y) - \sin(x+y)(x+y)\delta_{x+y}\]

Since we’re looking for the relative error, we need to factor a \cos(x+y) out to get

    \[\cos(x+y)(1 - \tan(x+y)(x+y)\delta_{x+y})\]

Now, we’re just left with

    \[\cos(x+y)\left((1 - \tan(x+y)(x+y)\delta_{x+y})(1 + \delta_{\cos(x+y)})\right)\]

    \[\approx \cos(x+y)\left(1 - \tan(x+y)(x+y)\delta_{x+y} + \delta_{\cos(x+y)}\right)\]

2. \delta_{\cos(x+y)}

From our table of error propagation above, we see that

    \[\delta_{\cos(x+y)} \approx -\tan(x+y)(x+y)\delta_{x+y}\]

so we’re just left with

    \[\widehat{\cos(x+y)} \approx \cos(x+y)\left(1 - 2\tan(x+y)(x+y)\delta_{x+y} \right)\]

3. \delta_{x+y}

From our table of error propagation, we see that

    \[\delta_{x+y} = \frac{x\delta_x + y\times 0}{x + y} = \frac{x}{x+y} \delta_x\]

which finally simplifies our computed value as

    \begin{align*} \widehat{\cos(x+y)} &\approx \cos(x+y)\left(1 - 2\tan(x+y)(x+y) \frac{x}{x+y} \delta_x \right)\\ &= \cos(x+y)\left(1 - 2\tan(x+y)x \delta_x \right)\\ &= \cos(x+y)(1 + 2\cot(1)\delta_x) \end{align*}

so that the final relative error is \delta = 2\cot(1)\delta_x. Note that the final relative error isn’t just \delta_{\cos(x+y)}, because we need to also take into account the error of computing \widehat{x+y}. Also note that we are not working with \delta_{\cos(\widehat{x+y})} above. To see this more concretely, we are essentially looking for \delta_{t_2} in the following system

    \begin{align*} \hat t_1 &= \widehat{x+y} \\ \hat t_2 &= \cos(\hat{t_1}) \end{align*}

which gives the same solution 2\cot(1)\delta_x.

This concludes our brief overview of error propagation. We’ll start on IEEE floating point encoding of rational numbers and how to avoid errors in computer arithmetic next time.

Posted By Lee


  • Matt

    Thanks for the interesting writeup.

    Would you mind extending the arbitrary differentiable function case to multivariate scalar-valued functions? I tried to do so but when I applied it to f(x,y) = x+y, I got d(x+y) = (xdx + ydy)*||v||/(x+y) where v = (x,y) instead of your result of d(x+y) = (xdx + ydy)/(x+y).

    As far as I can see, this is because I defined dv = (v* – v)/||v|| in the multivariate case.

    Thanks again!
    Matt

    • http://phailed.me/ Phailure

      Hey Matt,

      Thanks for commenting. It turns out that you’re using a different metric to measure the relative error than what I am.

      Check out the derivation http://mathbin.net/188291
      which should get to the same expression for $\delta_{x+y}$.

  • Shammah

    This was a great article, but I have one little remark. Somewhere in the middle of the article you say “if we allow d_x to have any sign”, probably to ease the calculation. This assumption was then kept for the entirety of the article when it really shouldn’t have been. Let’s assume f(x) = cos x, then f'(x) = -sin x, which gives us an error d_f = -tan x * d_x. If we assume d_x to be positive, we would get a negative error, which makes no sense at all. If we assume d_x to be negative, which makes no sense as well, we get a positive error. Either way, introducing negative errors makes no sense, and thus all errors, absolute or relative, should be positive values. That means that the last crucial step which you neglect would be to take the absolute value of the error, d_x = |d_x|.