Programming Math

Solving Equations with Newton

by James Taylor on Jun 22, 2011, last updated on Jul 2, 2011

How do we solve \(f(x) = 0\)?

This question is often answered in early mathematics courses with analytical methods such as the quadratic formula. But real equations are very hard to solve exactly. Fortunately, they can often be solved numerically, to any precision one wants.

To give an idea of how to do this, we have the following game below. The goal is to guess where the blue curve will cross the x-axis. Do this by dragging the point A around the circle until the line is lined up the way you want it. Then check your guess. For a little help from calculus, click on the tangent line.

There is a more advanced version of the game that allows for varying the function and starting point. It also allows for exploration of the Secant Method, which uses the previous guess to form a line. You can get an introduction to this version and what one can explore with it with the following videos:

The trick to this game is to set the line to fit the function at the given point as well as we can. The line’s intersection with the x-axis is our next guess. As we repeat, we generally end up with great accuracy very quickly. Notice that the tangent line at first takes it time to find the root, but once it takes to converging, it roughly doubles in accuracy with each iteration, which is just amazing.

One immediate question comes to mind. Why a line? The simple truth is that a line is easy to work with. But is it always appropriate? Not necessarily. Some initial points will behave poorly with a linear approximation. For those, we could use a quadratic function. But it is often easier and simpler to just pick another initial point if the first one does not work out.

Now that we understand the idea of the method, we can write a program to solve nonlinear equations. Our program will need to know how to select the right line, how to solve for its x-intersection, and then how to iterate until certain conditions are met.

The code clicks through to the live version at jsfiddle.net where you can experiment with it.

Selecting a Line

The line we want to use is a line that replicates the function as closely as possible at the point of interest. How do we do that?

In calculus, the line we want is the tangent line. And the derivative of the function is the slope of that line. So if we know calculus, we can compute the derivative of the function and evaluate it at the approximation point to obtain our slope. This is one way. This is Newton’s method.

But can we do this without the derivative rules of calculus? Yes, we can. What we want is for the line to have similar values to the function near the point of approximation. One way to do that is to make sure the line goes through at least two points of the function and it makes sense to make them close to each other. So choose two points on the function that are close to each other and near the point of approximation. The following code does exactly that. This is a finite difference version of Newton’s method.

//a function to compute the data for a line
var line = function(f, a, deltax) {
    var deltay = f(a + deltax) - f(a);
    return [a, f(a), deltax, deltay]
};

//testing the line function with f=x^2-2
l = line(function(x) {return x * x - 2;}, 2, 0.1);
console.log(l); //2,2,0.1,0.41000000000000014

The deltax variable controls how close the two points are, with the first point being \((a, f(a))\). We generally use \(1\times10^{-10}\) as that can generate a sufficiently high precision for our interests. It computes the change in the function values and then returns an array that contains the point on the function, the change in x and the change in y, all the ingredients needed for making a line.

Solving for a Zero in a Line

Given a line, we need to find its zero.

For the slope-point form, we have something of the form \(y = m (x-a) + c \) for the form of the line. Here \(m\) is the slope and \((a,c)\) is a point on the line. Solving the equation by setting \(y=0\), we see that we have \(x = -c/m+a\). Notice that if \(m\) is near zero then \(x\) is likely to be large.

For the point-point form, we replace the division by \(m = \frac{\Delta y}{\Delta x}\) with multiplication by \(\frac{\Delta x}{\Delta y}\). In particular, the formula we will use is \(x = (-c*\Delta x)/\Delta y + a\).

//a function for returning the zero
//apply the output of line to this
var xinter = function(a, c, deltax, deltay) {
    return (-c * deltax) / deltay + a;
};

//testing xinter with above line:
console.log(xinter.apply(null, l)); //1.5121951219512195

This is a direct implementation of the equation. Note the xinter.apply is being used to pass in the returned array from line into xinter as its parameters. Using .apply is not essential, but it seems like a nice technique for implementing mathematical formulas with very little code.

The Iteration

Now we can iterate. We start with a guess, fit a line to the curve, find the intersection of the line with the curve, and then use that as our new guess. The only question that remains is, when do we stop?

We need a notion of what we are seeking. If there is an \(x\) value that does solve this equation, most of the time we would like to be close to that \(x\) value. How do we do that? Well, a common practical method is to look at how close the successive guesses are. If they are within a certain distance, say the chosen precision, then we can hope that we are within that distance of the zero. We might also seek to have a function value for our guess that is very close to zero. Mathematically precise conditions for convergence can be used via Kantorovich’s theorem.

We will assume that we want both an \(x\) and a \(y\) convergence. Therefore, our algorithm will check both conditions and stop when both are achieved.

We will also put in conditions to stop the iteration to prevent it from getting out of hand. We will limit the number of times we will do the loop and also have boundaries for the \(x\)-locations at which the iteration will stop.

var rawNewton = function(guess, f, leftBound,
           rightBound, dx, minXdist, minYdist, exit) {
    var sequencePoints = [[guess, f(guess)]];
    var lineData, prevGuess, y;
    y = f(guess);
    //newton loop
    for (exit; exit > 0; exit -= 1) {
        lineData = line(f, guess, dx);
        prevGuess = guess;
        y = lineData[1];
        guess = xinter.apply(this, lineData);
        if ((guess < leftBound - dx) || (guess > rightBound + dx)) {
            return [false, "Out of Bounds:" + guess,
                       sequencePoints];
        }
        //add (a,f(a))
        sequencePoints.push([guess, f(guess)]);
        if ((Math.abs(guess - prevGuess)<minXdist) &&
            (Math.abs(y) < minYdist)) {
            return [true, guess, sequencePoints];
        }
    }
    //did not meet the required distances
    return [false, "Did not converge",sequencePoints];
};

//let's estimate the sqrt 2
var fun = function (x) {
    return x*x-2;
};

console.log(rawNewton(2, fun, -10, 10, 1e-2, 1e-15, 1e-15, 10));

In the function call, guess begins as the x-location to start at and then it stores the next guess, the previous one being stored in prevGuess to use as a condition with minXdist as to when we are close enough in the x space. f is our function. The leftBound and rightBound we use as our boundaries; if the guess goes beyond them, we terminate the loop. The parameter dx controls how close the second point should be to the first in defining the secant. The parameter minYdist determines when the y-coordinates are small enough to cause a break. Finally, exit is the number of times to do the loop before ending in case no convergence is in sight.

The rawNewton function returns an array whose first entry indicates whether we were successful or not, the second entry is either the last, best approximation or an error message, and, in all cases, the third entry is an array with the history of the guesses in the form of points [a, f(a)].

The code above when run to find the sqrt of 2 computes out 1.414213562373095 as its guess; this is correct for that number of digits which is the maximum JavaScript handles natively. Here is the sequence of guesses:

\(x\)	\(f(x)\)
2	2
1.501246882793004	0.25374220309571127
1.4170169315166223	0.007936984204783837
1.4142261826764102	0.00003569576749118397
1.4142136068911242	1.2591600295763783e-7
1.4142135625299364	4.4361403439552305e-10
1.4142135623736476	1.5627499294623703e-12
1.414213562373097	5.329070518200751e-15
1.4142135623730951	4.440892098500626e-16
1.414213562373095	-4.440892098500626e-16

The last three steps do not increase the accuracy all that much; this is a result purely of the computational environment, namely the number of digits that can be held in memory for a number.

As the last four parameters in the arguments of the function can generally be taken to be the same for most cases, we use a wrapper function called newton to hide away those parameters:

//a wrapper for list of defaults
var newton = function(g, f, lb, rb) {
    return rawNewton.call(this, g, f, lb, rb, 1e-10, 1e-10, 1e-10, 10);
};

Then we can run Newton’s method with a call such as

results = newton(1, function(x) {
    return Math.exp(x) - 10;
}, -10, 10);

This will yield the value of \(\ln(10) \approx 2.30259\).

Finding All Zeros in an Interval

The method above finds a single zero. What if we want all zeros? That is too ambitious for this post and is impossible in a general way though quite doable for, say, polynomial equations. Let’s be a bit more modest. What if we want all zeros in an interval that are at least a certain distance apart from each other?

The interval allows us to focus just on that interval and ignore the infinite reaches outside of the interval. We can do finite, not infinite. The distance apart tells us how to divide the interval up into samples. Again, without this, we would be dealing with the infinite set of real numbers in between any two real numbers, e.g., we have infinitely many zeros for \(\sin (1/x)\) near \(x=0\).

Armed with these parameters and the choice of precision as above, we can now write an algorithm to do this. Start at the left endpoint and add the minimal distance to it. That sets up the little interval in which we will check for a zero. We choose a random guess inside the little interval.

//searches for all roots between start and end,
//using a division increment of xDist
var search = function(f, start, end, xDist) {
    var guess, results, solutions = [];
    var leftx = start;
    while (leftx < end) {
        guess = leftx + Math.random() * xDist;
        results = rawNewton(guess, f, leftx - xDist, leftx + 2 * xDist,
                                          1e-10, 1e-10, 1e-10, 10);
        if (results[0]) {
            solutions.push(results[1]);
        }
        leftx += xDist;
    }
    return solutions;
};

Notice that we pass in as left and right bounds into rawNewton enough to cover the next little intervals. This is to allow for points near the boundary or on the boundary whose iteration might fail otherwise. We will filter the results to deal with repeats.

Even with this rather minimal setup, we need a function that cleans up our results. One problem is that if the solution is an integer, say 7, then it might be returned as 7.00000001. So we need a little function that rounds it off:

var round = function(current, xDist, toDigit) {
    var proposed = Math.round(current * toDigit) / toDigit;
    if (Math.abs(current - proposed) < xDist / 10) {
        return proposed;
    } else {
        return current;
    }
};

We use this rounding in another little function which also compares adjacent entries and discards numbers that are too close to each other. Just in case the method found roots in a non-ordered way, it sorts the array of points first.

var checkNear = function(points, xDist, toDigit) {
    points.sort(function(a, b) {
        return a - b;
    });
    var i, cur, n = points.length,
        prev = round(points[0], xDist, toDigit || 10),
        ret = [prev];
    for (i = 1; i < n; i += 1) {
        cur = round(points[i], xDist, toDigit || 10);
        if (Math.abs(cur - prev) > xDist) {
            ret.push(cur);
        }
        prev = cur;
    }
    return ret;
};

Trying this with \(\sin(\pi x)\) for the interval -10 to 10 yields 21 zeros, namely, each of the integers in that interval.

For the function \(\sin(1/x)\) between -2 and 2 with an interval length of 0.00001, this results in about 500 roots. There are, in fact, infinitely many roots as this function oscillates infinitely often around \(x=0\). We cannot expect to get them all. And one should not expect too much from this method in such highly oscillating functions.

Further Explorations

Does replacing the secant approximation of the tangent line with the tangent line itself improve the possible precision?

How does the Secant method (use previous guess for the second point of the secant) compare to Newton’s method?

What is limiting the precision?

Are there functions for which the derivative is sufficiently a mess that the secant approximation is either more efficient or more accurate?

In choosing the change in x for the secant, how much does that choice impact the number of steps and/or precision of the result and/or convergence of the method?

Can one have Newton’s method bouncing back and forth between different roots for many iterations?

How can we model finding roots if we start near a maxima or a minima? (Hint: Parabolas)

How do we find maxima or minima of a function? (Hint: Parabolas)

How do we find inflection points? (Hint: Cubics)

What happens if there are no zeros or that the function goes to zero at infinity?

Is there a way to be sure about the precision of our approximation?

How do we know how large an interval to search and how finely to search in it, at least with polynomials?

Are there other ways to fit lines that make more sense?

In solving \(f(x) = g(x) \) we could either solve \(f(x) – g(x) =0\) as above or using the linear approximations, find the intersection of the two generated lines and iterate. Is there any difference to these two methods?

Can we find the complex roots of polynomials? (Hint: Leads to the fractal Julia Sets)

Does this work in higher dimensions? When does it even make sense?