Newton's method: Difference between revisions

From Citizendium
Jump to navigation Jump to search
imported>Jitse Niesen
(→‎History: remove unnecessary links, use same notation as in the rest of the article)
imported>Caesar Schinas
m (Bot: Update image code)
 
(19 intermediate revisions by 5 users not shown)
Line 1: Line 1:
'''Newton's method''', also called the '''Newton-Raphson method''' or '''Newton iteration''', is a [[root-finding algorithm]]; that is, a method for finding where a function obtains the value zero.
{{subpages}}


==Description==
'''Newton's method''', also called the '''Newton-Raphson method''', is a numerical [[root-finding algorithm]]: a method for finding where a function obtains the value zero, or in other words, solving the equation <math>f(x) = 0</math>. Most root-finding algorithms used in practice are variations of Newton's method.


Newton's method is based on the insight that any smooth function ''f'' can be approximated locally by its tangent. If ''r'' is a root of ''f'', the tangent of ''f'' at any point close to ''r'' is a good approximation of ''f'', and hence the point where the tangent intercepts the ''x''-axis is a good approximation of ''r''. This suggests that if we know an approximation ''x''<sub>''k''</sub> for ''r'', we obtain an even better approximation ''x''<sub>''k''+1</sub> from the root of the tangent line that goes through (''x'', ''f''(''x'')). Expressing the equation for the tangent in the terms of <math>f'</math>, the derivative of ''f'', gives the solution
==Description of the algorithm==


:<math>x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.</math>
Suppose the function <math>f(x)</math> has a root at <math>x = r</math>. The idea behind Newton's method is that, if <math>f(x)</math> is a smooth function, its graph can be approximated around a point <math>x = x_0</math> by its [[tangent]] at <math>x_0</math>. If the approximation is good enough, the point where the tangent crosses the <math>x</math>-axis must lie close to <math>r</math>. This suggests that if we choose the point <math>x_0</math> to lie somewhere close to <math>r</math>, the point where the tangent crosses the <math>x</math>-axis &mdash; we may call that point <math>x_1</math> &mdash; will lie even closer to <math>r</math>. The idea is illustrated in the following diagram:
 
[[Image:Newton's method.png|center|]]
 
Now suppose that <math>x_1</math> really does lie closer to <math>r</math> than <math>x_0</math>. Then we can determine the tangent of <math>f(x)</math> at <math>x = x_1</math> to obtain a second point <math>x_2</math> that lies closer still. More generally, given <math>x_k</math>, we can obtain a better approximation <math>x_{k+1}</math>.


The calculation of an improved approximation with this formula is called a ''Newton step'' or ''Newton update''. Newton's method consists of repeating this step for ''k'' = 0, 1, 2, ... to obtain successively better estimates ''x''<sub>1</sub>, ''x''<sub>2</sub>, ''x''<sub>3</sub>, ... for ''r''. Provided that the initial guess ''x''<sub>0</sub> is sufficiently close to ''r'' and that ''f'' is sufficiently well-behaved, the estimates will converge rapidly to ''r'' as ''k'' goes to infinity.
Newton's method hinges on the fact that we can calculate the root <math>x_{k+1}</math> of a tangent line directly from its straight-line equation <math>y = ax+b</math>, and that we can calculate <math>a</math> and <math>b</math> in terms of the function value <math>f(x_k)</math> and the function's [[derivative]] at the same point, <math>f'(x_k)\,\!</math>. Put together, the relation between <math>x_k</math> and <math>x_{k+1}</math> can be expressed as


[[Image:Newton's method.png|frame|center|Illustration of a single step with Newton's method. The function ''f'' has a root at ''x'' = ''r''. We start with an initial guess ''x''<sub>0</sub> and calculate the tangent line of ''f'' in that point. The root of the tangent, ''x''<sub>1</sub>, is a better approximation of ''r'' than the initial guess.]]
:<math>x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.</math>


==Example: calculating the golden ratio==
This formula defines Newton's method. Using it to obtain an improved approximation is called to perform a ''Newton step'', ''Newton update'' or ''Newton iteration''. Newton's method consists of repeatedly using this formula to obtain a sequence of successively better estimates <math>x_1, x_2, x_3, ...</math> for <math>r</math>.


The [[golden ratio]] (&phi; &asymp; 1.618) is the largest root of the polynomial <math>f(x) = x^2 - x - 1</math>. To calculate this root, we can use the Newton iteration
Let us illustrate Newton's method with a concrete numerical example. The [[golden ratio]] (&phi; &asymp; 1.618) is the largest root of the polynomial <math>f(x) = x^2 - x - 1</math>; to calculate this root, we can use the Newton iteration


:<math>x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{{x_k}^2 - x_k - 1}{2x_k-1} =  \frac{x_k^2 + 1}{2x_k - 1}</math>
:<math>x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{{x_k}^2 - x_k - 1}{2x_k-1} =  \frac{x_k^2 + 1}{2x_k - 1}</math>


with the initial estimate ''x''<sub>0</sub> = 1. Using double-precision floating-point arithmetic, which allows a precision of roughly 16 digits, we obtain the following approximations:
with the initial estimate <math>x_0=1</math>. Using double-precision [[floating-point arithmetic]], which amounts to a precision of roughly 16 decimal digits, Newton's method produces the following sequence of approximations:


:''x''<sub>0</sub> = 1.0
:''x''<sub>0</sub> = 1.0
Line 29: Line 33:
:''x''<sub>8</sub> = 1.6180339887498949
:''x''<sub>8</sub> = 1.6180339887498949


Since the last update does not change the value, we can be reasonably sure to have obtained a value for the golden ratio that is correct to the available precision. It can be seen that each iteration roughly doubles the number of correct digits. We say that the rate of convergence is ''quadratic''. Newton's method typically &mdash; but not always &mdash; achieves such fast convergence.
Since the last iteration does not change the value, we can be reasonably sure to have obtained a value for the golden ratio that is correct to the precision used.


==Convergence analysis==
==Convergence analysis==


Suppose we have a function ''f'' with ''f''(''r'') = 0. To analyze the convergence properties of Newton's method, we can use the [[Taylor series]] of ''f'' around ''x''<sub>''k''</sub>. This gives
<!--[[Image:Newton's method sine.png|thumb|250px|Outcome of Newton's method applied to finding a root of the sine function. Initial values near 0 (blue) lead to 0 while initial values near <math>\pi</math> (green) lead to <math>\pi</math>. Values close to <math>\pi/2</math> may lead anywhere on the real line; the initial value <math>x_0 = 1.475</math> (red) depicted here leads to <math>-3\pi</math>.]] -->
 
In the description above, we relied on geometrical intuition to argue that Newton's method ought to produce a sequence of points <math>x_k</math> that [[convergence|converge]] to the root <math>r</math>: that is, using the language of [[limit (mathematics)|limit]]s, we should be able to expect that


:<math>f(r) = 0 = f(x_k) + f'(x_k)(r-x_k) + \frac{1}{2} f''(\xi_k)(r-x_k)^2</math>
:<math>\lim_{k\to\infty} |x_k - r| = 0.</math>


where <math>\xi_k \in [r, x_k]</math>. Assuming <math>f'(x_k) \neq 0</math>, dividing through gives
We will show that this indeed is the case, under suitable conditions. In fact, we will be able to give a formula for how quickly the <math>x_k</math>'s approach <math>r</math>.


:<math>0 = \frac{f(x_k)}{f'(x_k)} - x_k + r + \frac{1}{2} \frac{f''(\xi_k)}{f'(x_k)}(r-x_k)^2.</math>
There is an important catch, however. In the geometrical description, we made the assumption that <math>f(x)</math> closely resembles a straight line at least ''locally'' (close to the root), and this assumption is in fact essential to the functioning of Newton's method. Most functions do not ''globally'' resemble lines: in general, function graphs contain curvature, [[stationary point]]s, [[inflection point]]s, and even [[discontinuity|discontinuities]]. Does Newton's method work in these cases?


Moving terms to the left hand side and substituting the Newton update expression for ''x''<sub>''k''+1</sub> then gives
The answer is ''maybe''. If <math>f(x)</math> is ill-behaved or <math>x_0</math> lies very far from <math>r</math>, Newton's method may produce a sequence of numbers that jump back and forth erratically or even gradually move away from the root. If we are lucky, a value may eventually end up close to a root by chance. But if we are unlucky, Newton's method will fail to converge at all. For example, if Newton's method is used to solve the equation <math>x^3 - 5x = 0</math> with the initial guess <math>x_0 = 1</math>, it will produce the endlessly repeating sequence <math>1, -1, 1, -1, 1, ...</math>, as shown in the figure below:


:<math>x_{k+1} - r = \frac{1}{2} \frac{f''(\xi_k)}{f'(x_k)}(r-x_k)^2</math>
{{Image|Newton's method cycle.png|center|300px|}}


and a final division shows that
It may also happen that Newton's method converges to a root, but the "wrong" one. For example, if we attempt to calculate <math>\pi</math> as the smallest positive solution to <math>\sin(x) = 0</math>, we must choose an initial value close to 3.14. Newton's method started near 0 would converge to 0; if started near 6.28, it would converge to <math>2\pi</math>, and so on. If we start with <math>x_0</math> close to <math>\pi/2</math>, where the sine curve is horizontal, <math>x_1</math> may end up close to <math>n \pi</math> for some enormous <math>n</math>.


:<math>\frac{|x_{k+1}-r|}{|x_k-r|^2} = \frac{1}{2} \left| \frac{f''(\xi_k)}{f'(x_k)} \right|.</math>
But let us assume that the root <math>r</math> and the initial guess <math>x_0</math> both lie in a region where <math>f(x)</math> is well-behaved. In particular, we assume that <math>f(x)</math> is a repeatedly [[differentiable function|differentiable]] function in this region. Expanding <math>f(x)</math> in a [[Taylor series]] around <math>x = x_k</math> gives


Hence, the rate of convergence is at least quadratic. However, if <math>f'(r) = 0</math>, meaning that ''f'' has a double root, it can be shown that the convergence is only linear.
:<math>f(r) = 0 = f(x_k) + f'(x_k)(r-x_k) + \frac{1}{2} f''(\xi_k)(r-x_k)^2</math>


* ''More detail needed''
for some <math>\xi_k \in [r, x_k]</math>. Assuming <math>f'(x_k) \neq 0</math>, dividing through gives


==Examples of non-quadratic convergence==
:<math>0 = \frac{f(x_k)}{f'(x_k)} - x_k + r + \frac{1}{2} \frac{f''(\xi_k)}{f'(x_k)}(r-x_k)^2.</math>


* ''Insert some example of linear convergence and/or convergence failure here''
Moving terms to the left hand side and substituting the Newton update expression for <math>x_{k+1}</math> then gives


In rare cases, Newton's method may converge faster than quadratically. One such situation appears if we attempt to calculate &pi; as the smallest positive number ''x'' for which sin ''x'' = 0. This equation leads to the Newton iteration
:<math>|x_{k+1}-r| = \frac{1}{2} \left| \frac{f''(\xi_k)}{f'(x_k)} \right| |x_k-r|^2.</math>


:<math>x_{k+1} = x_k - \frac{\sin x_k}{\cos x_k} = x_k - \tan{x_k}.</math>
If the second derivative is bounded and the first derivative does not approach 0, the error at step <math>k+1</math> is hence roughly proportional to the square of the error at step <math>k</math>. Under these conditions, if for any <math>k</math> we have that <math>|x_k-r| < 1</math> and <math>|f''(\xi_n)/f'(\xi_n)|</math> is small enough for <math>n \ge k</math>, subsequent iterations will reduce the error  least quadratically (or at a ''second-order'' rate).


Knowing that &pi; is a little larger than 3, we may start with the initial value ''x''<sub>0</sub> = 3. Three iterations give a value of &pi; that is correct to full double precision:
In other words, if the error at step <math>k</math> is <math>10^{-ck}</math>, the error at the next step is roughly <math>(10^{-ck})^2 = 10^{-2ck}</math>: each step roughly ''doubles'' the number of correct digits. This fast rate of convergence is characteristic for Newton's method. Looking back at the example of calculating the golden ratio, we can verify that the rate of convergence is quadratic in practice: the number of correct digits after the decimal point is 0, 0, 1, 2, 5, 12, and thereafter the convergence is limited by the finite precision which we used to carry out the arithmetic.


:''x''<sub>0</sub> = 3.0
The second-order or quadratic convergence depends on <math>|f''(\xi_k)/f'(\xi_k)|</math> being small. There are two ways in which it may not be. The first possibility is that the second derivative in the numerator is large, which simply means that the function graph has a large amount of curvature and hence is poorly approximated by its tangent &mdash; we have already illustrated with examples how this may cause trouble.
:''x''<sub>1</sub> = 3.1425465430742778
:''x''<sub>2</sub> = 3.141592653300477
:''x''<sub>3</sub> = 3.1415926535897931


In this case, the convergence is ''cubic''; that is, every iteration roughly triples the number of correct digits.
The second possibility is that the first derivative in the denominator is very small; that is, if the tangent is nearly horizontal. In particular, the derivative becomes arbitrarily small if <math>f'(r) = 0</math>. This is an important case: it corresponds to <math>f(x)</math> having a [[multiple root]] at <math>x = r</math>. Newton's method can be proved to still converge if the root is a multiple root, but the rate of convergence in that case is merely ''linear'': each iteration roughly increments, rather than multiplies, the number of correct digits by a fixed amount.


It is important to note that the choice of initial value determined the outcome of this calculation. If we had started with a number too close to 0, the iteration would have converged to 0. More generally, if we had started with an approximation close to ''n''&pi; for some integer ''n'' other than ''n'' = 1, the iteration would have converged to ''n''&pi;. If we had started with a value ''x''<sub>0</sub> close to (''n'' + 1/2)&pi;, tan ''x''<sub>0</sub> would have been a very large number and ''x''<sub>1</sub> would have turned out far away from the origin.
==Applications==
So far we have only considered solving the equation <math>f(x) = 0</math> in the real numbers, but Newton's method can be applied or generalized to various more complicated problems. To begin with, it can of course be used to solve the more general equation <math>f(x) = g(x)</math> by rewriting it as <math>f(x) - g(x) = 0</math>. It can also be used to calculate [[inverse function]]s. If <math>f</math> and <math>f^{-1}</math> are inverse functions of each  other, calculating <math>x = f^{-1}(y)</math> amounts to finding an <math>x</math> that solves the equation


==Complex functions==
:<math>f(x) - y = 0.</math>


Leads to Newton fractals
In many cases, Newton's method is a good choice for solving this equation. For example, the well known and very efficient [[Babylonian method]] for calculating [[square root]]s is equivalent to using Newton's method to solve the equation <math>x^2 - y = 0</math> for <math>x</math> (<math>x^2</math> being the inverse function of <math>\sqrt x</math>).


==Newton's method as an optimization algorithm==
Newton's method also works in the [[complex number]]s; for example, it can be used to find complex roots of polynomials. Its behavior for complex [[analytic function]]s is similar to that for differentiable real functions; in particular, the rate of convergence is quadratic sufficiently close to a root. In the case of a polynomial with real coefficients, or any functions that only assumes real values for real input, the initial value must however be complex, or else the values generated by Newton's method never leave the real line. When Newton's method is applied to a polynomial with complex roots, the region of convergence around each root has a complicated boundary, which defines a [[fractal]] called a [[Newton fractal]].


Instead iterate
Newton's method can also be used as an [[optimization algorithm]], to determine where a function obtains a local minimum or maximum. This amounts to solving the equation <math>f'(x)=0\,\!</math>, which gives the Newton iteration


:<math>x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.</math>
:<math>x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.</math>


==Multidimensional version==
==Variations and hybrid methods==
 
''Gauss-Newton's method''
 
==Variations and practical implementation concerns==
 
===Damped, bracketed and hybrid methods===
 
The fact that Newton's method may converge slowly or fail to converge for functions with horizontal tangents, or when given poor initial estimates, makes it unsuitable in raw form as a general-purpose root-finding algorithm. It is therefore typically combined with some mechanism for detecting and correcting convergence failure.
 
* ''Bracketed'', hybrid


Newton's method can be combined with a slower but safer algorithm, such as the [[bisection algorithm]].
The fact that Newton's method may converge slowly or fail to converge for functions with horizontal tangents, or when given poor initial values, makes it unsuitable in raw form as a general-purpose root-finding algorithm. It is therefore typically combined with some mechanism for detecting and correcting convergence failure.


limit number of iterations
One way to detect failure is count the number of iterations and break if the number exceeds some specified limit. Another technique is ''bracketing'': the root is supplied along with known bounds for its location. Failure is signaled if the Newton iteration produces a value that lies outside the bounds.


* ''Damped''
[[Image:Newton's method damped.png|thumb|250px|Newton step with 50% damping.]]


===Quasi-Newton methods===
In case of failure, it may be possible to recover by switching to a slower but safer algorithm, such as the [[bisection algorithm]]. After taking a few steps with the safe method, one can switch back to Newton's method and try again. This is called a ''hybrid method''. Another option is the ''damped'' Newton's method, in which the derivative is multiplied by a damping factor <math>\alpha</math>, with <math>0 < \alpha < 1</math>.


Newton's method requires that the derivative of the object function be known, but in some situations the derivative or Jacobian may be unavailable or prohibitively expensive to calculate. The cost can be higher still when Newton's method is used as an optimization algorithm, in which case the second derivative or Hessian is also needed.
Newton's method requires that the derivative of the object function be known, but in some situations the derivative or Jacobian may be unavailable or prohibitively expensive to calculate. The cost can be higher still when Newton's method is used as an optimization algorithm, in which case the second derivative or Hessian is also needed.
Line 108: Line 102:
Modifications of Newton's method can also lead to more specialized algorithms, such as the [[Durand-Kerner method]] which is used to find simultaneous roots of a polynomial.
Modifications of Newton's method can also lead to more specialized algorithms, such as the [[Durand-Kerner method]] which is used to find simultaneous roots of a polynomial.


==Calculating inverse functions==
==Computational complexity==
Using Newton's method as described above, the time complexity of finding a simple root of a function ''f'' with ''n''-digit precision is <math>O((\log n) F(n))</math> where ''F'' is the cost of calculating <math>f/f'</math> with ''n''-digit precision. If ''f'' can be evaluated with variable precision, the algorithm can be improved. Because of the "self-correcting" nature of Newton's method, it is only necessary to use ''m''-digit precision at a step where the approximation has ''m''-digit accuracy. Hence, the first iteration can be performed with a precision twice as high as the accuracy of ''x''<sub>0</sub>, the second iteration with a precision four times as high, and so on. If the precision levels are chosen suitably, only the final iteration requires ''f'' and its derivative to be evaluated at full ''n''-digit precision. Provided that ''F'' grows superlinearly, which is the case in practice, the cost of finding a root is thus only <math>O(F(n))</math>.
Using Newton's method as described above, the [[time complexity]] of calculating a root of a function <math>f(x)</math> with <math>n</math>-digit precision, provided that a good initial approximation is known, is <math>O((\log n) F(n))</math> where <math>F(n)</math> is the cost of calculating <math>f(x)/f'(x)\,</math> with <math>n</math>-digit precision.
 
Root-finding is essentially the same thing as calculating an [[inverse function]]. Hence, due to the aforementioned result, two differentiable functions that are inverse functions of each other have equivalent computational complexity and can be calculated in terms of each other in practice using Newton's method. In particular, it can be shown that:


* The exponential function, the natural logarithm, the trigonometric functions, and the inverse trigonometric functions, are all equivalent
If <math>f(x)</math> can be evaluated with variable precision, the algorithm can be improved. Because of the "self-correcting" nature of Newton's method, meaning that it is unaffected by small perturbations once it has reached the stage of quadratic convergence, it is only necessary to use <math>m</math>-digit precision at a step where the approximation has <math>m</math>-digit accuracy. Hence, the first iteration can be performed with a precision twice as high as the accuracy of <math>x_0</math>, the second iteration with a precision four times as high, and so on. If the precision levels are chosen suitably, only the final iteration requires <math>f(x)/f'(x)\,</math> to be evaluated at full <math>n</math>-digit precision. Provided that <math>F(n)</math> grows superlinearly, which is the case in practice, the cost of finding a root is therefore only <math>O(F(n))</math>, with a constant factor close to unity.
* The algebraic operations of multiplication, division, and square root extraction are equivalent


During the second half of the 20th century, very efficient algorithms were found for multiplying large numbers and calculating high-precision natural logarithms with the aid of computers. Combined with Newton's method, all of the functions listed above can be calculated with this efficiency.
This property of Newton's method makes it an essential tool for [[arbitrary-precision arithmetic]]. Division of large numbers is a good example. Multiplication of large numbers can be done efficiently using the [[fast Fourier transform]], but there is no corresponding fast algorithm for division. However, division is the inverse operation of multiplication, and there is a Newton iteration for finding this inverse that uses only multiplication and subtraction. Therefore, by means of Newton's method, division can be performed nearly as fast as multiplication (in practice, about three times slower). Likewise, the time cost for computing square roots (as the inverse of <math>x \cdot x</math>) with Newton's method is proportional to that of multiplication.


==History==
==History==
Line 127: Line 118:
* Michael T. Heath (2002), ''Scientific Computing: An Introductory Survey'', Second Edition, McGraw-Hill
* Michael T. Heath (2002), ''Scientific Computing: An Introductory Survey'', Second Edition, McGraw-Hill
* Jonathan M. Borwein & Peter B. Borwein (1987), ''Pi and the AGM: A Study in Analytic Number Theory and Computational Complexity'', Wiley Interscience
* Jonathan M. Borwein & Peter B. Borwein (1987), ''Pi and the AGM: A Study in Analytic Number Theory and Computational Complexity'', Wiley Interscience
* Tjalling J. Ypma (1995), Historical development of the Newton-Raphson method, ''SIAM Review'' '''37''' (4), 531&ndash;551.
* Tjalling J. Ypma (1995), Historical development of the Newton-Raphson method, ''SIAM Review'' '''37''' (4), 531&ndash;551. [http://links.jstor.org/sici?sici=0036-1445(199512)37%3A4%3C531%3AHDOTNM%3E2.0.CO%3B2-4 online at JSTOR]
 
[[Category:CZ Live]]
[[Category:Mathematics Workgroup]]

Latest revision as of 11:40, 11 June 2009

This article is developing and not approved.
Main Article
Discussion
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and subject to a disclaimer.

Newton's method, also called the Newton-Raphson method, is a numerical root-finding algorithm: a method for finding where a function obtains the value zero, or in other words, solving the equation . Most root-finding algorithms used in practice are variations of Newton's method.

Description of the algorithm

Suppose the function has a root at . The idea behind Newton's method is that, if is a smooth function, its graph can be approximated around a point by its tangent at . If the approximation is good enough, the point where the tangent crosses the -axis must lie close to . This suggests that if we choose the point to lie somewhere close to , the point where the tangent crosses the -axis — we may call that point — will lie even closer to . The idea is illustrated in the following diagram:

Newton's method.png

Now suppose that really does lie closer to than . Then we can determine the tangent of at to obtain a second point that lies closer still. More generally, given , we can obtain a better approximation .

Newton's method hinges on the fact that we can calculate the root of a tangent line directly from its straight-line equation , and that we can calculate and in terms of the function value and the function's derivative at the same point, . Put together, the relation between and can be expressed as

This formula defines Newton's method. Using it to obtain an improved approximation is called to perform a Newton step, Newton update or Newton iteration. Newton's method consists of repeatedly using this formula to obtain a sequence of successively better estimates for .

Let us illustrate Newton's method with a concrete numerical example. The golden ratio (φ ≈ 1.618) is the largest root of the polynomial ; to calculate this root, we can use the Newton iteration

with the initial estimate . Using double-precision floating-point arithmetic, which amounts to a precision of roughly 16 decimal digits, Newton's method produces the following sequence of approximations:

x0 = 1.0
x1 = 2.0
x2 = 1.6666666666666667
x3 = 1.6190476190476191
x4 = 1.6180344478216817
x5 = 1.618033988749989
x6 = 1.6180339887498947
x7 = 1.6180339887498949
x8 = 1.6180339887498949

Since the last iteration does not change the value, we can be reasonably sure to have obtained a value for the golden ratio that is correct to the precision used.

Convergence analysis

In the description above, we relied on geometrical intuition to argue that Newton's method ought to produce a sequence of points that converge to the root : that is, using the language of limits, we should be able to expect that

We will show that this indeed is the case, under suitable conditions. In fact, we will be able to give a formula for how quickly the 's approach .

There is an important catch, however. In the geometrical description, we made the assumption that closely resembles a straight line at least locally (close to the root), and this assumption is in fact essential to the functioning of Newton's method. Most functions do not globally resemble lines: in general, function graphs contain curvature, stationary points, inflection points, and even discontinuities. Does Newton's method work in these cases?

The answer is maybe. If is ill-behaved or lies very far from , Newton's method may produce a sequence of numbers that jump back and forth erratically or even gradually move away from the root. If we are lucky, a value may eventually end up close to a root by chance. But if we are unlucky, Newton's method will fail to converge at all. For example, if Newton's method is used to solve the equation with the initial guess , it will produce the endlessly repeating sequence , as shown in the figure below:

Newton's method cycle.png

It may also happen that Newton's method converges to a root, but the "wrong" one. For example, if we attempt to calculate as the smallest positive solution to , we must choose an initial value close to 3.14. Newton's method started near 0 would converge to 0; if started near 6.28, it would converge to , and so on. If we start with close to , where the sine curve is horizontal, may end up close to for some enormous .

But let us assume that the root and the initial guess both lie in a region where is well-behaved. In particular, we assume that is a repeatedly differentiable function in this region. Expanding in a Taylor series around gives

for some . Assuming , dividing through gives

Moving terms to the left hand side and substituting the Newton update expression for then gives

If the second derivative is bounded and the first derivative does not approach 0, the error at step is hence roughly proportional to the square of the error at step . Under these conditions, if for any we have that and is small enough for , subsequent iterations will reduce the error least quadratically (or at a second-order rate).

In other words, if the error at step is , the error at the next step is roughly : each step roughly doubles the number of correct digits. This fast rate of convergence is characteristic for Newton's method. Looking back at the example of calculating the golden ratio, we can verify that the rate of convergence is quadratic in practice: the number of correct digits after the decimal point is 0, 0, 1, 2, 5, 12, and thereafter the convergence is limited by the finite precision which we used to carry out the arithmetic.

The second-order or quadratic convergence depends on being small. There are two ways in which it may not be. The first possibility is that the second derivative in the numerator is large, which simply means that the function graph has a large amount of curvature and hence is poorly approximated by its tangent — we have already illustrated with examples how this may cause trouble.

The second possibility is that the first derivative in the denominator is very small; that is, if the tangent is nearly horizontal. In particular, the derivative becomes arbitrarily small if . This is an important case: it corresponds to having a multiple root at . Newton's method can be proved to still converge if the root is a multiple root, but the rate of convergence in that case is merely linear: each iteration roughly increments, rather than multiplies, the number of correct digits by a fixed amount.

Applications

So far we have only considered solving the equation in the real numbers, but Newton's method can be applied or generalized to various more complicated problems. To begin with, it can of course be used to solve the more general equation by rewriting it as . It can also be used to calculate inverse functions. If and are inverse functions of each other, calculating amounts to finding an that solves the equation

In many cases, Newton's method is a good choice for solving this equation. For example, the well known and very efficient Babylonian method for calculating square roots is equivalent to using Newton's method to solve the equation for ( being the inverse function of ).

Newton's method also works in the complex numbers; for example, it can be used to find complex roots of polynomials. Its behavior for complex analytic functions is similar to that for differentiable real functions; in particular, the rate of convergence is quadratic sufficiently close to a root. In the case of a polynomial with real coefficients, or any functions that only assumes real values for real input, the initial value must however be complex, or else the values generated by Newton's method never leave the real line. When Newton's method is applied to a polynomial with complex roots, the region of convergence around each root has a complicated boundary, which defines a fractal called a Newton fractal.

Newton's method can also be used as an optimization algorithm, to determine where a function obtains a local minimum or maximum. This amounts to solving the equation , which gives the Newton iteration

Variations and hybrid methods

The fact that Newton's method may converge slowly or fail to converge for functions with horizontal tangents, or when given poor initial values, makes it unsuitable in raw form as a general-purpose root-finding algorithm. It is therefore typically combined with some mechanism for detecting and correcting convergence failure.

One way to detect failure is count the number of iterations and break if the number exceeds some specified limit. Another technique is bracketing: the root is supplied along with known bounds for its location. Failure is signaled if the Newton iteration produces a value that lies outside the bounds.

Newton step with 50% damping.

In case of failure, it may be possible to recover by switching to a slower but safer algorithm, such as the bisection algorithm. After taking a few steps with the safe method, one can switch back to Newton's method and try again. This is called a hybrid method. Another option is the damped Newton's method, in which the derivative is multiplied by a damping factor , with .

Newton's method requires that the derivative of the object function be known, but in some situations the derivative or Jacobian may be unavailable or prohibitively expensive to calculate. The cost can be higher still when Newton's method is used as an optimization algorithm, in which case the second derivative or Hessian is also needed.

An alternative in these situations is to use an approximation of the derivative or second-derivative, which leads to so-called quasi-Newton methods. The most common strategy is to use the function values from two successive iterations to calculate a finite difference approximation for the derivative. This is equivalent to the secant method and reduces the rate of convergence from 2 to 1.618 (more precisely, the golden ratio). An alternative is to compute a value for the derivative or second derivative accurately, but reusing the same value for several successive iterations.

Modifications of Newton's method can also lead to more specialized algorithms, such as the Durand-Kerner method which is used to find simultaneous roots of a polynomial.

Computational complexity

Using Newton's method as described above, the time complexity of calculating a root of a function with -digit precision, provided that a good initial approximation is known, is where is the cost of calculating with -digit precision.

If can be evaluated with variable precision, the algorithm can be improved. Because of the "self-correcting" nature of Newton's method, meaning that it is unaffected by small perturbations once it has reached the stage of quadratic convergence, it is only necessary to use -digit precision at a step where the approximation has -digit accuracy. Hence, the first iteration can be performed with a precision twice as high as the accuracy of , the second iteration with a precision four times as high, and so on. If the precision levels are chosen suitably, only the final iteration requires to be evaluated at full -digit precision. Provided that grows superlinearly, which is the case in practice, the cost of finding a root is therefore only , with a constant factor close to unity.

This property of Newton's method makes it an essential tool for arbitrary-precision arithmetic. Division of large numbers is a good example. Multiplication of large numbers can be done efficiently using the fast Fourier transform, but there is no corresponding fast algorithm for division. However, division is the inverse operation of multiplication, and there is a Newton iteration for finding this inverse that uses only multiplication and subtraction. Therefore, by means of Newton's method, division can be performed nearly as fast as multiplication (in practice, about three times slower). Likewise, the time cost for computing square roots (as the inverse of ) with Newton's method is proportional to that of multiplication.

History

Newton's method was described by Isaac Newton in De analysi per aequationes numero terminorum infinitas (written in 1669, published in 1711 by William Jones) and in De metodis fluxionum et serierum infinitarum (written in 1671, translated and published as Method of Fluxions in 1736 by John Colson). However, his description differs substantially from the modern description given above: Newton applies the method only to polynomials. He does not compute the successive approximations xk, but computes a sequence of polynomials and only at the end, he arrives at an approximation for the root r. Finally, Newton views the method as purely algebraic and fails to notice the connection with calculus. Isaac Newton probably derived his method from a similar but less precise method by François Viète. The essence of Viète's method can be found in the work of Sharaf al-Din al-Tusi.

Newton's method was first published in 1685 in A Treatise of Algebra both Historical and Practical by John Wallis. In 1690, Joseph Raphson published a simplified description in Analysis aequationum universalis. Raphson again viewed Newton's method purely as an algebraic method and restricted its use to polynomials, but he describes the method in terms of the successive approximations xk instead of the more complicated sequence of polynomials used by Newton. Finally, in 1740, Thomas Simpson described Newton's method as an iterative methods for solving general nonlinear equations using fluxional calculus, essentially giving the description above. In the same publication, Simpson also gives the generalization to systems of two equations and notes that Newton's method can be used for solving optimization problems by setting the gradient to zero.

References

  • Michael T. Heath (2002), Scientific Computing: An Introductory Survey, Second Edition, McGraw-Hill
  • Jonathan M. Borwein & Peter B. Borwein (1987), Pi and the AGM: A Study in Analytic Number Theory and Computational Complexity, Wiley Interscience
  • Tjalling J. Ypma (1995), Historical development of the Newton-Raphson method, SIAM Review 37 (4), 531–551. online at JSTOR