Levenberg–Marquardt algorithm

In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least-squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. Like other iterative optimization algorithms, the LMA finds only a local minimum, which is not necessarily the global minimum.

The algorithm was first published in 1944 by Kenneth Levenberg,[1] while working at the Frankford Army Arsenal. It was rediscovered in 1963 by Donald Marquardt,[2] who worked as a statistician at DuPont, and independently by Girard,[3] Wynne[4] and Morrison.[5] The LMA is used in many software applications for solving generic curve-fitting problems.

The problem

The primary application of the Levenberg–Marquardt algorithm is in the least-squares curve fitting problem: given a set of $m$ empirical pairs $(x_i, y_i)$ of independent and dependent variables, find the parameters $\boldsymbol\beta$ of the model curve $f(x, \boldsymbol\beta)$ so that the sum of the squares of the deviations $S(\boldsymbol\beta)$ is minimized:

$$\hat{\boldsymbol\beta} = \operatorname*{arg\,min}_{\boldsymbol\beta} S(\boldsymbol\beta) = \operatorname*{arg\,min}_{\boldsymbol\beta} \sum_{i=1}^{m} \bigl[y_i - f(x_i, \boldsymbol\beta)\bigr]^2 .$$

The solution

Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector $\boldsymbol\beta$. In cases with only one minimum, an uninformed standard guess like $\boldsymbol\beta^{\mathsf T} = (1, 1, \dots, 1)$ will work fine; in cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.

In each iteration step, the parameter vector $\boldsymbol\beta$ is replaced by a new estimate $\boldsymbol\beta + \boldsymbol\delta$. To determine $\boldsymbol\delta$, the function $f(x_i, \boldsymbol\beta + \boldsymbol\delta)$ is approximated by its linearization

$$f(x_i, \boldsymbol\beta + \boldsymbol\delta) \approx f(x_i, \boldsymbol\beta) + \mathbf J_i \boldsymbol\delta ,$$

where $\mathbf J_i = \partial f(x_i, \boldsymbol\beta) / \partial \boldsymbol\beta$ is the gradient of $f$ with respect to $\boldsymbol\beta$. Setting the derivative of the resulting approximation of $S(\boldsymbol\beta + \boldsymbol\delta)$ with respect to $\boldsymbol\delta$ to zero gives the Gauss–Newton equations

$$\bigl(\mathbf J^{\mathsf T}\mathbf J\bigr)\,\boldsymbol\delta = \mathbf J^{\mathsf T}\bigl[\mathbf y - \mathbf f(\boldsymbol\beta)\bigr],$$

where $\mathbf J$ is the Jacobian matrix whose $i$-th row equals $\mathbf J_i$, and $\mathbf f(\boldsymbol\beta)$ and $\mathbf y$ are vectors with $i$-th component $f(x_i, \boldsymbol\beta)$ and $y_i$ respectively. Levenberg's contribution is to replace this equation by a "damped version",

$$\bigl(\mathbf J^{\mathsf T}\mathbf J + \lambda \mathbf I\bigr)\,\boldsymbol\delta = \mathbf J^{\mathsf T}\bigl[\mathbf y - \mathbf f(\boldsymbol\beta)\bigr],$$

where $\mathbf I$ is the identity matrix. The left-hand side is an $n \times n$ square matrix (with $n$ the number of parameters) and the matrix-vector product on the right-hand side yields a vector of size $n$, so the increment $\boldsymbol\delta$ is obtained by solving a set of $n$ linear equations.

The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If the reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction. If either the length of the calculated step $\boldsymbol\delta$ or the reduction of the sum of squares from the latest parameter vector $\boldsymbol\beta + \boldsymbol\delta$ falls below predefined limits, iteration stops, and the last parameter vector $\boldsymbol\beta$ is considered to be the solution.

Fletcher in his 1971 paper A modified Marquardt subroutine for non-linear least squares simplified the form, replacing the identity matrix $\mathbf I$ with the diagonal matrix consisting of the diagonal elements of $\mathbf J^{\mathsf T}\mathbf J$:

$$\bigl[\mathbf J^{\mathsf T}\mathbf J + \lambda \operatorname{diag}\bigl(\mathbf J^{\mathsf T}\mathbf J\bigr)\bigr]\,\boldsymbol\delta = \mathbf J^{\mathsf T}\bigl[\mathbf y - \mathbf f(\boldsymbol\beta)\bigr].$$

This scales each component of the gradient according to the curvature, so that there is larger movement along directions where the gradient is smaller, which avoids slow convergence in those directions. A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems, as well as in ridge regression, an estimation technique in statistics.

Choice of damping parameter

Various more or less heuristic arguments have been put forward for the best choice for the damping parameter $\lambda$. The absolute values of any choice depend on how well-scaled the initial problem is. Marquardt recommended starting with a value $\lambda_0$ and a factor $\nu > 1$: one step from the starting point is computed with the damping factor $\lambda = \lambda_0$ and a second with $\lambda_0/\nu$. If both of these are worse than the initial point, then the damping is increased by successive multiplication by $\nu$ until a better point is found, with a new damping factor of $\lambda_0\nu^k$ for some $k$. If use of the damping factor $\lambda/\nu$ results in a reduction of the squared residual, then it is taken as the new value of $\lambda$ (and the new optimum location is taken as that obtained with this damping factor) and the process continues; if using $\lambda/\nu$ resulted in a worse residual, but using $\lambda$ resulted in a better one, then $\lambda$ is left unchanged and the new optimum is taken as the value obtained with $\lambda$ as the damping factor.

An effective strategy for the control of the damping parameter, called delayed gratification, consists of increasing the parameter by a small amount for each uphill step and decreasing it by a large amount for each downhill step. The idea behind this strategy is to avoid moving downhill too fast in the beginning of the optimization, which would restrict the steps available in later iterations and therefore slow down convergence.
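To make the iteration concrete, the following is a minimal NumPy sketch of the damped update described above. The function names, the stopping tests and the simple symmetric damping schedule (multiply or divide lambda by a fixed factor nu) are illustrative choices, not part of any particular library; a delayed-gratification schedule would instead increase lambda by a small factor and decrease it by a larger one.

```python
import numpy as np

def levenberg_marquardt(f, jac, y, beta0, lam0=1e-2, nu=2.0,
                        max_iter=100, tol=1e-8):
    """Minimize sum((y - f(beta))**2) over beta.

    f(beta)   -> model predictions, shape (m,)
    jac(beta) -> Jacobian of f at beta, shape (m, n)
    """
    beta = np.asarray(beta0, dtype=float)
    lam = lam0
    residual = y - f(beta)
    cost = residual @ residual

    for _ in range(max_iter):
        J = jac(beta)                # m x n Jacobian
        A = J.T @ J                  # n x n Gauss-Newton matrix
        g = J.T @ residual           # right-hand side J^T [y - f(beta)]

        # Damped normal equations (Fletcher's variant; use np.eye(len(beta))
        # instead of np.diag(np.diag(A)) for Levenberg's original identity).
        # Assumes the damped matrix is nonsingular.
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), g)

        new_beta = beta + delta
        new_residual = y - f(new_beta)
        new_cost = new_residual @ new_residual

        if new_cost < cost:
            # Downhill step: accept it and relax damping toward Gauss-Newton.
            step_norm = np.linalg.norm(delta)
            cost_reduction = cost - new_cost
            beta, residual, cost = new_beta, new_residual, new_cost
            lam /= nu
            # Stop when the step or the cost reduction becomes negligible.
            if step_norm < tol or cost_reduction < tol:
                break
        else:
            # Uphill step: reject it and increase damping toward gradient descent.
            lam *= nu
    return beta
```

A caller supplies f and jac as functions of the parameters only (closing over the data points $x_i$), with the Jacobian computed analytically or by finite differences.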
Geodesic acceleration

When interpreting the Levenberg–Marquardt step as the velocity $\boldsymbol v_k$ along a geodesic path in the parameter space, it is possible to improve the method by adding a second-order term that accounts for the acceleration $\boldsymbol a_k$ along the geodesic,

$$\boldsymbol\beta_{k+1} = \boldsymbol\beta_k + \boldsymbol v_k + \tfrac{1}{2}\boldsymbol a_k ,$$

where $\boldsymbol a_k$ is the solution of

$$\mathbf J_k \boldsymbol a_k = -\boldsymbol f_{vv} .$$

Since this geodesic acceleration term depends only on the directional derivative $\boldsymbol f_{vv}$ along the direction of the velocity, it does not require computing the full second-order derivative matrix, and the overhead in computing cost is small.[9] Since the second-order derivative can be a fairly complex expression, it can be convenient to replace it with a finite difference approximation,

$$f_{vv}^{\,i} \approx \frac{2}{h}\left(\frac{f_i(\boldsymbol\beta + h\boldsymbol v_k) - f_i(\boldsymbol\beta)}{h} - \mathbf J_i \boldsymbol v_k\right),$$

where $\mathbf f(\boldsymbol\beta)$ and $\mathbf J$ have already been computed by the algorithm, so only one additional function evaluation is needed.[8] Since the acceleration may point in the opposite direction to the velocity, an additional criterion on the acceleration is added in order to accept a step, to prevent it from stalling the method in case the damping is too small: the step is accepted only if

$$\frac{2\,\lVert \boldsymbol a_k \rVert}{\lVert \boldsymbol v_k \rVert} \le \alpha ,$$

where $\alpha$ is usually fixed to a value smaller than 1, with smaller values for harder problems.[8] The addition of a geodesic acceleration term can allow a significant increase in convergence speed, and it is especially useful when the algorithm is moving through narrow canyons in the landscape of the objective function, where the allowed steps are smaller and the higher accuracy due to the second-order term gives significant improvements.[8]
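The finite-difference variant of the acceleration term can be sketched as follows. This assumes the acceleration is obtained by solving the same damped normal equations used for the velocity (a common way to solve the least-squares system above); the helper names, the default step h and the default alpha are illustrative choices.

```python
import numpy as np

def geodesic_acceleration(f, beta, J, delta, A_damped, h=0.1):
    """Second-order (geodesic) correction to a Levenberg-Marquardt step.

    f        : model function, f(beta) -> predictions, shape (m,)
    beta     : current parameters, shape (n,)
    J        : Jacobian at beta, shape (m, n)
    delta    : the LM velocity step already computed, shape (n,)
    A_damped : the damped matrix (e.g. J^T J + lam * diag(J^T J)) used for delta
    h        : finite-difference step size (illustrative default)
    """
    # Directional second derivative f_vv along delta, using one extra
    # function evaluation: f_vv ~ (2/h) * ((f(beta + h*delta) - f(beta))/h - J @ delta)
    f0 = f(beta)
    f_vv = (2.0 / h) * ((f(beta + h * delta) - f0) / h - J @ delta)
    # Acceleration from the same damped normal equations, right-hand side -J^T f_vv.
    return np.linalg.solve(A_damped, -J.T @ f_vv)

def acceleration_ok(delta, a, alpha=0.75):
    """Acceptance criterion: 2*||a|| / ||v|| <= alpha."""
    return 2.0 * np.linalg.norm(a) / np.linalg.norm(delta) <= alpha
```

A full implementation would take the combined update $\boldsymbol\beta + \boldsymbol v + \tfrac12\boldsymbol a$ and accept it only when the cost decreases and the criterion above holds.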
Example

In this example, a model curve of the form $y = a\cos(bX) + b\sin(aX)$ is fitted to a set of sampled data points using the Levenberg–Marquardt algorithm implemented in GNU Octave as the leasqr function. The graphs below show progressively better fits obtained from different initial parameter guesses; only when the parameters in the last graph are chosen closest to the original are the curves fitting exactly. This equation is an example of very sensitive initial conditions for the Levenberg–Marquardt algorithm. One reason for this sensitivity is the existence of multiple minima: the function $\cos(\beta x)$ has minima at parameter value $\hat\beta$ and $\hat\beta + 2\pi n / x$.

Figure: fits of the model to the data for three different initial guesses (poor fit, better fit, best fit).
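As a parallel illustration of the same kind of fit (not the Octave leasqr call referenced above), SciPy's curve_fit can be used; with no parameter bounds it delegates to a MINPACK implementation of Levenberg–Marquardt. The "true" parameter values, noise level and initial guesses below are made-up demo choices, and a poor starting guess may simply fail to converge, which is the sensitivity discussed above.

```python
import numpy as np
from scipy.optimize import curve_fit

# Model of the form assumed in the example above: y = a*cos(b*x) + b*sin(a*x)
def model(x, a, b):
    return a * np.cos(b * x) + b * np.sin(a * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)               # sample grid (illustrative)
a_true, b_true = 100.0, 102.0                # illustrative "true" parameters
y = model(x, a_true, b_true) + rng.normal(scale=2.0, size=x.size)

# With no bounds, curve_fit defaults to MINPACK's Levenberg-Marquardt ('lm').
# The outcome depends strongly on the initial guess p0:
for p0 in ([50.0, 50.0], [100.0, 100.0]):    # far guess vs. close guess
    try:
        popt, _ = curve_fit(model, x, y, p0=p0, maxfev=5000)
        print(f"p0={p0} -> fitted a, b = {popt}")
    except RuntimeError as err:              # LM may not converge from a poor start
        print(f"p0={p0} -> no convergence: {err}")
```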