Quasi-Newton method

In numerical analysis, a quasi-Newton method is an iterative numerical method used either to find zeroes or to find local maxima and minima of functions. It relies on a recurrence formula much like the one for Newton's method, except that it uses approximations of the derivatives of the functions in place of exact derivatives. Newton's method requires the Jacobian matrix of all partial derivatives of a multivariate function when used to search for zeros, or the Hessian matrix when used for finding extrema. Strictly speaking, any method that replaces the exact Jacobian with an approximation is a quasi-Newton method. More recently, quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). A minimal sketch of such a Jacobian-updating iteration for root finding is given after this introduction.

The search for a minimum or maximum of a scalar-valued function is closely related to the search for the zeroes of its gradient. Therefore, quasi-Newton methods can be readily applied to find extrema of a function: if $g$ is the gradient of $f$, then searching for the zeroes of the vector-valued function $g$ corresponds to the search for the extrema of the scalar-valued function $f$.

In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods for optimization are based on Newton's method for finding the stationary points of a function, that is, the points where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized. In quasi-Newton methods the Hessian does not have to be computed directly; it is updated by analyzing successive gradient vectors instead. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
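The root-finding variant can be illustrated with Broyden's "good" method, in which the exact Jacobian is replaced by an approximation refined with a rank-one update at every iteration. The following is a minimal sketch, assuming a smooth vector-valued function g; the helper fd_jacobian, the starting point, and the tolerances are illustrative choices rather than part of any particular library.

```python
import numpy as np

def fd_jacobian(g, x, eps=1e-7):
    """Forward-difference Jacobian, used here only to initialize the approximation."""
    gx = np.asarray(g(x), dtype=float)
    J = np.empty((gx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.asarray(g(xp), dtype=float) - gx) / eps
    return J

def broyden_good(g, x0, tol=1e-10, max_iter=100):
    """Find a zero of g by updating an approximate Jacobian J instead of
    recomputing the exact Jacobian at every iteration (the quasi-Newton idea)."""
    x = np.asarray(x0, dtype=float)
    J = fd_jacobian(g, x)
    gx = np.asarray(g(x), dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(gx) < tol:
            break
        dx = np.linalg.solve(J, -gx)          # quasi-Newton step: solve J dx = -g(x)
        x_new = x + dx
        g_new = np.asarray(g(x_new), dtype=float)
        # Broyden's rank-one update: the new J satisfies the secant condition
        # J_new dx = g_new - g(x) while changing J as little as possible.
        J += np.outer(g_new - gx - J @ dx, dx) / (dx @ dx)
        x, gx = x_new, g_new
    return x

# Example: intersection of the unit circle with the line x0 = x1.
root = broyden_good(lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]]),
                    x0=[1.0, 0.5])            # approaches (sqrt(2)/2, sqrt(2)/2)
```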
Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods.

The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian). Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly; quasi-Newton methods can instead maintain an estimate of the inverse directly.

As in Newton's method, one uses a second-order approximation to find the minimum of a function $f(x)$. The Taylor series of $f(x)$ around an iterate $x_k$ is

$f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^{\mathrm{T}} \Delta x + \tfrac{1}{2} \Delta x^{\mathrm{T}} B \, \Delta x,$

where $\nabla f$ is the gradient and $B$ an approximation to the Hessian matrix. The gradient of this approximation (with respect to $\Delta x$) is

$\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B \, \Delta x,$

and setting this gradient to zero (which is the goal of optimization) provides the Newton step:

$\Delta x = -B^{-1} \nabla f(x_k).$

The Hessian approximation $B$ is chosen to satisfy

$\nabla f(x_k + \Delta x) = \nabla f(x_k) + B \, \Delta x,$

which is called the secant equation (the Taylor series of the gradient itself). In one dimension, solving for $B$ and applying the Newton step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). An approximate initial value $B_0 = \beta I$ is often sufficient to achieve rapid convergence, although there is no general strategy to choose $\beta$. The unknown $x_k$ is updated by applying the Newton step calculated using the current approximate Hessian matrix $B_k$: first the step $\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k)$ is computed, with $\alpha_k$ chosen to satisfy the Wolfe conditions; then the iterate is advanced to $x_{k+1} = x_k + \Delta x_k$; finally, the gradient at the new point, $\nabla f(x_{k+1})$, is used to update the approximate Hessian to $B_{k+1}$, or directly its inverse $H_{k+1} = B_{k+1}^{-1}$ using the Sherman–Morrison formula. These recursive low-rank matrix updates can also be represented as an initial matrix plus a low-rank correction. This is the compact quasi-Newton representation, which is particularly effective for constrained and/or large problems.

When $f$ is a convex quadratic function with positive-definite Hessian $B$, one would expect the matrices $H_k$ generated by a quasi-Newton method to converge to the inverse Hessian $B^{-1}$. This is indeed the case for the class of quasi-Newton methods based on least-change updates.
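To make the update loop above concrete, here is a minimal sketch of a BFGS-style quasi-Newton minimizer that maintains an approximation $H_k \approx B_k^{-1}$ of the inverse Hessian. It is written under simplifying assumptions: a plain backtracking (Armijo) line search stands in for a full Wolfe-condition search, and the function names, starting point, and tolerances are illustrative rather than taken from any particular library.

```python
import numpy as np

def quasi_newton_bfgs(f, grad_f, x0, tol=1e-8, max_iter=200):
    """Minimize f by quasi-Newton iteration, keeping an approximation H of the
    inverse Hessian and refining it from successive gradients (BFGS update)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                       # H_0 = I plays the role of the initial B_0^{-1}
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton direction: -B_k^{-1} grad f(x_k)
        # Backtracking line search (Armijo condition only, standing in for Wolfe).
        alpha, c1, fx = 1.0, 1e-4, f(x)
        while alpha > 1e-10 and f(x + alpha * p) > fx + c1 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                   # s_k = x_{k+1} - x_k
        x_new = x + s
        g_new = grad_f(x_new)
        y = g_new - g                   # y_k = grad f(x_{k+1}) - grad f(x_k)
        sy = s @ y
        if sy > 1e-12:                  # curvature check: keeps H positive-definite
            rho = 1.0 / sy
            I = np.eye(n)
            # BFGS update of the inverse Hessian; the new H satisfies the
            # secant equation H_{k+1} y_k = s_k.
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: minimize the Rosenbrock function, whose minimum is at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x_min = quasi_newton_bfgs(f, grad, x0=[-1.2, 1.0])
```

Updating $H_k$ directly avoids solving a linear system at every step, which is the practical advantage over Newton's method noted above; the update is skipped whenever the curvature $s_k^{\mathrm{T}} y_k$ is not positive, since the BFGS formula would otherwise lose positive-definiteness.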