Bayesian optimization

Bayesian optimization is a sequential design strategy for global optimization of black-box functions, usually employed to optimize functions that are expensive to evaluate. The term is generally attributed to Jonas Mockus, who coined it in a series of publications on global optimization in the 1970s and 1980s.[4][5] Typically, the objective function is continuous and takes the form of some unknown structure, referred to as a "black box".

Because the objective is unknown, it is modeled with a probabilistic surrogate, commonly a Gaussian process, whose posterior distribution is updated as evaluations are collected. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point. The maximum of the acquisition function is typically found by resorting to discretization or by means of an auxiliary optimizer. Another, less expensive method uses the Tree-structured Parzen Estimator to construct two distributions for "high" and "low" points, and then finds the location that maximizes the expected improvement. Both ideas are sketched in the code examples below.

The approach has been applied to solve a wide range of problems,[12] including learning to rank,[13] computer graphics and visual design,[14][15][16] robotics,[17][18][19][20] sensor networks,[21][22] automatic algorithm configuration,[23][24] automatic machine learning toolboxes,[25][26][27] reinforcement learning,[28] planning, visual attention, architecture configuration in deep learning, static program analysis, experimental particle physics,[29][30] quality-diversity optimization,[31][32][33] chemistry, material design, and drug development.[36]

For example, the performance of the Histogram of Oriented Gradients (HOG) algorithm, a popular feature extraction method, depends heavily on its parameter settings.[36] Bayesian optimization based on the Tree-structured Parzen Estimator (TPE) has been proposed to tune the HOG parameters and the image size for facial recognition.
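The surrogate-and-acquisition loop described above can be illustrated with a short sketch. The Python example below is illustrative only: it assumes a Gaussian-process surrogate from scikit-learn and the expected-improvement acquisition function, maximized by discretizing a one-dimensional search space, and all function names, bounds, and settings are chosen for the example rather than taken from the cited literature.

# Minimal sketch of the Bayesian optimization loop: Gaussian-process
# surrogate + expected-improvement acquisition, maximized by
# discretizing a one-dimensional search space. Illustrative only;
# names and settings are not a reference implementation.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)                  # guard against zero variance
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bayesian_optimize(objective, bounds, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))         # initial random design
    y = np.array([objective(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    grid = np.linspace(lo, hi, 1000).reshape(-1, 1)   # discretized search space
    for _ in range(n_iter):
        gp.fit(X, y)                                  # posterior over the objective
        acq = expected_improvement(grid, gp, y.max()) # acquisition on the grid
        x_next = grid[np.argmax(acq)]                 # next query point
        X = np.vstack([X, [x_next]])
        y = np.append(y, objective(x_next[0]))        # expensive evaluation
    return X[np.argmax(y), 0], y.max()

# Example with a cheap stand-in for an expensive black-box objective.
if __name__ == "__main__":
    f = lambda x: -(x - 2.0) ** 2 + 3.0
    print(bayesian_optimize(f, bounds=(0.0, 5.0)))

In higher dimensions the grid maximization would typically be replaced by an auxiliary optimizer, as noted above.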
Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.[8]
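The Tree-structured Parzen Estimator variant mentioned above admits a similarly compact sketch. In the example below (again illustrative; the quantile gamma, the candidate count, and the helper names are assumptions), past observations are split into "high" and "low" groups, a kernel density estimate is fit to each, and the next query point is the candidate that maximizes the ratio of the two densities, which serves as a proxy for the expected improvement.

# Minimal sketch of the Tree-structured Parzen Estimator (TPE) suggestion
# step for a one-dimensional search space (maximization). Illustrative only;
# gamma, the candidate count, and the function names are assumptions.
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(x_obs, y_obs, bounds, gamma=0.25, n_candidates=500):
    """Suggest the next query point from past observations (x_obs, y_obs)."""
    x_obs, y_obs = np.asarray(x_obs, float), np.asarray(y_obs, float)
    threshold = np.quantile(y_obs, 1.0 - gamma)       # split at the gamma-quantile
    good = x_obs[y_obs >= threshold]                  # "high" points
    bad = x_obs[y_obs < threshold]                    # "low" points
    l = gaussian_kde(good)                            # density l(x) of high points
    g = gaussian_kde(bad)                             # density g(x) of low points
    candidates = np.clip(l.resample(n_candidates).ravel(), *bounds)
    scores = l(candidates) / np.maximum(g(candidates), 1e-12)
    return candidates[np.argmax(scores)]              # maximize the density ratio

# Example usage with a cheap stand-in objective: repeating the
# suggest-evaluate cycle gives the full optimization loop.
if __name__ == "__main__":
    f = lambda x: -(x - 2.0) ** 2 + 3.0
    rng = np.random.default_rng(1)
    xs = list(rng.uniform(0.0, 5.0, size=8))
    ys = [f(x) for x in xs]
    for _ in range(20):
        x_next = float(tpe_suggest(xs, ys, bounds=(0.0, 5.0)))
        xs.append(x_next)
        ys.append(f(x_next))
    print(xs[int(np.argmax(ys))], max(ys))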