Multiply–accumulate operation

Modern computers may contain a dedicated multiply–accumulate (MAC) unit, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result.

A fast fused multiply–add (FMA) can speed up and improve the accuracy of many computations that involve the accumulation of products, such as dot products, matrix multiplication, polynomial evaluation (for example, with Horner's rule), Newton's method, and convolutions and artificial neural networks.

Fused multiply–add can usually be relied on to give more accurate results.[8] However, if x² − y² is evaluated as ((x × x) − y × y) (following Kahan's suggested notation in which redundant parentheses direct the compiler to round the (x × x) term first) using fused multiply–add, then the result may be negative even when x = y, because the first multiplication discards low-significance bits while the fused y × y term keeps them.

In hardware, standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.

The Digital Equipment Corporation (DEC) VAX's POLY instruction evaluates polynomials with Horner's rule using a succession of multiply and add steps.
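The difference-of-squares pitfall and the Horner-style succession of multiply and add steps can both be illustrated with a small sketch. It assumes a software emulation of fused multiply–add built on exact rational arithmetic, not a hardware instruction, and handles finite inputs only. The names fma_emulated, diff_of_squares_unfused, diff_of_squares_mixed and horner_fma are hypothetical, chosen for this illustration.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add for finite floats (illustrative helper).

    The product a*b and the sum with c are formed exactly as rationals,
    and only the final conversion back to float rounds, so there is a
    single rounding step, as in a fused operation.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def diff_of_squares_unfused(x: float, y: float) -> float:
    # Both products are rounded separately before the subtraction.
    return x * x - y * y


def diff_of_squares_mixed(x: float, y: float) -> float:
    # (x * x) is rounded first; y * y is then fused into the subtraction
    # and enters at full precision.
    xx = x * x
    return fma_emulated(-y, y, xx)


def horner_fma(coeffs: list[float], x: float) -> float:
    """Evaluate a polynomial by Horner's rule, one fused step per coefficient.

    coeffs are ordered from the highest-degree term down to the constant.
    """
    acc = 0.0
    for c in coeffs:
        acc = fma_emulated(acc, x, c)
    return acc


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -52  # the next double after 1.0
    print(diff_of_squares_unfused(x, x))  # both products round identically
    print(diff_of_squares_mixed(x, x))    # rounding error of x * x is exposed
    print(horner_fma([1.0, 2.0, 3.0], 2.0))  # x^2 + 2x + 3 at x = 2
```

With x equal to the next double after 1.0, the exact square is 1 + 2⁻⁵¹ + 2⁻¹⁰⁴, while the rounded product drops the last term. The mixed evaluation therefore returns −2⁻¹⁰⁴ even though x = y, whereas the unfused evaluation returns zero.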