Von Neumann architecture

The document describes a design architecture for an electronic digital computer made of "organs" that were later understood to have these components: a processing unit containing an arithmetic logic unit and processor registers; a control unit containing an instruction register and a program counter; memory that holds data and instructions; external mass storage; and input and output mechanisms. In the First Draft of a Report on the EDVAC,[1] the architecture was composed of "a high-speed memory M, a central arithmetic unit CA, an outside recording medium R, an input organ I, an output organ O, and a central control CC".[6]

The attribution of the invention of the architecture to von Neumann is controversial, not least because Eckert and Mauchly had done a lot of the required design work and claim to have had the idea for stored programs long before discussing the ideas with von Neumann and Herman Goldstine.[3] J. Presper Eckert and John Mauchly, who were developing the ENIAC at the Moore School of Electrical Engineering of the University of Pennsylvania, had written about the stored-program concept in December 1943. Von Neumann, as part of that group, wrote up a description titled First Draft of a Report on the EDVAC,[1] based on the work of Eckert and Mauchly.[12] The paper was read by dozens of von Neumann's colleagues in America and Europe, and influenced the next round of computer designs. Jack Copeland considers that it is "historically inappropriate to refer to electronic stored-program digital computers as 'von Neumann machines'", and Stan Frankel made a similar point: "Many people have acclaimed von Neumann as the 'father of the computer' (in a modern sense of the term) but I am sure that he would never have made that mistake himself."

The term "von Neumann architecture" has evolved to refer to any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time, since they share a common bus. On a large scale, the ability to treat instructions as data is what makes assemblers, compilers, linkers, loaders, and other automated programming tools possible. On a smaller scale, some repetitive operations such as BITBLT or pixel and vertex shaders can be accelerated on general-purpose processors with just-in-time compilation techniques.[9]

Although Turing knew from his wartime experience at Bletchley Park that what he proposed was feasible, the secrecy surrounding Colossus, which was subsequently maintained for several decades, prevented him from saying so. A contemporary account of the project at the National Physical Laboratory records that "[h]e was joined by Dr. Turing and a small staff of specialists, and, by 1947, the preliminary planning was sufficiently advanced to warrant the establishment of the special group already mentioned", and adds: "The equipment so far erected at the Laboratory is only the pilot model of a much larger installation which will be known as the Automatic Computing Engine, but although comparatively small in bulk and containing only about 800 thermionic valves, as can be judged from Plates XII, XIII and XIV, it is an extremely rapid and versatile calculating machine."

Selectron tubes were expensive and difficult to make, so von Neumann subsequently decided to build a machine based on the Williams memory instead.

According to John Backus: "Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of words back and forth through the von Neumann bottleneck." The problem can also be sidestepped somewhat by using parallel computing, for example with the non-uniform memory access (NUMA) architecture; this approach is commonly employed by supercomputers. Researchers expect that increasing the number of simultaneous instruction streams with multithreading or single-chip multiprocessing will make this bottleneck even worse.
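The bottleneck Backus describes can be made concrete with a toy simulation. The following is a minimal sketch, not any historical machine: the instruction set, addresses, and transaction counting are invented for illustration. Every instruction fetch, operand fetch, and data access crosses the same shared bus, so even this three-instruction program spends most of its bus traffic moving instruction words rather than data.

```python
def run(mem):
    """Fetch-execute loop over one unified memory.

    Every access (instruction fetch, operand fetch, or data
    read/write) counts as one transaction on the shared bus.
    """
    pc, acc, bus = 0, 0, 0
    while True:
        op = mem[pc]; bus += 1            # instruction fetch uses the bus
        if op == "HALT":
            break
        addr = mem[pc + 1]; bus += 1      # operand fetch uses the bus
        if op == "LOAD":
            acc = mem[addr]; bus += 1     # data read, same bus
        elif op == "ADD":
            acc += mem[addr]; bus += 1
        elif op == "STORE":
            mem[addr] = acc; bus += 1     # data write, same bus
        pc += 2
    return acc, bus

# Program and data share one address space: code at 0..6, data at
# 100..102. The program computes mem[102] = mem[100] + mem[101].
mem = {0: "LOAD", 1: 100, 2: "ADD", 3: 101, 4: "STORE", 5: 102,
       6: "HALT",
       100: 3, 101: 4, 102: 0}
result, traffic = run(mem)
# Of the 10 bus transactions, 7 carry instruction words and only 3
# carry data: the fetch traffic is what competes with data movement.
```

Because fetches and data accesses are serialized on one bus here, no data can move while an instruction word is in flight, which is exactly the constraint the term "von Neumann bottleneck" names.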
[Figure: A von Neumann architecture scheme]
[Figure: Single-system-bus evolution of the architecture]
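The ability to treat instructions as data, noted above, is the property that just-in-time compilation exploits: a program builds code as ordinary data at run time and then executes it. A minimal sketch using Python's standard compile and exec builtins (the function names here are invented for illustration):

```python
def make_adder(n):
    # Build source code as ordinary data: here, a plain string.
    src = f"def adder(x):\n    return x + {n}\n"
    # Compile and execute it at run time, JIT-style: the generated
    # code now lives in the same memory as the data it operates on.
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["adder"]

# A function specialized at run time, which did not exist beforehand.
add5 = make_adder(5)
```

An assembler or compiler is the same idea at a larger scale: one program whose output is another program, possible only because instructions and data occupy a common store.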