The test chip lacked a floating point unit and only had 1 KB caches.The test chip, along with simulators and emulators, was also used to bring up firmware and the various operating systems that the company supported.The Alpha 21064 was fabricated at Digital's Hudson, Massachusetts and South Queensferry, Scotland facilities.It remained the highest performing single-chip microprocessor until the 275 MHz 21064A was introduced in October 1993.[2] The Alpha 21064 is a superpipelined dual-issue superscalar microprocessor that executes instructions in-order.Which instructions could be paired was determined by the number of read and write ports in the integer register file.The first four stages are also stalled in the event that no instruction can be issued due to resource unavailability, dependencies, or similar conditions.[9] The E-box contains an adder, a logic unit, barrel shifter, and multiplier.The barrel shifter is pipelined, but shift and byte manipulation instructions are not completed by the end of stage six, and thus have a latency of two cycles.The write buffer improved performance by reducing the number of writes on the system bus by merging data from adjacent stores and by temporarily delaying stores, enabling loads to be serviced quicker as the system bus is not utilized as often.The F-box contained a floating-point pipeline and a non-pipelined divide unit which retired one bit per cycle.The floating-point register file is read and the data formatted into fraction, exponent, and sign in stage four.If executing add instructions, the adder calculates the exponent difference, and a predictive leading one or zero detector using input operands for normalizing the result is initiated.In stages five and six, alignment or a normalization shift and sticky-bit calculations are performed for adds and subtracts.The caches are built with six-transistor static random access memory (SRAM) cells that have an area of 98 μm2.An optional external secondary cache, known as the B-cache, with capacities of 128 KB to 16 MB was supported.[13] The B-cache is direct-mapped and has a 128-byte line size by default that could be configured to use larger quantities.[14] The original EV4 was fabricated by Digital in its CMOS-4 process, which has a 0.75 μm feature size and three levels of aluminium interconnect.The Alpha 21066 was intended for use in low-cost applications, specifically personal computers running Windows NT.Digital used various models of the Alpha 21066 in their Multia clients, AXPpci 33 original equipment manufacturer (OEM) motherboards and AXPvme single-board computers.Outside of Digital, users included Aspen Systems in its Alpine workstation, Carrera Computers in its Pantera I workstation, NekoTech used a 166 MHz model in its Mach 1-166 personal computer, and Parsys in its TransAlpha TA9000 Series supercomputers.Due to the process shrink, it was able to include features that were desirable in cost-sensitive embedded systems.These features include an on-die B-cache and memory controller with ECC support, a functionally limited graphics accelerator supporting up to 8 MB of VRAM for implementing a framebuffer, a PCI controller and a phase-locked loop (PLL) clock generator for multiplying a 33 MHz external clock signal to the desired internal clock frequency.A feature specific to the 21066A was power management – the microprocessor's internal clock frequency could be adjusted by software.It was identical to the 21066 but had a lower clock rate to reduce power dissipation and cost.Digital's computers used custom application-specific integrated circuits (ASICs) to interface the microprocessor to the system.21070 users included Carrera Computers for its Pantera workstations and Digital in some models of its AlphaStations and uniprocessor AlphaServers.
The 21064 microprocessor mounted on a business card