So far, the Alpha CPU family tree spans three generations. It all began with the 21064 chip. At the time of its introduction, it was the highest-performing CPU available, and it still makes for a nice workstation, though it is no longer competitive with the latest-generation CPUs. This chip branched off into a version called the ``Low-Cost Alpha'' (LCA), also known as the 21066 or 21068. The chip core was identical to the 21064, but it had an integrated memory controller and PCI bus controller. This high level of integration made it possible to build Alpha-based systems at relatively low cost and to target the embedded market. Unfortunately, the design had a major weakness: the memory system was seriously under-powered. This created the paradoxical situation where a system based on this chip performed on average no better than a 100MHz Pentium, but on select problems outperformed a P6 running at 200MHz! As a result, the reaction to this chip was highly mixed, and it probably left Digital with quite a few disappointed customers. On the other hand, there is no doubt that the low cost at which 21066-based systems were eventually sold caused a quantum leap in the number of Linux/Alpha users.
Around June 1994, the 21164 chip was announced. It had dramatically improved performance over the 21064 and was the first, and so far only, Alpha CPU to feature a three-level cache hierarchy: the first- and second-level caches were both on chip, and only the third-level cache was on the motherboard. This chip, in slightly improved versions, is still going strong. At the Fall 1996 Comdex in Las Vegas, such a chip, coupled with a liquid cooling system, was demonstrated running at 767MHz! Another version, called the 21164PC, is scheduled to become available around Spring 1997. It omits the relatively expensive second-level on-chip cache but adds multimedia extensions and other performance-enhancing features. As the name indicates, this chip is designed to be price-competitive with PC processors, specifically the forthcoming Intel Klamath (an improved P6). While price-competitive, the 21164PC is supposed to deliver over 50% better performance than the Klamath. For this second-generation low-cost Alpha implementation, it certainly looks like Digital and its co-designer Mitsubishi are not going to repeat the mistakes of the past. The 21164PC promises to be cheap and fast.
If you happen to have deep pockets or want to take a glance at what PC processors might look like in two or three years, the 21264 might be of interest. It is scheduled to become available in high-end machines during the second half of 1997. With this chip, CPU performance is expected to take another giant leap. Current estimates call for a performance level three to four times that of the fastest CPUs available today.
Between each major chip generation, there are typically ``half-generation'' CPUs which have improvements that derive primarily from a shrink of the chip manufacturing process. For example, the 21064 chip was followed by the 21064A and similarly the 21164 was followed by the 21164A. In the former case, the core of the chip remained virtually identical to the 21064 but the primary caches doubled in size from 8KB to 16KB. In the latter case, instructions for byte and word accesses were added and the maximum clock frequency increased from 333 to 500MHz.
A summary of the performance attributes of the current Alpha chip family is presented in Table 1.
CPU | SPEC int | SPEC fp | mem [MB/s] | clock [MHz] | i-cache sz/assoc | d-cache sz/assoc | s-cache sz/assoc | TLB i/d | issue rate/dyn | ISA extension | trans [10^6] |
21066 | n/a | n/a | 30 | 166 | 8KB/1 | 8KB/1 | n/a | 8+4/32 | 2/no | !R | n/a |
21066A | n/a | n/a | 30 | 233 | 8KB/1 | 8KB/1 | n/a | 8+4/32 | 2/no | !R | n/a |
21064 | 2 | 2 | 80 | 175 | 8KB/1 | 8KB/1 | n/a | 8+4/32 | 2/no | !R | 1.7 |
21064A | 4 | 5 | 80 | 275 | 16KB/1 | 16KB/1 | n/a | 8+4/32 | 2/no | !R | 2.8 |
21164 | 9 | 8 | 150 | 333 | 8KB/1 | 8KB/1 | 96KB/3 | 48/64 | 4/no | none | 9.3 |
21164A | 15 | 12 | 300 | 500 | 8KB/1 | 8KB/1 | 96KB/3 | 48/64 | 4/no | B | 9.3 |
21164PC* | 15 | 20 | n/a | >500 | 16KB/1 | 8KB/1 | n/a | 48/64 | 4/no | M+B | 3.5 |
21264* | 40 | 60 | 1600 | >600 | 64KB/2 | 64KB/2 | n/a | 128/128 | 4/yes | C+M+B | 15.2 |
* Performance figures are estimates.
- SPEC:
- The approximate SPECint95 (column ``int'') and SPECfp95 (column ``fp'') results. The results are approximate since this benchmark is a system-level benchmark. This means that the benchmark results depend not only on the CPU but also on the memory system and other attributes of the machine. The SPEC results are quoted only to give a feel for what performance level can be achieved with the respective CPU. There are no SPEC95 results available for the 21066 or 21066A. The SPECint92/SPECfp92 performance for a 233MHz 21066A is around 100/112, i.e., ``not very fast.''
- mem:
- The approximate memory bandwidth as reported by the McCalpin STREAM benchmark. Since this, too, is a system-level benchmark, the same caveats as for SPEC95 apply.
- clock:
- The maximum clock rate at which the chip is available (not counting over-clocking).
- i-cache:
- The size of the first-level instruction cache. The number behind the slash is the associativity of the cache. A number of 1 means direct mapped, 2 means 2-way set associative, and so on.
- d-cache:
- The size and associativity of the first-level data cache.
- s-cache:
- The size and associativity of the unified second-level on-chip cache where applicable.
- TLB:
- The number of translation lookaside buffer (TLB) entries. The number in front of the slash is the instruction TLB size, the number behind it the size of the data TLB. 8+4 means that the TLB can hold 8 regularly sized (8KB) page table entries and 4 large entries (mapping 4MB each).
- issue rate:
- The maximum number of instructions that can be issued per cycle. The number is followed by ``no'' if instructions are always issued in order and by ``yes'' if out-of-order issue is possible.
- ISA extensions:
- The instruction-set architecture extensions that are supported by the chip. ``!R'' is an anti-extension: the first-generation Alpha chips did not have hardware support for rounding towards infinity or minus infinity. The missing functionality is emulated by the operating system, but if you do interval arithmetic for a living, these chips probably should not be your first choice. ``B'' means byte/word load/store support, ``M'' means multimedia support (vector minimum/maximum, pixel error, pack/unpack), and ``C'' means count support (population count, count trailing zeroes, move between integer and floating-point registers, and floating-point square root).
- trans:
- The number of transistors on the chip (in millions).
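The column notation above can be turned into concrete quantities with a little arithmetic. The following is a minimal sketch, not taken from any Alpha documentation: the function names are made up for illustration, the 32-byte cache line size is an assumption (the table does not list line sizes), and the data TLB is assumed to hold only regular 8KB pages. Dividing the ``mem'' column by the ``clock'' column gives bytes delivered per clock cycle, under the further assumption that a STREAM megabyte is 10^6 bytes.

```python
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes, assoc, line_bytes=32):
    """Number of sets in a cache given as sz/assoc.

    line_bytes=32 is an assumed line size, not a value from the table.
    """
    return size_bytes // (assoc * line_bytes)


def set_index(addr, size_bytes, assoc, line_bytes=32):
    """Set that a byte address maps to (same line-size assumption)."""
    return (addr // line_bytes) % cache_sets(size_bytes, assoc, line_bytes)


def tlb_reach(small_entries, large_entries=0,
              small_page=8 * KB, large_page=4 * MB):
    """Bytes of address space a TLB can map when every entry is in use."""
    return small_entries * small_page + large_entries * large_page


def bytes_per_cycle(mem_mb_per_s, clock_mhz):
    """STREAM bandwidth divided by clock rate (assumes 1 MB = 10^6 bytes)."""
    return mem_mb_per_s / clock_mhz


# 21064 i-cache, "8KB/1": direct mapped, so one line per set.
print(cache_sets(8 * KB, 1))                     # 256 sets

# In a direct-mapped 8KB cache, addresses 8KB apart evict each other ...
print(set_index(0x0000, 8 * KB, 1) == set_index(0x2000, 8 * KB, 1))    # True
# ... but in the 21264's "64KB/2" cache they land in different sets.
print(set_index(0x0000, 64 * KB, 2) == set_index(0x2000, 64 * KB, 2))  # False

# 21064 instruction TLB, "8+4": 8 x 8KB + 4 x 4MB.
print(tlb_reach(8, 4) // KB, "KB")               # 16448 KB

# 21064 data TLB, "32" entries, assumed to be all 8KB pages.
print(tlb_reach(32) // KB, "KB")                 # 256 KB

# Memory bandwidth relative to clock rate, using the table's figures.
print(round(bytes_per_cycle(30, 166), 2))        # 21066:  0.18
print(round(bytes_per_cycle(300, 500), 2))       # 21164A: 0.6
```

Seen this way, the 21066 receives well under a fifth of a byte from memory per clock cycle, which is one way to picture the under-powered memory system described earlier, while the four large TLB entries account for nearly all of the instruction TLB's reach.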