As RISC moves to the mainstream, vendors are broadening their offerings to appeal to various users
Bob Ryan and Tom Thompson
Ever since Sun Microsystems popularized RISC workstations in the mid-1980s, the goal of RISC chip designers has been unvarying--better performance. Having the most powerful chip on the market meant more than bragging rights; it meant sales. Portable operating systems and the explosive growth of the workstation market meant that many people bought workstations based on performance alone.
Today, the possibility of RISC making inroads into the desktop computing market has blunted the hell-bent pursuit of performance. Suddenly, price/performance, features, and ease of integration have assumed greater importance as companies such as DEC, Sun Microsystems, IBM, Motorola, and Mips go head to head with the Intel 80x86 juggernaut. Raw performance will get you only so far if it prices you out of 95 percent of the market.
The possibility of RISC on the desktop has had a direct effect on RISC design. RISC designers are beginning to broaden their product offerings. This trend toward product-line diversification has manifested itself in a number of new chips from RISC vendors.
Alpha Attacks System Costs
In 1992, DEC entered the RISC market in a big way with the Alpha, a 64-bit RISC architecture that the company claims will carry it well into the next century. Alpha hit the scene with a splash. At introduction, it was the world's most powerful microprocessor, and it remains the world's fastest single-chip microprocessor. Offered at 133, 150, and 200 MHz, the DECchip 21064 is ideal for high-end workstations and multiprocessing servers. (Alpha should retake the world's fastest bar-none crown from the IBM Power2 later this year with the release of the DECchip 21064A, a 275-MHz implementation of the Alpha architecture.)
Last year, DEC introduced the first variant of the Alpha architecture. Dubbed the DECchip 21066, the chip is designed to be the centerpiece of DEC's RISC PC strategy. It will be used in systems that run Windows NT and thus compete directly with Intel's high-end 486 and Pentium processors.
To compete with the high-end 80x86 machines, you need more than an inexpensive chip; you need an inexpensive system. The 21066 is designed with system costs in mind. It uses the 21064 core, so it is fast. It includes a memory interface--to SRAM (static RAM), DRAM, and VRAM (video RAM)--on the chip and a PCI (Peripheral Component Interconnect) controller; therefore, it has most of the logic a systems designer requires to implement a complete system. This is important because, unlike the 80x86, neither the Alpha architecture nor any other RISC architecture has a huge support-chip industry built around it.
In a further attempt to keep system costs down, the 21066's memory interface is 64 bits wide, which is half the width of the external memory bus of the 21064. Even though this narrower bus has a negative impact on performance, it makes it simpler to design a system around the 21066.
The 21066 is manufactured using DEC's 0.68-micron, three-layer-metal CMOS technology. The chip's size is 209 mm², and it operates internally at 3.3 V, although it can connect seamlessly to 5-V peripherals. Initially clocked at 166 MHz, the chip will dissipate over 20 W of power, making it unsuitable for notebook implementations. The 21066 is priced at $424 each in quantities of 1000.
Based on simulations, DEC expects about 70 SPECint92 and 105 SPECfp92 performance from the 21066, which is a bit higher than the 66-MHz Pentium's integer performance (64.5) and nearly twice its floating-point performance. With a high degree of integration that will result in lower system cost, the 21066 will find its way into many NT servers and high-end desktops.
Integration, SPARC Style
Another company aiming to keep system costs down is Sun Microsystems, which, in conjunction with Fujitsu, has developed the MicroSparc II, a follow-on to the original MicroSparc I architecture. The MicroSparc II is an implementation of version 8 of the SPARC architecture. As such, it is compatible with the thousands of applications available for SPARC systems.
The MicroSparc II is the low end of an expanding SPARC product line. It is designed for low-cost implementations, both desktop and portable. Above MicroSparc comes SuperSparc, a superscalar SPARC implementation built by Texas Instruments for desktop systems. At the top of the line, Sun has recently announced UltraSparc, a 64-bit implementation of SPARC that Sun hopes will help the company regain some of the technical and performance luster it has lost in recent years to DEC and Mips. Like SuperSparc and the original MicroSparc, UltraSparc is being developed in conjunction with Texas Instruments.
As with the 21066, the MicroSparc II uses a high level of integration on the processor. In addition to the CPU core, it includes a DRAM controller, a graphics system interface, and an SBus controller. The primary distinction between the MicroSparc II and the 21066 is in the choice of I/O bus. DEC chose PCI, because it wants to make inroads into industry-standard desktops; PCI is establishing itself as a high-end standard, and it can be bridged to ISA. Sun chose SBus, which is found in SPARC systems from several manufacturers.
Sun is more interested in expanding its Solaris-based business than in joining the Windows NT bandwagon. The company is supporting Intergraph's efforts to port NT to SPARC, but it has announced no intention of offering NT on its own machines.
The MicroSparc II is built with Fujitsu's 0.5-micron, three-level-metal CMOS technology. It is a fully static design that operates at 3.3 V internally, and, like the 21066, it can interface to 5-V peripherals. It is designed to operate between 50 and 125 MHz. It is a large chip, packing 2.3 million transistors onto a die that measures 233 mm².
The MicroSparc II is a single-issue CPU, with instructions executing in either the integer or floating-point pipelines. To help keep floating-point instructions from blocking the integer pipeline, the FPU contains a three-entry instruction queue. The FPU is IEEE 754-compliant and can execute floating-point multiply instructions in parallel with other floating-point instructions. The integer pipeline consists of five stages and is preceded by a four-entry prefetch buffer.
Besides the integrated memory and bus controllers, the biggest difference between the MicroSparc I and II is the size of their respective caches. The MicroSparc I has a 4-KB instruction cache and a 2-KB data cache, whereas the MicroSparc II has a 16-KB instruction cache and an 8-KB data cache. Unlike those of most other new RISC chips, the caches are virtually addressed, meaning that lookup occurs using the virtual address, not the physical address generated by the MMU (memory management unit). In other words, the MMU is downstream from the caches.
This method eliminates any latency the MMU introduces before cache lookup, but it does require special logic to handle coherency problems when two or more virtual addresses map to the same physical address. In fact, this arrangement is a holdover from when the SPARC architecture was implemented on several chips. Then, the penalty for going off-chip to access the MMU was too high to implement physical caches (where cache lookup occurs after address translation).
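The coherency problem mentioned above can be seen in a few lines of code. The following is a toy model only, not the MicroSparc II's actual logic: the class name, the addresses, and the write-back behavior are all assumptions chosen to show how two virtual addresses that map to one physical address can disagree inside a virtually addressed cache.

```python
class VirtualCache:
    """Toy write-back cache that is looked up by virtual address (hypothetical model)."""

    def __init__(self, memory, page_table):
        self.memory = memory          # physical address -> value
        self.page_table = page_table  # virtual address -> physical address (the "MMU")
        self.lines = {}               # virtual address -> cached value

    def read(self, vaddr):
        # A hit is served without consulting the MMU at all.
        if vaddr not in self.lines:
            # Only a miss pays for address translation.
            self.lines[vaddr] = self.memory[self.page_table[vaddr]]
        return self.lines[vaddr]

    def write(self, vaddr, value):
        # Write-back: only the cached copy under this virtual address changes.
        self.lines[vaddr] = value


# Two virtual addresses (made-up values) that alias one physical location.
memory = {0x100: 1}
page_table = {0xA000: 0x100, 0xB000: 0x100}
cache = VirtualCache(memory, page_table)

first_read = cache.read(0xB000)    # caches the value under alias B
cache.write(0xA000, 2)             # updates only the copy under alias A
stale_read = cache.read(0xB000)    # alias B still sees the old value
fresh_read = cache.read(0xA000)
```

Without extra logic to detect that both aliases refer to the same physical location, the read through the second alias returns stale data. That is the kind of case the special coherency logic has to catch.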
In addition to using a 3.3-V power supply, the MicroSparc II is fully static. It also uses power management to conserve power. It can cut power to the caches by 75 percent when they are not being accessed, and in standby mode, it can stop the clock to all logic blocks. At 85 MHz, it is expected to consume about 5 W.
Sun expects the MicroSparc II to power both low-cost, high-volume desktop systems and SPARC portable systems. With the highest degree of integration yet seen in a SPARC processor, the MicroSparc II should significantly reduce costs to system vendors, while making it easier for them to design a system. The chip will sell for less than $500 each in quantity.
At the Microprocessor Forum last fall, IBM and Motorola announced that they had produced first silicon of the PowerPC 603, the second member of the PowerPC family. The goal of the PowerPC 603 is to provide high performance while consuming little power, making it ideal for notebook computer designs.
The 603 uses 3.3-V, 0.5-micron, four-level-metal static CMOS technology to pack 1.6 million transistors onto a die that's 85.1 mm². By contrast, the PowerPC 601 uses 3.6-V, 0.6-micron static CMOS technology to place 2.8 million transistors onto a die that's 132 mm². Like the 601, the 603 implements a 32-bit version of the 64-bit PowerPC architecture, with a 32-bit address bus and a 32- or 64-bit data bus. The 603 uses the same superscalar design with a three-instruction dispatch.
However, the 603 differs from the 601 in a number of areas. First, the 603 uses a Harvard architecture: It has two separate 8-KB caches, one for instructions and one for data. Each cache has its own MMU. Both caches are two-way set-associative and use a least recently used algorithm.
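For readers who want to see what "two-way set-associative with least recently used replacement" means in practice, here is a minimal sketch. The 8-KB size and two ways come from the description above; the 32-byte line size and the class and method names are assumptions made for illustration, and the model tracks only hits and misses, not data.

```python
class TwoWayLRUCache:
    """Hit/miss model of a two-way set-associative cache with LRU replacement."""

    WAYS = 2

    def __init__(self, size_bytes=8 * 1024, line_bytes=32):
        # line_bytes=32 is an assumed value; the article does not give the line size.
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (self.WAYS * line_bytes)
        # Each set holds up to two tags, ordered least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (filling the line on a miss)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)       # mark as most recently used
            return True
        if len(ways) == self.WAYS:
            ways.pop(0)            # evict the least recently used tag
        ways.append(tag)
        return False


cache = TwoWayLRUCache()
stride = cache.num_sets * cache.line_bytes   # addresses this far apart share a set
a, b, c = 0, stride, 2 * stride
results = [cache.access(x) for x in (a, b, a, c, a, b)]
```

In this run, the third address to land in the set evicts `b` rather than `a`, because `a` was touched more recently; `a` then hits and `b` misses when it is accessed again.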
Next, the 603 has five independent execution units. As with the 601, the 603 has a BPU (branch-prediction unit), IU (integer unit), and FPU. However, the 603 features a new load/store unit and an SRU (system-register unit) that is used to implement dynamic power management. The load/store unit handles data transfers between the data cache and the GPRs (general-purpose registers) and FPRs (floating-point registers). The SRU executes special-purpose-register and condition-register instructions.
The 603 will be available as 66- and 80-MHz parts. Maximum power consumption should be only 3 W at 80 MHz. A variety of power-saving techniques incorporated in the design should actually enable typical power consumption to hover around 1 to 1.5 W. This compares well with popular notebook CPUs such as the Intel 486DX/33, which can dissipate up to 3.2 W. The power-saving techniques used include a PLL (phase-locked loop) clock multiplier circuit. The PLL allows the processor to run at frequencies higher than the system clock, using a multiplier of 1×, 2×, 3×, or 4×. The PLL also enables the 603 to operate properly when slower system clock speeds (e.g., 33 and 50 MHz) are used to reduce the processor's power consumption.
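The arithmetic behind the clock multiplier is simple enough to sketch. The multiplier values below come from the text; the function names and the particular pairings of system clock and multiplier are illustrative assumptions, not a list of configurations the 603 is documented to support.

```python
MULTIPLIERS = (1, 2, 3, 4)   # the PLL settings mentioned in the text


def core_clock_mhz(system_clock_mhz, multiplier):
    """Processor clock produced by the PLL for a given system clock."""
    if multiplier not in MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    return system_clock_mhz * multiplier


def system_clocks_for(core_mhz):
    """System clock needed at each multiplier to reach a target processor clock."""
    return {m: core_mhz / m for m in MULTIPLIERS}


doubled = core_clock_mhz(33, 2)        # a nominal 33-MHz system clock, doubled
options_80 = system_clocks_for(80)     # ways to reach an 80-MHz processor clock
```

The point of the exercise is that a system designer could keep the board-level clock slow, which simplifies the design, while the processor core still runs at its rated speed.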
Because the 603 uses static logic, the contents of the registers and caches are preserved when the processor kicks into low-power modes. The 603 provides three software-controllable power-saving modes: doze, nap, and sleep. The doze mode switches off most of the processor, except for the external bus-snooping logic. The bus interface processes external snoops and maintains coherency of the internal caches. The time-base register continues to operate. The PLL is also powered so that it remains locked to the system clock and can bring the processor into the full-powered mode in only a few clock cycles.
The nap mode disables the bus snooping, so cache coherency is not maintained. The PLL and time-base register are still active. Return to a full-power active state takes several clock cycles. In the sleep mode, the time-base register is switched off, leaving no internal units operating. External logic can disable the PLL for further power savings. This mode consumes minimum power, but it takes a number of clock cycles for the PLL to resynchronize before the processor can be placed into full-power mode.
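The three modes are easier to compare side by side. The sketch below records, for each mode, only what the description above states; the data structure and the `deepest_mode` helper are hypothetical, meant to show how system software might reason about which mode it can afford to enter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    name: str
    bus_snooping: bool           # cache coherency maintained while idle
    time_base: bool              # time-base register keeps counting
    pll_may_be_disabled: bool    # external logic may also stop the PLL


MODES = {
    "doze": PowerMode("doze", bus_snooping=True, time_base=True, pll_may_be_disabled=False),
    "nap": PowerMode("nap", bus_snooping=False, time_base=True, pll_may_be_disabled=False),
    "sleep": PowerMode("sleep", bus_snooping=False, time_base=False, pll_may_be_disabled=True),
}


def deepest_mode(need_coherency, need_time_base):
    """Pick the deepest mode that still meets the stated requirements."""
    for name in ("sleep", "nap", "doze"):   # deepest first
        mode = MODES[name]
        if need_coherency and not mode.bus_snooping:
            continue
        if need_time_base and not mode.time_base:
            continue
        return name
    return None   # no power-saving mode fits; stay at full power
```

For example, a system that must keep its caches coherent with another bus master can go no deeper than doze, while one that needs only the time base can drop to nap.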
The 603 also uses dynamic power management techniques to reduce power consumption. Dynamic power management works by switching off the clock to certain processor subsystems when they are idle. The dispatch logic monitors the instruction stream, and if a certain subsystem--say the FPU--is idle and no floating-point instructions are forthcoming, the dispatch logic has the FPU clock disabled. Conversely, if the dispatch logic detects an incoming floating-point instruction, it can enable the FPU clock before issuing the instruction to it. This also explains the two additional execution units: Both the load/store unit and the SRU can be disabled as necessary to save power.
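A rough way to picture dynamic power management is to count how many cycles each unit's clock would need to be running for a given instruction stream. This is a deliberately simplified sketch: the one-instruction-per-cycle model, the lookahead window of two instructions, and the sample stream are all assumptions, not details of the 603's dispatch logic.

```python
def gated_cycles(stream, units, lookahead=2):
    """Count cycles in which each unit's clock must be enabled.

    stream    -- list of unit names, one instruction per cycle (simplification)
    lookahead -- how many upcoming instructions the dispatcher can see (assumed)
    """
    enabled = {unit: 0 for unit in units}
    for cycle in range(len(stream)):
        window = stream[cycle:cycle + lookahead]
        for unit in units:
            # A unit is clocked only if an instruction for it is in view.
            if unit in window:
                enabled[unit] += 1
    return enabled


# Hypothetical, mostly-integer instruction stream.
stream = ["IU", "IU", "LSU", "IU", "FPU", "IU", "IU", "IU"]
usage = gated_cycles(stream, units=("IU", "FPU", "LSU"))
```

In this made-up stream the integer unit is clocked on every cycle, but the FPU and the load/store unit each need their clocks for only two of the eight cycles: the cycle before their instruction arrives and the cycle in which it executes.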
Either cache can be switched off if it is inactive. For example, the 603 might be constantly fetching instructions but no data, so the data cache would be powered down. The dual-cache design also requires smaller on-chip buffers and eliminates the arbitration logic required for the 601's unified cache.
Also, the cache protocol has been reduced from four states (i.e., modified, exclusive, shared, and invalid) to three states (i.e., modified, exclusive, and invalid). The cache protocol is compatible with the four-state protocol. It was anticipated that the 603 would be used for stand-alone designs, so the shared state was removed. These changes to the overall cache design use fewer transistors, which also translates into power savings.
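One way to see how a three-state cache can coexist with four-state caches is to compare how each responds when another bus master reads a line it holds. The tables below are a textbook-style simplification, not the 603's documented transition tables: the assumption is that a three-state cache simply gives up its copy where a four-state cache would keep it as shared.

```python
# Next state when another bus master reads a line this cache holds.
# M = modified, E = exclusive, S = shared, I = invalid.
MESI_SNOOP_READ = {"M": "S", "E": "S", "S": "S", "I": "I"}
MEI_SNOOP_READ = {"M": "I", "E": "I", "I": "I"}   # assumed: no shared state to fall back on


def snoop_read(state, table):
    """Return (next_state, needs_writeback) for a snooped read."""
    # Only a modified line holds data that memory does not yet have.
    return table[state], state == "M"


four_state = snoop_read("M", MESI_SNOOP_READ)
three_state = snoop_read("M", MEI_SNOOP_READ)
```

Under this model the three-state cache never holds a line that another cache also holds, so it never needs to represent sharing. The cost is an extra miss if it wants the line again, which matters little in the stand-alone designs the 603 targets.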
Preliminary SPECmarks (obtained from simulations) indicate that a 66-MHz 603 should post 60 SPECint92 and 70 SPECfp92. That compares favorably to a 66-MHz 601's performance of 60.6 SPECint92 and 72.2 SPECfp92, as obtained on the RS/6000 Model 250. The 603's comparable RISC performance, combined with its modest power consumption, makes it ideally suited to become the heart of future notebook computers.
The 603 will be manufactured at IBM's microelectronics facility in Burlington, Vermont, and Motorola's MOS-11 facility in Austin, Texas. Pricing was not available at this writing.
SGI Gets Small
Like DEC, Silicon Graphics is trying to ride into the desktop market on the back of Windows NT and is aiming to reduce system costs so that systems using the 64-bit Mips III architecture can offer a significant price/performance advantage over industry-standard 80x86 systems. Unlike DEC, however, the latest Mips design does not aim to integrate a lot of system logic on the microprocessor; instead, it goes for straightforward price reduction while maintaining RISC performance levels.
The R4200 is the result: a small (just 81 mm²), powerful (estimated 55 SPECint92), and inexpensive processor that can offer a significant price/performance advantage over any 80x86 chip. NEC, which has a one-year exclusive license to produce the chip, estimates that the R4200 will sell for 8000 yen--well under $100 at current exchange rates.
Unlike most RISC processors, the R4200 is neither superscalar nor superpipelined. It uses a fairly standard five-stage pipeline as opposed to the eight-stage superpipeline used in the other members of the R4x00 family. In addition, it combines its integer and floating-point pipelines into a single unit, creating a pipeline that can perform both types of operations.
Combining the two units into one degrades performance--floating-point performance is estimated at 30 SPECfp92--but saves a huge number of transistors. Another savings comes from reducing the number of TLB (translation look-aside buffer) entries in the MMU from 48 to 32. This might not seem like much compared to combining the fixed- and floating-point pathways, but considering that the TLB is fully associative, it is significant. Like other R4x00 processors, the R4200 retains a separate two-entry instruction TLB so that most simultaneous data and instruction accesses don't result in one access being blocked while the other makes use of the MMU.
Another factor in reducing the size--and thus the cost--of the R4200 is the manufacturing process used to make it. NEC uses a 0.6-micron, three-layer-metal CMOS technology to produce the R4200. The chip operates at 3.3 V, and, unlike the 21066, requires 3.3-V peripherals. In addition, it incorporates a number of power management techniques. It can power down unused functional blocks and prevent switching in unused execution units. The chip isn't a static design, however, so you must save the state of the processor before powering down completely. NEC expects the chip to draw about 1.5 W, making it ideal for notebook and portable applications.
The R4200 stacks up quite well against both the Pentium and the high-end 486s. It provides 80 percent of the Pentium's integer performance at about 10 percent of the price. It betters the integer performance of the 486DX2, at 20 percent to 25 percent of the price. As an economical platform for NT, the R4200 will be hard to beat.
Coming of Age
The chips previously described make one thing perfectly clear: RISC is no longer a fringe technology. All the major RISC vendors offer a range of solutions with different features, performance levels, and prices. True, some architectures have only a couple of representatives, but in these cases--Alpha and PowerPC especially--the vendors are committed to providing an ever-growing choice of CPUs.
Vendors are also offering embedded solutions based on desktop CPUs. IBM has announced a family of embedded processors based on the PowerPC--the PowerPC 400 series--and Motorola is expected to do the same shortly. DEC sells an embedded version of the 21066 called the 21068. Embedded processor sales help ameliorate the design costs of desktop CPUs, letting companies like DEC and IBM compete more effectively with Intel. These developments are necessary if RISC is to garner a significant share of the desktop computing market.
With prices below $500, these RISC chips can compete head-on with the top end of the 80x86 line.
                  NUMBER OF       MAXIMUM POWER       PRICE               SIZE
                  TRANSISTORS     DISSIPATION         (QUANTITY 1000)     (mm²)
DECchip 21066     1.75 million    20+ W (166 MHz)     $424                209
PowerPC 603       1.6 million     3 W (80 MHz)        N/A                 85
MicroSparc II     2.3 million     5 W (85 MHz)        $500                233
Mips/NEC R4200    1.3 million     2 W (40/80 MHz)     $75 (8000 yen)      81

                  SPECINT92           SPECFP92            OPERATING VOLTAGE
DECchip 21066     70* (166 MHz)       105* (166 MHz)      3.3 (5-V peripherals)
PowerPC 603       75* (80 MHz)        85* (80 MHz)        3.3 (5-V peripherals)
MicroSparc II     57.2 (85 MHz)       49.5 (85 MHz)       3.3 (5-V peripherals)
Mips/NEC R4200    55* (40/80 MHz)     30* (40/80 MHz)     3.3

*Based on simulations. N/A = not available.
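Since the article's theme is price/performance rather than raw speed, the table figures can be turned into rough efficiency ratios. The sketch below uses only numbers from the table; treating the 21066's "20+ W" as exactly 20 W is an assumption that flatters that chip, and the function names are made up for illustration. Keep in mind that several of the SPECint92 figures are simulations taken at different clock speeds, so the ratios are indicative at best.

```python
# Figures copied from the table; 20.0 W for the 21066 stands in for "20+ W".
CHIPS = {
    "DECchip 21066": {"specint92": 70.0, "watts": 20.0, "price_usd": 424.0},
    "PowerPC 603": {"specint92": 75.0, "watts": 3.0, "price_usd": None},   # price N/A
    "MicroSparc II": {"specint92": 57.2, "watts": 5.0, "price_usd": 500.0},
    "Mips/NEC R4200": {"specint92": 55.0, "watts": 2.0, "price_usd": 75.0},
}


def specint_per_watt(name):
    chip = CHIPS[name]
    return chip["specint92"] / chip["watts"]


def specint_per_dollar(name):
    chip = CHIPS[name]
    if chip["price_usd"] is None:
        return None            # no price published, so no ratio
    return chip["specint92"] / chip["price_usd"]


priced = [name for name in CHIPS if specint_per_dollar(name) is not None]
best_per_dollar = max(priced, key=specint_per_dollar)
best_per_watt = max(CHIPS, key=specint_per_watt)
```

On these figures the R4200 leads on both measures, with the PowerPC 603 close behind on performance per watt, which fits the article's view of those two chips as the notebook candidates.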
Illustration: Integrating an Alpha core with both memory and PCI controllers yields a powerful chip that is easy and inexpensive to integrate into a system. Despite its added functionality, the DECchip 21066 requires 144 fewer pins than the 21064.
Illustration: The MicroSparc II brings SPARC integration to new levels. With four times the cache memory of the original MicroSparc, it promises to at least double its performance.
Illustration: The PowerPC 603 introduces a Harvard architecture and dynamic power management to the PowerPC line. Expected to dissipate 2 to 3 W at 80 MHz, it is ideal for notebooks and energy-efficient desktop systems.
Illustration: The R4200 integrates a complete RISC pipeline and 24-KB cache on a die 82 mm². Its low power consumption and high performance make it ideal for notebook systems.
Bob Ryan is a BYTE technical editor. You can contact him on the Internet or BIX at
. Tom Thompson is a BYTE senior technical editor. You can contact him on the Internet or BIX at