Nintendo's successor to the N64, the Gamecube, arrived late in 2001. For this console, Nintendo's partners were IBM and ATI, which had acquired ArtX, the original designer of the graphics chip. IBM provided a 485 MHz PowerPC 750 CPU codenamed Gekko, while the 162 MHz ATI/ArtX GPU and I/O chip was dubbed Flipper. The CPU features 64 KB of L1 cache (32 KB instruction, 32 KB data), a 256 KB 2-way set-associative L2 cache, and vector extensions consisting of roughly 50 new instructions. The changes go beyond the instruction extensions, however: the FPU has been modified to process two single-precision values per cycle instead of one double-precision value. Correspondingly, each floating-point register can store either one 64-bit value or two 32-bit values. With two multiply-add operations per cycle, the FPU has a peak throughput of 1.9 GFLOPS. At 485 MHz, the CPU has a typical power dissipation of 4.9 W on a 0.18 µm copper process.
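The 1.9 GFLOPS figure can be reproduced from the numbers above. The following is a minimal sketch, assuming that each multiply-add counts as two floating-point operations and that both single-precision lanes issue one multiply-add every cycle; the function and parameter names are illustrative only.

```python
def peak_gflops(clock_mhz: float, lanes: int = 2, flops_per_multiply_add: int = 2) -> float:
    """Theoretical peak: clock rate x parallel lanes x FLOPs per multiply-add.

    Assumption: one multiply-add per lane per cycle, with no stalls.
    """
    return clock_mhz * 1e6 * lanes * flops_per_multiply_add / 1e9


# Gekko: 485 MHz, two single-precision values per cycle.
print(f"{peak_gflops(485):.2f} GFLOPS")  # prints "1.94 GFLOPS"
```

This is a theoretical ceiling; sustained throughput depends on how well real code keeps both lanes fed.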
[Image: Metroid Prime (Gamecube)]
The Gekko CPU is something of a hybrid of the 750CXe and the 750FX. Specifically, it has the same cache structure as the CXe and the enhanced 60x bus of the FX. Half of the L1 data cache (16 KB) can also be locked. Another FX trait found in Gekko is an FPU pipe with dual reservation stations, allowing for full pipelining in addition to the packed single-precision capability mentioned above.
Gekko also offers instructions that can load 8-bit and 16-bit integers into the FP registers, converting them to floating point on the fly. According to Gabriele Svelto, a long-time Ace's Hardware reader who has researched the Gamecube and the Gekko CPU, this is a significant advantage for gaming applications over other non-AltiVec PowerPCs, where integer-to-float and float-to-integer conversions can be expensive because they incur load-from-store penalties.
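What such a converting load accomplishes can be sketched in software. The example below emulates reading two packed big-endian integers and returning them as a pair of floats. The function name and the power-of-two `scale_shift` parameter are assumptions added for illustration and are not described in the text above.

```python
import struct


def load_pair_as_floats(raw: bytes, fmt: str = "b", scale_shift: int = 0) -> tuple[float, float]:
    """Emulate loading two packed integers and converting them to floats.

    fmt is a struct code: "b"/"B" for signed/unsigned 8-bit,
    "h"/"H" for signed/unsigned 16-bit.
    scale_shift is a hypothetical fixed-point scale: result = value * 2**-scale_shift.
    """
    first, second = struct.unpack(">" + fmt * 2, raw)  # PowerPC is big-endian
    scale = 2.0 ** -scale_shift
    return (first * scale, second * scale)


# Two signed bytes holding 64 and -64, interpreted with 6 fractional bits.
print(load_pair_as_floats(bytes([0x40, 0xC0]), "b", 6))  # prints (1.0, -1.0)
```

Storing vertex data such as normals or texture coordinates as 8-bit or 16-bit integers also halves or quarters its memory footprint relative to 32-bit floats, which matters on a machine with limited memory and bandwidth.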
The 51-million-transistor Flipper serves not only as the Gamecube's GPU, but also as an I/O controller and sound processor. The chip also incorporates 3 MB of on-chip 1T-SRAM: 2 MB is reserved for the frame (back) buffer and z-buffer, while the other 1 MB is used for texture memory. The on-chip memory provides low-latency access, on the order of 6.2 ns, as well as high bandwidth (10.4 GB/s texture bandwidth, 7.6 GB/s frame buffer bandwidth). S3TC texture compression is used to minimize the impact of the relatively small 1 MB texture buffer.
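To get a feel for why compression matters with only 1 MB of texture memory, here is a rough sketch. It assumes the S3TC mode in use stores each 4x4 texel block in 8 bytes (4 bits per texel, as in the DXT1 variant) and ignores mipmaps and any padding; the helper names are illustrative.

```python
TEXTURE_MEMORY_BYTES = 1 * 1024 * 1024  # 1 MB on-chip texture memory


def texture_bytes(width: int, height: int, bits_per_texel: int) -> int:
    """Size of a single texture level, without mipmaps."""
    return width * height * bits_per_texel // 8


def textures_that_fit(width: int, height: int, bits_per_texel: int,
                      budget: int = TEXTURE_MEMORY_BYTES) -> int:
    """How many textures of this size fit in the given budget."""
    return budget // texture_bytes(width, height, bits_per_texel)


# 256x256 textures: 32-bit uncompressed versus 4-bit S3TC (assumed DXT1-style).
print(textures_that_fit(256, 256, 32))  # prints 4
print(textures_that_fit(256, 256, 4))   # prints 32
```

Under these assumptions, compression lets eight times as many texels stay resident on-chip compared with 32-bit textures.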
The GPU itself has 4 pixel pipelines with one TMU each, providing a peak fillrate of 648 MPixel/s. It is capable of multitexturing and can apply as many as 8 layers in a single pass.
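The fill-rate figure follows directly from the pipeline count and clock speed. The sketch below also estimates the effect of multitexturing under a simplifying assumption that is not stated above: with one TMU per pipeline, each texture layer costs one pipeline cycle.

```python
def peak_fill_rate_mpixels(pipelines: int, clock_mhz: float) -> float:
    """Peak fill rate in MPixel/s: one pixel per pipeline per clock."""
    return pipelines * clock_mhz


def multitextured_fill_rate_mpixels(pipelines: int, clock_mhz: float,
                                    layers: int, tmus_per_pipe: int = 1) -> float:
    """Estimated fill rate when each pixel needs `layers` texture lookups.

    Assumption: each TMU handles one layer per cycle, so a pixel occupies
    its pipeline for ceil(layers / tmus_per_pipe) cycles.
    """
    cycles_per_pixel = -(-layers // tmus_per_pipe)  # ceiling division
    return peak_fill_rate_mpixels(pipelines, clock_mhz) / cycles_per_pixel


print(peak_fill_rate_mpixels(4, 162))              # prints 648
print(multitextured_fill_rate_mpixels(4, 162, 8))  # prints 81.0
```

Applying all 8 layers in a single pass avoids rewriting the frame buffer between layers, even if the per-pixel cost rises as estimated here.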
The system's memory configuration is unique in that it uses 24 MB of MoSys 1T-SRAM (codenamed Splash) running at 324 MHz for a peak bandwidth of 2.6 GB/s across a 64-bit interface. It also features 16 MB of DRAM on an 8-bit interface running at 81 MHz. This 81 MB/s memory is used for audio, though creative developers may also find other uses like caching program code or data.
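Both bandwidth figures can be checked with simple arithmetic. This minimal sketch assumes one transfer per clock at the stated clock rates and uses decimal megabytes (10^6 bytes); the function name is illustrative.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in MB/s: bytes per transfer x millions of transfers per second."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock


main_memory = peak_bandwidth_mb_s(64, 324)  # 24 MB 1T-SRAM
audio_memory = peak_bandwidth_mb_s(8, 81)   # 16 MB DRAM

print(main_memory)   # prints 2592.0, i.e. about 2.6 GB/s
print(audio_memory)  # prints 81.0
```

The main memory therefore offers 32 times the peak bandwidth of the 16 MB DRAM pool, which is why the latter suits audio and caching rather than general-purpose working memory.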