- Jan 17, 2013
The Register said: http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/?page=1
The Power8 chip is implemented in IBM's familiar high-k metal gate processes, which include copper and silicon-on-insulator technologies in a 22-nanometer process. The precise transistor count was not given during the presentation, but the Power8 chip weighs in at 650 square millimetres; this is a bit bigger than Power7+, which used a 32-nanometer process, had 2.1 billion transistors, and a surface area of 567 square millimetres.
The Power8 core has a total of sixteen execution pipes. These include two load store units (LSUs) and a condition register unit (CRU), a branch register unit (BRU), and two instruction fetch units (IFUs). There are two fixed-point units (FXUs), two vector math units (VMXs), a decimal floating unit (DFU), and one cryptographic unit (not labeled in the core diagram above).
Each core now has eight threads implemented using simultaneous multithreading (what IBM calls SMT8), instead of four threads per core with the Power7 and Power7+ chips. And like earlier Power chips, this SMT is dynamically tuneable so a core can have one, two, four, or eight threads fired up.
Each core has 512KB of SRAM memory etched right near it. A segmented NUMA-like L3 cache using what IBM calls a "non-uniform cache architecture" or NUCA for short, spans all twelve cores on the die, for a total of 96MB of L3 cache. That's only 8MB of L3 cache per core, compared to 10MB per core for the Power7+ chip announced last year, but the Power8 has a much more sophisticated main memory subsystem and an L4 cache that obviates the need for so much L3 cache on the die. (More on that in a second.) The L3 cache is implemented using embedded DRAM, as was the case with the Power7 and Power7+ processors.
At a 4GHz clock speed, you can move data into L3 cache from the external L4 cache at 128GB/sec and from the L3 cache out to L4 at 64GB/sec. Data can be crammed into L2 cache from L3 at 128GB/sec (or back out at the same bandwidth). The pipe from L2 cache into the cores has 256GB/sec of bandwidth, but only 64GB/sec in the other direction. Add it all up across a twelve-core Power8 chip, and that works out to 4TB/sec of L2 cache bandwidth and 3TB/sec of L3 cache bandwidth.
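For anyone wanting to check the quoted totals, here is a quick sketch of the arithmetic. It assumes the 4TB/sec and 3TB/sec figures come from summing both directions of each per-core link and multiplying by twelve cores; the article does not spell out the derivation, so that reading and the names below are my own.

```python
# Per-core bandwidth figures quoted in the article, in GB/sec at 4GHz.
CORES = 12
L2_TO_CORE = 256   # L2 cache into the core
CORE_TO_L2 = 64    # core back out to L2
L3_TO_L2 = 128     # L3 into L2
L2_TO_L3 = 128     # L2 back out to L3 ("the same bandwidth")


def chip_total_gb(inbound, outbound, cores=CORES):
    """Sum both directions of a per-core link across all cores (assumed method)."""
    return (inbound + outbound) * cores


l2_total = chip_total_gb(L2_TO_CORE, CORE_TO_L2)  # (256 + 64) * 12 = 3840
l3_total = chip_total_gb(L3_TO_L2, L2_TO_L3)      # (128 + 128) * 12 = 3072

print(f"Aggregate L2 bandwidth: {l2_total} GB/sec")
print(f"Aggregate L3 bandwidth: {l3_total} GB/sec")
```

Under that assumption the totals are 3840GB/sec and 3072GB/sec, which round to the roughly 4TB/sec and 3TB/sec quoted.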
Hopefully this will bring some competition to Intel's server products at the top end. I can't imagine how many transistors fit into 650+ mm^2 on a 22nm process.
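As a very rough back-of-envelope guess at that, one can scale the Power7+ density from the article. This sketch assumes ideal area scaling between the 32nm and 22nm nodes, i.e. density grows by (32/22)^2, which real processes do not achieve, so treat the result as a loose upper bound and not a real transistor count. The function name is made up for illustration.

```python
# Figures from the quoted article.
POWER7P_TRANSISTORS = 2.1e9   # Power7+ transistor count
POWER7P_AREA_MM2 = 567        # Power7+ die area
POWER7P_NODE_NM = 32          # Power7+ process node
POWER8_AREA_MM2 = 650         # Power8 die area
POWER8_NODE_NM = 22           # Power8 process node


def naive_transistor_estimate(old_count, old_area, old_node, new_area, new_node):
    """Scale transistor density by the square of the node ratio (idealised assumption)."""
    old_density = old_count / old_area            # transistors per mm^2
    ideal_scaling = (old_node / new_node) ** 2    # perfect area shrink, not realistic
    return old_density * ideal_scaling * new_area


estimate = naive_transistor_estimate(
    POWER7P_TRANSISTORS, POWER7P_AREA_MM2, POWER7P_NODE_NM,
    POWER8_AREA_MM2, POWER8_NODE_NM,
)
print(f"Naive upper-bound estimate: {estimate / 1e9:.1f} billion transistors")
```

The real figure will depend on how much of the die is eDRAM cache versus logic and on how well the process actually scaled, so this only gives an order-of-magnitude feel.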