- Nov 14, 2011
A former AMD chief architect has been posting some interesting bits and pieces:
It's interesting to think what might have been if they'd chased this speed demon instead of moving to multicore versions of Opteron.
K9 was an AMD design (mine) that was targeting a 5 GHz clock in a 65nm process.
To get to 5 GHz, we had to use no more than 8 gates of logic per clock cycle.
This, in turn, mandated a 3-cycle register file read; basically: 1 cycle to
drive the renamed register into the decoder, 1 cycle to assert the select line
across the data path, and 1 cycle to read and fire the sense amplifier.
Oh, and BTW, it had 14 read ports every cycle. ...
A consequence of what we learned in K9 was that when you have an N-stage
pipeline of K gates per cycle and you want to (about) double the clock
frequency, instead of ending up with a 2×N-stage pipeline of K/2 gates
per cycle, you end up with 2.5×N stages of K/2 gates each.
The above is Mitch's 2nd law of pipelining.
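To put numbers on the second law, here is a minimal Python sketch. The 10-stage, 16-gate baseline and the function names are hypothetical, chosen only for illustration; the 2.5 stage factor is the post's empirical figure, not something the model derives. It compares the total logic-gate delay an instruction passes through in the original pipeline, a naive 2×N split, and the 2.5×N pipeline the law predicts (flip-flop overhead excluded).

```python
def deepen_pipeline(stages: int, logic_gates_per_stage: float,
                    stage_factor: float = 2.5) -> tuple[float, float]:
    """Model of "Mitch's 2nd law": to roughly double the clock, halve the logic
    gates per stage, but multiply the stage count by stage_factor (2.5 per the
    post) instead of the naive 2."""
    return stage_factor * stages, logic_gates_per_stage / 2


def total_logic_gates(stages: float, logic_gates_per_stage: float) -> float:
    """Logic-gate delays an instruction passes through end to end
    (flip-flop/clock overhead deliberately excluded)."""
    return stages * logic_gates_per_stage


# Hypothetical baseline: 10 stages of 16 logic gates each.
n, k = 10, 16
naive = (2 * n, k / 2)        # naive split: 20 stages of 8 gates
law = deepen_pipeline(n, k)   # per the law: 25 stages of 8 gates

print(total_logic_gates(n, k))     # 160
print(total_logic_gates(*naive))   # 160.0
print(total_logic_gates(*law))     # 200.0
```

Under this reading, the deeper machine spends about 25% more total logic delay per instruction (200 vs 160 gate delays) than a clean split would, before any flip-flop overhead is counted.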
A note on frequency: in advanced processes, even when the clock tree is
exquisitely engineered, your flip-flops cost 4.5 to 5.5 gates of delay.
So a 16-gate machine (Athlon) is operating at 21 gates per cycle: 16
logic gates and 5 clock gates. So the 8-gate machine runs only
(16+5)/(8+5) = 21/13 ≈ 1.6× faster, not 2×.
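The frequency arithmetic above can be checked with a short Python sketch. It assumes cycle time is simply logic gates plus a fixed flip-flop overhead, both measured in gate delays (a simplification that ignores clock skew and jitter); the function names are hypothetical. It shows why halving the logic per cycle buys only about 1.6× across the 4.5 to 5.5 gate overhead range quoted in the post.

```python
def cycle_time(logic_gates: float, flop_overhead: float) -> float:
    """Cycle time in gate delays: logic per stage plus flip-flop/clocking overhead."""
    return logic_gates + flop_overhead


def speedup(slow_logic: float, fast_logic: float, flop_overhead: float) -> float:
    """Clock-frequency ratio between two machines that differ only in logic gates per cycle."""
    return cycle_time(slow_logic, flop_overhead) / cycle_time(fast_logic, flop_overhead)


print(speedup(16, 8, 0))  # 2.0: the naive answer if flip-flops were free
for overhead in (4.5, 5.0, 5.5):  # the post's 4.5-to-5.5 gate overhead range
    print(overhead, round(speedup(16, 8, overhead), 3))
# 4.5 1.64
# 5.0 1.615
# 5.5 1.593
```

The fixed flip-flop cost is the whole story here: the smaller the logic budget per cycle, the larger the fraction of each cycle that goes to clocking rather than useful work.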
Source: Real World Technologies forums, thread "Holy Grail computer architecture quests" (www.realworldtech.com)