Info Some interesting tidbits about AMD's cancelled K9 CPU

NTMBK

Lifer
Nov 14, 2011
10,377
5,517
136
A former AMD chief architect has been posting some interesting bits and pieces:

K9 was an AMD design (mine) that was targeting 5 GHz frequency in 65nm process.
To get to 5GHz, we had to use no more than 8 gates of logic per clock cycle.
This, in turn, mandated a 3 cycle register file read; basically: 1 cycle to
drive the renamed register into the decoder, 1 cycle to assert the select line
across the data path, and 1 cycle to read and fire the sense amplifier.

Oh, and BTW, it had 14 read ports every cycle. ...

A consequence of what we learned in K9 was that when you have an N-stage
pipeline of K-gates per cycle and you want to (about) double the clock
frequency, instead of ending up with a 2×N stage pipeline of K/2 gates
per cycle, you end up with 2.5×N and K/2 gates per stage.

The above is Mitch's 2nd law of pipelining.
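
To make the rule of thumb above concrete, here is a minimal sketch comparing the naive expectation (2×N stages) with the quoted outcome (2.5×N stages). The function name and the 12-stage, 16-gate starting point are illustrative assumptions, not figures from the K9 design.

```python
def deeper_pipeline(n_stages, gates_per_stage, stage_factor=2.5):
    """Compare naive and quoted stage counts when roughly doubling the clock.

    stage_factor=2.5 is the multiplier quoted in the post; the naive
    expectation would be 2.0. Returns two (stages, gates_per_stage) tuples.
    """
    halved_gates = gates_per_stage / 2
    naive = (2 * n_stages, halved_gates)
    quoted = (stage_factor * n_stages, halved_gates)
    return naive, quoted


# Hypothetical starting point: a 12-stage pipeline with 16 gates per stage.
naive, quoted = deeper_pipeline(12, 16)
print("naive :", naive)
print("quoted:", quoted)
print("extra stages vs. naive:", quoted[0] - naive[0])
```

The extra half-N of stages is the cost the post describes: the logic does not split cleanly in two, so the pipeline grows faster than the clock.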

A note on frequency: in advanced processes, even when the clock tree is
exquisitely engineered*, your flip-flops have 4.5-to-5.5× gates of delay.
So, a 16 gate machine (Athlon) is operating at 21 gates per cycle: 16
logic gates and 5 clock gates. So the 8-gate machine is operating at
(16+5)/(8+5) = 1.6× faster.
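
The arithmetic in that note can be checked with a short sketch. It assumes a flat 5 gates of flip-flop overhead (the midpoint of the quoted 4.5 to 5.5 range); the function names are made up for illustration, and the per-gate delay at 5 GHz is simply derived from those numbers, not a figure given in the post.

```python
FLOP_OVERHEAD_GATES = 5  # assumed midpoint of the quoted 4.5-to-5.5 range


def effective_gates(logic_gates, flop_overhead=FLOP_OVERHEAD_GATES):
    """Gate delays per cycle once flip-flop overhead is included."""
    return logic_gates + flop_overhead


def speedup(old_logic_gates, new_logic_gates, flop_overhead=FLOP_OVERHEAD_GATES):
    """Clock-frequency ratio between two designs on the same process."""
    old = effective_gates(old_logic_gates, flop_overhead)
    new = effective_gates(new_logic_gates, flop_overhead)
    return old / new


athlon = effective_gates(16)  # 16 logic + 5 clock gates
k9 = effective_gates(8)       # 8 logic + 5 clock gates
print(f"Athlon: {athlon} gates/cycle, K9: {k9} gates/cycle")
print(f"speedup: {speedup(16, 8):.2f}x")

# Implied budget per gate delay at the 5 GHz target (derived, not quoted).
period_ps = 1000.0 / 5.0  # 5 GHz -> 200 ps per cycle
print(f"per-gate budget at 5 GHz: {period_ps / k9:.1f} ps")
```

Halving the logic depth gives well under 2× the clock because the fixed flip-flop overhead does not shrink with it.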


Interesting to think what might have been, if they'd chased this speed demon instead of going to multicore versions of Opteron.
 

zir_blazer

Golden Member
Jun 6, 2013
1,206
502
136
Real World Technologies' forum interface, with its tree of branching posts, looks more archaic than the traditional forums that I know and closer to a mailing list archive. Between that and the fact that the poster has an aol.com address, you could have fooled me into believing that this was necroed from the turn of the millennium if it wasn't dated 31 December 2024. Yet I was always curious about why you have Linus Torvalds and other gurus posting on RWT rather than, say, Hacker News, which is where the current crop seems to lurk. Go figure...

Interesting to think what might have been, if they'd chased this speed demon instead of going to multicore versions of Opteron.
Don't see the relationship here. They would eventually have done multicore whatever architecture they were using, since both AMD and Intel took what they were already doing (K8 and Prescott) when they began to make dual cores. I think Conroe was the first with internal shared resources, since both cores shared the L2 cache, but AMD's dual cores were pretty much two K8s with a crossbar for memory controller/HyperTransport access, and Intel either sawed two contiguous Prescott dies (Smithfield) or put two separate dies on the same MCM package (Presler) and pretty much implemented two independent buses to the chipset on the same socket.
If anything, the question is how the K9 would have compared to other similar long-pipeline, high-clock-speed designs like Willamette, Northwood, Prescott, Tejas, and why not, Bulldozer.
 
Jul 27, 2020
22,306
15,561
146
I had a class on digital logic design. It seemed easy enough at the start but got complicated really fast. I'm grateful that I don't have to do this for a living. But as a hobby, I guess it's a nice way to keep brain cells functioning at peak capacity.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,788
1,277
136
Interesting to think what might have been, if they'd chased this speed demon instead of going to multicore versions of Opteron.
By the time it was approaching, K9 was going to be multicore.

Single-core 3 GHz (K8) -> Dual Core 3 GHz (K8) -> Dual-core 5 GHz (K9) -> Quad-core 9 GHz (K10).

The correct path was going all in with Bobcat. Which was Keller's K8 project, rather than Weber's K8 project.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,788
1,277
136
Were the architects of that time smoking something or was there clear evidence that such high speeds were attainable at that time?
There was clear evidence that such high speeds were attainable at that time. Since, low-voltage-swing design and domino gates were still used on IBM's PDSOI to 22nm. It is likely that the only thing that blew up K9 was that it also had high-IPC design methodology. Which blew up its power budget. If they wanted to beat Pentium 4's clocks/perf, they didn't need to go wider than K8.
 

Doug S

Diamond Member
Feb 8, 2020
3,005
5,168
136
Real World Technologies' forum interface, with its tree of branching posts, looks more archaic than the traditional forums that I know and closer to a mailing list archive. Between that and the fact that the poster has an aol.com address, you could have fooled me into believing that this was necroed from the turn of the millennium if it wasn't dated 31 December 2024. Yet I was always curious about why you have Linus Torvalds and other gurus posting on RWT rather than, say, Hacker News, which is where the current crop seems to lurk. Go figure...

Well I can only speak for myself (and I'm DEFINITELY not a guru!) but I think the reason a lot of us have been on RWT so long is because it has a pretty high S/N ratio. Not as good as it did back in the day, but what does? And I suspect the archaic interface might have something to do with it - most millennials let alone Gen Z would take one look at it and run off screaming. For us Gen Xers and older it bears a comforting resemblance to Usenet threading from our early days on the net, perhaps.

Not that people under 40 can't contribute just as well as older posters but between the subject matter often dealing with stuff older than they are (heck sometimes older than I am, and I'm on the older end of Gen X) and the interface one step above a green screen terminal it probably just holds little interest for most, and with its small audience, little interest for spammers.

I would say there's easily under 100 active posters - and "active" counts people like Linus who might show up and post multiple times a day for several days in a row on threads that hold interest for him, then nothing for weeks. So despite having the most rudimentary spam "protection" you can imagine it isn't really a problem, nor is impersonation even though nothing stops anyone from posting as Linus or Mitch or Maynard or myself since there is no login. But everyone has their own writing style, and the longtime posters know each other's stances on certain things so even if AI could help someone ape my style (being overly comfortable with starting run on sentences with 'and' and 'but', and using parentheses way too much) if it took an opposite position to one I've been pushing in posts for years people are gonna know it isn't me.
 

Doug S

Diamond Member
Feb 8, 2020
3,005
5,168
136
For us Gen Xers and older it bears a comforting resemblance to Usenet threading from our early days on the net, perhaps.

I should add, a lot of us used to frequent comp.arch on Usenet back in the 90s. Sadly the giants that frequented that group have mostly retired or otherwise moved on but I suspect those of us who have been on David's little board the longest are mostly refugees from when Google dealt the death blow to Usenet with Google Groups.
 

NTMBK

Lifer
Nov 14, 2011
10,377
5,517
136
There was clear evidence that such high speeds were attainable at that time. Since, low-voltage-swing design and domino gates were still used on IBM's PDSOI to 22nm. It is likely that the only thing that blew up K9 was that it also had high-IPC design methodology. Which blew up its power budget. If they wanted to beat Pentium 4's clocks/perf, they didn't need to go wider than K8.
From elsewhere in that thread, it seems they were targeting roughly the same IPC as Opteron: https://www.realworldtech.com/forum/?threadid=221600&curpostid=222252
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,557
2,218
136
Intel either sawed two contiguous Prescott dies (Smithfield) or put two separate dies on the same MCM package (Presler) and pretty much implemented two independent buses to the chipset on the same socket.

Not even that. The FSB Intel used back then was a single shared bus. You could attach multiple agents into the single physical connection, and they would co-ordinate access when collisions happened. A single-core system just has the north bridge and a single cpu on it, their early dual cores were just two cpu agents on a single chip.

Were the architects of that time smoking something or was there clear evidence that such high speeds were attainable at that time?

Frequencies had continuously scaled like that for two decades. If you had bet that you can't double frequency every two generations at any time in the past twenty years, you would have lost. There was nothing weird about believing you could scale from 500MHz to 1GHz, or from 1GHz to 2GHz, or from 2GHz to 4GHz, or from 4GHz to 8GHz.

In the real world all exponential trends are eventually logistic, but when you are living in one, you don't know you are about to hit the transition until you do.
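
A rough sketch of that point: a saturating (logistic-style) curve tracks a pure exponential closely at first and only later bends toward a ceiling. Every number here is made up for illustration (0.5 GHz start, one doubling per 2 time units, a 5 GHz ceiling); none of it is a model of real process scaling.

```python
import math


def exponential(t, f0=0.5, doubling_time=2.0):
    """Frequency (GHz) if doubling continued forever."""
    return f0 * 2 ** (t / doubling_time)


def logistic(t, f0=0.5, doubling_time=2.0, ceiling=5.0):
    """Same initial growth rate, but saturating at a hypothetical ceiling."""
    rate = math.log(2) / doubling_time
    return ceiling / (1 + (ceiling / f0 - 1) * math.exp(-rate * t))


# Print both curves side by side; early rows are close, later rows diverge.
for t in range(0, 13, 2):
    print(f"t={t:2d}  exponential={exponential(t):6.2f} GHz  "
          f"saturating={logistic(t):5.2f} GHz")
```

From inside the early part of the curve the two are hard to tell apart, which is the situation the post describes.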
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,788
1,277
136
From elsewhere in that thread, it seems they were targeting roughly the same IPC as Opteron: https://www.realworldtech.com/forum/?threadid=221600&curpostid=222252
[Our simulations indicated that we were IPC competitive with Opteron while running at 1.8× the clock speed; (unlike the IPC drop of P6[8]*).]
Wikichip:
[According to Alsup, it was designed to be close to 95% of original K8 IPC but reach 5GHz frequency in a 65 nm process.]
[The K9 pipeline was dual-quad issue. It was described by Alsup as: "K9 fetched 8 instructions every other cycle and made 2 branch predictions associated with 3 next fetch addresses every other cycle. K9 issued 4 instructions per cycle and took 2 cycles to issue a fetch width."]

~95% IPC + 5 GHz capability, but has a huge front-end relative to prior designs. K8 = 3-wide issue, K9 = 4-wide issue. Which generally always has wider OoO resources attached to it.

K9 likely had the more efficient retire/rename/scheduler setup.
Where K8 had "the maximum reordering depth is 24 integer macro-operations" * 3 to get to 72, to 3 * 8-entry schedulers.
Where K9 would likely have "the maximum reordering depth is 128 integer macro-operations" for example, with large discrete ALU/Memory schedulers.

If AMD brute-forced the launch of K9, it would probably have come out having a higher effective IPC compared to K8. As with Bobcat(14h) vs Lion(11h) having a unified retire/rename of 56-entries + 16-entry ALU scheduler/10+22 Memory Queues is better than having a split-retire/rename of 24*3 + 8*3 Schedulers setup.
 

Thibsie

Senior member
Apr 25, 2017
973
1,122
136
I should add, a lot of us used to frequent comp.arch on Usenet back in the 90s. Sadly the giants that frequented that group have mostly retired or otherwise moved on but I suspect those of us who have been on David's little board the longest are mostly refugees from when Google dealt the death blow to Usenet with Google Groups.

Still regret Aceshardware
 