AMD post-Bulldozer x86 CPU architecture

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
407
126
So we know AMD is designing a new non-Bulldozer-based x86 uArch CPU.

http://www.computing.co.uk/ctg/news...re-for-2015-launch-under-chip-guru-jim-keller

"AMD is planning to launch a new x86 micro-architecture before the end of next year in a development led by Jim Keller, the lead developer behind the K7 and K8 AMD micro-architectures.
[...]
Details of the new micro-architecture will be unveiled during 2015, with parts expected to appear in 2016. Bulldozer will make its final appearance in the form of the Excavator cores that will appear in the 2014 accelerated processing units code-named Carrizo and Toronto."


I.e. Carrizo and Toronto will be the last Bulldozer-based big core AMD x86 CPUs.

So what can we expect from the next AMD x86 big core CPU generation? If there should be any point in designing a new uArch generation from scratch, doesn't it have to be quite a lot better than the previous Bulldozer-based one?

And if so, is it likely to catch up with (or come very close to) the Intel x86 CPUs? Intel has not announced any similar completely new x86 architecture, so if we assume the average ~5% CPU performance increase per year from Intel, isn't there a chance AMD will catch up?
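
To put rough numbers on that question, here is a minimal sketch of the compounding involved. Only the ~5% per year figure comes from the post above. The starting gaps, the two-year horizon, and the function name `required_uplift` are hypothetical values chosen for illustration, not measurements.

```python
def required_uplift(gap, intel_growth=0.05, years=2):
    """One-time performance multiplier AMD would need to match Intel after `years`.

    gap          -- Intel's lead today as a ratio (1.5 means 50% faster); hypothetical
    intel_growth -- assumed yearly Intel improvement (~5% per the post above)
    years        -- assumed time until the new core ships; hypothetical
    """
    # Intel's lead keeps compounding while the new core is in development.
    return gap * (1 + intel_growth) ** years


if __name__ == "__main__":
    for gap in (1.2, 1.5, 1.8):  # hypothetical starting gaps
        print(f"gap {gap:.1f}x today -> need {required_uplift(gap):.2f}x at launch")
```

The point is only that a moving target adds about 10% over two years on top of whatever the gap is today, so the size of the starting gap dominates the answer.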
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
I am confident that AMD could catch up in IPC in no time. Bulldozer just wasn't designed in the same manner as Intel's cores; they designed it for high core count and clocks...
a design for a future that isn't here yet.

If they can design the tiny Jaguar core to have ~80-90% of Steamroller core IPC, then they must be able to design a big core with more IPC that isn't Steamroller.
 

SAAA

Senior member
May 14, 2014
541
126
116
If they can design the tiny Jaguar core to have ~80-90% of Steamroller core IPC, then they must be able to design a big core with more IPC that isn't Steamroller.

That's indeed good proof of how messed up the design was from the start, not because it was totally bad, but simply because it makes no sense for a core that large to reach higher performance just by clocks. That's a route where Intel failed long ago, and if the same story as Intel's Core architecture happens for AMD we should finally have a decent competitor.
I wonder if they can do it with much less funding, though...
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
That's indeed good proof of how messed up the design was from the start, not because it was totally bad, but simply because it makes no sense for a core that large to reach higher performance just by clocks. That's a route where Intel failed long ago, and if the same story as Intel's Core architecture happens for AMD we should finally have a decent competitor.
I wonder if they can do it with much less funding, though...

I don't think funds are a problem; they have been reacquiring some pros who jumped ship a while back, like Keller.
 

Centauri

Golden Member
Dec 10, 2002
1,655
51
91
Anybody with half a brain would have gobbled up some AMD stock as soon as they started rehiring talent from the golden years, like Keller.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
Bulldozer:
32 byte Fetch Window(Core A and Core B 2*16B or Core A or B 1*32B)
Pick(Predecode) buffer
16 byte Decode(Up to 4 macro-ops; 8 micro-ops)

Steamroller:
64(2*32) byte Fetch Window(Core A and Core B 2*32B or Core A or B 1*64B)
Pick(Predecode) buffer
32 byte Decode(Up to 2*4 macro-ops; 2*8 micro-ops)

Jaguar:
32 byte Fetch Window(Core A 1*32B)
Pick(Predecode) buffer
16 byte Decode(Up to 4 macro-ops; 8 micro-ops)

We haven't seen the fully enabled Steamroller module, so I wouldn't judge.

My speculation:
The next x86-64 µarch and K12 will be using a more complex and modularized CMT.
 

Sequences

Member
Nov 27, 2012
124
0
76
We haven't seen the fully enabled Steamroller module, so I wouldn't judge.

I don't quite understand what this means, can you explain it? How do you know the current Steamroller modules aren't "fully enabled" and in terms of performance, how far from "fully enabled" are current Steamroller modules? Are we likely to see a complete Steamroller release?
 

SAAA

Senior member
May 14, 2014
541
126
116
It should be about the half-disabled module he has been speculating about since Steamroller was released. I'm not certain how much of this is dream or reality, given Kaveri's huge increase in transistor count, the chip having other disabled parts (GDDR5 controller) and so on. But twice the resources is far too much to be true. Maybe they are disabling some CUs inside for yield (and the poor die shot image doesn't help in counting them), but half the core...
 

jpiniero

Lifer
Oct 1, 2010
14,842
5,457
136
Anybody with half a brain would have gobbled up some AMD stock as soon as they started rehiring talent from the golden years, like Keller.

Except they have not much going on in mobile... and that's where the money is going. Maybe the K12 will change that, but it's still a stretch. I would not expect much out of this x86 processor.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Except they have not much going on in mobile... and that's where the money is going. Maybe the K12 will change that, but it's still a stretch. I would not expect much out of this x86 processor.

Ask TI how "mobile" is going for them...
There isn't any money in mobile for newcomers without large contracts.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
407
126
Except they have not much going on in mobile... and that's where the money is going. Maybe the K12 will change that, but it's still a stretch. I would not expect much out of this x86 processor.

Why would they spend R&D money on creating a completely new non-Bulldozer based architecture, if it's not quite a lot better than the existing Bulldozer-based one?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
I don't quite understand what this means, can you explain it?
Simple: the front-end's bandwidth does not match the execution units' bandwidth.
How do you know the current Steamroller modules aren't "fully enabled" and in terms of performance, how far from "fully enabled" are current Steamroller modules?
- Branch Predictor is fully enabled.
- L1 instruction cache is fully enabled
- Instruction Fetch is fully enabled
- Pick(Predecode) Buffer is fully enabled
- Decode is fully enabled.
- Dispatch(Postdecode) buffer is over half enabled. (2 * 20 entries vs 2 * 32 entries)
- Integer/Memory scheduler is over half enabled. (2 * 48 entries vs 2 * 80 entries)
- Integer/Memory registers is over half enabled. (2 * 112 entries vs 2 * 192 entries)
- Floating Point scheduler is half enabled. (1 * 60 entries vs 1 * 120 entries)
- Floating Point registers is half enabled to over half enabled. (176 entries vs 352 to 320 entries)
- Integer/Memory/Floating Point logic is half enabled. (GP core has half of the ALUs and AGLUs enabled, half of the bit width for the floating point unit is disabled.)
- Write Coalescing Cache has been improved but those improvements are disabled.
- Most units are about half disabled.
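
As a quick sanity check on the "half" and "over half" labels in the list above, the ratios can be computed directly. This is only a sketch: the entry counts are copied from the list as claimed (they are the poster's speculation, not confirmed figures), and the dictionary `CLAIMED` and the function `enabled_fraction` are names made up for this example.

```python
# (claimed enabled entries, claimed full-design entries), taken from the list above.
# These are speculative numbers from the post, not verified hardware figures.
CLAIMED = {
    "Dispatch buffer (per core)": (20, 32),
    "Integer/Memory scheduler (per core)": (48, 80),
    "Integer/Memory registers (per core)": (112, 192),
    "Floating Point scheduler": (60, 120),
    "Floating Point registers (vs 352)": (176, 352),
    "Floating Point registers (vs 320)": (176, 320),
}


def enabled_fraction(enabled, full):
    """Fraction of the claimed full-size structure that is said to be enabled."""
    return enabled / full


if __name__ == "__main__":
    for name, (enabled, full) in CLAIMED.items():
        print(f"{name}: {enabled_fraction(enabled, full):.1%}")
```

By these numbers the integer-side structures would sit at roughly 58-63% of the claimed full size and the floating point structures at 50-55%.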
Are we likely to see a complete Steamroller release?
With the information I gathered the complete Steamroller design will be Mobile and Dense Server, only.

--
Note that the Kaveri mobile launching soon has no xx5xs SKUs or any SKUs higher than 7600.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
I am confident that AMD could catch up in IPC in no time. Bulldozer just wasn't designed in the same manner as Intel's cores; they designed it for high core count and clocks...
a design for a future that isn't here yet.

If they can design the tiny Jaguar core to have ~80-90% of Steamroller core IPC, then they must be able to design a big core with more IPC that isn't Steamroller.

To get the IPC that Intel has, doesn't AMD need better cache and a wider core?

Two areas they have not demonstrated themselves as having the technical expertise to accomplish to date?

It is one thing to identify the problem, quite another to actually be able to rectify it.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I don't quite understand what this means, can you explain it? How do you know the current Steamroller modules aren't "fully enabled" and in terms of performance, how far from "fully enabled" are current Steamroller modules? Are we likely to see a complete Steamroller release?

He thinks that Steamroller was originally going to have twice the IPC and half the clock speed, then at the last minute AMD disabled half the stuff in the module. This is all based on that leaked die shot that he somehow thinks is Steamroller for some reason.

In other words, he makes up crazy theories based on zero evidence and talks about them like they're fact. You should ignore him.
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
To get the IPC that Intel has, doesn't AMD need better cache and a wider core?

Two areas they have not demonstrated themselves as having the technical expertise to accomplish to date?

It is one thing to identify the problem, quite another to actually be able to rectify it.

I'll give you the cache one, but why do you think they can't effectively double the width of the execution units?

I would rather add their branch predictor to the "things that AMD has yet to be able to correct" than the core's width. They at least could accomplish a leaner core that achieves the same IPC with fewer execution units than K10. I think now it's time to see a 2x wider core from AMD with Carrizo (if the lagging process node allows it, at least at decent clocks).
 
Aug 11, 2008
10,451
642
126
Why would they spend R&D money on creating a completely new non-Bulldozer based architecture, if it's not quite a lot better than the existing Bulldozer-based one?

The same reason I guess that they brought out bulldozer when it was barely faster than Phenom. Seriously, all we have now is speculation, I would not be raising any great expectations just yet.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
He thinks that Steamroller was originally going to have twice the IPC and half the clock speed, then at the last minute AMD disabled half the stuff in the module.
IPC is a finicky number in relationship to Superscalar/Vector architectures. You can have a design with 8 instructions being executed while having the same performance as another design that only needs 1 instruction being executed.

The double in "IPC" comes from the 32B decode window for each individual decoder.
The additional instruction decoder increases the instruction decode capacity to eight macro instructions per clock cycle, providing up to twice the decode and dispatch bandwidth compared to models 00h–1Fh.
Bulldozer/Piledriver;
Cycle A; 16B Decode for Core A, up to 4 macro-ops
Cycle B; 16B Decode for Core B, up to 4 macro-ops
up to 4 macro-ops per core in a "whole" decode partition.

Steamroller/Excavator
Cycle A; 16Bs0p0 Decode for Core A and 16Bs0p1 Decode for Core B, up to 8 macro-ops.
Cycle B; 16Bs1p0 Decode for Core A and 16Bs1p1 Decode for Core B, up to 8 macro-ops.
up to 8 macro-ops per core in a "whole" decode partition.

Jaguar/Puma
Cycle A; 8Bs0 Decode for Core A, up to 2 macro-ops.
Cycle B; 8Bs1 Decode for Core A, up to 2 macro-ops.
up to 4 macro-ops per core in a "whole" decode partition.
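
The two-cycle tally above can be written out as a small sketch. The per-cycle numbers are exactly the ones claimed in the listing; the table `SCHEDULE` and the helper names are hypothetical, and this models only the decode slots as described here, not real hardware behavior.

```python
# Macro-ops decoded for ONE core in (cycle A, cycle B), as claimed in the listing above.
SCHEDULE = {
    # The shared 4-wide decoder alternates cores, so a core is served every other cycle.
    "Bulldozer/Piledriver": (4, 0),
    # Each core is served with 4 macro-ops in both cycles.
    "Steamroller/Excavator": (4, 4),
    # A single core is served with 2 macro-ops per cycle.
    "Jaguar/Puma": (2, 2),
}


def per_core_window(design):
    """Macro-ops one core can receive over the two-cycle decode window."""
    return sum(SCHEDULE[design])


def per_core_per_cycle(design):
    """Average macro-ops per cycle for one core over that window."""
    return per_core_window(design) / len(SCHEDULE[design])


if __name__ == "__main__":
    for design in SCHEDULE:
        print(design, per_core_window(design), per_core_per_cycle(design))
```

Under these assumptions the Steamroller/Excavator window is twice the Bulldozer/Piledriver one for a single core, which is where the "double" figure in this post comes from.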

The decision to disable the logic might not have been a last-minute decision. It might have been a more strategic decision to increase the yields of the products. Not all of the improvements were disabled, which implies there is more logic to be seen.
This is all based on that leaked die shot that he somehow thinks is Steamroller for some reason.
The leaked die shot is an early hand-designed Excavator that would lead to SteamrollerB. The 2011 and 2012 SteamrollerA and Excavator were designed by hand before switching to the compiled-synthetic SteamrollerB.

KaveriA - 2012 Tapeout TSMC28HP
CarrizoA - 2013 Tapeout GF28HP
KaveriB - 2013 Tapeout GF28SHP
CarrizoB - 2014 Tapeout GF20LPM

In other words, he makes up crazy theories based on zero evidence and talks about them like they're fact. You should ignore him.
I used the leaked die shot as evidence, which means I had evidence.

You are also ignoring that I'm pulling information from:
Software Optimization Guide
AMD64 Programmer's Manual
SteamrollerB slides

All of which is evidence showing that AMD increased the Fetch/Dispatch/Retire by two times over Bulldozer/Piledriver, while seemingly not increasing the execution capabilities with it.
 
Mar 10, 2006
11,715
2,012
126
Anybody with half a brain would have gobbled up some AMD stock as soon as they started rehiring talent from the golden years, like Keller.

Anybody who bought AMD stock the day Jim Keller moved to AMD saw the stock price cut in more than 1/2 before just now returning to breakeven ;-)
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
As Jim Keller recently said, they took the best things from both the Bulldozer and Cat architectures. Jim Keller specifically said they know how to use high frequency IC, so I'm expecting a high frequency design rather than a high IPC, low frequency one. Also, the Cat family exhibits high IPC and high performance per watt (efficiency), so the combination could be very effective.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Well, assuming AMD has rebuilt most of the engineering know-how they lost to Samsung and Apple, they could definitely put out a competitive big core. Certainly there is room to improve even given the node differential between GF and Intel. I don't do the LinkedIn digging that some do, though, and have no idea who else besides Keller AMD has rehired or freshly hired.

Hopefully they've added some cache expertise because that's been one of their weakest areas for a long time.
 