AMD post-Bulldozer x86 CPU architecture


AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Thanks, you are right. I never realized Intel released a separate die for 4C SB-E.

My point still stands that this is not the CPU to compare to the 8350 due to the massive cache and quad channel memory support. Not when identically performing and cheaper 4 core alternatives are available.

FX8350
4x Modules , 8 Cores/Threads
Die size = 315mm2
Transistor count = 1.2B

L2 Cache = 4x 2MB = 8MB
L3 Cache = 4x 2MB = 8MB

Memory Controller = 2x 72bit
HyperTransport = 4x 16-16bit


Core i7 3820
4 Cores + HT , 8 Threads
Die size = 297mm2
Transistor Count = 1.27B

L2 Cache = 4x 256KB = 1MB
L3 Cache = 4x 2.5MB = 10MB

Memory Controller = 4x 72bit

Those two are made for servers, not desktop. The Bulldozer die has more cache in total: 16MB of L2 and L3, while Sandy Bridge-E has 11MB of L2 and L3. Also, Bulldozer has 4x HyperTransport links (three are disabled on the desktop) and SB-E has quad-channel memory.
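The cache and die figures listed above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this post, not measured data; the dictionary layout and the `summarize` helper are made up for illustration. Note that 4x 256KB comes to 1MB, which puts the Sandy Bridge-E total at 11MB.

```python
# All figures are the ones quoted in the post; nothing here is measured.
chips = {
    "FX-8350": {
        "l2_mb": 4 * 2.0, "l3_mb": 4 * 2.0,
        "die_mm2": 315, "transistors_b": 1.2,
    },
    "Core i7-3820": {
        "l2_mb": 4 * 0.25, "l3_mb": 4 * 2.5,
        "die_mm2": 297, "transistors_b": 1.27,
    },
}


def summarize(spec):
    """Return (total L2+L3 in MB, million transistors per mm2)."""
    total_cache_mb = spec["l2_mb"] + spec["l3_mb"]
    density = spec["transistors_b"] * 1000 / spec["die_mm2"]
    return total_cache_mb, density


for name, spec in chips.items():
    total_cache_mb, density = summarize(spec)
    print(f"{name}: {total_cache_mb:.0f}MB L2+L3, {density:.2f} MTr/mm2")
```

On these numbers the two dies land within roughly 10-15% of each other in both area and transistor density, which is the sense in which they are comparable parts.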

If you compare those two in servers (and on the desktop) you will see that they are very competitive against each other. One is better here, the other is better there, but you cannot say BD or SB-E is an all-around better solution than the other. And you certainly cannot say that Bulldozer is a loser, as some people have kept preaching for the past 2+ years.

Now if you remove some of the Cache and the HyperTransports (Trinity only has L2 Cache and no HyperTransports) you end up with a much smaller die that has 95% of the performance. A Quad Module 8 Core Trinity would be close to 180-200mm2 including the PCH. They could actually make a Quad Module 8 Core Trinity + iGPU and still be smaller than Bulldozer die.
Just to remind you that Core i7 2600K is 216mm2 (including an anemic iGPU).
 

coercitiv

Diamond Member
Jan 24, 2014
6,400
12,859
136
A Quad Module 8 Core Trinity would be close to 180-200mm2 including the PCH. They could actually make a Quad Module 8 Core Trinity + iGPU and still be smaller than Bulldozer die.
Kaveri die area is 245mm2, of which 47% is GPU (number from AMD). The CPU + cache area is around 70mm2 (approximation using a die picture).

So, a 4 module Kaveri (corrected from 2) with no GPU would be around 130+70 ~ 200mm2
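The estimate above can be reproduced as follows. This is a rough sketch using only the numbers in this post, and it assumes the non-CPU, non-GPU part of the die (memory controller, I/O and so on) stays the same size when the module count doubles; the function name is made up for illustration.

```python
KAVERI_DIE_MM2 = 245.0      # total die area quoted in the post
GPU_FRACTION = 0.47         # GPU share of the die (number attributed to AMD)
TWO_MODULE_CPU_MM2 = 70.0   # CPU + cache area, estimated from a die picture


def four_module_no_gpu_area(die_mm2, gpu_fraction, two_module_mm2):
    """Estimate the area of a 4-module part with the GPU removed."""
    # Everything that is not GPU: two modules plus uncore.
    non_gpu_mm2 = die_mm2 * (1.0 - gpu_fraction)
    # Assumption: adding two more modules costs another ~70mm2
    # and nothing else on the die has to grow.
    return non_gpu_mm2 + two_module_mm2


estimate = four_module_no_gpu_area(KAVERI_DIE_MM2, GPU_FRACTION, TWO_MODULE_CPU_MM2)
print(f"Estimated 4-module, no-GPU die: {estimate:.0f}mm2")
```

The non-GPU part works out to about 130mm2, so the total comes to roughly 200mm2.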
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Kaveri die area is 245mm2, of which 47% is GPU (number from AMD). The CPU + cache area is around 70mm2 (approximation using a die picture).

So, a 2 module Kaveri with no GPU would be around 130+70 ~ 200mm2

You mean a 4(quad) Module would be close to 200mm2. Actually a single Module + L2 = ~30mm2
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
How did AMD agree? They just can't do it. We've already seen cheap cores introduced years ago. Now their offerings are APUs which are anemic with system RAM. When there's little competition, they can easily cut features, etc. to segment the market. What do you think of the internet providers? There are people receiving packages as low as or lower than 1 megabit. We're falling behind on that to the rest of the world.

Just to preempt, it's not because the software isn't there. Who is going to develop for more than two cores if many people have bought Intel's two core offerings?

You make it sound like it's so easy to code for more threads that the only thing holding us back is CPU core counts. You couldn't be more wrong.

AMD, like Intel, has acknowledged that the vast majority of users want a mix of cores and GPU. In AMD's case, 47% of the die is now devoted to the GPU. I am sure they could make a 6 core version, for example, with a slower GPU. But in short, there is no market for it. The same goes for Intel: the market is so small that it is served by spinoffs from the server segment.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
FX8350
4x Modules , 8 Cores/Threads
Die size = 315mm2
Transistor count = 1.2B

L2 Cache = 4x 2MB = 8MB
L3 Cache = 4x 2MB = 8MB

Memory Controller = 2x 72bit
HyperTransport = 4x 16-16bit


Core i7 3820
4 Cores + HT , 8 Threads
Die size = 297mm2
Transistor Count = 1.27B

L2 Cache = 4x 256KB = 1MB
L3 Cache = 4x 2.5MB = 10MB

Memory Controller = 4x 72bit

Those two are made for servers, not desktop. The Bulldozer die has more cache in total: 16MB of L2 and L3, while Sandy Bridge-E has 11MB of L2 and L3. Also, Bulldozer has 4x HyperTransport links (three are disabled on the desktop) and SB-E has quad-channel memory.

If you compare those two in servers (and on the desktop) you will see that they are very competitive against each other. One is better here, the other is better there, but you cannot say BD or SB-E is an all-around better solution than the other. And you certainly cannot say that Bulldozer is a loser, as some people have kept preaching for the past 2+ years.

Now if you remove some of the Cache and the HyperTransports (Trinity only has L2 Cache and no HyperTransports) you end up with a much smaller die that has 95% of the performance. A Quad Module 8 Core Trinity would be close to 180-200mm2 including the PCH. They could actually make a Quad Module 8 Core Trinity + iGPU and still be smaller than Bulldozer die.
Just to remind you that Core i7 2600K is 216mm2 (including an anemic iGPU).

I'm just saying that your comparison makes little sense. As a business you would only buy an i7 3820 if you needed quad channel support, as the LGA 2011 platform is more expensive. You are picking an odd duck CPU for comparison; why not simply use an E3 Xeon? You lose 2MB of cache and quad channel support, but this will make little difference in real world applications.
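For reference, the theoretical gap between dual and quad channel can be sketched like this. The memory speed is an assumption (DDR3-1600 on both platforms, which is not stated anywhere in this thread) and the function name is made up for illustration. A 72-bit channel carries 64 data bits plus 8 ECC bits, so only 64 bits count toward bandwidth.

```python
DATA_BITS_PER_CHANNEL = 64  # 72-bit channel = 64 data bits + 8 ECC bits


def peak_bandwidth_gbs(channels, transfers_mt_s):
    """Theoretical peak bandwidth in GB/s for a given channel count."""
    bytes_per_transfer = DATA_BITS_PER_CHANNEL // 8
    return channels * bytes_per_transfer * transfers_mt_s / 1000.0


ASSUMED_MT_S = 1600  # hypothetical: DDR3-1600 on both platforms

dual = peak_bandwidth_gbs(2, ASSUMED_MT_S)  # 2x 72bit controller
quad = peak_bandwidth_gbs(4, ASSUMED_MT_S)  # 4x 72bit controller
print(f"dual channel: {dual:.1f} GB/s, quad channel: {quad:.1f} GB/s")
```

This is peak bandwidth on paper only. Whether a given application ever gets close to it is a separate question, which is the point being made about real world use.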

For 4 core performance LGA 2011 is not the way to go.

And as far as power goes, being 4 core has its advantages: it uses way less power than an 8150 or 8350.

 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
I'm just saying that your comparison makes little sense. As a business you would only buy an i7 3820 if you needed quad channel support, as the LGA 2011 platform is more expensive. You are picking an odd duck CPU for comparison; why not simply use an E3 Xeon? You lose 2MB of cache and quad channel support, but this will make little difference in real world applications.

As I have said before, those two are made for servers. More cache and more RAM is perfect for servers, especially 2P and 4P. Those SKUs were specifically designed for server use; they just found their way onto the desktop simply because AMD doesn't have another SKU to sell in the high-end desktop segment.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
I really hope I don't need to remind you guys that this is an AMD thread. So I'm getting tired of moving out Intel posts. We have other threads for that, please use them.
-ViRGE
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
I'm just saying that your comparison makes little sense. As a business you would only buy an i7 3820 if you needed quad channel support, as the LGA 2011 platform is more expensive. You are picking an odd duck CPU for comparison; why not simply use an E3 Xeon? You lose 2MB of cache and quad channel support, but this will make little difference in real world applications.

That CPU you are picking is a Xeon chip, used in cases when you need the highest ST performance possible and the highest memory bandwidth possible, and/or PCIe 3.0, and/or QPI links. It has features that BD/PD don't have, packs more raw performance by almost any metric you use to measure, and is more power efficient.

In any case, it has a very specific usage scenario, not the kind of scenario where you would consider *any* AMD processor. It's not even the same business case AMD makes to sell their BD/PD server chips, which is maximum possible throughput (the same case as the 8C Xeon, BTW).
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Now if you remove some of the Cache and the HyperTransports (Trinity only has L2 Cache and no HyperTransports) you end up with a much smaller die that has 95% of the performance. A Quad Module 8 Core Trinity would be close to 180-200mm2 including the PCH. They could actually make a Quad Module 8 Core Trinity + iGPU and still be smaller than Bulldozer die.
Just to remind you that Core i7 2600K is 216mm2 (including an anemic iGPU).

What's the point of making up some hypothetical CPU that doesn't exist in your battle against Intel?
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
Technical conversation. You know, one of the reasons this forum exists.

Indeed, I have always thought of a hypothetical 4 module variant without that slow and fat L3. If AMD were a company able to afford more die diversification in their product lines, it would be a really competitive product, perf/mm2 wise.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Indeed, I have always thought of a hypothetical 4 module variant without that slow and fat L3. If AMD were a company able to afford more die diversification in their product lines, it would be a really competitive product, perf/mm2 wise.

Given that the Bulldozer failure was known since end of 2010, don't you think they would have pursued that route if it was really a viable path?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Given that the Bulldozer failure was known since end of 2010, don't you think they would have pursued that route if it was really a viable path?

The road map for Steamroller was to be on 22nm SOI when Dirk Meyer was CEO. There would be a 5 Module 10 Cores Server and Desktop SKU. That design was canceled with the new administration of Rory Read.

At 22nm SOI a 10 Core 90W-125W TDP Desktop part would be more than enough to compete in throughput with a 4 Core 8 Thread Haswell. Steamroller is a very nice and efficient design; at 22nm SOI a 10 Core SKU could be smaller than Bulldozer/Vishera and much, much faster, with much better performance per watt.
Unfortunately for all of us enthusiasts, the PC market decline and Rory's new plans made that design a thing of the past. I also believe that part of the decision to cancel that design was due to GloFo's change of 22nm process and their desire to pursue the low power mobile market that was on the rise two to three years ago.

Now AMD will be without a high-End part for two more years, but at least they will make some money and have a chance to fight another day in the future.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
The road map for Steamroller was to be on 22nm SOI when Dirk Meyer was CEO. There would be a 5 Module 10 Cores Server and Desktop SKU. That design was canceled with the new administration of Rory Read.

At 22nm SOI a 10 Core 90W-125W TDP Desktop part would be more than enough to compete in throughput with a 4 Core 8 Thread Haswell. Steamroller is a very nice and efficient design; at 22nm SOI a 10 Core SKU could be smaller than Bulldozer/Vishera and much, much faster, with much better performance per watt.
Unfortunately for all of us enthusiasts, the PC market decline and Rory's new plans made that design a thing of the past. I also believe that part of the decision to cancel that design was due to GloFo's change of 22nm process and their desire to pursue the low power mobile market that was on the rise two to three years ago.

Now AMD will be without a high-End part for two more years, but at least they will make some money and have a chance to fight another day in the future.

Thanks for that explanation.

Sadly, the WSA is going to continue to handicap AMD in x86. Whatever core AMD comes up with next is going to have to run on a 14nm LP node. But, if this next core is a desktop APU, I'm sure it will offer up much better perf/watt than EX. AMD really needs to have EDRAM by then, or something better to at least stay ahead in iGPU.

Though I thought that ARM was a mistake for AMD, it could be a much better bet now, with AMD doing a custom core and having access to Samsung's 14nm node @ GF. AMD has significant experience with servers and CPU design that could help them make some inroads. 14nm FinFET could meet their throughput needs at very low power. The more I read, the more I see that a lot of the big DOTCOMs want to get off Intel - if they can. So there is reason to have some hope that AMD will continue as a multi-billion dollar enterprise.

AMD's debt obligations are a serious threat to the company and I don't know how they will manage that.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Thanks for that explanation.

Sadly, the WSA is going to continue to handicap AMD in x86. Whatever core AMD comes up with next is going to have to run on a 14nm LP node. But, if this next core is a desktop APU, I'm sure it will offer up much better perf/watt than EX. AMD really needs to have EDRAM by then, or something better to at least stay ahead in iGPU.

Though I thought that ARM was a mistake for AMD, it could be a much better bet now, with AMD doing a custom core and having access to Samsung's 14nm node @ GF. AMD has significant experience with servers and CPU design that could help them make some inroads. 14nm FinFET could meet their throughput needs at very low power. The more I read, the more I see that a lot of the big DOTCOMs want to get off Intel - if they can. So there is reason to have some hope that AMD will continue as a multi-billion dollar enterprise.

AMD's debt obligations are a serious threat to the company and I don't know how they will manage that.

I would expect HBM, not EDRAM.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
AMD needs to do something smarter with their cores. Like a big.LITTLE approach: two really big cores mated with 4 cat cores. The big cores would have double the FPU and INT ports, maybe even triple. The decoders, OoE and branch predictor would all be doubled in size as well. And most importantly, they need to cut the cache latency in half. They need to profile the crap out of JavaScript and make those instruction combinations execute extremely fast. Maybe even add a fixed function JavaScript DSP or something like that. And of course one for DirectX too. There is no reason why their CPUs shouldn't execute gaming code faster than Intel's, given their expertise in graphics. They have the ability to profile millions of lines of gaming assembly code, but they clearly aren't doing it at all.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,425
8,388
126

I don't get it.

I guess what I'm saying is, JavaScript seems to be fast enough on big cores. I'd be really interested in profiling gaming code to make sure that runs fast (although we're going to be so graphics limited at 4k that processors almost won't matter).
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
The road map for Steamroller was to be on 22nm SOI when Dirk Meyer was CEO. There would be a 5 Module 10 Cores Server and Desktop SKU. That design was canceled with the new administration of Rory Read.

At 22nm SOI a 10 Core 90W-125W TDP Desktop part would be more than enough to compete in Throughput with 4 Core 8 Threads Haswell.

But then Intel would come out with a 14nm octa core with a 150mm2 die for $156 that would blow the AMD chip away.

See how others can make up stuff too?
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
407
126
Sadly, the WSA is going to continue to handicap AMD in x86. Whatever core AMD comes up with next is going to have to run on a 14nm LP node. But, if this next core is a desktop APU, I'm sure it will offer up much better perf/watt than EX.

This WSA deal with Global Foundries keeps popping up in discussions from time to time. Just what exactly are the terms of that agreement, and won't the XBONE and PS4 chips be eating away a lot of the wafers AMD have signed up for?

How long until the WSA agreement is likely to come to an end, so AMD is free to select other options for their next-gen x86 CPUs?

And what other options for a high performance process tech are actually available, if AMD could choose freely without being bound by the WSA?
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
407
126
WSA lasts to 2024.

Ok, and what about the other issues I raised in my previous post? E.g. what are the terms, does AMD have to buy a certain number of wafers per year? Are they free to also buy wafers from other suppliers? What options exist in that case?
 