First Steamroller processor core exposure


itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
also L1i

The only thing that has moved from shared to dedicated is decode. If AMD really wanted to, they could have expanded decode and still kept it shared. By beefing up the execution resources they actually show the value of CMT. Go look at Piledriver's die shot, then look at this: we are talking about the doubling of lots of resources, but it's nowhere near double the die size.
 
Mar 10, 2006
11,715
2,012
126
also L1i

The only thing that has moved from shared to dedicated is decode. If AMD really wanted to, they could have expanded decode and still kept it shared. By beefing up the execution resources they actually show the value of CMT. Go look at Piledriver's die shot, then look at this: we are talking about the doubling of lots of resources, but it's nowhere near double the die size.

AMD may see an increase in cost per chip by increasing die size, but they will at least get far better sales than they currently do, and may even have improved pricing power.

It will be interesting to see how SR compares with Haswell.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86


Maybe this diagram is off or my interpretation but it looks like fetch and L1I are separate here.

It makes business sense to go directly for the most single-thread performance: the 8 "core" approach didn't really make a big splash in the server market, and with less need to worry about die size (GF WSA), rolling back CMT is an R&D-light way of increasing ST performance.

In another thread I did some straightforward (perfect-scaling) math: +15% IPC and +10% clocks on an FX 6300 would land just a bit behind a 3570K in ST but ~20% faster in MT. It stands to reason a 2-module version would get pretty close to a stock 3570K in both ST and MT (a bit higher clocks than the 3-module part). http://forums.anandtech.com/showthread.php?t=2321195
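
For anyone who wants to redo the perfect-scaling arithmetic, here is a minimal sketch. It only compounds the two percentages quoted above; the function name `combined_speedup` is made up for illustration, and it assumes IPC and clock gains multiply cleanly with no memory or power limits. It does not model any real benchmark.

```python
# Hypothetical helper: compound an IPC gain and a clock gain into one
# single-thread speedup factor. Assumes perfect scaling, as the post does.
def combined_speedup(ipc_gain: float, clock_gain: float) -> float:
    return (1.0 + ipc_gain) * (1.0 + clock_gain)

# +15% IPC and +10% clocks, the figures from the post.
speedup = combined_speedup(0.15, 0.10)
print(f"{speedup:.3f}x")  # 1.265x
```

So the two gains together work out to roughly a 26.5% uplift per thread under that assumption, a little more than simply adding 15 and 10.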
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
Quite a few of the structures that are individual in fetch-0 and fetch-1 have two similar structures in the Piledriver die shots, where they are next to each other. The problem is the Piledriver/Bulldozer die shots are way lower res than this die shot, so it's quite hard to compare.
 

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
Is Steamroller 970 chipset (AM3+) compatible? I haven't been keeping track of latest rumors.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Is Steamroller 970 chipset (AM3+) compatible? I haven't been keeping track of latest rumors.

There are no Steamroller AM3+ chips on any roadmaps. It seems the socket is dead in favour of FM2+. Single socket servers will use FM2+ as well.
 

del42sa

Member
May 28, 2013
65
65
91


Maybe this diagram is off or my interpretation but it looks like fetch and L1I are separate here.

It makes business sense to go directly for the most single-thread performance: the 8 "core" approach didn't really make a big splash in the server market, and with less need to worry about die size (GF WSA), rolling back CMT is an R&D-light way of increasing ST performance.

In another thread I did some straightforward (perfect-scaling) math: +15% IPC and +10% clocks on an FX 6300 would land just a bit behind a 3570K in ST but ~20% faster in MT. It stands to reason a 2-module version would get pretty close to a stock 3570K in both ST and MT (a bit higher clocks than the 3-module part). http://forums.anandtech.com/showthread.php?t=2321195


Compare the original Steamroller as presented at Hot Chips with this "new one". They don't look the same....

 

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
There are no Steamroller AM3+ chips on any roadmaps. It seems the socket is dead in favour of FM2+. Single socket servers will use FM2+ as well.

Wow. What an F U to AMD's current customers. That just seals the deal as far as my not going back to AMD for CPUs.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Compare the original Steamroller as presented at Hot Chips with this "new one". They don't look the same....


Yes, no idea if this is actually Steamroller but it does seem to have regressed more in CMT choices than the Hot Chips one.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Yes, no idea if this is actually Steamroller but it does seem to have regressed more in CMT choices than the Hot Chips one.

Slowly but surely, AMD is undoing all of the mistakes they made with Bulldozer, and admitting that CMT has too much of a single threaded performance penalty for it to be worth it.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
Slowly but surely, AMD is undoing all of the mistakes they made with Bulldozer, and admitting that CMT has too much of a single threaded performance penalty for it to be worth it.

But it doesn't have one; the only people who say that are people who can't separate what CMT is from what Bulldozer is. CMT's bottlenecks only occurred with multithreaded workloads, but even that was caused by design choices, not CMT itself.

please name one restriction that CMT imposes on single thread performance.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
But it doesn't have one; the only people who say that are people who can't separate what CMT is from what Bulldozer is. CMT's bottlenecks only occurred with multithreaded workloads, but even that was caused by design choices, not CMT itself.

please name one restriction that CMT imposes on single thread performance.

Sorry, I worded that badly. Single threaded performance suffered because they stripped out some of its integer prowess and increased the pipeline length. Multi threaded performance suffered because of this resource sharing idea, which now even AMD admits was a mistake.

It looks like, to redeem FailDozer, AMD is moving away from CMT. Less and less is being shared.
 

Arzachel

Senior member
Apr 7, 2011
903
76
91
Sorry, I worded that badly. Single threaded performance suffered because they stripped out some of its integer prowess and increased the pipeline length. Multi threaded performance suffered because of this resource sharing idea, which now even AMD admits was a mistake.

It looks like, to redeem FailDozer, AMD is moving away from CMT. Less and less is being shared.

And again, AMD is not "moving away" from CMT, they're moving to a different implementation. To have a properly inane comparison, you don't hear that Intel is moving away from x86 decode when they power down the frontend on micro op cache hits.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
And again, AMD is not "moving away" from CMT, they're moving to a different implementation. To have a properly inane comparison, you don't hear that Intel is moving away from x86 decode when they power down the frontend on micro op cache hits.

So they are sharing less between cores?

Calling it a different implementation, to me, is an attempt to not own up to the fact that it sucked, badly. Yes, it's a different implementation, and the implementation is closer to full cores than CMT, compared to FailDozer.

Read the Anandtech article on Steamroller's changes.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
and the implementation is closer to full cores than CMT, compared to FailDozer.

No, if they can execute 4 threads within a single Module.

This implementation could be like a Single Module, 2 Cores (CMT) with 4 Threads (SMT) or,
Single Module, 2 Cores 4 Threads (CMT) like BD/PD. I'm leaning towards this one.

Either way, they continue evolving the CMT design.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
No, if they can execute 4 threads within a single Module.

This implementation could be like a Single Module, 2 Cores (CMT) with 4 Threads (SMT) or,
Single Module, 2 Cores 4 Threads (CMT) like BD/PD. I'm leaning towards this one.

Either way, they continue evolving the CMT design.

Please post a link supporting this idea that a single module will execute 4 threads?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Please post a link supporting this idea that a single module will execute 4 threads?

The die pic shows 4 ALUs + 4 AGUs per Integer Core. I don’t believe that they will use 8 pipes per Thread. The utilization of all 8 pipes from a single thread will be very low, not to mention the performance gains to die area ratio used will be even lower. This implementation is surely a 4 Threads design.
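
The pipe counts behind that argument can be laid out as a rough sketch. The figures are only this thread's reading of the die shot (4 ALUs + 4 AGUs per integer core, 2 cores per module), not anything AMD has confirmed, and `pipes_per_thread` is a made-up name for illustration.

```python
# Assumed figures, taken from the die-shot reading in this thread.
ALUS_PER_CORE = 4
AGUS_PER_CORE = 4
CORES_PER_MODULE = 2

def pipes_per_thread(threads_per_module: int) -> float:
    """Integer pipes available to each thread if they are split evenly."""
    total_pipes = CORES_PER_MODULE * (ALUS_PER_CORE + AGUS_PER_CORE)
    return total_pipes / threads_per_module

# Compare a 2-thread module with a 4-thread module.
for threads in (2, 4):
    print(threads, "threads per module:", pipes_per_thread(threads), "pipes per thread")
```

With 2 threads per module each thread would own 8 pipes, which is the case argued to be poorly utilized; with 4 threads each gets 4.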
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
The die pic shows 4 ALUs + 4 AGUs per Integer Core. I don’t believe that they will use 8 pipes per Thread. The utilization of all 8 pipes from a single thread will be very low, not to mention the performance gains to die area ratio used will be even lower. This implementation is surely a 4 Threads design.

That would be contrary to everything AMD has said about Steamroller. I'll believe it when Anandtech writes a detailed article on it.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
Wow. What an F U to AMD's current customers. That just seals the deal as far as my not going back to AMD for CPUs.

I suspected this, thus my purchase of an 8350. However, in fairness, Intel does this too. It was 1366, then 1156, then 1155 and within a week 1150. Progress. And implementation of faster overall systems.

I would prefer AMD to use a new chipset with all the bells and whistles and release a Steamroller that is powerful enough that even Elvis would sing "I'm a Steamroller Baby, and I'm going to roll over you!":awe:
 

inf64

Diamond Member
Mar 11, 2011
3,763
4,221
136
Well, you cannot deny there are 4 ALUs and 4 AGUs in the die shot. And to expect that all of these resources (+ the FP ones) would be effectively utilized without some form of SMT/CMT is very optimistic. 4 threads per module sounds about right for this configuration of exec. resources.
 

del42sa

Member
May 28, 2013
65
65
91
That would be contrary to everything AMD has said about Steamroller. I'll believe it when Anandtech writes a detailed article on it.

Anandtech talked about the SR core. We don't know if this is SR or XV or anything else. Look at Jaguar. One Compute Unit (CU) has 4 cores/4 threads now.
Bulldozer brought a module with 2 INT clusters processing two threads. Evolve the idea of high throughput/thread parallelization and you'll get one module able to process four threads, CMT/SMT all the way.
 