First Steamroller processor core exposure

Riek · May 28, 2013

Vesku said:
So what's left of CMT in this, shared L2? Branch predictor?

The floating point unit and fetch as well.

itsmydamnation · May 28, 2013

also L1i

The only thing that has moved from shared to dedicated is decode. If AMD really wanted to, they could have expanded decode and still kept it shared. By beefing up the execution resources they actually show the value of CMT. Go look at Piledriver's die shot and then look at this: we are talking about the doubling of lots of resources, but it's nowhere near double the die size.

Arachnotronic · May 28, 2013

itsmydamnation said:
also L1i

The only thing that has moved from shared to dedicated is decode. If AMD really wanted to, they could have expanded decode and still kept it shared. By beefing up the execution resources they actually show the value of CMT. Go look at Piledriver's die shot and then look at this: we are talking about the doubling of lots of resources, but it's nowhere near double the die size.

AMD may see an increase in cost per chip by increasing die size, but they will at least get far better sales than they currently do, and may even have improved pricing power.

It will be interesting to see how SR compares with Haswell.

Vesku · May 28, 2013

Maybe this diagram is off, or my interpretation is, but it looks like fetch and L1I are separate here.

It makes business sense to go straight for the most single-thread performance. The 8 "core" approach didn't really make a big splash in the server market, and with less need to worry about die size (GF WSA), rolling back CMT is an R&D-light way of increasing ST performance.

In another thread I applied a straightforward (perfect scaling) +15% IPC and +10% clocks to an FX 6300, and that would be just a bit behind a 3570K in ST but ~20% faster in MT. It stands to reason a 2-module version would get pretty close to a stock 3570K in both ST and MT (with a bit higher clocks than the 3-module part). http://forums.anandtech.com/showthread.php?t=2321195
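
For anyone who wants to redo the perfect-scaling estimate: it just multiplies the IPC gain by the clock gain. Here is a minimal Python sketch. The function name is made up for illustration, and no FX 6300 or 3570K benchmark scores are included because this thread doesn't give them; plug in your own baseline if you have one.

```python
def projected_speedup(ipc_gain: float, clock_gain: float) -> float:
    """Combined speedup assuming perfect scaling (an optimistic assumption).

    Both gains are fractions, e.g. 0.15 for +15%.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


# +15% IPC and +10% clocks, the figures used in the post above
speedup = projected_speedup(0.15, 0.10)
print(f"Projected speedup over the baseline: {speedup:.3f}x")
```

Note the gains compound rather than add, so the result is a bit more than a simple +25%. Real chips rarely scale perfectly, so treat this as an upper bound.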

itsmydamnation · May 28, 2013

Quite a few of the structures that are individual in fetch-0 and fetch-1 have two similar structures sitting next to each other in the Piledriver die shots. The problem is that the Piledriver/Bulldozer die shots are much lower resolution than this one, so it's quite hard to compare.

blastingcap · May 28, 2013

Is Steamroller 970 chipset (AM3+) compatible? I haven't been keeping track of latest rumors.

ShintaiDK · May 28, 2013

blastingcap said:
Is Steamroller 970 chipset (AM3+) compatible? I haven't been keeping track of latest rumors.

There are no Steamroller AM3+ chips on any roadmaps. It seems the socket is dead in favour of FM2+. Single socket servers will use FM2+ as well.

del42sa · May 28, 2013

Vesku said:
Maybe this diagram is off, or my interpretation is, but it looks like fetch and L1I are separate here.

It makes business sense to go straight for the most single-thread performance. The 8 "core" approach didn't really make a big splash in the server market, and with less need to worry about die size (GF WSA), rolling back CMT is an R&D-light way of increasing ST performance.

In another thread I applied a straightforward (perfect scaling) +15% IPC and +10% clocks to an FX 6300, and that would be just a bit behind a 3570K in ST but ~20% faster in MT. It stands to reason a 2-module version would get pretty close to a stock 3570K in both ST and MT (with a bit higher clocks than the 3-module part). http://forums.anandtech.com/showthread.php?t=2321195

Compare the original Steamroller as presented at Hot Chips with this "new one". They don't look the same.

blastingcap · May 28, 2013

ShintaiDK said:
There are no Steamroller AM3+ chips on any roadmaps. It seems the socket is dead in favour of FM2+. Single socket servers will use FM2+ as well.

Wow. What an F U to AMD's current customers. That just seals the deal as far as my not going back to AMD for CPUs.

Vesku · May 28, 2013

del42sa said:
Compare the original Steamroller as presented at Hot Chips with this "new one". They don't look the same.

Yes, no idea if this is actually Steamroller but it does seem to have regressed more in CMT choices than the Hot Chips one.

Ancalagon44 · May 28, 2013

Vesku said:
Yes, no idea if this is actually Steamroller but it does seem to have regressed more in CMT choices than the Hot Chips one.

Slowly but surely, AMD is undoing all of the mistakes they made with Bulldozer, and admitting that CMT has too much of a single threaded performance penalty for it to be worth it.

itsmydamnation · May 28, 2013

Ancalagon44 said:
Slowly but surely, AMD is undoing all of the mistakes they made with Bulldozer, and admitting that CMT has too much of a single threaded performance penalty for it to be worth it.

But it doesn't have one. The only people who say that are people who can't separate what CMT is from what Bulldozer is. CMT's bottlenecks only occurred with multithreaded workloads, and even that was caused by design choices, not CMT itself.

please name one restriction that CMT imposes on single thread performance.

ShintaiDK · May 28, 2013

blastingcap said:
Wow. What an F U to AMD's current customers. That just seals the deal as far as my not going back to AMD for CPUs.

I don't see it as a bad thing; on the contrary, it's better for everyone and leads to better products.

Greenlepricon · May 28, 2013

ShintaiDK said:
I don't see it as a bad thing; on the contrary, it's better for everyone and leads to better products.

I agree with this. If AMD wanted to keep AM2 around to appeal to that market, they would be in a far worse mess.

Ancalagon44 · May 28, 2013

itsmydamnation said:
But it doesn't have one. The only people who say that are people who can't separate what CMT is from what Bulldozer is. CMT's bottlenecks only occurred with multithreaded workloads, and even that was caused by design choices, not CMT itself.

please name one restriction that CMT imposes on single thread performance.

Sorry, I worded that badly. Single threaded performance suffered because they stripped out some of its integer prowess and increased the pipeline length. Multi threaded performance suffered because of this resource sharing idea, which now even AMD admits was a mistake.

It looks like, to redeem FailDozer, AMD is moving away from CMT. Less and less is being shared.

Arzachel · May 28, 2013

Ancalagon44 said:
Sorry, I worded that badly. Single threaded performance suffered because they stripped out some of its integer prowess and increased the pipeline length. Multi threaded performance suffered because of this resource sharing idea, which now even AMD admits was a mistake.

It looks like, to redeem FailDozer, AMD is moving away from CMT. Less and less is being shared.

And again, AMD is not "moving away" from CMT, they're moving to a different implementation. To have a properly inane comparison, you don't hear that Intel is moving away from x86 decode when they power down the frontend on micro op cache hits.

Ancalagon44 · May 28, 2013

Arzachel said:
And again, AMD is not "moving away" from CMT, they're moving to a different implementation. To have a properly inane comparison, you don't hear that Intel is moving away from x86 decode when they power down the frontend on micro op cache hits.

So they are sharing less between cores?

Calling it a different implementation, to me, is an attempt to not own up to the fact that it sucked, badly. Yes, it's a different implementation, and the implementation is closer to full cores than CMT, compared to FailDozer.

Read the Anandtech article on Steamroller's changes.

AtenRa · May 28, 2013

Ancalagon44 said:
and the implementation is closer to full cores than CMT, compared to FailDozer.

Not if they can execute 4 threads within a single module.

This implementation could be like a Single Module, 2 Cores (CMT) with 4 Threads (SMT) or,
Single Module, 2 Cores 4 Threads (CMT) like BD/PD. I'm leaning towards this one.

Either way, they continue evolving the CMT design.

Ancalagon44 · May 28, 2013

AtenRa said:
Not if they can execute 4 threads within a single module.

This implementation could be like a Single Module, 2 Cores (CMT) with 4 Threads (SMT) or,
Single Module, 2 Cores 4 Threads (CMT) like BD/PD. I'm leaning towards this one.

Either way, they continue evolving the CMT design.

Please post a link supporting this idea that a single module will execute 4 threads?

AtenRa · May 28, 2013

Ancalagon44 said:
Please post a link supporting this idea that a single module will execute 4 threads?

The die pic shows 4 ALUs + 4 AGUs per integer core. I don't believe that they will use 8 pipes per thread. The utilization of all 8 pipes by a single thread would be very low, not to mention that the performance gain per die area would be even lower. This implementation is surely a 4-thread design.

Ancalagon44 · May 28, 2013

AtenRa said:
The die pic shows 4 ALUs + 4 AGUs per integer core. I don't believe that they will use 8 pipes per thread. The utilization of all 8 pipes by a single thread would be very low, not to mention that the performance gain per die area would be even lower. This implementation is surely a 4-thread design.

That would be contrary to everything AMD has said about Steamroller. I'll believe it when Anandtech writes a detailed article on it.

guskline · May 28, 2013

blastingcap said:
Wow. What an F U to AMD's current customers. That just seals the deal as far as my not going back to AMD for CPUs.

I suspected this, thus my purchase of an 8350. However, in fairness, Intel does this too. It was 1366, then 1156, then 1155, and within a week 1150. That's progress, and it brings faster overall systems.

I would prefer AMD to use a new chipset with all the bells and whistles and release a Steamroller that is powerful enough that even Elvis would sing "I'm a Steamroller Baby, and I'm going to roll over you!"

inf64 · May 28, 2013

Well, you cannot deny there are 4 ALUs and 4 AGUs in the die shot. And to expect that all of these resources (plus the FP ones) would be effectively utilized without some form of SMT/CMT is very optimistic. 4 threads per module sounds about right for this configuration of execution resources.

del42sa · May 28, 2013

Ancalagon44 said:
That would be contrary to everything AMD has said about Steamroller. I'll believe it when Anandtech writes a detailed article on it.

Anandtech talked about the SR core. We don't know if this is SR or XV or anything else. Look at Jaguar. One Compute Unit (CU) has 4 cores/4 threads now.
Bulldozer brought a module with 2 INT clusters processing two threads. Evolve the idea of high throughput/thread parallelization and you'll get one module able to process four threads, CMT/SMT all the way.

del42sa · May 28, 2013

sorry for double post
