Steamroller core

grimpr

Golden Member
Aug 21, 2007
No, the article doesn't state that. AT has good coverage. The FP unit is "streamlined" so that it's smaller yet performs the same as the old one. So, from an execution point of view, they didn't expand it. Whether the rest of the core changes (which are significant) help FP execution remains to be seen. Clock frequency may suffer due to other factors (the core is more complex now).

Any info on the Uncore or the interconnect used?
 

inf64

Diamond Member
Mar 11, 2011
It seems AMD didn't want to talk about that at this year's Hot Chips (HC). The AT article states that the L3 remains unchanged (it's part of the uncore).
 

grimpr

Golden Member
Aug 21, 2007
Do you suppose they'll start dropping HyperTransport for desktop CPUs and APUs and using the new SeaMicro interconnect from 2013 onwards?
 

inf64

Diamond Member
Mar 11, 2011
Having two FMA units also helps code without FMA instructions (namely by allowing two additions or two multiplications to execute each cycle).
You expect the FMA units to be fully utilized without specially compiled binaries? I find that hard to believe (look at Bulldozer to see why).

I doubt they will drop HT altogether, but I think current APUs don't use HT (IIRC AMD uses PCIe for Trinity and Ontario).
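The quoted claim about two FMA units helping non-FMA code can be put in numbers with a toy issue model. This is a sketch under loud assumptions: every operation is independent, each pipe issues one op per cycle, and latency, dependencies and scheduling limits are ignored (which is exactly what the skeptical reply above is about). The split add-pipe/multiply-pipe design used for comparison and both function names are hypothetical.

```python
import math


def cycles_split_pipes(n_add: int, n_mul: int) -> int:
    """Hypothetical design with one add-only pipe and one multiply-only pipe.
    Each pipe retires one op per cycle, so the busier pipe sets the pace."""
    return max(n_add, n_mul)


def cycles_two_fma_pipes(n_add: int, n_mul: int) -> int:
    """Two pipes that can each run an add OR a multiply (as FMA units can),
    so any two independent ops can issue together every cycle."""
    return math.ceil((n_add + n_mul) / 2)


if __name__ == "__main__":
    # (adds, muls) instruction mixes, all assumed fully independent
    for n_add, n_mul in [(8, 0), (0, 8), (4, 4), (6, 2)]:
        print(n_add, n_mul,
              cycles_split_pipes(n_add, n_mul),
              cycles_two_fma_pipes(n_add, n_mul))
```

For an add-only or multiply-only stream the two general pipes halve the cycle count (8 adds take 8 cycles on the split design, 4 on the FMA pair), while a perfectly balanced mix gains nothing. Real code lands somewhere in between, and only if the scheduler can actually find two independent ops every cycle.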
 

Idontcare

Elite Member
Oct 10, 1999
Arggg, WTF AMD, that's a front loader, dude.


That's marketing for you; you can't expect the person responsible for this slide to know jack squat about construction equipment or CPUs.

If you want to know about Steamroller, you have to find yourself an engineer who works on it and gain their trust so they're willing to talk to you about it.

Look at how much was lost in translation between the engineers and JFAMD when JFAMD tried to relay to us what the engineers had tried to communicate to him (and I don't just mean the "IPC doesn't go down" part; remember all that "5% increase in die size for an 80% increase in throughput" commentary?)... The folks in marketing are in marketing, and not in the CPU design division, for a reason, a very good reason, and it mostly has to do with them not knowing jack about CPU design :whiste: (otherwise they'd be getting paid to design CPUs rather than to sell them).
 

Mr. Pedantic

Diamond Member
Feb 14, 2010
I'm confused about the high-density cell libraries. I thought the problem with Bulldozer was that it used a lot of automated transistor placement, and that caused the die size and power problems. But here the article is saying that by using a (possibly different) type of automation they can get lower power consumption and smaller dies? Aren't those two exact opposites? What gives?
 

NostaSeronx

Diamond Member
Sep 18, 2011
For Bulldozer, the placement of structures was hand-tweaked for high clocks. Post-Steamroller, AMD will not hand-tweak but will instead automate for maximum density and power savings.
 

Mr. Pedantic

Diamond Member
Feb 14, 2010
I see. So high clocks at the expense of power consumption and die size was something that they deliberately aimed for with Bulldozer?

Something else I noticed:



"Bulldozer"
Part of the Floating Point Unit. Hand-drawn for maximum speed and density in 32nm.

With High Density Library
The same blocks again, but built with a High-Density cell library to achieve 30% power and area reductions

I'm still confused...
 

NostaSeronx

Diamond Member
Sep 18, 2011
AMD probably compressed Bulldozer by hand as best they could. The synthetic tweaker is just better than the human tweaker.
 

Mr. Pedantic

Diamond Member
Feb 14, 2010
I can accept that, but it doesn't gel with this Anandtech article...or is there something I'm missing?

So the problem with Bulldozer was apparently it was too automated.
Now AMD is saying the problem with Bulldozer was it wasn't automated enough.
???

EDIT: Oops, not that article. Some other article. Just got to find it...

EDIT2: This article. Damn, thought it was Anandtech.
 

Xpage

Senior member
Jun 22, 2005
I really hoped they would add a uop cache like Intel has; I think that is a major reason why Intel is currently beating AMD. That, and Intel's L2$ is so much faster.

Edit: after reading the article, it does seem that they added a uop cache, so maybe Steamroller will perform well.
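For readers unfamiliar with the term, a uop cache stores already-decoded micro-ops keyed by instruction address, so hot loops can skip the x86 decoders on repeat visits. The model below is a hypothetical illustration only: it is unbounded, never evicts, and uses a stand-in decoder (`ToyUopCache` and `fake_decoder` are made-up names), so it shows the decode-bypass idea, not Intel's or AMD's actual design.

```python
from typing import Callable, Dict, List


class ToyUopCache:
    """Hypothetical, unbounded uop cache: maps an instruction's fetch address
    to the micro-ops the decoder produced for it, so a repeat fetch skips decode."""

    def __init__(self, decoder: Callable[[int], List[str]]):
        self.decoder = decoder
        self.lines: Dict[int, List[str]] = {}
        self.decode_count = 0  # how many times the (slow, power-hungry) decoder ran

    def fetch(self, addr: int) -> List[str]:
        cached = self.lines.get(addr)
        if cached is not None:
            return cached  # hit: bypass the decoder entirely
        self.decode_count += 1  # miss: decode once and remember the result
        uops = self.decoder(addr)
        self.lines[addr] = uops
        return uops


def fake_decoder(addr: int) -> List[str]:
    # Stand-in for x86 decode: pretend every instruction becomes one uop.
    return [f"uop@{addr:#x}"]


if __name__ == "__main__":
    cache = ToyUopCache(fake_decoder)
    loop_body = [0x1000, 0x1004, 0x1008, 0x100C]  # a hypothetical 4-instruction hot loop
    for _ in range(100):
        for addr in loop_body:
            cache.fetch(addr)
    print(cache.decode_count)  # 4 decodes instead of 400
```

Running the 4-instruction loop 100 times decodes each instruction once, so the decoder runs 4 times rather than 400; a real uop cache is small and set-associative, so large or branchy code sees far fewer hits.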
 

Idontcare

Elite Member
Oct 10, 1999
What isn't accurately captured or portrayed in those articles is that the state of the art in synthesis (automated layout) is not static; rather, it is advancing at a blistering rate, thanks solely to the increasingly difficult requirements that foundries place on fabless design houses as process technology grows more complex at every node (making hand-coded cells all the more arduous node after node).

The commentary regarding Bulldozer's reliance on synthesis came from an engineer who left the company 2 years before Bulldozer came out, that is, 3 years ago. Three years is an eon in this industry. I'm sure his comments and experience were relevant to the state of synthesis in 2009 with 45nm, but not so relevant to the state of synthesis in 2013 with 28nm.

Think of it like this... consider the game of chess. In 1960 you would not have wanted to bet on a computer competing against world-class chess players; the computer would stink. Same in 1970, and 1980. Computers were slow and not as good as humans.

But what happened in 1997 between Deep Blue and Kasparov? The computer won.

This is what has happened in pretty much every industry that involves engineering. Slowly but surely, the software and hardware have evolved to the point where computers can run through millions of simulated models to find better solutions than humans could ever hope to achieve, be it with bridges, autos, skyscrapers, or integrated circuits.

It is not that the computers are smarter; it's just that they are faster. So they can run through far more test designs, filtering out the dead ends faster than a team of humans ever could.

So the limits are no longer those of the CPU designers; now the limits are set by the people who program the synthesis tools themselves, very much like the limitations in programming that come at the hands of the people who create the compiler tools.

It was only a matter of time before computers became better than humans at designing CPUs. And it is a matter of budget as well. It looks like AMD is saying that, when you factor in budget considerations, computers have reached that point now.
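The "run through many candidate designs and discard the dead ends" argument can be made concrete with a deliberately tiny placement search. Assumptions, loudly: cells sit in a single hypothetical 1-D row, every net connects exactly two cells, cost is total wirelength, and the search is a simple greedy random-swap loop. Real placers in tools like Encounter or ICC use far more sophisticated algorithms, and every name here is made up for illustration.

```python
import random
from typing import List, Sequence, Tuple

Net = Tuple[str, str]  # a hypothetical two-pin net between two named cells


def wirelength(order: Sequence[str], nets: Sequence[Net]) -> int:
    """Total wirelength of a 1-D placement: each cell sits at the slot given by
    its index in `order`; a net costs the distance between its two cells."""
    pos = {cell: slot for slot, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def search_placement(cells: Sequence[str], nets: Sequence[Net],
                     iterations: int = 20000, seed: int = 0) -> Tuple[List[str], int]:
    """Greedy random search: start from a shuffled placement, propose swapping two
    cells, keep the swap if wirelength does not get worse, otherwise undo it."""
    rng = random.Random(seed)
    order = list(cells)
    rng.shuffle(order)
    best = wirelength(order, nets)
    for _ in range(iterations):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        cost = wirelength(order, nets)
        if cost <= best:
            best = cost  # accept: this candidate is at least as good
        else:
            order[i], order[j] = order[j], order[i]  # dead end: revert the swap
    return order, best


if __name__ == "__main__":
    cells = [f"c{i}" for i in range(8)]
    nets = [(f"c{i}", f"c{i + 1}") for i in range(7)]  # a simple chain of 7 nets
    order, cost = search_placement(cells, nets)
    print(order, cost)
```

For this 8-cell chain the best possible cost is 7 (each net spanning one slot). A greedy search like this one can stall in a local minimum, which is one reason production tools use smarter strategies such as simulated annealing or analytical placement.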
 

WhoBeDaPlaya

Diamond Member
Sep 15, 2000
Another thing is that there have been ENORMOUS advances in EDA tools. Seriously, if you compared the QoR (quality of results) of Cadence's SoC Encounter 10.1 and 10.1_USR3, you'd see a marked difference.
<-- Just spent a bunch of time comparing QoR for Encounter 8, 10.1, 10.1 USR3 and 11.1.

Oh, and don't get me started on the f@#$*&ng binary Milkyway databases that Synopsys' ICC uses. Damned things can be landmines.
 
Mar 10, 2006
Looks like it'll be a winner. Makes me feel even better about spending 75% of my speculative stock budget on AMD today.
 

Mr. Pedantic

Diamond Member
Feb 14, 2010
Wow, ok. Thanks, that makes sense. Didn't consider this...
 

CTho9305

Elite Member
Jul 26, 2000
Wait wait wait... I thought THIS was universally known/admitted to be the problem with Bulldozer - the fact that they moved away from hand-tuned/drawn logic and moved to automated crap which ended up costing them in terms of power, performance AND delays.

(A quick google = first link found)

I wrote a bit about the automation debate a while back; I disagree strongly with that guy.
Here's a post I wrote the last time this came up (the thread has some back-and-forth that you might find interesting... multiple posters there work in the industry).

Anyway, take this as a lesson in the credibility of marketing material and public disclosures from secretive organizations. Businesses can be amazingly 1984-like when it comes to anything imperfect, like projects slipping or being canceled... a chip targeted for 1Q2000 might change to "1H2000" (or "2000") with no mention of the one-quarter (or three-quarter) slip; a chip can simply disappear from a roadmap, never to be heard of again.

:thumbsup: I hate proprietary binary formats; I got burned by a bug with SBPF a few years back. Even OA can be a pain due to version compatibility issues... but at least you have a chance of figuring out whether the problem is in the database or the tool, because you can get the OA source code and access to the OA bug database.
 

StrangerGuy

Diamond Member
May 9, 2004
Going back to independent decoders for each core.

The past is the future, or something like that. I guess it invalidates the major design ideas behind Bulldozer.

A 4-issue-wide decoder per core? Welcome back to 2006, AMD.

How funny that the company that delivers like clockwork has not even officially unveiled the details of its next generation, yet the company that doesn't has already done so for its next-next generation.
 

Ancalagon44

Diamond Member
Feb 17, 2010
Be serious, please. Bulldozer is faster than K8; it's around the level of the first-gen K10 core (65nm Barcelona). That core is ~10-15% faster than K8 in integer and 1.5-2x faster in SSE workloads. In FP, Bulldozer just crushes K8 (not hard, since K8 has a 64-bit FPU).

Clock for clock, K10 is significantly faster than BD, especially in integer workloads. So perhaps my comment should have mentioned the possibility of a return to K10 performance levels.
 

ShintaiDK

Lifer
Apr 22, 2012
People were super hyped about Piledriver too. That didn't go well...

It's a rerun every single time.
 

Medu

Member
Mar 9, 2010
This will probably be about 4 years too late. I assume AMD is starting to get back to where they had planned to be when they first envisaged this line of CPUs back in ~2005. As already pointed out, they are backtracking on some of the major changes they made, since those just didn't work.
 

guskline

Diamond Member
Apr 17, 2006
:biggrin: On a comical note, if I've learned anything through this whole fiasco of Bulldozer, now Piledriver, and possibly Steamroller (if they don't kill it for financial reasons), it's that AMD has "cool" slides. But boy, are they hyped with doublespeak! Example: "Steamroller feeds the core faster." Faster than what? Bulldozer? That's like saying I run faster when the only other contestant in the race is my late grandma in a wheelchair! I know I'm being somewhat cynical, but seriously, look at those slides!
"improve single-core execution" Hello?
 

Homeles

Platinum Member
Dec 9, 2011
As far as I can tell, AMD hit the numbers they were claiming for Piledriver. Piledriver's focus wasn't so much on performance: it was to clamp down on the ridiculous power consumption. We don't know if that will translate to better overclocking, which is theoretically Vishera's biggest draw. At any rate, it will finally be an upgrade for AM3 users. Some people are expecting a 15-20% performance improvement over Bulldozer... and I really think those expectations are ridiculous. 5-10% is my guess... like Ivy Bridge, but without the solder fiasco.

As far as Steamroller goes... there's absolutely no doubt that the numbers AMD is claiming are impressive. If they do hit these targets, this would bring AMD from "laughable" to "laudable." This would seriously bring AMD back into the game.

The problem with Bulldozer is that AMD's hype for it didn't have any hard numbers; they just pushed CMT as superior to HT. Here, pretty much every one of the flaws that Johan De Gelas presented in his Bulldozer Aftermath article here on AT is being addressed. We've got specifics this time around, and these specifics are exciting.

I'm not so excited about SR being used in a "high end" desktop part. What really excites me is Kaveri. I'm going to assume a 20% increase in CPU performance over Trinity. If that's the case, Kaveri will actually be a very enticing part next year.

Of course, with AMD, you can't take their word for anything. We'll see how GloFo and AMD handle their transition to 28nm bulk.
 