S/A AMD Trinity Die Pic

Flipped Gazelle · Jan 6, 2012

Idontcare said:
...
Doesn't this seem just a bit "penny wise but pound foolish" here? Why not widen out those castrated CMT cores back to being CMP demons, make your die a, gasp, 250mm^2 chip instead of the 240mm^2 it currently is, and put yourself in a position to field some competitive IPC.

Is it possible that AMD is in such a resources bind that it can't even do this?

Flipped Gazelle · Jan 6, 2012

AtenRa said:
I made a mistake: Llano and Trinity don't have HyperTransport links. Those are the PCIe, PLL and display I/O.

So... are you the individual who provided S/A with the original pic?

AtenRa · Jan 6, 2012

Flipped Gazelle said:
So... are you the individual who provided S/A with the original pic?

Nope, I was referring to my first post.

Vesku · Jan 6, 2012

Flipped Gazelle said:
Is it possible that AMD is in such a resources bind that it can't even do this?

It is expensive to do, certainly, but it would also take more than a year from start date. Judging from the unlocked Llanos I'm not confident any design would shine without improvements in GF 32nm.

wlee15 · Jan 6, 2012

Phynaz said:
AMD has said a lot of things :\

AMD is also quite good on the follow-up, and after all they did demo a working Trinity laptop in June. Plus much of the groundwork for Trinity was laid with Llano, and there's no HT or L3 cache (not to mention only two modules), so Trinity is a comparatively simpler processor than Zambezi.

Troll Trolling · Jan 6, 2012

Idontcare said:
One thing that blows me away when I see the pic of Trinity, or Llano for that matter, is just how little real-estate is devoted to the core logic itself and yet it is the performance characteristics of that core logic that nearly entirely determines the selling price of the entire IC.

I mean just look at the ratio of CPU logic (difficult and complex to design) to the L2$ area (a dumb/easy copy-and-paste cell if there ever was one in design/layout).

Now obviously I am not saying this without consideration for all the obvious stuff (Pollack's Rule, Amdahl's Rule, Godwin's Rule, Hanlon's Rule, etc), but I can't help but feel that if they would only just throw their design/layout guys another 10mm^2/core budget then they'd be able to buy themselves some serious IPC improvements without necessarily making an already rather large IC become all that much larger.

Just look at the die area afforded to the HyperTransport IO, or the DDR3 mem controller. Those parts of the CPU occupy nearly as much area as an entire Bulldozer module, and yet it is the Bulldozer module that is a performance-degrading CMT design by design just so a few mm^2 can be saved in a die that is already well over 200mm^2.

Doesn't this seem just a bit "penny wise but pound foolish" here? Why not widen out those castrated CMT cores back to being CMP demons, make your die a, gasp, 250mm^2 chip instead of the 240mm^2 it currently is, and put yourself in a position to field some competitive IPC.

I think it is more that they wouldn't know what to do with it. I mean, increasing cache size and the number of registers is one thing. Actually implementing better algorithms, reducing cache latency, reducing instruction latency, adding more resources and actually managing to make use of them (isn't the Phenom's 3rd ALU, or something like it, said to bring minimal performance gains because it wasn't actually being used most of the time?), all while not making it a power hog, is a completely different thing.

BallaTheFeared · Jan 7, 2012

Doesn't the current top APU have 400 shaders?

Is this a 5870 to 6970 type deal?

maddie · Jan 7, 2012

BallaTheFeared said:
Doesn't the current top APU have 400 shaders?

Is this a 5870 to 6970 type deal?

My thoughts also. Where is the large GPU increase?

Coup27 · Jan 7, 2012

antisocialmunky said:
What degree do you need to make sense out of this robot porn?

You're not the only one who looks at die shots as nothing more than a rainbow of pretty colours. Haven't a clue how people see all the different parts from them.

Idontcare · Jan 7, 2012

Coup27 said:
antisocialmunky said:

What degree do you need to make sense out of this robot porn?

You're not the only one who looks at die shots as nothing more than a rainbow of pretty colours. Haven't a clue how people see all the different parts from them.

You don't need a degree, but it does help if you are a cyborg

Vesku · Jan 7, 2012

maddie said:
My thoughts also. Where is the large GPU increase?

Trinity should be a roughly 20-30% GPU increase, bringing it closer to a 6570. IMO, the next big jump in iGPU for AMD's Fusion products won't come until they start integrating the GCN architecture. With GCN the GPGPU talk from AMD will have full hardware support.

maddie · Jan 7, 2012

Vesku said:
Trinity should be a roughly 20-30% GPU increase, bringing it closer to a 6570. IMO, the next big jump in iGPU for AMD's Fusion products won't come until they start integrating the GCN architecture. With GCN the GPGPU talk from AMD will have full hardware support.

The present count is 400 shader units. How can the earlier estimate of 384 from the die shot result in a 20-30% increase if it is still VLIW4/5?

iCyborg · Jan 7, 2012

384/400 = 0.96
1536 (6970) / 1600 (5870) = 0.96

That's assuming the Trinity GPU is VLIW4 of course, and the number 384 kind of supports that.
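The ratios above can be checked with a quick sketch. The shader counts are the ones quoted in this thread; the group sizes of 64 (VLIW4) and 80 (VLIW5) are the figures given later in the thread, and the variable names are just illustrative.

```python
# Shader counts as quoted in the thread.
trinity_sp, llano_sp = 384, 400      # die-shot estimate vs. current Llano
hd6970_sp, hd5870_sp = 1536, 1600    # VLIW4 part vs. VLIW5 part

apu_ratio = trinity_sp / llano_sp
gpu_ratio = hd6970_sp / hd5870_sp
print(f"Trinity/Llano: {apu_ratio:.2f}")
print(f"6970/5870:     {gpu_ratio:.2f}")

# Group sizes taken from later posts in the thread (treated as assumptions here).
VLIW4_GROUP, VLIW5_GROUP = 64, 80

# 384 splits into whole 64-wide groups but not into whole 80-wide groups,
# which is why the number hints at VLIW4.
fits_vliw4 = trinity_sp % VLIW4_GROUP == 0
fits_vliw5 = trinity_sp % VLIW5_GROUP == 0
print(f"384 fits VLIW4 groups: {fits_vliw4}, fits VLIW5 groups: {fits_vliw5}")
```

This only shows that the counts are consistent with a VLIW4 layout; it says nothing about actual performance.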

Coup27 · Jan 7, 2012

My question is how do you estimate 384 shaders from a die shot? :\

Ajay · Jan 7, 2012

Some useful insights can be found @ B3D Trinity vs IVB.

IntelUser2000 · Jan 8, 2012

Idontcare said:
One thing that blows me away when I see the pic of Trinity, or Llano for that matter, is just how little real-estate is devoted to the core logic itself and yet it is the performance characteristics of that core logic that nearly entirely determines the selling price of the entire IC.

But the opposite might be true for power consumption, and adding logic without thought wouldn't help lower it.

It could be just the inverse of how much die space it takes relative to the rest.

formulav8 · Jan 9, 2012

maddie said:
The present count is 400 shader units. How can the earlier estimate of 384 from the die shot result in a 20-30% increase if it is still VLIW4/5?

IF 384 is accurate, it will perform better than current 400 sp Llano. It should be using the more efficient vliw4 shaders plus the other optimizations that the upper 6k series brought. So you can't simply do a shader by shader comparison and get an accurate number. The 69xx series easily outperformed the 5k series even though the 5870 had more shaders.

IntelUser2000 · Jan 9, 2012

maddie said:
The present count is 400 shader units. How can the earlier estimate of 384 from the die shot result in a 20-30% increase if it is still VLIW4/5?

They moved from VLIW5 to VLIW4 and said there was almost no performance impact in doing so.

Llano had 5 groups of 80 shaders, while Trinity moves to 6 groups of 64 shaders (each group being smaller because it's VLIW4). If we take AMD's statement as true, Trinity is really like 6 groups of 80 shaders. The total is then 480, which is 20% more shaders than Llano.

Coup27 said:
My question is how do you estimate 384 shaders from a die shot? :\

Trinity was indicated to have a VLIW4 architecture, which means each group of shaders has 64 of them. Since there are 6 groups there, the total is 384.
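For anyone who wants the arithmetic spelled out, here is a rough sketch of the estimate above. It takes AMD's "almost no performance impact" claim at face value, so each 64-wide VLIW4 group is counted as if it were an 80-wide VLIW5 group. The function name is made up for illustration.

```python
def vliw5_equivalent(groups: int, vliw5_group_size: int = 80) -> int:
    """Count each VLIW4 group as one full VLIW5 group (assumes AMD's claim holds)."""
    return groups * vliw5_group_size

# Group counts and sizes as described in the post above.
llano_groups, llano_group_size = 5, 80       # VLIW5
trinity_groups, trinity_group_size = 6, 64   # VLIW4

llano_total = llano_groups * llano_group_size        # 400 shaders
trinity_total = trinity_groups * trinity_group_size  # 384 shaders
trinity_equiv = vliw5_equivalent(trinity_groups)     # 480 "VLIW5-equivalent" shaders

uplift = trinity_equiv / llano_total - 1
print(f"Trinity actual shaders:     {trinity_total}")
print(f"Trinity VLIW5-equivalent:   {trinity_equiv}")
print(f"Estimated uplift vs. Llano: {uplift:.0%}")
```

This is a shader-count estimate only; clocks and memory bandwidth are left out.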

Vesku · Jan 9, 2012

If the rumors of Trinity coming out better on 32nm than Llano are true, it should also launch with higher GPU clocks.

blckgrffn · Jan 9, 2012

Vesku said:
If the rumors of Trinity coming out better on 32nm than Llano are true, it should also launch with higher GPU clocks.

You'd really hope that they'd be nailing down yields on that process, wouldn't you? Especially since many of us (me, anyway) are going to take Trinity x2 and view that as somewhat representative of Piledriver FX performance.

I am actually hoping that they'll have a 5 (or 6) module desktop Piledriver with decent clocks - it might as well play to its strengths. But that is another thread...

Vesku · Jan 9, 2012

Whether it's nailed down or not I think it is valid to use it as a bellwether for Piledriver FX. I have yet to see anyone get higher stable clocks on the new unlocked Llanos than a late 45nm C3 Phenom II/Thuban. If Trinity doesn't turn out to be good news for 32nm I would expect most of the future spending at AMD would be focused on making the next node, whether GF or TSMC, less of a disaster.

exar333 · Jan 9, 2012

formulav8 said:
IF 384 is accurate, it will perform better than current 400 sp Llano. It should be using the more efficient vliw4 shaders plus the other optimizations that the upper 6k series brought. So you can't simply do a shader by shader comparison and get an accurate number. The 69xx series easily outperformed the 5k series even though the 5870 had more shaders.

Um, not really accurate. There was almost no performance difference between the two, assuming the same core/memory clocks. The only real win for the 6x series was that tessellation was much improved.

blckgrffn · Jan 9, 2012

ExarKun333 said:
Um, not really accurate. There was almost no performance difference between the two, assuming the same core/memory clocks. The only real win for the 6x series was that tessellation was much improved.

Well, I guess that really depends.

http://www.anandtech.com/show/3987/amds-radeon-6870-6850-renewing-competition-in-the-midrange-market

So, there is the 6870 lined up next to the 5870: missing a lot of shaders, with a slight clock speed advantage, way behind in TMUs and with modestly less memory bandwidth.

http://www.anandtech.com/bench/Product/294?vs=290

Bench says the 5870 and 6870 are basically equal performers.

There you go. VLIW4 isn't the only thing they sorted out from Cypress to Northern Islands, apart from tessellation, evidently.

formulav8 · Jan 9, 2012

The 6870 isn't VLIW4. It's VLIW5, IIRC.

Also, it appears to depend on the game whether the VLIW4 update was faster than VLIW5. http://www.anandtech.com/bench/Product/511?vs=509

Anyways, the new IGP should be quite a bit faster than Llano. But Llano is already bandwidth limited on memory. So... unless AMD adds some Sideport-type RAM, it could be hampered quite a bit...

iCyborg · Jan 9, 2012

ExarKun333 said:
Um, not really accurate. There was almost no performance difference between the two, assuming the same core/memory clocks. The only real win for the 6x series was that tessellation was much improved.

I disagree: the 6950 has 1408 cores vs 1600 on the 5870, a lower core clock and a slightly faster memory clock, and yet it beats the 5870 by 5-10% on average according to the AnandTech review.
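A back-of-the-envelope sketch of what that comparison implies per shader and per clock. The shader counts and the 5-10% lead come from the post above; the core clocks (850 MHz for the 5870, 800 MHz for the 6950) are reference-card figures added here as an assumption, and memory bandwidth is ignored entirely.

```python
# Shader counts from the post; core clocks are assumed reference values.
hd5870 = {"shaders": 1600, "core_mhz": 850}  # VLIW5
hd6950 = {"shaders": 1408, "core_mhz": 800}  # VLIW4

def raw_throughput(card: dict) -> int:
    """Naive shader-count times clock product; ignores everything else."""
    return card["shaders"] * card["core_mhz"]

# How much raw shader*clock the 6950 has relative to the 5870.
raw_ratio = raw_throughput(hd6950) / raw_throughput(hd5870)

# If the 6950 is 5-10% faster overall, each shader-clock must be doing this much more work.
per_shader_gain_low = 1.05 / raw_ratio
per_shader_gain_high = 1.10 / raw_ratio

print(f"6950 raw shader*clock vs. 5870: {raw_ratio:.3f}")
print(f"Implied per-shader-clock gain:  {per_shader_gain_low:.2f}x to {per_shader_gain_high:.2f}x")
```

Treat this as a crude upper-level estimate; games are rarely limited by shader throughput alone.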
