New Zen microarchitecture details

Page 94

KTE

Senior member
May 26, 2016
478
130
76
I just got to thinking... RX480 has also been found to be rather highly overvolted from the factory... The BIOS-derived curve is showing worse performance than the real-world curve.
High voltage is a sign of large process variation or poor clocking. Lowering it is one of the main aims of process shrinks.

There could be other plausible reasons, but I am not one bit surprised. After all, this is the LPP process, not SHP: it focuses on average clocks but maximum efficiency.

Sent from HTC 10
(Opinions are own)
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Probably the biggest issue currently with the 14nm LPP is indeed the process variation. There appears to be a HUGE variation in the leakage alone. Since the variation is that high I expect that there is also a significant amount of partially defective parts, some harvestable and some not.

On GCN1-GCN3 GPUs made on TSMC 28nm HPx, the maximum variation in leakage has been around 10%. Even in the short time 14nm LPP GCN GPUs have been available, we've already seen variation in excess of 20% on the same scale (SIDD). Increased variation is expected as feature size decreases, but I don't think it was expected to be that much greater?
 

KTE

Senior member
May 26, 2016
478
130
76
No ASIC should vary that much within a given product unless binning is proving difficult at a given clock.

In that case tolerances get relaxed more and more, so leakage ends up being distributed more broadly.

How are Zen cores clocked, if any of that can be discussed?

100*40*2 / 2.62

 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
The Stilt said:
Probably the biggest issue currently with the 14nm LPP is indeed the process variation. There appears to be a HUGE variation in the leakage alone. Since the variation is that high I expect that there is also a significant amount of partially defective parts, some harvestable and some not.

On GCN1-GCN3 GPUs made on TSMC 28nm HPx, the maximum variation in leakage has been around 10%. Even in the short time 14nm LPP GCN GPUs have been available, we've already seen variation in excess of 20% on the same scale (SIDD). Increased variation is expected as feature size decreases, but I don't think it was expected to be that much greater?

FinFET naturally has higher variation than planar designs.

From less-than-ideally sized and shaped fins and non-uniform doping to minor variations in the covering layers, FinFETs greatly complicate uniformity.

Once you look at how they carry current, the outsized effect of these normally inconsequential variations becomes more obvious:



 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Core clock on Zen is generated the same way as it was on the Bobcat core :sneaky: That's why I asked, some time ago, who was in charge of Bobcat.

To allow much finer frequency control, Zen applies dividers to the main frequency, from which all of the other frequencies are generated.

On 10h, 12h & 15h CPU designs the divider range is 1-16, while on Zen it is much more precise (2⁻³, i.e. 1/8, granularity). That's why Zen has significantly more precise control over frequency, and odd frequencies such as 3.05GHz can be easily generated.
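To make the granularity point concrete, here is a minimal sketch that assumes a single fixed main clock divided down to the core clock, Bobcat-style as described above. The 5000 MHz main frequency (`MAIN_MHZ`) and the power-of-two coarse divider set are hypothetical; only the 1-16 range and the 1/8 step come from the post. With these made-up numbers, 1/8 steps land within about 27 MHz of 3.05GHz while the coarse set can only offer 2.5GHz. Whether 3.05GHz is hit exactly depends on the real main frequency, which the thread doesn't give.

```python
from fractions import Fraction

# Hypothetical main (PLL) frequency in MHz; the thread does not give Zen's real value.
MAIN_MHZ = Fraction(5000)


def coarse_divisors():
    # Assumption for illustration only: a coarse 1-16 range in power-of-two steps.
    return [Fraction(2) ** k for k in range(5)]  # 1, 2, 4, 8, 16


def fine_divisors():
    # 1-16 in 2^-3 (1/8) steps: 1, 1.125, 1.25, ..., 16
    return [Fraction(n, 8) for n in range(8, 16 * 8 + 1)]


def nearest(target_mhz, divisors, main_mhz=MAIN_MHZ):
    """Pick the divisor whose output (main_mhz / divisor) is closest to target_mhz."""
    best = min(divisors, key=lambda d: abs(main_mhz / d - target_mhz))
    return best, main_mhz / best


if __name__ == "__main__":
    for name, divs in (("coarse", coarse_divisors()), ("fine", fine_divisors())):
        d, f = nearest(3050, divs)
        print(f"{name}: divider {float(d):g} -> {float(f):.1f} MHz")
```

Exact fractions are used so the divider steps are represented without floating-point error.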
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
I assume we can expect the same kind of flat 14nm curve for Zen?

What about HDL vs std cells on Zen?

As efficiency is king in servers, hitting the P10 900MHz spot (lowest V) for base clock seems to me like a crucial point. It's absolutely crucial whether it's 1400 or 1900 for a 32C part.
 

KTE

Senior member
May 26, 2016
478
130
76
The Stilt said:
Core clock on Zen is generated the same way as it was on the Bobcat core :sneaky: That's why I asked, some time ago, who was in charge of Bobcat.

To allow much finer frequency control, Zen applies dividers to the main frequency, from which all of the other frequencies are generated.

On 10h, 12h & 15h CPU designs the divider range is 1-16, while on Zen it is much more precise (2⁻³, i.e. 1/8, granularity). That's why Zen has significantly more precise control over frequency, and odd frequencies such as 3.05GHz can be easily generated.

Thanks. So it's FID and DID again, basically.

The worrying thing for me is that on Bobcat this allowed the PLLs to go whack and report frequencies that were not the actual clocks. Even CPU-Z would verify them. During that time the CPU would actually perform slower than stock. I'm sure I have a 3-7GHz Bobcat somewhere, like the 13GHz Duron.


krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
If we look at GF's prior execution, it's imo a miracle that we see FinFET graphics from GF at about the same time as from TSMC.

I am the only one saying it, but it's imo the single most obvious observation to make.

That it comes at the cost of high process variation is also quite evident. The problem (or the fortunate thing) for AMD is that Polaris masks it a little. They certainly have to prioritize the PS4 Neo as number one; those lines get first attention and priority. Mobile comes second: selecting 900MHz parts, as we can see now, for e.g. Apple is the second priority. The desktop market is then left with the leftovers. P10 is mainstream and low cost, so it would naturally have high V to lower cost and handle the bad parts.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
KTE said:
The worrying thing for me is that on Bobcat this allowed the PLLs to go whack and report frequencies that were not the actual clocks. Even CPU-Z would verify them. During that time the CPU would actually perform slower than stock. I'm sure I have a 3-7GHz Bobcat somewhere, like the 13GHz Duron.


The issue was partially software- and partially design-related. The CPU allowed blocked (too small) dividers to be programmed, but it didn't actually use them. Software like CPU-Z didn't (and still doesn't) know how to read the actual, effective divider. On Zen this shouldn't be an issue, since the current FID / DID registers use the same layout as 10h and 15h CPUs, for example.
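A toy model shows how a readout like 13GHz can appear. It assumes (hypothetically) that the divider register reads back whatever was written, while the hardware only applies dividers at or above a minimum and otherwise keeps the last valid one; the real fallback behaviour isn't stated in the thread. The class name, the 3200 MHz main frequency and the minimum divider of 2 are all made up for illustration.

```python
class HypotheticalClockDivider:
    """Toy model of the Bobcat-era readout problem.

    Assumptions (hypothetical, not documented AMD behaviour): the divider
    register accepts any value, but the hardware only applies dividers that
    are >= min_divider and otherwise keeps running on the last valid one.
    """

    def __init__(self, main_mhz: float, min_divider: float, initial_divider: float):
        if initial_divider < min_divider:
            raise ValueError("initial divider must be an allowed value")
        self.main_mhz = main_mhz
        self.min_divider = min_divider
        self.programmed = initial_divider  # what the register reads back
        self.effective = initial_divider   # what the clock generator really uses

    def program(self, divider: float) -> None:
        self.programmed = divider
        if divider >= self.min_divider:  # blocked (too small) dividers are ignored
            self.effective = divider

    def reported_mhz(self) -> float:
        # What a monitoring tool computes if it trusts the programmed divider.
        return self.main_mhz / self.programmed

    def actual_mhz(self) -> float:
        # The frequency the core is really running at.
        return self.main_mhz / self.effective


if __name__ == "__main__":
    clk = HypotheticalClockDivider(main_mhz=3200, min_divider=2.0, initial_divider=2.0)
    clk.program(0.25)  # a blocked divider
    print(clk.reported_mhz(), clk.actual_mhz())  # 12800.0 1600.0
```

The gap between `reported_mhz()` and `actual_mhz()` is exactly the "software can't see the effective divider" problem.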
 

NTMBK

Lifer
Nov 14, 2011
10,322
5,352
136
The Stilt said:
Probably the biggest issue currently with the 14nm LPP is indeed the process variation. There appears to be a HUGE variation in the leakage alone. Since the variation is that high I expect that there is also a significant amount of partially defective parts, some harvestable and some not.

High leakage parts get dumped in consumer markets, low leakage parts get saved for servers? Seems plausible.
 

del42sa

Member
May 28, 2013
115
153
116
Tonga and Fiji have all of the advanced stuff always disabled. I think Polaris is the first one that actually uses AVFS and clock stretchers outside the paper.

Any idea why that is? Does it have any negative influence on performance or clocking ability? Or is it just broken?
 

stuff_me_good

Senior member
Nov 2, 2013
206
35
91
BTW, is there a reason why Keller and co. left AVX and FMA out of Zen? Why did they bother to put serious effort into bringing back that missing FPU performance when they didn't go all the way?

Are those features coming in Zen+? If so, I hope it won't be the usual quick hack.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
stuff_me_good said:
BTW, is there a reason why Keller and co. left AVX and FMA out of Zen?
Those are in Zen. AVX+/AVX2+/FMA3 is in Zen's ISA. What is not in Zen's ISA is XOP/FMA4.

The only things that would fit the bolded part are FP256 and non-bridged FMA units. Those weren't included since AMD is saving them for Bulldozer's successor with ASF.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,978
136
stuff_me_good said:
BTW, is there a reason why Keller and co. left AVX and FMA out of Zen? Why did they bother to put serious effort into bringing back that missing FPU performance when they didn't go all the way?

Are those features coming in Zen+? If so, I hope it won't be the usual quick hack.

They are in Zen. However, the FPU elements and datapaths are 128 bits wide, meaning that 256-bit AVX operations take two clocks, or both of the execution units for a single clock, to complete.

This is imho a rational decision. Peak SIMD FPU throughput limits very few workloads.
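The "two clocks, or both units for one clock" arithmetic can be written as a simple throughput model. This is a sketch under stated assumptions (each unit handles one datapath-wide chunk per clock, chunks can be spread over identical units, latency and scheduling ignored), not a model of Zen's actual scheduler; `simd_op_clocks` is a hypothetical helper.

```python
import math


def simd_op_clocks(op_bits: int, datapath_bits: int = 128, units: int = 1) -> int:
    """Clocks to push one SIMD operation through datapath-wide execution units.

    Simplified throughput model (assumption): the op is split into
    datapath-wide chunks, each unit handles one chunk per clock, and the
    chunks may be spread over `units` identical units.
    """
    if op_bits <= 0 or datapath_bits <= 0 or units <= 0:
        raise ValueError("all arguments must be positive")
    chunks = math.ceil(op_bits / datapath_bits)
    return math.ceil(chunks / units)


if __name__ == "__main__":
    print(simd_op_clocks(256))           # 256-bit AVX op on one 128-bit unit: 2
    print(simd_op_clocks(256, units=2))  # same op split over both units: 1
    print(simd_op_clocks(128))           # native 128-bit op: 1
```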
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,475
1,978
136
NostaSeronx said:
Those weren't included since AMD is saving those for Bulldozer's successor with ASF.

There is no BD successor. I have a rough idea of all of AMD's CPU-side design teams and what they are working on. None of them work on anything BD-related.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
stuff_me_good said:
BTW, is there a reason why Keller and co. left AVX and FMA out of Zen? Why did they bother to put serious effort into bringing back that missing FPU performance when they didn't go all the way?

Are those features coming in Zen+? If so, I hope it won't be the usual quick hack.
AVX and FMA are there.

And is that performance really missing? Would wider units help that much, with everything else already being improved? And how much would they increase power consumption for the majority of workloads, which are happy with that FPU, or decrease max clocks?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
NTMBK said:
High leakage parts get dumped in consumer markets, low leakage parts get saved for servers? Seems plausible.

In the case of Ellesmere XT (P10), at least, that's not the reason. But I would indeed expect AMD to bin Zeppelin dies pretty carefully, for server platforms specifically. For 32C/64T at least, each core pair must be matched extremely accurately, since they will be sharing the supply voltage. The second pair doesn't have to match the first one accurately, but otherwise the same conditions apply. Ideally, of course, you would have four dies with identical characteristics.

I would expect these MCM chips to get the premium silicon in terms of leakage, since otherwise there is no way the current draw can be kept at sane levels. For single-die chips it won't be necessary.
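The matching requirement can be sketched as sort-and-pair binning. This assumes (hypothetically) that each die is summarised by a single leakage number and that only the two dies sharing a rail need to match each other, so different pairs can sit at different leakage levels. Pairing neighbours in sorted order gives the smallest possible worst-case mismatch within a pair. The die values below are made up.

```python
from collections.abc import Sequence


def pair_by_leakage(leakages: Sequence[float]) -> list[tuple[float, float]]:
    """Sort per-die leakage values and pair neighbours.

    Pairing adjacent values of the sorted list minimises the worst mismatch
    inside any pair, which is what matters if each pair shares a supply rail.
    """
    if len(leakages) % 2:
        raise ValueError("need an even number of dies")
    s = sorted(leakages)
    return [(s[i], s[i + 1]) for i in range(0, len(s), 2)]


def worst_mismatch(pairs: Sequence[tuple[float, float]]) -> float:
    """Largest leakage difference within any single pair."""
    return max(abs(a - b) for a, b in pairs)


if __name__ == "__main__":
    dies = [700, 812, 695, 820]  # made-up leakage readings, arbitrary units
    pairs = pair_by_leakage(dies)
    print(pairs)                  # [(695, 700), (812, 820)]
    print(worst_mismatch(pairs))  # 8
```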

del42sa said:
Any idea why that is? Does it have any negative influence on performance or clocking ability? Or is it just broken?

No idea. In the case of Carrizo I find it extremely odd, since AMD has talked a lot about how AVFS works on it. I cannot tell for sure whether it is actually disabled, but I've never seen it activate in the time I've been testing the system (over a year now). But on the GPU ASICs (prior to Polaris) the advanced stuff is specifically disabled.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Forgive my ignorance, but unless you're custom compiling stuff, isn't XOP/FMA4 something of a dead end anyway, since only BD derivatives supported it?

Mostly correct; you can also hand-write code paths into programs to use XOP/FMA4... but no one really does it.

AMD has finally wised up that they have nowhere near enough pull anymore to drive adoption of new instruction sets, so they have stripped support for them from Zen to help clear the plate.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Pretty hard to tell what is going on with 14nm LPP currently... Is the huge variation due to GlobalFoundries or to the process itself? It would be interesting and helpful to see identical parts manufactured at GlobalFoundries and Samsung fabs, but obviously that's much easier said than done.

Looking at the variation of the most recent 32nm SHP silicon version, manufactured in July 2014 (80 samples), the absolute maximum variation in SIDD is 33.1%. When the couple of rare extremes at both ends are removed, the average variation is just 10.84%.

The more recent GPU-Z versions store the "ASIC Leakage" value in a database, so the software can tell how your specimen ranks against the average quality of the same type of ASIC. Ellesmere (P10) has been available for three weeks now, and we've already seen a huge variation in static leakage. The range I managed to find in public was 65.7% - 89.3% (672 - 914 LeakageID). Since both screenshots showed that neither of these are the absolute minimum and maximum figures seen, I asked the author (w1zzard @ TPU) what the absolute lows and highs recorded so far are. The answer was 62.4% - 94.7% (638 - 969): 51.9% variation, after three weeks in D:
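The quoted numbers are internally consistent with two simple formulas: each percentage matches LeakageID / 1023 (e.g. 672 / 1023 ≈ 65.7%), and the 51.9% figure matches (max - min) / min on the LeakageID values. The 1023 divisor (a 10-bit value) is an inference from these pairs, not something taken from GPU-Z documentation, so treat this as a sketch for checking the arithmetic.

```python
# Assumption: GPU-Z's "ASIC Leakage" percentage is LeakageID / 1023 (a 10-bit value).
# This is inferred from the number pairs quoted above, not from GPU-Z documentation.
LEAKAGE_ID_MAX = 1023


def asic_leakage_percent(leakage_id: int) -> float:
    return 100.0 * leakage_id / LEAKAGE_ID_MAX


def relative_spread_percent(low: float, high: float) -> float:
    """How much larger the highest reading is than the lowest, in percent."""
    return 100.0 * (high - low) / low


if __name__ == "__main__":
    for lid in (638, 672, 914, 969):
        # prints 638 62.4, 672 65.7, 914 89.3, 969 94.7 (one per line)
        print(lid, round(asic_leakage_percent(lid), 1))
    print(round(relative_spread_percent(638, 969), 1))  # 51.9 (TPU database extremes)
    print(round(relative_spread_percent(672, 914), 1))  # 36.0 (public screenshots)
```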
 

KTE

Senior member
May 26, 2016
478
130
76
That's ridiculous.


Good to see wiz still around.

 