New Zen microarchitecture details


Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
You don't understand my point. Without a core count advantage, CPU performance comes down to IPC and clock speed (MHz), which means it's harder for AMD to compete with Intel. Whether a 4-core version is enough for mobile is a different question.
If customers only buy for running CB ST, then this is as important as you imply. What if a CCX uses less power at the same clock frequency (FPU differences, 256b core power, iterative multiplier, etc.) and can sustain high clocks for CPU+GPU?
 

Gundark

Member
May 1, 2011
85
2
71
Does anyone else think that Zen is a great name that the marketing team could do wonders with? If they fail to capitalize on this opportunity, they should be lined up against the wall.
 

nenforcer

Golden Member
Aug 26, 2008
1,767
1
76
Does anyone else think that Zen is a great name that the marketing team could do wonders with? If they fail to capitalize on this opportunity, they should be lined up against the wall.

That's just the development code name - I'd be surprised if they kept the AMD FX moniker but I've heard nothing to the contrary to say they won't.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Seeing how strict the VRM tolerances used on Zen motherboards are makes me a bit worried about the process characteristics / the available headroom :hmm:

When AMD moved from 32nm SHP SOI to 28nm BULK, voltage stability became extremely important. Although the platforms using parts made on different processes (e.g. AM3+ and FM2+) had exactly the same load-line specification (1.3 mOhm), in reality the smaller and otherwise inferior 28nm process was significantly more sensitive to voltage variations / fluctuations.

Achieving a stable voltage supply through proper (load dependent) load-line calibration can result in hundreds of MHz additional headroom when close to Fmax, even on the more recent 28nm (Godavari) chips.
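
To put the load-line figure in perspective, here is a minimal sketch of the usual load-line relation, V_load = V_set - I_load * R_LL. The 1.3 mOhm value is the AM3+/FM2+ specification quoted above; the set voltage and load currents are hypothetical illustration values, not measurements from any board.

```python
# Load-line droop sketch: V_load = V_set - I_load * R_LL.
# R_LL = 1.3 mOhm is the AM3+/FM2+ figure quoted in the post; the set
# voltage and the load currents below are hypothetical illustration values.

R_LL_OHM = 1.3e-3  # load-line resistance in ohms


def voltage_under_load(v_set: float, i_load: float, r_ll: float = R_LL_OHM) -> float:
    """Return the voltage seen by the CPU at a given load current."""
    return v_set - i_load * r_ll


if __name__ == "__main__":
    v_set = 1.400  # volts, hypothetical
    for amps in (10, 50, 100):
        v = voltage_under_load(v_set, amps)
        droop_mv = 1000 * (v_set - v)
        print(f"{amps:>4} A -> {v:.3f} V (droop {droop_mv:.0f} mV)")
```

With these assumed numbers, a 100 A load on a 1.3 mOhm load-line drops the supply by 130 mV, which is why a tighter load-line (or load-dependent calibration) matters so much near Fmax.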

For Zen the load-line appears to be (based on the existing VRM designs) significantly tighter than it was with previous AMD designs and much tighter than the Intel VR12 spec (which is already strict) specifies...

Makes me wonder whether Zen is either floored from the factory or the process itself is just extremely sensitive to voltage fluctuations.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
For Zen the load-line appears to be (based on the existing VRM designs) significantly tighter than it was with previous AMD designs and much tighter than the Intel VR12 spec (which is already strict) specifies...

Makes me wonder whether Zen is either floored from the factory or the process itself is just extremely sensitive to voltage fluctuations.
Does this fit a design with very low voltage margins? It could also mean a higher average clock frequency at a given TDP with many active cores.

What do the A8/A8X voltage requirements look like?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Does this fit a design with very low voltage margins? It could also mean a higher average clock frequency at a given TDP with many active cores.

There are three possible reasons I can think of:

- Limited Vmax (to keep the voltage below certain threshold in low load / current draw conditions)
- Rapid & extremely frequent frequency (PState) switching due to power management (like Carrizo at low TDPs). This would cause large changes in current draw and therefore larger voltage fluctuations, especially when certain states are restricted to n active cores while the others are available when all cores are utilized.
- A process characteristic when close to Fmax (like on 28nm BULK processes, compared to 32nm SHP SOI).

Impossible to say the real cause yet, but I would think it has something to do with the process characteristics. If it was a power optimization (second scenario) AMD probably would have used it on Carrizo too. Carrizo supports load-line adjustment through SW.
 

Flash831

Member
Aug 10, 2015
60
3
71
There are three possible reasons I can think of:

- Limited Vmax (to keep the voltage below certain threshold in low load / current draw conditions)
- Rapid & extremely frequent frequency (PState) switching due to power management (like Carrizo at low TDPs). This would cause large changes in current draw and therefore larger voltage fluctuations, especially when certain states are restricted to n active cores while the others are available when all cores are utilized.
- A process characteristic when close to Fmax (like on 28nm BULK processes, compared to 32nm SHP SOI).

Impossible to say the real cause yet, but I would think it has something to do with the process characteristics. If it was a power optimization (second scenario) AMD probably would have used it on Carrizo too. Carrizo supports load-line adjustment through SW.
Aren't Zen motherboards = Bristol Ridge motherboards?
Could this be how AMD managed to raise the clocks over Carrizo?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Aren't Zen motherboards = Bristol Ridge motherboards?
Could this be how AMD managed to raise the clocks over Carrizo?

They are compatible, yes.

Carrizo (by the original design) has a several times higher load-line spec than FM2+ or AM3+, so Carrizo / Bristol Ridge compatibility is not the reason for it.

Bristol Ridge technically didn't raise the clocks compared to Carrizo. AMD just enabled a higher TDP configuration on AM4 Bristol Ridge parts (up to 65W) and raised the clocks as high as possible (by blowing the voltages through the ceiling). That's also the reason why there won't be any unlocked Bristol Ridge SKUs even for AM4 (AFAIK), since there is nothing more to be squeezed out of it.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
Does this fit a design with very low voltage margins? It could also mean a higher average clock frequency at a given TDP with many active cores.

AM3+ was designed for CPUs of up to 220W, so it's no wonder that it had lower voltage losses along the VRM-socket-CPU path, hence the (only apparent) lower voltage variation compared with 95W sockets...

For instance, a copper conductor with a 2.5 mm² cross-section and 10 cm length has about 0.68 milliohms of resistance, compared with the 0.0013 ohm load-line resistance quoted by The Stilt.
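
The copper figure can be checked with the standard relation R = rho * L / A. The sketch below assumes a copper resistivity of about 1.68e-8 ohm*m (a room-temperature textbook value, not stated in the post); the function name is just for illustration.

```python
# Resistance of a uniform conductor: R = rho * L / A.
# RHO_COPPER is an assumed room-temperature textbook value; the length and
# cross-section are the figures given in the post.

RHO_COPPER = 1.68e-8  # ohm * metre, approximate value at about 20 C


def wire_resistance(length_m: float, area_mm2: float, rho: float = RHO_COPPER) -> float:
    """Return the resistance in ohms of a conductor of given length and cross-section."""
    area_m2 = area_mm2 * 1e-6  # convert mm^2 to m^2
    return rho * length_m / area_m2


if __name__ == "__main__":
    r = wire_resistance(length_m=0.10, area_mm2=2.5)
    print(f"10 cm of 2.5 mm^2 copper: {r * 1000:.3f} mOhm")
    print(f"fraction of a 1.3 mOhm load-line: {r / 1.3e-3:.2f}")
```

Under that assumption the result is roughly 0.67 mOhm, in line with the "about 0.68" quoted above and about half of the 1.3 mOhm load-line value.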

As for the voltage margin, it must be 10% at a minimum, which means that the circuit must work at 0.9x the nominal voltage.
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
They are compatible, yes.

Carrizo (by the original design) has a several times higher load-line spec than FM2+ or AM3+, so Carrizo / Bristol Ridge compatibility is not the reason for it.

Bristol Ridge technically didn't raise the clocks compared to Carrizo. AMD just enabled a higher TDP configuration on AM4 Bristol Ridge parts (up to 65W) and raised the clocks as high as possible (by blowing the voltages through the ceiling). That's also the reason why there won't be any unlocked Bristol Ridge SKUs even for AM4 (AFAIK), since there is nothing more to be squeezed out of it.

So you're saying Carrizo was basically running at its optimal clock and power, that it wasn't designed as a desktop part and downclocked for less power, but was designed for mobile, and basically Bristol Ridge is going to be overclocked mobile parts?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
basically Bristol Ridge is going to be overclocked mobile parts?

For the quoted part, yes. A desktop part wouldn't have eight PCI-E lanes available for dGPU, like Carrizo / Bristol Ridge does. In total Carrizo has 8x + 4x + 4x PCI-E lanes. The 8x is reserved for dGPU, 4x is GPP (general purpose) and 4x for UMI (FM2+ only, for external FCH communication).
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Although made with a different process than Zen, the high clocks of Nvidia's 1070 and 1080 possibly show an interesting development: trading size for clocks while still being very power efficient. There seems to be no need to go wide at low clocks to hit efficient design sweet spots. Lower mem PHY area at higher Gbps doesn't call for big dies either.


For a final look at this we need die sizes of course.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,866
3,418
136
Although made with a different process than Zen, the high clocks of Nvidia's 1070 and 1080 possibly show an interesting development: trading size for clocks while still being very power efficient. There seems to be no need to go wide at low clocks to hit efficient design sweet spots. Lower mem PHY area at higher Gbps doesn't call for big dies either.


For a final look at this we need die sizes of course.

How many stages do you think the Zen pipeline has? Mid-20s? High teens?
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
How many stages do you think the Zen pipeline has? Mid-20s? High teens?
TL;DR: High teens (at least).
Dresdenboy at SA said:
Code:
Architecture               Branch Misprediction Penalty
AMD K10                    12 cycles
AMD Bulldozer              20 cycles
Pentium 4 (NetBurst)       20 cycles
Core 2 (Conroe, Penryn)    15 cycles
Nehalem                    17 cycles
Sandy Bridge               14-17 cycles
Source: http://www.anandtech.com/show/5057/the-bulldozer-aftermath-delving-even-deeper/2

Zen's pipeline won't be as short as K10's. And it doesn't need to be, as improved branch prediction plus some (rumoured) additions like checkpointing (before taking a hard-to-predict branch), a µOp cache, and SMT (run the other thread at higher IPC in the meantime) reduce the perceived cost of mispredictions.
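
A rough way to see why a longer pipeline can be offset by better prediction is the common first-order model CPI = base CPI + branch fraction * misprediction rate * penalty. The penalties below are the K10 and Bulldozer figures from the table above; the base CPI, branch fraction and misprediction rates are hypothetical values chosen for illustration only, and the model ignores effects such as SMT or a µOp cache.

```python
# First-order model of misprediction cost:
#   CPI_eff = CPI_base + branch_fraction * mispredict_rate * penalty_cycles
# Penalties (12 and 20 cycles) come from the table in the post; every other
# number here is a hypothetical illustration value.


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Return cycles per instruction including the average misprediction cost."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Short pipeline, weaker predictor (assumed 5% miss rate).
    short_pipe = effective_cpi(1.0, 0.20, 0.05, 12)
    # Long pipeline, better predictor (assumed 3% miss rate).
    long_pipe = effective_cpi(1.0, 0.20, 0.03, 20)
    print(f"12-cycle penalty, 5% miss rate: CPI = {short_pipe:.2f}")
    print(f"20-cycle penalty, 3% miss rate: CPI = {long_pipe:.2f}")
```

With these assumed rates both configurations land at the same effective CPI, so in this simple model a 20-cycle penalty costs no more than a 12-cycle one once the predictor misses 40% less often.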

Some AMD patents show the cycles, for example US20140025933:
MAP, RDY, SCH, XRF, EX0, [EX1..], RE0, RE1, RE2 (Seronx, did you get the 3 retire stages from this patent?)
Before that there should be IT0, IT1/IC0, IT2/IC1-ICn, DEC0-DECm (parallel BP?) (see US20150121050).
That adds up quickly. If you compare those parts to Jaguar with a 14 cycle branch misprediction penalty, it looks to be at least as long for Zen, if not longer.

Of course, more stages could be required for the added units and increased complexity (wider schedulers, FMA support, checkpointing, SMT, etc.).

http://semiaccurate.com/forums/showpost.php?p=258949&postcount=2339
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Although made with a different process than Zen, the high clocks of Nvidia's 1070 and 1080 possibly show an interesting development: trade size for clocks while still being very power efficient.

Yep. Looks like the Netburst/Bulldozer strategy (high clocks, low/moderate IPC) actually works on GPUs, while it has been a total bust on CPUs so far.

Part of the problem is that there seems to be a natural CPU "speed limit" of about 4.5 - 5.0 GHz. Toward the top end of that range, power usage shoots through the roof for smaller and smaller clock speed gains. If Piledriver could have hit 6.0 GHz at ~140W, it might actually have been competitive with Intel at the time. But that didn't happen.
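
The steep rise in power near the top of the clock range follows from the usual dynamic power approximation P ≈ C * V² * f: once extra frequency also needs extra voltage, power grows much faster than clock speed. The sketch below uses hypothetical scaling ratios, not measured Piledriver data, and ignores leakage.

```python
# Dynamic power approximation: P ~ C * V^2 * f (leakage ignored).
# The frequency and voltage ratios used below are hypothetical values.


def relative_power(freq_ratio: float, volt_ratio: float) -> float:
    """Return dynamic power relative to baseline for given f and V scaling."""
    return volt_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Assume 20% more clock needs 15% more voltage near the top of the range.
    scale = relative_power(freq_ratio=1.20, volt_ratio=1.15)
    print(f"+20% clock at +15% voltage -> {scale:.2f}x dynamic power")
```

Under those assumptions a 20% clock increase costs close to 59% more dynamic power, which matches the pattern of ever smaller gains for ever more power described above.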
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Yep. Looks like the Netburst/Bulldozer strategy (high clocks, low/moderate IPC) actually works on GPUs, while it has been a total bust on CPUs so far.
IMO NW was OK, Prescott on the wrong process, and BD likely was the result of tradeoffs not favoring generic DT code.

IBM's CPUs, for example, show that a HF design isn't necessarily bad.

Part of the problem is that there seems to be a natural CPU "speed limit" of about 4.5 - 5.0 GHz. Toward the top end of that range, power usage shoots through the roof for smaller and smaller clock speed gains. If Piledriver could have hit 6.0 GHz at ~140W, it might actually have been competitive with Intel at the time. But that didn't happen.
That could simply be a tradeoff to have a better design for many of the intended use cases.
 

HiroThreading

Member
Apr 25, 2016
173
29
91
IMO NB was OK, Prescott on the wrong process, and BD likely was the results of tradeoffs not favoring generic DT code.

Do you mean Northwood was OK, or that NetBurst as an architecture was OK? Two very different statements.

Prescott was a disaster on 90nm, but even when ported to 65nm, Cedar Mill was pretty rubbish. Granted, Cedar Mill probably could have scaled to mid-4GHz speeds as a single core design. However, by that stage, the multicore era was unleashed and Intel never clocked Cedar Mill above 3.73GHz and mostly sold them as the MCM Pentium D (90-120W TDP).

Bulldozer was just a horrid architecture, much like NetBurst.

IBM's CPUs, for example, show that a HF design isn't necessarily bad.

Yes, but they also have 250-300W TDPs to play with, IIRC.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Do you mean Northwood was OK, or that NetBurst as an architecture was OK? Two very different statements.

Prescott was a disaster on 90nm, but even when ported to 65nm, Cedar Mill was pretty rubbish. Granted, Cedar Mill probably could have scaled to mid-4GHz speeds as a single core design. However, by that stage, the multicore era was unleashed and Intel never clocked Cedar Mill above 3.73GHz and mostly sold them as the MCM Pentium D (90-120W TDP).

Bulldozer was just a horrid architecture, much like NetBurst.

Sorry, meant NW of course.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Netburst was horrible, but I don't think Intel was ever as much behind AMD in IPC as AMD is at the moment behind Intel, or was it? IIRC K7 had ~33% higher IPC in FP than Northwood.
 

nonameo

Diamond Member
Mar 13, 2006
5,949
3
76
Netburst was horrible, but I don't think Intel was ever as much behind AMD in IPC as AMD is at the moment behind Intel, or was it? IIRC K7 had ~33% higher IPC in FP than Northwood.

One thing to keep in mind about P4 is that it was designed for high clocks. Northwoods of the time were performing on par (not everywhere, but enough to count) with their AMD counterparts.
 

SPBHM

Diamond Member
Sep 12, 2012
5,058
410
126
Netburst was horrible, but I don't think Intel was ever as much behind AMD in IPC as AMD is at the moment behind Intel, or was it? IIRC K7 had ~33% higher IPC in FP than Northwood.

and Intel had an easy 1GHz advantage
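
The IPC versus clock trade-off in the last few posts can be put into numbers with the simple relation performance ≈ IPC * frequency. The ~33% IPC advantage is the figure recalled above; the 2.0 GHz clock is a hypothetical value picked only to make the arithmetic concrete.

```python
# Single-thread performance approximated as IPC * clock frequency.
# The 1.33 relative IPC is the figure recalled in the thread; the 2.0 GHz
# clock is a hypothetical example value.


def relative_performance(ipc: float, freq_ghz: float) -> float:
    """Return a unitless performance score (IPC times GHz)."""
    return ipc * freq_ghz


def break_even_clock(ipc_a: float, freq_a_ghz: float, ipc_b: float) -> float:
    """Return the clock chip B needs to match chip A's performance."""
    return ipc_a * freq_a_ghz / ipc_b


if __name__ == "__main__":
    needed = break_even_clock(ipc_a=1.33, freq_a_ghz=2.0, ipc_b=1.0)
    print(f"Lower-IPC chip needs {needed:.2f} GHz to match 2.0 GHz at 1.33x IPC")
```

In this simplified view, a chip with 33% lower IPC breaks even at 2.66 GHz against a 2.0 GHz rival, so a clock advantage of around 1 GHz would more than cover the gap.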
 