New Zen microarchitecture details


KTE

Senior member
May 26, 2016
478
130
76
So, according to Lisa Su in Q2 EC (07/21/2016) "the true volume availability of DT Zeppelin" = Q1/2017.
I did say that since the start... Sorry.

Paper launch for end of Q4, volume availability starting Q1. Was same for Agena.

Sent from HTC 10
(Opinions are own)
 

inf64

Diamond Member
Mar 11, 2011
3,864
4,546
136
I did say that since the start... Sorry.

Paper launch for end of Q4, volume availability starting Q1. Was same for Agena.

Sent from HTC 10
(Opinions are own)
Even if they do only a soft/paper launch in Q4 we will at least know how it performs and what kind of impact it will have based on that.

edit:



Now AMD has changed the "up to" from the previous slides with "greater than 40% uplift" in the newest slide deck from Q2 call.

Greater than 40% ST IPC uplift over EX core is in effect greater than 60% ST IPC uplift over PD core in FX83xx.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Even if they do only a soft/paper launch in Q4 we will at least know how it performs and what kind of impact it will have based on that.

edit:



Now AMD has changed the "up to" from the previous slides with "greater than 40% uplift" in the newest slide deck from Q2 call.

Greater than 40% ST IPC uplift over EX core is in effect greater than 60% ST IPC uplift over PD core in FX83xx.


"Greater than 40%" claims by AMD are all made in the server/HTPC/datacentre columns - so that's compared to Piledriver, not Excavator.

They have not once claimed > 40% for desktop (over Excavator), that I've seen.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
The following is my Wattman voltage after system crashes and it reverts back to default setting. Not clean install, with a lot of crashes already. :twisted:


Is that 1106 actually the defaults? By far the lowest I've seen.

The 1131 is lower as well, so I have a new floor until you verify that the 1106 is, indeed, the default voltages.

Also: you have four RX 480s?! What do you do with them?
 

TimCh

Member
Apr 7, 2012
55
52
91
"Greater than 40%" claims by AMD are all made in the server/HTPC/datacentre columns - so that's compared to Piledriver, not Excavator.

They have not once claimed > 40% for desktop (over Excavator), that I've seen.
Untrue, AMD has repeatedly and explicitly stated that the 40% uplift in IPC is in relation to Excavator.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
"Upto"

It's now becoming "greater than 40%" for Server/HTPC/Datacentre.
That's not new for their datacenter claims.

But I've never seen an "up to". Maybe in a footnote? On Radeon slides there were "up to" remarks, but not on the Zen-related ones I saw. It's always been some forum member's interpretation, IIRC.
 

inf64

Diamond Member
Mar 11, 2011
3,864
4,546
136
I haven't seen "up to" phrase in any of the slides or statements either.
 

KTE

Senior member
May 26, 2016
478
130
76
Yea, sorry. I should be clearer.

That is my own interpretation of their "40% IPC increase" claim.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Untrue, AMD has repeatedly and explicitly stated that the 40% uplift in IPC is in relation to Excavator.

Desktop Zen is in relation to Excavator, server Zen is in relation to Piledriver.

The distinction exists because the newest product in each market is different.

The x4 845 (Excavator) is the highest IPC AMD desktop processor.

The Opteron 6300 series (Piledriver) are the highest IPC AMD server processors.

Everything is relative

I have yet to see a >40% claim for desktop Zen.

BTW - if Zen has 40% higher IPC than Excavator, it will have about 60~65% higher IPC than Piledriver. So only saying > 40% is AMD's way of not giving away too much - or making too bold of a claim in the server market.
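
To make the arithmetic behind those figures explicit, here is a minimal Python sketch. It uses only the ratios quoted in this thread (40% over Excavator, 60~65% over Piledriver), which are forum estimates rather than measurements. The function names are made up for illustration.

```python
# Relative IPC arithmetic using only the ratios quoted in the thread.
# All inputs are forum estimates, not measured values.

def implied_ratio(zen_over_ex: float, zen_over_pd: float) -> float:
    """Excavator-over-Piledriver IPC ratio implied by the two Zen claims."""
    return zen_over_pd / zen_over_ex


def chained_uplift(*uplifts: float) -> float:
    """Compound fractional uplifts (0.40 means +40%) into a single uplift."""
    total = 1.0
    for u in uplifts:
        total *= 1.0 + u
    return total - 1.0


zen_over_ex = 1.40  # "40% higher IPC than Excavator"
for zen_over_pd in (1.60, 1.65):  # "about 60~65% higher IPC than Piledriver"
    ex_over_pd = implied_ratio(zen_over_ex, zen_over_pd)
    print(f"Zen/PD = {zen_over_pd:.2f} implies EX/PD = {ex_over_pd:.3f}")
```

Read the other way round, the two claims are only consistent if Excavator's IPC sits roughly 14-18% above Piledriver's, since uplifts multiply rather than add.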
 

KTE

Senior member
May 26, 2016
478
130
76
When's the last time any new uarch from AMD hit an average +40% IPC vs. the last gen?

Or Intel?
 

inf64

Diamond Member
Mar 11, 2011
3,864
4,546
136
When's the last time any new uarch from AMD hit an average +40% IPC vs. the last gen?

Or Intel?
Intel did it when they switched to Core from Netburst. AMD never had such a big jump since they never had such a radical departure from one gen to another (ok BD was a radical departure but the IPC loss was 10-15% vs K10 45nm core).
 

KTE

Senior member
May 26, 2016
478
130
76
So even if AMD hit +30% average IPC gain, it would be a serious boost in the right direction and a major win (talking DT here).
 

inf64

Diamond Member
Mar 11, 2011
3,864
4,546
136
So even if AMD hit +30% average IPC gain, it would be a serious boost in the right direction and a major win (talking DT here).
Well they absolutely NEED to hit the claimed number (~40% over EX) if they want to be even in the talk about being competitive. They also need to hit the frequency on the Zen parts (low to mid 3 GHz base and high 3 GHz Turbo on 8C) in order to be competitive since IPC alone won't mean jack sh** if they can only hit low clocks with no OC margin.
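
The point about clocks can be put in numbers with the usual first-order model, single-thread performance = IPC x frequency. A rough Python sketch follows. The 1.40 IPC ratio is the claim discussed above, while the 4.0 GHz baseline and the Zen clocks tried here are hypothetical placeholders, not known specifications.

```python
# First-order model: single-thread performance scales with IPC * clock.
# The clock values below are hypothetical, chosen only to show the trade-off.

def relative_performance(ipc_ratio: float, clock_ghz: float,
                         baseline_clock_ghz: float) -> float:
    """Performance relative to the baseline part (1.0 means equal)."""
    return ipc_ratio * clock_ghz / baseline_clock_ghz


def break_even_clock(ipc_ratio: float, baseline_clock_ghz: float) -> float:
    """Clock at which the new part merely matches the baseline."""
    return baseline_clock_ghz / ipc_ratio


ipc_ratio = 1.40         # claimed uplift over Excavator
baseline_clock = 4.0     # hypothetical baseline clock in GHz

for clock in (2.8, 3.2, 3.6, 4.0):  # hypothetical Zen clocks in GHz
    perf = relative_performance(ipc_ratio, clock, baseline_clock)
    print(f"{clock:.1f} GHz -> {perf:.2f}x baseline")

print(f"break-even at {break_even_clock(ipc_ratio, baseline_clock):.2f} GHz")
```

Under these assumptions a +40% IPC part clocked well below the baseline gives back most of its advantage, which is why the frequency target matters as much as the IPC number.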
 

KTE

Senior member
May 26, 2016
478
130
76
Well they absolutely NEED to hit the claimed number (~40% over EX) if they want to be even in the talk about being competitive. They also need to hit the frequency on the Zen parts (low to mid 3 GHz base and high 3 GHz Turbo on 8C) in order to be competitive since IPC alone won't mean jack sh** if they can only hit low clocks with no OC margin.
Yea but think about it...

Engineering wise, process problems can be resolved in a few quarters (Agena vs Deneb) but poor arch performance or scaling can't be for near 5 years (BD).

And averaging >20% IPC gain is nothing to snub from one arch to another. It's a major feat.

Although business wise, financially, it may not be competitive or profitable (what you're saying).
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
Yea but think about it...

Engineering wise, process problems can be resolved in a few quarters (Agena vs Deneb) but poor arch performance or scaling can't be for near 5 years (BD).

And averaging >20% IPC gain is nothing to snub from one arch to another. It's a major feat.

Although business wise, financially, it may not be competitive or profitable (what you're saying).

If there truly are process problems it will be a much bigger deal than the Agena to Deneb progress was. In that case they had the much more potent 45nm process just a bit out of reach when they launched Agena on 65nm. From everything I've heard so far, it sounds like 10nm isn't on the table for AMD CPUs so it's GF 14nm or bust and 7nm is a long ways off. Sure there will be refinements to the process and they can make a substantial difference (look no further than the GF 32nm process for something that seemed like a dud out of the gate and was refined into an extremely potent process) but it's still no replacement for a full-on generational leap.
 

KTE

Senior member
May 26, 2016
478
130
76
If there truly are process problems it will be a much bigger deal than the Agena to Deneb progress was. In that case they had the much more potent 45nm process just a bit out of reach when they launched Agena on 65nm. From everything I've heard so far, it sounds like 10nm isn't on the table for AMD CPUs so it's GF 14nm or bust and 7nm is a long ways off. Sure there will be refinements to the process and they can make a substantial difference (look no further than the GF 32nm process for something that seemed like a dud out of the gate and was refined into an extremely potent process) but it's still no replacement for a full-on generational leap.
90nm and 45nm were good to begin with. 65nm wasn't.

IF process related problems did exist, it would depend on what the exact nature of the problems are. This is so oversimplified. Some can be ironed out as process matures and yields improve... Even variation improves. Some can't as they tie down to the base characteristics.

If they can't, then yes, AMD would be in the same position as having a dud uarch.

A good indication for me is always the launch model frequency@power.

Sent from HTC 10
(Opinions are own)
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
When's the last time any new uarch from AMD hit an average +40% IPC vs. the last gen?

Or Intel?

Never for an evolutionary design, but Zen isn't an evolution - it's a completely new design. This is AMD's version of Intel's Core moment... except not as extreme.

If you look at Zen vs Excavator, it's really hard to see how it is only getting a 40% IPC bump.

It has about double the integer hardware, a dedicated, and large, FPU, faster, and dedicated, caches, a front-end known to be capable of delivering more instructions than an entire Excavator module can handle, an updated memory controller using DDR4, faster L3 shared with only three other cores, and much more going for it...

The reason you only get a 40% IPC bump from all that is because we are hitting the area of maximum extraction from the fetched instructions using ILP (instruction-level parallelism)... if you can only pre-decode 16B of instructions, you can only go so far for identifying and executing non-dependent instructions.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
This is AMD's version of Intel's Core moment... except not as extreme.
Zen is AMD's Netburst/K6 moment. It can literally fail if its SMT+Cyclone re-architecture has bad performance. Bulldozer is the repeat mix of K5 and K6.

K9-Bulldozer(<2007) had a fully internal design team set for implementation on 45nm SOI @ Freq_high of 5 GHz // K5-ish moment (home-grown). 15h-Bulldozer(>2007) was a broken implementation of a re-architected, downgraded Alpha 21264 via automation // K6-ish moment (foreign architecture).

Off-topic;
From what I could gather from two things lost in Bulldozer; K9 architecture and 32nm FDSOI.
- 32nm FDSOI (AMD Foundry) = 32nm Bulk (Intel) // 126nm GP, 100nm M1P... 15% faster than 32nm PDSOI.. 30% less power than 32nm PDSOI.. Initial buried oxide 40nm, but STMicro 32FD_PDK hints final BOX @ 30nm for 32nm FDSOI. Reduced foundry cost and mask costs termed via STM32FD but they never did 32FD either.

- K9 Architecture = Okay, after research from various 2004-2009 sources. This architecture is largely based on K7 to 12h architectures. The changes involved moving from 6 integer pipes (3 ALUs & 3 AGUs) to 4 integer pipes (3 AGLUs + 1 ALU). The 3 AGLUs would retain similar perf with previous designs. The added ALU would do ops; Multiply, Division, Complex Branches on top of normal ALU. While back to the AGLUs, AGLU2 would do POPCNT/LZCNT and the rest would do what K7-12h SOG says for ALU/AGUs[2007s - 4 ALUs/3 AGUs for Bulldozer via Open64]. The FPU would either be what is in Bulldozer already or what is currently in Zen[2007s - 4 FPUs for Bulldozer via Open64]. The pipeline would have gone to 20 stages up from 12 stages. LD/ST Cache and Front-end from Bulldozer is close to what would have been in K9.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Never for an evolutionary design, but Zen isn't an evolution - it's a completely new design. This is AMD's version of Intel's Core moment... except not as extreme.

If you look at Zen vs Excavator, it's really hard to see how it is only getting a 40% IPC bump.

It has about double the integer hardware, a dedicated, and large, FPU, faster, and dedicated, caches, a front-end known to be capable of delivering more instructions than an entire Excavator module can handle, an updated memory controller using DDR4, faster L3 shared with only three other cores, and much more going for it...

The reason you only get a 40% IPC bump from all that is because we are hitting the area of maximum extraction from the fetched instructions using ILP (instruction-level parallelism)... if you can only pre-decode 16B of instructions, you can only go so far for identifying and executing non-dependent instructions.
If they were really sneaky, they'd have given that 40% number as per thread IPC increase with SMT vs. CMT in multithreaded workloads for some sandbagging. :sneaky:

A 16B fetch window (if it has this size) would be OK, because of the µOp cache.

And of course, mem B/W does hold back per core performance somewhat, esp. with 16 threads on 2 channels.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
And of course, mem B/W does hold back per core performance somewhat, esp. with 16 threads on 2 channels.
I might as well post this here; http://i.imgur.com/0zjBXBr.jpg


8-channels to 8 threads for L3 MCT. [?MC?] Up to 256 GB/s for one L4, Up to 512 GB/s for two L4. [DF/FUSE] is the Cache Coherent Data Fabric on die so ccNUMA for HBM2 phys.
2-channels to 16 threads for Unified MCT. [UMC] Up to 25.6/51.2 GB/s for one/two channel. AMP speeds >25.6/>51.2 GB/s.

I assume AMD will provide two versions of each config on AM4;
No L4 Cache -> Athlon FX X4/X8 (Bandwidth issues apply only here.. pretty sure these will be the garbage sub-2.6 GHz bins anyway)
L4 Cache -> Phenom FX X4/X8 [1.4 GHz to ~2.8 GHz/1.6 GHz to ~3.2 GHz/2 GHz to ~4 GHz]
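
For a sense of scale on the bandwidth figures above, here is a small Python sketch of the per-thread share. It assumes the 25.6 GB/s per-channel number corresponds to a 64-bit channel at 3200 MT/s (which is what the arithmetic works out to), and it takes the 256 GB/s L4 figure from the speculation in this post at face value. Function names are illustrative only.

```python
# Peak bandwidth per thread for the two configurations speculated above.
# These are theoretical peaks; sustained bandwidth is always lower.

def channel_bandwidth_gbs(transfers_mt_s: float, bus_bytes: int = 8) -> float:
    """Peak GB/s of one memory channel: transfers per second * bytes per transfer."""
    return transfers_mt_s * bus_bytes / 1000.0


def per_thread_gbs(total_gbs: float, threads: int) -> float:
    """Even split of peak bandwidth across all hardware threads."""
    return total_gbs / threads


one_channel = channel_bandwidth_gbs(3200)   # assumed 3200 MT/s, 64-bit channel
two_channels = 2 * one_channel

print(f"one channel:  {one_channel:.1f} GB/s")
print(f"two channels: {two_channels:.1f} GB/s")
print(f"2 channels / 16 threads: {per_thread_gbs(two_channels, 16):.1f} GB/s per thread")
print(f"one L4 (speculated 256 GB/s) / 8 threads: {per_thread_gbs(256.0, 8):.1f} GB/s per thread")
```

On those assumptions, 16 threads on two channels get about a tenth of the per-thread peak that the speculated L4 configuration would offer.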
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
I might as well post this here; http://i.imgur.com/0zjBXBr.jpg


8-channels to 8 threads for L3 MCT. [?MC?] Up to 256 GB/s for one L4, Up to 512 GB/s for two L4. [DF/FUSE] is the Cache Coherent Data Fabric on die so ccNUMA for HBM2 phys.
2-channels to 16 threads for Unified MCT. [UMC] Up to 25.6/51.2 GB/s for one/two channel. AMP speeds >25.6/>51.2 GB/s.

I assume AMD will provide two versions of each config on AM4;
No L4 Cache -> Athlon FX X4/X8 (Bandwidth issues apply only here.. pretty sure these will be the garbage sub-2.6 GHz bins anyway)
L4 Cache -> Phenom FX X4/X8 [1.4 GHz to ~2.8 GHz/1.6 GHz to ~3.2 GHz/2 GHz to ~4 GHz]

So what does that mean? You think AMD is using a High Bandwidth Memory stacking technique for their Cache?
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
If they were really sneaky, they'd have given that 40% number as per thread IPC increase with SMT vs. CMT in multithreaded workloads for some sandbagging. :sneaky:

A 16B fetch window (if it has this size) would be OK, because of the µOp cache.

And of course, mem B/W does hold back per core performance somewhat, esp. with 16 threads on 2 channels.

That would be a pretty damaging way for them to measure it.

As for the 16B fetch window, that was just a random size chosen by me. It all depends upon how deep the pre-decode logic can look for non-dependent instructions and how accurately it can predict branches. Smaller fetches simply make the task more difficult, but larger fetches are only useful if the hardware is fast and accurate enough to find non-dependent instructions that actually need to be executed.
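
A toy Python model of that window effect, for illustration only. It is not how Zen or any real front-end works: it assumes perfect register renaming, one cycle per instruction, unlimited execution units, and that the hardware can only look at one fixed-size chunk of instructions at a time. The instruction format and function names are invented for the sketch.

```python
# Toy model: how much ILP a fixed lookahead window can expose.
# Assumptions: perfect renaming, 1-cycle ops, unlimited execution units,
# and values produced in earlier chunks are already available.
from typing import List, Sequence, Tuple

Instr = Tuple[str, Sequence[str]]  # (destination register, source registers)


def chunk_depth(chunk: List[Instr]) -> int:
    """Length of the longest dependency chain inside one chunk."""
    level = {}  # register -> cycle at which its newest value is ready
    depth = 0
    for dest, srcs in chunk:
        # An instruction issues one cycle after the latest producer it reads.
        lvl = 1 + max((level.get(s, 0) for s in srcs), default=0)
        level[dest] = lvl
        depth = max(depth, lvl)
    return depth


def toy_ipc(program: List[Instr], window: int) -> float:
    """Instructions per cycle when chunks of `window` instructions run in turn."""
    cycles = 0
    for i in range(0, len(program), window):
        cycles += chunk_depth(program[i:i + window])
    return len(program) / cycles


# Two independent 4-long dependency chains, laid out one after the other.
program = [
    ("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"]),
    ("r5", ["r0"]), ("r6", ["r5"]), ("r7", ["r6"]), ("r8", ["r7"]),
]

for window in (2, 4, 8):
    print(f"window {window}: IPC {toy_ipc(program, window):.2f}")
```

With this particular program the 2- and 4-instruction windows never see the second chain while the first is running, so they stay at 1 IPC, while the 8-instruction window overlaps both chains. A different instruction mix would give different numbers, which is the "it all depends" part.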
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,706
1,233
136
So what does that mean? You think AMD is using a High Bandwidth Memory stacking technique for their Cache?
For their Last-Level Cache, Level-4 cache.

My speculation is;
64B L1i Fetch Window (AMD64 Variable-op)
2x4 AMD64 to Macro-op Decode (Same as Steamroller/Excavator)
16B or 8 Macro-op L0i Fetch Window (Internal Macro-ops, RISC-like)

3-cycle(simple load), 4-cycle(complex load); L1d
10-cycle; L2
52-cycle; L3 (L3 -> L2 => 64B vs Jaguars 16B)
52-cycles + 50 ns; L4
52-cycles + 100 ns; Memory

Misprediction to L1 BTB; ~18-cycles
Misprediction to L0 BTB/BTQ; ~9-cycles
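
To compare those mixed "cycles + ns" figures on one scale, they can be converted to nanoseconds at a given core clock. A quick Python sketch follows. The latency numbers are the speculation from this post, and the 3.0 GHz clock is a hypothetical value picked only for the conversion.

```python
# Convert the speculated "cycles + ns" latencies into plain nanoseconds.
# Latency figures are the speculation above; the clock is an assumption.

def latency_ns(cycles: int, extra_ns: float, clock_ghz: float) -> float:
    """Core-clock cycles plus a fixed off-core delay, expressed in ns."""
    return cycles / clock_ghz + extra_ns


CLOCK_GHZ = 3.0  # hypothetical core clock

# (name, core cycles, additional fixed delay in ns)
levels = [
    ("L1d simple load", 3, 0.0),
    ("L1d complex load", 4, 0.0),
    ("L2", 10, 0.0),
    ("L3", 52, 0.0),
    ("L4", 52, 50.0),
    ("Memory", 52, 100.0),
]

for name, cycles, extra in levels:
    print(f"{name:<17} {latency_ns(cycles, extra, CLOCK_GHZ):7.2f} ns")
```

Because the L4 and memory entries are dominated by the fixed nanosecond term, changing the assumed clock mostly moves the L1d/L2/L3 rows.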
 