New Zen microarchitecture details

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
To be more accurate, it doesn't hint at a high frequency design à la BD. Since NHM, Intel also uses 4 cycles. But at least it hints at a FinFET design achieving more than, say, 3 GHz at reasonable voltages. Jaguar has a 3 cycle latency and reaches 2.2 GHz @ 28nm while being an LP design without too much custom logic.

A fatter core can have very different causes. But here we're talking about fast transistors (14/16nm FinFET) in a not so dense process (20nm). Creating a 20-25 FO4 design would ease the timing pressure to use leaky transistors for critical paths. Maybe XV already went into that direction as it's already a heavily synthesized design, allowing for an easier change of target pipeline stage delays.

Here is an illustration of the possible benefits:
OK, that makes good sense. I guess AMD also had to design with less timing pressure, since they don't have the same control over the process as Intel has?
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Any solid speculation on the memory arrangement for desktop Zen? Seems that Zen will be built around 8 core blocks served by dual channel memory which will be combined together in groups up to 4 providing 32 cores + 8 channel memory. (CERN slides + previous info)

For that 8 channel memory to be well utilized Zen must have some methodology to share the memory controllers over the transport layer to minimize overhead and latency, yes? Does Seamicro's Freedom Fabric IP help for this?

Guess what I'm getting at is how there might be an up to 16 core AM4 (dual channel) Zen if cluster 0's memory can be shared with cluster 1. That would avoid the performance implications of each cluster being connected to 1 channel and sharing (or some other NUMA-type setup), which was somewhat of a concern to me with possible 10-16 core configurations.
 

Adul

Elite Member
Oct 9, 1999
32,999
44
91
danny.tangtam.com
An overclocked 980 Ti is also 20% or more faster than an overclocked Fury X, which justifies the higher TDP. Also, the 980 Ti isn't using a liquid-based cooler, which lowers the TDP.

Anyway, back on topic. I really hope Zen performs well. An AMD Slot A Athlon CPU was in my first self-built PC, so I have a soft spot for AMD and I want them to succeed.

We need them to succeed, otherwise the stagnation of the market will continue.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
XV already has support for FMA3/AVX2, and presumably Zen will have the same. XV suffers pretty badly under AVX2, though, and my impression was that the major reason for that was its requirement that it split instructions in hardware before executing them. Maybe I'm wrong in my reasoning. All I know is that AVX2 code doesn't handle much better than the same code optimized for SSE3 instead when run on XV.

Zen is going to run into a bunch of AVX/AVX2 code optimized for Intel processors with 256-bit FMACs. If it can't handle that any better than XV then I'll be rather disappointed.
Any weakness will be exploited with community benchmark obsession, too, no matter how questionable the relevance of the benchmark is.

If Zen doesn't do AVX2 well, expect the "definitive" benchmark to use it relentlessly. Hopefully Keller was smart enough to realize that the only way to compete with Intel is to copy their designs as much as possible because otherwise there is room for selective benching (like running FP-heavy benches when comparing Intel with FX chips and mostly ignoring integer). It looks like he got the hint by going to SMT.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
To be honest, I'm not interested in a 95 watt part. And I'm getting more than a little sick of people who are. Since when did enthusiasts care about wattage?
It seems they mostly care when it's a fanboy talking point. When Nvidia was dominating with GTX 480s and 580s there was complaint (mostly just in the tech press articles) about heat and noise but also more people bragging about their SLI 480 setup — and articles about triple 480 SLI.

The same people, I'll wager, made a big fuss over the noise and power consumption of Hawaii, especially with the blower — even though we're seeing that Hawaii has more staying power than Fermi. And there is still complaint about Fiji even though the Nano has narrowed the performance per watt so much that the issue is barely relevant.

But, the main reason behind the 95W thing is cheap motherboards, I'll wager. Cheap VRMs. Cheap components. Thin PCBs. More profit.

I will grant that CPUs like the 9590 are nonsense, though.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Good on Zen for having the latency advantage, but FMA (particularly FMA3) is going to be the issue I was driving at earlier. XV already has support for FMA3/AVX2, and presumably Zen will have the same. XV suffers pretty badly under AVX2, though, and my impression was that the major reason for that was its requirement that it split instructions in hardware before executing them. Maybe I'm wrong in my reasoning. All I know is that AVX2 code doesn't handle much better than the same code optimized for SSE3 instead when run on XV.

Zen is going to run into a bunch of AVX/AVX2 code optimized for Intel processors with 256-bit FMACs. If it can't handle that any better than XV then I'll be rather disappointed.

Given that Intel still disables AVX, AVX2 and FMA3 on their Pentium chips, I don't think AVX will get terribly important outside of a handful of apps.
 
Aug 11, 2008
10,451
642
126
Maybe we should wait for some benchmarks before starting the usual excuses and paranoid ramblings. It *might* actually be competitive without them.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Good on Zen for having the latency advantage, but FMA (particularly FMA3) is going to be the issue I was driving at earlier. XV already has support for FMA3/AVX2, and presumably Zen will have the same. XV suffers pretty badly under AVX2, though, and my impression was that the major reason for that was its requirement that it split instructions in hardware before executing them. Maybe I'm wrong in my reasoning. All I know is that AVX2 code doesn't handle much better than the same code optimized for SSE3 instead when run on XV.

Zen is going to run into a bunch of AVX/AVX2 code optimized for Intel processors with 256-bit FMACs. If it can't handle that any better than XV then I'll be rather disappointed.
Let's wait and see. A lot of people in the Zen team came from the cat core team. They didn't do badly on AVX, so they shouldn't do badly with AVX2 on Zen either, as there is also enough integer SIMD execution hardware. Just look at how Jaguar did vs. Excavator in these examples:

Jaguar:
Code:
1798 AVX   :VADDPS ymm, ymm, ymm        L:   1.47ns=  3.0c  T:   0.98ns=  2.00c
1799 AVX   :VMULPS ymm, ymm, ymm        L:   0.98ns=  2.0c  T:   0.98ns=  2.00c
1800 AVX   :VMULPS+VADDPS ymm, ymm, ymm L:   2.44ns=  5.0c  T:   1.02ns=  2.08c
1801 AVX   :VMULPS ymm1.. VADDPS ymm2.. L:   1.47ns=  3.0c  T:   0.98ns=  2.00c

Carrizo:
Code:
1798 AVX   :VADDPS ymm, ymm, ymm        L:   2.39ns=  5.0c  T:   0.48ns=  1.00c
1799 AVX   :VMULPS ymm, ymm, ymm        L:   2.39ns=  5.0c  T:   0.48ns=  1.00c
1800 AVX   :VMULPS+VADDPS ymm, ymm, ymm L:   4.77ns= 10.0c  T:   0.95ns=  2.00c
1801 AVX   :VMULPS ymm1.. VADDPS ymm2.. L:   2.39ns=  5.0c  T:   0.95ns=  2.00c

Skylake:
Code:
1798 AVX   : VADDPS ymm, ymm, ymm          L:   1.81ns=  4.0c  T:   0.23ns=  0.50c
1799 AVX   : VMULPS ymm, ymm, ymm          L:   1.81ns=  4.0c  T:   0.23ns=  0.50c
1800 AVX   : VMULPS+VADDPS ymm, ymm, ymm   L:   3.62ns=  8.0c  T:   0.40ns=  0.88c
1801 AVX   : VMULPS ymm1.. VADDPS ymm2..   L:   1.81ns=  4.0c  T:   0.45ns=  1.00c

This is just for single precision. In double precision, Jaguar suffers from its iterative multiplier. But latencies are still not bad compared to Carrizo. For SP they are much better with similar throughput when executing a VMUL+VADD mix (both dependent and independent). Skylake has 1 cycle lower latencies than Carrizo and twice the throughput. This should be solved by Zen. The FADD/VADD latencies will be lower, while multiplies look to be the same. Integer SIMD latencies will be low. Decode throughput (how many "double" decoded ops per cycle) remains to be seen, but then there is a µOp cache...
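
For anyone who wants to play with these numbers, here is a minimal sketch (not part of InstLatX64, just arithmetic on the VADDPS rows quoted above). It derives the clock each chip was apparently running at during the test from the ns and cycle figures, and the peak single precision adds per cycle, assuming 8 floats per 256-bit ymm register. The function names are made up for illustration.

```python
# VADDPS ymm rows copied from the listings above:
# (latency in ns, latency in cycles, reciprocal throughput in cycles)
VADDPS = {
    "Jaguar": (1.47, 3.0, 2.00),
    "Carrizo": (2.39, 5.0, 1.00),
    "Skylake": (1.81, 4.0, 0.50),
}

FLOATS_PER_YMM = 8  # 256 bits / 32-bit single precision


def implied_clock_ghz(latency_ns, latency_cycles):
    """Cycles per nanosecond is the clock in GHz during the measurement."""
    return latency_cycles / latency_ns


def peak_sp_adds_per_cycle(throughput_cycles):
    """Peak SP additions per cycle if one ymm op issues every throughput_cycles."""
    return FLOATS_PER_YMM / throughput_cycles


for name, (lat_ns, lat_cycles, tput_cycles) in VADDPS.items():
    clock = implied_clock_ghz(lat_ns, lat_cycles)
    adds = peak_sp_adds_per_cycle(tput_cycles)
    print(f"{name:8s} ~{clock:.2f} GHz during test, {adds:.0f} SP adds/cycle peak")
```

This only restates the listings in another unit. It says nothing about sustained throughput, which also depends on decode, loads and stores.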

We might actually think of Zen as a frequency tuned and strongly improved pair of Jaguar cores combined like a 2C VISC processor without the thread creation/core assignment overhead, able to support one or two threads.

Sources:
http://users.atw.hu/instlatx64/AuthenticAMD0700F01_K16_Kabini2_InstLatX64.txt
http://users.atw.hu/instlatx64/AuthenticAMD0660F01_K15_Carrizo_InstLatX64.txt
http://users.atw.hu/instlatx64/GenuineIntel00506E3_Skylake_InstLatX64.txt
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The problem with good 256bit AVX/AVX2/FMA performance is you need 256bit paths. And that's a TDP killer. Without these paths my 6700K could maybe have been sold as a 55W TDP CPU.

Same reason why AVX512 is server only.

If Zen gets these 256bit paths, then it will be clocked at 3Ghz or below for 8 core parts.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
The problem with good 256bit AVX/AVX2/FMA performance is you need 256bit paths. And that's a TDP killer. Without these paths my 6700K could maybe have been sold as a 55W TDP CPU.

Same reason why AVX512 is server only.

If Zen gets these 256bit paths, then it will be clocked at 3Ghz or below for 8 core parts.

Yup, it seems like a sensible trade off, and lines up with AMD's long term goals. If you have a workload that benefits from massive FP vectors, buy a GPU!
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
The problem with good 256bit AVX/AVX2/FMA performance is you need 256bit paths. And that's a TDP killer. Without these paths my 6700K could maybe have been sold as a 55W TDP CPU.

Same reason why AVX512 is server only.

If Zen gets these 256bit paths, then it will be clocked at 3Ghz or below for 8 core parts.
I explained before, that AMD likely evaded this option and chose to add decoders, fpu pipelines, caches, etc. as additional full cores instead. The discussion above was about execution efficiency pitfalls of construction cores. They're not only limited by not having 256b paths.

Some computing tasks need high throughput (over all cores). Here both more cores or wider paths can help. Other tasks (e.g. raytracers) need shorter latencies and higher ILP (e.g. not so perfectly well parallelizable code with lots of short, partly independent serial paths).
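
To make the throughput vs. latency distinction concrete, here is a toy model, assuming the only limits are the instruction's latency and reciprocal throughput (no ports, decode or memory effects). The helper `estimated_cycles` is hypothetical; the latency and throughput values are the VADDPS ymm figures for Carrizo and Skylake from the InstLatX64 listings posted earlier in the thread.

```python
def estimated_cycles(n_ops, chain_length, latency, throughput):
    """Rough lower bound on cycles for n_ops instructions arranged in
    independent dependency chains of chain_length instructions each."""
    throughput_bound = n_ops * throughput   # limited by issue rate
    latency_bound = chain_length * latency  # one chain has to run serially
    return max(throughput_bound, latency_bound)


# (latency in cycles, reciprocal throughput in cycles) for VADDPS ymm
CORES = {"Carrizo": (5.0, 1.0), "Skylake": (4.0, 0.5)}
N_OPS = 64

for chain_length in (64, 4):  # one long serial chain vs. 16 short chains
    for name, (latency, throughput) in CORES.items():
        cycles = estimated_cycles(N_OPS, chain_length, latency, throughput)
        print(f"chain length {chain_length:2d}: {name:8s} >= {cycles:.0f} cycles")
```

With a single 64-op chain the bound is 320 vs. 256 cycles, so only the latency difference shows. With 16 independent chains of 4 ops it is 64 vs. 32 cycles, and the wider paths pay off fully.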
 

mohit9206

Golden Member
Jul 2, 2013
1,381
511
136
I am not really interested in the deeply technical discussion going on in this thread. All I know is my next CPU upgrade will be sometime next year. So between the i3-6100 and a Zen quad core APU, whichever has better performance will get my money. If the $120 Zen APU has an iGPU that's at least as fast as an R7 250X and CPU performance at least as good as a Haswell i3, then that should be enough for me to consider Zen.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I explained before, that AMD likely evaded this option and chose to add decoders, fpu pipelines, caches, etc. as additional full cores instead. The discussion above was about execution efficiency pitfalls of construction cores. They're not only limited by not having 256b paths.

Some computing tasks need high throughput (over all cores). Here both more cores or wider paths can help. Other tasks (e.g. raytracers) need shorter latencies and higher ILP (e.g. not so perfectly well parallelizable code with lots of short, partly independent serial paths).

Moar cores won't be any better than AVX/FMA. If you can't bundle them in AVX/FMA, you can't really do much better with moar cores either.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
I am not really interested in the deeply technical discussion going on in this thread. All I know is my next CPU upgrade will be sometime next year. So between the i3-6100 and a Zen quad core APU, whichever has better performance will get my money. If the $120 Zen APU has an iGPU that's at least as fast as an R7 250X and CPU performance at least as good as a Haswell i3, then that should be enough for me to consider Zen.

Well, if Mathias's assessment of 4-5 mm² is right, then that should be possible even at lower cost. It also means, IMO, lean datapaths, and clearly there are some jobs the CPU will not do well. But for normal users it would be a good upgrade.

For my household I would like my next ultrabook to have 4 cores with SMT, the bigger notebooks to have 8 cores, and the same for the desktop, if not 16 cores. When we render in the house it's using the GPU on the desktops. It's a waste of resources to have that power in the CPU, IMO, and it's also wrong to go to the end of the world to get the last 15% of IPC or high frequency. I much prefer double the cores. Of course that depends on the use case, but with DX12/Vulkan on the doorstep, games can benefit hugely from many cores.

But I am simply not so confident AMD will reach those high speeds. They have always been behind on cache latency for big cores, and now they don't control the process, so I wouldn't get my hopes up. I just hope we get those smaller cores that are affordable, so this business can move forward. It's like nothing has happened in the last 6 years. Even a K6, so to speak, will make a huge difference for the competition.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
I am not really interested in the deeply technical discussion going on in this thread. All I know is my next CPU upgrade will be sometime next year. So between the i3-6100 and a Zen quad core APU, whichever has better performance will get my money. If the $120 Zen APU has an iGPU that's at least as fast as an R7 250X and CPU performance at least as good as a Haswell i3, then that should be enough for me to consider Zen.

Well, we don't know what's going to be released by that time, really.
 

DrMrLordX

Lifer
Apr 27, 2000
21,805
11,161
136
Any weakness will be exploited with community benchmark obsession, too, no matter how questionable the relevance of the benchmark is.

If Zen doesn't do AVX2 well, expect the "definitive" benchmark to use it relentlessly. Hopefully Keller was smart enough to realize that the only way to compete with Intel is to copy their designs as much as possible because otherwise there is room for selective benching (like running FP-heavy benches when comparing Intel with FX chips and mostly ignoring integer). It looks like he got the hint by going to SMT.

I . . . yeah, that's sort of what I was thinking.

Let's wait and see. A lot of people in the Zen team came from the cat core team. They didn't do badly on AVX, so they shouldn't do badly with AVX2 on Zen either, as there is also enough integer SIMD execution hardware. Just look at how Jaguar did vs. Excavator in these examples:

Hmm. Interesting that you brought up Jaguar there. I guess I'll just have to wait and see before I make any final verdicts about how Zen will handle AVX/AVX2 code.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Moar cores won't be any better than AVX/FMA. If you can't bundle them in AVX/FMA, you can't really do much better with moar cores either.

I can see the value in moar cores. I'm not seeing how valuable something like AVX-512 would be when AVX/AVX2 adoption isn't all that high. None of the software I use utilizes AVX2 to any appreciable degree (or at least I'm not using the features that do utilize AVX2).
 

inf64

Diamond Member
Mar 11, 2011
3,764
4,223
136
I haven't seen much of new data on Zen but I can say that I'm now very confident that it won't be a "failure".

I think we are looking at (if there was an 8C/16T part) IB-E level of performance (overall similar ST and MT perf.). Now that might not be good enough for many enthusiasts who want the absolute best, but I'd say that a lot of enthusiasts would buy something that costs 60-70% of the price of Intel's top-of-the-range SKU for around 80-85% of its performance (and features).
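
As a quick sanity check on what that would mean in performance per dollar, here is a small sketch using only the percentages guessed above (they are speculation, not measurements). `relative_value` is just a name picked for this example.

```python
def relative_value(perf_fraction, price_fraction):
    """Performance per unit of price, relative to the reference SKU (= 1.0)."""
    return perf_fraction / price_fraction


# Least favourable corner of the guessed range: 80% of the perf at 70% of the price
low = relative_value(0.80, 0.70)
# Most favourable corner: 85% of the perf at 60% of the price
high = relative_value(0.85, 0.60)

print(f"perf per dollar vs. the Intel part: {low:.2f}x to {high:.2f}x")
```

So under those assumed numbers the part would offer somewhere around 1.14x to 1.42x the performance per dollar of the reference SKU.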
 

Atreidin

Senior member
Mar 31, 2011
464
27
86
IMO even getting to Sandy Bridge or Ivy Bridge IPC, on average, would be a success and prove AMD can make a relevant processor. That's a big jump, and it isn't like it will be exactly the same relative performance as its Intel counterpart in every program. If they finally get back to at least "pretty close" performance on average, they can probably find one or more niches where they can profit and stay relevant while they develop their successor designs to try to further catch up.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Any weakness will be exploited with community benchmark obsession, too, no matter how questionable the relevance of the benchmark is.

If Zen doesn't do AVX2 well, expect the "definitive" benchmark to use it relentlessly. Hopefully Keller was smart enough to realize that the only way to compete with Intel is to copy their designs as much as possible because otherwise there is room for selective benching (like running FP-heavy benches when comparing Intel with FX chips and mostly ignoring integer). It looks like he got the hint by going to SMT.

This seems overly cynical. The reason "FP-heavy benches" are ubiquitous isn't because the reviewers were biased against AMD, but because FPU-focused loads are extremely common in ordinary software. You have to stretch to find a handful of relevant modern programs that don't rely heavily on FPU operations (e.g. 7-Zip LZMA and x264).
 