New Zen microarchitecture details

PPB · Feb 1, 2017

Abwx said:
Blender is FP32 (and looking at the Computerbase tests it uses as much power as Cinebench FI), so that's somewhat indicative of FP32 perf. What is still missing is FP64 perf, although in that respect the improvement is even more dramatic according to the SiSoftware submission.

On the other hand, Handbrake uses INT code: even if there are ops processed in the FPU, these are still operations on integer numbers, and it's quite representative of INT perf. What is left unknown for this latter case is software like Fritz or Stockfish, which stresses the branch predictor, but in this respect AMD already has good perf with previous designs, so there should be nothing unexpected here.

You don't seem to understand. Blender's vanilla renderer is about as widely used as 3ds Max's, as in by no one, really. Most renderers use FP32 for their calculations; still, not all of them perform the same.

This is the same as if AMD showed us Cinebench scores for their demos. They don't mean squat in real-world scenarios.

So, as I said, let's wait for more benchmarks to come out of the oven near launch day.


dullard · Feb 1, 2017

swilli89 said:
All AMD has to do is come within 10% at a reduced cost and they have a stunner on their hands. Most people (I'm assuming) will gladly take 57 FPS for $$$ hundreds less than a system giving 59 FPS...Of course we need to see some thorough gaming benchmarks to further clarify this point but who wouldn't want to save $300 on their CPU and spend that on a higher quality PSU and a videocard upgrade?

1) 10% less than 59 FPS is 53.1 FPS, not 57 FPS.
2) $300 less than the top Intel gaming CPU (the 7700K which is selling at the moment at many locations for $349) is $49.

I think you are either expecting too much or have math difficulties. AMD is not going to sell a Ryzen chip that comes within 10% of the 7700K for $49.
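
The arithmetic in this exchange is quick to check. A minimal Python sketch, using only the figures quoted in the two posts (the function names are made up for illustration):

```python
def fps_after_deficit(reference_fps: float, deficit_percent: float) -> float:
    """Frame rate of a system that is `deficit_percent` slower than the reference."""
    return reference_fps * (1 - deficit_percent / 100)


def percent_below(reference_fps: float, slower_fps: float) -> float:
    """How far `slower_fps` falls below the reference, in percent."""
    return (reference_fps - slower_fps) / reference_fps * 100


if __name__ == "__main__":
    # "Within 10%" of a 59 FPS system, taken literally.
    print(f"10% below 59 FPS: {fps_after_deficit(59, 10):.1f} FPS")
    # The 57 FPS figure from the quoted post, expressed as a deficit.
    print(f"57 FPS is {percent_below(59, 57):.1f}% below 59 FPS")
    # $300 off the quoted 7700K street price.
    print(f"$349 minus $300: ${349 - 300}")
```

Read this way, 57 FPS versus 59 FPS is a gap of roughly 3.4%, not 10%, which is the mismatch being pointed out.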

Agent-47 · Feb 1, 2017

itsmydamnation said:
Zen and Broadwell/Skylake all have around the same amount of resources per core. It is unlikely in this scenario for Zen to have better SMT on the first attempt; it's more likely to be worse.

Also AMD have said they exceeded the 40% ipc claim.....

your numbers need some work.....

SMT can go either way despite having the same resource count. But if the CPU was designed with servers in mind, it will probably go the SMT-heavy route. Also, AMD has publicly claimed their Infinity Fabric does multithreading better and demonstrated it at CES with their side-by-side streaming shootout against the i7-6900K.

That is a long jump from 40+% to 75%. That's almost twice. An excess over 40% can be 55% to 60% at best, which is still 25 pc behind.

4 of my 5 builds have had AMD CPUs. So I am rooting for them. I just don't know how that excess over 40 pc will cut it.

swilli89 said:
Most people (I'm assuming) will gladly take 57 FPS for $$$ hundreds less than a system giving 59 FPS.

That's how you and I and most here think. Not the average Joe, unfortunately.

Doom2pro · Feb 1, 2017

ButtMagician said:
What happened to 3.4+ GHz base clocks? Or was that only for the 8-core Ryzen?

Well, Canard PC reported 3.6 base and 4.0 boost on an 8c/16t ES they managed to snag, BitsNChips reported the latest spin is clocking higher than expected, and then someone noticed a 3.6 base / 3.9 boost part on one of the 8c/16t demo machines at AMD's CES booth...

So we pretty much know they exist, but we don't know if that is the new minimum base/boost or the higher end SKU.

Doom2pro · Feb 1, 2017

Agent-47 said:
CPC benchmark suggested that Zen was about 8pc behind BWE clock for clock.

I think there is some obfuscating going on there on the IPC side of things to prevent AMD from discovering the source of the ES used.

inf64 · Feb 1, 2017

Agent-47 said:
That is a long jump from 40+% to 75%. That's almost twice. An excess over 40% can be 55% to 60% at best, which is still 25 pc behind.

I think you are a bit confused.
AMD stated at first that their goal was a 40% IPC jump vs. the Excavator core. The XV core is ~15% faster, on average, than the Piledriver core we have in FX today. Now we have rumors from the SweClockers forum that the actual IPC increase vs. the XV core is closer to 55%. Taking the geometric mean of 1.4 and 1.55 (1.47) and accounting for the IPC difference between XV and PD (1.15), we land around 1.47 x 1.15 ~= 1.69, or around a 70% IPC increase vs. the PD core. This is the ST IPC jump; SMT comes on top of that, as per AMD's Zen project lead at the Hot Chips conference Q&A session.

Now we have users (e.g. majord) on this forum posting comparisons between the XV and Skylake cores regarding ST IPC. Long story short: with AVX2 benchmarks in the mix, Skylake is on average 61% faster than XV (1.61x), and BDW-E would accordingly land around 1.56x, or 56% faster than the XV core. Compare that to the rumored 1.55x IPC jump and you get the picture of where Zen could be IPC-wise.
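
The chain of multipliers in this post can be reproduced in a few lines. A rough Python sketch, using only the ratios quoted above (all of them rumors or forum estimates, not measurements; the variable names are illustrative):

```python
import math

# Ratios quoted in the post, all relative to the Excavator (XV) core.
ZEN_VS_XV_GOAL = 1.40      # AMD's originally stated target
ZEN_VS_XV_RUMOR = 1.55     # rumored actual gain
XV_VS_PD = 1.15            # XV over Piledriver, forum estimate
SKYLAKE_VS_XV = 1.61       # forum comparison, AVX2 benchmarks included
BDWE_VS_XV = 1.56          # Broadwell-E, derived from the same comparison


def geometric_mean(a: float, b: float) -> float:
    """Geometric mean of two ratios (the right average for multipliers)."""
    return math.sqrt(a * b)


zen_vs_xv = geometric_mean(ZEN_VS_XV_GOAL, ZEN_VS_XV_RUMOR)
zen_vs_pd = zen_vs_xv * XV_VS_PD  # speedups compound by multiplication

if __name__ == "__main__":
    print(f"Zen vs XV (geometric mean): {zen_vs_xv:.2f}x")
    print(f"Zen vs PD: {zen_vs_pd:.2f}x")
    # Where the rumored 1.55x would sit relative to the Intel cores.
    print(f"Zen / Skylake: {ZEN_VS_XV_RUMOR / SKYLAKE_VS_XV:.3f}")
    print(f"Zen / BDW-E:   {ZEN_VS_XV_RUMOR / BDWE_VS_XV:.3f}")
```

The point of multiplying rather than adding is that 40% to 55% over XV and 15% of XV over PD stack to roughly 1.69x over PD, which is how a "40%" claim and a "~70%" gap can describe the same chip against different baselines.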

Doom2pro · Feb 1, 2017

inf64 said:
I think you are a bit confused.
AMD stated at first that their goal was a 40% IPC jump vs. the Excavator core. The XV core is ~15% faster, on average, than the Piledriver core we have in FX today. Now we have rumors from the SweClockers forum that the actual IPC increase vs. the XV core is closer to 55%. Taking the geometric mean of 1.4 and 1.55 (1.47) and accounting for the IPC difference between XV and PD (1.15), we land around 1.47 x 1.15 ~= 1.69, or around a 70% IPC increase vs. the PD core. This is the ST IPC jump; SMT comes on top of that, as per AMD's Zen project lead at the Hot Chips conference Q&A session.

Now we have users (e.g. majord) on this forum posting comparisons between the XV and Skylake cores regarding ST IPC. Long story short: with AVX2 benchmarks in the mix, Skylake is on average 61% faster than XV (1.61x), and BDW-E would accordingly land around 1.56x, or 56% faster than the XV core. Compare that to the rumored 1.55x IPC jump and you get the picture of where Zen could be IPC-wise.

I wouldn't be surprised or disappointed if Zen's IPC ended up either slightly below Broadwell-E, a statistical tie, or slightly ahead. As others have said here, it's a win-win either way for AMD and presumably the consumer as well.

Agent-47 · Feb 1, 2017

inf64 said:
I think you are a bit confused.
AMD stated at first that their goal was a 40% IPC jump vs. the Excavator core. The XV core is ~15% faster, on average, than the Piledriver core we have in FX today. Now we have rumors from the SweClockers forum that the actual IPC increase vs. the XV core is closer to 55%. Taking the geometric mean of 1.4 and 1.55 (1.47) and accounting for the IPC difference between XV and PD (1.15), we land around 1.47 x 1.15 ~= 1.69, or around a 70% IPC increase vs. the PD core. This is the ST IPC jump; SMT comes on top of that, as per AMD's Zen project lead at the Hot Chips conference Q&A session.

1.47 sounds more than reasonable.

I know what you mean with the 1.15 factor, but that's a very slippery slope there. There is no full CPU with L3 based on XV, true. But if we multiply the XV gains onto PD, we are basically making a fictitious core. We don't know if that is what AMD did. It's certainly counterproductive, as they could otherwise have claimed "60% gain over previous gen", which sounds more impressive and certainly would be a more appropriate claim than a 40% gain over a fictitious core.

Edit: AMD claimed a 5x increase in L3 bandwidth over the previous gen (PD). So they certainly have no issue going back to PD for comparison.

Dresdenboy · Feb 1, 2017

guskline said:
Nice avatar the Stilt!

This. And while enjoying the day in the world's biggest indoor water park (710,000 sqft) I had the locker number (FX-)8150. RyZening closed. Confirmed by Dresdenboy™. Time to double up in $AMD.

The Stilt said:
Naturally.
The design dictates that both CCXs are enabled (three cores each), however besides that there are no limitations.

Is that also a requirement for 4C, too? Canard PC Hardware said 4+0, 2+2, 3+1 are possible:
https://twitter.com/CPCHardware/status/826563162250022912

inf64 said:
I think you are a bit confused.
AMD stated at first that their goal was a 40% IPC jump vs. the Excavator core. The XV core is ~15% faster, on average, than the Piledriver core we have in FX today. Now we have rumors from the SweClockers forum that the actual IPC increase vs. the XV core is closer to 55%.

The 55% wave started in this very forum with a posting by The Stilt. I mentioned this on Twitter. Somehow it made its rounds.

The Stilt · Feb 1, 2017

Dresdenboy said:
Is that also a requirement for 4C, too? Canard PC Hardware said 4+0, 2+2, 3+1 are possible:
https://twitter.com/CPCHardware/status/826563162250022912

Both CCXs must have the same number of enabled cores and the same amount of L3, unless the other CCX is fully disabled (I forgot to mention that).
The following configurations are possible: 1 (1+0), 2 (2+0 or 1+1), 3 (3+0), 4 (4+0 or 2+2), 6 (3+3), 8 (4+4).
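
The rule as stated (symmetric CCXs, or one CCX fully disabled) is enough to regenerate that list. A small Python sketch under that reading; the function name and the assumption of 4 cores per CCX on a 2-CCX die are illustrative, taken from the 4+4 maximum above:

```python
CORES_PER_CCX = 4  # assumed from the 4+4 maximum listed in the post


def valid_ccx_configs(cores_per_ccx: int = CORES_PER_CCX) -> dict:
    """Map total core count -> list of (ccx0, ccx1) splits allowed by the rule.

    Rule as described: the second CCX is either fully disabled (0 cores)
    or has exactly as many enabled cores as the first.
    """
    configs = {}
    for first in range(1, cores_per_ccx + 1):
        for second in (0, first):
            configs.setdefault(first + second, []).append((first, second))
    return dict(sorted(configs.items()))


if __name__ == "__main__":
    for total, splits in valid_ccx_configs().items():
        pretty = " or ".join(f"{a}+{b}" for a, b in splits)
        print(f"{total} cores: {pretty}")
```

Under this rule 3+1 never appears, and neither do 5-core or 7-core parts, which matches the list in the post and is where it differs from the Canard PC Hardware tweet.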

DrMrLordX · Feb 1, 2017

The Stilt said:
I cannot disclose any technical details.
However you can tell from the pictures in those slides that it is "sufficient"

That'll have to do for now, then. We'll learn more as the boards hit the market.

zinfamous said:
yeah well if you don't have VR optimized USBs, then your USBs will run out of RAMs for VR. You're going to have to create a GUI to hack in more RAMs.

Then you hack the Gibson.

NeoLuxembourg said:
Canard PC Hardware

Source: https://twitter.com/cpchardware/status/826829044402552833

Huh. That contradicts earlier information from them.

Doom2pro · Feb 1, 2017

DrMrLordX said:
That'll have to do for now, then. We'll learn more as the boards hit the market.

Then you hack the Gibson.

Huh. That contradicts earlier information from them.

Well, think about it: what was the first source about a 6c/12t Ryzen part and its TDP?

Doom2pro · Feb 1, 2017

More info from SA, this time from user Ironlynx: https://videocardz.com/65654/amd-ryzen-6-core-cpu-exists

Confirmation of a 6 core SKU? 3.3 Base, 3.7 Boost - ZD3301BBM6IF4_37/33_Y
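
If the trailing field of that string is read the way the post reads it (boost/base clocks in tenths of a GHz), it can be pulled out mechanically. A small Python sketch; the field layout is an assumption inferred from this one string, not a documented AMD format, and the helper name is hypothetical:

```python
import re

# Assumed layout: "..._BB/bb_..." where BB is boost and bb is base, in 0.1 GHz steps.
CLOCK_FIELD = re.compile(r"_(\d{2})/(\d{2})_")


def parse_clock_field(opn: str):
    """Return (base_ghz, boost_ghz) from an ES identifier, or None if absent."""
    match = CLOCK_FIELD.search(opn)
    if match is None:
        return None
    boost, base = (int(group) / 10 for group in match.groups())
    return base, boost


if __name__ == "__main__":
    clocks = parse_clock_field("ZD3301BBM6IF4_37/33_Y")
    if clocks is not None:
        base, boost = clocks
        print(f"base {base} GHz, boost {boost} GHz")
```

Nothing else in the identifier is interpreted here; the other characters are left alone because the thread does not say what they encode.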

itsmydamnation · Feb 1, 2017

Agent-47 said:
SMT can go either way despite having the same resource count.

Yes, it could. The throughput gain from SMT is likely to favor Intel.

But if the CPU was designed with servers in mind, it will probably go the SMT-heavy route.

AMD have already told us how Zen does SMT at the pipeline level; it is very similar to Skylake. What neither AMD nor Intel tells you about is the heuristics used to determine what a core should do when there are competing priorities. Michael Clark has already said this is an area AMD need to work on. This is where the experience of the 10+ uarchs that Intel has designed with SMT means it's likely Intel's SMT will perform better.

Also, AMD has publicly claimed their Infinity Fabric does multithreading better and demonstrated it at CES with their side-by-side streaming shootout against the i7-6900K.

The cache throughput per cycle has already been detailed by AMD; again, it is very comparable to Skylake when we are talking about a CCX vs. a 4-core Intel chip (inter-CCX remains to be seen).

That is a long jump from 40+% to 75%. That's almost twice. An excess over 40% can be 55% to 60% at best, which is still 25 pc behind.

So Excavator isn't 75% behind in cache-light workloads, and AMD showed us that in cache-heavy workloads they are around Broadwell-E level with Zen. The Zen core is wider and has lower L2 latency, checkpointing, a stack engine/memfile, better load/store handling, a uop cache, and bigger structures and queues; all of these will help narrow the gap in both cache-heavy and cache-light workloads.

4 of my 5 builds have had AMD CPUs. So I am rooting for them. I just don't know how that excess over 40 pc will cut it.

I've been saying since the beginning of this thread: don't look at arbitrary numbers, look at the architecture. There is nothing we can see in the architecture that says it can't compete (unlike with Bulldozer) with Broadwell-E/Skylake.

Remember, AMD have already said they will be performance-per-socket competitive with 32-core Skylake-EP.

Enigma- · Feb 1, 2017

Enigma from SweC here. Have been following this thread for a while. We all hope Zen can perform, and it really looks like a strong design, but I hope it can deliver in practice. Those AotS benches with the 3.6/4.0 "F4" look very low-performing to me, and I don't know if this is a fake or another ES with performance crippled by AGESA:

Zen looks to have a strong FPU for typical 128-bit SSE and a big, fast L2, so I am expecting some good performance in games and rendering/science apps. For now we can only speculate, and it will be a long and silent month until release...

Enigma

AtenRa · Feb 2, 2017

itsmydamnation said:
Yes, it could. The throughput gain from SMT is likely to favor Intel.

AMD have already told us how Zen does SMT at the pipeline level; it is very similar to Skylake. What neither AMD nor Intel tells you about is the heuristics used to determine what a core should do when there are competing priorities. Michael Clark has already said this is an area AMD need to work on. This is where the experience of the 10+ uarchs that Intel has designed with SMT means it's likely Intel's SMT will perform better.

I believe it will favor Ryzen; the design allows higher throughput per core than Skylake.
Also, AMD has had experience with SMT for a long time now with Bulldozer (FP SMT per module).

swilli89 · Feb 2, 2017

dullard said:
1) 10% less than 59 FPS is 53.1 FPS, not 57 FPS.
2) $300 less than the top Intel gaming CPU (the 7700K which is selling at the moment at many locations for $349) is $49.

I think you are either expecting too much or have math difficulties. AMD is not going to sell a Ryzen chip that comes within 10% of the 7700K for $49.

1) I used 10% as a mixed use case including all other types of processing beyond gaming. I'm assuming the workloads not shown off by AMD will be around that number. But thank you very much for pointing out how small the difference between 59 and, say, 57 FPS is. It's indistinguishable.

An 8c/16t Ryzen will compete with the 8c/16t Intel equivalent, not a $350 7700K. Your entire post is invalid.

itsmydamnation · Feb 2, 2017

AtenRa said:
I believe it will favor Ryzen; the design allows higher throughput per core than Skylake.
Also, AMD has had experience with SMT for a long time now with Bulldozer (FP SMT per module).

No, it doesn't. It has the same instruction decode, issue and retire, approximately the same size integer and FP PRFs, and around the same scheduler capacity. Don't judge the width of the core by the number of execution ports/units; it's not that important, especially given the flexibility Intel has in terms of what can be scheduled where.

Rngwn · Feb 2, 2017

Enigma- said:
Enigma from SweC here. Have been following this thread for a while. We all hope Zen can perform, and it really looks like a strong design, but I hope it can deliver in practice. Those AotS benches with the 3.6/4.0 "F4" look very low-performing to me, and I don't know if this is a fake or another ES with performance crippled by AGESA:

Zen looks to have a strong FPU for typical 128-bit SSE and a big, fast L2, so I am expecting some good performance in games and rendering/science apps. For now we can only speculate, and it will be a long and silent month until release...

Enigma

Are there any other 6900K/6950X benchmarks with a similar setup (ideally with a Titan X Pascal) and configuration? This one seems to be using the combination of 4K and the "Crazy" quality setting. I can't seem to find a good comparison anywhere else.

beginner99 · Feb 2, 2017

Rngwn said:
Are there any other 6900K/6950X benchmarks with a similar setup (ideally with a Titan X Pascal) and configuration? This one seems to be using the combination of 4K and the "Crazy" quality setting. I can't seem to find a good comparison anywhere else.

This whole result seems strange or shall I say fake? Resolution is 0x0 and Game version is 1.5. You can't even search for any benches below version 2.0. But yeah, if you just take 4k crazy results, this looks very bad.

lolfail9001 · Feb 2, 2017

Enigma- said:
Enigma from SweC here. Have been following this thread for a while. We all hope Zen can perform, and it really looks like a strong design, but I hope it can deliver in practice. Those AotS benches with the 3.6/4.0 "F4" look very low-performing to me, and I don't know if this is a fake or another ES with performance crippled by AGESA:

Zen looks to have a strong FPU for typical 128-bit SSE and a big, fast L2, so I am expecting some good performance in games and rendering/science apps. For now we can only speculate, and it will be a long and silent month until release...

Enigma

Here are the 3 red flags:
1. Version 1.50 cannot exist.
2. The build number is too high for ANY 1.xx version, and it is not on the list of 2.xx builds either.
3. The CPU name is NOT "AMD Eng Sample ZD36....". The "Ryzen" in the name is a clear giveaway that something is wrong.
3.5. Oh, and the user name... It exists but it is clear as air. Maybe it is legit, but first 2 flags have to be addressed first.

bjt2 · Feb 2, 2017

itsmydamnation said:
Zen and Broadwell/Skylake all have around the same amount of resources per core. It is unlikely in this scenario for Zen to have better SMT on the first attempt; it's more likely to be worse.

Also AMD have said they exceeded the 40% ipc claim.....

your numbers need some work.....

Zen can do 4 int PLUS 4 FP per cycle. SKL can do 4 int OR 3 vec int OR 2 FP, or any combination up to 4 uops/cycle. That is 8 uops/cycle versus 4 uops/cycle for 2 threads. How come Ryzen's SMT would gain less than Intel's?

bjt2 · Feb 2, 2017

itsmydamnation said:
No, it doesn't. It has the same instruction decode, issue and retire, approximately the same size integer and FP PRFs, and around the same scheduler capacity. Don't judge the width of the core by the number of execution ports/units; it's not that important, especially given the flexibility Intel has in terms of what can be scheduled where.

Most of the stack uops are deleted from the stream by the stack memfile. Moreover, the decoding is broken into 2 parts: the high level translates almost all x86 instructions into one micro-op, INCLUDING microcoded instructions, which occupy one slot in the early stages and in the uop cache (contrary to Intel) and are expanded just before dispatch. The uop cache is bigger and not bloated by microcoded instructions, as they occupy only one slot. Moreover, we have 10 uops/cycle executable for two threads, plus those executed in the stack/memfile stage, which do not consume any ROB/PRF/queue/cache port resources, as they are resolved earlier. Finally, jumps: Zen can always do 2 jumps/cycle. SKL/KBL can do 2 only if the second port is not occupied by an FP or vec-int instruction. Zen does not have this problem.

itsmydamnation · Feb 2, 2017

bjt2 said:
Zen can do 4 int PLUS 4 FP per cycle. SKL can do 4 int OR 3 vec int OR 2 FP, or any combination up to 4 uops/cycle. That is 8 uops/cycle versus 4 uops/cycle for 2 threads. How come Ryzen's SMT would gain less than Intel's?

Because you have to SUSTAIN that cycle after cycle for it to make a difference. Both can only decode 4 x86 ops, and most x86 ops are 1 uop for both. So for all these extra ports to matter you have to be able to feed them, and neither Zen nor Skylake can feed more than ~6 uops from the uop cache or 4 from decode.

They also both have approximately the same amount of L/S and all the other structures I mentioned. When you have FP workloads, for example, you will see a very high percentage of ops being loads or stores; with two threads that will bottleneck both Skylake and Zen before port congestion does.

Now you need to find me this workload that is both scalar- and SIMD-heavy concurrently, has a minimum IPC of 2, and is the perfect fit for SMT without bottlenecking the L/S system.

The perfect example of why all this theoretical super-high concurrent port usage doesn't matter is actually 256-bit AVX on SB/IB vs. Haswell. Both have the same amount of execution width, but Haswell is significantly faster in those FP-heavy workloads because for 256-bit ops it has twice the load and store bandwidth.

So answer me: how is Zen going to SUSTAIN 8 128-bit reads and 4 128-bit writes a cycle when it can only do 2 reads and 1 write a cycle? It's very common to see FP workloads with >50% of operations being loads or stores.
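
The argument above is essentially a min-of-bottlenecks bound, which can be sketched in Python. This is a back-of-envelope model, not a simulator: the function name is made up, the ~6 uops/cycle and 2 loads + 1 store per cycle come from the post, and the 40% load / 20% store instruction mix is an illustrative assumption (the post only says more than 50% combined).

```python
def sustained_uop_bound(front_end_uops: float,
                        loads_per_cycle: float,
                        stores_per_cycle: float,
                        load_fraction: float,
                        store_fraction: float) -> float:
    """Upper bound on sustained uops/cycle from the front end and L/S ports.

    If a fraction f of all uops are loads and the core can issue at most
    L loads per cycle, total throughput cannot exceed L / f (same for stores).
    Execution-port count deliberately does not appear: it only matters if
    it is the smallest of these limits.
    """
    bounds = [front_end_uops]
    if load_fraction > 0:
        bounds.append(loads_per_cycle / load_fraction)
    if store_fraction > 0:
        bounds.append(stores_per_cycle / store_fraction)
    return min(bounds)


# Post's figures: ~6 uops/cycle from the uop cache, 2 loads + 1 store per cycle.
# Assumed mix: 40% loads, 20% stores.
bound = sustained_uop_bound(6, 2, 1, load_fraction=0.40, store_fraction=0.20)

if __name__ == "__main__":
    print(f"sustained bound: {bound:.1f} uops/cycle")
```

With these assumed numbers the load/store ports cap the core at roughly 5 uops/cycle, below the front-end limit and well below the 8 to 10 uops/cycle of peak execution-port capacity being debated, which is the shape of the claim being made. A less memory-heavy mix moves the limit back to the front end.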

bjt2 said:
Most of the stack uops are deleted from the stream by the stack memfile. Moreover, the decoding is broken into 2 parts: the high level translates almost all x86 instructions into one micro-op, INCLUDING microcoded instructions, which occupy one slot in the early stages and in the uop cache (contrary to Intel) and are expanded just before dispatch. The uop cache is bigger and not bloated by microcoded instructions, as they occupy only one slot. Moreover, we have 10 uops/cycle executable for two threads, plus those executed in the stack/memfile stage, which do not consume any ROB/PRF/queue/cache port resources, as they are resolved earlier. Finally, jumps: Zen can always do 2 jumps/cycle. SKL/KBL can do 2 only if the second port is not occupied by an FP or vec-int instruction. Zen does not have this problem.

You don't need to tell me how an x86 processor works. You also do not have 10 uops a cycle; you have 6: up to 6 to int and up to 4 to FP.

Edit: before you try to claim it's additive to 10 uops, please explain why Michael Clark says they have wider retire than dispatch because it helps to clear out the retire queue and get more instructions in flight. Is he lying?

Agent-47 · Feb 2, 2017

itsmydamnation said:
AMD have already told us how Zen does SMT at the pipeline level; it is very similar to Skylake. What neither AMD nor Intel tells you about is the heuristics used to determine what a core should do when there are competing priorities. Michael Clark has already said this is an area AMD need to work on. This is where the experience of the 10+ uarchs that Intel has designed with SMT means it's likely Intel's SMT will perform better.

Like you agreed, it can go either way.

itsmydamnation said:
The cache throughput per cycle has already been detailed by AMD; again, it is very comparable to Skylake when we are talking about a CCX vs. a 4-core Intel chip (inter-CCX remains to be seen).

So? AMD have demoed an 8c/16t Zen having less lag while streaming than an i7-6900K and attributed it to better multithreading gains due to its Infinity Fabric, i.e. gains from multicore and SMT.

itsmydamnation said:
So Excavator isn't 75% behind in cache-light workloads,

Now you are cherry-picking results. I said on average. AMD also said average.

itsmydamnation said:
AMD showed us that in cache-heavy workloads they are around Broadwell-E level with Zen. The Zen core is wider and has lower L2 latency, checkpointing, a stack engine/memfile, better load/store handling, a uop cache, and bigger structures and queues; all of these will help narrow the gap in both cache-heavy and cache-light workloads.

They are better than BD, but we don't know how they stack up against Intel.

itsmydamnation said:
I've been saying since the beginning of this thread: don't look at arbitrary numbers, look at the architecture. There is nothing we can see in the architecture that says it can't compete (unlike with Bulldozer) with Broadwell-E/Skylake.

Those are just theory notes written down by the architect. On paper BD was also a monster, which led to its hype back then. I think you should not try to fly so high on arbitrary details with no benchmark to back you.

itsmydamnation said:
Remember, AMD have already said they will be performance-per-socket competitive with 32-core Skylake-EP.

Lol. Yes they did. Competitive. To a server socket with 32 cores. Yes indeed. But:
1. Servers usually have better SMT.
2. At 32 cores it's clocked lower, so AMD may be referring to its better TDP for lower-clocked parts. That will allow them to clock better than Intel at lower TDP. Otherwise they could have picked any CPU. Most importantly, it says nothing about IPC.

Don't be so arrogant all the time, sir; there are far smarter people in this thread. Don't make arbitrary connections to make your case like juan does for Intel.
