AMD didn't say whether the 40% was a minimum, a mean, or a maximum, but looking at Blender, the maximum is at least +80%. This implies that 40% is at least the mean value, which is reasonable. Stating that 40% is the minimum in an official statement would be very risky, because a single example with less than a 40% gain would be enough to prove it wrong.
Thinking more about this subject, I believe (and it can probably be verified by looking at the fine print) that AMD meant mean IPC, as that is the reasonable reading.
IPC is a mean, and that's what I expect to be used when multiple applications are involved in calculating some "aggregated" IPC.
In the case of Zen you can safely assume a weighted mean between 4 and 6 uops/cycle, weighted by the uop cache hit/miss ratio. Even if there is a branch misprediction, the uops can still be fetched from the uop cache, so on a hit the throughput is still 6 uops/cycle.
You don't need the decoders again, except if the branch lands in a never-taken piece of code... But we are talking about loops here, right? A single path without loops has negligible execution time compared to loops, and so it can be safely ignored in the overall performance evaluation. We are talking about programs that take at least seconds to execute, so code executed only once, and the first iteration of a loop (where the uop cache would probably miss), are a negligible fraction of the total execution time...
Well, even emulators have loops, including a big one (the main "evaluation" loop at least), but that doesn't mean a uop cache can be of any benefit.
If you take a look at the main loop of an emulator, for example, you'll see a big (C-like) switch statement, which is usually translated internally to an indirect jump through a table of pointers to execute the code for each specific instruction. And so on with other switches for handling other cases, especially with more complex architectures to be emulated (x86 and 68K are prime examples, the latter being even more complicated due to its 16-bit opcodes).
Virtual machines for programming languages have very similar behaviors.
Geez, 74 pages, and 73 of them are special friends ELF and Cidimauro arguing about what "40%" means, as if that has any bearing on current Zen performance whatsoever. Can we give these trolls their own thread to fill with this drivel? It's pedantic and totally unrelated to technology; it's also threadcrapping at this point.
Discussions about IPC are perfectly on-topic. Ignore them if you don't like them, but don't invoke censorship.
And if you have a problem with specific members here, there's an ignore list.
I expected 40% to be the minimum actually, and I have seen it quoted somewhere here, but when searching I could not find a direct reference, except:
The first point suggests pretty clearly that the 40% includes SMT. Otherwise, why list it under how the 40% IPC is gained?
Going off HotChips, I'd say it's IPC pretty clearly, and that means per core. Not 40% performance.
The slide talks about core improvements, so I agree with you that it should include SMT.
Which is obvious, considering how a core works (it commits instructions coming from ANY thread) and the definition of IPC.
And I still think that it's complete nonsense to talk of "ST IPC", unless you are measuring a purely ST application (in which case the two coincide): an MT application will use BOTH hardware threads (in an SMT-2 design) to achieve its goal, and so the core will execute/commit instructions from BOTH threads.
P.S. I know that, after the presentation, an AMD executive talked about "ST IPC" for the +40%.