AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

cdimauro · Nov 22, 2016

bjt2 said:
In the golden days of the Pentium 4, INTEL did the same, maybe even worse: the INTEL compiler ran optimized code only on INTEL CPUs, and a lot of software was compiled with it...

That's not true. The Intel compiler optimized code for SOME of its micro-architectures. It means that the optimization didn't work even on some Intel processors.

The reason is "simple": such optimizations work on a micro-architecture basis, rather than on a feature basis.

Which is obvious if you take a look at the optimization manual and see how differently micro-architectures are built and behave, even if they have the same feature set.

Ok, but INTEL is in the same situation...

Sure.

Actually worse for the L1I cache, because Zen has double the value...

The kind is different.

I would guess that the fetch and other common code are exactly the instructions that probably sit in the L1I and uop caches, because they are used for every emulated instruction and the LRU algorithm doesn't let them be evicted...

Yes, it MIGHT be the case but... who knows how it actually works? Unfortunately, there's no such (useful) information.

KTE · Nov 22, 2016

Riisen was my gaming nick coming back to compete in CB after 6 years

Zacate was horribly unsuitable in ST or Gfx performance for the bigger laptops. It was heavily hamstrung by the shared memory. Cheap HD 5450 1GB performed miles better in practicality, although unexpectedly.

Yes, it killed Atom but it was pitted against i3 by the OEMs more often than not, where it left a sour taste.

It was the IPC that really showed its ugly neck back then.

Sent from HTC 10
(Opinions are own)

bjt2 · Nov 22, 2016

cdimauro said:
That's not true. The Intel compiler optimized code for SOME of its micro-architectures. It means that the optimization didn't work even on some Intel processors.

The reason is "simple": such optimizations work on a micro-architecture basis, rather than on a feature basis.

Which is obvious if you take a look at the optimization manual and see how differently micro-architectures are built and behave, even if they have the same feature set.

Well, on Cyrix CPUs it was possible to fake the CPUID identification string (and maybe also the codes), and all INTEL-compiled programs gained performance (even 50%+)... And if I am not wrong, a virtualization software lets you do this for all CPUs, and many tests were made. INTEL was penalized for this. 1 billion dollars, if I remember correctly. Plus some other things, like relaxing the x86 license for AMD (there was an obligation to own a fab).
The truth is that for INTEL CPUs (and, as you pointed out, not all of them) the correct optimized path was used. For the other INTEL CPUs, probably only a path consistent with the feature set reported by CPUID was selected (this would have been the correct behaviour for all CPUs, even non-INTEL ones, when there is no optimized path). For non-INTEL CPUs the slowest path (486/x87 for older compilers, up to SSEn for newer compilers) was selected.
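
To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is purely illustrative: the function names, the path labels and the feature names are assumptions made up for this example, and it is not the Intel compiler's actual dispatcher.

```python
# Hypothetical sketch of two CPU-dispatch policies.
# Path labels and feature names are illustrative assumptions only.

# Code paths ordered from fastest to slowest, with the feature each one needs.
# The last entry needs no feature and acts as the baseline fallback.
PATHS = [("sse3", "sse3"), ("sse2", "sse2"), ("sse", "sse"), ("x87", None)]


def best_path_for_features(features):
    # Pick the fastest path whose required feature the CPU reports.
    for path, required in PATHS:
        if required is None or required in features:
            return path


def dispatch_feature_based(vendor, features):
    # Feature-based policy: the vendor string plays no role.
    return best_path_for_features(features)


def dispatch_vendor_gated(vendor, features):
    # Vendor-gated policy: anything that is not "GenuineIntel"
    # gets the baseline path, whatever features it reports.
    if vendor != "GenuineIntel":
        return "x87"
    return best_path_for_features(features)


if __name__ == "__main__":
    feats = {"sse", "sse2"}
    print(dispatch_feature_based("AuthenticAMD", feats))
    print(dispatch_vendor_gated("AuthenticAMD", feats))
    print(dispatch_vendor_gated("GenuineIntel", feats))
```

With the same reported feature set, only the vendor-gated policy treats the two vendor strings differently, which is the behaviour being argued about here.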

cdimauro said:
The kind is different.

The L1I? 64KB for Zen and 32KB for INTEL. INTEL's has more ways, so AMD's Zen cache is not as effective, but it's bigger.

cdimauro said:
Yes, it MIGHT be the case but... who knows how it actually works? Unfortunately, there's no such (useful) information.

Well, if it fits in cache (uop or L1I or L2), you save some cycles... Maybe after a big emulated instruction the uop cache is wiped out, but I doubt that a single emulated instruction has more than 32/64KB of code...

sirmo · Nov 22, 2016

cdimauro said:
That's not true. The Intel compiler optimized code for SOME of its micro-architectures. It means that the optimization didn't work even on some Intel processors.

I think he's referring to Intel cheating by having its compiler inject GenuineIntel authentication, and then taking the unoptimized path when an Intel CPU wasn't detected.

http://www.agner.org/optimize/blog/read.php?i=49

TheELF · Nov 22, 2016

.vodka said:
We have this confirmed, and then we have that blender benchmark where 8C16T Zen equals 8C16T BW-E...

That Blender benchmark wasn't a benchmark. Blender has a bunch of complex scenes available for download that they deem complex enough to compare CPUs with; what AMD ran was just a publicity stunt.
It's like running FSX on a Celeron compared to a top FX-9xxx; it doesn't say much.

sirmo · Nov 22, 2016

TheELF said:
That Blender benchmark wasn't a benchmark. Blender has a bunch of complex scenes available for download that they deem complex enough to compare CPUs with; what AMD ran was just a publicity stunt.
It's like running FSX on a Celeron compared to a top FX-9xxx; it doesn't say much.

A different scene isn't going to change the instructions used to render it. Ray tracing isn't heuristic.

Nothingness · Nov 22, 2016

cdimauro said:
That's not true. The Intel compiler optimized code for SOME of its micro-architectures. It means that the optimization didn't work even on some Intel processors.

The reason is "simple": such optimizations work on a micro-architecture basis, rather than on a feature basis.

You are wrong. This was more about instruction usage than micro-architecture tweaks.

krumme · Nov 22, 2016

KTE said:
Riisen was my gaming nick coming back to compete in CB after 6 years

Zacate was horribly unsuitable in ST or Gfx performance for the bigger laptops. It was heavily hamstrung by the shared memory. Cheap HD 5450 1GB performed miles better in practicality, although unexpectedly.

Yes, it killed Atom but it was pitted against i3 by the OEMs more often than not, where it left a sour taste.

It was the IPC that really showed its ugly neck back then.

Sent from HTC 10
(Opinions are own)

You can compare a Ferrari to a Kia and say the Kia sucks. But it's still nonsense.

Zacate was fine for small laptops. 50% faster in CPU than Atom, 300% faster in GPU, and cheaper to make. And it actually had a working GPU. For that it sold 50M units. No wonder. It wasn't exactly BD that kept AMD afloat. Zacate was a fantastic product.

And with the margins Intel is having, of course OEMs were using it in 14-17" laptops. The same way Atom is used today. As most of those laptops come with a mechanical HD, I find it a good match for office use.

Now enthusiasts want Zen to be both as fast as SKL-E and cheap, and hate Atom speeds. Well, it's like discussing economic priorities with your wife.

KTE · Nov 22, 2016

krumme said:
You can compare a Ferrari to a Kia and say the Kia sucks. But it's still nonsense.

Zacate was fine for small laptops. 50% faster in CPU than Atom, 300% faster in GPU, and cheaper to make. And it actually had a working GPU. For that it sold 50M units. No wonder. It wasn't exactly BD that kept AMD afloat. Zacate was a fantastic product.

And with the margins Intel is having, of course OEMs were using it in 14-17" laptops. The same way Atom is used today. As most of those laptops come with a mechanical HD, I find it a good match for office use.

Now enthusiasts want Zen to be both as fast as SKL-E and cheap, and hate Atom speeds. Well, it's like discussing economic priorities with your wife.

If the Ferrari cost the same as the Kia, EVERYONE would take the Ferrari

Sent from HTC 10
(Opinions are own)

sirmo · Nov 22, 2016

Zacate was great, it came at the wrong time though. The iPad came out and tablets ate up the netbook market.

Dresdenboy · Nov 22, 2016

TheELF said:
Did they state that it's 40% avg?
Yes, 40% could be throughput only, it could be speed only, it could be a mix of both, or it could be whatever.
When people here state same IPC as... what are they referring to? Speed or throughput or avg in a number of cases?
What do you call the speed if you have one thread that only needs one command per cycle? That's where Intel is so much better; it has a lot of things that improve cycles per instruction.

Too many unknowns...

Some kind of average is the most likely reading. As a minimum it would be far too high (the minimum gain could even be negative), and as a maximum it doesn't hold either: we simply can't rule out that someone might create code that runs at 50% or more higher IPC on Zen by hitting some of the many bottlenecks in XV that are not present in Zen.

But how would "avg" compare to metrics like throughput? IPC means what it says; there are just some variants, like IPC for 1T (which seems to be the quasi-standard) or for multiple threads on a core. Avg, min, max is a qualifier. Then there can be avg IPCs over the whole runtime or over time slices. The latter varies a lot for a single thread (say between 0.01 and 5).

That 1 IPC thread has a running time of #instructions/IPC/f in seconds, a throughput of 1 instruction per clock, and a "speed" of IPC*f instructions per second. What is your point?
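
The relations in that paragraph can be written down as a minimal Python sketch. The function names and the example numbers (3 billion instructions at 3 GHz) are assumptions picked only for illustration.

```python
def runtime_seconds(instructions, ipc, freq_hz):
    # Running time = #instructions / IPC / f
    return instructions / ipc / freq_hz


def instructions_per_second(ipc, freq_hz):
    # "Speed" = IPC * f
    return ipc * freq_hz


if __name__ == "__main__":
    n = 3e9   # hypothetical instruction count
    f = 3e9   # hypothetical clock: 3 GHz
    print(runtime_seconds(n, 1.0, f))        # the 1 IPC thread
    print(runtime_seconds(n, 2.0, f))        # same code at 2 IPC
    print(instructions_per_second(1.0, f))
```

At a fixed clock, doubling the IPC halves the running time, which is why a single "IPC" figure needs a qualifier (1T or SMT, avg or peak) before it can be compared.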

Here's also a nice example of how to think of SMT using issue slots in a wide microarchitecture. I assume the big gap is some longer latency op with the 3 following ones depending on it. With a superscalar uarch, this will be denser, but there will be bubbles most of the time.

Atari2600 · Nov 22, 2016

Going back to the Bobcat discussion and also considering the generally poor product offerings with the Llano, Trinity and Kaveri...

I firmly believe AMD should have paid a small builder to make laptops and netbooks to an AMD spec and sell them directly on the AMD site.

The message to the OEMs being clear - if you are going to cripple our image by selling our products outside their intended market niche to line your own pockets, then we will take steps to remedy matters.

I looked several times on shop.amd and around the net for the high-end A12 APUs with a quality 17-18" screen & >4GB RAM. Repeatedly found nothing.

MajinCry · Nov 22, 2016

VirtualLarry said:
I think that the x300 was what I had on an old HP laptop with XP. Played Unreal Tournament (original) really well at 640x480. I'm having some trouble remembering the CPU. It could have been a Pentium M, or maybe it was an AMD Sempron.

In me case, it was a Pentium M. Thing had twice the performance of a Pentium 4, but ran at half the clock.

Still played games better than my pissy i3 350m.

Atari2600 · Nov 22, 2016

Oh and my expectations for launch top-of-line 8C16T Zen are:

Base clock: 3.0 GHz
Turbo: 3.2 GHz

IPC, single thread: Broadly similar to Sandy Bridge
IPC, SMT thread: Broadly similar to Nehalem

#OnceBitten

KTE · Nov 22, 2016

Atari2600 said:
Going back to the Bobcat discussion and also considering the generally poor product offerings with the Llano, Trinity and Kaveri...

I firmly believe AMD should have paid a small builder to make laptops and netbooks to an AMD spec and sell them directly on the AMD site.

The message to the OEMs being clear - if you are going to cripple our image by selling our products outside their intended market niche to line your own pockets, then we will take steps to remedy matters.

I looked several times on shop.amd and around the net for the high-end A12 APUs with a quality 17-18" screen & >4GB RAM. Repeatedly found nothing.

Mate, even the 15.6" screens paired up with them were horrible. The saturation and contrast, or huge lack of, was visible in basic reviews.

Sent from HTC 10
(Opinions are own)

MajinCry · Nov 22, 2016

Atari2600 said:
Oh and my expectations for launch top-of-line 8C16T Zen are:

Base clock: 3.0 GHz
Turbo: 3.2 GHz

IPC, single thread: Broadly similar to Sandy Bridge
IPC, SMT thread: Broadly similar to Nehalem

#OnceBitten

Someone's fairly conservative. Sandybridge would be the +40% claim over Piledriver, tho' we've been told Excavator, which is Haswell territory.

Honestly, though, I'd be fine with a bunch of cores 'n' Sandybridge performance, but I'm not someone with a lot of cash to burn, so that probably steers me opinion.

Arachnotronic · Nov 22, 2016

MajinCry said:
Someone's fairly conservative. Sandybridge would be the +40% claim over Piledriver, tho' we've been told Excavator, which is Haswell territory.

Honestly, though, I'd be fine with a bunch of cores 'n' Sandybridge performance, but I'm not someone with a lot of cash to burn, so that probably steers me opinion.

+40% over XV is likely closer to Sandy/Ivy-class than Haswell/Broadwell. The leaks we have seen of Summit Ridge point to performance/MHz lower than Haswell (I'm thinking of the AoTS leak in particular).

We'll only know the truth when these CPUs are run through a comprehensive test suite by the third party reviewers, though.

coffeemonster · Nov 22, 2016

Atari2600 said:
considering the generally poor product offerings with the Llano, Trinity and Kaveri...

Carrizo had the poor offerings. Trinity/Richland/Kaveri could be found in nice laptops with a proper RAM setup and quality screens. I have one.

I looked several times on shop.amd and around the net for the high-end A12 APUs with a quality 17-18" screen & >4GB RAM. Repeatedly found nothing.

Depending on what you consider high-end there are machines like this: http://m.costco.com/.product.100312901.html and this: http://www8.hp.com/ca/en/products/laptops/product-detail.html?oid=11084775#!tab=specs out there now.

MajinCry · Nov 22, 2016

Arachnotronic said:
+40% over XV is likely closer to Sandy/Ivy-class than Haswell/Broadwell. The leaks we have seen of Summit Ridge point to performance/MHz lower than Haswell (I'm thinking of the AoTS leak in particular).

We'll only know the truth when these CPUs are run through a comprehensive test suite by the third party reviewers, though.

'Eres me bollocksy math from ages ago in this thread:

Piledriver == Phenom II @ equal clocks
Phenom II @ 3.4 GHz == Nehalem @ 2.7 GHz (700 MHz deficit)

Sandybridge = Nehalem + 40% performance
Sandybridge @ 3 GHz == Nehalem @ 4.2 GHz

Steamroller = Piledriver * 1.05 performance
Excavator = Steamroller * 1.05 performance
Excavator @ 3.4 GHz == Piledriver @ 3.7 GHz
Excavator == Nehalem @ 3.0 GHz (400 MHz deficit)

Excavator + 40% performance == Nehalem @ 4.2 GHz

Eh, you're right. If we're drawing on just clockspeeds 'n' such, Zen is at Sandybridge level. Still, uses much less power with more cores 'n' threads. Should have some room for decent clocks.
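
The same back-of-the-envelope math, as a small Python sketch. Every ratio below is taken from the rough equivalences listed in this post, not from measurements, and the variable and function names are assumptions made up for the example.

```python
# Rough per-clock ratios taken from the equivalences above (not measured data).
PHENOM2_VS_NEHALEM = 2.7 / 3.4          # Phenom II @ 3.4 GHz == Nehalem @ 2.7 GHz
PILEDRIVER_VS_PHENOM2 = 1.0             # equal at equal clocks
EXCAVATOR_VS_PILEDRIVER = 1.05 * 1.05   # two 5% steps via Steamroller
SANDY_VS_NEHALEM = 1.40                 # Sandybridge = Nehalem + 40%
ZEN_VS_EXCAVATOR = 1.40                 # the +40% claim

EXCAVATOR_VS_NEHALEM = (
    EXCAVATOR_VS_PILEDRIVER * PILEDRIVER_VS_PHENOM2 * PHENOM2_VS_NEHALEM
)


def nehalem_equivalent_ghz(clock_ghz, per_clock_vs_nehalem):
    # Clock a Nehalem would need to match the given chip.
    return clock_ghz * per_clock_vs_nehalem


excavator_eq = nehalem_equivalent_ghz(3.4, EXCAVATOR_VS_NEHALEM)
zen_eq = nehalem_equivalent_ghz(3.4, EXCAVATOR_VS_NEHALEM * ZEN_VS_EXCAVATOR)
sandy_eq = nehalem_equivalent_ghz(3.0, SANDY_VS_NEHALEM)

if __name__ == "__main__":
    print(round(excavator_eq, 2), round(zen_eq, 2), round(sandy_eq, 2))
```

Under these assumptions a 3.4 GHz Zen and a 3.0 GHz Sandybridge both land at roughly a 4.2 GHz Nehalem, so the whole result depends on how good the 5% and 40% guesses are.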

Glo. · Nov 22, 2016

Arachnotronic said:
+40% over XV is likely closer to Sandy/Ivy-class than Haswell/Broadwell. The leaks we have seen of Summit Ridge point to performance/MHz lower than Haswell (I'm thinking of the AoTS leak in particular).

We'll only know the truth when these CPUs are run through a comprehensive test suite by the third party reviewers, though.

Based on AnandTech's review of Steamroller, the 4-core A10 7700K in desktop real-world jobs like WinRAR or the Dolphin benchmark is 40% behind the Core i5 4690. The A10 7700K has a 3.5 GHz core clock, the i5 4690K 3.4 GHz.

And Zen is supposed to be 40% faster than XV, not Steamroller.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
Nearly every real-world benchmark puts that particular Steamroller part around 40% behind a similarly clocked Haswell part. Both CPUs are 4 core/4 thread.

Not that I would imply anything.
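
How far "+40% over XV" gets depends a lot on how "40% behind" is read. A minimal Python sketch of the two readings follows; it assumes the 5% Steamroller-to-Excavator step used in the clock math earlier in this thread, treats the two parts as equally clocked, and uses names made up for the example. It is not a prediction.

```python
# Assumptions (from posts in this thread, not measurements):
XV_OVER_STEAMROLLER = 1.05   # Excavator = Steamroller * 1.05
ZEN_OVER_XV = 1.40           # the +40% claim


def zen_vs_haswell(steamroller_vs_haswell):
    # Zen per-clock performance as a fraction of Haswell.
    return steamroller_vs_haswell * XV_OVER_STEAMROLLER * ZEN_OVER_XV


# Reading A: Steamroller delivers 60% of Haswell's performance.
reading_a = zen_vs_haswell(0.60)

# Reading B: Haswell is 40% faster than Steamroller (Steamroller = 1/1.4).
reading_b = zen_vs_haswell(1 / 1.40)

if __name__ == "__main__":
    print(round(reading_a, 3), round(reading_b, 3))
```

Reading A leaves Zen a bit short of Haswell, reading B puts it slightly ahead, so the same two "40%" figures support either side of the argument.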

blublub · Nov 22, 2016

Glo. said:
Based on AnandTech's review of Steamroller, the 4-core A10 7700K in desktop real-world jobs like WinRAR or the Dolphin benchmark is 40% behind the Core i5 4690. The A10 7700K has a 3.5 GHz core clock, the i5 4690K 3.4 GHz.

And Zen is supposed to be 40% faster than XV, not Steamroller.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
Nearly every real-world benchmark puts that particular Steamroller part around 40% behind a similarly clocked Haswell part. Both CPUs are 4 core/4 thread.

Not that I would imply anything.

So if AMD isn't BSing us, we can expect about Haswell performance in the real world - makes sense

Arachnotronic · Nov 22, 2016

Glo. said:
Based on AnandTech's review of Steamroller, the 4-core A10 7700K in desktop real-world jobs like WinRAR or the Dolphin benchmark is 40% behind the Core i5 4690. The A10 7700K has a 3.5 GHz core clock, the i5 4690K 3.4 GHz.

And Zen is supposed to be 40% faster than XV, not Steamroller.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
Nearly every real-world benchmark puts that particular Steamroller part around 40% behind a similarly clocked Haswell part. Both CPUs are 4 core/4 thread.

Not that I would imply anything.

Seems like you're trying to imply that Zen will be right up there with Skylake?

Glo. · Nov 22, 2016

Arachnotronic said:
Seems like you're trying to imply that Zen will be right up there with Skylake?

I have already said it before: the core design gives Zen no chance of being on Skylake's level clock for clock in single-threaded performance. However, Haswell/Broadwell is a completely different story.

bjt2 · Nov 22, 2016

40% is a mean value... In Blender it is on par with Broadwell-E (which has more L3 and more memory channels) and, while I can't find Blender data for Excavator, I vaguely remember an 80-90% advantage for INTEL CPUs in Cinebench... If the advantage in Blender is similar...

cdimauro · Nov 22, 2016

bjt2 said:
Well, on Cyrix CPUs it was possible to fake the CPUID identification string (and maybe also the codes), and all INTEL-compiled programs gained performance (even 50%+)... And if I am not wrong, a virtualization software lets you do this for all CPUs, and many tests were made.

I haven't said anything regarding this. It might be true.

INTEL was penalized for this. 1 billion dollars, if I remember correctly. Plus some other things, like relaxing the x86 license for AMD (there was an obligation to own a fab).

No, that was for unfair competition in the 2002-2007 period, for abuse of a dominant position.

The truth is that for INTEL CPUs (and, as you pointed out, not all of them) the correct optimized path was used. For the other INTEL CPUs, probably only a path consistent with the feature set reported by CPUID was selected (this would have been the correct behaviour for all CPUs, even non-INTEL ones, when there is no optimized path). For non-INTEL CPUs the slowest path (486/x87 for older compilers, up to SSEn for newer compilers) was selected.

No. As I reported before, the code path selection follows an (Intel's) micro-architecture criterion. Otherwise a fallback/general code-path is used.

See Agner's page, and my replies to the other comments below.

The L1I? 64KB for Zen and 32KB for INTEL. INTEL's has more ways, so AMD's Zen cache is not as effective, but it's bigger.

Intel's L1 cache is 32KB, 8-way, with a 64 byte line.
Zen's L1 cache is 64KB, 4-way, but we don't know how many bytes per line it holds (32?).
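
For reference, the number of sets follows directly from size, ways and line size. A minimal Python sketch is below; the helper name is made up for the example, and both Zen line sizes are assumptions, since the real value isn't known here.

```python
def cache_sets(size_bytes, ways, line_bytes):
    # sets = total size / (associativity * line size)
    return size_bytes // (ways * line_bytes)


KB = 1024

# Intel L1I as stated above: 32KB, 8-way, 64-byte lines.
intel_sets = cache_sets(32 * KB, 8, 64)

# Zen L1I: 64KB, 4-way. The line size is NOT known here; both are guesses.
zen_sets_if_64b = cache_sets(64 * KB, 4, 64)
zen_sets_if_32b = cache_sets(64 * KB, 4, 32)

if __name__ == "__main__":
    print(intel_sets, zen_sets_if_64b, zen_sets_if_32b)
```

Either way Zen ends up with more sets but half the ways per set, so "the kind is different" rather than one cache simply being twice the other.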

Well, if it fits in cache (uop or L1I or L2), you save some cycles... Maybe after a big emulated instruction the uop cache is wiped out, but I doubt that a single emulated instruction has more than 32/64KB of code...

Of course not, but since you continually jump between different instructions, it's quite likely that a uop cache is frequently flushed and reloaded.

sirmo said:
I think he's referring to Intel cheating by having its compiler inject GenuineIntel authentication, and then taking the unoptimized path when an Intel CPU wasn't detected.

http://www.agner.org/optimize/blog/read.php?i=49

Yes, he was referring to this case.

Thanks for the link, which proves what I've reported before.

Nothingness said:
You are wrong. This was more about instruction usage than micro-architecture tweaks.

From Agner's page, quoting Intel's developers:

"You mentioned we will not support future Intel processors with non-'6' family designations without a compiler update. Yes, that is correct and intentional. Our compiler produces code which we have high confidence will continue to run in the future. This has the effect of not assuming anything about future Intel or AMD or other processors."

and Agner's conclusion:

"In other words, they claim that they are optimizing for specific processor models rather than for specific instruction sets. If true, this gives Intel an argument for not supporting AMD processors properly. But it also means that all software developers who use an Intel compiler have to recompile their code and distribute new versions to their customers every time a new Intel processor appears on the market."
