AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

bjt2 · Nov 22, 2016

Glo. said:
I have already said before. Core design gives no chance for Zen to be clock for clock on Skylake level for single threaded performance. However, Haswell/Broadwell - that is completely different story.

Why? How can you say that?
Zen can decode 4 instructions or give 6 uops from uop cache. Skylake 4 fused (so up to 8). For ST loads this is not a bottleneck.
Zen can execute 4 int + 2 mem + 4 fp. Skylake 4 between int and FP and 3-4 mem, but in ST this is more than enough
Zen can retire 8 instructions. Also SKL... More than enough...

SKL throughput is higher only for 256-bit FMAC and on par for 256-bit FMUL or FADD. On 128/80-bit tasks it is lower.

There is no bottleneck for single-threaded tasks. In heavy FP tasks Intel has to share ports with integer instructions, but in ST tasks this is not an issue, since the IPC of real applications is below 3... So no problem for either.
The only problem for Zen could be 256-bit tasks, since Intel has more memory BW and resources... But if the 256-bit task needs a lot of data, the L2, L3 and memory BW should be similar...
There is a clue about L3 BW though. I always thought that the L3 cache on Intel ran at core clock, but on an overclocking forum I found out that the L3 clock is lower, about 2.4 GHz... Is that true?
In the Zen Hot Chips presentation, the L3 BW was stated to be 5x that of Bulldozer. I think 4x comes from the bus and 25% from the clock. On BD the L3 runs at 2.4 GHz; +25% = 3 GHz. So it seems that the L3 runs at core clock on Zen...
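
The arithmetic behind this guess can be checked with a quick sketch. The 5x figure and the 2.4 GHz Bulldozer L3 clock are taken from the post; the split into a 4x wider bus and a 1.25x clock is the poster's assumption, not something AMD has stated, and the variable names are only illustrative.

```python
# Decompose the quoted 5x L3 bandwidth gain into bus width and clock.
# Assumption (from the post, not confirmed by AMD): bandwidth scales as
# bus_width * clock, and the bus is 4x wider than on Bulldozer.
total_gain = 5.0        # L3 bandwidth gain quoted from the Hot Chips talk
bus_gain = 4.0          # assumed bus-width factor
bd_l3_clock_ghz = 2.4   # Bulldozer L3 clock as stated in the post

# Whatever is not explained by the wider bus is attributed to the clock.
clock_gain = total_gain / bus_gain
zen_l3_clock_ghz = bd_l3_clock_ghz * clock_gain

print(f"clock factor: {clock_gain:.2f}x")
print(f"implied Zen L3 clock: {zen_l3_clock_ghz:.1f} GHz")
```

If the bus factor were different from 4x, the implied clock would change accordingly, so this only shows that the numbers in the post are self-consistent.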

So the only unknowns are cache and branch prediction efficiency. We have no clue as to which is better.

Doom2pro · Nov 22, 2016

Why does everybody forget that AMD explicitly stated that the 40% figure was independent of the process node? Excavator uses DDR3. Factor in the faster memory, the new 14nm node, and the cache advantages that Excavator didn't have (like L3, lower latency and higher bandwidth), none of which can be counted as contributing towards the infamous 40% figure.

What does all of that plus the 40% figure add up to? That is the question.

LTC8K6 · Nov 22, 2016

Glo. said:
Based on Anandtech review of Steamroller, 4 core A10 7700K in desktop, real world jobs like WinRar, or Dolphin Benchmark is 40% behind Core i5 4690. A10 7700K has 3.5 GHz, i5 4690K has 3.4 GHz core clock.

And Zen is supposed to be 40% faster than XV, not Steamroller.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
Nearly every real-world benchmark puts that particular Steamroller part around 40% behind similarly clocked Haswell part. Both CPUs are 4 core/4 thread.

Not that I would imply anything.

In Dolphin the 4690 is at 7.62
If you multiply that by 1.4 you get a score of 10.6
The 7700K has a score of 14.13
Is that 40% slower?

For WinRar it is 78.29 for the 4690
X1.4 = 109.6
7700K score is 140.81

I'm not finding 40%, but a lot more?

What am I doing wrong?

http://images.anandtech.com/graphs/graph10543/83041.png

Here in Dolphin, A10-7890K at 4.1/4.3Ghz is way more than 40% slower than i5-6600 at 3.3/3.9?
It's way slower than i3-6100, in fact?

Here is Winrar from the same review:

http://images.anandtech.com/graphs/graph10543/83042.png

FX-4350 appears to be about 40% faster than 7890K here.

I'm not sure how useful these numbers are, though?

bjt2 · Nov 22, 2016

cdimauro said:
I haven't said anything regarding this. It might be true.

I have no direct experience. I read it on the internet (C)

cdimauro said:
No, that's what for the unfair competition in the 2002-2007 period, for abuse of dominant position.

OK, I didn't remember correctly.

cdimauro said:
No. As I reported before, the code path selection follows an (Intel's) micro-architecture criterion. Otherwise a fallback/general code-path is used.

See the Agner's page, and below on the other comments.

Yes, more or less as I said. The fallback path differed between versions, from 486/x87 in older compilers up to SSEn (4.2?) in the latest compilers...

cdimauro said:
Intel's L1 cache is 32KB, 8-way, with a 64 byte line.
Zen's L1 cache is 64KB, 4-way, but we don't know how many bytes per line it holds (32?).

I think it is 64 bytes. It has been 64 bytes for eons... And with DDR4 and a burst of 8 beats, 32 bytes would be awkward...

cdimauro said:
Of course not, but since you continually jump between different instructions, it's quite likely that a uop cache is frequently flushed and reloaded.

But not L1I. Better than nothing...

Glo. · Nov 22, 2016

bjt2 said:
Why? How can you say that?
Zen can decode 4 instructions or give 6 uops from uop cache. Skylake 4 fused (so up to 8). For ST loads this is not a bottleneck.
Zen can execute 4 int + 2 mem + 4 fp. Skylake 4 between int and FP and 3-4 mem, but in ST this is more than enough
Zen can retire 8 instructions. Also SKL... More than enough...

It's all about FP and integer resources. Registers are smaller in Zen than they are in Skylake. What is funny here is that they are precisely at Broadwell level.

cdimauro · Nov 22, 2016

@bjt2: unfortunately even the L1I cache may not be enough. -_-

Glo. · Nov 22, 2016

LTC8K6 said:
In Dolphin the 4690 is at 7.62
If you multiply that by 1.4 you get a score of 10.6
The 7700K has a score of 14.13
Is that 40% slower?

For WinRar it is 78.29 for the 4690
X1.4 = 109.6
7700K score is 140.81

I'm not finding 40%, but a lot more?

What am I doing wrong?

http://images.anandtech.com/graphs/graph10543/83041.png

Here in Dolphin, A10-7890K at 4.1/4.3Ghz is way more than 40% slower than i5-6600 at 3.3/3.9?
It's way slower than i3-6100, in fact?

Here is Winrar from the same review:

http://images.anandtech.com/graphs/graph10543/83042.png

FX-4350 appears to be about 40% faster than 7890K here.

I'm not sure how useful these numbers are, though?

You are using the scores for the i5 4690 as the baseline. You have to use the A10-7700K numbers as the baseline, make them better by 40%, and compare them to the Haswell scores.

The difference this way is around 40% increase in performance.
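
The two baselines being argued over can be worked through with a short sketch. The scores are the ones quoted in the thread; treating them as completion times (lower is better) is an assumption about the review's charts, and `compare` is a hypothetical helper written only for this illustration.

```python
# Scores quoted in the thread, assumed here to be completion times
# (lower is better); this is an assumption about the review's charts.
SCORES = {
    "Dolphin": {"A10-7700K": 14.13, "i5-4690": 7.62},
    "WinRAR": {"A10-7700K": 140.81, "i5-4690": 78.29},
}

def compare(amd_time, intel_time, uplift=0.40):
    """Return (AMD time after uplift, uplift needed to match, Intel time saving)."""
    # A throughput uplift of u divides the completion time by (1 + u).
    amd_after = amd_time / (1 + uplift)
    # Throughput uplift AMD would need to reach the Intel time.
    needed = amd_time / intel_time - 1
    # Fraction of AMD's time that Intel saves on the same task.
    saved = 1 - intel_time / amd_time
    return amd_after, needed, saved

for bench, s in SCORES.items():
    after, needed, saved = compare(s["A10-7700K"], s["i5-4690"])
    print(f"{bench}: +40% gives {after:.2f} vs {s['i5-4690']:.2f}; "
          f"needs +{needed:.0%} to match; i5 takes {saved:.0%} less time")
```

Under that assumption, the i5-4690 finishes in roughly 45% less time, while the A10-7700K would need roughly 80 to 85% more throughput to match it, which may be where the two readings of "40%" diverge.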

bjt2 · Nov 22, 2016

Glo. said:
It's all about FP and integer resources. Registers are smaller in Zen than they are in Skylake. What is funny here is that they are precisely at Broadwell level.

For 128-bit code, Zen has more FP resources than SKL. Only on 256-bit code with 256-bit FMAC is SKL's FP throughput superior. With FADD and FMUL it is the same. On 128-bit code Zen is always superior.
And AMD does not have shared ports.
Anyway, your single-thread case cannot saturate the resources of either CPU. It all depends on cache and predictor efficiency...

bjt2 · Nov 22, 2016

cdimauro said:
@bjt2: unfortunately even the L1I cache may not be enough. -_-

I can imagine a huge Python instruction... More or less like Java virtual calls, which are also huge... I hope they are not that common. Either way, this is a corner case. I avoid using Python, also because of your articles on appuntidigitali...

cdimauro · Nov 22, 2016

An Intel processor can have some advantage on 128-bit code too, when using FMACs.

LTC8K6 · Nov 22, 2016

Glo. said:
You are using the scores for the i5 4690 as the baseline. You have to use the A10-7700K numbers as the baseline, make them better by 40%, and compare them to the Haswell scores.

The difference this way is around 40% increase in performance.

Well, you said 40% behind 4690, which implies it would take 40% longer to complete the tasks.

14.13 x .4 = 5.65
14.13-5.65=8.48
4690 is at 7.62

Still doesn't work.

4690 is better than 45% faster that way.

That way actually has 7890K about 40% slower than i5-6600.
So that would mean Zen is close to Skylake in these benches.

Congrats to AMD.

bjt2 · Nov 22, 2016

cdimauro said:
An Intel processor can have some advantage on 128-bit code too, when using FMACs.

It can do 2 FMACs at 128 bits. So can Zen (2 FMUL + 2 FADD). The only advantage is when vector-int code is mixed with FP code, since the vecint ports could be free... But the FP/vecint ports are shared with the int ports... So a cmp, jmp or loop instruction could conflict with the vecint or FP instructions...

sirmo · Nov 22, 2016

bjt2 said:
I can imagine a huge Python instruction... More or less like Java virtual calls, which are also huge... I hope they are not that common. Either way, this is a corner case. I avoid using Python, also because of your articles on appuntidigitali...

Python is glue. Even when you use Python most heavy lifting is done by low level C/C++ libraries and the OS components themselves, in majority of apps where Python is applied. People who do heavy computational tasks with python leverage libraries like numpy, scipy or similar. For instance if you parse large blobs of text in python you might use the re (regular expressions) module which is written in C. You compile the regular expression once, and you pass it the blob to parse.. that work isn't done in python.
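
The compile-once pattern described here can be sketched as follows. The pattern and the input text are made up for illustration; only `re.compile` and the compiled pattern's `findall` method are standard-library calls.

```python
import re

# Hypothetical pattern and input, for illustration only.
WORD = re.compile(r"[A-Za-z]+")  # compiled once, reused for every call

blob = "Zen decodes 4 instructions; Skylake decodes 4 fused. " * 1000

# The matching loop runs inside CPython's C regex engine; the Python
# code only receives the finished list of matches.
words = WORD.findall(blob)
print(len(words))
```

Anything done with `words` afterwards (filtering, counting, reshaping) runs as ordinary Python bytecode again.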

In a typical web app scenario which results in say a database query.. most of that heavy lifting is done by the database server, whether it be an RDBMS or some key value store.

The point I am making is right tool for the job is important. And Python is a very powerful tool. It is a rapid development scripting language with some of the best readability around, which makes it a great language for collaboration and quick development. That combined with a mature ecosystem makes it a right choice for a whole host of applications.

There are cases where it doesn't make sense like implementation of new performance sensitive algorithms, but most people don't come up with new algorithms all the time. Chances are a library for what you're trying to accomplish already exists.

cdimauro · Nov 22, 2016

bjt2 said:
It can do 2 FMACs at 128 bits. So can Zen (2 FMUL + 2 FADD). The only advantage is when vector-int code is mixed with FP code, since the vecint ports could be free... But the FP/vecint ports are shared with the int ports... So a cmp, jmp or loop instruction could conflict with the vecint or FP instructions...

There's also one free "int" port, which can also handle branches.

sirmo said:
Python is glue. Even when you use Python most heavy lifting is done by low level C/C++ libraries and the OS components themselves, in majority of apps where Python is applied. People who do computational tasks with python leverage libraries like numpy, scipy or similar. For instance if you parse large blobs of text in python you might use the re (regular expressions) module which is written in C. You compile the regular expression once, and you pass it the blob to parse.. that work isn't done in python.

But the rest is, and it's also important.

For example, once you have parsed a large blob of text using the re or csv module, you want to manipulate the result, and for this Python code / CPython's VM is used... which isn't known to be fast.

In a typical web app scenario which results in say a database query.. most of that heavy lifting is done by the database server, whether it be an RDBMS or some key value store.

Absolutely. This is usually an I/O-bound kind of code, where Python's slowness isn't the real bottleneck.

The point I am making is right tool for the job is important. And Python is a very powerful tool. It is a rapid development scripting language with some of the best readability around, which makes it a great language for collaboration and quick development.

I fully agree!

antihelten · Nov 22, 2016

LTC8K6 said:
Well, you said 40% behind 4690, which implies it would take 40% longer to complete the tasks.

No it doesn't. 40% slower (or 40% behind if you want) means that it takes 67% longer to complete the task (1 / (1 - 0.4) = 1.67).

It taking 40% longer would be equal to being 29% slower, not 40%.
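
The conversion between "X% slower" and "X% longer" can be written out as a small sketch. The function names are hypothetical; the formulas are the ones used in the post, with "slower" meaning a reduction in speed and "longer" meaning an increase in time.

```python
def slower_to_longer(slower: float) -> float:
    """Fractional speed deficit -> fractional extra completion time."""
    # Running at (1 - slower) of the speed takes 1 / (1 - slower) of the time.
    return 1 / (1 - slower) - 1

def longer_to_slower(longer: float) -> float:
    """Fractional extra completion time -> fractional speed deficit."""
    # Taking (1 + longer) of the time means 1 / (1 + longer) of the speed.
    return 1 - 1 / (1 + longer)

print(f"40% slower -> {slower_to_longer(0.40):.0%} longer")  # 67% longer
print(f"40% longer -> {longer_to_slower(0.40):.0%} slower")  # 29% slower
```

The two functions are inverses of each other, which is why the same 40% figure gives different answers depending on which quantity it describes.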

TheELF · Nov 22, 2016

cdimauro said:
No, that's what for the unfair competition in the 2002-2007 period, for abuse of dominant position.

http://www.amd.com/Documents/AMD_Intel_Settlement_Agreement_-_Full.pdf
Actually it was for both and a lot more issues, like AMD breaching contract by selling off its manufacturing operations and creating GlobalFoundries. It was never judged in court but was settled between them.
(Otherwise, even if Intel had lost the lawsuits against it, AMD would still have been in danger of losing the right to produce x86 altogether.
"3. GF Dispute . On or about March 9, 2009, AMD closed a transaction with Advanced Technology Investment Company “ATIC” through which, among other actions, ATIC and AMD created a venture called GLOBALFOUNDRIES (“ GF ”), transferred AMD’s wafer manufacturing operations to GF, and claimed that GF was a subsidiary as defined under a January 1, 2001 Intel/AMD patent cross-license, and as such, entitled to rights thereunder. Intel claims that various aspects of this transaction have breached the Intel/AMD patent cross-license. Intel also has advised AMD and GF that by using, manufacturing, selling, offering to sell and/or importing products utilizing Intel’s patented inventions without a license, AMD and GF are each infringing certain of Intel’s patents. In response, AMD has accused Intel of breaching that patent cross-license."
)
Good guy Intel keeping AMD from disappearing since 2009 at great monetary cost.
(Of course so they won't have to find out what would happen if monopoly laws would come into place)

cdimauro · Nov 22, 2016

@TheELF: there's no reference to Intel's compiler in the PDF.

sirmo · Nov 22, 2016

TheELF said:
http://www.amd.com/Documents/AMD_Intel_Settlement_Agreement_-_Full.pdf
Actually it was for both and a lot more issues, like AMD breaching contract by selling off its manufacturing operations and creating GlobalFoundries. It was never judged in court but was settled between them.
(Otherwise, even if Intel had lost the lawsuits against it, AMD would still have been in danger of losing the right to produce x86 altogether.
"3. GF Dispute . On or about March 9, 2009, AMD closed a transaction with Advanced Technology Investment Company “ATIC” through which, among other actions, ATIC and AMD created a venture called GLOBALFOUNDRIES (“ GF ”), transferred AMD’s wafer manufacturing operations to GF, and claimed that GF was a subsidiary as defined under a January 1, 2001 Intel/AMD patent cross-license, and as such, entitled to rights thereunder. Intel claims that various aspects of this transaction have breached the Intel/AMD patent cross-license. Intel also has advised AMD and GF that by using, manufacturing, selling, offering to sell and/or importing products utilizing Intel’s patented inventions without a license, AMD and GF are each infringing certain of Intel’s patents. In response, AMD has accused Intel of breaching that patent cross-license."
)
Good guy Intel keeping AMD from disappearing since 2009 at great monetary cost.
(Of course so they won't have to find out what would happen if monopoly laws would come into place)

You can't be serious. Intel was caught red-handed and had to pay $1.3B in settlement fees, and they are still dealing with the fallout in a similar case in the EU. The new cross-licensing agreement between Intel and AMD shows why they both needed each other. AMD couldn't make x86 chips without Intel, and Intel couldn't make 64-bit x86 chips without AMD (Intel licensed x86_64 from AMD).

This is also part of the reason Intel can't go after the Thatic JV.

Intel bullied their way to block AMD from selling product when AMD was on top. That's the important part, and as such they cannot be called a good guy. And it's why I'll always purchase AMD when it makes sense.

Glo. · Nov 22, 2016

LTC8K6 said:
Well, you said 40% behind 4690, which implies it would take 40% longer to complete the tasks.

14.13 x .4 = 5.65
14.13-5.65=8.48
4690 is at 7.62

Still doesn't work.

4690 is better than 45% faster that way.

That way actually has 7890K about 40% slower than i5-6600.
So that would mean Zen is close to Skylake in these benches.

Congrats to AMD.

Does it really matter?

The point is: 40% increase over Excavator puts the performance of Zen clock for clock on Haswell/Broadwell Level.

Let's put the dreams that it will match Skylake in single-threaded performance to rest.

sirmo · Nov 22, 2016

Glo. said:
Does it really matter?

The point is: 40% increase over Excavator puts the performance of Zen clock for clock on Haswell/Broadwell Level.

Let's put the dreams that it will match Skylake in single-threaded performance to rest.

All I want is Haswell-level performance. That alone would be a huge leap forward. For a brand-new, from-scratch design, tick and tock combined, it would be quite a feat. It's easier to improve on a new clean design from that point on.

AtenRa · Nov 22, 2016

Where did you see ST performance in WinRar and Dolphin benchmarks on the Anandtech review ??

Glo. · Nov 22, 2016

AtenRa said:
Where did you see ST performance in WinRar and Dolphin benchmarks on the Anandtech review ??

In the very first page that pops up after clicking the review of A10 7700K.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
For reference.
A10-7700K Quad core/4 thread CPU design.
i5-4690 4C/4T design.

To get to i5-4690 levels you need around 45% increase. And that is for Steamroller arch.

AtenRa · Nov 22, 2016

Glo. said:
In the very first page that pops up after clicking the review of A10 7700K.

http://www.anandtech.com/show/9287/the-amd-a10-7700k-and-amd-a6-7400k-cpu-review/2
For reference.
A10-7700K Quad core/4 thread CPU design.
i5-4690 4C/4T design.

To get to i5-4690 levels you need around 45% increase. And that is for Steamroller arch.

Those are not SINGLE THREAD PERFORMANCE; those are 4-thread CMT/SMT or 4-core throughput.

Glo. · Nov 22, 2016

TheELF · Nov 22, 2016

cdimauro said:
@TheELF: there's no reference to Intel's compiler in the PDF.

Yeah that would be way too clear for a legal document.

2.3 TECHNICAL PRACTICES Intel shall not include any Artificial Performance Impairment in any Intel product or require any Third Party to include an Artificial Performance Impairment in the Third Party’s product. As used in this Section 2.3, “ Artificial Performance Impairment ” means an affirmative engineering or design action by Intel (but not a failure to act) that (i) degrades the performance or operation of a Specified AMD product, (ii) is not a consequence of an Intel Product Benefit and (iii) is made intentionally to degrade the performance or operation of a Specified AMD Product. For purposes of this Section 2.3, “ Product Benefit ” shall mean any benefit, advantage, or improvement in terms of performance, operation, price, cost, manufacturability, reliability, compatibility, or ability to operate or enhance the operation of another product. In no circumstances shall this Section 2.3 impose or be construed to impose any obligation on Intel to (i) take any act that would provide a Product Benefit to any AMD or other non-Intel product, either when such AMD or non-Intel product is used alone or in combination with any other product, (ii) optimize any products for Specified AMD Products, or (iii) provide any technical information, documents, or know how to AMD
