AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

KTE · Dec 4, 2016

Averaging 4790K performance with the same core count at similar power would be well beyond my expectations, and make Zen a serious competitor.

Sent from HTC 10

LTC8K6 · Dec 4, 2016

KTE said:
Averaging 4790K performance with the same core count at similar power would be well beyond my expectations, and make Zen a serious competitor.

4C/8T Zen should be 65W vs 4790K at 88W.
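As rough arithmetic, and treating rated TDP as if it were actual power draw (it is only a loose proxy), the 65W vs 88W figures set how much of the 4790K's performance a 65W part needs in order to match it on performance per watt. A minimal sketch:

```python
def perf_needed_for_equal_efficiency(new_tdp_w: float, ref_tdp_w: float) -> float:
    """Fraction of the reference chip's performance needed to match its perf/W.

    Assumption: rated TDP stands in for real power draw, which it only approximates.
    """
    return new_tdp_w / ref_tdp_w


frac = perf_needed_for_equal_efficiency(65.0, 88.0)
print(f"A 65W part matches an 88W part's perf/W at {frac:.0%} of its performance")
```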

Except that the competition is actually Kaby Lake.

If Zen is not quite up to Haswell, then it is probably Ivy Bridge level, and AMD is taking too long to release it, imo.

I think we are looking at Ivy Bridge vs Kaby Lake in terms of performance, but at similar TDP.

KTE · Dec 4, 2016

LTC8K6 said:
4C/8T Zen should be 65W vs 4790K at 88W.

Except that the competition is actually Kaby Lake.

If Zen is not quite up to Haswell, then it is probably Ivy Bridge level, and AMD is taking too long to release it, imo.

I think we are looking at Ivy Bridge vs Kaby Lake in terms of performance, but at similar TDP.

That really depends on the frequencies they require to be competitive.

In the past, it has been FAR easier to double lower clocked cores than increase clocks with half the core count at a given TDP.

This depends more on their process implementation.
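The intuition that doubling cores is cheaper than raising clocks can be sketched with the textbook dynamic-power relation P ≈ C·V²·f. Assuming (a simplification) that voltage has to rise roughly in proportion to frequency, per-core power scales like f³, so at a fixed power budget n cores can each run at about n^(-1/3) of the single-core clock. Leakage, uncore power and imperfect parallel scaling are all ignored in this sketch:

```python
def clock_at_fixed_power(n_cores: int, single_core_clock: float = 1.0) -> float:
    """Per-core clock when n cores share the power one core used at single_core_clock.

    Assumes per-core dynamic power ~ f**3 (voltage scaling linearly with frequency).
    """
    return single_core_clock * n_cores ** (-1.0 / 3.0)


def throughput_at_fixed_power(n_cores: int) -> float:
    """Aggregate throughput relative to one core, assuming perfectly parallel work."""
    return n_cores * clock_at_fixed_power(n_cores)


for n in (1, 2, 4):
    print(n, round(clock_at_fixed_power(n), 3), round(throughput_at_fixed_power(n), 3))
```

Under these assumptions, two cores at the same budget each run at about 79% of the single-core clock, yet aggregate throughput is about 1.59x. How closely a real process follows the f³ curve is exactly the process-implementation question.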

The question is, what clocks does 4/8 Zen need to be competitive with 4790K 4-4.4GHz?
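A first-order way to frame that question: performance ≈ IPC × clock, so the clock Zen needs is the 4790K's clock divided by Zen's IPC relative to Haswell. The IPC ratio is a hypothetical input here, not a measured figure, and the model ignores memory-bound behaviour:

```python
def required_clock_ghz(ref_clock_ghz: float, zen_ipc_vs_ref: float) -> float:
    """Clock Zen would need to match a reference chip, assuming perf = IPC * clock.

    zen_ipc_vs_ref is Zen's IPC relative to the reference (1.0 = equal IPC);
    it is a hypothetical input, not a measurement.
    """
    if zen_ipc_vs_ref <= 0:
        raise ValueError("IPC ratio must be positive")
    return ref_clock_ghz / zen_ipc_vs_ref


# 4790K turbo is 4.4GHz; try a few hypothetical IPC ratios.
for ratio in (0.9, 1.0, 1.1):
    print(ratio, round(required_clock_ghz(4.4, ratio), 2))
```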

krumme · Dec 4, 2016

How difficult is it to design a good cpu at RTL level these days vs post RTL verification process?

BD as a solution was bad at both the RTL level and in implementation. Or implementation was near impossible due to wrong architectural decisions. Perhaps those issues overshadow the basic problem, which was that BD was trying to take Intel head on.

Intel has the resources to implement an RTL design at a quality nobody can approach. Fortunately, the design of Zen seems so different, e.g. non-AVX2, that it might get a solid competitive edge in some server segments and high-end workstation laptops: segments where all the SMT capability will be useful, and where there is profit to be made.

I am not an engineer, but it doesn't take much knowledge to know AMD could easily use 2000 extra man-years on implementation to reach higher frequencies at acceptable efficiency. These are issues that hit high-end desktop users especially.

KTE · Dec 4, 2016

krumme said:
How difficult is it to design a good cpu at RTL level these days vs post RTL verification process?

BD as a solution was bad at both the RTL level and in implementation. Or implementation was near impossible due to wrong architectural decisions. Perhaps those issues overshadow the basic problem, which was that BD was trying to take Intel head on.

Intel has the resources to implement an RTL design at a quality nobody can approach. Fortunately, the design of Zen seems so different, e.g. non-AVX2, that it might get a solid competitive edge in some server segments and high-end workstation laptops: segments where all the SMT capability will be useful, and where there is profit to be made.

I am not an engineer, but it doesn't take much knowledge to know AMD could easily use 2000 extra man-years on implementation to reach higher frequencies at acceptable efficiency. These are issues that hit high-end desktop users especially.

Extremely difficult when you have tight budgets that only allow the very basic of research, resources, manpower and testing.

It's all about the resources.

itsmydamnation · Dec 4, 2016

LTC8K6 said:
4C/8T Zen should be 65W vs 4790K at 88W.

Except that the competition is actually Kaby Lake.

If Zen is not quite up to Haswell, then it is probably Ivy Bridge level, and AMD is taking too long to release it, imo.

I think we are looking at Ivy Bridge vs Kaby Lake in terms of performance, but at similar tdp.

Kaby Lake, so for all intents and purposes Haswell.

Which 4C/8T: the APU or a cut-down Zeppelin? The TDP can be whatever they want; just make it configurable.

There is nothing architecturally that suggests IVB performance. It has Broadwell/Skylake-sized structures, it has a much improved cache system over CON cores (that Blender demo is one big cache test: you can tell because SMT gives such a big perf increase, and it's 128-bit vectors), and it follows on from the CAT/CON core predictors.* As I said before, the load/store system is the hardest to gauge, but that was an area that saw continued improvement in Steamroller and Excavator and will likely continue to improve with Zen.**

I don't see why people keep saying IVB for Zen IPC: it's wider, has more resources, will have faster memory, more modern predictors, etc. I don't see AMD beating Skylake for ILP, but I also don't see it being at IVB levels; there would have to be some big glass jaws in the architecture. With better cache and wider execution resources, the two big ones are already gone, and the uop cache helps with the third (mispredict penalty).

*Agner on AMD predictors:

My tests indicate that complex repetitive patterns are predicted well after a certain learning period. There appears to be no sharp limit to the length of branch patterns that can be predicted, and even very long patterns can be predicted. There seems to be no loop counter, and nested loops are not predicted particularly well. Indirect branches are predicted well. The prediction success rate is somewhat higher in the Steamroller than in the previous models.

For Intel:

The processor is able to predict very long repetitive jump patterns with few or no mispredictions. I found no specific limit to the length of jump patterns that could be predicted. Loops are successfully predicted up to a count of 32 or a little more. Nested loops and branches inside loops are predicted reasonably well.
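To make the "loop counter" remark concrete: a plain 2-bit saturating counter mispredicts a loop's exit branch on every pass, while a dedicated loop predictor that remembers the trip count can call the exit correctly once it has seen the loop. This is a toy model, not either vendor's actual predictor (real ones also use branch history), and the 32-iteration limit is a hypothetical parameter borrowed from Agner's Intel observation:

```python
from typing import List


def loop_pattern(trip_count: int, repeats: int) -> List[bool]:
    """Outcomes of a loop's back-edge branch: taken every iteration except the exit."""
    return ([True] * (trip_count - 1) + [False]) * repeats


def two_bit_mispredicts(outcomes: List[bool]) -> int:
    """Mispredictions of a single 2-bit saturating counter (no history, no loop counter)."""
    state = 3  # 0-1 predict not-taken, 2-3 predict taken; start at strongly taken
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def loop_counter_mispredicts(outcomes: List[bool], max_trip: int = 32) -> int:
    """Mispredictions of a toy loop predictor that remembers the last trip count.

    max_trip is a hypothetical capacity limit; longer loops fall back to 'always taken'.
    """
    learned = None  # trip count seen on the previous pass, if any
    count = 0
    misses = 0
    for taken in outcomes:
        count += 1
        if learned is not None and learned <= max_trip:
            predict_taken = count < learned
        else:
            predict_taken = True
        if predict_taken != taken:
            misses += 1
        if not taken:  # loop exited: record its trip count and reset
            learned = count
            count = 0
    return misses


for trips in (10, 40):
    outcomes = loop_pattern(trips, repeats=5)
    print(trips, two_bit_mispredicts(outcomes), loop_counter_mispredicts(outcomes))
```

For a 10-iteration loop run 5 times, the 2-bit counter misses all 5 exits while the loop predictor misses only the first; at 40 iterations the hypothetical 32-iteration limit kicks in and both miss every exit.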

So even though it's soft wording, they are pretty equal, and you would expect AMD to continue to improve them with Zen.

** From Agner's data they look pretty similar, with AMD having more edge cases. The big thing that hurts AMD is when store forwarding fails: it's 25 cycles vs up to 11 for Intel, which looks very much like a cache latency issue, and that should be much better for Zen compared to CON.
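Rough arithmetic for why that penalty matters: the extra cycles scale with how often forwarding fails. The failure rate and load count below are made-up inputs for illustration; 25 and 11 cycles are the figures quoted above (11 being Intel's upper bound), and overlap with other work is ignored:

```python
def forwarding_stall_cycles(loads: int, fail_rate: float, penalty_cycles: int) -> float:
    """Expected extra cycles from failed store-to-load forwarding (no overlap assumed)."""
    return loads * fail_rate * penalty_cycles


# Hypothetical workload: one million loads, 0.1% of which hit a failed forward.
for name, penalty in (("AMD (CON cores)", 25), ("Intel (worst case)", 11)):
    print(name, forwarding_stall_cycles(1_000_000, 0.001, penalty))
```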

edit: repeat after me, AVX2 and AVX can both be 128 or 256 bits! Zen fully supports AVX just like SR and EX. Zen even supports 256-bit AVX, AVX2 and FMA (again, different from both AVX and AVX2) as one uop, even though it needs to be issued to the execution units twice (once for the lower half, once for the upper half).

Zen has only 128-bit-wide datapaths and execution units, not 256-bit like Haswell and later; it has nothing to do with AVX2 specifically.
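A toy model of what "one uop, issued twice" means: a 256-bit add over eight 32-bit floats is split into a lower and an upper 128-bit half, each taking one pass through a 128-bit unit. Only lane widths are modeled here; real scheduling and port assignment are far more involved:

```python
from typing import List, Tuple

LANE_BITS = 32  # single-precision float lanes


def vector_add(a: List[float], b: List[float], unit_bits: int = 128) -> Tuple[List[float], int]:
    """Add two vectors on an execution unit unit_bits wide.

    Returns the result and the number of passes through the unit; a 256-bit op on a
    128-bit unit takes two passes (lower half, then upper half).
    """
    if len(a) != len(b):
        raise ValueError("operand widths differ")
    lanes_per_pass = unit_bits // LANE_BITS
    result: List[float] = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


ymm_a, ymm_b = [1.0] * 8, [2.0] * 8  # one 256-bit operand pair
print(vector_add(ymm_a, ymm_b))                 # 128-bit unit: two passes
print(vector_add(ymm_a, ymm_b, unit_bits=256))  # 256-bit unit: one pass
```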

LTC8K6 · Dec 4, 2016

KTE said:
That really depends on the frequencies they require to be competitive.

In the past, it has been FAR easier to double lower clocked cores than increase clocks with half the core count at a given TDP.

This depends more on their process implementation.

The question is, what clocks does 4/8 Zen need to be competitive with 4790K 4-4.4GHz?

That there is a 4C/8T Zen at all to test at this point is surprising to me. I was expecting Zen to be 8C only for a little while, at least.
If there are already 4C Zen chips being tested, then I am puzzled by the lack of a release.

LTC8K6 · Dec 4, 2016

itsmydamnation said:
I don't see why people keep saying IVB for Zen IPC: it's wider, has more resources, will have faster memory, more modern predictors, etc. I don't see AMD beating Skylake for ILP, but I also don't see it being at IVB levels; there would have to be some big glass jaws in the architecture. With better cache and wider execution resources, the two big ones are already gone, and the uop cache helps with the third (mispredict penalty).

Well, the leak said close to Haswell, so I said Ivy Bridge since that's close to Haswell.
There was nothing more in it than that.

Actually, most people have been saying Sandy Bridge performance for Zen, based on 40% over Excavator, haven't they?

A switch to Ivy Bridge performance is a bump for Zen.

KTE · Dec 4, 2016

I am more interested in server benchmarks for Zen than anything else.

Virtualized cloud and data analytics are huge, and becoming even bigger now. Every corp is chasing after these right now.

Many more cloud datacentres are currently being invested in, and they will need viable all-rounder CPUs.

AMD needs to pitch Zen strongly to the major OEMs who will be providing these server solutions en-masse to the datacentres.

And it's SPEC that they will look to first.

sirmo · Dec 4, 2016

KTE said:
Extremely difficult when you have tight budgets that only allow the very basic of research, resources, manpower and testing.

It's all about the resources.

Some of the best in the business worked on this thing (Jim Keller and the team he assembled, the guys who like a challenge). Also, on the process side it's not just AMD who does high-frequency research. Tons of semiconductor companies use these nodes as well, and a lot of them deal with high-speed interconnect stuff and microwave. A lot of IBM research went into GlobalFoundries and Samsung Venture as well. These aren't small players.

I do think that Intel has an edge since they own their own fabs which allows them further process level customization for a given product. But I wouldn't count AMD out on this one.

I think Zen will likely have Haswell or better IPC.. in some cases better than Broadwell-E (workload dependent, like the blender demo showed). But it will probably not be able to reach the OC clocks we're getting on Intel side. It could however have more perf/watt on stock clocks. We shall see.. the day is approaching.

Dresdenboy · Dec 4, 2016

LTC8K6 said:
That there is a 4C/8T Zen at all to test at this point is surprising to me. I was expecting Zen to be 8C only for a little while, at least.
If there are already 4C Zen chips being tested, then I am puzzled by the lack of a release.

It seems you missed the 4C/8T ES on Zauba months ago.

krumme · Dec 4, 2016

It's pretty smart that they can use the same die for a 32C 180W server and an 8C 95W consumer part, but it's better for us enthusiasts if it's not the same dies qualifying for 32C servers and the highest-perf 8C desktops... lol. Is the binning of dies toward the "good side" different here, or are we mostly out of luck?

KTE · Dec 4, 2016

sirmo said:
Some of the best in the business worked on this thing (Jim Keller and the team he assembled, the guys who like a challenge). Also, on the process side it's not just AMD who does high-frequency research. Tons of semiconductor companies use these nodes as well, and a lot of them deal with high-speed interconnect stuff and microwave. A lot of IBM research went into GlobalFoundries and Samsung Venture as well. These aren't small players.

I do think that Intel has an edge since they own their own fabs which allows them further process level customization for a given product. But I wouldn't count AMD out on this one.

I think Zen will likely have Haswell or better IPC.. in some cases better than Broadwell-E (workload dependent, like the blender demo showed). But it will probably not be able to reach the OC clocks we're getting on Intel side. It could however have more perf/watt on stock clocks. We shall see.. the day is approaching.

I work for one of the firms you mentioned. The reality in these corps is that decisions aren't made by superstar engineers but by their management and the execs, most of whom are non-technical, businessy snake-oil salesmen.

Excellent decisions easily worth billions in profit get canned. Viable, cheaper solutions get overlooked. In-house excellence gets buried. Products far ahead of their time don't even get a chance. All without explanation to the superstar engineers.

That's the reality of international corporations ESPECIALLY when things are not going well. Corporate politics is absolutely crazy.

Engineering/Science != Business/Politics.

Trust me, it's not the engineering I doubt.

F-Rex · Dec 4, 2016

KTE said:
I work for one of the firms you mentioned. The reality in these corps is that decisions aren't made by superstar engineers but by their management and the execs, most of whom are non-technical, businessy snake-oil salesmen.

Excellent decisions easily worth billions in profit get canned. Viable, cheaper solutions get overlooked. In-house excellence gets buried. Products far ahead of their time don't even get a chance. All without explanation to the superstar engineers.

That's the reality of international corporations ESPECIALLY when things are not going well. Corporate politics is absolutely crazy.

Engineering/Science != Business/Politics.

Trust me, it's not the engineering I doubt.

Let me remind you this quote
“It is the first time in a very long time that we engineers have been given the total freedom to build a processor from scratch and do the best we can do,” said Suzanne Plummer, a director of design engineering at AMD and also a veteran Austin chip engineer, who heads development of a “Zen”-based processor, in an interview with MyStatesman.

When you give total freedom to a team of experienced engineers, the result can't disappoint you.

I'm confident about Zen, because if it were a Bulldozer CPU all over again she would not have said that.

Fanatical Meat · Dec 4, 2016

^^I want to agree, but I've learned that no matter how remote the chance of failure, AMD will find a way.
I want Zen to be great, I want to see Intel be less lazy again. All I can do is wait and hope.

bjt2 · Dec 4, 2016

KTE said:
Intel's process is typically FAR better than whatever AMD uses for the respective pMOS/nMOS values...

Did Intel just drastically change their architecture, hint much wider and more powerful?

Do you think IPC increases + SMT are free?

That happened from Core to Nehalem. Remember that?

Do you realize same chip, one running default and one tuned to extract higher IPC on a specific code = latter burns more power?

So you've designed nanotech ICs and you say there is no scientific chance possible that can make Zen clock lower or equal to Excavator at 95W.

Strong statements.

So if Zen turns out to clock lower than Exc at 95W, you seriously have no clue about processor design or nanotechnology.

I've already explained to you your compares and deductions are logically invalid but you just keep repeating the same words. You are missing most of the crucial factors to create crazy hype around Zen.

Your 4.0/4.3GHz 8/16 belief is... we'll see by how much you're off.

I have designed CMOS inverters, calculating the optimal W and L to minimize delay for a given load, and other circuits too... These were exercises... Also PSPICE simulations...

Zen has a similar number of pipeline stages to BD, probably with the same or lower FO4. They simplified the INT scheduler: from a 4-queue switch to 6 single queues. Why did they do that? Would they do that and end up with a higher FO4? Are they crazy? I don't think so... If the FO4 is the same or similar, the FMAX should be the same as on 28nm. But 14nm has higher transconductance, lower leakage and lower parasitic capacitance. And AMD declared the same consumption as an Excavator core at the same frequency. Given the known data, I think that is very plausible. So FMAX should be in the same ballpark as XV (4.3GHz). You don't buy this only because Intel does not reach these clocks with 8 cores... But AMD managed a 4.1/4.3GHz core on 28nm bulk at 95W... I have an i7 3820 (3.6/3.9GHz) in my PC at work, with a 130W TDP. On 32nm bulk, Intel couldn't reach even 4GHz in turbo... Why? Is TSMC 28nm better than Intel's 32nm HKMG? No. Excavator has a lower FO4, and so does Zen. If you don't get this, you will never agree with me...
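The FO4 argument in rough numbers: cycle time is about (FO4 delays per pipeline stage) × (delay of one FO4 inverter), so keeping the FO4 per stage fixed while moving to a process with faster transistors raises Fmax. Both inputs below are illustrative, not measured values for any AMD or Intel process, and the sketch assumes the FO4 count already covers latch overhead and clock skew:

```python
def fmax_ghz(fo4_per_stage: float, fo4_delay_ps: float) -> float:
    """Approximate max clock from stage depth (in FO4) and the FO4 delay in picoseconds.

    Assumes fo4_per_stage already includes latch overhead and clock skew.
    """
    cycle_time_ps = fo4_per_stage * fo4_delay_ps
    return 1000.0 / cycle_time_ps  # 1000 ps per ns, so 1/ns gives GHz


# Same hypothetical design depth on a slower and a faster (hypothetical) process.
print(fmax_ghz(25, 10.0), fmax_ghz(25, 8.0))
```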

For 8/16, probably only the FMAX is in the 4-4.3GHz range... The FX 8370E on 32nm SOI is 95W and 3.3/4.3GHz turbo max. I expect at least this for the 8C Zen (3.3 base and 4.3 turbo), maybe a higher base and slightly higher max turbo. We are talking about 2 full nodes... I think the FMAX could be higher than 4.3... The base frequency depends on the energy-saving techniques. 14nm FF has very low leakage. It's estimated that at any instant, 85-90% of the transistors are off and draw only leakage. If the leakage is low, the power budget can go toward raising the base clock... I think at least 3.5/4.5 for the 95W part and 4.0/4.6 for the 125W part, if that is ever produced...
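The "low leakage frees budget for the base clock" point as a sketch: subtract leakage from the TDP and spend what remains on dynamic power. The f³ dynamic-power curve and the constant K are simplifying assumptions chosen for illustration, not Zen data:

```python
def base_clock_ghz(tdp_w: float, leakage_w: float, k_w_per_ghz3: float) -> float:
    """Highest all-core clock whose dynamic power fits in the budget left after leakage.

    Assumes dynamic power = k * f**3 (voltage tracking frequency); k is hypothetical.
    """
    dynamic_budget = tdp_w - leakage_w
    if dynamic_budget <= 0:
        raise ValueError("leakage alone exceeds the TDP")
    return (dynamic_budget / k_w_per_ghz3) ** (1.0 / 3.0)


K = 1.8  # hypothetical W/GHz^3 for a whole 8-core die
for leak in (30.0, 15.0):
    print(f"leakage {leak:.0f}W -> base ~{base_clock_ghz(95.0, leak, K):.2f} GHz")
```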

bjt2 · Dec 4, 2016

itsmydamnation said:
Kaby Lake, so for all intents and purposes Haswell.

Which 4C/8T: the APU or a cut-down Zeppelin? The TDP can be whatever they want; just make it configurable.

There is nothing architecturally that suggests IVB performance. It has Broadwell/Skylake-sized structures, it has a much improved cache system over CON cores (that Blender demo is one big cache test: you can tell because SMT gives such a big perf increase, and it's 128-bit vectors), and it follows on from the CAT/CON core predictors.* As I said before, the load/store system is the hardest to gauge, but that was an area that saw continued improvement in Steamroller and Excavator and will likely continue to improve with Zen.**

I don't see why people keep saying IVB for Zen IPC: it's wider, has more resources, will have faster memory, more modern predictors, etc. I don't see AMD beating Skylake for ILP, but I also don't see it being at IVB levels; there would have to be some big glass jaws in the architecture. With better cache and wider execution resources, the two big ones are already gone, and the uop cache helps with the third (mispredict penalty).

*Agner on AMD predictors:

For Intel:

So even though it's soft wording, they are pretty equal, and you would expect AMD to continue to improve them with Zen.

** From Agner's data they look pretty similar, with AMD having more edge cases. The big thing that hurts AMD is when store forwarding fails: it's 25 cycles vs up to 11 for Intel, which looks very much like a cache latency issue, and that should be much better for Zen compared to CON.

edit: repeat after me, AVX2 and AVX can both be 128 or 256 bits! Zen fully supports AVX just like SR and EX. Zen even supports 256-bit AVX, AVX2 and FMA (again, different from both AVX and AVX2) as one uop, even though it needs to be issued to the execution units twice (once for the lower half, once for the upper half).

Zen has only 128-bit-wide datapaths and execution units, not 256-bit like Haswell and later; it has nothing to do with AVX2 specifically.

Zen can decode 4, issue 6, dispatch 10 and retire 8 uops... That's better than IVB, HSW and BDW, and on par with or slightly better than Skylake (except decode)... The only unknowns are cache and branch prediction efficiency...
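As a sanity check on the width comparison: sustained uop throughput can't exceed the narrowest stage, which with these numbers is the 4-wide decoder. On a uop-cache hit the decoders are bypassed, so the next-narrowest stage sets the ceiling. The widths are the ones quoted in the post; stalls, port conflicts and cache misses are all ignored in this sketch:

```python
from typing import Dict, Tuple


def bottleneck(stage_widths: Dict[str, int]) -> Tuple[str, int]:
    """Return the narrowest pipeline stage and its width (uops per cycle)."""
    name = min(stage_widths, key=stage_widths.__getitem__)
    return name, stage_widths[name]


zen = {"decode": 4, "issue": 6, "dispatch": 10, "retire": 8}
print(bottleneck(zen))

# On a uop-cache hit the decoders are bypassed (simplification).
uop_cache_hit = {k: v for k, v in zen.items() if k != "decode"}
print(bottleneck(uop_cache_hit))
```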
The shared int and FP pipelines on Intel are far worse than separate int and FP schedulers. Intel performs well only because it has many memory ports, its FP units are 256-bit and it can do up to 2x256-bit FMAC. On 128-bit code without FMACs, Zen has more potential throughput...
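Peak per-cycle FP throughput is simple arithmetic: pipes × 32-bit lanes per pipe × (2 if each op is an FMA, else 1). The pipe counts below are illustrative assumptions (two 256-bit FMA pipes for Intel as stated in the post, and four 128-bit pipes for Zen doing no FMA in this case); peak figures say nothing about sustained throughput:

```python
def peak_flops_per_cycle(pipes: int, pipe_bits: int, fma: bool, elem_bits: int = 32) -> int:
    """Peak FP operations per cycle per core; an FMA counts as two operations."""
    lanes = pipe_bits // elem_bits
    return pipes * lanes * (2 if fma else 1)


# Hypothetical configurations for illustration only.
print("Intel, 2x256-bit FMA:  ", peak_flops_per_cycle(2, 256, fma=True))
print("Intel, 128-bit, no FMA:", peak_flops_per_cycle(2, 128, fma=False))
print("Zen, 4x128-bit, no FMA:", peak_flops_per_cycle(4, 128, fma=False))
```

Under these assumptions the higher peak goes to Zen on 128-bit non-FMA code and to Intel on 256-bit FMA code.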

Arachnotronic · Dec 4, 2016

bjt2 said:
The shared int and FP pipelines on Intel are far worse than separate int and FP schedulers.

Proof?

Intel performs well only because it has many memory ports, its FP units are 256-bit and it can do up to 2x256-bit FMAC. On 128-bit code without FMACs, Zen has more potential throughput...

Yeah, that's the ONLY reason Intel CPUs perform well. Yeesh.

cdimauro · Dec 4, 2016

bjt2 said:
Zen can decode 4, issue 6, dispatch 10 and retire 8 uops... That's better than IVB, HSW and BDW, and on par with or slightly better than Skylake (except decode)... The only unknowns are cache and branch prediction efficiency...
The shared int and FP pipelines on Intel are far worse than separate int and FP schedulers. Intel performs well only because it has many memory ports, its FP units are 256-bit and it can do up to 2x256-bit FMAC. On 128-bit code without FMACs, Zen has more potential throughput...

Yes, it's so much worse that in Zen's best case, Blender / SSE / 128-bit, Intel takes only 2% more time.

itsmydamnation · Dec 4, 2016

cdimauro said:
Yes, it's so much worse that in Zen's best case, Blender / SSE / 128-bit, Intel takes only 2% more time.

And you would expect both cores to be load/store limited in that workload, which is why I kind of think all this port counting gets a little silly.

bjt2 · Dec 4, 2016

Arachnotronic said:
Proof?

Yeah, that's the ONLY reason Intel CPUs perform well. Yeesh.

Queueing theory... Also studied at college... But also intuitively... 8 pipelines/servers are better than 4... There's no way a unified scheduler can make up for double the servers...
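For reference, the standard M/M/c model makes this comparison concrete. It assumes identical servers and Poisson arrivals, which real execution ports are not, and it separates two different effects: adding servers, and splitting one shared queue into one queue per server. A minimal sketch using the Erlang C formula:

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival has to wait in an M/M/c queue (Erlang C).

    offered_load = arrival_rate / service_rate; requires offered_load < servers.
    """
    rho = offered_load / servers
    tail = offered_load ** servers / factorial(servers) / (1.0 - rho)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    return tail / (head + tail)


def pooled_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time in queue when all servers share one queue (M/M/c)."""
    if arrival_rate >= servers * service_rate:
        raise ValueError("queue is unstable")
    load = arrival_rate / service_rate
    return erlang_c(servers, load) / (servers * service_rate - arrival_rate)


def split_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time in queue when arrivals are divided evenly over per-server M/M/1 queues."""
    lam = arrival_rate / servers
    if lam >= service_rate:
        raise ValueError("queue is unstable")
    return lam / (service_rate * (service_rate - lam))


lam, mu = 3.0, 1.0  # hypothetical arrival and service rates
for c in (4, 8):
    print(c, "servers: shared", round(pooled_wait(lam, mu, c), 3),
          "per-server", round(split_wait(lam, mu, c), 3))
```

With these inputs, going from 4 to 8 servers cuts the mean wait sharply, and at either server count the shared queue waits less than the per-server queues.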

bjt2 · Dec 4, 2016

cdimauro said:
Yes, it's so much worse that in Zen's best case, Blender / SSE / 128-bit, Intel takes only 2% more time.

Evidently, there are other things Intel does better that we don't know about...

cdimauro · Dec 4, 2016

itsmydamnation said:
And you would expect both cores to be load/store limited in that workload, which is why I kind of think all this port counting gets a little silly.

Well, the scene AMD used for the Blender test doesn't seem complex enough to stress the L/S unit much. On the contrary: it's quite simple.

But unfortunately AMD hasn't released the data file, so no one can run it under a profiler to extract such data.

bjt2 said:
Evidently, there are other things Intel does better that we don't know about...

Well, the processors are there with their optimization manuals. I don't know what kind of "secret sauce" is still missing...

DrMrLordX · Dec 4, 2016

Fanatical Meat said:
I want to see Intel be less lazy again.

Intel isn't being lazy. They're directing resources to other markets and throwing bones to the "enthusiast" PC crowd. I know what you mean, but let's not think that they're merely slothful.

Things aren't the way they were in the Netburst days, nosiree Bob.

KTE · Dec 4, 2016

bjt2 said:
Zen can decode 4, issue 6, dispatch 10 and retire 8 uops... That's better than IVB, HSW and BDW, and on par with or slightly better than Skylake (except decode)... The only unknowns are cache and branch prediction efficiency...
The shared int and FP pipelines on Intel are far worse than separate int and FP schedulers. Intel performs well only because it has many memory ports, its FP units are 256-bit and it can do up to 2x256-bit FMAC. On 128-bit code without FMACs, Zen has more potential throughput...

In science, a hypothesis can be pretty absurd without empirical test results.

Meaning, we will have to test the chip TO THEN see if it has any weak points, Achilles Heel, glass jaws. In the past 11 years, on paper, things have nearly always looked FAR better than they turned out in reality.

Only one major weakness is needed to make this chip a failure. That doesn't have to be uarch. Cache and branch prediction are make or break but it can also be the power, clocks or yields.
