AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Page 22
Status
Not open for further replies.
May 11, 2008
20,117
1,307
126
That assumption is incorrect.
AMD has stated a 40% IPC improvement over an Excavator core from day one, and according to AMD a single module is two cores on every 15h design.

With that 40% improvement, does AMD mean integer performance or floating point performance?
Because with multithreaded software relying heavily on the floating point unit, would that not be easy to achieve when compared to the current CMT design with only one FPU per module (2 cores)?
 

inf64

Diamond Member
Mar 11, 2011
3,777
4,251
136
With that 40% improvement, does AMD mean integer performance or floating point performance?
Because with multithreaded software relying heavily on the floating point unit, would that not be easy to achieve when compared to the current CMT design with only one FPU per module (2 cores)?

Zen has 2x the ALU and 2x the FP resources per core vs. the Excavator design. This should mean that one Zen core running 2 threads on its FP coprocessor should roughly match one whole EX module running 2 threads on its own FP coprocessor. There is also the factor of lower FP instruction latencies that Zen supposedly has vs. EX, so this should also play a big role in the FP performance of Zen.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
With that 40% improvement, does AMD mean integer performance or floating point performance?
Because with multithreaded software relying heavily on the floating point unit, would that not be easy to achieve when compared to the current CMT design with only one FPU per module (2 cores)?

In the past (with PD, SR, XV) the quoted IPC improvement has been very close to the average figure, which naturally includes both integer and floating point workloads. For Steamroller the official average figure was 10% and for Excavator 5%. Both of them can achieve higher and lower numbers depending on the workload, but these figures are extremely close to the average.
 

Drazick

Member
May 27, 2009
53
70
91
No it doesn't; only a handful of consumer apps benefit from quad channel memory. I doubt they're even in the double digits (the more popular ones), and even then high speed DDR4 in dual channel is much preferred because of stability and the fact that there aren't too many CPUs which can handle 4GHz quad channel memory, or make better use of it.

I'm not so sure about your statement.
I have no experience with games, but in signal / image processing a lot of tasks (for instance, matrix / image convolution) are memory bound.
Quad channel memory helps a lot there (actually, any task a GPU is good at that you want the CPU to do will benefit from quad channel).
 
Reactions: Sweepr

inf64

Diamond Member
Mar 11, 2011
3,777
4,251
136
I think that quad channel on 8C Intel desktop parts is massive overkill in like 95% of consumer workloads (counting the well-threaded ones).
 

KTE

Senior member
May 26, 2016
478
130
76
In the past (with PD, SR, XV) the quoted IPC improvement has been very close to the average figure, which naturally includes both integer and floating point workloads. For Steamroller the official average figure was 10% and for Excavator 5%. Both of them can achieve higher and lower numbers depending on the workload, but these figures are extremely close to the average.
And for Phenom? For Bulldozer?

They're the new uarchs of relevance.

Sent from HTC 10
(Opinions are own)
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
And for Phenom? For Bulldozer?

They're the new uarchs of relevance.

Sent from HTC 10
(Opinions are own)

No idea.
What difference does it make to the way AMD evaluates the IPC if it is a brand new µarch or an evolution of an existing one?
 

KTE

Senior member
May 26, 2016
478
130
76
No idea.
What difference does it make to the way AMD evaluates the IPC if it is a brand new µarch or an evolution of an existing one?
Performance comparisons from AMD with ANY major uarch+process change have always turned out fallacious. Just when they've been boasting about it for 5 years in the making and just when it counted most. Just when the change was so drastic. From memory:

Bulldozer - they said same IPC but much higher frequencies.

Neither was true. And Thuban was far faster, IPC wise.

Barcelona - they compared a simulated 2.6GHz model in SPEC rate and said AMD is 42% faster than the fastest Intel Xeon.

9 months on they could only launch with 2GHz... while Clovertown 3.16GHz 45nm was now shipping, along with Tigerton. And Penryn slapped Agena around in every department. Don't even mention the TLB bug, dead clocking or the paper launch.

Oh, that 40% figure again

Sent from HTC 10
(Opinions are own)
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
Bulldozer - they said same IPC but much higher frequencies.
They never said anything about IPC when it came to Bulldozer; in fact they were asked point blank multiple times and always avoided answering that question. We all just assumed it was going to be better than the previous gen (Phenom II), because that's been their historical norm.

I even had personal sparring matches with JF-AMD from AMD about it, and he danced around the issue never answering it.

I remember being distinctly shocked about the IPC regression when the benchmarks came out, and I followed the Bulldozer rumor mill closer than I follow Zen.

I didn't expect Bulldozer to have a substantial IPC gain but I was surprised it was actually worse than Phenom II.
 
May 11, 2008
20,117
1,307
126
Zen has 2x the ALU and 2x the FP resources per core vs. the Excavator design. This should mean that one Zen core running 2 threads on its FP coprocessor should roughly match one whole EX module running 2 threads on its own FP coprocessor. There is also the factor of lower FP instruction latencies that Zen supposedly has vs. EX, so this should also play a big role in the FP performance of Zen.

Do you not mean module for Excavator? Was the module not always counted as 2 cores by AMD?
That is what makes it confusing. Because when comparing 2 Zen cores to one Excavator module, Zen has double the floating point performance on top of all the other improvements. Ignoring the SMT capability of Zen for a moment.



In the past (with PD, SR, XV) the quoted IPC improvement has been very close to the average figure, which naturally includes both integer and floating point workloads. For Steamroller the official average figure was 10% and for Excavator 5%. Both of them can achieve higher and lower numbers depending on the workload, but these figures are extremely close to the average.

I see. I read your post again. It mentions a per core comparison.



Very interesting.

 

dogen1

Senior member
Oct 14, 2014
739
40
91
With that 40% improvement, does AMD mean integer performance or floating point performance?
Because with multithreaded software relying heavily on the floating point unit, would that not be easy to achieve when compared to the current CMT design with only one FPU per module (2 cores)?

The 40% is probably some kind of average. 40% better floating point would be a disappointment.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Still putting my marker on under Haswell IPC on average. Perf/W can't be too bad if OEMs are investing in the platform. Biggest question for me as an enthusiast is how much headroom.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Performance comparisons from AMD with ANY major uarch+process change have always turned out fallacious. Just when they've been boasting about it for 5 years in the making and just when it counted most. Just when the change was so drastic. From memory:

Bulldozer - they said same IPC but much higher frequencies.

Neither was true. And Thuban was far faster, IPC wise.

Barcelona - they compared a simulated 2.6GHz model in SPEC rate and said AMD is 42% faster than the fastest Intel Xeon.

9 months on they could only launch with 2GHz... while Clovertown 3.16GHz 45nm was now shipping, along with Tigerton. And Penryn slapped Agena around in every department. Don't even mention the TLB bug, dead clocking or the paper launch.

Oh, that 40% figure again.
Let's instead try an objective analysis, while there are still some facts missing for a final result.

With "ANY" do you also mean 486->K5, K5->K6->K6-2->K6-3, K6-3->K7, etc.?

Regarding BD we have "viral" information spread by JF-AMD, which he heard from engineers who probably ran recompiled SPEC code, that IPC would be higher (vs. core or module, which has been called "core" by the engineers -> see the related lawsuit). JF also said that perf. was projected to be "up to" 50% higher (which became "up to" 35% when Seifert gave that number later). With "up to" there are always options to find such code.

AMD officially (not in forums by a non-engineer) said about IPC:
"without significant loss on serial single-threaded workload components", which could mean IPC(BD) < IPC(K10) by 0 to 5% on average.



Regarding Barcelona:
This is probably a plans vs. process reality thing. When they looked at results at simulated or even "OC'ed" 2.6GHz (ES could probably run that fast, just at too high power/voltage for short times), there were 2.66GHz Clovertowns reviewed just weeks before Allen's interview. Aside from that model, the next slower one had 1.86GHz only. What did they use?
http://ark.intel.com/de/products/codename/23349/Clovertown#@All

Do you know which frequency the K10 and Clovertown used for the comparison had? Finally, when Barcelona launched at max. 2GHz in September 2007, Intel already had 3GHz Clovertowns. That would mean a performance projection delta of -32% (compared to 2.6GHz K10 vs. 2.66GHz CT).
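
For reference, the -32% projection delta can be reproduced with a few lines. This is only a rough sketch that assumes performance scales linearly with clock speed for both chips (a simplification), and the function name is made up for illustration:

```python
def projection_delta(projected_ratio: float, launch_ratio: float) -> float:
    """Relative change of the AMD/Intel clock ratio between projection and launch.

    Assumption: performance scales linearly with clock speed, which is a
    simplification that ignores memory and platform effects.
    """
    return launch_ratio / projected_ratio - 1.0


# Clock ratio assumed for the projection: 2.6GHz K10 vs. 2.66GHz Clovertown
projected = 2.6 / 2.66
# Clock ratio at launch: 2.0GHz Barcelona vs. 3.0GHz Clovertown
launch = 2.0 / 3.0

delta = projection_delta(projected, launch)
print(f"{delta:+.1%}")  # -31.8%
```

The result rounds to the -32% quoted in the post.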

Yes, this 40% figure never can be true.
"(On) average, we are 40 percent-plus better than the competition in performance and a little bit better in power, and the combination is 1.5X in terms of power/performance," said Pat Gelsinger
http://www.cnet.com/news/a-dazed-intel-shifts-into-comeback-mode/


Very interesting.
It's just a puzzle game. As you see, we still have to wait for Hot Chips to fill in some speculative holes.
 

Abwx

Lifer
Apr 2, 2011
11,207
3,920
136
Do you know which frequency the K10 and Clovertown used for the comparison had? Finally, when Barcelona launched at max. 2GHz in September 2007, Intel already had 3GHz Clovertowns.

Yes, this 40% figure never can be true.

Intel's own marketing material seems to admit that a Xeon E5472 (Edit: 3GHz..) with 800MHz memory is just as fast as AMD's quad-core at 2GHz. AMD's 2.5GHz model will surely take the lead in LS-DYNA. Looking at the Fluent and LS-DYNA benchmarks it appears that AMD will remain very competitive in the HPC market.



http://www.anandtech.com/show/2386/11
 

KTE

Senior member
May 26, 2016
478
130
76
With "ANY" do you also mean 486->K5, K5->K6->K6-2->K6-3, K6-3->K7, etc.?
Yea let's talk about stuff no one remembers

I think it's very clear what I mean by any. But just to be pedantic, any recent archs AKA since K8 or since the downfall of AMD. AMD was different before their financial struggles deepened.

AMD officially (not in forums by a non-engineer) said about IPC:
"without significant loss on serial single-threaded workload components", which could mean IPC(BD) < IPC(K10) by 0 to 5% on average.
Also see the reviews as to what AMD told reviewers too. Look at Anand himself, similar IPC:

"AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott."

Johan said the same. And the frequency target of 30% higher... That would mean a 4.3GHz launch base.

Regarding Barcelona:
This is probably a plans vs. process reality thing. When they looked at results at simulated or even "OC'ed" 2.6GHz (ES could probably run that fast, just at too high power/voltage for short times), there were 2.66GHz Clovertowns reviewed just weeks before Allen's interview. Aside from that model, the next slower one had 1.86GHz only. What did they use?
http://ark.intel.com/de/products/codename/23349/Clovertown#@All

Do you know which frequency the K10 and Clovertown used for the comparison had? Finally, when Barcelona launched at max. 2GHz in September 2007, Intel already had 3GHz Clovertowns. That would mean a performance projection delta of -32% (compared to 2.6GHz K10 vs. 2.66GHz CT).

Yes, this 40% figure never can be true.
It wasn't true at all. Neither Barcelona nor Agena was anywhere near its counterparts at launch. Not in performance, not in power, not in clocks and no way in underclocking/overclocking.

Oh, and 3.16GHz Penryn Xeon was shipping by then.

Even then, they only used rate to show best case for AMD... Throughput. Why not use Int? It was marketing gibberish. Why compare a year before launch in a shady best case benchmark with completely unrealistic frequencies, knowing well that the competition would have something much newer and faster by launch? Why so when you are having major clocks/power problems? It was all highly misleading.

Now I'm a car guy, if you know me. Namely supercars. So it's a bit like Nissan marketing touting that their yet-unreleased GTR, due in Q4 2016 with 1000hp, is 42% faster than Bugatti's fastest Chiron. That GTR then doesn't launch until 12 months on. When it does, it makes 500hp in limited, overheating scenarios, has no mod/tune headroom, runs 30% slower than boasted about, and costs as much as a Chiron. It turns out Nissan was talking about top speed around Tom Cruise's back street, and by then Bugatti has a new generation which is 20% faster and cheaper than the Chiron Nissan compared against anyway...

AMD's marketing thinks it can stay afloat by conning the masses with media antics. They seem to think it works for a year before launch, until their myths are busted. That shareholders will plough money in with the hype they build. They need to function on product first, hype second. I just hope they don't let investors and the public down once again. It's been 10 years since Conroe.

Sent from HTC 10
(Opinions are own)
 
Reactions: psolord and Sweepr

itsmydamnation

Platinum Member
Feb 6, 2011
2,882
3,439
136
Everyone ignores that a server guy said it; on anything greater than 1P, AMD still ruled the roost until Nehalem.

But again I ask the doubters: where is the performance deficit coming from, which areas specifically? I would never try to guess IPC with the known information, but you guys have no problem doing so. But there is nothing in all the known detail (we know quite a bit from LinkedIn, the die shot, compiler patches and what AMD released a few days ago) that says IPC can't be high.

More execution width
Lower latency cache
More associative, larger per int core L1I
more cache bandwidth
larger PRF's
greater issue width (u-op cache)

I can keep naming things that are all headed in the direction of improved IPC, so start naming things other than "derp derp AMD" that look like performance limiters..............
 

Abwx

Lifer
Apr 2, 2011
11,207
3,920
136
It wasn't true at all. Neither Barcelona nor Agena was anywhere near its counterparts at launch. Not in performance, not in power, not in clocks and no way in underclocking/overclocking.

What is displayed in Intel's marketing slide?


I can keep naming things that are all headed in the direction of improved IPC, so start naming things other than "derp derp AMD" that look like performance limiters..............

You'll read nothing about such data, hence the tangential discussions that have nothing to do with Zen. So far we've got an AOTS test which uses integer code and a Blender demo which is FP; both point to the same performance levels, which should be in the same ballpark as Intel's equivalent offerings.

As a reference for the FX in Blender :



Obviously it didn't favour AMD.
 
Reactions: prtskg

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I did a quick test and now I'm even more confused about AMD's claims. If the results AMD displayed versus Broadwell-E are fair and real then it is obviously great, however there is potentially a huge risk of this backfiring... I used the newest public pre-compiled Blender build (2.77a 64-bit) and ran the standard "BMW27" (ver. 4) benchmark scene. I reduced the scene size to 320x180 pixels to bring the single-thread rendering time down to sane levels, and set the tile size to 20x20 pixels (even tiles, similar to Cinebench and AMD's own Blender demo). Piledriver operating at 3.5GHz completed the scene in 6 minutes 49 seconds. The score was within 200ms regardless of whether CMT was enabled or disabled. Meanwhile my Haswell-EP operating at the same 3.5GHz frequency completed the scene in 2 minutes 57 seconds. That's 131% higher IPC for Haswell-E. Since Broadwell-E should be < 10% faster than Haswell-E, AMD basically claims that they've achieved > 140% IPC improvement over Piledriver with Zen.

I still need to test if using dual channel instead of quad or downcoring some of the L3 cache on Haswell-EP makes a major difference.
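
The 131% figure in the post above follows directly from the two render times, since both chips ran at the same 3.5GHz. A minimal sketch of that arithmetic (helper names are hypothetical, and it assumes render time is inversely proportional to per-clock throughput):

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds render time into seconds."""
    return minutes * 60 + seconds


def ipc_advantage(slow_s: float, fast_s: float) -> float:
    """Relative per-clock advantage of the faster chip.

    Only meaningful when both runs used the same clock speed, as in the
    test described above (both at 3.5GHz).
    """
    return slow_s / fast_s - 1.0


piledriver_s = to_seconds(6, 49)  # 409 s
haswell_s = to_seconds(2, 57)     # 177 s

advantage = ipc_advantage(piledriver_s, haswell_s)
print(f"{advantage:.0%}")  # 131%
```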
 

KTE

Senior member
May 26, 2016
478
130
76
Everyone ignores that a server guy said it; on anything greater than 1P, AMD still ruled the roost until Nehalem.

But again I ask the doubters: where is the performance deficit coming from, which areas specifically? I would never try to guess IPC with the known information, but you guys have no problem doing so. But there is nothing in all the known detail (we know quite a bit from LinkedIn, the die shot, compiler patches and what AMD released a few days ago) that says IPC can't be high.

More execution width
Lower latency cache
More associative, larger per int core L1I
more cache bandwidth
larger PRF's
greater issue width (u-op cache)

I can keep naming things that are all headed in the direction of improved IPC, so start naming things other than "derp derp AMD" that look like performance limiters..............
I personally don't care for IPC alone.

I care for AMD to be competitive. Performance, power, clocks, price, platform, OC. Especially the first four. They all work in-sync.

Raw IPC is just one part of the winning equation

Though if AMD did improve IPC vs EXC +25% overall, minus the outliers... It'd be nothing short of amazing. On an engineering front.

Since this event was marketing, I'm here only discussing about marketing claims vs. what we'll get/standing at launch.

Sent from HTC 10
(Opinions are own)
 

inf64

Diamond Member
Mar 11, 2011
3,777
4,251
136
I did a quick test and now I'm even more confused about AMD's claims. If the results AMD displayed versus Broadwell-E are fair and real then it is obviously great, however there is potentially a huge risk of this backfiring... I used the newest public pre-compiled Blender build (2.77a 64-bit) and ran the standard "BMW27" (ver. 4) benchmark scene. I reduced the scene size to 320x180 pixels to bring the single-thread rendering time down to sane levels, and set the tile size to 20x20 pixels (even tiles, similar to Cinebench and AMD's own Blender demo). Piledriver operating at 3.5GHz completed the scene in 6 minutes 49 seconds. The score was within 200ms regardless of whether CMT was enabled or disabled. Meanwhile my Haswell-EP operating at the same 3.5GHz frequency completed the scene in 2 minutes 57 seconds. That's 131% higher IPC for Haswell-E. Since Broadwell-E should be < 10% faster than Haswell-E, AMD basically claims that they've achieved > 140% IPC improvement over Piledriver with Zen.

I still need to test if using dual channel instead of quad or downcoring some of the L3 cache on Haswell-EP makes a major difference.

When you think about it, Zen has 2x the FP and L/S resources per core vs. PD, and since it has 2x more threads that comes to around the same amount of resources per thread (with supposedly lower instruction latencies). Also it has SMT, which adds another 20-30% if done right. Should be just about 130% faster than PD.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
When you think about it, Zen has 2x the FP and L/S resources per core vs. PD, and since it has 2x more threads that comes to around the same amount of resources per thread (with supposedly lower instruction latencies). Also it has SMT, which adds another 20-30% if done right. Should be just about 130% faster than PD.

How does SMT have anything to do with what I just wrote?
Alternatively one could ask how the average IPC improvement becomes just 40% if they manage to hit over 140% in some cases?
 

inf64

Diamond Member
Mar 11, 2011
3,777
4,251
136
How does SMT have anything to do with what I just wrote?
Alternatively one could ask how the average IPC improvement becomes just 40% if they manage to hit over 140% in some cases?

Well, SMT is part of the core and it surely brings something to the table. As for the average number being "just 40%", I suppose AMD used mostly integer workloads that dominated the mix. Plus there might be some workloads that do not see that big of a jump?

BTW I thought it was settled that the 40% figure was over the EX core. The EX core is not the same as PD, and you wrote about it above. So 40% above EX is around 60% above PD (or so, depending on the workload of course).

EDIT:
I forgot to add that AMD said 40% IPC improvement core vs core. 140% in MT scenario has SMT component in it so it is not part of their average for ST IPC jump - or at least that is how I see it.
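
The "40% above EX is around 60% above PD" estimate can be checked by compounding the average per-generation figures quoted earlier in the thread (10% for Steamroller, 5% for Excavator). A small sketch, assuming those averages simply multiply, which will not hold for every workload:

```python
def compound(gains):
    """Compound a sequence of fractional per-generation gains into one total gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


# Average IPC gains quoted in the thread: PD -> SR, SR -> XV, XV -> Zen (claimed)
steamroller, excavator, zen_claim = 0.10, 0.05, 0.40

zen_over_pd = compound([steamroller, excavator, zen_claim])
print(f"{zen_over_pd:.1%}")  # 61.7%
```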
 
Reactions: prtskg

itsmydamnation

Platinum Member
Feb 6, 2011
2,882
3,439
136
I find it funny you guys are going on about extrapolation yet not one word of per ALU schedulers......

There is plenty of stuff that AMD has had long-running papers and patents for that could be in Zen and could help explain big performance jumps. One obvious area that comes to mind is branch miss recovery.........

AMD didn't go 4 wide ALU for fun; they could have gone 3 and saved some routing/forwarding/bypass network/scheduling headaches and some power per clock. Yet they didn't. Look "north" of the execution units to figure out why.
 
Reactions: prtskg and inf64

KTE

Senior member
May 26, 2016
478
130
76
I did a quick test and now I'm even more confused about AMD's claims. If the results AMD displayed versus Broadwell-E are fair and real then it is obviously great, however there is potentially a huge risk of this backfiring... I used the newest public pre-compiled Blender build (2.77a 64-bit) and ran the standard "BMW27" (ver. 4) benchmark scene. I reduced the scene size to 320x180 pixels to bring the single-thread rendering time down to sane levels, and set the tile size to 20x20 pixels (even tiles, similar to Cinebench and AMD's own Blender demo). Piledriver operating at 3.5GHz completed the scene in 6 minutes 49 seconds. The score was within 200ms regardless of whether CMT was enabled or disabled. Meanwhile my Haswell-EP operating at the same 3.5GHz frequency completed the scene in 2 minutes 57 seconds. That's 131% higher IPC for Haswell-E. Since Broadwell-E should be < 10% faster than Haswell-E, AMD basically claims that they've achieved > 140% IPC improvement over Piledriver with Zen.

I still need to test if using dual channel instead of quad or downcoring some of the L3 cache on Haswell-EP makes a major difference.

I think something's not quite right there.

AFAICR, my understanding is that Zambezi was only just behind Sandy Bridge in Blender anyway and Vishera only just behind Ivy Bridge (in MT, and certainly in CB R15 MT it was very close).

Although it does depend on the scene... and while MT was close, ST was a great whitewash for Intel.

Sent from HTC 10
(Opinions are own)
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
BTW I thought it was settled that the 40% figure was over the EX core. The EX core is not the same as PD, and you wrote about it above. So 40% above EX is around 60% above PD (or so, depending on the workload of course).

Of course it is over Excavator. But the thing is, in Blender Excavator is only marginally faster than Piledriver. I just did the same test on the FX-8800P: 6 minutes 57 seconds at 3.4GHz, which would mean ~6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e. < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 at it again.
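
The clock normalization in the post above can be written out as follows. It is a sketch that assumes render time scales inversely with core clock (the same assumption the post makes); the 409 s Piledriver time is the 6 minutes 49 seconds reported earlier in the thread, and the function name is illustrative only:

```python
def scale_time(measured_s: float, measured_mhz: float, target_mhz: float) -> float:
    """Estimate render time at target_mhz from a run at measured_mhz.

    Assumes time is inversely proportional to clock speed, ignoring memory
    and other effects that do not scale with the core clock.
    """
    return measured_s * measured_mhz / target_mhz


excavator_s = scale_time(417, 3400, 3500)  # FX-8800P: 6 min 57 s at 3.4GHz
piledriver_s = 409                         # Piledriver: 6 min 49 s at 3.5GHz

ipc_gain = piledriver_s / excavator_s - 1.0
print(f"{excavator_s:.1f} s, {ipc_gain:.2%}")  # 405.1 s, 0.97%
```

405.1 s is roughly 6 minutes 45 seconds, and the resulting gain stays just under 1%, matching the post.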
 