AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Page 24
Status
Not open for further replies.

Enigmoid

Platinum Member
Sep 27, 2012
From a high-level view and based on what the Hot Chips slides revealed, Zen is somewhere between SB and Skylake when core resources are in question (closer to Skylake in integer and closer to SB/IB in FP). I bet the IPC will end up right in the middle of these two, and from AT we know the gap is around 25%. So if it were to end up in the middle of that range or a bit lower (around 10% faster than SB), it would be: 1) just a tiny bit faster than IB (~4%); 2) 7% slower than HSWL; 3) 10% slower than BDWL; 4) ~14% slower than Skylake (funnily enough, the int/fp schedulers in Zen are ~15-20% smaller than Skylake's and right at Haswell level).

The amount of time Intel has spent tweaking its cores will win out. I doubt execution resources and IPC can be directly compared like that; AMD will need more resources to achieve a given IPC relative to Intel simply because this is their first version of Zen, compared to however many iterations of Core. (Intel has extracted small but measurable performance increases across their ticks, despite allocating nearly no additional hardware resources.)
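The percentages above depend on which chip is taken as the baseline. A minimal sketch, using only the post's own assumptions (Sandy Bridge = 1.00, Skylake = 1.25, Zen = 1.10), shows the asymmetry: Skylake would be ~13.6% faster than such a Zen, which is the same gap as Zen being 12% slower than Skylake. The ~14% figure appears to correspond to the first of these.

```python
def percent_faster(a: float, b: float) -> float:
    """How much faster a is than b, in percent: (a / b - 1) * 100."""
    return (a / b - 1.0) * 100.0


def percent_slower(a: float, b: float) -> float:
    """How much slower a is than b, in percent: (1 - a / b) * 100."""
    return (1.0 - a / b) * 100.0


# Relative IPC normalised to Sandy Bridge = 1.00, using the post's own assumptions.
SANDY_BRIDGE = 1.00
SKYLAKE = 1.25 * SANDY_BRIDGE    # "the gap is around 25%"
ZEN_GUESS = 1.10 * SANDY_BRIDGE  # "around 10% faster than SB"

print(f"Skylake is {percent_faster(SKYLAKE, ZEN_GUESS):.1f}% faster than Zen")
print(f"Zen is {percent_slower(ZEN_GUESS, SKYLAKE):.1f}% slower than Skylake")
```

The same ratio (1.25 / 1.10) gives two different-looking percentages, so it helps to state the direction whenever a gap is quoted.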
 

VirtualLarry

No Lifer
Aug 25, 2001
A bit off-topic, but at the time, attempting to use a tilde produced a dash instead (~~~~~ see it's still doing it) so . . . must be a bug/snafu in there somewhere. Anyway I think VirtualLarry caught on to it.

OT, but I saw tildes in my post, and I see them in yours. Check your font, your browser zoom factor, and your screen scaling and ClearType settings.
 

The Stilt

Golden Member
Dec 5, 2015
Today I spent (too) many hours testing Blender.

I made a few customized builds, including the standard MSVC 2013 compiled version with and without AVX2, and then a similar set of builds with Cycles compiled with ICL. It soon became obvious that although Blender does have an AVX2 kernel, it does absolutely nothing (at least speed-wise). The AVX2 kernel can easily be disabled at build time (CCX_HAS_AVX2 = False), and disabling it didn't change performance in the slightest on either MSVC or ICL. The funniest thing is that compiling Cycles with ICL generic arch tunings (SSE2, SSE3, SSE4.1 & AVX) improved performance on Piledriver by ~39%, while performance on Haswell-E improved by <9%. So much for ICL making AMD CPUs look worse than they actually are.
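Assuming Cycles builds one CPU kernel per instruction-set level and picks the best supported one at run time, the selection logic can be sketched as below. The kernel names and flag sets are hypothetical illustrations, not Blender's actual code. The sketch shows why disabling the AVX2 kernel at build time simply makes Haswell-E fall back to the AVX kernel, and why Piledriver (AVX but no AVX2) never used it in the first place.

```python
# Hypothetical kernel table, ordered from most to least capable ISA level.
# Each entry: (kernel name, CPU feature flags the kernel requires).
KERNELS: list[tuple[str, frozenset[str]]] = [
    ("avx2", frozenset({"sse2", "sse3", "sse41", "avx", "avx2"})),
    ("avx", frozenset({"sse2", "sse3", "sse41", "avx"})),
    ("sse41", frozenset({"sse2", "sse3", "sse41"})),
    ("sse3", frozenset({"sse2", "sse3"})),
    ("sse2", frozenset({"sse2"})),
]


def select_kernel(cpu_flags: set[str], compiled: set[str]) -> str:
    """Return the most capable kernel that was built AND that the CPU can run."""
    for name, required in KERNELS:
        if name in compiled and required <= cpu_flags:
            return name
    raise RuntimeError("no compatible kernel was compiled in")


ALL_KERNELS = {name for name, _ in KERNELS}
HASWELL_E = {"sse2", "sse3", "sse41", "avx", "avx2"}
PILEDRIVER = {"sse2", "sse3", "sse41", "avx"}  # simplified flag set: AVX but no AVX2

print(select_kernel(HASWELL_E, ALL_KERNELS))             # avx2
print(select_kernel(HASWELL_E, ALL_KERNELS - {"avx2"}))  # avx (AVX2 kernel not built)
print(select_kernel(PILEDRIVER, ALL_KERNELS))            # avx
```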

Even when compiled with ICL, the SMT yield in Blender is abnormally high. Combined with the fact that the AVX2 kernel seems to do nothing, I don't think Blender manages to extract all of the performance potential out of Haswell and newer Intel parts.
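"SMT yield" here means the extra throughput from running two threads per core instead of one on the same cores. A minimal sketch of how that figure (and the ICL speedups quoted above) can be derived from render times; the 100 s / 75 s timings are made-up placeholders, not measurements from these builds.

```python
def throughput_gain_percent(t_before: float, t_after: float) -> float:
    """Throughput gain (%) when the same render drops from t_before to t_after seconds."""
    if t_before <= 0 or t_after <= 0:
        raise ValueError("render times must be positive")
    return (t_before / t_after - 1.0) * 100.0


def smt_yield_percent(t_one_thread_per_core: float, t_two_threads_per_core: float) -> float:
    """SMT yield: extra throughput from two threads per core versus one, same core count."""
    return throughput_gain_percent(t_one_thread_per_core, t_two_threads_per_core)


# Placeholder timings in seconds, NOT measurements from the builds discussed in this thread.
print(f"SMT yield: {smt_yield_percent(100.0, 75.0):.1f}%")
# A ~39% throughput gain means the render takes about 1 / 1.39 of the original time.
print(f"time ratio for a 39% gain: {1 / 1.39:.3f}")
```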

Tomorrow morning I'll upload all of the builds I compiled today, so other people can test them too.
 

itsmydamnation

Platinum Member
Feb 6, 2011
Even when compiled with ICL, the SMT yield in Blender is abnormally high. Combined with the fact that the AVX2 kernel seems to do nothing, I don't think Blender manages to extract all of the performance potential out of Haswell and newer Intel parts.

Tomorrow morning I'll upload all of the builds I compiled today, so other people can test them too.
sigh..... let's think about this a second.

1. AVX2 is really just (mainly, mostly) an int SIMD instruction set (you know, like MMX), so what does the workload look like? Blender is rendering, right? 3D coordinate space, right? FLOATING POINT, right?
2. both cores appear to get a massive boost from SMT, what does that tell you? Lots of cache misses/execution bubbles. Making those cases better is exactly how you improve IPC.....

So how isn't it using all of Haswell and newer Intel parts' potential? You seem to do all this good methodical work and then jump the shark at the easy part...
 

The Stilt

Golden Member
Dec 5, 2015
What's the point in wasting the time writing AVX / AVX2 kernels if AVX is generally useless for the task?
 

itsmydamnation

Platinum Member
Feb 6, 2011
Because it's not just int SIMD. There was some initial gather support in AVX2, but from what I have seen most have struggled to derive any benefit from it. AVX-512 is supposed to have much better scatter/gather support.

There can be other optimizations that only help in corner cases etc.
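AVX2 added gather loads (for example `_mm256_i32gather_ps`, eight float32 lanes, so it is not int-only) but no scatter stores; scatter arrived with AVX-512. A scalar Python emulation of the semantics, for illustration only: real hardware does this in a single instruction, and nothing about its performance is modelled here.

```python
from typing import Sequence


def masked_gather(base: Sequence[float], indices: Sequence[int],
                  mask: Sequence[bool], src: Sequence[float]) -> list[float]:
    """Lane i loads base[indices[i]] if mask[i] is set, otherwise keeps src[i]."""
    if not (len(indices) == len(mask) == len(src)):
        raise ValueError("indices, mask and src must have one entry per lane")
    return [base[i] if m else s for i, m, s in zip(indices, mask, src)]


def scatter(base: list[float], indices: Sequence[int], values: Sequence[float]) -> None:
    """Lane i stores values[i] to base[indices[i]] (AVX-512 only, not AVX2).

    In this emulation, a later lane overwrites an earlier one if indices repeat.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have one entry per lane")
    for i, v in zip(indices, values):
        base[i] = v


table = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
lanes = masked_gather(table, [7, 0, 3, 3, 1, 6, 2, 5],
                      [True] * 7 + [False], [-1.0] * 8)
print(lanes)  # [7.5, 0.5, 3.5, 3.5, 1.5, 6.5, 2.5, -1.0]

out = [0.0] * 8
scatter(out, [2, 4], [9.0, 8.0])
print(out)  # [0.0, 0.0, 9.0, 0.0, 8.0, 0.0, 0.0, 0.0]
```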
 

DrMrLordX

Lifer
Apr 27, 2000
OT, but I saw tildes in my post, and I see them in yours. Check your font, your browser zoom factor, and your screen scaling and ClearType settings.

I only see tildes in quote boxes. Outside I see dashes. It's weird.

Even when compiled with ICL, the SMT yield in Blender is abnormally high. Combined with the fact that the AVX2 kernel seems to do nothing, I don't think Blender manages to extract all of the performance potential out of Haswell and newer Intel parts.

Disappointing that AVX2 does nothing for Blender. The support is advertised.

sigh..... let's think about this a second.

1. AVX2 is really just (mainly, mostly) an int SIMD instruction set (you know, like MMX)

Wait. What? AVX2 is not int-only, it's fp too:

https://software.intel.com/en-us/articles/how-intel-avx2-improves-performance-on-server-applications

For existing vectorized code that uses floating point operations, you can gain a potential performance boost when running on newer platforms such as the Intel® Xeon® processor E5 v3 family by doing one of the following:

  1. Recompile your code, using the Intel® compiler with the proper AVX2 switch to convert existing SSE code. See the Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSSE3, ATOM_SSSE3, SSE4.1, SSE4.2, ATOM_SSE4.2, AVX, AVX2) and processor-specific optimizations for more details.
  2. Modify your code's function calls to leverage the Intel® Math Kernel Libraries (Intel® MKL) which are already optimized to use AVX2 where supported
  3. Use the AVX2 intrinsic instructions. For high level language (such as C or C++) developers, you can use Intel® Intrinsic instructions to make the call and recompile code. See the Intel® Intrinsic Guide and Intel® 64 and IA-32 Architectures Optimization Reference Manual for more details
  4. Code in assembly instructions directly. For low level language (such as assembly) developers, you can use those equivalent AVX2 instructions from their existing SSE code. See the Intel® 64 and IA-32 Architectures Optimization Reference Manual for more details
 

TheELF

Diamond Member
Dec 22, 2012
sigh..... let's think about this a second.

1. AVX2 is really just (mainly, mostly) an int SIMD instruction set (you know, like MMX), so what does the workload look like? Blender is rendering, right? 3D coordinate space, right? FLOATING POINT, right?
2. both cores appear to get a massive boost from SMT, what does that tell you? Lots of cache misses/execution bubbles. Making those cases better is exactly how you improve IPC.....

So how isn't it using all of Haswell and newer Intel parts' potential? You seem to do all this good methodical work and then jump the shark at the easy part...
That AMD specifically selected Blender for a reason instead of using a proven, generally accepted benchmark?

No, you improve IPC by making everything run faster; for bubbles you improve ILP, OoO and so on.
 

Phynaz

Lifer
Mar 13, 2006
Today I spent (too) many hours testing Blender.

I made a few customized builds, including the standard MSVC 2013 compiled version with and without AVX2, and then a similar set of builds with Cycles compiled with ICL. It soon became obvious that although Blender does have an AVX2 kernel, it does absolutely nothing (at least speed-wise). The AVX2 kernel can easily be disabled at build time (CCX_HAS_AVX2 = False), and disabling it didn't change performance in the slightest on either MSVC or ICL. The funniest thing is that compiling Cycles with ICL generic arch tunings (SSE2, SSE3, SSE4.1 & AVX) improved performance on Piledriver by ~39%, while performance on Haswell-E improved by <9%. So much for ICL making AMD CPUs look worse than they actually are.

Even when compiled with ICL, the SMT yield in Blender is abnormally high. Combined with the fact that the AVX2 kernel seems to do nothing, I don't think Blender manages to extract all of the performance potential out of Haswell and newer Intel parts.

Tomorrow morning I'll upload all of the builds I compiled today, so other people can test them too.

Nice, thanks!
 

majord

Senior member
Jul 26, 2015
It's virtually impossible for Zen to be faster than Broadwell. Why? Because AMD basically outright said it won't be.

40% higher 'IPC' than Excavator is less than Broadwell. Simple.

If you watch the Blender video, AMD themselves essentially state that it shows what it's capable of.

There are enough architectural differences (creating strengths and weaknesses) to [almost] ensure there will be corner cases of Haswell/Broadwell-level throughput (more so than ST IPC), especially given there are areas where Zen is already technically at an advantage. But on average? No, obviously not.
 

superstition

Platinum Member
Feb 2, 2008
The Stilt said:
So much for the ICL making AMD CPUs look worse than they actually are
There isn't much need for Intel to do this at this time, given that the FX chips are so old now.
The Stilt said:
(SSE2, SSE3, SSE4.1 & AVX)
Wouldn't XOP/FMA3-4 support improve the performance further, provided that the program would be coded to leverage it?
 

dark zero

Platinum Member
Jun 2, 2015
It's virtually impossible for Zen to be faster than Broadwell. Why? Because AMD basically outright said it won't be.

40% higher 'IPC' than Excavator is less than Broadwell. Simple.

If you watch the Blender video, AMD themselves essentially state that it shows what it's capable of.

There are enough architectural differences (creating strengths and weaknesses) to [almost] ensure there will be corner cases of Haswell/Broadwell-level throughput (more so than ST IPC), especially given there are areas where Zen is already technically at an advantage. But on average? No, obviously not.
On ST I agree... on MT... I don't think so....
 

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
2. both cores appear to get a massive boost from SMT, what does that tell you? Lots of cache misses/execution bubbles. Making those cases better is exactly how you improve IPC.....
How IPC gets increased also depends on the uarch features' metrics (area, delay, perf increase/power increase). So it might be more expensive to try to avoid the bubbles and misses than to add more resources, which improve performance in general. Some misses/bubbles are simply inevitable, as long as 2GB L1s are out of the question. SMT then just becomes the add-on that makes use of the resources during stalls of the other thread.
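The "SMT fills the other thread's stalls" idea can be illustrated with a deliberately crude toy model (an assumption, not how any real core works): each cycle, each thread is independently stalled with probability p, and the core issues from one ready thread per cycle. One thread keeps the core busy 1 - p of the time; two threads leave it idle only when both stall (p²). The yield then works out to exactly p, so a high SMT yield points to a high stall fraction. Shared-resource contention (caches, ROB, ports), which lowers real-world yields, is ignored.

```python
import random


def analytic_smt_yield(p_stall: float) -> float:
    """Toy model: yield = (1 - p^2) / (1 - p) - 1, which simplifies to p."""
    if not 0.0 <= p_stall < 1.0:
        raise ValueError("p_stall must be in [0, 1)")
    one_thread_busy = 1.0 - p_stall        # core busy whenever its only thread is ready
    two_threads_busy = 1.0 - p_stall ** 2  # idle only if BOTH threads are stalled
    return two_threads_busy / one_thread_busy - 1.0


def simulated_smt_yield(p_stall: float, cycles: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo version of the same toy model (seeded for reproducibility)."""
    rng = random.Random(seed)
    busy_one = busy_two = 0
    for _ in range(cycles):
        a_ready = rng.random() >= p_stall
        b_ready = rng.random() >= p_stall
        busy_one += a_ready             # 1 thread per core: only thread A exists
        busy_two += a_ready or b_ready  # 2 threads: issue from whichever is ready
    return busy_two / busy_one - 1.0


for p in (0.1, 0.3, 0.5):
    print(f"stall fraction {p:.1f}: analytic {analytic_smt_yield(p):.3f}, "
          f"simulated {simulated_smt_yield(p):.3f}")
```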
 

Vesku

Diamond Member
Aug 25, 2005
It's virtually impossible for Zen to be faster than Broadwell. Why? Because AMD basically outright said it won't be.

40% higher 'IPC' than Excavator is less than Broadwell. Simple.

If you watch the Blender video, AMD themselves essentially state that it shows what it's capable of.

There are enough architectural differences (creating strengths and weaknesses) to [almost] ensure there will be corner cases of Haswell/Broadwell-level throughput (more so than ST IPC), especially given there are areas where Zen is already technically at an advantage. But on average? No, obviously not.

Yes, AMD likely chose one of the best comparisons to show off Zen's capabilities. Nothing so far raises expectations beyond IB-like IPC on average, given AMD's 40% statements.
 

Lepton87

Platinum Member
Jul 28, 2009
How IPC gets increased also depends on the uarch features' metrics (area, delay, perf increase/power increase). So it might be more expensive to try to avoid the bubbles and misses than to add more resources, which improve performance in general. Some misses/bubbles are simply inevitable, as long as 2GB L1s are out of the question. SMT then just becomes the add-on that makes use of the resources during stalls of the other thread.

Don't underestimate SMT: POWER8 gets a more than 50% performance uplift from the second thread and more than 100% more throughput from running 4 threads.
 

KTE

Senior member
May 26, 2016
A workload only gains a major IPC increase from SMT if its IPC is low to begin with compared to the maximum physically possible (0.7-1.2 is common).

But when you increase IPC in any app with something like SMT... Watch what happens to the power draw. Power virus galore.

Sent from HTC 10
(Opinions are own)
 

DrMrLordX

Lifer
Apr 27, 2000
Yes, AMD likely chose one of the best comparisons to show off Zen's capabilities. Nothing so far raises expectations beyond IB-like IPC on average, given AMD's 40% statements.

And how exactly did you come to those conclusions?

Since when did fp workloads show AMD in the best possible light?

And how do you conclude that Zen/Summit Ridge is going to have "IB-like IPC" when one of the two available benchmarks shows it slightly edging out Broadwell-E in overall throughput? Remember, Zen is not a CMT design.
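The disagreement between "IB-like IPC" and "edging out Broadwell-E in throughput" can be framed with a simple decomposition, assuming throughput scales linearly with cores and clock: MT throughput ≈ cores × clock × single-thread IPC × (1 + SMT yield). Under that simplification, a chip with ~8% lower single-thread IPC but a higher SMT yield can roughly tie in an all-core render. The numbers below are hypothetical and are not Zen or Broadwell-E measurements.

```python
def mt_throughput(st_ipc: float, smt_yield: float, cores: int, ghz: float) -> float:
    """Relative multithreaded throughput (work per second, arbitrary units).

    Simplifying assumption: throughput scales linearly with cores and clock.
    """
    return cores * ghz * st_ipc * (1.0 + smt_yield)


# Hypothetical chips at the same core count and clock; numbers are illustrative only.
chip_a = mt_throughput(st_ipc=1.00, smt_yield=0.20, cores=8, ghz=3.0)
chip_b = mt_throughput(st_ipc=0.92, smt_yield=0.30, cores=8, ghz=3.0)
print(f"A: {chip_a:.2f}  B: {chip_b:.2f}  B/A: {chip_b / chip_a:.3f}")
```

In other words, one all-core throughput result on its own cannot pin down single-thread IPC without also knowing the SMT yield.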
 

DrMrLordX

Lifer
Apr 27, 2000
Uh, I don't think Stilt is trying to cripple the performance of anything . . . or at least, that's my impression.
 
Mar 10, 2006
And how exactly did you come to those conclusions?

Since when did fp workloads show AMD in the best possible light?

And how do you conclude that Zen/Summit Ridge is going to have "IB-like IPC" when one of the two available benchmarks shows it slightly edging out Broadwell-E in overall throughput? Remember, Zen is not a CMT design.

It's marketing, they are going to put their best foot forward. Come on now.
 
Apr 20, 2008
Really now? Some examples, please.

Any instruction that has a higher probability of a cache miss, or that targets FPU registers in a fixed-size, nonparallel manner. Also, targeting a vendor-specific instruction set (also in a fixed length if it is common) can drastically alter performance. Does SSE/3DNow! ring a bell for anyone? This isn't rocket science.
 

ShintaiDK

Lifer
Apr 22, 2012
Any instruction that has a higher probability of a cache miss, or that targets FPU registers in a fixed-size, nonparallel manner. Also, targeting a vendor-specific instruction set (also in a fixed length if it is common) can drastically alter performance. Does SSE/3DNow! ring a bell for anyone? This isn't rocket science.

Same reason why the Zen benchmark can't be trusted on so many levels.
 