First ever look at core IPC and Blender benchmark

Dannotech

Junior Member
Jul 19, 2016
10
3
36
Hi guys,

I'm working on a program that measures core IPC on the processor in real time, and I thought it would be an interesting exercise to extrapolate Zen's IPC from the Blender benchmark AMD presented at their New Horizon demo last week. So I made this YouTube video: https://youtu.be/IN0BjzaP7lA

I hope you guys will check it out, and I'm looking forward to constructive feedback.

If you don't have time to sit through the video, I'll give you the spoilers:

1) AMD Ryzen completed the benchmark in 35.1 seconds.
2) My Dual Ivy Bridge Xeons with 24C/48T finished the benchmark in 19.8 seconds.
3) The overall average IPC of the Ivy Bridge Xeon is 1.59 (and this is a fact).
4) I extrapolated AMD Ryzen to be 40% faster clock-for-clock than Ivy Bridge, at 2.22 IPC
5) WOW!!
(I don't think I'm going to trade in my Dual Xeons for Summit Ridge but I will consider 2x Naples if the price is right =)
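Point 4 doesn't spell out how the extrapolation was done. One plausible reading, sketched here in Python as an assumption rather than the OP's actual method, is equal-work scaling: if both machines execute the same instructions for the scene, IPC scales inversely with total core-cycles (time x cores x clock). The sketch treats the reported IPC as a per-core figure and assumes perfect scaling with core count; the function name and parameters are hypothetical.

```python
def extrapolate_ipc(ref_ipc: float,
                    ref_time_s: float, ref_cores: int, ref_ghz: float,
                    new_time_s: float, new_cores: int, new_ghz: float) -> float:
    """Hypothetical equal-work scaling: IPC the new chip would need to
    finish the same scene in new_time_s.

    Assumes both chips retire the same number of instructions for the
    render and that per-core IPC does not change with core count.
    """
    # Total core-cycles spent on the render: seconds * cores * cycles/second.
    ref_cycles = ref_time_s * ref_cores * ref_ghz * 1e9
    new_cycles = new_time_s * new_cores * new_ghz * 1e9
    # Same instruction count, so IPC is inversely proportional to core-cycles.
    return ref_ipc * ref_cycles / new_cycles
```

For Ryzen the thread gives 35.1 s, 8 cores and 3.4 GHz; applying this to the Xeons' 19.8 s on 24 cores would also need their actual all-core clock during the render, which the post doesn't state.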

Thanks for checking it out!
 

Dannotech

2nd video is up; it shows the exact IPC of an Intel Skylake processor running the same test. Spoilers:
  • Skylake IPC is 1.35 (without HT), just a smidge slower than my Ivy Bridge Xeons with Hyper-threading at 1.59.
  • The Core i5 Skylake finished the benchmark in 2:08
  • Provides solid confirmation that Ryzen IPC is 2.21
  • I don't have a Skylake or Broadwell i7 which sucks because I know that's really what we want to see /sadface
https://youtu.be/buOKWKyJ4-I
Let me know what you think.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81

This is not entirely accurate. IPC/core decreases as core count increases and especially when there's more than 1 socket involved. Try running the scene using only 1 core. You'll see the IPC rise. For example, here's what happens with the BMW benchmark on my Sandy Bridge processors running Blender 2.71 on Debian 8.5:

*** number of samples is 1920*1080/4 * 200

Core i7 2600 (stock, 3.5 GHz turbo, 4C/8T)
Time: 1 minute 40.44 seconds
73732 samples/s/core/GHz

Xeon E5 2670 x2 (3.0 GHz turbo, 16C/32T)
Time: 42.33 seconds
51028 samples/s/core/GHz

With all cores and threads loaded, the dual E5 has about 70% of the IPC/core as the i7 2600.
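The per-core, per-GHz figures above can be reproduced from the timings. A minimal Python sketch, assuming "core" means physical cores (hyper-threads not counted) and using the clocks exactly as listed in the post; the function name is ours, not from Blender.

```python
# Total samples for the BMW scene as stated in the post:
# (1920*1080/4) pixels * 200 samples.
TOTAL_SAMPLES = 1920 * 1080 // 4 * 200

def samples_per_core_ghz(time_s: float, cores: int, ghz: float) -> float:
    """Render throughput normalised by physical core count and clock."""
    return TOTAL_SAMPLES / time_s / cores / ghz

i7_2600 = samples_per_core_ghz(60 + 40.44, cores=4, ghz=3.5)
dual_e5_2670 = samples_per_core_ghz(42.33, cores=16, ghz=3.0)

print(f"i7 2600:    {i7_2600:,.0f} samples/s/core/GHz")
print(f"2x E5 2670: {dual_e5_2670:,.0f} samples/s/core/GHz")
print(f"ratio:      {dual_e5_2670 / i7_2600:.2f}")
```

The results land within rounding of the 73732 and 51028 in the post, and the ratio of about 0.69 is the "about 70%" figure.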
 

rvborgh

Member
Apr 16, 2014
195
94
101
hi Jhu, it must vary depending on arch... on my quad socket 48 core Opteron rig... for Cinebench (both 11.5 and R15) i see about a 6-7% dropoff out at 48 cores... and virtually none between 1 and 16 cores (in the 16 core case i am running 2 cores per die, or 4 cores per each of the 4 sockets). When i run 4 cores at 3.6 GHz on socket 0... i score 330cb (4.01 on 11.5), and up at 16 cores (score of 16 exactly on CB 11.5), virtually perfect scaling. i wonder why the dual socket Intels have that dropoff
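"Virtually perfect scaling" can be put as a number: the measured score divided by the ideal linear extrapolation from the smaller configuration. A minimal Python sketch using rvborgh's Cinebench 11.5 figures (4.01 points on 4 cores, 16 points on 16 cores); the helper name is ours. By the same measure, a 6-7% dropoff at 48 cores would be an efficiency of roughly 93-94%.

```python
def scaling_efficiency(base_score: float, base_cores: int,
                       score: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved going from base_cores to cores."""
    ideal = base_score * cores / base_cores  # perfect linear extrapolation
    return score / ideal

# Cinebench 11.5: 4.01 points on 4 cores, 16.00 points on 16 cores.
eff = scaling_efficiency(4.01, 4, 16.0, 16)
print(f"{eff:.1%}")  # 99.8%
```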

i will experiment with Blender and the Ryzen file with varying # of sockets to see if it makes a difference.
 

jhu

Try running Blender. As Maxon says, Cinema4D/Cinebench is NUMA aware (which to me would mean that code and data get spawned to memory that is local to the CPU). As far as I know, Blender is not.
 

rvborgh

Jhu, it's very strange, because on my rig... CB (both versions) runs best without NUMA, i.e. with node interleaving on in the BIOS, under Windows Server 2008R2. i cannot recall the exact amount of decrease... but it was significant, maybe 8% or so. Enough to make me think that CB isn't NUMA aware...

i'll do more Blender runs at 1, 2, 4, 16, 32 and 48 cores... to see what the scaling is like.

 
Mar 10, 2006
11,715
2,012
126
AMD itself has said Zen offers 40% more IPC than XV. XV had far lower IPC than Ivy Bridge.

Your math doesn't mesh well with AMD's official statements.
 

inf64

Diamond Member
Mar 11, 2011
3,765
4,223
136

To be fair, Lisa Su stated they exceeded the 40% IPC increase (she never said by how much, but I reckon it is not by 1%). Also, you forget that this number is for the ST IPC improvement, and SMT comes on top of that (refer to the Hot Chips Q&A session). For reference, in Cinebench, SMT on post-Haswell cores brings around 35-40% more performance (1T on one core vs. 2T with SMT enabled on one core). I think there is a similar gain in Blender.
 

Abwx

Lifer
Apr 2, 2011
11,172
3,868
136

HW, not SB/IB, has a 40.3% IPC advantage over Piledriver, not XV, in Fritzchess, and less than that in Houdini and Stockfish; in 7Zip it's about 30%.

As pointed out by inf64, you are somewhat confusing 1C/1T results with 1C/2T ones.
 
Mar 10, 2006
11,715
2,012
126

OK, but this user is saying that Zen has 1.4x IVB IPC, which would put it well beyond any recent Intel CPU core.

That does not pass any sort of basic sanity test. Even AMD's own demos, which I suspect are best-case scenarios (why wouldn't they be -- it's marketing), show 8C/16T Zen roughly matching the throughput of an 8C/16T Broadwell-E.

I think some people are letting their expectations run a little too wild.
 

inf64

Actually going by his calculations even Broadwell-E has ~40% higher IPC* than IB in Blender.

*IPC here referred is what his software is showing us.
 
Mar 10, 2006
11,715
2,012
126

Right, and that doesn't make sense. At all. HSW was about 10% faster per clock than IVB, and Intel's own claims show Broadwell at ~5.5% faster per clock than Haswell.

(1.1)*(1.055) ~= 1.161.

40% just doesn't make sense, which means that the OP needs to rethink the methodology used to calculate these results.
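Per-clock gains compound multiplicatively, which is where the ~1.161 comes from. A quick Python check using the two percentages quoted in the post (they are not measured here):

```python
import math

# Per-clock gains as quoted in the post: Haswell ~10% over Ivy Bridge,
# Broadwell ~5.5% over Haswell (Intel's figure).
gains = {"HSW over IVB": 1.10, "BDW over HSW": 1.055}

combined = math.prod(gains.values())
print(f"BDW over IVB per clock: {combined:.4f}x")  # 1.1605x

# Additional per-clock gain that would still be needed to reach the disputed 1.40x.
print(f"Remaining gap to 1.40x: {1.40 / combined:.3f}x")
```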
 

Dannotech

The math only doesn't add up if AMD lied; that is to say, if the benchmark was not in fact 150 samples but something much smaller, which is entirely possible given that they already blundered the Blender file once. Otherwise you're just arguing against math.
 
Mar 10, 2006
11,715
2,012
126

I'm not arguing against math. I'm arguing against findings that don't agree with verified facts.
 

Dannotech

Granted, it's not possible to verify my findings because the software isn't public, so you just have to take my word for it. But as to your point that your facts have been verified: to my knowledge, nobody has ever measured and published the IPC of any of these processors, so I think your assertion is baseless. I want you to consider what you're saying and how that reflects on you in the community, which is that you are completely discrediting my research based on gut feelings. Provide sources.

I would be happy to put my software on Broadwell-E and get the definitive answer, if I had access to one.

Until then, let's try to identify the specific problems with the methodology if you want to poke holes in my research.
 
Mar 10, 2006
11,715
2,012
126

I'm pointing out that in real-world benchmarks, if you look at the perf/MHz of various architectures, it is very easy to see that Broadwell is not 40% faster per MHz than an Ivy Bridge is. Look at Geekbench 4 sub-tests, for example.

So, again, might be worthwhile for you to check your methodology or to try to figure out what your numbers are actually measuring.
 

inf64

I have taken a look at what kind of scores 3770K gets in Blender (see here).
CPU | Clock (GHz) | Samples | Time (s)
Intel 3770K | 4.2 | 200 | 107
Intel 3770K | 4.2 | 100 | 54.2
Intel 3770K | 3.9 | 200 | 104.13
Intel 3770K | 3.9 | 100 | 52.83

We can see that one 3770K at 4.2 GHz and 100 samples finishes in 54.2s, while another entry shows a 3.9 GHz clock and a runtime of 52.83s. The same goes for 200 samples (runtime is 2x longer in both cases vs. 100 samples). For the sake of argument I will average the numbers and assume a 4 GHz IB 4C/8T would score around 55.6s at 100 samples. At 150 samples it would score 55.6 x 1.5 ~= 83.4s. At 3.5 GHz it would score 95.3s.

We know that the 6900K is an 8C/16T chip that runs at ~3.5 GHz during Blender. It ran the rendering for 26s during the live stream (Ryzen @ 3.4 GHz finished at approx. the same time or a bit faster).
So we have, for 150 samples:
Broadwell-E 8C/16T @ 3.5 GHz: 26s => Broadwell-E 4C/8T should do it in 2x the time, which is approx. 52s.
IB 4C/8T @ 3.5 GHz: 95.3s.
52s/95.3s~=0.55 or in percentage terms Broadwell is 45% faster.
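The chain of estimates above can be re-traced step by step. This Python sketch takes the post's inputs as given (the 55.6 s starting point and the 26 s 6900K time) and, like the post, assumes render time scales linearly with samples, inversely with clock, and inversely with core count. It also separates two ways of stating the result: a time ratio r corresponds to a throughput ratio of 1/r, so ~0.55 means roughly 45% less time, or about 1.8x the work per clock.

```python
# Inputs taken from the post as given.
ivb_100_at_4ghz = 55.6                               # assumed 4 GHz 3770K, 100 samples (s)
ivb_150_at_4ghz = ivb_100_at_4ghz * 150 / 100        # time assumed linear in samples
ivb_150_at_3_5ghz = ivb_150_at_4ghz * 4.0 / 3.5      # time assumed inversely prop. to clock

bdw_8c_150 = 26.0                                    # 6900K time used in the post (s)
bdw_4c_150 = bdw_8c_150 * 2                          # assumes perfect core scaling

time_ratio = bdw_4c_150 / ivb_150_at_3_5ghz          # < 1 means Broadwell finishes sooner
throughput_ratio = 1 / time_ratio                    # work done per unit time, same clock

print(f"IB 4C/8T @ 3.5 GHz, 150 samples: {ivb_150_at_3_5ghz:.1f} s")
print(f"time ratio:       {time_ratio:.3f}")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```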
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Granted, it's not possible to verify my findings because the software isn't public, so you just have to take my word for it. ... Until then, let's try to identify the specific problems with the methodology if you want to poke holes in my research.

Publish your code on Github. Otherwise you're just another (new) poster making unsubstantiated claims.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,249
1,694
136

Broadwell-E 6900K didn't finish 150 samples in 26 seconds.

150 samples: 36 seconds
100 samples: 26 seconds

Try again.

Edit: Using your own numbers, the 3770K 4C/8T would score ~61.95s at 100 samples and 3.5 GHz.

Broadwell-E 6900k at 26x2 = 52 seconds.

52/61.95 = 0.839, so according to your methodology here BW-e is actually 16.1% faster IPC than IVB

Hmm. 10% faster for HW vs IVB, then 5.5% faster for BW vs HW.

1.1 * 1.055 = 1.1605, or 16.1% faster.

16.1% = 16.1%

You can try to use stinky math to prove a false point, but the math never lies.
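The ~61.95s figure appears to come from scaling each 100-sample 3770K run to 3.5 GHz and averaging the two; that reconstruction is ours, since the post doesn't show the step, but it reproduces the number. A Python sketch under that assumption:

```python
# Reconstruction (our assumption) of the ~61.95 s estimate: scale each
# 100-sample 3770K run to 3.5 GHz, assuming time is inversely proportional
# to clock, then average the two runs.
TARGET_GHZ = 3.5
runs_100_samples = [(4.2, 54.2), (3.9, 52.83)]  # (clock GHz, time s) from inf64's table

scaled = [t * ghz / TARGET_GHZ for ghz, t in runs_100_samples]
ivb_4c_100 = sum(scaled) / len(scaled)

bdw_4c_100 = 26.0 * 2  # 6900K: 26 s at 100 samples on 8 cores, doubled for 4 cores

time_ratio = bdw_4c_100 / ivb_4c_100
print(f"IB 4C/8T @ 3.5 GHz, 100 samples: {ivb_4c_100:.2f} s")  # 61.95 s
print(f"time ratio {time_ratio:.3f}, throughput ratio {1 / time_ratio:.3f}x")
```

Read as time, the 0.839 ratio is about 16% less; read as throughput, 1/0.839 is about 1.19x.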
 

dfk7677

Member
Sep 6, 2007
64
21
81
I am not sure if the following are relevant but I will post them.

Using the perf command on Ubuntu 16.10 (kernel 4.8.0) and Blender 2.77 (I know they used 2.78a) with an i5 4590, rendering the given file, I got:
346174.063597 task-clock (msec) # 3.757 CPUs utilized
161,132 context-switches # 0.465 K/sec
701 cpu-migrations # 0.002 K/sec
17,614 page-faults # 0.051 K/sec
1,210,632,199,729 cycles # 3.497 GHz
2,556,796,196,202 instructions # 2.11 insn per cycle
310,932,944,114 branches # 898.198 M/sec
3,259,375,950 branch-misses # 1.05% of all branches

92.150175202 seconds time elapsed

Edit: This was done with default settings, so I guess samples=200.

Edit 2:
With blender 2.78a and 150 samples:
http://prnt.sc/dlvzi1
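perf's annotations are simple ratios of the raw counters, which makes them easy to sanity-check. A Python sketch with the counters above copied in: IPC is instructions / cycles, the GHz figure is cycles / task-clock, and "CPUs utilized" is task-clock / elapsed wall time. Because cycles and task-clock are summed across every CPU the render ran on, the 2.11 IPC is an average per busy core rather than a whole-chip figure.

```python
# Raw counters copied from the perf stat output above (i5 4590, Blender 2.77).
task_clock_ms = 346174.063597
cycles = 1_210_632_199_729
instructions = 2_556_796_196_202
branches = 310_932_944_114
branch_misses = 3_259_375_950
elapsed_s = 92.150175202

cpu_time_s = task_clock_ms / 1000          # CPU time summed over all threads

ipc = instructions / cycles                # "insn per cycle"
avg_ghz = cycles / cpu_time_s / 1e9        # average clock while running
cpus_utilized = cpu_time_s / elapsed_s     # average number of busy CPUs
miss_pct = 100 * branch_misses / branches  # "% of all branches"

print(f"IPC {ipc:.2f}, {avg_ghz:.3f} GHz, {cpus_utilized:.3f} CPUs, {miss_pct:.2f}% misses")
```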
 

Dannotech

Thanks for that. Not sure but I think that's a Haswell generation i5, am I right? Haswell being 22nm and Broadwell being 14nm IIRC.

I have been trying to calculate what I think would be the IPC of Broadwell-E, and my (ahem... work-in-progress) prediction is currently 2.15 IPC, just a smidge slower than Zen, and your trace of the performance counters in Ubuntu is very, very close to that. This helps to provide confirmation. I'm not sure yet if Broadwell is actually any faster than Haswell in terms of IPC, but since one is merely a die shrink of the other I'm leaning towards "no" on that point.


Again thanks for that. I owe you a drink. Seriously!
 
Mar 10, 2006
11,715
2,012
126

Intel says it's 5.5% faster on average per clock than Haswell.
 

Dannotech

If true, then I've underestimated Broadwell IPC by a few points; it should be 2.21 (identical to Zen) instead of 2.15. But if that were true, Broadwell-E would have finished the Blender benchmark in 34.82 seconds, nearly a third of a second faster than Zen. If you watch the recording, it was just too close to call. So I'm going to stick with the math on this one and stand by my estimate of 2.15.
 

dfk7677

Strange thing is, when I did the same rendering in Windows, I got 107.9sec with ~1.4IPC...
(IPC calculated by PerfMonitor2)
 