I can guarantee the code for DGEMM and SGEMM on ARM is just plain horrible and could also be made much faster. But the aim of Geekbench is not to get the absolute fastest code for a given subtest, so IMHO it wouldn't make sense to overtune one of the benchmarks.
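For anyone wondering what untuned GEMM looks like, here is a minimal sketch of the textbook triple loop (my own illustration, not Geekbench's actual kernel). Without blocking or vectorization, code like this typically reaches only a small fraction of a tuned BLAS kernel's throughput, which is the kind of gap being described:

```c
/* Naive DGEMM, C += A*B for n x n row-major matrices. Illustration only --
 * not Geekbench's kernel. The i-k-j loop order at least keeps the inner
 * accesses sequential; with no blocking or vectorization it still leaves
 * most of the machine's peak FLOPS on the table. */
#include <stdio.h>
#include <stdlib.h>

static void dgemm_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double a = A[i * n + k];          /* reuse A across the row */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

int main(void) {
    const int n = 512;                        /* arbitrary demo size */
    double *A = malloc(sizeof(double) * n * n);
    double *B = malloc(sizeof(double) * n * n);
    double *C = calloc(n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }
    dgemm_naive(n, A, B, C);
    printf("C[0] = %.1f (expect %.1f)\n", C[0], 2.0 * n);
    free(A); free(B); free(C);
    return 0;
}
```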
> I didn't make any predictions about performance. I made an analysis of how PL3 could work using Intel's available data and how that aluminum backplate can play a role in the device performance.

I never said you were making predictions about performance.
So what's the conclusion on the "for how long can Broadwell-M turbo" debate? Are we talking milliseconds, seconds or minutes on a standard Ultrabook?
Also, isn't it about time we saw actual devices becoming available, at least for review, by now? We're only 2 months away from Christmas... so if they actually want to sell B-M in quantity, there should be reviews available by now, so awareness can spread to potential buyers, and buyers have time to make a buying decision. And, yes, I'm impatient!
I was talking about me: I am not making any predictions on performance until actual devices arrive, for the exact reasons you mentioned. Too many balls in the air, too many unknowns; we can't solve for the answer.
And we will see real-world reviews from trusted sources in the next month, so I will be patient.
I don't know about Broadwell but PL3 doesn't seem to work like that for Haswell.
If PL3 power is set above PL2, it does nothing; PL2 limits instead. If it's set below PL2, the CPU can spike to max turbo for about 1ms, then average out, and it will do this continually to provide an average power between PL3 and PL2 in relation to the duty cycle. Unlike the time settings for PL1/PL2, PL3 seems to be based on how long it averages its duty cycles. For instance, setting 50ms with 10% duty will see a spike every 50ms, while setting a time for PL1/PL2 will let PL1/PL2 run for that time once the power limit has been reached, which IIRC works out to a maximum of ~42 days if continually running at the PL level, unless Intel programs a max window time (i.e. it is not set to zero).
Of course it could be that the Power Limiting in Haswell is broken and Broadwell might be different.
And yes, TDP is a specification, not the maximum power the processor can use. I think Intel includes that in all their datasheets. If power limits are unlocked / programmable for the device, then they can be set well above TDP to ensure no power throttling, provided everything else is kept in check.
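For the curious, the PL1/PL2 limits and time windows can be read straight from the RAPL MSRs, so you can check what a vendor has programmed. Below is a minimal sketch for Linux, based on the register layout documented in Intel's SDM (MSR_RAPL_POWER_UNIT at 0x606, MSR_PKG_POWER_LIMIT at 0x610); PL3 lives outside this register and is less well documented, so it's omitted. Run as root with the msr driver loaded.

```c
/* Decode PL1/PL2 from MSR_PKG_POWER_LIMIT (0x610) using the units in
 * MSR_RAPL_POWER_UNIT (0x606). Linux only: modprobe msr, run as root. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

static uint64_t rdmsr(int fd, uint32_t reg) {
    uint64_t v;
    if (pread(fd, &v, sizeof v, reg) != sizeof v) { perror("pread"); exit(1); }
    return v;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    uint64_t units = rdmsr(fd, 0x606);
    double pw_unit = 1.0 / (1 << (units & 0xf));          /* watts per LSB   */
    double tm_unit = 1.0 / (1 << ((units >> 16) & 0xf));  /* seconds per LSB */

    uint64_t lim = rdmsr(fd, 0x610);
    double pl1 = (lim & 0x7fff) * pw_unit;         /* PL1 power, bits 14:0  */
    double pl2 = ((lim >> 32) & 0x7fff) * pw_unit; /* PL2 power, bits 46:32 */

    /* PL1 time window = 2^Y * (1 + Z/4) * time_unit (Y: bits 21:17,
     * Z: bits 23:22). With the default ~976us unit this encoding tops out
     * around the ~42 days mentioned above. */
    unsigned y = (lim >> 17) & 0x1f, z = (lim >> 22) & 0x3;
    double tw1 = (double)(1ull << y) * (1.0 + z / 4.0) * tm_unit;

    printf("PL1 = %.1f W (window %.3f s), PL2 = %.1f W\n", pl1, tw1, pl2);
    close(fd);
    return 0;
}
```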
> What device are you talking about ??

The tests were carried out on my laptop (i7-4700MQ).
> So what's the conclusion on the "for how long can Broadwell-M turbo" debate? Are we talking milliseconds, seconds or minutes on a standard Ultrabook?

Depends on the manufacturer. If the power delivery can be sustained and thermals can be kept in check, then indefinitely. However, Core M is for low-power devices, so you should expect some throttling to keep power levels efficient.
Maybe some pics will help explain a little better, using some old software which measures multiplier / load usage in real time. For these examples, sampling has been set to the maximum of 1ms. LFM to HFM in green, turbo in yellow.
LFM 8x
HFM 24x
Turbo 4 cores 34x
Turbo 1 core 36x
PL1 48W long time
PL2 50W long time
Constant work load on all logical processors.
PL3 40W ~50ms 10% duty cycle. Note the turbo spike to the maximum multiplier; the power for this short time is in excess of PL2, however PL3 will try to keep the average energy between PL3 (40W) and PL2 (50W) in accordance with the duty cycle. For a 10% duty cycle it will be near 40J, while for a 50% duty cycle near 45J (see the sketch after these examples).
PL3 40W ~50ms 80% duty cycle. I've cut this to the bottom logical processor so the pics don't take up too much space.
PL3 40W ~50ms 90% duty cycle.
After a while PL2 kicks in while running 90% duty cycle.
Running with PL3 disabled
Setting PL3 higher than PL2, PL3 60W ~50ms 50% duty cycle. No different to PL3 being disabled as power is clamped by PL2.
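To make that duty-cycle averaging concrete, here is a toy model of the behaviour measured above. The linear-average formula is my own reading of the 40J/45J numbers, not anything from Intel's documentation:

```c
/* Toy model of the PL3 duty-cycle averaging described above.
 * Assumption (mine, fitted to the measured numbers): the controller aims
 * for an average power of PL3 + duty * (PL2 - PL3) over its window. */
#include <stdio.h>

int main(void) {
    const double pl2 = 50.0, pl3 = 40.0;            /* watts, as tested  */
    const double duty[] = {0.10, 0.50, 0.80, 0.90}; /* duty cycles tried */

    for (int i = 0; i < 4; i++) {
        double avg_w = pl3 + duty[i] * (pl2 - pl3);
        /* over one second, joules == average watts */
        printf("duty %2.0f%% -> ~%.0f W (~%.0f J per second)\n",
               duty[i] * 100.0, avg_w, avg_w);
    }
    return 0;
}
```

This reproduces the observations: ~41J at 10% duty ("near 40J") and 45J at 50%.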
> Digging around a bit I was able to find this - http://sourceforge.net/p/math-atlas...7-669A-4488-9972-6326853C00E9@bigpond.net.au/ - which implies that for DGEMM at least the Geekbench results for A15 are about 1/5 of what's possible. So A15 should be 5x faster while Haswell should be 10x faster... yet more reason to pay zero attention to Geekbench for cross-platform results.

Again, the point is that GB isn't trying to get the best algo/implementation for each of the tests. And because one of the subtests is not fully tuned for your preferred architecture, should all of GB be dismissed? I'm not trying to say GB is the best benchmark or has no issues, just that you are likely too radical in your conclusion.
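To put "1/5 of what's possible" in context: attainable DGEMM throughput is easy to measure by timing a tuned BLAS (ATLAS, as in the linked thread, or OpenBLAS) and converting to GFLOPS. A rough sketch, assuming a CBLAS-providing library is installed (the build line is an assumption about your setup; adjust as needed):

```c
/* Rough DGEMM throughput check against a tuned BLAS.
 * Build (example): cc dgemm_bench.c -lcblas   (or -lopenblas, etc.) */
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const int n = 1024;                        /* arbitrary test size */
    double *A = malloc(sizeof(double) * n * n);
    double *B = malloc(sizeof(double) * n * n);
    double *C = calloc(n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 0.5; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("n=%d: %.3f s, %.2f GFLOPS\n",
           n, secs, 2.0 * n * n * n / secs / 1e9); /* GEMM is 2*n^3 flops */
    free(A); free(B); free(C);
    return 0;
}
```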
> Basically all that Geekbench tells us on cross-platform results is what would happen if software using some of these algorithms was written, tossed into a compiler, and released without regard for performance... While there's no question that such happens on occasion, what's the frequency of such occurring on non-trivial programs?

If you really think software developers optimize code, you couldn't be more wrong. The vast majority of devs either don't have time for that (all that matters is time to market), or simply don't even know what a cache or a pipeline is, or are using languages that hide the low-level details required to really tune things. And again, that's not the point of GB...
Yes, I know. But I'm looking for some ballpark figures on a typical Ultrabook design. Are we talking milliseconds, seconds, or minutes that Broadwell-M can turbo to max frequency?
> But if you only bench a single application for only a couple of hundred seconds, like Cinebench, then it could sustain PL2 or even PL3 for the entire benchmark run.

From the results on HSW, PL3 is great for some application that works periodically for a millisecond or so before sleeping, as it has a chance of hitting the maximum turbo above the PL2 power limit. Anything that runs longer is still constrained to an average power up to the PL2 limit. Still, 1ms can be a few million clocks. Maybe I should have also done some tests at fixed voltage to see if the latency in voltage stepping has some relevance. The IVR seems to do a nice job of control, though.
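For reference, the kind of workload being described (busy for about a millisecond, then asleep) looks something like the sketch below; the 1ms busy / 9ms idle split is illustrative, not from any of the tests above:

```c
/* Bursty load generator: ~1ms of work, then ~9ms of sleep, forever
 * (Ctrl-C to stop). Under PL3 each burst has a chance to hit max turbo. */
#include <stdint.h>
#include <time.h>

static double now_s(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    volatile uint64_t sink = 0;
    struct timespec pause = {0, 9 * 1000 * 1000};  /* 9 ms idle */

    for (;;) {
        double t0 = now_s();
        while (now_s() - t0 < 0.001)   /* ~1 ms busy burst */
            sink += sink * 31 + 7;     /* arbitrary work to keep a core busy */
        nanosleep(&pause, NULL);       /* let the average power fall back */
    }
}
```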
> 14nm's delay seems to be affecting Broadwell-E, which will enter mass production in Q1'16, apparently.
> http://chinese.vr-zone.com/131468/i...ass-production-at-2016-first-quater-10212014/

Let's see: we have the same socket, same core count and same TDP... hopefully clockspeeds are much higher?
> Makes sense. Intel was bragging about how amazing 22nm yields were on the last earnings call. They're going to milk 22nm for as long as possible.

They don't really have a choice, with 14nm not meeting yield expectations.
@Fjodor2001, not much info at this time. You might find this Yoga 3 Pro with 5Y70 review interesting. Some throttling and performance less than Intel's demo. Even has an internal cooling fan, pictured in the article.
Do be aware though that some forms of throttling may be due to external components rather than the CPU.
http://www.ultrabookreview.com/5486-lenovo-yoga-3-pro-review/
The throttling is insane. However, it looks like Lenovo has set the power limit to 3.5W.
There is the 12W PL3 spike.
It's bad, but also kinda insane, that Prime95 (an older version without AVX) can be run at 1.1 GHz on a 2C/4T system with a core power < 2W. Given the uncore and DRAM power, it looks like very little power is available to the cores or iGPU. Expect much better performance with a higher TDP.
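As a rough illustration of why a ~2W core budget pins the clock so low: dynamic power scales roughly as f*V^2, and voltage scales roughly with frequency in the DVFS range, so core power goes roughly as f^3. A back-of-envelope sketch calibrated to the observed ~1.1 GHz at <2W (my own toy model, not data from the review):

```c
/* Back-of-envelope: core power ~ k * f^3 (P ~ f * V^2 with V ~ f).
 * Calibrated to the observed point above (~1.1 GHz at ~2 W core power).
 * A toy model only -- real DVFS has static power, voltage floors, etc.
 * Build: cc freq_model.c -lm */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double f0 = 1.1, p0 = 2.0;        /* GHz, W: observed point */
    const double k = p0 / pow(f0, 3.0);

    for (double budget = 1.0; budget <= 8.0; budget += 1.0)
        printf("core budget %.0f W -> ~%.2f GHz (toy model)\n",
               budget, cbrt(budget / k));   /* f = (P/k)^(1/3) */
    return 0;
}
```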
I have to find myself partially agreeing with Charlie D. from SemiAccurate.
I personally think the push towards thin and light is good, but the Yoga 3 Pro shows what happens when you go way, way too far: compromised performance.
This is unacceptable.
> Again, the point is that GB isn't trying to get the best algo/implementation for each of the tests. [...]

In terms of cross-platform benchmarks? None are necessarily that good. I would say that SPECCPU is better than Geekbench, for the fact that it's an industry standard developed by a non-profit rather than a for-profit endeavor that specifically claims to be a 'cross-platform processor benchmark'. Sorry, but if Geekbench wants to be such, they need to ensure that their software is actually measuring the same effective performance across all platforms, not just throwing the same code at a compiler for each platform.
According to you, is SPECCPU a fair benchmark? And what is a good benchmark for you? What other benchmark exists that has a large DB of results and for which compilers were not overtuned?