I can guarantee the code for DGEMM and SGEMM on ARM is just plain horrible and could also be made much faster. But the aim of Geekbench is not to get the absolute fastest code for a given subtest, so IMHO it wouldn't make sense to overtune one of the benchmarks.
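For anyone wondering what untuned GEMM looks like, here is a minimal sketch of the textbook triple loop (my own illustration, not Geekbench's actual kernel). Without blocking or vectorization, code like this typically reaches only a small fraction of a tuned BLAS kernel's throughput, which is the kind of gap being described:

```c
/* Naive DGEMM, C += A*B for n x n row-major matrices. Illustration only --
 * not Geekbench's kernel. The i-k-j loop order at least keeps the inner
 * accesses sequential; with no blocking or vectorization it still leaves
 * most of the machine's peak FLOPS on the table. */
#include <stdio.h>
#include <stdlib.h>

static void dgemm_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double a = A[i * n + k];          /* reuse A across the row */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

int main(void) {
    const int n = 512;                        /* arbitrary demo size */
    double *A = malloc(sizeof(double) * n * n);
    double *B = malloc(sizeof(double) * n * n);
    double *C = calloc(n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }
    dgemm_naive(n, A, B, C);
    printf("C[0] = %.1f (expect %.1f)\n", C[0], 2.0 * n);
    free(A); free(B); free(C);
    return 0;
}
```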
> I didn't make any predictions about performance. I made an analysis of how PL3 could work using Intel's available data and how that aluminum backplate can play a role in the device performance.

I never said you were making predictions about performance.
So what's the conclusion on the "for how long can Broadwell-M turbo" debate? Are we talking milliseconds, seconds or minutes on a standard Ultrabook?
Also, isn't it about time we saw actual devices becoming available, at least for review, by now? We're only 2 months away from Christmas... so if they actually want to sell B-M in quantity, there should be reviews available by now, so awareness can spread to potential buyers, and buyers have time to make a buying decision. And, yes, I'm impatient!
I was talking about me: I am not making any predictions on performance until actual devices arrive, for the exact reasons you mentioned. Too many balls in the air, too many unknowns; we can't solve for the answer.
And we will see real-world reviews from trusted sources in the next month, so I will be patient.
I don't know about Broadwell but PL3 doesn't seem to work like that for Haswell.
If PL3 power is set above PL2, it does nothing; PL2 limits instead. If it's set below PL2, the CPU can spike to max turbo for about 1ms, then average out, and it will do this continually to provide an average power between PL3 and PL2 in relation to the duty cycle. Unlike the time settings for PL1/PL2, PL3 seems to be based on how long it averages its duty cycles. For instance, setting 50ms with 10% duty will see a spike every 50ms, while setting a time for PL1/PL2 will let PL1/PL2 run for that time once the power limit has been reached, which IIRC works out to a maximum of ~42 days if continually running at the PL level, unless Intel programs a max window time (i.e. it is not set to zero).
Of course it could be that the Power Limiting in Haswell is broken and Broadwell might be different.
And yes, TDP is a specification, not the maximum power the processor can use. I think Intel includes that in all their datasheets. If power limits are unlocked / programmable for the device, then they can be set well above TDP to ensure no power throttling, provided everything else is kept in check.
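For the curious, the PL1/PL2 limits and time windows can be read straight from the RAPL MSRs, so you can check what a vendor has programmed. Below is a minimal sketch for Linux, based on the register layout documented in Intel's SDM (MSR_RAPL_POWER_UNIT at 0x606, MSR_PKG_POWER_LIMIT at 0x610); PL3 lives outside this register and is less well documented, so it's omitted. Run as root with the msr driver loaded.

```c
/* Decode PL1/PL2 from MSR_PKG_POWER_LIMIT (0x610) using the units in
 * MSR_RAPL_POWER_UNIT (0x606). Linux only: modprobe msr, run as root. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

static uint64_t rdmsr(int fd, uint32_t reg) {
    uint64_t v;
    if (pread(fd, &v, sizeof v, reg) != sizeof v) { perror("pread"); exit(1); }
    return v;
}

int main(void) {
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

    uint64_t units = rdmsr(fd, 0x606);
    double pw_unit = 1.0 / (1 << (units & 0xf));          /* watts per LSB   */
    double tm_unit = 1.0 / (1 << ((units >> 16) & 0xf));  /* seconds per LSB */

    uint64_t lim = rdmsr(fd, 0x610);
    double pl1 = (lim & 0x7fff) * pw_unit;         /* PL1 power, bits 14:0  */
    double pl2 = ((lim >> 32) & 0x7fff) * pw_unit; /* PL2 power, bits 46:32 */

    /* PL1 time window = 2^Y * (1 + Z/4) * time_unit (Y: bits 21:17,
     * Z: bits 23:22). With the default ~976us unit this encoding tops out
     * around the ~42 days mentioned above. */
    unsigned y = (lim >> 17) & 0x1f, z = (lim >> 22) & 0x3;
    double tw1 = (double)(1ull << y) * (1.0 + z / 4.0) * tm_unit;

    printf("PL1 = %.1f W (window %.3f s), PL2 = %.1f W\n", pl1, tw1, pl2);
    close(fd);
    return 0;
}
```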
> What device are you talking about ??

The tests were carried out on my laptop (i7-4700MQ).
> So what's the conclusion on the "for how long can Broadwell-M turbo" debate? Are we talking milliseconds, seconds or minutes on a standard Ultrabook?

Depends on the manufacturer. If the power delivery can be sustained and thermals can be kept in check, then indefinitely. However, Core M is for low-power devices, so you should expect some throttling to keep power levels efficient.
Maybe some pics will help explain a little better, using some old software which measures multiplier / load usage in real time. For these examples, sampling has been set to the maximum of 1ms. LFM to HFM in green, turbo in yellow.
LFM 8x
HFM 24x
Turbo 4 cores 34x
Turbo 1 core 36x
PL1 48W long time
PL2 50W long time
Constant work load on all logical processors.
PL3 40W ~50ms 10% duty cycle. Note the turbo spike to the maximum multiplier; the power for this short time is in excess of PL2, however PL3 will try to keep the average energy between PL3 (40W) and PL2 (50W) in accordance with the duty cycle. For a 10% duty cycle it will be near 40J, while for a 50% duty cycle near 45J (see the sketch after these examples).
PL3 40W ~50ms 80% duty cycle. I've cut this to the bottom logical processor so the pics don't take up too much space.
PL3 40W ~50ms 90% duty cycle.
After a while PL2 kicks in while running 90% duty cycle.
Running with PL3 disabled
Setting PL3 higher than PL2, PL3 60W ~50ms 50% duty cycle. No different to PL3 being disabled as power is clamped by PL2.
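To make that duty-cycle averaging concrete, here is a toy model of the behaviour measured above. The linear-average formula is my own reading of the 40J/45J numbers, not anything from Intel's documentation:

```c
/* Toy model of the PL3 duty-cycle averaging described above.
 * Assumption (mine, fitted to the measured numbers): the controller aims
 * for an average power of PL3 + duty * (PL2 - PL3) over its window. */
#include <stdio.h>

int main(void) {
    const double pl2 = 50.0, pl3 = 40.0;            /* watts, as tested  */
    const double duty[] = {0.10, 0.50, 0.80, 0.90}; /* duty cycles tried */

    for (int i = 0; i < 4; i++) {
        double avg_w = pl3 + duty[i] * (pl2 - pl3);
        /* over one second, joules == average watts */
        printf("duty %2.0f%% -> ~%.0f W (~%.0f J per second)\n",
               duty[i] * 100.0, avg_w, avg_w);
    }
    return 0;
}
```

This reproduces the observations: ~41J at 10% duty ("near 40J") and 45J at 50%.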
> Digging around a bit I was able to find this - http://sourceforge.net/p/math-atlas...7-669A-4488-9972-6326853C00E9@bigpond.net.au/ - which implies that for DGEMM at least the Geekbench results for A15 are about 1/5 of what's possible. So A15 should be 5x faster while Haswell should be 10x faster... yet more reason to pay zero attention to Geekbench for cross-platform results.

Again, the point is that GB isn't trying to get the best algo/implementation for each of the tests. And because one of the subtests is not fully tuned for your preferred architecture, should all of GB be dismissed? I'm not trying to say GB is the best benchmark or has no issues, just that you are likely too radical in your conclusion.
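To put "1/5 of what's possible" in context: attainable DGEMM throughput is easy to measure by timing a tuned BLAS (ATLAS, as in the linked thread, or OpenBLAS) and converting to GFLOPS. A rough sketch, assuming a CBLAS-providing library is installed (the build line is an assumption about your setup; adjust as needed):

```c
/* Rough DGEMM throughput check against a tuned BLAS.
 * Build (example): cc dgemm_bench.c -lcblas   (or -lopenblas, etc.) */
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const int n = 1024;                        /* arbitrary test size */
    double *A = malloc(sizeof(double) * n * n);
    double *B = malloc(sizeof(double) * n * n);
    double *C = calloc(n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 0.5; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("n=%d: %.3f s, %.2f GFLOPS\n",
           n, secs, 2.0 * n * n * n / secs / 1e9); /* GEMM is 2*n^3 flops */
    free(A); free(B); free(C);
    return 0;
}
```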
> Basically all that Geekbench tells us on cross-platform results is what would happen if software using some of these algorithms was written, tossed into a compiler, and released without regard for performance... While there's no question that such happens on occasion, what's the frequency of such occurring on non-trivial programs?

If you really think software developers optimize code, you couldn't be more wrong. The vast majority of devs either don't have time for that (all that matters is time to market), or simply don't even know what a cache or a pipeline is, or are using languages that hide the low-level details required to really tune things. And again, that's not the point of GB...
Yes, I know. But I'm looking for some ballpark figures on a typical Ultrabook design. Are we talking milliseconds, seconds, or minutes that Broadwell-M can turbo to max frequency?
> But if you only bench a single application for only a couple of hundred seconds, like Cinebench, then it could sustain PL2 or even PL3 for the entire benchmark run.

From the results on HSW, PL3 is great for some application that works periodically for a millisecond or so before sleeping, as it has a chance of hitting the maximum turbo above the PL2 power limit. Anything that runs longer is still constrained to an average power up to the PL2 limit. Still, 1ms can be a few million clocks. Maybe I should have also done some tests at fixed voltage to see if the latency in voltage stepping has some relevance. The IVR seems to do a nice job of control, though.
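For reference, the kind of workload being described (busy for about a millisecond, then asleep) looks something like the sketch below; the 1ms busy / 9ms idle split is illustrative, not from any of the tests above:

```c
/* Bursty load generator: ~1ms of work, then ~9ms of sleep, forever
 * (Ctrl-C to stop). Under PL3 each burst has a chance to hit max turbo. */
#include <stdint.h>
#include <time.h>

static double now_s(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    volatile uint64_t sink = 0;
    struct timespec pause = {0, 9 * 1000 * 1000};  /* 9 ms idle */

    for (;;) {
        double t0 = now_s();
        while (now_s() - t0 < 0.001)   /* ~1 ms busy burst */
            sink += sink * 31 + 7;     /* arbitrary work to keep a core busy */
        nanosleep(&pause, NULL);       /* let the average power fall back */
    }
}
```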
> 14nm's delay seems to be affecting Broadwell-E, which will enter mass production in Q1'16, apparently.
> http://chinese.vr-zone.com/131468/i...ass-production-at-2016-first-quater-10212014/

Let's see: we have the same socket, same core count and same TDP... hopefully clockspeeds are much higher?
> Makes sense. Intel was bragging about how amazing 22nm yields were on the last earnings call. They're going to milk 22nm for as long as possible.

They don't really have a choice, with 14nm not meeting yield expectations.
@Fjodor2001, not much info at this time. You might find this Yoga 3 Pro with 5Y70 review interesting. Some throttling and performance less than Intel's demo. Even has an internal cooling fan, pictured in the article.
Do be aware though that some forms of throttling may be due to external components rather than the CPU.
http://www.ultrabookreview.com/5486-lenovo-yoga-3-pro-review/
The throttling is insane. However, it looks like Lenovo has set the power limit to 3.5W.
There is the 12W PL3 spike.
It's bad, but also kinda insane, that Prime95 (an older version without AVX) can be run at 1.1 GHz on a 2C/4T system with a core power < 2W. Given the uncore and DRAM power, it looks like very little power is available to the cores or iGPU. Expect much better performance with a higher TDP.
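As a rough illustration of why a ~2W core budget pins the clock so low: dynamic power scales roughly as f*V^2, and voltage scales roughly with frequency in the DVFS range, so core power goes roughly as f^3. A back-of-envelope sketch calibrated to the observed ~1.1 GHz at <2W (my own toy model, not data from the review):

```c
/* Back-of-envelope: core power ~ k * f^3 (P ~ f * V^2 with V ~ f).
 * Calibrated to the observed point above (~1.1 GHz at ~2 W core power).
 * A toy model only -- real DVFS has static power, voltage floors, etc.
 * Build: cc freq_model.c -lm */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double f0 = 1.1, p0 = 2.0;        /* GHz, W: observed point */
    const double k = p0 / pow(f0, 3.0);

    for (double budget = 1.0; budget <= 8.0; budget += 1.0)
        printf("core budget %.0f W -> ~%.2f GHz (toy model)\n",
               budget, cbrt(budget / k));   /* f = (P/k)^(1/3) */
    return 0;
}
```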
I have to find myself partially agreeing with Charlie D. from SemiAccurate.
I personally think the push towards thin and light is good, but the Yoga 3 Pro shows what happens when you go way, way too far: compromised performance.
This is unacceptable.
> Again, the point is that GB isn't trying to get the best algo/implementation for each of the tests. [...]

In terms of cross-platform benchmarks? None are necessarily that good. I would say that SPECCPU is better than Geekbench, for the fact that it's an industry standard developed by a non-profit rather than a for-profit endeavor that specifically claims to be a 'cross-platform processor benchmark'. Sorry, but if Geekbench wants to be such, they need to ensure that their software is actually measuring the same effective performance across all platforms, not just throwing the same code at a compiler for each platform.
According to you, is SPECCPU a fair benchmark? And what is a good benchmark for you? What other benchmark exists that has a large DB of results and for which compilers were not overtuned?