Core M v.s. A8X in Geekbench 3

jfpoole · Oct 24, 2014

Intel17 said:
Gotcha. Which subtests (other than obviously crypto) would AArch32 show a significant improvement over ARMv7? Would it be the same floating point tests that show benefit in AArch64?

Only the AES and SHA-1 workloads show an improvement on AArch32 (thanks to the ARMv8 cryptography instructions). AArch32 doesn't include the extra registers or the double-precision NEON instructions that are part of AArch64.

Arachnotronic · Oct 24, 2014

jfpoole said:
Only the AES and SHA-1 workloads show an improvement on AArch32 (thanks to the ARMv8 cryptography instructions). AArch32 doesn't include the extra registers or the double-precision NEON instructions that are part of AArch64.

Thank you.

Nothingness · Oct 25, 2014

Enigmoid said:
Why does GB 3 gain so much on ARM 64 bit?

AArch64 brings IEEE754-compliant SIMD instructions, which helps some FP benchmarks.

IntelUser2000 · Oct 27, 2014

III-V said:
I'm not at all worried about Core M's performance. The problem seems strictly limited to the Yoga 3, despite having anything else with Core M to compare it to. If the manufacturer set the desired TDP at 3.5W, and the chip is indented to run at a desired TDP of 4.5W... well, there's not really much more that needs to be said. It's a 30% difference in TDP, despite being a single watt.

It's a chip that requires fan in a 13-inch form factor and costs in a device $1299 to do so. We're really stretching things into Core M's favor.

But the transition to FinFET with likely a new uArch will be an interesting comparison point compared to Skylake-Y, though I would expect Skylake to be quite a nice product.

I think what's going to happen is Intel is going to hype again using pretuned systems that aren't realistic of real world products and disappoint in reality.

We have a very real chance of ARM guys, not just Apple matching Intel's Core line of chips in the Skylake generation, if they haven't done that already with the A8X/TK1. Maybe all the way up to the U lines. Because its really isn't far off. Winning starts with synthetic first.

Really at this point it makes all the point about "x86 vs ARM ISA" and "Intel 14nm vs Competitors 14nm" really useless. What's the point of showing it on paper if the advantage isn't there? At best, the super pricey 5Y70 performs about 30% better than ARM competition, because the A8X is about on par with 5Y10. But being priced 10x more chip price does allow you to do wonderful things so it makes you wonder what Intel can pull out at 1x the pricing.

Accord99 · Oct 27, 2014

IntelUser2000 said:
Because its really isn't far off. Winning starts with synthetic first.

Only if both sides are equally optimized.

IntelUser2000 · Oct 27, 2014

Accord99 said:
Only if both sides are equally optimized.

Oh boy.

I think if Intel wants a device that really makes non-Apple vendors like Samsung and LG take second look at using it for their Tablets and Smartphones, they needed Core M 5Y10 performance in their Cherry Trail line, not in March of next year, but right now. Because that's where A8X and Tegra K1 is at.

Yet its questionable whether they can do it on a chip that costs 10x more than the competition.

Technical-wise, it won't give them much trouble, because they lived it through back in Netburst days. I was in denial those days, but its not much different now. That's beside the point.

Nothingness · Oct 27, 2014

jfpoole said:
iPhone 6 32-bit result:
http://browser.primatelabs.com/geekbench3/1073585

Interesting, thanks.

http://browser.primatelabs.com/geekbench3/compare/1114595?baseline=1073585

This shows a few things:

There's no AES/SHA1 support in 32-bit mode. Is this a hardware limitation, a tool limitation or a Geekbench limitation? I would expect HW support. Removing AES/SHA1 from scores gives a one core integer score of about 1570 for 64-bit and 1439 for 32-bit.
For integer, 64-bit doesn't change a lot of things for integer code. Two outliers are Dijkstra which slow downs (I guess this is due to heavy cache traffic; pointer chasing due to the algorithm used?); Bzip2 decompress is significantly faster, any hint why this is so?
I'm surprised stream copy is significantly faster in 64-bit mode. gcc optimizes away (standard) stream copy loop to memcpy, so I'd expect optimized implementation to be run for both 32- and 64-bit code, and I'd expect clang to do the same. Can't 32-bit code saturate the memory controller or is there something else slowing it down specific to Geekbench stream copy implementation?

Again thanks for posting information about Geekbench

III-V · Oct 27, 2014

IntelUser2000 said:
It's a chip that requires fan in a 13-inch form factor and costs in a device $1299 to do so. We're really stretching things into Core M's favor.

The Yoga 3 is DOA. That doesn't mean Core M is DOA. The doom and gloom you're spouting over a sample size of one is ridiculous.

Lepton87 · Oct 27, 2014

IntelUser2000 said:
Technical-wise, it won't give them much trouble, because they lived it through back in Netburst days. I was in denial those days, but its not much different now. That's beside the point.

In denial? Did you believe that their than current CPUs like the first incarnation(Willamette) outmatched AMD's counterparts and their own predecessors or was it less radical and you simply believed that high clock-speed achieved through very deep pipeline was the right approach and later iterations would improve tremendously on the idea. To be fair I think it was quite a feat what they have done with a Northwood to a Prescott transition. Yes, it was a massive failure like BD, better process node and yet higher power draw. But, that's not my point. I'm impressed that they have lengthen the pipeline from 20 stages to 31-stage pipeline and they hardly lost any IPC, unfortunately the clock-speeds didn't increase as much as the pipeline length. Northwood topped out at 3.4 GHz if Prescott could actually clock significantly higher as was imagined back then it would be a success. Prescott at 4.5GHz would be what intel supporters imagined as Northwood successor which was pretty successful, Northwood C was actually the best CPU at the time better than K7. It took K8 to surpass it, while AMD improved upon their predecessor a lot Intel didn't but up until the release of K8 the CPUs were surprisingly competitive. Even K8 didn't manage to entirely dominate P4 because of its HT. Pentium D vs Athlon X2 gave a clear winner but like before it wasn't entirely dominated by Athlon there were benchmarks were Pentium D was faster and the price... I remember that X2s were very expensive like HW-E is now. I had X2 3800+ but from what I remember Pentium D was a hell of a lot cheaper and actually a better value and AMD's X2s were a premium products, it's hard to imagine AMD in that role now, even in graphics they threw in the towel and don't compete with Maxwell. Although r290 is a tremendous value after price cuts especially two of them, I would definately chose 2x290 over a GTX980 which is very overpriced even compared to NV's own cards but that's the way it's always been with NV. Remember GTX280 and then radeon 4870 which offered similar performance for half the price? Well, we don't need to go that far back in time, Titan is a great recent example of NV fleecing the customers of their premium products.

Arachnotronic · Oct 27, 2014

III-V said:
The Yoga 3 is DOA. That doesn't mean Core M is DOA. The doom and gloom you're spouting over a sample size of one is ridiculous.

Eh, I don't think Core M will be all that impressive in real life. It doesn't integrate many of the SoC functions needed to be a premium tablet chip, and has that darn 32nm PCH sitting there. Not only does Intel lose efficiency by the mere fact that it's built on 32nm, but I'd imagine there is a power overhead that comes from the communication between PCH and CPU/GPU complex.

Intel won't even be remedying this with Skylake, which will also have a separate on-package PCH (Source: http://www.cpu-world.com/news_2014/2014062601_More_details_on_Skylake_processors.html). I hope it's at least 22-nanometer.

Also, it's pretty easy to dismiss a single result as non-representative, but I think as more Core M systems come out, they'll simply reaffirm the Yoga 3 Pro results.

liahos1 · Oct 27, 2014

maybe we should wait for the ASUS Core M products to come out. they seem to more closely ape the Intel FFRD for core m?

Arachnotronic · Oct 27, 2014

liahos1 said:
maybe we should wait for the ASUS Core M products to come out. they seem the more closely ape the Intel FFRD for core m?

Maybe so.

I'm really just interested in a Broadwell-based rMBA, though with a 15W "U" processor.

liahos1 · Oct 27, 2014

still its an interesting conundrum. intel puts out a FFRD that should perform in a certain way. OEMs then put out different designs to differentiate and hit certain parts of the market - you'd think a company with Lenovo's resources would be able to build something that did not show degradation in performance.

Lepton87 · Oct 27, 2014

Intel17 said:
Eh, I don't think Core M will be all that impressive in real life. It doesn't integrate many of the SoC functions needed to be a premium tablet chip, and has that darn 32nm PCH sitting there. Not only does Intel lose efficiency by the mere fact that it's built on 32nm, but I'd imagine there is a power overhead that comes from the communication between PCH and CPU/GPU complex.

Intel won't even be remedying this with Skylake, which will also have a separate on-package PCH (Source: http://www.cpu-world.com/news_2014/2014062601_More_details_on_Skylake_processors.html). I hope it's at least 22-nanometer.

Also, it's pretty easy to dismiss a single result as non-representative, but I think as more Core M systems come out, they'll simply reaffirm the Yoga 3 Pro results.

That's a shocker, I just assumed that because of the mere fact that it's a tablet chip it would have integrated PCH. I was quite excited about Core M because of it being built on a special power optimised kick-ass 14nm process etc. And now I learn that they ruined it by giving it an off die PCH that's built on a comparatively power hungry node. Is it at least a power optimised variant of its 32nm process? I hope it doesn't completely ruin the CPU like the acient platform ruined the early ATOM because its chipset consumed much more power than the CPU itself.

Arachnotronic · Oct 27, 2014

Lepton87 said:
That's a shocker, I just assumed that because of the mere fact that it's a tablet chip it would have integrated PCH. I was quite excited about Core M because of it being built on a special power optimised kick-ass 14nm process etc. And now I learn that they ruined it by giving it an off die PCH that's built on a comparatively power hungry node. Is it at least a power optimised variant of its 32nm process? I hope it doesn't completely ruin the CPU like the acient platform ruined the early ATOM because its chipset consumed much more power than the CPU itself.

witeken · Oct 27, 2014

Sure, let's all jump on the fear and doubt bandwagon and pretend that the 32nm PCH is going to make Core M dead on arrival, that the 14nm process advantage doesn't exist, that Intel doesn't have a world leading CPU architecture, that Intel doesn't have a mobile-focused GPU architecture with a lot of GFLOPS.

It worries me a bit that Lenovo put a fan in the Pro 3, but I'm going to wait to see how other products perform.

III-V · Oct 28, 2014

Intel17 said:
Eh, I don't think Core M will be all that impressive in real life. It doesn't integrate many of the SoC functions needed to be a premium tablet chip, and has that darn 32nm PCH sitting there. Not only does Intel lose efficiency by the mere fact that it's built on 32nm, but I'd imagine there is a power overhead that comes from the communication between PCH and CPU/GPU complex.

Intel won't even be remedying this with Skylake, which will also have a separate on-package PCH (Source: http://www.cpu-world.com/news_2014/2014062601_More_details_on_Skylake_processors.html). I hope it's at least 22-nanometer.

Also, it's pretty easy to dismiss a single result as non-representative, but I think as more Core M systems come out, they'll simply reaffirm the Yoga 3 Pro results.

Skylake's PCH might be due for a move to 22nm, but Intel's PCH roadmaps are a bit... fuzzy. Intel only just moved to 32nm with Haswell, and prior to that, they were on 65(!!!) nm.

Arachnotronic · Oct 28, 2014

III-V said:
Skylake's PCH might be due for a move to 22nm, but Intel's PCH roadmaps are a bit... fuzzy. Intel only just moved to 32nm with Haswell, and prior to that, they were on 65(!!!) nm.

PCH on 22nm + 22nm eDRAM on the Skylake ULT models should be nice fab fillers.

witeken · Oct 28, 2014

III-V said:
Skylake's PCH might be due for a move to 22nm, but Intel's PCH roadmaps are a bit... fuzzy. Intel only just moved to 32nm with Haswell, and prior to that, they were on 65(!!!) nm.

They obviously move 2 nodes down at a time. Cannonlake will move to 14nm and Core M will probably be integrated.

III-V · Oct 28, 2014

Intel17 said:
PCH on 22nm + 22nm eDRAM on the Skylake ULT models should be nice fab fillers.

Definitely. I don't think they really need to go to 22nm though. It'd be real nice for U-series parts, but Y-series parts should be using a cut-down, integrated PCH anyway.

jfpoole · Oct 28, 2014

Nothingness said:
Interesting, thanks.

http://browser.primatelabs.com/geekbench3/compare/1114595?baseline=1073585

This shows a few things:

There's no AES/SHA1 support in 32-bit mode. Is this a hardware limitation, a tool limitation or a Geekbench limitation? I would expect HW support. Removing AES/SHA1 from scores gives a one core integer score of about 1570 for 64-bit and 1439 for 32-bit.

For integer, 64-bit doesn't change a lot of things for integer code. Two outliers are Dijkstra which slow downs (I guess this is due to heavy cache traffic; pointer chasing due to the algorithm used?); Bzip2 decompress is significantly faster, any hint why this is so?

I'm surprised stream copy is significantly faster in 64-bit mode. gcc optimizes away (standard) stream copy loop to memcpy, so I'd expect optimized implementation to be run for both 32- and 64-bit code, and I'd expect clang to do the same. Can't 32-bit code saturate the memory controller or is there something else slowing it down specific to Geekbench stream copy implementation?

Again thanks for posting information about Geekbench

This is a toolchain limitation.
Dijkstra's data structures contain a lot of pointers, so it's a combination of increased memory transfer and increased cache pressure. I don't know why BZip2 Decompress is faster off the top of my head.
I believe Clang performs the same optimization, which suggests that 32-bit memcpy() is slower than 64-bit memcpy(). Without knowing any of the details of the memcpy() implementations it's hard to say if this is an issue with 32-bit code not being able to saturate the memory controller or if it's an issue with the 32-bit implementation of memcpy() on 64-bit hardware.

IntelUser2000 · Oct 30, 2014

Lepton87 said:
In denial? Did you believe that their than current CPUs like the first incarnation(Willamette) outmatched AMD's counterparts and their own predecessors or was it less radical and you simply believed that high clock-speed achieved through very deep pipeline was the right approach and later iterations would improve tremendously on the idea.

The latter. Well, actually the former too in the hypothetical future world where they would increase clock speeds by 600MHz every 3 quarters or so and reach 6-7GHz with Prescott.

It was when Core 2 was out and the subsequent years after that how bad Netburst was and every supporter of Intel at that time were completely misled by the marketing department, and technical department, and the managers, and well... everybody.

Good products have a CLEAR lead. They come out faster because they aren't delayed, they use slightly less power than marketed, they perform little better than they are marketed and even the products are better positioned.

There's a slide by Intel(hardware.fr used to have it I think but they seemed to have pulled it) that says that subsequent iterations of running benchmarks would reduce performance by 30%! That's a difference of getting 2.8 points in Cinebench and dropping down to 1.96!

Now I see the point of using thermal headroom to boost performance. Sandy Bridge did it well. But on a $200-300 chip that's going to run applications that's demanding and long running throttling like Core M is unacceptable. Yes you heard me right. Throttling! It's not a 2.8 point Cinebench R11.5 chip, its a 2.0 one.

liahos1 said:
maybe we should wait for the ASUS Core M products to come out.

That sucks too. It's a limitation of trying to squeeze too-hot of a chip into a system that's too-thin.

http://tweakers.net/reviews/3751/4/...processor-getest-synthetische-benchmarks.html

III-V said:
The doom and gloom you're spouting over a sample size of one is ridiculous.

Two. Look above.

witeken said:
Sure, let's all jump on the fear and doubt bandwagon and pretend that the 32nm PCH is going to make Core M dead on arrival, that the 14nm process advantage doesn't exist, that Intel doesn't have a world leading CPU architecture, that Intel doesn't have a mobile-focused GPU architecture with a lot of GFLOPS.

Since Intel's process "LEAD" is so closely correlated to co-optimizing between the process and logic design teams, they are one and the same. Intel claims 3.5 years lead in process. That's 2 generations worth.

Each generation brings 20% perf or 30% power reduction. The options are 1. 1.44x perf gain 2. 0.49x the power 3. 1.2x perf gain and 0.7x the power. Now that's not what we are seeing whether we look at Bay Trail, Core M, or Haswell U/Y parts against TK1 and A8X.

I think they used to have "3.5 years" lead when comparing against significantly underfunded teams like AMD. So their co-optimizing makes it so it has double the advantage, one in design and one in process. Against much better funded ARM designers they are probably 1 generation(or 2 years) at the most. Realistically I think its about a year lead, exactly corresponding to the timeline difference of Intel's 14nm and others 14nm, or Intel 22nm vs Competitor 20nm. That's a 10% perf gain or 15% power reduction. Not enough to overcome other factors like poor design and positioning.

Core M v.s. A8X in Geekbench 3

Member

Lifer

Platinum Member

Elite Member

Platinum Member

Elite Member

Platinum Member

Senior member

Platinum Member

Lifer

Senior member

Lifer

Senior member

Platinum Member

Lifer

Diamond Member

Senior member

Lifer

Diamond Member

Senior member

Member

Elite Member