Geekbench 3 Sandy Bridge vs. Apple Cyclone IPC comparison

FwFred · Sep 20, 2013

Impressive IPC on Geekbench, but if we are going to start comparing vs. Core I think we need more than toy benchmarks. I'm not convinced even POWER8 would look that much better @ 1.3GHz on these.

Also, I think there is too much focus on IPC. We know much more about Core, Atom, Krait, and A15's overall design parameters. We know next to nothing about Cyclone. What frequency can it run at in an iPad, and with how many cores? If Cyclone is a 1.4-1.5GHz dual core in an iPad, I think we can stop the silly comparisons to Core and focus on Bay Trail, Snapdragon 800, Tegra 4.

Another interesting point, if Cyclone is locked at 1.3GHz the entire run, why is such a capable design team leaving so much performance on the table? If you don't have a good turbo-like capability, you are behind the curve.

seitur · Sep 20, 2013

Idontcare said:
I wouldn't mind if Apple steps in to fill the competitive void that AMD has created. Anything to keep the MPU business' feet to the fire.

I wonder how much of the A7 was Jim Keller's handiwork (in terms of project management), and if so then I wonder how much of it might bleed over into future AMD chips?

+1

Pretty much this.

PS. Couple this with those rumors about Apple sniffing around the fab business and we could have an interesting Intel vs. Apple battle in the future.

mikk · Sep 20, 2013

Here is a Geekbench 3 result from a 1.3 GHz dual-core Haswell (DDR3-2133): http://browser.primatelabs.com/geekbench3/71024

According to Geekbench 3 Haswell IPC is slightly worse than Cyclone. It's safe to say this must be pure nonsense.

Blitzvogel · Sep 20, 2013

I've noticed that Geekbench loves on the mobile SoCs.

Nothingness · Sep 20, 2013

mikk said:
Here is a Geekbench 3 result from a 1.3 GHz dual-core Haswell (DDR3-2133): http://browser.primatelabs.com/geekbench3/71024

According to Geekbench 3 Haswell IPC is slightly worse than Cyclone. It's safe to say this must be pure nonsense.

That or you don't get that Apple traded frequency for IPC. Is that so hard to understand and admit?

mikk · Sep 20, 2013

Nothingness said:
That or you don't get that Apple traded frequency for IPC. Is that so hard to understand and admit?

Considering that Geekbench v2 gave us flawed scores, it is not hard to understand that Geekbench produces some nonsense.

Headfoot · Sep 20, 2013

Nothingness said:
That or you don't get that Apple traded frequency for IPC. Is that so hard to understand and admit?

Go look at the execution resources purportedly inside of Cyclone vs those inside of Haswell. It's literally impossible for it to have higher IPC than Haswell absent a complete misstep on Intel's part or poor software compilation/utilization of those resources. Now which of those seems more likely?

epidemis · Sep 20, 2013

Ventanni said:
I agree, but it's not about how you'll never run SB at 1.6GHz, but a near clock-for-clock comparison of the two architectures.

Cyclone is one sick chip, but we [all] also have to remember that SB is two generations old too.
He's got a point.

Intel is going to lose the MacBook Air business soon.

Exophase · Sep 20, 2013

It would be good if people used the correct term here, perf/MHz, not IPC. The two processors aren't even running the same instructions.

Headfoot said:
Go look at the execution resources purportedly inside of Cyclone vs those inside of Haswell. It's literally impossible for it to have higher IPC than Haswell absent a complete misstep on Intel's part or poor software compilation/utilization of those resources. Now which of those seems more likely?

Execution resources purportedly inside of Cyclone according to whom? Apple hasn't said anything, I don't believe compiler source is revealing anything, and we don't even have die shots.

There is also more to perf/MHz than execution resources, and this is what Nothingness is saying. The other side of it is latency and stalls. A processor designed for a lower maximum clock speed has latencies that correspond to a smaller number of clock cycles. The fastest critical path operations in a synchronous CPU design will almost always be designed to take a fixed number of cycles, regardless of clock speed.

Let's take an extreme look at this: a processor that runs at 1MHz. You could probably design it so that accesses straight to main memory only took one cycle, no cache or prefetching needed. And forget about worrying about branch misprediction. Now take it even further: let's make this a 100kHz processor that's really the same 1MHz processor internally. Now it can seemingly execute 10 instructions simultaneously, regardless of dependencies. There should be little doubt that this processor would easily exceed the perf/MHz of Haswell.

That's a very extreme and unrealistic example to illustrate a point. But even going from a design target of ~4GHz to ~1.5GHz gives you a pretty big advantage in how you accommodate timings.
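
To put rough numbers on the latency argument, here is a minimal sketch. The 80 ns memory latency is a made-up illustrative figure, not a measurement of any real part, and `latency_cycles` is a hypothetical helper name. It only shows that a fixed wall-clock delay costs fewer cycles at a lower clock.

```python
import math


def latency_cycles(latency_ns: int, clock_mhz: int) -> int:
    """Whole clock cycles needed to cover a fixed wall-clock latency.

    One cycle lasts 1000 / clock_mhz nanoseconds, so the cycle count is
    latency_ns * clock_mhz / 1000, rounded up to a whole cycle.
    """
    return math.ceil(latency_ns * clock_mhz / 1000)


# Assumed (hypothetical) main-memory latency in nanoseconds.
MEMORY_LATENCY_NS = 80

# Clock targets taken from the discussion: the 1 MHz thought experiment,
# a ~1.3 GHz mobile part, and a ~4 GHz desktop design target.
for clock_mhz in (1, 1300, 4000):
    cycles = latency_cycles(MEMORY_LATENCY_NS, clock_mhz)
    print(f"{clock_mhz:>5} MHz: {cycles} cycle(s) to cover {MEMORY_LATENCY_NS} ns")
```

Under that assumption the same memory access fits in a single cycle at 1 MHz but costs roughly three times as many cycles at 4 GHz as at 1.3 GHz, which is the stall cost a high-clock design has to hide with caches and prefetching.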

Magic Carpet · Sep 20, 2013

meloz said:
While 'Cyclone' is impressive enough as a mobile part, these types of comparison between such vastly different architecture designs do not really tell us anything.

SNB was not designed for 1.6 GHz. It was primarily designed to operate at twice that speed. Of course when you restrict it to lower speeds and TDPs a whole lot of lesser powered architectures become competitive.

But while a SNB can always clock down to 1.6 GHz, can a 'Cyclone' clock up to 3.2 GHz? Operating at lower frequencies is easy for a CPU that is designed for higher ones, but vice versa might not be possible and -even if possible- might not yield expected linear performance gains with frequency since there might be architectural bottlenecks fundamental to the design.

Aside, you were wise to not benchmark Cyclone against Haswell running software compiled with AVX2 support. Even at 1.6 GHz Haswell would utterly destroy the Cyclone.

This.

And thanks to Intel17 for an interesting thread, I've enjoyed reading it

meloz · Sep 20, 2013

Exophase said:
It would be good if people used the correct term here, perf/MHz, not IPC. The two processors aren't even running the same instructions.

Right you are.

And even performance/MHz is irrelevant in the bigger picture. Two processors can achieve similar performance, one at 1 GHz and the other at 1.5 GHz.

Which is better?

The answer is whichever consumes less power and/or costs less, depending on how much weight you give to performance/watt and performance/$. One of the reasons Xeon continues to rule the server segment in spite of the absurd profit margin Intel enjoys on each CPU is that it gives strong performance/watt and also strong density. Sure, a single Xeon might consume 130 watts at peak load, but it also does a lot of work with that power.
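
A minimal sketch of that comparison, using two entirely hypothetical chips (the scores, clocks, wattages and prices below are invented for illustration and do not describe any real processor). It shows how the chip with the better perf/MHz can still lose on the metrics that decide the purchase.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    score: float      # benchmark score (arbitrary units)
    clock_mhz: float  # clock during the run
    watts: float      # power drawn during the run
    price_usd: float  # unit price

    def perf_per_mhz(self) -> float:
        return self.score / self.clock_mhz

    def perf_per_watt(self) -> float:
        return self.score / self.watts

    def perf_per_dollar(self) -> float:
        return self.score / self.price_usd


# Hypothetical parts with identical performance at different clocks.
chip_a = Chip("A", score=1000, clock_mhz=1000, watts=4.0, price_usd=40.0)
chip_b = Chip("B", score=1000, clock_mhz=1500, watts=2.0, price_usd=25.0)

for chip in (chip_a, chip_b):
    print(
        f"{chip.name}: perf/MHz={chip.perf_per_mhz():.2f} "
        f"perf/W={chip.perf_per_watt():.0f} "
        f"perf/$={chip.perf_per_dollar():.0f}"
    )
```

With these made-up numbers chip A has the higher perf/MHz, while chip B is ahead on both perf/watt and perf/$.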

Aside, this is also what is so funny about Intel's Baytrail: everyone is going on about whether or not Intel have done enough to match or beat the competition in performance/watt. Again, after a certain performance/watt point it does not matter: the new Atom is already looking like a failure thanks to Intel's pricing. Unless they change their attitude Intel should not even bother with Airmont, that will be an even bigger failure.

I sound like an utter sourpuss, sorry Intel17, but you have made the typical moped versus bus comparison to calculate the cost per person per mile. And in this case the poor bus is artificially constrained to half its capacity. Sure, lots of data, and all of it utterly meaningless.

FwFred · Sep 20, 2013

meloz said:
Right you are.
the new Atom is already looking like a failure thanks to Intel's pricing. Unless they change their attitude Intel should not even bother with Airmont, that will be an even bigger failure.

Where have you seen anything concrete about Baytrail pricing? All I've seen is that we should expect Baytrail in $199 netbooks. I doubt they are charging anything exorbitant.

Did Intel say what part will be in the $99 tablets? Regardless of the part, I fail to see any evidence Intel is more expensive than Qualcomm/Nvidia/Samsung. Mediatek/Rockchip/Allwinner may be another story.

edit: dailytech says $99 Baytrail tablets

Headfoot · Sep 20, 2013

Exophase said:
It would be good if people used the correct term here, perf/MHz, not IPC. The two processors aren't even running the same instructions.

Execution resources purportedly inside of Cyclone according to whom? Apple hasn't said anything, I don't believe compiler source is revealing anything, and we don't even have die shots.

There is also more to perf/MHz than execution resources, and this is what Nothingness is saying. The other side of it is latency and stalls. A processor designed for a lower maximum clock speed has latencies that correspond to a smaller number of clock cycles. The fastest critical path operations in a synchronous CPU design will almost always be designed to take a fixed number of cycles, regardless of clock speed.

Let's take an extreme look at this: a processor that runs at 1MHz. You could probably design it so that accesses straight to main memory only took one cycle, no cache or prefetching needed. And forget about worrying about branch misprediction. Now take it even further: let's make this a 100kHz processor that's really the same 1MHz processor internally. Now it can seemingly execute 10 instructions simultaneously, regardless of dependencies. There should be little doubt that this processor would easily exceed the perf/MHz of Haswell.

That's a very extreme and unrealistic example to illustrate a point. But even going from a design target of ~4GHz to ~1.5GHz gives you a pretty big advantage in how you accommodate timings.

So you honestly believe Cyclone performs better clock for clock than Haswell?

Blandge · Sep 20, 2013

There is so much that goes into comparing microarchitectures that it's impossible to determine relative performance without knowing more details. Specifically, the workload and compiler can completely deform the functionality of any computation engine. Modern CPUs have so many gotchas that if a workload or compiler does something the wrong way it can completely tank performance. We can't say for sure that anything like that is happening here, but it is possible that Cyclone may perform better than Haswell in some specific use cases.

The important question is how they perform when using a mature software stack that does useful work. Anything else is purely academic, and nearly completely useless unless we have ALL of the important details in hand to analyze microarchitectural oddities. Is the code and/or disassembly of the sub-benchmarks used in this Geekbench version available for analysis? That would be interesting to see.

ARM likes to use benchmarks like Geekbench to show how close their performance is to Intel when doing something like simple math in a loop, or some highly optimized single-threaded encryption algorithm. This is something that their microarchitecture is designed to do, and indeed it does it very well (even compared to huge x86 cores), but it gives the false impression that you have a central processing unit that can do what has taken AMD and Intel 20 years to achieve. However, once you want to do something real that requires multiple cores running different software that has less-than-perfect optimization (coded by some intern), swapping threads in and out every couple of microseconds, moving data from cache to cache, core to core, getting stuck at the back of some buffer in the sideband fabric that's being held up by an interrupt that is waiting for other interrupts, with data flying in from all sources of IO through USB, PCIe, SATA, and whatever other orifices make up the device, you finally understand that all of that purely academic research you have been doing counts for nothing, and you are left with a steaming pile of garbage that costs $15 and can do simple math really well.

This is the exact reason why until recently ARM has been considered a second class citizen that is reserved for microcontrollers with extremely tight, highly optimized hand-written assembly (or maybe something fancy like C), and this is the exact reason why a lot of us have a hard time being impressed by someone posting some good benchmark scores. Let me take a snapshot of my desktop at 11pm on a Saturday night and see how well Cyclone handles Firefox with 20 tabs, Excel, Word, Battlefield 3, antivirus, Skype and 50 other processes running on top of Windows 8. But at least it can do SHA encryption really fast.

The impressive part about modern x86 CPUs (AMD and Intel) isn't that they have a "really high IPC". It's that they can handle the chaos that is the Desktop OS with grace, and also service a whole range of workloads from handsets to servers and perform admirably. If all you care about is SHA256 encryption or Sobel edge detection then I can make you an ASIC that will blow your socks off.

Moral of the story: GPUs and the various other co-processors on an SoC do all of these operations in Geekbench way better than a general-purpose CPU. The CPU is never going to do any of the things on this list for any significant amount of time, so why do we even care?

Important things: Multitasking, Locking, synchronization and coherency behavior, Interrupts, IO and Memory Bandwidth and Latency, Caching, Branch Prediction and stalls.

Not Important things: Mathematical calculations that a GPU or other co-processors will do better.

Obviously it all depends on the use case, and in many cases an ARM microprocessor is the best choice, but this post was a response to "ARM can replace Haswell. Source: These Geekbench scores"

mikk · Sep 20, 2013

meloz said:
the new Atom is already looking like a failure thanks to Intel's pricing.

What Intel pricing are you referring to?

Khato · Sep 20, 2013

FwFred said:
Another interesting point, if Cyclone is locked at 1.3GHz the entire run, why is such a capable design team leaving so much performance on the table? If you don't have a good turbo-like capability, you are behind the curve.

This point has been bugging me as well, especially when comparing the Geekbench component scores for the iPhone 5 vs iPhone 5s when both run 32-bit. If you've ever compared a processor against its previous generation at the same frequency, well, typically you'll see a few tests that see 30%+ gains, most in the 0-10% range, a few actually decreasing in performance, and then 50%+ gains on floating point from doubling width/number of execution units. But comparing A6 against A7 has every single test except for the floating point Mandelbrot showing at least a 20% performance gain.

Now I could see how they'd get that kind of performance gain from one generation to the next if either the previous generation was extremely non-optimized or if it wasn't a refinement of the previous design. But neither of those applies in this case - Swift was already a pretty good design, and unless Apple has at least two separate CPU design teams working in parallel I don't see how they'd have had the time to do anything more than improve upon Swift.

But an interesting thing happens if you adjust the numbers for A7 to assume a 'turbo' in the range of 1.6-1.7 GHz - they suddenly look like what you'd expect to see. A few of the integer tests showing 30%+ gains, with most in the single digit range and the rest staying in the margin of error/showing slight reductions. Likewise, the floating point tests are then single digit gains/roughly even for those that aren't affected by whatever A7 doubled, a marked reduction for Mandelbrot implying that a shift in floating point resources created a contention, and 50%+ gains in the remaining tests thanks to taking advantage of doubled resources.
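
The adjustment described above is just a rescaling by the clock ratio. Here is a minimal sketch, assuming the A6 baseline ran at the same nominal 1.3 GHz and using hypothetical observed gains (1.20x, 1.30x, 1.60x) rather than actual Geekbench sub-scores; `per_clock_gain` is an illustrative helper name.

```python
def per_clock_gain(observed_gain: float, nominal_mhz: int, actual_mhz: int) -> float:
    """Rescale a same-clock performance ratio for a suspected higher real clock.

    observed_gain is new_score / old_score computed as if both chips ran at
    nominal_mhz. If the new chip really ran at actual_mhz, its per-clock
    improvement shrinks by the factor nominal_mhz / actual_mhz.
    """
    return observed_gain * nominal_mhz / actual_mhz


NOMINAL_MHZ = 1300                  # reported A7 clock; A6 assumed equal
SUSPECTED_TURBO_MHZ = (1600, 1700)  # the conjectured 'turbo' range

# Hypothetical observed gains, for illustration only.
for observed in (1.20, 1.30, 1.60):
    for turbo_mhz in SUSPECTED_TURBO_MHZ:
        adjusted = per_clock_gain(observed, NOMINAL_MHZ, turbo_mhz)
        print(
            f"observed {observed:.2f}x at an assumed {turbo_mhz} MHz "
            f"-> {(adjusted - 1) * 100:+.1f}% per clock"
        )
```

With these inputs a 20% observed gain turns into a slight per-clock reduction and a 30% gain drops to single digits or less, which is the pattern the conjecture relies on. It does not show that a turbo mode actually exists.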

The above is, of course, just conjecture based on what fits the results. While it'd be quite odd, it's certainly possible that Apple somehow managed that level of performance increase without touching frequency... and then got even more performance on top of that when running the tests in the 64 bit mode. To put the magnitude of the gain that the A7 in 64 bit holds over the A6 in these tests (ignoring AES and SHA1 as they're clearly benefitting from acceleration) into perspective - it's comparable to the gains seen going from a 3 GHz Pentium 4 to a 3 GHz Core 2 Duo... and we all know that gain was only possible because of how poor of a Performance/MHz design the Pentium 4 was.

NostaSeronx · Sep 20, 2013

Geekbench 3:
ICC/VS for Windows
GCC for Android/Linux
LLVM Clang for iOS

Nothingness · Sep 20, 2013

NostaSeronx said:
Geekbench 3:
ICC/VS for Windows
GCC for Android/Linux
LLVM Clang for iOS

This is wrong:
- Windows: VS
- Android: gcc
- Linux/iOS: clang

mikk · Sep 20, 2013

Nothingness said:
This is wrong:
- Windows: VS
- Android: gcc
- Linux/iOS: clang

Source?

CHADBOGA · Sep 20, 2013

epidemis said:
Intel is going to lose the MacBook Air business soon.

No it won't.

mikk said:
What Intel pricing are you referring to?

The pricing he has no knowledge of, so has just made up.

Exophase · Sep 20, 2013

Nothingness said:
This is wrong:
- Windows: VS
- Android: gcc
- Linux/iOS: clang

He believes that Visual Studio uses ICC for its code generation, which we both know is wrong (he states a lot of weird and false things as fact w/o source, readers should take caution)

This is the source for the compilers used for Geekbench: http://www.realworldtech.com/forum/?threadid=135540&curpostid=136174

CHADBOGA said:
The pricing he has no knowledge of, so has just made up.

Possibly also pricing Intel has published:

http://ark.intel.com/products/76760/Intel-Atom-Processor-Z3770-2M-Cache-up-to-2_39-GHz

$37 isn't that bad but it's still probably a decent bit more expensive than most competitors out right now. Actual big OEMs may be paying less.

mikk · Sep 20, 2013

Exophase said:
Possibly also pricing Intel has published:

http://ark.intel.com/products/76760/Intel-Atom-Processor-Z3770-2M-Cache-up-to-2_39-GHz

$37 isn't that bad but it's still probably a decent bit more expensive than most competitors out right now. Actual big OEMs may be paying less.

You can be sure that big OEMs like Acer, Asus etc. get nice discounts because they need far more than 1k CPUs. Bay Trail-T prices are lower than its predecessor's, and low enough that the CPU itself accounts for a small portion of the overall device price. Even when some competitors are $10 cheaper, it makes a small difference to the overall tablet price. As long as Intel has superior performance or superior perf/watt, a small extra charge is worth it. Furthermore, all announced Bay Trail-T tablets are Windows 8 devices, and a Win8 license is not free. Android tablet prices should be even lower than this:

Acer: Updated version of 8-inch "Bay Trail" W3-810. Battery: 8 hours. Price: $349
ASUS: 10.1-inch "Bay Trail" Transformer Book Trio T100TA. Battery: 12 hours. Price: $329. (This is a complement to the pricier 13.3-inch Transformer Book.)
Dell: 8-inch "Bay Trail" Venue. Battery: 10+ hours. Price: $299
Dell: 10.8-inch codenamed "Midland" running "BayTrail." Battery life: 9 hours (replaceable). Price: $399
Lenovo: 8-inch "Bay Trail" Miix 8. Battery: 8 hours. Price: $249
Lenovo: 10.1-inch "Bay Trail" Miix 2. Battery: 8 hours. Price: $449.
Toshiba: 8-inch "Bay Trail" Encore. Battery: 6-7 hours. Price: $329
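
As a rough check on how much a $10 difference matters, the sketch below takes the cheapest and most expensive tablets listed above and the $37 list price for the Z3770 quoted earlier in the thread. Actual OEM pricing is unknown, so treat the CPU share as an upper-bound illustration; `share_of_price` is a hypothetical helper name.

```python
def share_of_price(component_usd: float, device_usd: float) -> float:
    """Fraction of the device's retail price taken up by one component cost."""
    return component_usd / device_usd


LIST_PRICE_USD = 37  # published Z3770 list price quoted in the thread
PRICE_GAP_USD = 10   # the "competitors are $10 cheaper" scenario

# Cheapest and most expensive tablets from the list above.
tablets = {"Lenovo Miix 8": 249, "Lenovo Miix 2": 449}

for name, retail_usd in tablets.items():
    cpu_pct = share_of_price(LIST_PRICE_USD, retail_usd) * 100
    gap_pct = share_of_price(PRICE_GAP_USD, retail_usd) * 100
    print(f"{name} (${retail_usd}): CPU at list = {cpu_pct:.1f}%, $10 gap = {gap_pct:.1f}%")
```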

CHADBOGA · Sep 20, 2013

Exophase said:
Actual big OEMs may be paying less.

You don't say.

So I repeat, for your benefit it seems:

The pricing he (and YOU) has no knowledge of, so has just made up.

NostaSeronx · Sep 20, 2013

Exophase said:
He believes that Visual Studio uses ICC for its code generation

It uses ICC with Visual Studio, Visual Studio by itself will give generic vectors for AMD/Intel/VIA. Both AMD and VIA only get scalar instructions though so it is pretty obvious which compiler is being used.

rgallant · Sep 20, 2013

BallaTheFeared said:
Now let's see them scale it up to 5GHz SB :|

I agree
Also, can it play CRYSIS @ 2560 x 1440 max settings, or maybe I missed that benchmark.
- It would be like a Smart car passing you on the highway doing 220 mph on a windy day. Er, not going to happen.
