Apple A9X the new mobile SoC king

Nothingness · Sep 12, 2015

Headcool said:
Take a look at the second page of the Phoronix review you posted. There is a list of all compiler flags used. The flags used for the i3 include --with-arch-32=i686 and --with-tune=generic. I have not found an explanation for these flags, but from my understanding it only generates i686 code without SSE or AVX at all. On the other hand it uses --target=x86_64-linux-gnu, which indicates 64-bit support and thus includes SSE. Regardless of which interpretation is correct, AVX and FMA seem not to be used.

The flags you show are the flags used to compile gcc itself, not how code is generated.

It looks like most tests are compiled with -O3, but are lacking -march=native which would enable the use of AVX(2) where applicable. That's unfortunate, but alas a reflection of what happens in real life: given the stupid strategy of Intel to segment parts of their ISA, you can't use such flags by default.

Thala · Sep 12, 2015

What's the point of a floating point benchmark that deliberately doesn't use the vast majority of the provided floating point hardware?

There are several reasons for this in a benchmark.

For one, given the limitations of C, where you cannot explicitly express the data parallelism you need for SIMD, you would have to rely on the compiler's auto-vectorization.
And second, when looking at general-purpose CPU performance, a worse CPU architecture would get away with just providing wide SIMD data paths.

In summary, I think it is perfectly fine that a benchmark restricts itself to the FPU and neglects any SIMD extensions.

Sweepr · Sep 12, 2015

Khato said:
Let's see, time in seconds for C-Ray v1.1 is 79.67 for the i3 versus 84.3 for the A57... but what's this? A Z3735F takes 171.95 versus 205.93 for an N2820? Oh, that explains it - multithreading. Yup, 4 A57 cores almost match the 2 cores in the i3 - real impressive...

Then for Stockfish v2014-11-26 no analysis is necessary, 5832 for the i3 versus 11461 for the A57, so again basically half the speed per core since this one definitely is single threaded.

John the Ripper is the only one where A57 is actually competitive - the N2820 is faster than the Z3735F as it should be for a single-threaded benchmark, so core count isn't at play. But the A57 at 1681 is still quite close to the i3 at 1979. Either way, for most tests an A57 core is about half of an i3 core, the X1 just gets to being close on one of them due to having twice the number of cores.
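As a rough sanity check of the per-core comparison above, the quoted scores can be normalized by core count. This is only a sketch under the post's own assumptions: 2 cores for the i3 and 4 for the A57, C-Ray scaling across all cores, the other two tests being single-threaded, and lower being better for C-Ray and Stockfish but higher being better for John the Ripper. None of that is checked against the benchmark definitions, and the function name is made up for illustration.

```python
# Scores quoted in the post, as (i3, A57) pairs.
C_RAY_SECONDS = (79.67, 84.3)    # lower is better, assumed multithreaded
STOCKFISH = (5832, 11461)        # lower is better, assumed single-threaded
JOHN_THE_RIPPER = (1979, 1681)   # higher is better, assumed single-threaded

I3_CORES, A57_CORES = 2, 4       # core counts assumed in the post


def a57_per_core_vs_i3(i3, a57, lower_is_better, i3_cores=1, a57_cores=1):
    """Return A57 per-core performance as a fraction of i3 per-core performance."""
    if lower_is_better:
        # Performance is inversely proportional to the measured value.
        i3_perf, a57_perf = 1.0 / i3, 1.0 / a57
    else:
        i3_perf, a57_perf = float(i3), float(a57)
    return (a57_perf / a57_cores) / (i3_perf / i3_cores)


ratios = {
    "C-Ray": a57_per_core_vs_i3(*C_RAY_SECONDS, True, I3_CORES, A57_CORES),
    "Stockfish": a57_per_core_vs_i3(*STOCKFISH, True),
    "John the Ripper": a57_per_core_vs_i3(*JOHN_THE_RIPPER, False),
}

for name, ratio in ratios.items():
    print(f"{name}: one A57 core is about {ratio:.2f}x one i3 core")
```

Under these assumptions two of the three tests land close to 0.5, which is the "about half" figure in the post, while John the Ripper is the outlier.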

This.

http://openbenchmarking.org/embed.php?i=1507289-BE-TEGRAX15998&sha=c7e55e5&p=2

http://openbenchmarking.org/embed.php?i=1507289-BE-TEGRAX15998&sha=4899bb2&p=2

But hey, according to some there's an ultimate benchmark for x86 vs ARM called Geekbench and we should blindly believe A9X will be faster than the Core i7 Macbook Pro 13''.

Nothingness · Sep 12, 2015

Sweepr said:
This.

This result seems significant.

http://openbenchmarking.org/embed.php?i=1507289-BE-TEGRAX15998&sha=c7e55e5&p=2


For compilation, disk speed plays a role. -> useless comparison

http://openbenchmarking.org/embed.php?i=1507289-BE-TEGRAX15998&sha=4899bb2&p=2

OpenSSL doesn't seem to have assembly code for AArch64 while it has for x86-64. -> useless comparison (after confirmation)

But hey, according to some there's an ultimate benchmark for x86 vs ARM called Geekbench and we should blindly believe A9X will be faster than the Core i7 Macbook Pro 13''.

Your benchmark selection doesn't look better on the surface at least :biggrin:

Anyway, as always: using a single benchmark to compare CPUs is dumb. We'll see what SPEC tells us (provided Anandtech uses it again).

Thanatosis · Sep 12, 2015

Headcool how much is intel paying these days for friendly voices?

SAAA · Sep 12, 2015

Thanatosis said:
Headcool how much is intel paying these days for friendly voices?

They pay you? Where can I sign up for this? I'm a student in need of any kind of money I can get, even if I have to spit out crap on forums every day...

Seriously, the argument is about A9X beating some laptop chips according to a single benchmark.
I can understand why many don't believe that, especially when those chips are on similar nodes and one supposedly uses a fraction of the power? Yeah, sure...

Let me spit some numbers: Intel 14nm is 3x more efficient than 22nm because the incoming Xeon Phi is 3+ times faster than the previous one at the same TDP. Also 3x the single-thread performance!!!
The real problem is that we are stuck on quad cores, high end is server and it's getting all the "big" numbers we dream of with more frequency, IPC and cores, exactly like Apple's latest chip.

Nothingness · Sep 12, 2015

Headcool said:
They are misleading if you just take the score of any browser instead of the best one.
The reason you can't ignore them is because browsing is one of the most important tasks performed on a mobile device.
WebXPRT is especially important because it benchmarks tasks people actually perform on mobile devices.

Principled Technologies, the makers of WebXPRT, are sponsored by Intel.

http://www.principledtechnologies.com/benchmarkxprt/faq

Intel is a sponsor and member of the BenchmarkXPRT Development Community, and contributes to the development process of the XPRT family of benchmarks.

http://www.vrworld.com/2014/11/03/shades-sysmark-2001-intel-may-webxprt-problem/

In the fine print of the terms of use of WebXPRT 2013, the benchmarking suite used to show the Intel advantage, the following is disclosed:
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Intel is a sponsor and member of the BenchmarkXPRT Development Community, and was the major developer of the XPRT family of benchmarks.

Do you really want to defend WebXPRT and keep on spitting at Geekbench?

asendra · Sep 12, 2015

Nothingness said:
Principled Technologies, the makers of WebXPRT, are sponsored by Intel.

http://www.principledtechnologies.com/benchmarkxprt/faq
http://www.vrworld.com/2014/11/03/shades-sysmark-2001-intel-may-webxprt-problem/
Do you really want to defend WebXPRT and keep on spitting at Geekbench?

LOL

Space69 · Sep 12, 2015

Nothingness said:
That's unfortunate, but alas a reflection of what happens in real life.

This is great - I thought the purpose was to evaluate the performance of CPUs, but I was clearly mistaken. 'Geekbench - the complete emulation of a mediocre developer', this sure explains some things for me. :biggrin:

Nothingness · Sep 12, 2015

Space69 said:
This is great - I thought the purpose was to evaluate the performance of CPUs, but I was clearly mistaken. 'Geekbench - the complete emulation of a mediocre developer', this sure explains some things for me. :biggrin:

Hmm, you must have missed the point, this was not about Geekbench. Or perhaps are you obsessed with Geekbench and see it everywhere? :twisted:

Accord99 · Sep 12, 2015

Thala said:
b) An optimal implementation of DGEMM on i7 4770 would use AVX most likely via Intel Math Kernel Libraries. Geekbench deliberately chose not to use NEON/SSE/AVX extensions.

Because that would go against the reason for Geekbench, which is to make current Apple CPUs look good and everything else bad.

Space69 · Sep 12, 2015

I apologize if it was not about Geekbench - but regardless it would have explained a lot. I'll have to admit that I do see Geekbench mentioned in a lot of threads here and I simply can't fathom why, but to be honest many of the other so-called benchmarks are not much better.

Accord99 · Sep 12, 2015

Space69 said:
I apologize if it was not about Geekbench - but regardless it would have explained a lot. I'll have to admit that I do see Geekbench mentioned in a lot of threads here and I simply can't fathom why, but to be honest many of the other so-called benchmarks are not much better.

In the two years since the A7 was released, there has been virtually no meaningful cross-platform CPU benchmark released. It's still geekbench or javascript benchmarks.

Thala · Sep 12, 2015

I'll have to admit that I do see Geekbench mentioned in a lot of threads here and I simply can't fathom why, but to be honest many of the other so-called benchmarks are not much better.

Not much better? Most of the other benchmarks mentioned in this thread are much worse, in particular those JavaScript benchmarks.
There is not a single valid reason why Geekbench is bad. In fact it is one of the better cross-platform low-level benchmarks out there.

Some valid criticism I already mentioned a few pages back: the inclusion of memory bandwidth benchmarks skews the results towards faster memory interfaces, and the inclusion of SHA/AES benchmarks skews the results towards architectures that have hardware support for them.

Because that would go against the reason for Geekbench, which is to make current Apple CPUs look good and everything else bad.

Not this again. I already explained why it is problematic to use vector instructions in benchmarks.

That's unfortunate, but alas a reflection of what happens in real life: given the stupid strategy of Intel to segment parts of their ISA, you can't use such flags by default.

Same for ARM: NEON and the FPU are optional. For ARM there is a good reason though. In the embedded world, when you use, say, a Cortex A5/R5 as a controller, you would like to save the 0.08mm^2 required for NEON (28nm TSMC).

Rakehellion · Sep 12, 2015

raghu78 said:
10 PCI-E lanes in a tablet SoC is stupid.

Internal SSDs are on PCI-E now.

tential · Sep 12, 2015

ShintaiDK said:
Bold statements again without benchmarks or anything else. Remember the 30-40% GPU share claim?

It's funny you only want to compare it to PS3. Yet CPU-wise it's a free run?

Either it's close to PS4/Xbox One or it's not close to Core M either.

In terms of CPU, you may see some memory-bottlenecked benchmarks where A9/A9X does very well due to 2x bandwidth. But there is a world of difference between a computational benchmark and a memory-limited one.

He also claimed there would be a mobile variant of Fury that would easily game at 4k. ya... lol

Lonyo · Sep 12, 2015

thunng8 said:
As per usual, they used furmark and Prime95 to get the peak figures. Only stresses GPU/CPU.

So that delta of 23W for the Core M in the Macbook is purely the delta when loading the CPU/GPU.

Impressive that they have a delta of 23 W for the CPU when the battery managed to last 2.58 hours at full load.
That means with a 39.7 Wh battery the system was draining an average of about 15.4 W.
So to have a CPU/GPU delta higher than the average drain over the entire period is impressive.
Maybe the discrepancy arises because the peak figure is instantaneous, or is thrown off by something.

If it was actually capable of drawing 30w consistently, then the battery would be dead in about 1hr 20 minutes and not 2hr 35minutes.
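The arithmetic behind those two figures, as a quick sketch. It assumes the full rated 39.7 Wh is usable and that the draw is constant over the run, both of which are simplifications; the helper names are made up for illustration.

```python
BATTERY_WH = 39.7  # battery capacity quoted in the review


def average_draw_watts(capacity_wh, runtime_hours):
    """Average power draw if the whole capacity is used over the runtime."""
    return capacity_wh / runtime_hours


def runtime_hours_at(capacity_wh, draw_watts):
    """How long the battery would last at a constant power draw."""
    return capacity_wh / draw_watts


def as_hours_minutes(hours):
    """Convert fractional hours to (hours, minutes), rounded to the minute."""
    return divmod(round(hours * 60), 60)


# Measured full-load runtime of 2 h 35 min.
measured = average_draw_watts(BATTERY_WH, 2 + 35 / 60)
print(f"average draw at full load: {measured:.2f} W")

# Hypothetical sustained 30 W draw.
h, m = as_hours_minutes(runtime_hours_at(BATTERY_WH, 30.0))
print(f"runtime at a constant 30 W: {h} h {m} min")
```

So a sustained 30 W would drain the battery in roughly half the measured time, which is why the peak figure cannot be the average.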

Equally their power test shows that the whole system when idle only draws 1.7w, with screen on and absolutely idle, or 5.22w when doing a light workload and 5.6 playing a video.

Further, how do you measure power consumption? Presumably in order to measure the power consumption min/avg/max they test while on mains power, since they can't measure the battery draw, so you are adding in the charger and efficiencies, and potentially some battery fudgery in your peak draw figures, so you aren't getting an accurate picture of how much power is actually being used.

Hi-Fi Man · Sep 12, 2015

It's funny that some suggest Apple is quickly moving to replace x86-64 with ARMv8. The ARM ISA is lacking in terms of capabilities when compared to x86-64 and Power v2.06+ right now. This is intentional, because mobile devices (not laptops) aren't expected to run content creation or server-type applications. However, in order for ARM to be a viable ISA for Apple, ARM will need to implement (or make standard) more advanced instructions in the ISA for things like 256-bit SIMD, virtualization and TXT. The ARM ISA hasn't fully matured, but once it does, then we can talk.

For those who compare previous transitions (PowerPC to Intel x86) it's not the same here. Power v2.03 had similar functionality to x86-32/64 at the time and in some cases more, ARM doesn't.

Mondozei · Sep 13, 2015

Idontcare said:
I think you may have gathered the wrong impression from the ARM vs. x86 discussions.

That ARM could (and would) someday reach x86 class performance was never really in question. What was in question was whether they would do it without equally reaching x86 class prices, die sizes, transistor budgets, and power consumption.

Those are non-issues. Power consumption a problem on ARM? Are you serious? Ditto the rest.

I also remember those discussions. In fact I created one of the threads in this subforum on this very topic. Most people weren't concerned with price or power budget but with performance, and a lot of arguments around the ecosystem were also used (arguably a stronger argument).

Idontcare said:
Apple is being ridiculously intelligent here. They are going to let the market decide which they prefer on the basis of form factor and software.

Does the market want iPad Pros and the accompanying ARM iTunes apps to replace their MacBooks and accompanying Wintel x86 software?

Apple is making it such that they don't have to be the company that decides one for the other, people and their wallets will make that choice for Apple in the coming 3-4 years.

I didn't know you had special access to Apple's HQ and their corporate master plans.

Seriously, though, there are a lot of ways to sell the ARM ecosystem, and a 2-in-1 is not at the top of the list. iPad sales have been falling for years. Will the Pro magically resurrect them? Have 2-in-1s helped the Android tablet space?

While nobody knows Apple's long game for sure, it fundamentally makes sense to push for Apple-made SoCs in their products. And yes, that includes Macbooks and Macs.

It would be consistent with everything they have done so far and as such, it would be a mistake to read too much into a specific product, like the iPad pro.

BTW: another overlooked area where a very strong SoC could help a lot is autonomous driving, specifically for deep learning.

Headcool · Sep 13, 2015

Nothingness said:
It looks like most tests are compiled with -O3, but are lacking -march=native which would enable the use of AVX(2) where applicable. That's unfortunate, but alas a reflection of what happens in real life: given the stupid strategy of Intel to segment parts of their ISA, you can't use such flags by default.

That depends on what you want to benchmark. If you want to benchmark CPUs, it is not ok. If you want to benchmark ecosystems, it is ok. But if you do the latter you would not only encounter applications without AVX support but also applications without any ARM support at all. x86 has broad support in server, desktop and mobile (via Android) applications. ARM only has broad support in mobile applications, but nearly no support in desktop and server applications. So if you really want to benchmark the ecosystems, the ARM ecosystem would lose by a huge amount.

The segmentation of the ISA is not a problem at all. Just add CPU dispatching. Libraries that support AVX usually contain a path for CPUs that only support SSE, and sometimes even for the i686 or i386 subset.
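A minimal sketch of the dispatch idea, written in Python for brevity rather than the C a real library would use. Everything here is illustrative: the kernel names are hypothetical, the feature probe simply reads /proc/cpuinfo on Linux and falls back to the baseline elsewhere, and the "SSE2" and "AVX2" kernels are plain stand-ins that differ only in chunk width, not real vector code.

```python
from pathlib import Path


def detect_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags listed by Linux, or an empty set if unknown."""
    try:
        text = Path(cpuinfo_path).read_text(errors="replace")
    except OSError:
        return set()  # not Linux, or file unreadable: use the baseline path
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def _dot_chunked(a, b, width):
    """Stand-in kernel: accumulate the dot product `width` elements at a time."""
    total = 0
    for start in range(0, len(a), width):
        stop = start + width
        total += sum(x * y for x, y in zip(a[start:stop], b[start:stop]))
    return total


def dot_generic(a, b):
    return _dot_chunked(a, b, 1)   # scalar fallback, always available


def dot_sse2(a, b):
    return _dot_chunked(a, b, 4)   # stand-in for a 128-bit path


def dot_avx2(a, b):
    return _dot_chunked(a, b, 8)   # stand-in for a 256-bit path


def select_dot(flags):
    """Pick the widest kernel the CPU reports support for."""
    if "avx2" in flags:
        return dot_avx2
    if "sse2" in flags:
        return dot_sse2
    return dot_generic


# Resolve the kernel once at startup, then call it like any other function.
dot = select_dot(detect_cpu_flags())
print(dot.__name__, dot([1, 2, 3], [4, 5, 6]))
```

The point is only that the feature check happens once and every caller gets the best available path, so a single binary can serve both AVX and SSE-only CPUs.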

Thala said:
There are several reasons for this in a benchmark.

For one, given the limitations of C, where you cannot explicitly express the data parallelism you need for SIMD, you would have to rely on the compiler's auto-vectorization.
And second, when looking at general-purpose CPU performance, a worse CPU architecture would get away with just providing wide SIMD data paths.

In summary, I think it is perfectly fine that a benchmark restricts itself to the FPU and neglects any SIMD extensions.

C and C++ have better support for SIMD programming than most other languages. Every major C/C++ compiler, such as gcc, clang, icpc and msvc, supports SSE/AVX intrinsics, and most also support inline assembly for x86.

Again it is easy to provide a path for AVX capable CPUs and a path for SSE-only capable CPUs.

To prevent worse CPU architectures from getting away with just providing wide SIMD data paths, it is necessary to make benchmarks more complex and more realistic. Look at the raytracing part of Geekbench. It is a raytracer any CS student could code. It is a primitive piece of code.
A real, professionally used raytracer such as V-Ray is a hundred times more complex. It doesn't run the same code again and again. There is much more diversity in the code. Of course wider SIMD units would also be beneficial in V-Ray, because a real raytracer uses SIMD everywhere it is possible. But it also heavily tests things like branch prediction, cache prefetching, etc.
Of course such complex software is not available on iOS. Actually there are almost no real-world applications users run on iOS or Android that really stress a high-end mobile SoC.
That makes it really difficult to build real-world benchmarks on mobile devices.

I don't think it is a problem if a benchmark is not vectorized, if it is application logic that the average programmer would write. But the algorithms Geekbench uses don't belong in this category. The algorithms Geekbench uses are normally heavily vectorized and optimized.

SAAA said:
Seriously, the argument is about A9X beating some laptop chips according to a single benchmark.
I can understand why many don't believe that, especially when those chips are on similar nodes and one supposedly uses a fraction of the power? Yeah, sure...

It is about not believing that it does. It is about using Geekbench as the only reference and then thinking the "desktop class performance" claim is true. If I look at a wide set of benchmarks comparing the A8X and Core M, even an 80% faster A9X would not match the Broadwell Core M, let alone the Skylake Core M.
Thus it is not "desktop class performance", not even "notebook class performance", and it even fails to hit "premium tablet class performance".
It is a good high-end tablet SoC that is sufficient for almost every consumer task normally performed on a tablet, but insufficient for use in a productive environment.

Nothingness said:
Principled Technologies, the makers of WebXPRT, are sponsored by Intel.

http://www.principledtechnologies.com/benchmarkxprt/faq
http://www.vrworld.com/2014/11/03/shades-sysmark-2001-intel-may-webxprt-problem/
Do you really want to defend WebXPRT and keep on spitting at Geekbench?

Sure, unless you have a real technical argument that specifies how WebXPRT discriminates against ARM in favour of x86.
But since it is an HTML5 & JavaScript benchmark, it executes the same code for x86 and ARM. And in contrast to Geekbench it actually benchmarks things users typically do on a mobile device.
So you can't really accuse WebXPRT of using unfair/unrealistic use cases.
But if you find something concrete, I'm open to it.

Zodiark1593 · Sep 13, 2015

If someone can code a half-decent Branched Path Trace benchmark for ARM and x86, there should be enough complexity to stress many parts of a cpu arch and even memory architecture. A Blender port to ARM would be just the ticket.

jhu · Sep 13, 2015

Zodiark1593 said:
If someone can code a half-decent Branched Path Trace benchmark for ARM and x86, there should be enough complexity to stress many parts of a cpu arch and even memory architecture. A Blender port to ARM would be just the ticket.

Here you go

shady28 · Sep 13, 2015

jhu said:
Here you go

Heh there is irony here.

So if i am reading that correctly...

Peeps are going to use a benchmark that shows an FX-8350 smoking an i5-4570 in order to prove that Intel is faster than an Apple A9.

We're doomed.

Headcool · Sep 13, 2015

shady28 said:
Heh there is irony here.

So if i am reading that correctly...

Peeps are going to use a benchmark that shows an FX-8350 smoking an i5-4570 in order to prove that Intel is faster than an Apple A9.

We're doomed.

And one where an Atom Z3735F gets the same score as a Core i5 3317U...

Blender is a nice hobbyist tool, but there is a reason it is not often used in professional productions. And its renderer is often considered its biggest weakness.

Also there should be a much higher diversity in the scene. Something with multiple light sources, reflection, refraction, some kind of particle system like smoke and fire, high poly count, complex materials that require subsurface scattering, rigid body physics, fluid simulation, etc should be used for a good ray tracing benchmark.

jhu · Sep 13, 2015

shady28 said:
Heh there is irony here.

So if i am reading that correctly...

Peeps are going to use a benchmark that shows an FX-8350 smoking an i5-4570 in order to prove that Intel is faster than an Apple A9.

We're doomed.

I wouldn't exactly say the FX-8350 "smokes" the i5-4570. It takes the FX 8 threads to beat a 4-thread CPU that's clocked lower.
