Apple A9X: the new mobile SoC king


gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
Geekbench, notably the SHA2 section, is total garbage. It doesn't use x64's dedicated SHA instructions.
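For anyone curious whether their own x86 chip even exposes those dedicated SHA instructions, here is a minimal sketch (Linux-only, reading /proc/cpuinfo; the sha_ni flag is what the kernel reports for the SHA extensions):

```python
# Quick check for the x86 SHA extensions (SHA-NI) on Linux.
# Looks for the "sha_ni" flag in /proc/cpuinfo.
def has_sha_ni(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return "sha_ni" in line.split()
    except OSError:
        pass
    return False

print("SHA-NI available:", has_sha_ni())
```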
 

386DX

Member
Feb 11, 2010
197
0
0
I'm not going to debate with you anymore on this. The numbers speak for themselves.

Max load on an iPad Air 2 is 11W vs 29W on the MacBook.

Are you seriously saying that a faster SSD, more RAM and a marginally bigger screen use 18W? That is ridiculous.

You only have to look at the idle and max figures to isolate the CPU/GPU components, as the max is measured when loading the CPU/GPU.

The delta on the iPad Air 2 is 6W and on the MacBook it is 23W.

Seriously, people need to use their brains a bit. Notebookcheck is measuring power from the wall, not the SoC; they are essentially just measuring the size of the power adapter the device comes with. There is no real easy way of measuring the SoC power. If the iPad Air 2 came with a 5W adapter you'd get a max reading of 5W (of course your battery would probably not charge while the device is on). Drawing more power from the wall doesn't mean the CPU is actually drawing all that power, as the extra power draw is used to charge the battery at a quicker pace.

Like you said, the max load on an iPad Air 2 (as measured at the wall) is 11W and on the MacBook 29W... I'll give you two guesses what the wattages of the power adapters that come with these devices are. 10W and 29W... Coincidence? I think not. Once you factor in adapter efficiency you get the 11W and 29W numbers.
 

thunng8

Member
Jan 8, 2013
153
61
101
Seriously, people need to use their brains a bit. Notebookcheck is measuring power from the wall, not the SoC; they are essentially just measuring the size of the power adapter the device comes with. There is no real easy way of measuring the SoC power. If the iPad Air 2 came with a 5W adapter you'd get a max reading of 5W (of course your battery would probably not charge while the device is on). Drawing more power from the wall doesn't mean the CPU is actually drawing all that power, as the extra power draw is used to charge the battery at a quicker pace.

Like you said, the max load on an iPad Air 2 (as measured at the wall) is 11W and on the MacBook 29W... I'll give you two guesses what the wattages of the power adapters that come with these devices are. 10W and 29W... Coincidence? I think not. Once you factor in adapter efficiency you get the 11W and 29W numbers.

I think you should get your facts straight. Apple designed their power adapters with just enough output to power the device at maximum load without losing battery charge. Hence maximum load is near the maximum output of the power adapter.

There are many examples of maximum power draw nowhere near the maximum power output of the power supply. eg.

http://www.notebookcheck.net/Asus-Zenbook-UX305-Subnotebook-Review.136543.0.html

A 45W power supply with a maximum load of 30.3W. In this case, even when the notebook is running at the maximum possible load, the notebook can still be charged.

So before telling people to use their brains, you should use yours.
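Putting the wall-power numbers from this exchange side by side (a minimal sketch; the idle figures are inferred from the max and delta values quoted above, and the adapter ratings are the ones claimed in these posts):

```python
# Wall-power figures quoted in this exchange (Notebookcheck-style measurements).
# idle = max - delta as claimed above; adapter ratings as stated by the posters.
devices = {
    "iPad Air 2":       {"idle_w": 5.0,  "max_w": 11.0, "adapter_w": 10.0},
    "MacBook (Core M)": {"idle_w": 6.0,  "max_w": 29.0, "adapter_w": 29.0},
    "Zenbook UX305":    {"idle_w": None, "max_w": 30.3, "adapter_w": 45.0},
}

for name, d in devices.items():
    delta = d["max_w"] - d["idle_w"] if d["idle_w"] is not None else None
    headroom = d["adapter_w"] - d["max_w"]   # > 0 means it can still charge at full load
    delta_str = f"{delta:.1f} W" if delta is not None else "n/a"
    print(f"{name:18s} load delta {delta_str:>7s}, adapter headroom {headroom:+.1f} W")
```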
 

knutinh

Member
Jan 13, 2006
61
3
66
...
I don't think it is a problem if a benchmark is not vectorized, if it is application logic that the average programmer would write. But the algorithms Geekbench uses don't belong in this category. The algorithms Geekbench uses are normally heavily vectorized and optimized.
...
The more I learn about software optimization, the more sceptical I am about cpu benchmarks.

Usually, you are testing a particular software implementation, a compiler and some piece of hardware jointly. Trying to compare two pieces of hardware this way is hard.

My experience is that software implementations (and compilers) tend to be quite suboptimal on a pure performance basis, but the degree of "optimalness" is highly non-linear and hard to predict. Thus, small changes in code (or compiler) can result in large changes in performance.

It is often hard to know if a piece of software is close to "optimal". So (often) you don't even know with any confidence whether you are close to hw limits. In simple cases, you might be able to determine that e.g. memory limits your application and that this memory traffic is unavoidable. But often the problem is so complex (and the hardware is so hard to fathom) that such estimates are crude.

Testing the "sw ecosystem" might be equally relevant. I.e. how many man-hours does it take to implement operation X at a running speed of N seconds? How many dollars worth of tools/licenses? How many prospective customers will the platform offer the developer to distribute costs on? If the answer is that "this platform allows the dev to run some given matrix multiplications at one million/second by using a free library", then that may (or may not) be more relevant than "this platform allows the dev to run those same matrix mults at four million/second by investing 6 months of development time writing assembly and understanding the quirks of cache implementations".
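To make the library-vs-hand-tuning point concrete, here is a minimal sketch (Python; the matrix size and timings are illustrative, not the million/second figures above) comparing a naive triple-loop matrix multiply with the same operation done through a free optimized library:

```python
import time
import numpy as np

N = 200
a = np.random.rand(N, N)
b = np.random.rand(N, N)

def naive_matmul(x, y):
    """Straightforward triple loop - what an 'average programmer' might write."""
    n = len(x)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += x[i][k] * y[k][j]
            c[i][j] = s
    return c

t0 = time.perf_counter()
naive_matmul(a.tolist(), b.tolist())
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c = a @ b                     # NumPy dispatches to an optimized BLAS
t_blas = time.perf_counter() - t0

print(f"naive: {t_naive:.3f}s, library: {t_blas:.4f}s, speedup ~{t_naive / t_blas:.0f}x")
```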

One might expect that any platform/task combination will have some unique performance vs "cost" curve. Measuring absolute hardware limits only tells you the (expected) asymptote of that curve for those willing to put endless effort into the project, while details of that curve tells you more about what you can get for a more moderate effort.

That said, if your box is going to be devoted to one or a few tasks (calculate FFTs or run Quake or whatever), then measuring performance for the application of interest is going to tell you how fast that application is going to run on two or more hw platforms. Until the next recompile at least.

-k
 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
The more I learn about software optimization, the more sceptical I am about cpu benchmarks.

Usually, you are testing a particular software implementation, a compiler and some piece of hardware jointly. Trying to compare two pieces of hardware this way is hard.

My experience is that software implementations (and compilers) tend to be quite suboptimal on a pure performance basis, but the degree of "optimalness" is highly non-linear and hard to predict. Thus, small changes in code (or compiler) can result in large changes in performance.

It is often hard to know if a piece of software is close to "optimal". So (often) you don't even know with any confidence whether you are close to hw limits. In simple cases, you might be able to determine that e.g. memory limits your application and that this memory traffic is unavoidable. But often the problem is so complex (and the hardware is so hard to fathom) that such estimates are crude.

Testing the "sw ecosystem" might be equally relevant. I.e. how many man-hours does it take to implement operation X at a running speed of N seconds? How many dollars worth of tools/licenses? How many prospective customers will the platform offer the developer to distribute costs on? If the answer is that "this platform allows the dev to run some given matrix multiplications at one million/second by using a free library", then that may (or may not) be more relevant than "this platform allows the dev to run those same matrix mults at four million/second by investing 6 months of development time writing assembly and understanding the quirks of cache implementations".

One might expect that any platform/task combination will have some unique performance vs "cost" curve. Measuring absolute hardware limits only tells you the (expected) asymptote of that curve for those willing to put endless effort into the project, while details of that curve tells you more about what you can get for a more moderate effort.

That said, if your box is going to be devoted to one or a few tasks (calculate FFTs or run Quake or whatever), then measuring performance for the application of interest is going to tell you how fast that application is going to run on two or more hw platforms. Until the next recompile at least.

-k

Comes down to why anyone wants the performance in the first place.

If you want the performance because you want to generate higher benchmark numbers, then making ever more optimal compiles for ever higher benchmark scores is going to be a priority.

If, however, you want the performance because you have some software suite or application in mind, then all you really care about when it comes to benchmarks are two things - (1) is the benchmark a reasonable enough proxy for my software suite or application of interest?, and (2) what is the relative performance difference between two or more sets of hardware configurations?

If 3DMark is not a meaningful proxy for gaming, then the performance derived from 3DMark benching does not matter when you are looking to make price/performance assessments of gaming hardware, regardless of how optimally or unoptimally 3DMark has been compiled.

Conversely, if Handbrake is an app of interest to you, it does not matter how well optimized or unoptimized it is for various hardware; all that matters to you is how well it performs on that hardware.

A real-life example for myself is TMPGEnc. Some say it is well optimized for Intel hardware and thus will always turn out faster encode times on an Intel CPU than on a comparably priced AMD CPU. For me this distinction is irrelevant; all I am interested in is price/performance for the software as it comes to me from the distributor. I cannot acquire a hypothetically better-optimized version of it for an AMD processor, so allegations of compiler optimization bias are irrelevant in this situation.
 

thunng8

Member
Jan 8, 2013
153
61
101
I predict that:
A9 dual core at 1.8GHz
A9X tri-core at 2GHz, GT6850 GPU

From Chinese information.

We will have to wait for benchmarks, but it does look like Apple's quoted performance increases are for single-threaded tasks.

I.e. in Apple's tech specs for the iPad it notes that:
- the A8 in the mini 4 is 1.3x faster than the A7
- the A8X in the Air 2 is 1.4x faster than the A7
- the A9X in the Pro is 2.5x faster than the A7

link: http://www.apple.com/au/ipad/compare/

The A8X and A8 difference in single thread (even when both are running at 1.5GHz) can be explained by the larger L2 cache (2MB vs 1MB) and the faster memory interface.

If they were quoting multithreaded perf they would have put the A8X much higher than the A8 because of the core counts.
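As a quick sanity check, the ratios implied by those A7-relative figures (numbers taken from the Apple compare page linked above):

```python
# Apple's A7-relative claims from the compare page linked above.
speedup_vs_a7 = {"A8": 1.3, "A8X": 1.4, "A9X": 2.5}

a8x_vs_a8  = speedup_vs_a7["A8X"] / speedup_vs_a7["A8"]   # ~1.08x despite the extra core
a9x_vs_a8x = speedup_vs_a7["A9X"] / speedup_vs_a7["A8X"]  # ~1.79x
a9x_vs_a8  = speedup_vs_a7["A9X"] / speedup_vs_a7["A8"]   # ~1.92x

print(f"A8X vs A8:  {a8x_vs_a8:.2f}x")
print(f"A9X vs A8X: {a9x_vs_a8x:.2f}x")
print(f"A9X vs A8:  {a9x_vs_a8:.2f}x")
```

The tiny A8X-over-A8 gap despite the extra core is what suggests these are single-threaded figures.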
 
Last edited:
Mar 10, 2006
11,715
2,012
126
We will have to wait for benchmarks, but it does look like Apple's quoted performance increases are for single-threaded tasks.

I.e. in Apple's tech specs for the iPad it notes that:
- the A8 in the mini 4 is 1.3x faster than the A7
- the A8X in the Air 2 is 1.4x faster than the A7
- the A9X in the Pro is 2.5x faster than the A7

The A8X and A8 difference in single thread (even when both are running at 1.5GHz) can be explained by the larger L2 cache (2MB vs 1MB) and the faster memory interface.

If they were quoting multithreaded perf they would have put the A8X much higher than the A8 because of the core counts.

Yeah, if Apple's performance increase claims are at iso core counts, then they're damn impressive.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
Yeah, if Apple's performance increase claims are at iso core counts, then they're damn impressive.
If the information quoted here from the Chinese ministry of industry is right and the A9 is 1.8 GHz with 2 cores, we are talking about ~30% better IPC, which would definitely be impressive. I'll wait for benchmarks...
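A minimal sketch of where a figure like that comes from, assuming an A8 clock of roughly 1.4 GHz (an assumption, not a figure from this thread) and taking Apple's ~70% CPU claim at face value:

```python
# Rough IPC estimate: claimed speedup divided by the clock-speed ratio.
a8_clock_ghz = 1.4          # assumed A8 clock
a9_clock_ghz = 1.8          # from the rumoured Chinese filing quoted above
claimed_speedup = 1.70      # Apple's "70% faster CPU" claim

clock_ratio = a9_clock_ghz / a8_clock_ghz
ipc_gain = claimed_speedup / clock_ratio - 1.0
print(f"clock ratio {clock_ratio:.2f}x -> implied IPC gain ~{ipc_gain:.0%}")
```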
 

cytg111

Lifer
Mar 17, 2008
23,494
13,077
136
A real-life example for myself is TMPGEnc. Some say it is well optimized for Intel hardware and thus will always turn out faster encode times on an Intel CPU than on a comparably priced AMD CPU. For me this distinction is irrelevant; all I am interested in is price/performance for the software as it comes to me from the distributor. I cannot acquire a hypothetically better-optimized version of it for an AMD processor, so allegations of compiler optimization bias are irrelevant in this situation.

- Exactly this. At the end of the day, that's the only thing that matters.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Yeah, if Apple's performance increase claims are at iso core counts, then they're damn impressive.

A9X is 80% faster than A9 according to Apple. Assuming that's a pure multithreaded statement, we still get only a 33% improvement from a 4th core. The remaining 30% or more improvement would have to come from increased clocks and higher IPC. There is no way IPC can be improved by 30% or more on an already impressive high-performance CPU core. A9X is definitely quad core and A9 is tri-core. You only need to look at Cyclone to Enhanced Cyclone, or Broadwell to Skylake, to see that a 10% IPC improvement is itself at the upper end of what we can expect from an existing high-performance CPU core. The only other way to drastically improve IPC is through new CPU instructions and extending the ISA.
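The arithmetic behind that argument, as a minimal sketch (the 80% figure is Apple's claim quoted above; the 3-to-4 core counts are the poster's assumption):

```python
# If A9 were tri-core and A9X quad-core, perfect scaling from the extra core
# would only explain part of an 80% multithreaded speedup.
claimed_speedup = 1.80       # "A9X is 80% faster than A9"
core_scaling = 4 / 3         # best case from going 3 -> 4 cores (~1.33x)

remaining = claimed_speedup / core_scaling
print(f"left over for clocks + IPC: ~{remaining - 1:.0%}")   # ~35%
```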
 
Mar 10, 2006
11,715
2,012
126
A9X is 80% faster than A9 according to Apple. Assuming that's a pure multithreaded statement, we still get only a 33% improvement from a 4th core. The remaining 30% or more improvement would have to come from increased clocks and higher IPC. There is no way IPC can be improved by 30% or more on an already impressive high-performance CPU core. A9X is definitely quad core and A9 is tri-core. You only need to look at Cyclone to Enhanced Cyclone, or Broadwell to Skylake, to see that a 10% IPC improvement is itself at the upper end of what we can expect from an existing high-performance CPU core. The only other way to drastically improve IPC is through new CPU instructions and extending the ISA.

The die shot of A9 shows a dual-core config, not a tri-core. I think Apple was able to deliver a very large single-thread performance boost with A9, and A9X may be a tri-core.

[annotated A9 die shot image]
The reason I am confident that this is not a tri-core design is that you can very clearly see L2$ in the block that I have outlined. The rectangle that comes off of this block on the bottom left might have been a CPU core, but it doesn't look anything like the two CPU cores that I see inside of my blue rectangle.

I will be very interested to see if Apple has been able to deliver on these pretty impressive claims. If the A9 CPU runs at 1.8GHz, then Apple is getting some serious perf/clock improvements.
 
Last edited:

thunng8

Member
Jan 8, 2013
153
61
101
Yes, I agree. From the die shot the A9 is definitely dual core.

So that claimed 70% increase, if true, is very impressive.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
It's "up to". And let's see how much comes from the bandwidth increase, and then have it tested with a benchmark that doesn't cripple certain uarchs on purpose.

The A10 will also be produced on TSMC's 16FF+ instead of Samsung's 14FF.
 
Last edited:

Mondozei

Golden Member
Jul 7, 2013
1,043
41
86
When should we be able to see the first benchmarks of the A9? I've read that even though the official release date is the 24th, there have been pushbacks on delivery estimates.
 

asendra

Member
Nov 4, 2012
156
12
81
Normally the embargo ends on the Wednesday before the release, though there have usually been some leaks on Geekbench or similar a few days earlier.

I'm actually kind of amazed there haven't been any yet, given that this year there has been one week more than in other years.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
Too bad we don't have the Lua and Dijkstra scores.

IPC is about 10% better on integer and 20% better on FP. Even though that's less than the ~30% implied by Apple's claim of 70% faster, it's still very good.
 
Mar 10, 2006
11,715
2,012
126
Too bad we don't have the Lua and Dijkstra scores.

IPC is about 10% better on integer and 20% better on FP. Even though that's less than the ~30% implied by Apple's claim of 70% faster, it's still very good.

Indeed. Even more impressive that they improved IPC nicely while also scaling up clocks (though I'm sure the move to FinFETs helped a lot there).

Anyway, wish I were lucky enough to get my iP6S today...the 25th cannot come soon enough.
 