Apple A7 is now 64-bit


Eug

Lifer
Mar 11, 2000
23,753
1,311
126
Tentative A7 analysis:



We publish this with the caveat that these are best guesses – we have not done any real circuit extraction to confirm them. The dual-core CPU and cache make up ~17% of the die area, and the quad-core GPU and shared logic about 22%. The CPU itself is not packed the same way as the A6 (see below), it looks much more like a conventional automated layout; although Linley Gwennap thinks that it’s still Apple designed, not the first ARM A53/57 usage. There’s a great review of the A7’s capability over at AnandTech.

We know from our analysis of the 32 nm A6 chip that the 6-transistor SRAM cell area was ~0.15 µm², so if we shrink that, we can guesstimate the 28 nm 6T SRAM cell to be ~0.12 µm². If we further allow a conservative 40%-50% utilization to allow for the row and column circuitry, then we get capacities of ~1 MB for the L2 cache, and ~256 KB for the L1 cache.
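The arithmetic behind that estimate can be sketched in a few lines. This is only a rough check, not Chipworks' method: the cell areas, utilization range and cache sizes come from the paragraph above, while the ideal shrink rule (node ratio squared) and the function names are assumptions made for illustration. It runs the estimate in reverse, asking how much die area the quoted capacities would occupy.

```python
# Rough check of the SRAM numbers quoted above. The cell areas, the
# utilization range and the cache sizes come from the post; the ideal
# (node ratio squared) shrink rule is an assumption made for illustration.

A6_CELL_UM2 = 0.15        # 6T cell area on the 32 nm A6, from the post
A7_CELL_UM2 = 0.12        # guesstimated 28 nm cell area, from the post

def ideal_shrink(cell_um2, old_nm, new_nm):
    """Scale a cell area by the square of the node ratio."""
    return cell_um2 * (new_nm / old_nm) ** 2

def array_area_mm2(capacity_bytes, cell_um2, utilization):
    """Area of an SRAM array when only `utilization` of it is bit cells."""
    bits = capacity_bytes * 8
    return bits * cell_um2 / utilization / 1e6   # um^2 -> mm^2

print(f"ideal 28 nm cell: {ideal_shrink(A6_CELL_UM2, 32, 28):.3f} um^2")
for util in (0.40, 0.50):
    l2 = array_area_mm2(1024 * 1024, A7_CELL_UM2, util)
    l1 = array_area_mm2(256 * 1024, A7_CELL_UM2, util)
    print(f"utilization {util:.0%}: 1 MB L2 ~ {l2:.2f} mm^2, 256 KB L1 ~ {l1:.2f} mm^2")
```

Under these assumptions an ideal shrink gives about 0.115 µm², close to the ~0.12 µm² guess, and a 1 MB L2 would occupy roughly 2.0 to 2.5 mm² depending on utilization.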

One thing we haven’t identified is the memory block used for the data from the fingerprint sensor – Jony Ive, in his video about the sensor, clearly states that the information is stored securely in the A7 and emphasized it with this shot below.

 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
Tentative A7 analysis:



We publish this with the caveat that these are best guesses – we have not done any real circuit extraction to confirm them. The dual-core CPU and cache make up ~17% of the die area, and the quad-core GPU and shared logic about 22%. The CPU itself is not packed the same way as the A6 (see below), it looks much more like a conventional automated layout;
I don't think L1 are correctly placed.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I don't think L1 are correctly placed.

They quite obviously are not. I don't know how someone at Chipworks made this kind of mistake. The blocks just diagonal of the L2 and to the opposite side of core are the L1 caches. My guess is the one closer to the L2 is dcache. Stuff just to the left of those blocks can include tags, TLB entries, and buffers. Area in the center to the right of the L2 SRAM arrays would be L2 tags.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
They quite obviously are not. I don't know how someone at Chipworks made this kind of mistake. The blocks just diagonal of the L2 and to the opposite side of core are the L1 caches. My guess is the one closer to the L2 is dcache. Stuff just to the left of those blocks can include tags, TLB entries, and buffers. Area in the center to the right of the L2 SRAM arrays would be L2 tags.

O.K., that makes more sense, as I was thinking that the areas diagonal to and on the opposite side of the cores looked like split L1$ (I/D) - if not, WTH would they be! Thanks!
 

ancientarcher

Member
Sep 30, 2013
39
1
66
A basic question on the power of the Apple A7.
On Geekbench 3, the A7 scores around 1,400 for a single core (@1.3 GHz). The Intel Core i7-3840QM scores around 3,500 on a single-core basis (@2.8 GHz). The TDP of the A7 is 1.5-2 W and the TDP of the Core i7-3840QM is around 45 W (but then people say that is only for the CPU, the integrated GPU is not included).

So, theoretically, Apple could double the A7 frequency to 2.6 GHz and the performance would increase to 2,800 per core, with power quadrupling to a 6-8 W TDP, so basically getting 80% of the Core i7 score with 1/6th the power consumption. I realise you can't just double the clock rate and there are other factors, but doesn't that show how close the A7 architecture is to the best that Intel has got? By the way, the Core i7-3840QM's suggested retail price is $568.
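The extrapolation above can be written out as a small calculation. This is a sketch of the poster's simplified model only: score is assumed to scale linearly with clock and power with the square of the clock ratio (the "quadrupling" in the post). The `scale` function and its `power_exponent` parameter are hypothetical names introduced here, and real chips do not scale this cleanly.

```python
# Naive scaling model from the post: score scales linearly with clock and
# power with the square of the clock ratio. Both rules are the poster's
# simplifications, not measured behaviour.

A7_SCORE = 1400           # Geekbench 3 single-core score, from the post
A7_TDP_W = (1.5, 2.0)     # TDP range quoted in the post
I7_SCORE = 3500           # Core i7-3840QM single-core score, from the post
I7_TDP_W = 45.0           # Core i7-3840QM TDP, from the post

def scale(score, tdp_range, clock_ratio, power_exponent=2.0):
    """Return (new_score, new_tdp_range) for a given clock multiplier."""
    new_score = score * clock_ratio
    new_tdp = tuple(w * clock_ratio ** power_exponent for w in tdp_range)
    return new_score, new_tdp

score, tdp = scale(A7_SCORE, A7_TDP_W, 2.0)
print(f"score at 2x clock: {score:.0f}")                       # 2800
print(f"TDP at 2x clock: {tdp[0]:.0f}-{tdp[1]:.0f} W")         # 6-8 W
print(f"fraction of i7 score: {score / I7_SCORE:.0%}")         # 80%
print(f"i7 TDP / scaled TDP: {I7_TDP_W / tdp[1]:.1f}x to {I7_TDP_W / tdp[0]:.1f}x")
```

Dynamic power is usually modelled as proportional to C·V²·f, so if the supply voltage has to rise along with the frequency, an exponent closer to 3 may be more realistic; `scale(A7_SCORE, A7_TDP_W, 2.0, power_exponent=3.0)` then gives 12-16 W instead of 6-8 W.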

If that is the performance that Apple gets from 28nm planar design, then clearly it can take the same design, ramp up the clock rate in 20nm next year and get a processor with single core performance equal to the best of Intel?

Is the above broadly correct?
If it is, then what stops apple from having two different (or 3) processor design teams geared to deliver cores for different TDPs. One for phones (lowest TDP), another for macbook pro and air and the third for the imacs (the highest power consumption levels)?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
A basic question on the power of the Apple A7.
On Geekbench 3, the A7 scores around 1400 for a single core (@1.3GHz). The Intel Core i7-3840QM scores around 3,500 on a single core basis (@2.8Ghz). The TDP of A7 is 1.5-2W and the TDP of the core i7-3840QM is around 45W (but then people say that is only for the CPU, the integrated GPU is not included).

So, theoretically Apple can double the A7 frequency to 2.6GHz and the performance will increase to 2,800 per core with power quadrupling to 6-8W TDP, so basically getting 80% of core i7 score with 1/6th the power consumption. I realise, you can't just double the clock rate and there are other factors, but does that show you how close the A7 architecture is to the best that Intel has got. By the way, the Core i7-3840QM suggested retail price is $568.

If that is the performance that Apple gets from 28nm planar design, then clearly it can take the same design, ramp up the clock rate in 20nm next year and get a processor with single core performance equal to the best of Intel?

Is the above broadly correct?
If it is, then what stops apple from having two different (or 3) processor design teams geared to deliver cores for different TDPs. One for phones (lowest TDP), another for macbook pro and air and the third for the imacs (the highest power consumption levels)?

The problem, as already shown in other threads, is that geekbench is not a good way of showing cross platform performance. In short, the real world performance is lower on the A7 than geekbench shows vs for example x86. The A7 "magic" is already dead.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
The problem, as already shown in other threads, is that geekbench is not a good way of showing cross platform performance. In short, the real world performance is lower on the A7 than geekbench shows vs for example x86. The A7 "magic" is already dead.

No it's not. The only "magic" was the crypto scores, because they are offloaded to hardware, which is still a good thing, just not comparable to CPU-based crypto scores (and x86 has hardware-accelerated crypto as well).

There are still small to significant gains in the A7 due to the implementation of AArch64. The move to 32 GPRs is one of the biggest reasons for some of the gains; double-wide FP units also add to the performance. Importantly, this is in an SoC that is still below 2 watts at peak load (and typically much less). If you don't believe me, you can go over to RWT and try arguing with some much smarter guys there. Even Hans posted a die shot here and made mention of the much larger performance bump than expected.

Please note, I'm not an Apple 'fanboi', I'm just impressed with what Apple/ARM was able to do with such a low power CPU on a 1/2 node shrink.
 

ancientarcher

Member
Sep 30, 2013
39
1
66
The problem, as already shown in other threads, is that geekbench is not a good way of showing cross platform performance. In short, the real world performance is lower on the A7 than geekbench shows vs for example x86. The A7 "magic" is already dead.

Well, everyone says that, but no one has an alternative. I am just using Geekbench as a substitute for performance. But my main point is: can't apple (or any of the ARM design guys, on their 64bit ARM designed processors)
1) Double the clock rate of the A7 (or its soon to be born brothers from qualcomm etc)
2) Double the GPU area (which is more scalable anyways)
3) Move to a new process node (which is going to happen anyways)

and bring out a processor which is comparable in performance to the core i7 series Intel processors but much lower in power consumption?

After all, the best of breed intel core i-7 processors have south of 1.5bn transistors and the A7 has 1bn transistors... ARM architecture can't be so sh*t compared to x86 that they can't have at least comparable performance!
 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
Well, everyone says that, but no one has an alternative. I am just using Geekbench as a substitute for performance. But my main point is: can't apple (or any of the ARM design guys, on their 64bit ARM designed processors)
1) Double the clock rate of the A7 (or its soon to be born brothers from qualcomm etc)
Doubling frequency is not an easy task if your design has not been planned for that from early in the design process. We don't know what Apple did, but given the apparent efficiency on Geekbench, they went for high perf/MHz instead of high frequency (or the A7 frequency is already higher than the 1.3 GHz found).

2) Double the GPU area (which is more scalable anyways)
3) Move to a new process node (which is going to happen anyways)

and bring out a processor which is comparable in performance to the core i7 series Intel processors but much lower in power consumption?

After all, the best of breed intel core i-7 processors have south of 1.5bn transistors and the A7 has 1bn transistors... ARM architecture can't be so sh*t compared to x86 that they can't have at least comparable performance!
No, high performance doesn't imply x86, for sure. But it took Intel many years and iterations to reach the level they are at. I doubt Apple will be able to have a competing chip before 2 or 3 years (assuming one iteration per year), and even then Intel will have moved on.

All of the above is pure speculation
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
The problem, as already shown in other threads, is that geekbench is not a good way of showing cross platform performance. In short, the real world performance is lower on the A7 than geekbench shows vs for example x86. The A7 "magic" is already dead.
Which are the real world measurements you're referring to? Please don't list synthetic/scripted browser benchmarks, benchmarks depending on some single vendor optimized execution engine, or compiler-biased ones (e.g. GCC vs. ICC).
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Which are the real world measurements you're referring to? Please don't list synthetic/scripted browser benchmarks, benchmarks depending on some single vendor optimized execution engine, or compiler-biased ones (e.g. GCC vs. ICC).

Compiler biased? So anything compiled with clang is Apple biased, I assume.
 

ancientarcher

Member
Sep 30, 2013
39
1
66
Doubling frequency is not an easy task if your design has not been planned for that from early in the design process. We don't know what Apple did, but given the apparent efficiency on Geekbench, they went for high perf/MHz instead of high frequency (or the A7 frequency is already higher than the 1.3 GHz found).

No, high performance doesn't imply x86, for sure. But it took Intel many years and iterations to reach the level they are at. I doubt Apple will be able to have a competing chip before 2 or 3 years (assuming one iteration per year), and even then Intel will have moved on.

All of the above is pure speculation

Speculation, it might be, but consider the facts below:

1) AMD said that one of the reasons they are signing the architecture licensing deal with ARM is because of the ease of designing with ARM. From what I remember reading, it cuts down the time to design significantly and the number of people by ~1/5th ish

2) Legacy is a burden as well as a benefit. Intel has benefited from the status of x86 in legacy systems today and will continue to benefit. But that also brings the burden of carrying on with bits of inefficient legacy blocks in their designs. Whereas ARM had the advantage of designing a 64bit architecture from the ground up, only having to maintain backward compatibility with their 32bit arch, not for stuff from 20 years ago (which is the case with x86)

So, don't for a moment assume that Intel and the ARM design camp are on a level playing field. They are not!
 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
1) AMD said that one of the reasons they are signing the architecture licensing deal with ARM is because of the ease of designing with ARM. From what I remember reading, it cuts down the time to design significantly and the number of people by ~1/5th ish
That's probably true because AMD will be using an existing CPU design, instead of designing their own. Designing a CPU from the ground up is probably as expensive for anyone as it is for Intel, when you're targeting performance.

2) Legacy is a burden as well as a benefit. Intel has benefited from the status of x86 in legacy systems today and will continue to benefit. But that also brings the burden of carrying on with bits of inefficient legacy blocks in their designs. Whereas ARM had the advantage of designing a 64bit architecture from the ground up, only having to maintain backward compatibility with their 32bit arch, not for stuff from 20 years ago (which is the case with x86)
ARM 32-bit instruction set has been accumulating stuff for 20 years too, though certainly not anything as bad as x87 :biggrin:

But yeah Aarch64 is definitely way cleaner than x86.

So, don't for a moment assume that Intel and the ARM design camp are on a level playing field. They are not!
From a high performance chip design point of view, I am sorry but they are

EDIT: BTW what I called "speculation" was my answer, not a criticism of your post!
 

ancientarcher

Member
Sep 30, 2013
39
1
66
From a high performance chip design point of view, I am sorry but they are

!

As Niels Bohr said, "Prediction is very difficult, especially if it's about the future." Some of the prediction must fall into speculation by default :biggrin:

My point was that the performance is not so different between those two. Just design a core which can take clock rates of 2.6-3 GHz, and Apple has a chip which can theoretically (at least as per Geekbench) compare with the best-of-breed low-power Core i7, which costs 20x more. It's not as if ARM chips can't run at those clock rates; just look at the Snapdragon 800.

Designing a chip from the ground up will be difficult. But modifying the A7 so that it can operate at higher frequencies - is that so difficult??? And yes, you have to modify for the 20 nm node of course, and the 14 nm FinFET node after that, but that's what chip designers are paid to do...
 

Nothingness

Platinum Member
Jul 3, 2013
2,769
1,429
136
Designing a chip from the ground up will be difficult. But modifying the A7 so that it can operate at higher frequencies - is that so difficult???
Yes, that is difficult, very difficult. In fact high frequency and low frequency mean different micro-architectures.

This comes from the fact that at a given frequency your electrons can travel through, let's say, 30 layers of logic per cycle; if you start increasing frequency, given that electron speed is constant, you'll have fewer layers of logic per cycle, hence you'll be able to do less work per cycle (that's a simplification, but it's close enough to reality).

So if, for instance, Apple had time to multiply two numbers in one cycle because this requires 30 layers of logic, and they then increase the frequency, which implies fewer layers per cycle, they'll have to split the multiplication into two cycles.
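The same simplification can be put into numbers. This is a toy model only: the 25 ps per-layer delay is a hypothetical value picked so that a 1.3 GHz cycle fits 30 layers, and it ignores latch overhead, clock skew and wire delay. The function names are made up for the example.

```python
import math

# Toy model of logic depth per clock cycle. GATE_DELAY_PS is a hypothetical
# figure chosen so that 30 layers fit in one cycle at 1.3 GHz; real designs
# also lose time to latches, clock skew and wires.
GATE_DELAY_PS = 25.0

def levels_per_cycle(freq_ghz, gate_delay_ps=GATE_DELAY_PS):
    """How many layers of logic fit in one clock period."""
    cycle_ps = 1000.0 / freq_ghz
    return int(cycle_ps // gate_delay_ps)

def cycles_needed(logic_levels, freq_ghz, gate_delay_ps=GATE_DELAY_PS):
    """Cycles an operation takes if it needs `logic_levels` layers in total."""
    return math.ceil(logic_levels / levels_per_cycle(freq_ghz, gate_delay_ps))

for freq in (1.3, 2.6):
    print(f"{freq} GHz: {levels_per_cycle(freq)} layers per cycle, "
          f"30-layer multiply takes {cycles_needed(30, freq)} cycle(s)")
```

With these numbers the multiply fits in one cycle at 1.3 GHz but needs two at 2.6 GHz, so doubling the clock on its own does not double the rate at which dependent multiplies complete.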

Hope I made this clear enough...
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
1) AMD said that one of the reasons they are signing the architecture licensing deal with ARM is because of the ease of designing with ARM. From what I remember reading, it cuts down the time to design significantly and the number of people by ~1/5th ish
There could be a) simple reuse of existing Cortex macros, b) a modified design (also saves work), c) a redesign from ground up.

A LinkedIn profile suggests that AMD is working on some modifications. The owner mentioned the ARM scheduler.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
There could be a) simple reuse of existing Cortex macros, b) a modified design (also saves work), c) a redesign from ground up.

A LinkedIn profile suggests that AMD is working on some modifications. The owner mentioned the ARM scheduler.

Are you sure the guy isn't just citing previous experience while under employment at ARM?

ARM core licensees have not been in the habit of modifying the core outside of the configurable options that ARM provides, probably because it would then fall on them to redo validation, which is a big part of the design cost.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
But my main point is: can't apple (or any of the ARM design guys, on their 64bit ARM designed processors)
1) Double the clock rate of the A7 (or its soon to be born brothers from qualcomm etc)
2) Double the GPU area (which is more scalable anyways)
3) Move to a new process node (which is going to happen anyways)

They absolutely can do all of the above. It just takes time, money and talent.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
There's no guarantee you can double the peak clock speed of a uarch by throwing a lot of money at it. Normally you'd end up with a totally different uarch - leaving that much clock speed on the table is a sign of a poorly balanced design.

It's possible Apple can further optimize the timings by spending more time tuning things (including physical layout) without changing the behavioral specification of the uarch, but 2x seems like a lot given the experience of their design teams. Probably something is going to have to start giving in terms of cycle counts.
 

ancientarcher

Member
Sep 30, 2013
39
1
66
Yes, that is difficult, very difficult. In fact high frequency and low frequency mean different micro-architectures.

This comes from the fact that at a given frequency your electrons can travel through, let's say, 30 layers of logic per cycle; if you start increasing frequency, given that electron speed is constant, you'll have fewer layers of logic per cycle, hence you'll be able to do less work per cycle (that's a simplification, but it's close enough to reality).

So if, for instance, Apple had time to multiply two numbers in one cycle because this requires 30 layers of logic, and they then increase the frequency, which implies fewer layers per cycle, they'll have to split the multiplication into two cycles.

Hope I made this clear enough...

Fair enough. You and Exophase have made it clear that it is not trivial. However, am I correct in assuming that it is still possible to design a core (a little different from the A7, obviously) which has nearly 2x the performance of the A7's Cyclone? Maybe with 20 nm (with performance gains of 20-30% over the current node), Apple (and/or the others) can theoretically reach Core i7 levels of performance at a TDP of ~8 W (or maybe slightly higher power consumption).
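As a quick sanity check on those numbers, here is how much of the gap would still have to come from the core design once the process gain is taken out. It is a sketch only: the 20-30% process gain is the figure from this post, the 1,400 and 3,500 Geekbench scores are the ones quoted earlier in the thread, and treating the gains as simply multiplicative is an assumption.

```python
# How much speedup must come from the design itself if the process node
# contributes 20-30%? Gains are assumed to multiply, which is a simplification.

PROCESS_GAINS = (1.20, 1.30)   # 20 nm gain range, from the post
A7_SCORE = 1400                # Geekbench 3 single-core, quoted earlier in the thread
I7_SCORE = 3500                # Core i7-3840QM single-core, quoted earlier in the thread

def remaining_gain(target_speedup, process_gain):
    """Speedup still needed from clock and micro-architecture changes."""
    return target_speedup / process_gain

for gain in PROCESS_GAINS:
    for label, target in (("2x the A7", 2.0), ("i7 parity", I7_SCORE / A7_SCORE)):
        print(f"process gain {gain:.2f}, {label}: "
              f"{remaining_gain(target, gain):.2f}x still needed from the design")
```

Even with the node's help, reaching 2x needs roughly 1.5x to 1.7x from the design, and matching the quoted Core i7 score needs about 1.9x to 2.1x.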

I know that we can't extrapolate the performance from A7, but clearly there is a world where 8-15W TDP ARM cores would work. And who is to say that Apple didn't have parallel teams working on cores with these power envelope levels as well... After all the ARMv8 instruction set was out 2 years ago...

I guess we will find out when we find out.
 