Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,924
1,525
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

MS_AT

Senior member
Jul 15, 2024
364
798
96
Intel build used 64GB and I doubt using 64GB on 9950X will make it beat the M4 Max let alone make up for the power consumption difference.
Thanks for pointing it out; I have taken another look at the data. Since I am not using this benchmarking framework myself I am not sure: do I read this right, that the Intel and AMD CPUs were using GCC 13.2 to compile the codebases in the test?

The M4 Max entry lists gcc 16, clang 16 and Xcode 16. This is curious, because GCC 16 does not exist yet. A little digging suggests that macOS will alias gcc to clang if GCC itself is not installed on the system.

So if I read it right, the M4 Max was compiling with clang, and the x64 systems with GCC. At work, clang on the same machine can range from as fast as GCC to almost twice as fast.

So, tl;dr: it seems to me that different compilers were used, and from my experience the M4 Max was using the faster one. Unless I misread the data, that is; then feel free to ignore my rambling.

Of course at an iso-compiler comparison the M4 Max might win, and compiler choice will not reduce the power draw, but I would like to see such an apples-to-apples comparison first.
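
For anyone who wants to check the aliasing on their own machine, here is a minimal sketch (just standard predefined compiler macros, nothing tied to the benchmark harness): build it with whatever the harness invokes as gcc and see which toolchain actually answers.

/* compiler_check.c - print which compiler actually built this binary.
 * On macOS, /usr/bin/gcc is typically a shim for Apple clang, which still
 * defines __GNUC__ for compatibility, so test __clang__ first. */
#include <stdio.h>

int main(void) {
#if defined(__clang__)
    printf("clang %d.%d.%d\n", __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    printf("gcc %d.%d.%d\n", __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    printf("unknown compiler\n");
#endif
    printf("version string: %s\n", __VERSION__);
    return 0;
}

On a stock Mac, "gcc compiler_check.c && ./a.out" reports clang, which is presumably how a harness ends up labelling an Xcode toolchain as "gcc 16".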
 

poke01

Platinum Member
Mar 8, 2022
2,581
3,409
106
Thanks for pointing it out; I have taken another look at the data. Since I am not using this benchmarking framework myself I am not sure: do I read this right, that the Intel and AMD CPUs were using GCC 13.2 to compile the codebases in the test?

The M4 Max entry lists gcc 16, clang 16 and Xcode 16. This is curious, because GCC 16 does not exist yet. A little digging suggests that macOS will alias gcc to clang if GCC itself is not installed on the system.

So if I read it right, the M4 Max was compiling with clang, and the x64 systems with GCC. At work, clang on the same machine can range from as fast as GCC to almost twice as fast.

So, tl;dr: it seems to me that different compilers were used, and from my experience the M4 Max was using the faster one. Unless I misread the data, that is; then feel free to ignore my rambling.

Of course at an iso-compiler comparison the M4 Max might win, and compiler choice will not reduce the power draw, but I would like to see such an apples-to-apples comparison first.
Good catch.

I really hope Intel has something great with Nova Lake. I can accept the longer compile times, but the power consumption difference is too much. I do think Arctic Wolf will diminish this if the rumoured IPC increase is true.
 

Doug S

Platinum Member
Feb 8, 2020
2,888
4,911
136

The one part that really stood out to me:

Total energy used to complete one thread is therefore over 23 J when run on P cores, and less than 1.7 J when run on E cores. E cores therefore use only 7% of the energy that P cores do performing the same task.
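
The quoted figures are easy to sanity-check; a trivial sketch using the article's ~23 J and ~1.7 J numbers (not measured here):

/* Rough check of the quoted per-thread energy figures. */
#include <stdio.h>

int main(void) {
    double p_core_joules = 23.0;  /* energy per thread on a P core (article's figure) */
    double e_core_joules = 1.7;   /* energy per thread on an E core (article's figure) */
    printf("E-core energy is %.1f%% of P-core energy\n",
           100.0 * e_core_joules / p_core_joules);  /* prints ~7.4% */
    return 0;
}

Since the article says "over 23 J" and "less than 1.7 J", the true ratio is a bit below 7.4%, hence the "only 7%".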
 

Eug

Lifer
Mar 11, 2000
23,924
1,525
126
The one part that really stood out to me:
I remember back in the day there was an ad/demo from Cyrix/VIA showing they could run a game on their chip with no heatsink without it being fried to a crisp, and that was considered absolutely crazy and shocking. However, that chip was way, way slower than the other x86 chips at the time, maybe >5 years behind.

Contrast that to M4.
 

johnsonwax

Member
Jun 27, 2024
96
160
66
At low QoS, and so running at 1GHz…
Running at full speed (ie as an augmentation to P-cores) it’s not quite so impressive, but still pretty good.
I don't think they were ever designed to augment P-cores; in a Unix-based OS there are always plenty of other processes to task the E-cores with, and the scheduler is designed to ensure that those processes don't touch the P-cores, which is probably the bigger performance benefit.
 

name99

Senior member
Sep 11, 2010
526
412
136
I don't think they were ever designed to augment P-cores; in a Unix-based OS there are always plenty of other processes to task the E-cores with, and the scheduler is designed to ensure that those processes don't touch the P-cores, which is probably the bigger performance benefit.
I am referring to when enough "P"-threads exist that some "P-work" overflows to E-cores.
It is primarily in those circumstances that the E-cores run at ~2.6GHz rather than 1GHz, delivering a rather better performance(!), but also at not quite so low an energy usage.
 

digitaldreamer

Junior Member
Mar 23, 2007
20
14
81
I am referring to when enough "P"-threads exist that some "P-work" overflows to E-cores.
It is primarily in those circumstances that the E-cores run at ~2.6GHz rather than 1GHz, delivering a rather better performance(!), but also at not quite so low an energy usage.
Which brings up a question I've always wondered: How does the core know when to run at full clock speed or not? And, how does the microprocessor know what threads to run on the P-cores and which to run on the E-cores?
 

The Hardcard

Senior member
Oct 19, 2021
271
351
106
Which brings up a question I've always wondered: How does the core know when to run at full clock speed or not? And, how does the microprocessor know what threads to run on the P-cores and which to run on the E-cores?

The frequency ramp is just a time function: if the code is not complete at particular intervals, the core will ramp to top frequency, within milliseconds. macOS has several Quality of Service levels (I believe four are provided to developers), with all but the lowest going to an E core first, quickly maxing it out, then switching to the P cores and filling them. The E cores engage when there are more threads than P cores.

Apple’s OSes have a thread pool system originally called Grand Central Dispatch (now just Dispatch) that is the preferred way for developers to go multithreaded.

The exception is QoS 9, which allows a developer to declare their code as background level. Those threads stay on an E core at low speed; background-level code never leaves the E cores. If only background threads are running, as many as 30 or more of them will jam into the E cores, with the P cores remaining off. However, the frequency will rise to max if there are multiple background tasks.
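
For concreteness, here is a minimal sketch of how those QoS tiers look from the developer's side using Dispatch, written as a plain C command-line tool (the work in the blocks is just a placeholder):

/* qos_dispatch.c - submit work at two different QoS tiers via Dispatch (GCD).
 * QOS_CLASS_BACKGROUND has the raw value 0x09, the "QoS 9" mentioned above;
 * the scheduler keeps that tier on the E cores at low clocks.
 * Build on macOS with: clang qos_dispatch.c -o qos_dispatch */
#include <dispatch/dispatch.h>
#include <sys/qos.h>
#include <stdio.h>

int main(void) {
    dispatch_group_t group = dispatch_group_create();

    /* Background tier: stays on the E cores at low frequency. */
    dispatch_group_async(group, dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0), ^{
        printf("background work (E cores)\n");
    });

    /* User-initiated tier: eligible for the P cores, with E cores as overflow. */
    dispatch_group_async(group, dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
        printf("user-initiated work (P cores preferred)\n");
    });

    /* Wait for both blocks to finish before exiting. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    return 0;
}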
 

name99

Senior member
Sep 11, 2010
526
412
136
Which brings up a question I've always wondered: How does the core know when to run at full clock speed or not? And, how does the microprocessor know what threads to run on the P-cores and which to run on the E-cores?
The OS has some control; for example, background tasks are marked as such (given a QoS flag) and will run on an E-core at the lowest frequency. All well-written macOS/iOS code indicates a QoS (the role of the code, e.g. background code or UI code or whatever).

Other tasks indicate when they need to be completed, and give intermediate progress reports; frequency can be set based on this. This might seem a rare case, but think games (where you need to complete the frame, but being faster than 60Hz or whatever won't help) or media playback, or even UI animations.

Meanwhile at the CPU level itself, the hardware can track the type of code that is running and slow the clock if appropriate. For example if the code is primarily moving data around (so that most of the time the CPU is waiting on DRAM) after this happens a few times, power will be saved by reducing the CPU clock until we exit this DRAM-limited portion of the code.

It's an imperfect business but Apple, far more so than anyone else, has made it work well through a combination of
- aggressive tracking of many many metrics across every IP block in the SoC
- reporting those metrics to centralized controllers (eg first to a CPU, then upward to a cluster controller, then upward to a SoC frequency/power controller)
- giving the OS quick and easy access to some of those metrics
- aggressively encouraging developers (and making it easy to do so) to indicate the ROLE of each piece of code they provide to the OS, so that the OS knows how important it is.
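
To make the "indicate the ROLE" point concrete, here is a minimal sketch of a thread declaring its own QoS class through the macOS pthread extensions (the work itself is a placeholder; Dispatch is the higher-level way to do the same thing):

/* thread_qos.c - a thread declaring its role so the OS can pick a core type
 * and frequency for it. Build on macOS with: clang thread_qos.c -o thread_qos */
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

static void *maintenance_work(void *arg) {
    (void)arg;
    printf("deferrable work running at utility QoS\n");
    return NULL;
}

int main(void) {
    /* Declare the worker's QoS up front via thread attributes: utility class
     * is latency-tolerant, so the scheduler may choose an E core and/or a
     * lower clock for it. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_set_qos_class_np(&attr, QOS_CLASS_UTILITY, 0);

    pthread_t worker;
    pthread_create(&worker, &attr, maintenance_work, NULL);

    /* A thread can also lower its own QoS at runtime; here the main thread
     * demotes itself while it merely waits for the worker. */
    pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);

    pthread_join(worker, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}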
 

name99

Senior member
Sep 11, 2010
526
412
136
Don’t rule it out. In 2026, there will be a president willing to set sky high tariffs, even 200 to 500 percent to force what he thinks should happen and there is chatter that some on his team want US companies to use US fabs, for now that would be Intel.
Apple are probably endlessly engaged in talking to Intel about what they offer, and looking at their tools. This is just common sense - when I was there we, for example, spent some time talking to IBM about how Cell worked and what we could do with it. Then people see who's present at what parking lot (or even someone within Intel spills who was at a meeting last week) and the rumors begin.

But talking doesn't turn into action unless the item being talked about actually delivers some value. That wasn't true for Cell, and I suspect it won't be true for Intel as of 2026.
However I wouldn't be surprised if Apple try to "compile" the A20 design to Intel fab specs and (depending on what progress looks like, and maybe whether Intel will spring for the masks...) even try to run a test wafer and see what happens.

It's obviously in Intel's interests to go along with this as much as possible. Nothing would be more convincing that their foundry is not a joke than landing Apple. And even if the 2026 Apple test run fails, presumably the fab should learn from it what the most demanding customer in the world requires.

At the technical level, the next interesting step is BSPDN. GAA, sure, but that's more of the same, and everyone's doing the same thing. With BSPDN, Intel has chosen a path that is (probably) easier but also (probably) not as performant or capable as TSMC's. This (maybe) gets Intel bragging rights and the ability to cheer FRIST!!! for a year, but I suspect Apple will accept the delay and prioritize the extra capabilities of the TSMC path over Intel's. Apple are already filing patents for how to design denser SRAM based on backside signal routing, and as far as I can tell, PowerVia, at least what we have seen of it, doesn't give them the functionality they need for this.
 

jdubs03

Golden Member
Oct 1, 2013
1,079
746
136
Don’t rule it out. In 2026, there will be a president willing to set sky high tariffs, even 200 to 500 percent to force what he thinks should happen and there is chatter that some on his team want US companies to use US fabs, for now that would be Intel.
Despite all the talk, and his prior statements going back over 30 years, his economic team will surely know that he'd tank the technology sector, and specifically Apple, if he implements those sky-high tariffs. 60% is what he originally said, and that would be pretty bad as it is.
 

DrMrLordX

Lifer
Apr 27, 2000
22,184
11,887
136
Don’t rule it out. In 2026, there will be a president willing to set sky high tariffs, even 200 to 500 percent to force what he thinks should happen and there is chatter that some on his team want US companies to use US fabs, for now that would be Intel.
All that manufacturing is moving to Vietnam anyway to avoid high costs in China.
 

Doug S

Platinum Member
Feb 8, 2020
2,888
4,911
136
However I wouldn't be surprised if Apple try to "compile" the A20 design to Intel fab specs and (depending on what progress looks like, and maybe whether Intel will spring for the masks...) even try to run a test wafer and see what happens.

I really doubt they'd use a full-sized SoC like the A20 for such testing. I'm willing to bet they already have a test chip design, with just enough functionality to verify performance and yield, that they use in the early (pre-risk) stages to characterize future TSMC processes. It would be a lot cheaper to port that design to Intel's foundry than a full-featured SoC.

If Apple starts using Intel's foundry they'd start small, with something lower volume that doesn't impact the timelines of their most critical products. Maybe make SoCs for the Watch; if a new model was delayed by a quarter or two it doesn't impact them too much. Or maybe make cellular modems for the Watch and non-Pro iPad: neither is critical to the functionality of the product, since cellular is optional in those and, at least in the Watch's case, lower spec than the modems going into the phones.
 

oak8292

Member
Sep 14, 2016
112
116
116
I really doubt they'd use a full-sized SoC like the A20 for such testing. I'm willing to bet they already have a test chip design, with just enough functionality to verify performance and yield, that they use in the early (pre-risk) stages to characterize future TSMC processes. It would be a lot cheaper to port that design to Intel's foundry than a full-featured SoC.

If Apple starts using Intel's foundry they'd start small, with something lower volume that doesn't impact the timelines of their most critical products. Maybe make SoCs for the Watch; if a new model was delayed by a quarter or two it doesn't impact them too much. Or maybe make cellular modems for the Watch and non-Pro iPad: neither is critical to the functionality of the product, since cellular is optional in those and, at least in the Watch's case, lower spec than the modems going into the phones.
Apple probably isn’t doing test chips at Intel as ARM is working with Intel to provide physical IP on ARM cores. The relative performance of the ARM cores will give Apple the data they need and then they will deal with capacity issues.

“IFS and Arm will undertake design technology co-optimization (DTCO), in which chip design and process technologies are optimized together to improve power, performance, area and cost (PPAC) for Arm cores targeting Intel 18A process technology.”

 

johnsonwax

Member
Jun 27, 2024
96
160
66
Don’t rule it out. In 2026, there will be a president willing to set sky high tariffs, even 200 to 500 percent to force what he thinks should happen and there is chatter that some on his team want US companies to use US fabs, for now that would be Intel.
That presumes Intel would be able to make an A20 that wasn't slower than a TSMC A19. If there were such a market disruption, I don't see how Apple could avoid a cycle without new silicon.

I mean, Apple's A/M series volume is roughly equal to Intel's total volume - including their stuff on older nodes. There is no universe in which even if Intel had a competitive node, they would remotely have the volume that Apple requires.
 