Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,752
1,284
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
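
As a rough sanity check on the GPU figures above, here is a minimal sketch of how they relate to each other. The lane count per execution unit and the 2 FLOPs per fused multiply-add are assumptions, not something stated in this post, so treat the implied clock as an estimate only.

```python
# Figures quoted in the spec list above.
EXECUTION_UNITS = 128
MAX_THREADS = 24576
PEAK_TFLOPS = 2.6
GIGATEXELS_PER_S = 82
GIGAPIXELS_PER_S = 41

# Assumptions (not from the post): 8 FP32 lanes per execution unit,
# each retiring one fused multiply-add (counted as 2 FLOPs) per clock.
LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CLOCK = 2

# Threads that each execution unit can keep in flight.
threads_per_eu = MAX_THREADS // EXECUTION_UNITS

# Texture fill rate relative to pixel fill rate.
texels_per_pixel = GIGATEXELS_PER_S / GIGAPIXELS_PER_S

# GPU clock implied by the peak FLOPS figure under the assumptions above.
flops_per_clock = EXECUTION_UNITS * LANES_PER_EU * FLOPS_PER_LANE_PER_CLOCK
implied_clock_ghz = PEAK_TFLOPS * 1e3 / flops_per_clock

print(f"threads per execution unit: {threads_per_eu}")
print(f"texels per pixel: {texels_per_pixel:.1f}")
print(f"implied GPU clock: {implied_clock_ghz:.2f} GHz")
```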

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors
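
The 100 GB/s figure can be reproduced with a quick back-of-the-envelope calculation. This is a sketch assuming a 128-bit memory bus and LPDDR5 running at 6400 MT/s; neither number is given in this post.

```python
# Assumptions (not from the post): bus width and transfer rate.
BUS_WIDTH_BITS = 128
TRANSFER_RATE_MTS = 6400  # mega-transfers per second

# Bytes moved per transfer times transfers per second, expressed in GB/s.
bytes_per_transfer = BUS_WIDTH_BITS / 8
bandwidth_gbs = bytes_per_transfer * TRANSFER_RATE_MTS / 1000

print(f"peak bandwidth: {bandwidth_gbs:.1f} GB/s")
```

Under those assumptions the result lands slightly above 100 GB/s, which is consistent with the rounded figure quoted above.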

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, ProRes

M3 Family discussion here:


M4 Family discussion here:

 

Hitman928

Diamond Member
Apr 15, 2012
5,593
8,770
136
Yes, you’re right, the Zen 4c is smaller without any cache. Zen 4c is also weaker in 1t and delivers less performance than a Zen 4 core.

A Zen 4 core (without cache) should be about the same size as the A17 core excluding cache. I have edited the post.

I think we can dismiss the idea of Apple's cores being fat. They are not at all bad area-wise considering the performance.

This is not an attempt to make fun. Both AMD and Apple make cores that suit their respective markets first. Both make excellent CPU cores but in the end execution to the end user matters and Apple delivers the highest IPC for now.

I think that's being a little generous towards Apple. I'm not saying they have bad area utilization at all (looking at you, Intel), obviously their core designs are industry leading in PPA, but it is a decently fat core given equal nodes and frequencies.

Zen 4 Core + L2 = 3.84 mm2 with max boost of ~5.7 GHz.
Zen 4c Core + L2 = 2.48 mm2 with max boost of ~3.7 GHz
M2 Core + L2 ~ 2.76 mm2 + 4.3 mm2 = 7.06 mm2 with max boost of 3.5 GHz

So, on the same node with similar max frequencies, the M2 core (including L2) is 185% bigger than Zen4c. Obviously that comes with much better IPC, but the trade-off is definitely there.

Edit: Fixed M2 core area, previously it was just the L2 area. Adding also that L2 is shared for the M2, whereas Zen 4 has a private L2 plus a shared L3. If you want to just look at core area without L2 cache, you get the following:

Zen 4 Core = 2.56 mm2 with max boost of ~5.7 GHz.
Zen 4c Core = 1.43 mm2 with max boost of ~3.7 GHz.
M2 core = 2.76 mm2 with max boost of 3.5 GHz.

So you are left with the M2 core being 8% bigger than a Zen 4 core and 93% bigger than a Zen 4c core.
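
For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic using only the areas quoted in this post. The helper name `percent_bigger` is made up for illustration.

```python
# Areas in mm2 as quoted in the post above.
areas_mm2 = {
    "with L2": {"Zen 4": 3.84, "Zen 4c": 2.48, "M2": 2.76 + 4.3},
    "core only": {"Zen 4": 2.56, "Zen 4c": 1.43, "M2": 2.76},
}


def percent_bigger(a, b):
    """Return how much bigger area a is than area b, in percent."""
    return (a / b - 1) * 100


for label, areas in areas_mm2.items():
    for rival in ("Zen 4", "Zen 4c"):
        pct = percent_bigger(areas["M2"], areas[rival])
        print(f"{label}: M2 is {pct:.0f}% bigger than {rival}")
```

The same helper shows where the earlier 73% figure came from: `percent_bigger(4.3, 2.48)` is about 73, i.e. the 4.3 mm2 L2 on its own compared against Zen 4c core + L2.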
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,319
4,788
96
So, on the same node with similar max frequencies, the M2 core (including L2) is 73% bigger than Zen4c. Obviously that comes with much better IPC, but the trade-off is definitely there.
You need to wait like 2 more days (and then for someone to pry poor Strix away from a rando FP8 laptop) to have a more even-grounds comparison.
 

Hitman928

Diamond Member
Apr 15, 2012
5,593
8,770
136
You need to wait like 2 more days (and then for someone to pry poor Strix away from a rando FP8 laptop) to have a more even-grounds comparison.

What do you mean by even-grounds? Zen4c vs M2 is very even-grounds. If you mean that we’ll see how Zen 5 can compete on PPA when they allow the designers to make a fatter core, then, sure, that will obviously be an interesting comparison.
 
Reactions: Tlh97

roger_k

Member
Sep 23, 2021
102
215
86
I think that's being a little generous towards Apple. I'm not saying they have bad area utilization at all (looking at you, Intel), obviously their core designs are industry leading in PPA, but it is a decently fat core given equal nodes and frequencies.

Zen 4 Core + L2 = 3.84 mm2 with max boost of ~5.7 GHz.
Zen 4c Core + L2 = 2.48 mm2 with max boost of ~3.7 GHz
M2 Core + L2 ~ 4.3 mm2 with max boost of 3.5 GHz

So, on the same node with similar max frequencies, the M2 core (including L2) is 73% bigger than Zen4c. Obviously that comes with much better IPC, but the trade-off is definitely there.

Erm, Zen4 has what, 1MB L2 cache per core, and M2 has 4MB? Of course Apple ends up being slightly larger, SRAM is not free. The M2 P-core core itself (sans cache) is practically identical to Zen4 at 2.6mm2 (which is interesting if one considers that Apple has wider arch, more L1, and much larger buffers). And sure, Zen4c is smaller, but it's also half the speed.
 
Reactions: Viknet

adroc_thurston

Diamond Member
Jul 2, 2023
3,319
4,788
96
Erm, Zen4 has what, 1MB L2 cache per core, and M2 has 4MB? Of course Apple ends up being slightly larger, SRAM is not free. The M2 P-core core itself (sans cache) is practically identical to Zen4 at 2.6mm2 (which is interesting if one considers that Apple has wider arch, more L1, and much larger buffers). And sure, Zen4c is smaller, but it's also half the speed.
They use different caching strategies.
Apple is hueg L1 with hueg shared L2.
AMD is tiny L1, sizeable private L2 and shared L3.
The latter is a distinctly server thing Apple can just skip.
 

roger_k

Member
Sep 23, 2021
102
215
86
They use different caching strategies.
Apple is hueg L1 with hueg shared L2.
AMD is tiny L1, sizeable private L2 and shared L3.
The latter is a distinctly server thing Apple can just skip.

Exactly.

Apple essentially merges the "traditional" CPU L3 into L2; between-cluster communication is handled via the SoC-level cache from what I understand. A meaningful comparison should also consider the architectural differences. Apple's CPU cluster essentially plays the same role as AMD's CCX, with the notable difference that Apple's clusters are smaller (but they moved to 6-core clusters in M3 Pro/Max).

If one really wants to compare structure areas, one should probably look at the cluster level.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,319
4,788
96
Apple essentially merges the "traditional" CPU L3 into L2
Well no, Apple stuff is traditional, shared L3 for SMPs is a relatively new thing.
L1 + shared or even private L2 is how we used to roll.

Then we can do LNL caching to enter True Hell, which has private L0, private L1, private L2, shared L3 and memside SLC. Because why not?
If one really wants to compare structure areas, one should probably look at the cluster level.
Bingo!
But even that is kinda tough since AMD dense cluster is cachelet, and classic aims for a much higher fmax.
 

Doug S

Platinum Member
Feb 8, 2020
2,481
4,037
136
I've said it before and I'll say it again. I'm amazed at people's ability to ignore actual accomplishments, instead substituting their projections about what they expect/hope/dream will happen in the future.

Apple releases the fastest single threaded CPU on the planet - IN A FREAKIN' TABLET - and people are like "oh but they didn't increase IPC enough in the 7 months since the previous Apple Silicon CPU came out, stick a fork in them because Intel/AMD/Qualcomm/magic fairies are going to leave them in the dust if what I want to believe about the future happens AND Apple stands still!"

How about waiting until someone beats Apple's IPC in actual unbiased testing, or even decisively beats Apple's single threaded performance at any IPC, before throwing dirt on their grave? BTW, I assume that at least the latter will happen pretty soon, because CPU performance is a game of leapfrog. But I would hold off dancing on Apple's grave when that happens, because I've seen a lot of that in the last year or two, and then M4 came out and some people have had to take massive injections of copium to handle the reality it dealt them.

Already people want to put asterisks on Apple's GB6 results because of SME, while they were more than happy to take the gains from AVX512. You don't like GB6, fine. Let's use SPEC then - and not the vendor reported scam scores, but run using relatively standard open source compilers (gcc/llvm/clang) using one or two flags, no special malloc libraries, etc. Then we don't have to worry too much about either SME or AVX2/AVX512/AVX10, because compilers are rarely able to figure out how to use either when compiling from ordinary C source code.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,319
4,788
96
How about waiting until someone beats Apple's IPC in actual unbiased testing, or even decisively beats Apple's single threaded performance at any IPC, before throwing dirt on their grave?
TWO MORE DAYS
while they were more than happy to take the gains from AVX512.
That's actually usable SIMD unlike SME which does dumb GEMM that can be offloaded to no less than two other blobs on Apple's own SoCs. Please.
Now, if Apple had proper SVE2 implementation, would've been great. alas.
Let's use SPEC then - and not the vendor reported scam scores, but run using relatively standard open source compilers (gcc/llvm/clang) using one or two flags
SPEC hasn't been kind to Apple since M3.
And uhhhh, let's just say certain people will try.
 
Reactions: igor_kavinski

Hitman928

Diamond Member
Apr 15, 2012
5,593
8,770
136
Apple delivers excellent 1t even at low clocks, using low power and while maintaining area. The cores and design deserve praise.

Of course, which is why I said they are the PPA leaders for the entire industry. But they also aren't magicians and are using a decently fat core to get to where they are. There's nothing wrong with that, just pointing it out is all.

Erm, Zen4 has what, 1MB L2 cache per core, and M2 has 4MB? Of course Apple ends up being slightly larger, SRAM is not free. The M2 P-core core itself (sans cache) is practically identical to Zen4 at 2.6mm2 (which is interesting if one considers that Apple has wider arch, more L1, and much larger buffers). And sure, Zen4c is smaller, but it's also half the speed.

The L2 cache on M2 is 16 MB. I've edited my post as I mistakenly used the size of just the L2 cache before. If you include the L2 cache, the size difference is crazy (185% M2 to Zen4c when L2 is included). It's not exactly 1:1 because as @adroc_thurston pointed out, the cache hierarchy/scheme is different, but even if you include the L3 for Zen, there is still the 8 MB SLC on M2 that the cores have access to as well. If you want to just look at the cores, sans L2 cache, the size difference between Zen4c and M2 is still quite stark (M2 is 93% bigger) and the M2 core is still slightly larger than Zen 4 (non-c). Zen 4 at 5.7 GHz is a little faster than M2 (~13%) but obviously blows out power to get there, which is why Apple's designs are the PPA leaders.
 

FlameTail

Diamond Member
Dec 15, 2021
3,150
1,800
106
It is pretty interesting the entire industry seems to have converged at Firestorm-class IPC.

For those who are unaware, Firestorm is the name of the P-core in Apple M1. I would define "Firestorm-Class IPC" as being +/- 20% to that of the IPC of Firestorm.

Apple M4-P, ARM Cortex X925 and Qualcomm Oryon V1 are all Firestorm-class cores. According to Adroc, AMD will also be joining the podium with Zen5.
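
To make the definition above concrete, here is a small sketch of the +/- 20% test. The function name and the example values are hypothetical; IPC is normalised so that Firestorm is 1.00, and none of the numbers are measurements.

```python
def is_firestorm_class(ipc, firestorm_ipc, tolerance=0.20):
    """Return True if ipc lies within +/- tolerance of Firestorm's IPC."""
    return abs(ipc - firestorm_ipc) <= tolerance * firestorm_ipc


# Hypothetical normalised IPC values (Firestorm = 1.00), for illustration only.
examples = {"core A": 1.15, "core B": 0.85, "core C": 1.25}

for name, ipc in examples.items():
    verdict = "Firestorm-class" if is_firestorm_class(ipc, 1.00) else "outside the band"
    print(f"{name}: {ipc:.2f} -> {verdict}")
```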
 

poke01

Golden Member
Mar 8, 2022
1,386
1,600
106
It is pretty interesting the entire industry seems to have converged at Firestorm-class IPC.

For those who are unaware, Firestorm is the name of the P-core in Apple M1. I would define "Firestorm-Class IPC" as being +/- 20% to that of the IPC of Firestorm.

Apple M4-P, ARM Cortex X925 and Qualcomm Oryon V1 are all Firestorm-class cores. According to Adroc, AMD will also be joining the podium with Zen5.
Firestorm was indeed true to its name. It’s still blazing today.
 
Reactions: Henry swagger

FlameTail

Diamond Member
Dec 15, 2021
3,150
1,800
106
Swift, Cyclone, Typhoon, Twister, Hurricane, Monsoon, Vortex, Lightning, Firestorm, Avalanche, Everest...

What a legendary journey it has been.
 

roger_k

Member
Sep 23, 2021
102
215
86
TWO MORE DAYS

What happens in two more days?

That's actually usable SIMD unlike SME which does dumb GEMM that can be offloaded to no less than two other blobs on Apple's own SoCs.

SME is a fairly usable vector SIMD unless you need flexible permute. 250 GFLOPS vector FMA/cluster is nothing to sneeze at.

SPEC hasn't been kind to Apple since M3.

How come? Aren't they leading the industry in pretty much every single SPEC subtest?
 

roger_k

Member
Sep 23, 2021
102
215
86
Yes, as Adroc believes.

I wish I had their blind faith

BTW, this is M4 compared to Zen 4 iso-clock (without Object Detection). That's quite a distance to close for team red.



P.S. And just for fun, this is the same comparison between Zen4 and M1. Turns out they have identical IPC! In GB6 Blur subtest at least...
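
For reference, an iso-clock comparison like the ones above usually boils down to dividing each score by the clock it was achieved at. This is a sketch of that normalisation under that assumption; the function names and the scores and clocks in the example are placeholders, not GB6 results.

```python
def score_per_ghz(score, clock_ghz):
    """Benchmark score normalised by clock frequency (a rough IPC proxy)."""
    return score / clock_ghz


def iso_clock_lead(score_a, clock_a, score_b, clock_b):
    """Percent by which chip A leads chip B once both are normalised to equal clocks."""
    ratio = score_per_ghz(score_a, clock_a) / score_per_ghz(score_b, clock_b)
    return (ratio - 1) * 100


# Placeholder inputs for illustration only, not measured results.
lead = iso_clock_lead(score_a=3000, clock_a=4.0, score_b=2500, clock_b=5.0)
print(f"chip A leads chip B by {lead:.0f}% iso-clock")
```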

 