Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,871
1,438
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
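For what it's worth, the 2.6-teraflop figure is consistent with the execution unit count. A minimal sanity-check sketch, assuming 8 FP32 ALUs per execution unit and a ~1.278 GHz GPU clock (both widely reported for M1, neither official):

```python
# Sanity check of the 2.6 TFLOPs figure from the EU count.
# Assumptions (not from Apple's spec sheet): 8 FP32 ALUs per
# execution unit and a ~1.278 GHz GPU clock.
eus = 128
alus_per_eu = 8          # assumed
flops_per_alu = 2        # fused multiply-add = 2 FLOPs/cycle
clock_hz = 1.278e9       # assumed

tflops = eus * alus_per_eu * flops_per_alu * clock_hz / 1e12
print(f"{tflops:.2f} TFLOPs")  # ≈ 2.62
```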

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

zacharychieply

Junior Member
Apr 22, 2020
9
4
81
Qualcomm does have more talent than Apple now in the CPU department.

You would think that dedicated chip companies would be far ahead of Apple, but it doesn't look like it. Qualcomm will push clocks this year and next year, as well as improving the architecture a bit.

I really don't get why you have this fetish of Apple abandoning the arm64 ISA anytime soon when they signed a deal with ARM for 40 years. RISC-V is also too immature now and is not needed by Apple.
I agree with your point, but do you know what year the agreement expires?
 

Doug S

Platinum Member
Feb 8, 2020
2,836
4,820
136
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...

This is a problem why, exactly?

The handy online die calculator shows a 13mm x 13mm die (169 mm^2) results in 353 candidates per wafer, so let's call it 300 good dies per wafer. If N3E costs $24K per wafer (I'm just guessing) that's $80 per chip. Versus what, ~$40 per chip with M1?

Throw in a 50% uptick for testing/packaging (not including LPDDR) and you're at $120 versus $60 with M1. Check how that compares with Intel/AMD chip pricing. Or Qualcomm X Elite pricing. And consider that Apple is targeting only the premium market. They're fine.
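The back-of-envelope above, as a sketch (the wafer price and good-die count are the post's own guesses, not published figures):

```python
# Back-of-envelope chip cost, using the post's own guesses:
# ~353 candidates on a standard 300 mm wafer, ~300 good dies,
# and a guessed $24K N3E wafer price.
wafer_cost = 24_000      # guessed wafer price, per the post
good_dies = 300          # ~353 candidates minus defects/edge loss

cost_per_die = wafer_cost / good_dies         # the post's $80
cost_packaged = cost_per_die * 1.5            # +50% test/packaging
print(cost_per_die, cost_packaged)            # 80.0 120.0
```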

I will say given how tiny the NPU is in relation to the overall SoC, I'll bet any additional area in M5 is there. M4 taped out before the ChatGPT hype went stratospheric, before Apple decided to make the "Apple Intelligence" push, so they weren't able to expand it. M5's NPU will be at least double, if not triple the size I'll wager.
 

name99

Senior member
Sep 11, 2010
511
395
136
Btw, it looks like M4 has an 8-core Neural Engine?
You can't take these labels too seriously, especially preliminary versions before people have had a chance to comment.
Each block labeled NPU is two of what Apple calls a Neural Engine: basically a convolution engine (a small amount of storage, a hardware loop, and 256 multiply-accumulators, plus some other stuff (ReLU, etc.)).
Also in the ANE, but not included in the boxes, is the area to the left: mainly the Planar Engine and a large buffer (the equivalent of an L2 cache).


Actually I take the first part of the above comment back.
If you can track down a good picture of the M3 NPU, each of the 8 blocks is not symmetric in the way it appears in lower-quality images; it is in fact quite asymmetric!
The relative areas look the same, so it's unclear what's going on. The obvious guess is that Apple consolidated the previous design (256 MACs per core) into larger cores with 512 MACs each. Maybe this makes sense if the convolutions of interest are large enough to justify operating on 512-sized blocks? And presumably by doing this you reduce overhead area and power.
(BTW I THINK in the M3 diagram they left out the Planar Engine part, so the spur at the bottom is just the "L2")
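If that consolidation guess is right, the peak math is unchanged: only the split between core count and MACs per core moves, while total MACs stay constant. A minimal sketch (the clock value is a placeholder, not a measured ANE frequency):

```python
# Consolidating 16 x 256-MAC cores into 8 x 512-MAC cores leaves
# peak throughput unchanged; only overhead area/power would differ.
def peak_ops(cores, macs_per_core, clock_hz):
    # each MAC = 1 multiply + 1 accumulate = 2 ops per cycle
    return cores * macs_per_core * 2 * clock_hz

clock = 1.0e9  # placeholder clock, not a measured value
assert peak_ops(16, 256, clock) == peak_ops(8, 512, clock)
print(peak_ops(16, 256, clock) / 1e12, "TOPS at the placeholder clock")
```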
 
Last edited:
Reactions: SarahKerrigan

name99

Senior member
Sep 11, 2010
511
395
136
I've decided to ignore him. I can't stand people who don't want to learn. He doesn't even know how register renaming works and wants to lecture everyone. The kind of guy who never admits he was wrong, as if being wrong was a problem... unless one insists on being wrong even when presented with evidence. And as he is polluting every thread I read, that was too much for me. It seems I'm not the only one to feel the pain.

Oh well, life goes on
Make aggressive use of the ignore feature! I've enjoyed the forums much more since I started doing that.
 
Reactions: Tlh97 and Thibsie

name99

Senior member
Sep 11, 2010
511
395
136
Is nobody going to talk about 170 mm² die area for M4?

This is significant.
Is it? M2 was 155mm^2.
I can't find a number for M3 but maybe the target for M class is around 170mm^2? Like the target for A class seems to be around 100, but also seems to shift from about 85 up to 120 or so.
 

name99

Senior member
Sep 11, 2010
511
395
136
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...
I think it's more that the first version on a new process tends to be smaller, then as we get new versions on essentially the same process they grow, until the cycle repeats with the new process.
That's what we saw with A series.
 

ikjadoon

Senior member
Sep 4, 2006
235
513
146
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...

Out of curiosity, I spot-checked other CPUs.

MTL-U (2+8+2): ~174 mm2
AMD Zen4 APU (8-core): 178 mm2
Qualcomm X1 (12-core): ~171 mm2

Apple seems to be mostly catching up to their competitors.

//

The M4, ironically an iPad-first chip, may mark Apple further graduating away from its A12X lineage: expanded from 4+4 to 4+6, meaty 4.38 GHz clocks, and, yes, a larger area.

I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.
 
Reactions: name99

Doug S

Platinum Member
Feb 8, 2020
2,836
4,820
136
Out of curiosity, I spot-checked other CPUs.

MTL-U (2+8+2): ~174 mm2
AMD Zen4 APU (8-core): 178 mm2
Qualcomm X1 (12-core): ~171 mm2

Apple seems to be mostly catching up to their competitors.

//

The M4, ironically an iPad-first chip, may mark Apple further graduating away from its A12X lineage: expanded from 4+4 to 4+6, meaty 4.38 GHz clocks, and, yes, a larger area.

I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.

That's mostly a market segmentation thing; if you want all 10 cores you gotta step up and pay for the better configuration. It has little to do with yields; I'll bet 95% of the 256/512 models have 10 good cores even though they're sold with fewer. The odds of a defect landing in one of those cores, versus all the other area on the chip, are pretty small. I don't understand the rhyme or reason by which they sometimes segment on the number of GPU cores and sometimes on the number of CPU cores. With NPU cores about to step up in prominence, maybe they start segmenting on them next.

If they wanted that segmentation to be meaningful for yields they'd segment on all three at once. So maybe you get x/y/z cores on the "high end" and the "entry level" config is <x/<y/<z. Heck maybe they take out a set of LPDDR controllers while they're at it, those things are as big as a CPU core...
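The defect-odds argument can be sketched with a simple Poisson yield model; the defect density and per-core area below are illustrative assumptions, not measured values:

```python
import math

# Rough Poisson yield sketch: with a plausible defect density, the
# chance a defect lands specifically in one CPU core (a few mm^2 of
# a ~170 mm^2 die) is small. Defect density and core area are
# illustrative assumptions, not measured values.
die_mm2 = 170
core_mm2 = 3          # assumed area of one CPU core
d0_per_cm2 = 0.07     # assumed defect density (defects/cm^2)

def clean_fraction(area_mm2, d0):
    # Poisson model: P(zero defects in area) = exp(-area * density)
    return math.exp(-(area_mm2 / 100) * d0)

print(f"whole die clean: {clean_fraction(die_mm2, d0_per_cm2):.1%}")
print(f"one core clean:  {clean_fraction(core_mm2, d0_per_cm2):.2%}")
```

Under those assumptions, roughly nine in ten dies are fully clean, and any given core is defective only a fraction of a percent of the time, which is consistent with segmentation, not yield, driving the binning.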
 

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.
Apple's binning process is certainly intriguing. They exclusively bin by core count, not clock speed.

Qualcomm, by contrast, seems to be dicing pineapples with their Hamoa die (X Elite). I wonder how much of that is due to yield and how much due to intentional artificial segmentation...
 

SteinFG

Senior member
Dec 29, 2021
664
786
106
I think the M4 has grown in size mainly due to the core count increase (4+4 to 4+6) and a bigger NPU.
The M3 Pro has 12 cores (6+6); will Apple increase it to 6+8 on the M4 Pro? My guess is no.
The M3 Max has 16 cores (12+4); they probably won't increase it there either.
 
Last edited:
Reactions: Orfosaurio

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
I think the M4 has grown in size mainly due to the core count increase (4+4 to 4+6) and a bigger NPU.
The M3 Pro has 12 cores (6+6); will Apple increase it to 6+8 on the M4 Pro? My guess is no.
The M3 Max has 16 cores (12+4); they probably won't increase it there either.
N3E (M4) is less dense than N3B (M3), as well.
 
Reactions: Orfosaurio

repoman27

Senior member
Dec 17, 2018
381
536
136
How are y'all getting die size estimates when this is the first N3E chip, uses different libraries (2-1 finFLEX and 3-2 finFLEX), and is based on new microarchitectures? What feature sizes are you using to determine scale? GPU cores?

I wouldn't be surprised if die size did increase given the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. So ridiculous that this chip is thus far only available in an iPad. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
 

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
How are y'all getting die size estimates when this is the first N3E chip, uses different libraries (2-1 finFLEX and 3-2 finFLEX), and is based on new microarchitectures? What feature sizes are you using to determine scale? GPU cores?

I wouldn't be surprised if die size did increase given the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. So ridiculous that this chip is thus far only available in an iPad. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
This:
 
Reactions: Mopetar and Eug

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
wow, how did you guess all of that? Maybe you should label the die shot like this M3 one:
 

name99

Senior member
Sep 11, 2010
511
395
136
Apple's binning process is certainly intriguing. They exclusively bin by core count, not clock speed.

Qualcomm, by contrast, seems to be dicing pineapples with their Hamoa die (X Elite). I wonder how much of that is due to yield and how much due to intentional artificial segmentation...
Not NECESSARILY...

Remember that Apple reuses their SoCs in a variety of products. You can never predict the details, but more or less what we probably see is
- sub-optimal A's go into Apple TV (possibly slightly lower speed, but more likely the ones that are slightly more power hungry)
- sub-optimal M's go into Mac Minis or iMacs (same thing, power hunger)
- sub-optimal S's (watch SiP) go into HomePod Mini
etc etc
Remember there's also Apple Display, and full-sized HomePod, and maybe I forgot something.


Of course some of these are only updated every two or three years.
On the other hand, are we CERTAIN that, say, an Apple TV that's nominally based on an A15 is always based on an A15? Presumably as long as they keep making A15's for older phones they'll keep routing the lousy ones to aTVs; but if they run out of A15's why not just use A16's clocked at the right frequency to make them appear much the same in performance...
And likewise for other hardware of this sort. (HomePods, Apple Display, etc).

Mac Mini/iMac is really the only one where they couldn't get away with this.
 
Reactions: Orfosaurio

The Hardcard

Senior member
Oct 19, 2021
252
332
106
This is a problem why, exactly?

The handy online die calculator shows a 13mm x 13mm die (169 mm^2) results in 353 candidates per wafer, so let's call it 300 good dies per wafer. If N3E costs $24K per wafer (I'm just guessing) that's $80 per chip. Versus what, ~$40 per chip with M1?

Throw in a 50% uptick for testing/packaging (not including LPDDR) and you're at $120 versus $60 with M1. Check how that compares with Intel/AMD chip pricing. Or Qualcomm X Elite pricing. And consider that Apple is targeting only the premium market. They're fine.

I will say given how tiny the NPU is in relation to the overall SoC, I'll bet any additional area in M5 is there. M4 taped out before the ChatGPT hype went stratospheric, before Apple decided to make the "Apple Intelligence" push, so they weren't able to expand it. M5's NPU will be at least double, if not triple the size I'll wager.
Has anyone seen any information on whether the Neural Engine has access to all the memory bandwidth available on the larger chips? Mac Studios have been selling a lot to machine learning researchers because of the ability to run large models on Max and Ultra devices. But those models run on the GPU, because language models especially are bandwidth-limited, in addition to being limited by the actual compute available.

But looking at the dies, the ANE appears to be on the far side of the CPU L2 cache. If Apple intends to significantly boost AI capabilities, it seems to me that they would need to make sure those IP blocks have access to the full memory bandwidth. Do they have such access?
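The bandwidth-limited point can be made concrete: each generated token has to stream essentially the whole set of weights from memory, so token rate is roughly bandwidth divided by model size. A sketch (the model size and quantization are illustrative; 800 GB/s is Apple's published M1 Ultra bandwidth figure):

```python
# Why bandwidth matters for LLM inference: each generated token
# streams essentially all model weights from memory, so tokens/sec
# is bounded by bandwidth / model size. Numbers are illustrative.
def max_tokens_per_sec(bandwidth_gbs, params_billions, bytes_per_param):
    model_gb = params_billions * bytes_per_param
    return bandwidth_gbs / model_gb

# e.g. a 70B-parameter model at 4-bit (~0.5 byte/param)
# on an 800 GB/s M1 Ultra
print(round(max_tokens_per_sec(800, 70, 0.5), 1))  # ≈ 22.9
```

That ceiling applies to whichever block does the work, which is why it matters whether the ANE sits on the full-bandwidth fabric or behind a narrower path.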
 

repoman27

Senior member
Dec 17, 2018
381
536
136
You can't expect me to notice something that obvious. 🤣

Based on that, I measured 13.10 mm x 12.71 mm = 166.5 mm², which is right in the same neighborhood as what y'all came up with already.

wow, how did you guess all of that? Maybe you should label the die shot like this M3 one:
The Thunderbolt and PCIe blocks are super easy to pick out. I didn't spend much time on this, so apologies if it's hard to read.


The blue blocks are what I suspect to be the display controllers. It looks like there might be three this go round, but we'll probably have to wait for an M4 Mac to know for sure.
 
Reactions: name99 and Mopetar

SpudLobby

Senior member
May 18, 2022
991
682
106
Not NECESSARILY...

Remember that Apple reuses their SoCs in a variety of products. You can never predict the details, but more or less what we probably see is
- sub-optimal A's go into Apple TV (possibly slightly lower speed, but more likely the ones that are slightly more power hungry)
- sub-optimal M's go into Mac Minis or iMacs (same thing, power hunger)
- sub-optimal S's (watch SiP) go into HomePod Mini
etc etc
Remember there's also Apple Display, and full-sized HomePod, and maybe I forgot something.


Of course some of these are only updated every two or three years.
On the other hand, are we CERTAIN that, say, an Apple TV that's nominally based on an A15 is always based on an A15? Presumably as long as they keep making A15's for older phones they'll keep routing the lousy ones to aTVs; but if they run out of A15's why not just use A16's clocked at the right frequency to make them appear much the same in performance...
And likewise for other hardware of this sort. (HomePods, Apple Display, etc).

Mac Mini/iMac is really the only one where they couldn't get away with this.
This is almost certainly true, but the degree to which there is variance, and/or where the majority lies, is important.


And the mobile products are by far the ones with the most volume. Apple's phydes (Intrinsity IP help?) + binning + architecture is just really impressive.
 