Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,871
1,438
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
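For what it's worth, the 2.6-teraflop figure is consistent with the execution unit count. A minimal sanity-check sketch, assuming 8 FP32 ALUs per execution unit and a ~1.278 GHz GPU clock (both widely reported for M1, neither official):

```python
# Sanity check of the 2.6 TFLOPs figure from the EU count.
# Assumptions (not from Apple's spec sheet): 8 FP32 ALUs per
# execution unit and a ~1.278 GHz GPU clock.
eus = 128
alus_per_eu = 8          # assumed
flops_per_alu = 2        # fused multiply-add = 2 FLOPs/cycle
clock_hz = 1.278e9       # assumed

tflops = eus * alus_per_eu * flops_per_alu * clock_hz / 1e12
print(f"{tflops:.2f} TFLOPs")  # ≈ 2.62
```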

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

zacharychieply

Junior Member
Apr 22, 2020
9
4
81
Qualcomm does have more talent than Apple now in the CPU department.

You would think that dedicated chip companies would be far ahead of Apple, but it doesn't look like it. Qualcomm will push clocks this year and next year, as well as improving the architecture a bit.

I really don't get why you have this fetish of Apple abandoning the arm64 ISA anytime soon when they signed a deal with ARM for 40 years. RISC-V is also too immature now and is not needed by Apple.
I agree with your point, but do you know what year the agreement expires?
 

Doug S

Platinum Member
Feb 8, 2020
2,836
4,820
136
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...

This is a problem why, exactly?

The handy online die calculator shows a 13mm x 13mm die (169 mm^2) results in 353 candidates per wafer, so let's call it 300 good dies per wafer. If N3E costs $24K per wafer (I'm just guessing) that's $80 per chip. Versus what, ~$40 per chip with M1?

Throw in a 50% uptick for testing/packaging (not including LPDDR) and you're at $120 versus $60 with M1. Check how that compares with Intel/AMD chip pricing. Or Qualcomm X Elite pricing. And consider that Apple is targeting only the premium market. They're fine.
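The back-of-envelope above, as a sketch (the wafer price and good-die count are the post's own guesses, not published figures):

```python
# Back-of-envelope chip cost, using the post's own guesses:
# ~353 candidates on a standard 300 mm wafer, ~300 good dies,
# and a guessed $24K N3E wafer price.
wafer_cost = 24_000      # guessed wafer price, per the post
good_dies = 300          # ~353 candidates minus defects/edge loss

cost_per_die = wafer_cost / good_dies         # the post's $80
cost_packaged = cost_per_die * 1.5            # +50% test/packaging
print(cost_per_die, cost_packaged)            # 80.0 120.0
```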

I will say given how tiny the NPU is in relation to the overall SoC, I'll bet any additional area in M5 is there. M4 taped out before the ChatGPT hype went stratospheric, before Apple decided to make the "Apple Intelligence" push, so they weren't able to expand it. M5's NPU will be at least double, if not triple the size I'll wager.
 

name99

Senior member
Sep 11, 2010
511
395
136
Btw, it looks like M4 has an 8-core Neural Engine?
You can't take these labels too seriously, especially preliminary versions before people have had a chance to comment.
Each block labeled NPU is two of what Apple calls a Neural Engine: basically a convolution engine (a small amount of storage, a hardware loop, and 256 multiply-accumulators, plus some other stuff (ReLU, etc.)).
Also in the ANE, but not included in the boxes, is the area to the left: mainly the Planar Engine and a large buffer (the equivalent of an L2 cache).


Actually I take the first part of the above comment back.
If you can track down a good picture of the M3 NPU, each of the 8 blocks is not symmetric in the way it appears in lower-quality images; it is in fact quite asymmetric!
The relative areas look the same, so it's unclear what's going on. The obvious guess is that Apple consolidated the previous design (256 MACs per core) into larger cores with 512 MACs each. Maybe this makes sense if the convolutions of interest are large enough to justify operating on 512-sized blocks? And presumably by doing this you reduce overhead area and power.
(BTW I THINK in the M3 diagram they left out the Planar Engine part, so the spur at the bottom is just the "L2")
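If that consolidation guess is right, the peak math is unchanged: only the split between core count and MACs per core moves, while total MACs stay constant. A minimal sketch (the clock value is a placeholder, not a measured ANE frequency):

```python
# Consolidating 16 x 256-MAC cores into 8 x 512-MAC cores leaves
# peak throughput unchanged; only overhead area/power would differ.
def peak_ops(cores, macs_per_core, clock_hz):
    # each MAC = 1 multiply + 1 accumulate = 2 ops per cycle
    return cores * macs_per_core * 2 * clock_hz

clock = 1.0e9  # placeholder clock, not a measured value
assert peak_ops(16, 256, clock) == peak_ops(8, 512, clock)
print(peak_ops(16, 256, clock) / 1e12, "TOPS at the placeholder clock")
```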
 
Last edited:
Reactions: SarahKerrigan

name99

Senior member
Sep 11, 2010
511
395
136
I've decided to ignore him. I can't stand people who don't want to learn. He doesn't even know how register renaming works and wants to lecture everyone. The kind of guy who never admits he was wrong, as if being wrong was a problem... unless one insists on being wrong even when presented with evidence. And as he is polluting every thread I read, that was too much for me. It seems I'm not the only one to feel the pain.

Oh well, life goes on
Make aggressive use of the ignore feature! I've enjoyed the forums much more since I started doing that.
 
Reactions: Tlh97 and Thibsie

name99

Senior member
Sep 11, 2010
511
395
136
Is nobody going to talk about 170 mm² die area for M4?

This is significant.
Is it? M2 was 155mm^2.
I can't find a number for M3 but maybe the target for M class is around 170mm^2? Like the target for A class seems to be around 100, but also seems to shift from about 85 up to 120 or so.
 

name99

Senior member
Sep 11, 2010
511
395
136
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...
I think it's more that the first version on a new process tends to be smaller, then as we get new versions on essentially the same process they grow, until the cycle repeats with the new process.
That's what we saw with A series.
 

ikjadoon

Senior member
Sep 4, 2006
235
513
146
M1 : 118 mm²
M2 : 151 mm²
M3 : 146 mm²
M4 : 170 mm²

We are headed to smashing the 200 mm² mark with M5...

Out of curiosity, I spot-checked other CPUs.

MTL-U (2+8+2): ~174 mm2
AMD Zen4 APU (8-core): 178 mm2
Qualcomm X1 (12-core): ~171 mm2

Apple seems to be mostly catching up to their competitors.

//

The M4, ironically an iPad-first chip, may mark Apple further graduating away from its A12X lineage: expanded from 4+4 to 4+6, meaty 4.38 GHz clocks, and, yes, a larger area.

I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.
 
Reactions: name99

Doug S

Platinum Member
Feb 8, 2020
2,836
4,820
136
Out of curiosity, I spot-checked other CPUs.

MTL-U (2+8+2): ~174 mm2
AMD Zen4 APU (8-core): 178 mm2
Qualcomm X1 (12-core): ~171 mm2

Apple seems to be mostly catching up to their competitors.

//

The M4, ironically an iPad-first chip, may mark Apple further graduating away from its A12X lineage: expanded from 4+4 to 4+6, meaty 4.38 GHz clocks, and, yes, a larger area.

I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.

That's mostly a market segmentation thing; if you want all 10 cores you gotta step up and pay for the better configuration. It has little to do with yields; I'll bet 95% of the 256/512 models have 10 good cores even though they're sold with fewer. The odds of a defect landing in one of those cores, versus all the other area on the chip, are pretty small. I don't understand the rhyme or reason by which they sometimes segment on the number of GPU cores and sometimes on the number of CPU cores. With NPU cores about to step up in prominence, maybe they start segmenting on them next.

If they wanted that segmentation to be meaningful for yields they'd segment on all three at once. So maybe you get x/y/z cores on the "high end" and the "entry level" config is <x/<y/<z. Heck maybe they take out a set of LPDDR controllers while they're at it, those things are as big as a CPU core...
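The defect-odds argument can be sketched with a simple Poisson yield model; the defect density and per-core area below are illustrative assumptions, not measured values:

```python
import math

# Rough Poisson yield sketch: with a plausible defect density, the
# chance a defect lands specifically in one CPU core (a few mm^2 of
# a ~170 mm^2 die) is small. Defect density and core area are
# illustrative assumptions, not measured values.
die_mm2 = 170
core_mm2 = 3          # assumed area of one CPU core
d0_per_cm2 = 0.07     # assumed defect density (defects/cm^2)

def clean_fraction(area_mm2, d0):
    # Poisson model: P(zero defects in area) = exp(-area * density)
    return math.exp(-(area_mm2 / 100) * d0)

print(f"whole die clean: {clean_fraction(die_mm2, d0_per_cm2):.1%}")
print(f"one core clean:  {clean_fraction(core_mm2, d0_per_cm2):.2%}")
```

Under those assumptions, roughly nine in ten dies are fully clean, and any given core is defective only a fraction of a percent of the time, which is consistent with segmentation, not yield, driving the binning.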
 

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
I note Apple only gives you the full 10 cores with the 1 TB / 2 TB mainboards, so perhaps Apple is not getting good enough yields to "waste" 10-good-core dies on the likely higher-volume 256 GB / 512 GB mainboards.
Apple's binning process is certainly intriguing. They exclusively bin by core count, not clock speed.

Qualcomm, by contrast, seems to be dicing pineapples with their Hamoa die (X Elite). I wonder how much of that is due to yield and how much due to intentional artificial segmentation...
 

SteinFG

Senior member
Dec 29, 2021
664
786
106
I think the M4 has grown in size mainly due to the core count increase (4+4 to 4+6) and a bigger NPU.
The M3 Pro has 12 cores (6+6); will Apple increase it to 6+8 on the M4 Pro? My guess is no.
The M3 Max has 16 cores (12+4); they probably won't increase it there either.
 
Last edited:
Reactions: Orfosaurio

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
I think the M4 has grown in size mainly due to the core count increase (4+4 to 4+6) and a bigger NPU.
The M3 Pro has 12 cores (6+6); will Apple increase it to 6+8 on the M4 Pro? My guess is no.
The M3 Max has 16 cores (12+4); they probably won't increase it there either.
N3E (M4) is less dense than N3B (M3), as well.
 
Reactions: Orfosaurio

repoman27

Senior member
Dec 17, 2018
381
536
136
How are y'all getting die size estimates when this is the first N3E chip, uses different libraries (2-1 finFLEX and 3-2 finFLEX), and is based on new microarchitectures? What feature sizes are you using to determine scale? GPU cores?

I wouldn't be surprised if die size did increase given the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. So ridiculous that this chip is thus far only available in an iPad. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
 

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
How are y'all getting die size estimates when this is the first N3E chip, uses different libraries (2-1 finFLEX and 3-2 finFLEX), and is based on new microarchitectures? What feature sizes are you using to determine scale? GPU cores?

I wouldn't be surprised if die size did increase given the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. So ridiculous that this chip is thus far only available in an iPad. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
This:
 
Reactions: Mopetar and Eug

FlameTail

Diamond Member
Dec 15, 2021
4,095
2,465
106
the base M4 now has twice as many (4) Thunderbolt ports, and I think I can make out a pair of updated / larger display engines, which would track with Apple's claims of an upgrade in that area. PCIe lanes are back down to 5 on the M4, versus 6 on the M2 and M3 though.
wow, how did you guess all of that? Maybe you should label the die shot like this M3 one:
 

name99

Senior member
Sep 11, 2010
511
395
136
Apple's binning process is certainly intriguing. They exclusively bin by core count, not clock speed.

Qualcomm, by contrast, seems to be dicing pineapples with their Hamoa die (X Elite). I wonder how much of that is due to yield and how much due to intentional artificial segmentation...
Not NECESSARILY...

Remember that Apple reuses their SoCs in a variety of products. You can never predict the details, but more or less what we probably see is
- sub-optimal A's go into Apple TV (possibly slightly lower speed, but more likely the ones that are slightly more power hungry)
- sub-optimal M's go into Mac Minis or iMacs (same thing, power hunger)
- sub-optimal S's (watch SiP) go into HomePod Mini
etc etc
Remember there's also Apple Display, and full-sized HomePod, and maybe I forgot something.


Of course some of these are only updated every two or three years.
On the other hand, are we CERTAIN that, say, an Apple TV that's nominally based on an A15 is always based on an A15? Presumably as long as they keep making A15's for older phones they'll keep routing the lousy ones to aTVs; but if they run out of A15's why not just use A16's clocked at the right frequency to make them appear much the same in performance...
And likewise for other hardware of this sort. (HomePods, Apple Display, etc).

Mac Mini/iMac is really the only one where they couldn't get away with this.
 
Reactions: Orfosaurio

The Hardcard

Senior member
Oct 19, 2021
252
332
106
This is a problem why, exactly?

The handy online die calculator shows a 13mm x 13mm die (169 mm^2) results in 353 candidates per wafer, so let's call it 300 good dies per wafer. If N3E costs $24K per wafer (I'm just guessing) that's $80 per chip. Versus what, ~$40 per chip with M1?

Throw in a 50% uptick for testing/packaging (not including LPDDR) and you're at $120 versus $60 with M1. Check how that compares with Intel/AMD chip pricing. Or Qualcomm X Elite pricing. And consider that Apple is targeting only the premium market. They're fine.

I will say given how tiny the NPU is in relation to the overall SoC, I'll bet any additional area in M5 is there. M4 taped out before the ChatGPT hype went stratospheric, before Apple decided to make the "Apple Intelligence" push, so they weren't able to expand it. M5's NPU will be at least double, if not triple the size I'll wager.
Has anyone seen any information on whether the Neural Engine has access to all the memory bandwidth available on the larger chips? Mac Studios have been selling a lot to machine learning researchers because of the ability to run large models on Max and Ultra devices. But those models run on the GPU, because language models especially are bandwidth-limited, in addition to being limited by the actual compute available.

But looking at the dies, the ANE appears to be on the far side of the CPU L2 cache. If Apple intends to significantly boost AI capabilities, it seems to me that they would need to make sure those IP blocks have access to the full memory bandwidth. Do they have such access?
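The bandwidth-limited point can be made concrete: each generated token has to stream essentially the whole set of weights from memory, so token rate is roughly bandwidth divided by model size. A sketch (the model size and quantization are illustrative; 800 GB/s is Apple's published M1 Ultra bandwidth figure):

```python
# Why bandwidth matters for LLM inference: each generated token
# streams essentially all model weights from memory, so tokens/sec
# is bounded by bandwidth / model size. Numbers are illustrative.
def max_tokens_per_sec(bandwidth_gbs, params_billions, bytes_per_param):
    model_gb = params_billions * bytes_per_param
    return bandwidth_gbs / model_gb

# e.g. a 70B-parameter model at 4-bit (~0.5 byte/param)
# on an 800 GB/s M1 Ultra
print(round(max_tokens_per_sec(800, 70, 0.5), 1))  # ≈ 22.9
```

That ceiling applies to whichever block does the work, which is why it matters whether the ANE sits on the full-bandwidth fabric or behind a narrower path.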
 

repoman27

Senior member
Dec 17, 2018
381
536
136
You can't expect me to notice something that obvious. 🤣

Based on that, I measured 13.10 mm x 12.71 mm = 166.5 mm², which is right in the same neighborhood as what y'all came up with already.

wow, how did you guess all of that? Maybe you should label the die shot like this M3 one:
The Thunderbolt and PCIe blocks are super easy to pick out. I didn't spend much time on this, so apologies if it's hard to read.


The blue blocks are what I suspect to be the display controllers. It looks like there might be three this go round, but we'll probably have to wait for an M4 Mac to know for sure.
 
Reactions: name99 and Mopetar

SpudLobby

Senior member
May 18, 2022
991
682
106
Not NECESSARILY...

Remember that Apple reuses their SoCs in a variety of products. You can never predict the details, but more or less what we probably see is
- sub-optimal A's go into Apple TV (possibly slightly lower speed, but more likely the ones that are slightly more power hungry)
- sub-optimal M's go into Mac Minis or iMacs (same thing, power hunger)
- sub-optimal S's (watch SiP) go into HomePod Mini
etc etc
Remember there's also Apple Display, and full-sized HomePod, and maybe I forgot something.


Of course some of these are only updated every two or three years.
On the other hand, are we CERTAIN that, say, an Apple TV that's nominally based on an A15 is always based on an A15? Presumably as long as they keep making A15's for older phones they'll keep routing the lousy ones to aTVs; but if they run out of A15's why not just use A16's clocked at the right frequency to make them appear much the same in performance...
And likewise for other hardware of this sort. (HomePods, Apple Display, etc).

Mac Mini/iMac is really the only one where they couldn't get away with this.
This is almost certainly true, but the degree to which there is variance, and/or where the majority lies, is important.


And the mobile products are by far the ones with the most volume. Apple's phydes (Intrinsity IP help?) + binning + architecture is just really impressive.
 