Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,752
1,285
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
As long as Apple has access to TSMC's next-gen node and doesn't do a bad job of designing a CPU and GPU, they're going to be more efficient than their PC counterparts. If we had an AMD 5 nm based laptop CPU and a 5 nm GPU, the numbers would be different. So kudos to Apple for pushing new technology. Personally I just want to buy a video card at MSRP.
That is simply not true. At best about 25% of their advantage is caused by the process. Just take a look at what TSMC officially claims and then take a look at Apple's superiority again.
 
Reactions: Schmide

Eug

Lifer
Mar 11, 2000
23,752
1,285
126
If https://twitter.com/Locuza_/status/1450296155477319683?s=20 is correct, the M1 Pro/Max die shots aren't clean and weren't scaled correctly, which puts some error bars on the die size estimates.

It seems that with the M1 Pro/Max GPU's, Apple are more than willing to put up with big dies to maximize efficiency, to an extreme extent if the slides are to be believed. I suspect that for the binned GPUs (14/24core) the increased clocks from running less cores in the same thermal budget would bridge quite a bit of the compute gap vs the fully enabled versions.
Yes, I mentioned that earlier. If my on-paper measurements for M1 Max were accurate, it would be 21.8 mm x 20 mm = 436 mm2. However, just comparing M1 Pro to M1 Max, it would seem that the measurements are at least 5% off in each axis just from scaling. So 20.7 mm x 19 mm = 393 mm2, which is a lot closer to the reported 383.5 mm2.

Working backwards, it seems a more accurate estimate for M1 Max would be about 20.4 mm x 18.8 mm = 383.5 mm2.
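
For anyone who wants to redo the arithmetic, here is a minimal sketch. It assumes the die is a plain rectangle and that the scaling error is the same on both axes; the helper names are made up for illustration, and every input is just an estimate from this post.

```python
# Rough die-area arithmetic; all inputs are estimates taken from the post above.

def area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in mm^2."""
    return width_mm * height_mm

def rescale(width_mm: float, height_mm: float, target_area_mm2: float):
    """Scale both axes by the same factor so the area equals target_area_mm2."""
    factor = (target_area_mm2 / area_mm2(width_mm, height_mm)) ** 0.5
    return width_mm * factor, height_mm * factor

paper = area_mm2(21.8, 20.0)        # on-paper measurement
corrected = area_mm2(20.7, 19.0)    # after a ~5% correction per axis
w, h = rescale(21.8, 20.0, 383.5)   # working backwards from the reported area

print(f"on paper: {paper:.0f} mm2, corrected: {corrected:.0f} mm2")
print(f"rescaled to 383.5 mm2: {w:.1f} mm x {h:.1f} mm")
```

The uniform-scale assumption is the weak point: if the die shot is stretched more on one axis than the other, the individual side lengths shift even though the area still matches.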
 

leoneazzurro

Golden Member
Jul 26, 2016
1,010
1,608
136
That is simply not true. At best about 25% of their advantage is caused by the process. Just take a look at what TSMC officially claims and then take a look at Apple's superiority again.

It depends. TSMC's claims are for identical ICs, and they are +25% perf at the same power draw or -40% power draw at iso-perf. If clocks are not pushed up, the process improvements are more in line with the second number. Moreover, we don't know exactly how Intel 7 (ex-10nm) and TSMC N7 compare in terms of energy efficiency, but judging by the x86 competition, I would dare to say that TSMC holds an advantage there too, for the moment.
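
To put rough numbers on the two claims, here is a back-of-envelope sketch. It takes the +25% / -40% figures above at face value and treats perf-per-watt as a simple ratio; the function name is only for illustration.

```python
def perf_per_watt_gain(perf_scale: float, power_scale: float) -> float:
    """Relative perf/W versus the old node, given relative perf and relative power."""
    return perf_scale / power_scale

same_power = perf_per_watt_gain(1.25, 1.00)  # +25% perf at the same power draw
iso_perf = perf_per_watt_gain(1.00, 0.60)    # -40% power draw at the same perf

print(f"same power: {same_power:.2f}x perf/W")
print(f"iso perf:   {iso_perf:.2f}x perf/W")
```

So a design that keeps clocks where they were could see roughly a 1.67x efficiency gain from the node rather than 1.25x, which is why the second number matters more for a low-clocked part.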
 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
That is simply not true. At best about 25% of their advantage is caused by the process. Just take a look at what TSMC officially claims and then take a look at Apple's superiority again.

When it comes to the GPU portion Apple's advantage is pretty clear. It's the very definition of a wide-but-slow part. Matching the efficiency there is going to be incredibly difficult for AMD/Nvidia/Intel because none of them are willing to throw that much silicon at the problem. Just a reminder, N5 is more costly per transistor than N7 is, and M1 Max packs more of them than an A100 (though ofc, no HBM yet).

The CPU portion though? Considering Zen 3 cores at roughly 3.6GHz pull as much power as the M1 in single-threaded loads, surely you can see how much a boost to power efficiency - even if it's only 15% or so - can make a difference. Of course, it wouldn't be enough on its own to make up the full difference between the two - even assuming a 15% bump Apple would still hold a 15-20% 1T performance lead at the same power, but at that point it's really not that far off at all.

The far bigger discrepancy between AMD/Intel and Apple that really needs to be addressed is in battery life if you ask me. The M1 Max is clearly worse than the M1 in this department yet it still leaves both AMD and Intel in the dust.
 
Reactions: Tlh97

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
When it comes to the GPU portion Apple's advantage is pretty clear. It's the very definition of a wide-but-slow part. Matching the efficiency there is going to be incredibly difficult for AMD/Nvidia/Intel because none of them are willing to throw that much silicon at the problem. Just a reminder, N5 is more costly per transistor than N7 is, and M1 Max packs more of them than an A100 (though ofc, no HBM yet).

The CPU portion though? Considering Zen 3 cores at roughly 3.6GHz pull as much power as the M1 in single-threaded loads, surely you can see how much a boost to power efficiency - even if it's only 15% or so - can make a difference. Of course, it wouldn't be enough on its own to make up the full difference between the two - even assuming a 15% bump Apple would still hold a 15-20% 1T performance lead at the same power, but at that point it's really not that far off at all.

The far bigger discrepancy between AMD/Intel and Apple that really needs to be addressed is in battery life if you ask me. The M1 Max is clearly worse than the M1 in this department yet it still leaves both AMD and Intel in the dust.

A big reason that the CPU is so efficient is because Apple are willing to throw silicon at the problem- in the form of enormous caches. There is 24MB of L2 (though each core might only be able to access 12MB), and Andrei thinks that the M1 Pro has 32MB of System Level Cache, and the M1 Max has 64MB of System Level Cache. That's huge. That's like if AMD bumped up the cache size in their Zen CCX to 12MB, and then added another big fat memory-side cache on top of that.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
It depends. TSMC's claims are for identical ICs, and they are +25% perf at the same power draw or -40% power draw at iso-perf. If clocks are not pushed up, the process improvements are more in line with the second number. Moreover, we don't know exactly how Intel 7 (ex-10nm) and TSMC N7 compare in terms of energy efficiency, but judging by the x86 competition, I would dare to say that TSMC holds an advantage there too, for the moment.

Of course, unless something drastic happens in the next 2-3 years, x86 OEMs will inherently have much higher clock targets compared to what Apple is/has been doing. So to assume that process improvements will apply to both uniformly makes little sense.

Similarly, attributing Apple's process node advantage solely to money assumes that refining a process to work well at ~5 GHz out of the box is no more difficult than refining a process to work well at ~3 GHz, which is itself a dubious assumption.
 

repoman27

Senior member
Dec 17, 2018
381
535
136
The die photos showing the Pro and Max dies side by side are pretty interesting. As noted by others you can see the "chop" location, but below the additional GPU cores there are some replicated structures. From left to right:

#1 "random schmear" is mirrored above the SLC block on the left
#2 "chips surrounding a bigger chip" mirrored above the SLC block on the right
#3 "more empty schmear" I don't see this one mirrored above
#4 "E and backwards E" is mirrored above #2

So what are they? One is clearly another two display controllers. What else does the Max have twice as many of as the Pro?

The big mystery is what is #3? It stands to reason the "something new" would be for off chip communication to other M1 Max dies in a larger system like Mac Pro. That block is not big enough to be a full fabric, so Apple will need an I/O die like AMD uses rather than having it built in like IBM, at least for this generation.

The I/O die will implement the fabric, and include DDR5 controllers for DIMM slots hanging off it. With up to 256 GB of LPDDR5 (unless larger LPDDR5 stacks are possible...anyone know how big those can get?) with 1.6 TB/sec of memory bandwidth in a 4 M1 Max Mac Pro it'll be fairly NUMAy when you hit the much slower DDR5 DIMMs.
#1 video encode engine?
#2 is 16-core NPU block, which is interesting because I expected them to double that to 32-core, but it doesn't appear to be enabled (in the M1 Max at least).
#3 not sure.
#4 ProRes encode/decode engine?

There actually doesn't appear to be any I/O on the bottom edge of the die. Thunderbolt and PCIe are all along the top side.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I’m not impressed with the 3.5 lb weight of the 14” though.
Haha, that stuck out to me as being quite on the surprisingly heavy side as well. Exactly what makes it that heavy, I wonder? Though the M1 Pro 13" is also quite heavy; only the M1 Air is what I'd want.

This is just based on Apple’s specs but here ya go:

View attachment 51607
Hm, I initially would have preferred an M1 Pro in 14" but it's starting to look like a plain M1 may well be the better choice for pure portability and efficiency. Looking forward to seeing details on what causes these rather unexpected differences. Is it all down to the screen?
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Not that impressive considering the circumstances. When they're using the most advanced process technology available to them with a relatively large die size and trashing software compatibility altogether, it had better win in at least one of the metrics ...

What's more, they refuse to do apt comparisons, or even to make some comparisons at all, so their graphs contain garbage information too. Geekbench leaks aren't interesting either since I'd put them in the "synthetic crap" category as well, so these numbers aren't even remotely worth extrapolating for performance comparisons in many real world applications ...
 
Jul 27, 2020
17,939
11,703
116
Can't seem to find anything online comparing dual channel, quad channel and six channel Geekbench multicore scores on the same CPU. Is Geekbench multicore test even able to benefit from 400GB/s bandwidth of M1 Max?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Is Geekbench multicore test even able to benefit from 400GB/s bandwidth of M1 Max?

The real questions are: is there any other software, apart from HPC/ML libraries or Linpack, that uses that bandwidth at all, and how irrelevant is such software to the notebook format? Also, how much latency had to be sacrificed to support those extra channels, and what is the impact of that extra latency on the real workloads the chip will be running?

On the latency front, I think the extra system level cache will help mitigate some of the penalties, but it obviously remains to be seen in real tests.
 
Reactions: coercitiv

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
Why did people expect 1.7x faster multithread? We know the little cores of the M1 are about 1/3 the speed of the big cores. If you take 4 + 4 * .33 and 8 + 2 *.33 as the multithread performance of M1 and M1 Pro/Max, and assume the same 3.2 GHz clock rate, it comes out to around 62.5% faster. Since nothing scales in perfect linear fashion getting 60% boost is about what you'd expect.

Tests that can use all the bandwidth they can get will do better than that but they are the exception not the rule.

I had done this calculation before Apple showed their numbers and got exactly 1.7x as well, but I used efficiency core as 25% of the main cores.

4p + 4e * 25% = 5 ( M1)
8p + 2e * 25% = 8.5 (M1 Pro/Max)
8.5/5 = 1.7

But I was just going by my vague memory of the relative power of an efficiency core.
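
The same back-of-envelope estimate as a small sketch, so the efficiency-core ratio can be varied. It assumes perfect scaling and equal clocks, which real workloads won't deliver, and the function names are just for illustration.

```python
def mt_throughput(p_cores: int, e_cores: int, e_ratio: float) -> float:
    """Multithreaded throughput in units of one performance core."""
    return p_cores + e_cores * e_ratio

def pro_max_speedup(e_ratio: float) -> float:
    """Estimated M1 Pro/Max (8P + 2E) throughput relative to M1 (4P + 4E)."""
    return mt_throughput(8, 2, e_ratio) / mt_throughput(4, 4, e_ratio)

# E core worth 25% of a P core (my guess) vs one third (the figure quoted above).
for e_ratio in (0.25, 1 / 3):
    print(f"E core = {e_ratio:.0%} of a P core -> {pro_max_speedup(e_ratio):.3f}x")
```

The estimate is not very sensitive to the ratio: moving the E core from 25% to a third of a P core only shifts the result from 1.7x to about 1.625x.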
 
Reactions: Eug

biostud

Lifer
Feb 27, 2003
18,400
4,965
136
That is simply not true. At best about 25% of their advantage is caused by the process. Just take a look at what TSMC officially claims and then take a look at Apple's superiority again.
What is not true? I've only stated that Apple has created a SoC that is more efficient than the x86 counterparts currently available on the market, and it will be interesting to compare it to x86 and next gen GPUs made on same 5nm technology. I'm not saying that it is all about process technology, but that is part of it.
 

Eug

Lifer
Mar 11, 2000
23,752
1,285
126
I had done this calculation before Apple showed their numbers and got exactly 1.7x as well, but I used efficiency core as 25% of the main cores.

4p + 4e * 25% = 5 ( M1)
8p + 2e * 25% = 8.5 (M1 Pro/Max)
8.5/5 = 1.7

But I was just going by my vague memory of the relative power of an efficiency core.
According to this set of tests, M1 Icestorm efficiency cores are anywhere from 18% to 52% as fast as M1 Firestorm performance cores.


Time for Icestorm to complete task vs Firestorm:

190% for assembly language
330% at SIMD (Accelerate) library functions
280% for simple Swift
550% for "idiomatic" Swift processing


Note that for AppleArchive compression, the Icestorm cores alone take 717% of the time to complete vs all M1 cores together, meaning that for this action, they are 14% as fast as the overall speed.
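
For reference, the conversion from "time taken" to "relative speed" is just the reciprocal. A minimal sketch using only the percentages listed above (the dictionary and function names are made up for illustration):

```python
# Time for Icestorm (E core) to finish a task, as a percentage of Firestorm (P core) time.
# Figures are the ones quoted in this post.
icestorm_time_pct = {
    "assembly language": 190,
    "SIMD (Accelerate)": 330,
    "simple Swift": 280,
    "idiomatic Swift": 550,
}

def relative_speed(time_pct: float) -> float:
    """Speed relative to the baseline, given time as a percentage of the baseline time."""
    return 100.0 / time_pct

speeds = {task: relative_speed(pct) for task, pct in icestorm_time_pct.items()}
for task, speed in speeds.items():
    print(f"{task}: {speed:.1%} of Firestorm speed")

# AppleArchive case: Icestorm cores alone vs all M1 cores together.
print(f"AppleArchive: {relative_speed(717):.1%} of the all-core speed")
```

Note that the AppleArchive figure has a different baseline (all eight cores together), so it is not directly comparable with the per-core ratios.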
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,830
877
126
Haha, that stuck out to me as being quite on the surprisingly heavy side as well. Exactly what makes it that heavy, I wonder? Though the M1 Pro 13" is also quite heavy; only the M1 Air is what I'd want.


Hm, I initially would have preferred an M1 Pro in 14" but it's starting to look like a plain M1 may well be the better choice for pure portability and efficiency. Looking forward to seeing details on what causes these rather unexpected differences. Is it all down to the screen?

I feel like an Air with a cut-down M1 Pro (like a 12-core GPU) would sell like hotcakes.

It will also be interesting to see if they update their smaller iMacs to include the M1 Pro; it's a desktop, so why not.
 
Reactions: moinmoin

Eug

Lifer
Mar 11, 2000
23,752
1,285
126
This is not surprising to me but it does lead to interesting conclusions about their desktop parts. I presume the new mac pro and maybe even imac will use a desktop version of this with more cores on the GPU/CPU.
Nah, I’m thinking the 30” iMac will use the exact same M1 Pro and M1 Max chips. Maybe not the 24” though. And I think the Mac mini will also get them. At the very least the Mac mini will get the M1 Pro.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
Nah, I’m thinking the 30” iMac will use the exact same M1 Pro and M1 Max chips. Maybe not the 24” though. And I think the Mac mini will also get them. At the very least the Mac mini will get the M1 Pro.

I concur. M1 Pro/Max will first be used to replace the rest of the Intel lineup: the big iMac and the 6-core Intel Mac mini (so at least the Pro).

They could theoretically put it in the small iMac, but that won't be a priority.

Then all that is left will be the Mac Pro. It's still murky what that might look like, and thus still exciting.
 
Reactions: scannall

nxre

Member
Nov 19, 2020
60
103
66
These seem to be using the last-gen cores, which kinda lines up with the rumours that the refresh was meant to come at WWDC but got delayed due to supply issues. Not that it makes much of a difference, given A15 mainly seems to focus on increasing efficiency, which doesn't matter as much on their pro laptops.
I'm still questioning why they didn't just go with 4E cores, given the space they take is negligible compared to the massive die they are making, and it helps boost MC scores.
 

Doug S

Platinum Member
Feb 8, 2020
2,493
4,059
136
You may be right, but FWIW, this is what Apple had to say about it:


“The CPU in M1 Pro and M1 Max delivers up to 70 percent faster CPU performance than M1, so tasks like compiling projects in Xcode are faster than ever.”

Well, when you say "up to" it can mean a lot of things. I would expect that Apple has an internal benchmark suite they believe is representative of how their customers use their products. You would get a range of results, with the ones that benefit a bit more from memory bandwidth reaching 70%, and a few that don't scale well for MT dragging the average down to 60%.

It's really irrelevant; we will have benchmarks soon and I'm sure some will easily exceed 70% - something that's totally bandwidth dependent like STREAM might be over 3x faster on M1 Max.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
These seem to be using the last-gen cores, which kinda lines up with the rumours that the refresh was meant to come at WWDC but got delayed due to supply issues. Not that it makes much of a difference, given A15 mainly seems to focus on increasing efficiency, which doesn't matter as much on their pro laptops.
I'm still questioning why they didn't just go with 4E cores, given the space they take is negligible compared to the massive die they are making, and it helps boost MC scores.

There seems to be negligible difference between A14 and A15 Performance cores. The Efficiency cores improved a bit, but with only 2 of them here, their impact would likely be unnoticed.

So it could be the new cores. Hardly really matters.
 