Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,924
1,525
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).
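For anyone who wants to sanity-check the GPU marketing numbers above, here is a quick back-of-envelope sketch in Swift. The ~1.278 GHz GPU clock, the 8 FP32 ALUs per EU, and the per-core texture/ROP counts are my assumptions based on commonly reported figures, not Apple-published specs:

```swift
import Foundation

// Back-of-envelope check of the M1 GPU figures listed above.
// ASSUMPTIONS (not Apple-published): ~1.278 GHz GPU clock, 8 FP32 ALUs per EU,
// 8 texture units and 4 ROPs per GPU core.
let gpuCores = 8.0
let executionUnits = 128.0                           // 16 EUs per core
let alusPerEU = 8.0                                  // assumed
let clockGHz = 1.278                                 // assumed

let fp32ALUs = executionUnits * alusPerEU            // 1024 ALUs
let teraflops = fp32ALUs * 2.0 * clockGHz / 1000.0   // FMA counted as 2 FLOPs
let gigatexels = gpuCores * 8.0 * clockGHz           // texture units x clock
let gigapixels = gpuCores * 4.0 * clockGHz           // ROPs x clock

print(String(format: "%.2f TFLOPS, %.0f GTexels/s, %.0f GPixels/s",
             teraflops, gigatexels, gigapixels))
// Prints ~2.62 TFLOPS, 82 GTexels/s, 41 GPixels/s, matching the 2.6 / 82 / 41 above.
```

Under those assumptions, the 2.6 Teraflops, 82 Gigatexels/s and 41 gigapixels/s all fall out of the same clock, so the spec sheet is at least internally consistent.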

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes
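As an aside, the "100 GB/s" in the spec list above is easy to reproduce if you assume the commonly reported LPDDR5-6400 parts on a 128-bit interface (both assumptions on my part, not Apple-published numbers):

```swift
import Foundation

// Where the M2's "100 GB/s" comes from, ASSUMING LPDDR5-6400 on a 128-bit bus.
let transfersPerSec = 6_400_000_000.0     // LPDDR5-6400: 6400 MT/s (assumed)
let busWidthBits = 128.0                  // assumed interface width
let bytesPerTransfer = busWidthBits / 8.0 // 16 bytes moved per transfer

let bandwidthGBs = transfersPerSec * bytesPerTransfer / 1e9
print("\(bandwidthGBs) GB/s")             // 102.4 GB/s, rounded down to "100 GB/s"
```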

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

MS_AT

Senior member
Jul 15, 2024
365
798
96
That's GB6.
I would not expect Geekbench to regress with x64 support when they introduce things like SME on ARM.
There is actually a difference in one of the subtests that is above noise (almost 10%), but only in ST, and it doesn't seem to have any notable influence on the average. So without a few more runs it's hard to say if it was a one-time anomaly. But well, I think this discussion should then move to the GB-specific thread.

On topic:

Was there any reviewer who even tried to look into M-family performance counters and do the kind of analysis Chips&Cheese does, but for Apple chips? Are perf counters even available to mortals?
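For what it's worth, this is the kind of repeat-run check I mean before calling a ~10% subtest delta real. The scores here are made-up placeholders, not actual GB6 results:

```swift
import Foundation

// Hypothetical repeat-run check: is a subtest delta bigger than run-to-run noise?
// The scores below are made-up placeholders, NOT real GB6 results.
func meanAndStdDev(_ xs: [Double]) -> (mean: Double, sd: Double) {
    let mean = xs.reduce(0, +) / Double(xs.count)
    let variance = xs.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(xs.count - 1)
    return (mean, variance.squareRoot())
}

let runsA: [Double] = [3050, 3070, 3040, 3065, 3055]  // e.g. previous GB build
let runsB: [Double] = [3320, 3340, 3310, 3335, 3325]  // e.g. new GB build

let a = meanAndStdDev(runsA)
let b = meanAndStdDev(runsB)
let deltaPct = (b.mean - a.mean) / a.mean * 100
let noisePct = 2 * max(a.sd, b.sd) / a.mean * 100     // rough 2-sigma noise band

print(String(format: "delta %.1f%%, noise band ±%.1f%%", deltaPct, noisePct))
// Only a delta well outside the noise band is worth calling a real change.
```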
 

digitaldreamer

Junior Member
Mar 23, 2007
20
14
81
Reactions: igor_kavinski

name99

Senior member
Sep 11, 2010
526
412
136
I would not expect Geekbench to regress with x64 support when they introduce things like SME on ARM.

There is actually a difference in one of the subtests that is above noise (almost 10%), but only in ST, and it doesn't seem to have any notable influence on the average. So without a few more runs it's hard to say if it was a one-time anomaly. But well, I think this discussion should then move to the GB-specific thread.

On topic:

Was there any reviewer who even tried to look into M-family performance counters and do the kind of analysis Chips&Cheese does, but for Apple chips? Are perf counters even available to mortals?
Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (e.g. "The branch predictor looks to be at the same level as Intel's", or "Fetch is not as good as ARM's, which can fetch across two taken branches per cycle", or "The memory forwarding is OK, but nothing special, slower than Intel's"), all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it, who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:


https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1
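And for the "available to mortals" part: below is a heavily hedged sketch of the route various open-source microbenchmark tools take, loading the private kperf framework at runtime. The framework path, symbol names, class mask and signatures are assumptions based on publicly reverse-engineered usage, not a supported Apple API; it needs root and can break with any macOS update:

```swift
import Foundation

// HEDGED SKETCH ONLY: reading per-thread CPU counters through the private
// kperf framework. The path, symbol names, class mask and signatures below are
// assumptions taken from publicly reverse-engineered usage, not a supported API.
typealias KpcSetCounting = @convention(c) (UInt32) -> Int32
typealias KpcGetThreadCounters = @convention(c) (UInt32, UInt32, UnsafeMutablePointer<UInt64>) -> Int32

let kperfPath = "/System/Library/PrivateFrameworks/kperf.framework/kperf" // assumed
guard let handle = dlopen(kperfPath, RTLD_LAZY),
      let setSym = dlsym(handle, "kpc_set_counting"),       // assumed symbol
      let getSym = dlsym(handle, "kpc_get_thread_counters") // assumed symbol
else {
    fatalError("kperf not available, or the symbol names differ on this OS")
}
let kpcSetCounting = unsafeBitCast(setSym, to: KpcSetCounting.self)
let kpcGetThreadCounters = unsafeBitCast(getSym, to: KpcGetThreadCounters.self)

_ = kpcSetCounting(1 << 0)                 // fixed-counter class mask (assumed)
var before = [UInt64](repeating: 0, count: 32)
var after  = [UInt64](repeating: 0, count: 32)
_ = kpcGetThreadCounters(0, 32, &before)
// ... run the code under test here ...
_ = kpcGetThreadCounters(0, 32, &after)
print("counter deltas:", zip(after, before).map { $0.0 &- $0.1 }.prefix(4))
```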
 

jdubs03

Golden Member
Oct 1, 2013
1,079
746
136
Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (e.g. "The branch predictor looks to be at the same level as Intel's", or "Fetch is not as good as ARM's, which can fetch across two taken branches per cycle", or "The memory forwarding is OK, but nothing special, slower than Intel's"), all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it, who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:


https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1
I’d be curious about these PDFs!
 
Reactions: igor_kavinski

MS_AT

Senior member
Jul 15, 2024
365
798
96
Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (e.g. "The branch predictor looks to be at the same level as Intel's", or "Fetch is not as good as ARM's, which can fetch across two taken branches per cycle", or "The memory forwarding is OK, but nothing special, slower than Intel's"), all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it, who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:


https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1
I remember running into a Google Doc in the making that was trying to describe the M1 back in the day, but since it was a work in progress I thought I would come back to it later and... I lost the link. Were you the author / one of the authors, and could you share the link?
 

name99

Senior member
Sep 11, 2010
526
412
136
Will M5 exceed 5000 in GB6?

M5
5 GHz
5000 GB6

Poetic.
It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).

Recent patents suggest M4 is tentatively exploring
- value prediction
- reuse of wrong-path work executed after branch convergence
(Both of these are pretty cutting edge stuff, which may well mean that the first implementation or two is hidden behind chicken bits and will not be made public, not until Apple is absolutely sure there are no failure edge cases.)

What's interesting is that there are ideas that I think are simpler which Apple has not yet implemented, which is one reason I'm not yet pessimistic that IPC growth is over. Simpler ideas include
- a zero content cache (patents suggest Apple has SOMETHING like this, but exploration [admittedly of M1] shows no evidence, so this might be coming soon?)
- fully decoupled fetch (address generation separated from I-cache by a queue)
- instruction criticality marking (which allows for all sorts of other ideas once you have it)
- detecting instruction readiness at Rename, and bypassing ready instructions (generally about 30% of the instruction stream) past the Issue Queues. This allows scaling the machine to ~30% larger without much energy cost.
- handling a 30% larger machine without massively expensive additions to Rename by decoupling Rename from Decode with a queue (something Intel and ARM are already doing)
- maybe doing simple logic at Rename (like Intel adds small constants at Rename). Apple already has a patent (again not yet any evidence, but no-one has explored M4 seriously yet) for load-op fusion, and simple logic at Rename is rather similar.

So I don't think 20% for M5 is unreasonable, and it's in line with the (averaged) M1->M2->M3->M4 so far. Of course that doesn't get us quite to 5000...

IF SVE is added, that might take us over the finish line... (But there are, uh, issues with SVE, which is why I think Apple, and now QC, have been so foot-dragging about it. All things considered, a better solution is probably to forget SVE and switch to a NEON++, basically NEON as it is, with predicate masks and the tail-SIMD support of SVE but everything else dropped. That will take some political negotiation, so who knows how this will play out.)
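To put rough numbers on that last point: compounding the ~10% frequency and ~10% IPC guesses against an assumed M4 GB6 ST baseline of about 3800 (my number, not Apple's) still lands short of 5000:

```swift
import Foundation

// Compounding the guesses above against an ASSUMED M4 baseline of ~3800 GB6 ST
// (my number, not Apple's, and GB6 subtests won't all scale uniformly).
let m4Baseline = 3800.0
let frequencyGain = 1.10      // ~10% from the N2/GAA clock bump
let ipcGain = 1.10            // ~10% from ongoing IPC work

let m5Estimate = m4Baseline * frequencyGain * ipcGain
print(String(format: "M5 estimate ≈ %.0f GB6 ST", m5Estimate))
// ≈ 4600: a ~21% compounded gain, still short of 5000 without something extra
// (e.g. SVE, or clocks closer to 5 GHz).
```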
 

Meteor Late

Member
Dec 15, 2023
116
98
61
It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).

Recent patents suggest M4 is tentatively exploring
- value prediction
- reuse of wrong-path work executed after branch convergence
(Both of these are pretty cutting edge stuff, which may well mean that the first implementation or two is hidden behind chicken bits and will not be made public, not until Apple is absolutely sure there are no failure edge cases.)

What's interesting is that there are ideas that I think are simpler which Apple has not yet implemented, which is one reason I'm not yet pessimistic that IPC growth is over. Simpler ideas include
- a zero content cache (patents suggest Apple has SOMETHING like this, but exploration [admittedly of M1] shows no evidence, so this might be coming soon?)
- fully decoupled fetch (address generation separated from I-cache by a queue)
- instruction criticality marking (which allows for all sorts of other ideas once you have it)
- detecting instruction readiness at Rename, and bypassing ready instructions (generally about 30% of the instruction stream) past the Issue Queues. This allows scaling the machine to ~30% larger without much energy cost.
- handling a 30% larger machine without massively expensive additions to Rename by decoupling Rename from Decode with a queue (something Intel and ARM are already doing)
- maybe doing simple logic at Rename (like Intel adds small constants at Rename). Apple already has a patent (again not yet any evidence, but no-one has explored M4 seriously yet) for load-op fusion, and simple logic at Rename is rather similar.

So I don't think 20% for M5 is unreasonable, and it's in line with the (averaged) M1->M2->M3->M4 so far. Of course that doesn't get us quite to 5000...

IF SVE is added, that might take us over the finish line... (But there are, uh, issues with SVE, which is why I think Apple, and now QC, have been so foot-dragging about it. All things considered, a better solution is probably to forget SVE and switch to a NEON++, basically NEON as it is, with predicate masks and the tail-SIMD support of SVE but everything else dropped. That will take some political negotiation, so who knows how this will play out.)

I don't think Apple is using N2 with M5 though, most likely N3P, unless M5 is delayed to 2026.
 

Doug S

Platinum Member
Feb 8, 2020
2,888
4,911
136
Doubt it. Both of those I’d see for M6.

Surely M5's multitronic technology can push it past the 5000 barrier!

If Apple doesn't do some gag with Nomad hovering out on stage to deliver the first M5 Mac or something like that (which would be easy to do since it isn't a live event) I'm gonna be disappointed in their marketing team.
 
Reactions: okoroezenwa

name99

Senior member
Sep 11, 2010
526
412
136
I don't think Apple is using N2 with M5 though, most likely N3P, unless M5 is delayed to 2026.
N2 is supposed to reach high volume in 2H2025. I assume M5 will arrive more or less at the expected time (say October 2025), NOT as an April surprise in an iPad - at least in part because that chunk of time will be devoted to M4 Ultra (and Extreme?).

Of course an early M5 changes the calculus.
 

johnsonwax

Member
Jun 27, 2024
96
160
66
It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).
I think it's unlikely that Apple will pull out that kind of a gap on the rest of the industry.
 