Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LP-DDR4
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-effiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach which these chips as they do the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

Page 78 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

Page 109 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

Page 127 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.264, ProRes

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

Page 215 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

Page 263 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

igor_kavinski · Nov 3, 2024

MS_AT said:
Didnt someone on this forum prove that AVX512 gives Zen5 no benefits in Geekbench?

I don't know. The 285K doesn't do as well in the same subtest: https://browser.geekbench.com/v5/cpu/compare/23023412?baseline=23018554

CouncilorIrissa · Nov 4, 2024

MS_AT said:
Didnt someone on this forum prove that AVX512 gives Zen5 no benefits in Geekbench?

Yes.

igor_kavinski · Nov 4, 2024

CouncilorIrissa said:
Yes.

That's GB6.

MS_AT · Nov 4, 2024

igor_kavinski said:
That's GB6.

I would not expect Geekbench to regress with x64 support when they introduce things like SME on ARM.

CouncilorIrissa said:
Yes.

There is actually a difference in one of the subtests that is above noise (almost 10%), but only in ST and it doesn't seem it has any notable influence on the average. So without a few runs hard to say if it was one time anomaly. But well, I think this discussion should move then GB specific thread.

On topic:

Was there any reviewer that tried to even look into M family performance counters and do similar analysis that Chips&Cheese do but for Apple chips? Are Perf Counters even available for mortals?

digitaldreamer · Nov 4, 2024

igor_kavinski said:
14900K hackintosh vs. M4 Pro: https://browser.geekbench.com/v5/cpu/compare/23023862?baseline=23018554

Wonder how long Apple will allow some users to "enjoy" the MacOS experience on the cheap.

My hackintosh back in the day was about the same price as the base M4 Pro, but had significantly less RAM, spinning platters for storage, USB 2 ports, and a huge heatsink with 5" fan. How far we've come.

Eug · Nov 4, 2024

igor_kavinski said:
14900K hackintosh vs. M4 Pro: https://browser.geekbench.com/v5/cpu/compare/23023862?baseline=23018554

Wonder how long Apple will allow some users to "enjoy" the MacOS experience on the cheap.

Some macOS Sequoia features are not supported on Intel Macs at all.

name99 · Nov 4, 2024

MS_AT said:
I would not expect Geekbench to regress with x64 support when they introduce things like SME on ARM.

There is actually a difference in one of the subtests that is above noise (almost 10%), but only in ST and it doesn't seem it has any notable influence on the average. So without a few runs hard to say if it was one time anomaly. But well, I think this discussion should move then GB specific thread.

On topic:

Was there any reviewer that tried to even look into M family performance counters and do similar analysis that Chips&Cheese do but for Apple chips? Are Perf Counters even available for mortals?

Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (eg "The branch predictor looks same level as Intel" or "Fetch is not as good as ARM which can fetch across two taken branches per cycle" or "The memory forwarding is OK, but nothing special, slower than Intel")
all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it. who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:

US20240354109A1 - Re-use of Speculative Load Instruction Results from Wrong Path - Google Patents

Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a control transfer instruction is mispredicted, a load instruction may have been executed on the wrong path. In disclosed embodiments, result storage circuitry records...

patents.google.com

https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1

jdubs03 · Nov 4, 2024

name99 said:
Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (eg "The branch predictor looks same level as Intel" or "Fetch is not as good as ARM which can fetch across two taken branches per cycle" or "The memory forwarding is OK, but nothing special, slower than Intel")
all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it. who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:

US20240354109A1 - Re-use of Speculative Load Instruction Results from Wrong Path - Google Patents

Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a control transfer instruction is mispredicted, a load instruction may have been executed on the wrong path. In disclosed embodiments, result storage circuitry records...

patents.google.com

https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1

I’d be curious about these PDFs!

MS_AT · Nov 4, 2024

name99 said:
Yes perf counters are available. Yes they can be used.

The BIG problem with all the standard sites (Chips and Cheese, Geekerwan, James Aslan) is that they are not especially curious. They will run microbenchmarks or look at perf counters and then say what they say (eg "The branch predictor looks same level as Intel" or "Fetch is not as good as ARM which can fetch across two taken branches per cycle" or "The memory forwarding is OK, but nothing special, slower than Intel")
all of which is *technically* true but fails to deal with the obvious question: OK then, why is Apple so much faster?

That's the question I answered in my PDFs. And that's the question about which all these x86-based sites have ZERO curiosity. They're just not interested in investigating how Apple does things DIFFERENTLY, it's enough for them to say "Well, when it comes to Apple doing some x86 thing, they do it OK"; they don't have the imagination or curiosity to see what Apple does that is NOT just an x86 thing.

It's pathetic, truly pathetic, and yet it's true.

Meanwhile, for people who get it. who do understand what Apple is doing that's a decade beyond where Intel and ARM are living, let me give a few choice recent patents:

US20240354109A1 - Re-use of Speculative Load Instruction Results from Wrong Path - Google Patents

Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a control transfer instruction is mispredicted, a load instruction may have been executed on the wrong path. In disclosed embodiments, result storage circuitry records...

patents.google.com

https://patents.google.com/patent/US12067398B1

https://patents.google.com/patent/US12001847B1

I remember I run into googledocs in the making that was trying to describe M1 back in the day, but since this was work in progress I thought I would come back to it later and.... I lost the link Were you the author / one of the authors and could you share the link?

igor_kavinski · Nov 4, 2024

MS_AT said:
Were you the author / one of the authors and could you share the link?

I think this is him: https://github.com/name99-org/AArch64-Explore

jdubs03 · Nov 4, 2024

That’s some real dedication right there. Sheesh!

poke01 · Nov 5, 2024

Someone needs to try M4. It looks the M series is pretty good at CPU ray tracing.

EDIT: This is Dolphin bench 5.0

fastandfurious6 · Nov 5, 2024

4100 on GB6 is crazy.....

FlameTail · Nov 5, 2024

Will M5 exceed 5000 in GB6?

M5
5 GHz
5000 GB6

Poetic.

okoroezenwa · Nov 5, 2024

FlameTail said:
Will M5 exceed 5000 in GB6?

M5
5 GHz
5000 GB6

Poetic.

Doubt it. Both of those I’d see for M6.

Meteor Late · Nov 5, 2024

FlameTail said:
Will M5 exceed 5000 in GB6?

M5
5 GHz
5000 GB6

Poetic.

Nah I think next year we are in for a more typical 10-12% improvement.

igor_kavinski · Nov 5, 2024

poke01 said:
Someone needs to try M4. It looks the M series is pretty good at CPU ray tracing.

I ran it on my M1 and it completed in about 235 seconds. That's less than 11900K. I certainly paid more than an 11900K for it. I've been screwed!

poke01 · Nov 5, 2024

igor_kavinski said:
I ran it on my M1 and it completed in about 235 seconds. That's less than 11900K. I certainly paid more than an 11900K for it. I've been screwed!

should’ve got the M1 Mac mini then

In all seriousness that’s pretty good for a passively cooled laptop.

name99 · Nov 5, 2024

jdubs03 said:
I’d be curious about these PDFs!

GitHub - name99-org/AArch64-Explore

Contribute to name99-org/AArch64-Explore development by creating an account on GitHub.

github.com

name99 · Nov 5, 2024

FlameTail said:
Will M5 exceed 5000 in GB6?

M5
5 GHz
5000 GB6

Poetic.

It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).

Recent patents suggest M4 is tentatively exploring
- value prediction
- reuse of wrong-path work executed after branch convergence
(Both of these are pretty cutting edge stuff, which may well mean that the first implementation or two is hidden behind chicken bits and will not be made public, not until Apple is absolutely sure there are no failure edge cases.)

What's interesting is that there are ideas that I think are simpler which Apple has not yet implemented, which is one reason I'm not yet pessimistic that IPC growth is over. Simpler ideas include
- a zero content cache (patents suggest Apple has SOMETHING like this, but exploration [admittedly of M1] shows no evidence, so this might be coming soon?)
- fully decoupled fetch (address generation separated from I-cache by a queue)
- instruction criticality marking (which allows for all sorts of other ideas once you have it)
- detecting instruction readiness at Rename, and bypassing ready instructions (generally about 30% of the instruction stream) past the Issue Queues. This allows scaling the machine to ~30% larger without much energy cost.
- handling a 30% larger machine without massively expensive additions to Rename by decoupling Rename from Decode with a queue (something Intel and ARM are already doing)
- maybe doing simple logic at Rename (like Intel adds small constants at Rename). Apple already have a (patent, again not yet any evidence, but no-one has explored M4 seriously yet) for load-op fusion, and simple logic at Rename is rather similar.

So I don't think 20% for M5 is unreasonable, and it's in line with the (averaged) M1->M2->M3->M4 so far. Of course that doesn't get us quite to 5000...

IF SVE is added, that might take us over the finish line... (But there are, uh, issues with SVE, which is why I think Apple, and now QC, have been so foot-dragging about it. All things considered, a better solution is probably to forget SVE and switch to a NEON++, basically NEON as it is, with predicate masks and the tail-SIMD support of SVE but everything else dropped. That will take some political negotiation, so who knows how this will play out.)

Meteor Late · Nov 5, 2024

name99 said:
It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).

Recent patents suggest M4 is tentatively exploring
- value prediction
- reuse of wrong-path work executed after branch convergence
(Both of these are pretty cutting edge stuff, which may well mean that the first implementation or two is hidden behind chicken bits and will not be made public, not until Apple is absolutely sure there are no failure edge cases.)

What's interesting is that there are ideas that I think are simpler which Apple has not yet implemented, which is one reason I'm not yet pessimistic that IPC growth is over. Simpler ideas include
- a zero content cache (patents suggest Apple has SOMETHING like this, but exploration [admittedly of M1] shows no evidence, so this might be coming soon?)
- fully decoupled fetch (address generation separated from I-cache by a queue)
- instruction criticality marking (which allows for all sorts of other ideas once you have it)
- detecting instruction readiness at Rename, and bypassing ready instructions (generally about 30% of the instruction stream) past the Issue Queues. This allows scaling the machine to ~30% larger without much energy cost.
- handling a 30% larger machine without massively expensive additions to Rename by decoupling Rename from Decode with a queue (something Intel and ARM are already doing)
- maybe doing simple logic at Rename (like Intel adds small constants at Rename). Apple already have a (patent, again not yet any evidence, but no-one has explored M4 seriously yet) for load-op fusion, and simple logic at Rename is rather similar.

So I don't think 20% for M5 is unreasonable, and it's in line with the (averaged) M1->M2->M3->M4 so far. Of course that doesn't get us quite to 5000...

IF SVE is added, that might take us over the finish line... (But there are, uh, issues with SVE, which is why I think Apple, and now QC, have been so foot-dragging about it. All things considered, a better solution is probably to forget SVE and switch to a NEON++, basically NEON as it is, with predicate masks and the tail-SIMD support of SVE but everything else dropped. That will take some political negotiation, so who knows how this will play out.)

I don't think Apple is using N2 with M5 though, most likely N3P, unless M5 is delayed to 2026.

Doug S · Nov 5, 2024

okoroezenwa said:
Doubt it. Both of those I’d see for M6.

Surely M5's multitronic technology can push it past the 5000 barrier!

If Apple doesn't do some gag with Nomad hovering out on stage to deliver the first M5 Mac or something like that (which would be easy to do since it isn't a live event) I'm gonna be disappointed in their marketing team.

name99 · Nov 5, 2024

Meteor Late said:
I don't think Apple is using N2 with M5 though, most likely N3P, unless M5 is delayed to 2026.

N2 is supposed to be high volume in 2H2025. I assumed M5 will arrive more or less at the expected time (say October 2025), NOT as an April surprise in an iPad - at least in part because that chunk of time will be devoted to M4 Ultra (and Extreme?)

Of course an early M5 changes the calculus.

johnsonwax · Nov 5, 2024

name99 said:
It's not, IMHO, impossible. N2 (ie GAA) allows for another frequency boost without breaking the power bank, so I could see 10% from that.
There's also scope for 10% from ongoing IPC (ESPECIALLY if Apple implements SVE, which remains a big, and mainly political, question).

I think it's unlikely that Apple will pull out that kind of a gap on the rest of the industry.

jdubs03 · Nov 5, 2024

johnsonwax said:
I think it's unlikely that Apple will pull out that kind of a gap on the rest of the industry.

They’ve done it before?
Qualcomm Oryon v3 might be pretty damn performant. Apple will want to out innovate the Nuvia team.

Discussion Apple Silicon SoC thread

Lifer

Lifer

Senior member

Lifer

Senior member

Junior Member

Lifer

Senior member

Golden Member

Senior member

Lifer

Golden Member

Platinum Member

Senior member

Diamond Member

Member

Member

Lifer

Platinum Member

Senior member

Senior member

Member

Platinum Member

Senior member

Member

Golden Member