Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
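
As a rough sanity check, the GPU figures above can be reproduced from the execution-unit count. This is only a sketch. The 8 ALUs per execution unit, one fused multiply-add (2 FLOPs) per ALU per clock, the ~1.278 GHz GPU clock, and the 8 texture units and 4 ROPs per GPU core are all assumptions that do not come from the list above.

```python
# Back-of-the-envelope check of the M1 GPU figures quoted above.
# Values marked "assumption" are NOT from the post.
GPU_CORES = 8                    # from the post
EXECUTION_UNITS = 128            # from the post
MAX_THREADS = 24576              # from the post
ALUS_PER_EU = 8                  # assumption
FLOPS_PER_ALU_PER_CLOCK = 2      # assumption: one fused multiply-add per clock
GPU_CLOCK_GHZ = 1.278            # assumption
TEXTURE_UNITS_PER_CORE = 8       # assumption
ROPS_PER_CORE = 4                # assumption

alus = EXECUTION_UNITS * ALUS_PER_EU
teraflops = alus * FLOPS_PER_ALU_PER_CLOCK * GPU_CLOCK_GHZ / 1000
gigatexels = GPU_CORES * TEXTURE_UNITS_PER_CORE * GPU_CLOCK_GHZ
gigapixels = GPU_CORES * ROPS_PER_CORE * GPU_CLOCK_GHZ
threads_per_eu = MAX_THREADS // EXECUTION_UNITS

print(f"ALUs: {alus}")
print(f"FP32 throughput: {teraflops:.2f} TFLOPS")
print(f"Texture fill rate: {gigatexels:.1f} Gtexels/s")
print(f"Pixel fill rate: {gigapixels:.1f} Gpixels/s")
print(f"Threads in flight per EU: {threads_per_eu}")
```

Under those assumptions the results round to the 2.6 Teraflops, 82 Gigatexels/s and 41 gigapixels/s listed above, with 192 threads in flight per execution unit.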

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

Page 78 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

forums.anandtech.com

roger_k · May 11, 2024

soresu said:
Dunno about SME, but SVE was developed first as an ARM/academic research effort, and then later with Fujitsu for the Fugaku/A64FX supercomputer CPU core into the SVE1 instruction set.

I am talking about streaming mode SVE, not SVE. Streaming mode SVE is part of SME. Basic description is in the original ARM blog post:

Scalable Matrix Extension for the Armv9-A Architecture

In this blog, read the details for Scalable Matrix Extension (SME). This is a new extension from the latest Arm Vision day announcement for Armv9-A.

community.arm.com

soresu said:
What does "outer product matmul ISA" mean?

SME and Apple AMX use outer product as the basic operation to implement matrix multiplication. This is explained in the blog post I linked.
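
To make "outer product as the basic operation" concrete, here is a minimal plain-Python sketch. It is only the mathematical idea, not actual SME or AMX instructions: the product A x B is built by accumulating one rank-1 outer product (column k of A times row k of B) per step into an accumulator. The function names `outer` and `matmul_outer` are made up for this illustration.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector: an m x n matrix."""
    return [[c * r for r in row] for c in col]


def matmul_outer(a, b):
    """Compute a @ b by accumulating one rank-1 outer product per k step."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]  # stands in for a hardware accumulator tile
    for step in range(k):
        col = [a[i][step] for i in range(m)]  # column `step` of a
        row = b[step]                         # row `step` of b
        update = outer(col, row)
        for i in range(m):
            for j in range(n):
                acc[i][j] += update[i][j]
    return acc


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_outer(a, b))  # [[19, 22], [43, 50]]
```

Each step touches the whole m x n accumulator while reading only m + n input values, which is why this formulation suits a wide matrix unit.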

Nothingness · May 11, 2024

soresu said:
Are you saying that Cortex A5xx, 7xx and X cores do not even have it in their baseline IP despite it being part of the v9-A spec?

..... Huh, I was sure it would at least be in Neoverse V2 or V3, but I can't find any reference to it at all - which you would assume would be there if they wanted it advertised as a part of the feature set.

I think Arm has yet to announce a core with SME support. SME architecture was published years after SVE; both of these features are optional in V9.

As for Qualcomm disabling SVE in their latest SoC (through firmware, if I understood correctly) despite the Arm-designed core supporting it, I wonder if that's because their upcoming core lacks SVE. Sorry if I sound like a conspiracy theorist.

soresu · May 11, 2024

Nothingness said:
I think Arm has yet to announce a core with SME support. SME architecture was published years after SVE; both of these features are optional in V9.

A couple of years yeah, and SME2 came about 14 months later.

igor_kavinski · May 11, 2024

What is the possibility that Apple Mx SoCs have a higher frequency ceiling in macbooks but Apple will only pump it to the extreme if and only if it is forced to make itself look better against some competitor?

Take this GB6 ST score of ~3800. Maybe in Macbooks it can hit 4100. BUT suppose Zen 5 achieves a max score of only 3600. So Apple keeps M4 max frequency limited such that max GB6 score would be around 3900 or even 4000. That means about 100 points would be lost in time since Apple refuses to unlock their CPUs for overclocking or manual tweaking. Now suppose everyone jumped on this bandwagon of limiting max performance because what the customer doesn't know, the customer won't/can't complain about.

I know already that Doug approves of this methodology. He may write a paragraph extolling the virtues of this approach and how it's smart and clever. What do the rest of you think?

roger_k · May 11, 2024

igor_kavinski said:
What is the possibility that Apple Mx SoCs have a higher frequency ceiling in macbooks but Apple will only pump it to the extreme if and only if it is forced to make itself look better against some competitor?

Take this GB6 ST score of ~3800. Maybe in Macbooks it can hit 4100. BUT suppose Zen 5 achieves a max score of only 3600. So Apple keeps M4 max frequency limited such that max GB6 score would be around 3900 or even 4000. That means about 100 points would be lost in time since Apple refuses to unlock their CPUs for overclocking or manual tweaking. Now suppose everyone jumped on this bandwagon of limiting max performance because what the customer doesn't know, the customer won't/can't complain about.

I know already that Doug approves of this methodology. He may write a paragraph extolling the virtues of this approach and how it's smart and clever. What do the rest of you think?

What would Apple gain from this? They do not re-release same chips with higher frequencies (unlike Intel or AMD), each of their designs involves some rebalancing act.

igor_kavinski · May 11, 2024

roger_k said:
What would Apple gain from this? They do not re-release same chips with higher frequencies (unlike Intel or AMD), each of their designs involves some rebalancing act.

Maybe lower power consumption and more battery life? And also they get to show much bigger gen on gen performance gains.

okoroezenwa · May 11, 2024

igor_kavinski said:
What is the possibility that Apple Mx SoCs have a higher frequency ceiling in macbooks but Apple will only pump it to the extreme if and only if it is forced to make itself look better against some competitor?

Take this GB6 ST score of ~3800. Maybe in Macbooks it can hit 4100. BUT suppose Zen 5 achieves a max score of only 3600. So Apple keeps M4 max frequency limited such that max GB6 score would be around 3900 or even 4000. That means about 100 points would be lost in time since Apple refuses to unlock their CPUs for overclocking or manual tweaking. Now suppose everyone jumped on this bandwagon of limiting max performance because what the customer doesn't know, the customer won't/can't complain about.

I know already that Doug approves of this methodology. He may write a paragraph extolling the virtues of this approach and how it's smart and clever. What do the rest of you think?

It's possible, and I think it'd be a sensible way to go. They get to look good in terms of performance and power usage metrics and who wouldn't want that? Now I'm curious if that's what the M2 Max frequency increase was about...

roger_k · May 11, 2024

igor_kavinski said:
Maybe lower power consumption and more battery life? And also they get to show much bigger gen on gen performance gains.

They have their performance targets and balance around it. For example, one of their goals seems to be all day battery life. It’s obvious that they are not going to clock the chip so high that this target is not achieved. However, that is not the same as deliberately withholding performance from the users just to trump a competitor.

Now, I fully agree with you that most M-chips can likely be clocked higher. Oryon is probably very similar to Firestorm and the Nuvia team can push the clocks up quite a bit, albeit it seems at a hefty power cost. However, the cornerstone of Apple's strategy seems to be consistency. They set the target to the common denominator rather than following the more common trend of binning the chips by performance class. Apple differentiates horizontally, not vertically. “Bigger” chips get more cores, but those cores are not faster on their own. They did offer slightly higher clocks on M2 Max, and maybe we will see this in their upcoming desktops. But so far one of their invariants is that the “cheap” consumer laptop has the same baseline capability as the big professional mobile workstation.

igor_kavinski · May 11, 2024

Good reply!

poke01 · May 11, 2024

iPad16,5 - Geekbench

Benchmark results for an iPad16,5 with an ARM processor.

browser.geekbench.com

Highest one so far?

Eug · May 11, 2024

poke01 said:
iPad16,5 - Geekbench

Benchmark results for an iPad16,5 with an ARM processor.

browser.geekbench.com

Highest one so far?

Yes, it is.

However, it's only 1.7% faster than the 3810 score everyone originally was talking about.

Henry swagger · May 11, 2024

https://twitter.com/x/status/1788930699715608699

🤔 m4 is 10 wide ?

poke01 · May 11, 2024

Post in thread '[IPC] Instructions per cycle - How we measure, interpret and apply this metric for modern computing systems'
http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...odern-computing-systems.2568431/post-39891455

This is an excellent post from @NTMBK that applies to M4. I think we must be reminded of this every time something like AVX-512 and SME is added.

Doug S · May 11, 2024

poke01 said:
Samsung could have implemented it last year in their tabs but didn’t. I presume because they can’t yet make larger panels for the 12” and 14” tablets.

LG made the 13” Tandem OLED panel for the iPad and Samsung for the 11”.

I read somewhere that Tandem OLED had previously only been used in the automotive market, where very long panel life is obviously more important than in a more frequently replaced consumer device like a tablet.

Eug · May 11, 2024

Doug S said:
I read somewhere that Tandem OLED had previously only been used in the automotive market, where very long panel life is obviously more important than in a more frequently replaced consumer device like a tablet.

Yes, I had also read it has been used mainly in some luxury cars (whereas other companies just used LCD), i.e. low-volume applications where screen cost wasn't a make-or-break factor, but where brightness and longevity were key.

So they did exist, but before Apple they had never been produced in high volume due to cost.

The other advantage of dual-stack OLED that works well in the tablet market is that for higher brightness, dual-stack actually uses less power with less heat generation than single-stack, because I guess the power vs nits curve is not linear. They can also balance the power sent to sub pixels better with two OLEDs present which would also affect power utilization.

I'd love to be able to see this enter the TV market.

Nothingness · May 12, 2024

poke01 said:
Post in thread '[IPC] Instructions per cycle - How we measure, interpret and apply this metric for modern computing systems'
http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...odern-computing-systems.2568431/post-39891455

This is an excellent post from @NTMBK that applies to M4. I think we must be reminded of this every time something like AVX-512 and SME is added.

I'd add that what we loosely (and incorrectly) call IPC in this thread is performance per cycle, that is, performance / frequency. This gives a hint at how well a microarchitecture performs (width, branch prediction, data prefetch, etc.) and how useful some instruction set features (AVX-512, SVE, etc.) are.

Real IPC (number of instructions / cycle) is not easy to use to compare performance between two different ISAs (Arm vs x86). On the other hand, it's very useful for comparing successive generations of CPUs (at iso extensions) to extract microarchitecture (and memory) improvements. But you can also use PPC for that.
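
A tiny sketch of the distinction, using made-up numbers. Both CPUs below are hypothetical and nothing here is measured. They finish the same benchmark with the same score at the same clock, but one ISA needs more instructions for the same work.

```python
def ipc(instructions, cycles):
    """True IPC: retired instructions divided by elapsed cycles."""
    return instructions / cycles


def perf_per_clock(score, freq_ghz):
    """Performance per clock (PPC): benchmark score divided by frequency."""
    return score / freq_ghz


# Hypothetical CPUs: same score, same clock, same cycle count,
# but a different number of instructions for the same work.
cpu_a = {"instructions": 4_000_000, "cycles": 1_000_000, "score": 3000, "freq_ghz": 4.0}
cpu_b = {"instructions": 5_000_000, "cycles": 1_000_000, "score": 3000, "freq_ghz": 4.0}

for name, cpu in (("A", cpu_a), ("B", cpu_b)):
    print(
        f"CPU {name}: IPC = {ipc(cpu['instructions'], cpu['cycles']):.1f}, "
        f"PPC = {perf_per_clock(cpu['score'], cpu['freq_ghz']):.0f} points/GHz"
    )
```

CPU B shows a higher IPC without being any faster, so across ISAs the PPC figure is the one that tracks delivered performance.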

Eug · May 12, 2024

Oh so close to 4000/15000:

iPad16,4 - Geekbench

Benchmark results for an iPad16,4 with an ARM processor.

browser.geekbench.com

Less than 2% away from 4000, and only 0.5% away from 15000.
Maybe the M4 MacBook Pro in the coming year will achieve it.

BTW, Geekbench appears to have removed search access to the Geekbench 5 results.

Nothingness · May 12, 2024

Eug said:
BTW, Geekbench appears to have removed search access to the Geekbench 5 results.

That's still doable but I had a hard time finding it; I basically hacked the URL: https://browser.geekbench.com/v5/cpu/search

Eug · May 12, 2024

It seems there is a 9-core review unit out there. Either that or the bench is coming from Apple itself. There was a Geekbench score uploaded today with 3 performance cores and 8 GB RAM:

Originally I was expecting M4 9-core multi-core to be closer to the 12000 of M3 8-core, but this M4 9-core handily beats it. I guess it's SME again?

EDIT:

As expected, Object Detection is the outlier. Here is GB 6.3.0 comparing M4 9-core vs. M3:

Note though, the M3 example I chose is much better than average for both single core and multi-core, and it seems the single M4 9-core example has a below average single-core score.

I don't know if the M4 9-core multi-core score is good or bad, because it's the only one available right now. We'll have to wait until more M4 9-core benches come out to get a better idea of its performance.

Interestingly though, with these admittedly skewed example scores, the M3 actually ekes out a win over M4 9-core for multi-core in terms of performance per clock, despite M4's advantage for object detection. I guess the other way to look at it though is that despite having only 3 performance cores, M4 9-core still does pretty damn good overall for multi-core.

Nothingness · May 12, 2024

I don't know how well Object Detection scales, but it seems to show that there's a single unit doing matrix computation (as was proven for AMX; perhaps one per perf/efficiency cluster I don't remember exactly).

poke01 · May 12, 2024

https://twitter.com/x/status/1789495719079948748

Some news from longhorn

SpudLobby · May 12, 2024

iPad16,3 - Geekbench

Benchmark results for an iPad16,3 with an ARM processor.

browser.geekbench.com

So based on GB5, where the M1 in a *laptop can score 1730-1740 at the high end* (M1 Pro/Max 1800 but that’s also bigger thermal capacity and diff chip to a degree, faster ramping probably), and we have a 2714 GB5 from *an M4 iPad* the IPC gain is 14.5% over the M1, and anywhere from 5-7% over the M3, again taking high figures and laptop figures vs the M4 in the iPad.

No SME involved there.

Makes me think maybe they did actually go wider again, just like with M3 — which did so for a 3-5% IPC gain.

M3 Pro = 2350/4.05

(and better M3 results don’t get that high in GB5, just pro/max so this is charitable to M3 stuff)

M4 = 2714/4.4

About a 6-7% perf/GHz gain over M3, or 14.5% over M1.
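
The arithmetic behind those perf/GHz numbers can be sketched as follows. The M3 Pro and M4 scores and clocks are the ones given above. The M1 entry is an assumption, since no M1 clock is stated: it uses the top of the 1730-1740 range and a 3.2 GHz peak clock.

```python
def perf_per_ghz(score, clock_ghz):
    """Geekbench 5 single-core points per GHz of peak clock."""
    return score / clock_ghz


def gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1) * 100


m1 = perf_per_ghz(1740, 3.2)       # assumption: 3.2 GHz M1 clock, top of the quoted range
m3_pro = perf_per_ghz(2350, 4.05)  # figures from the post
m4 = perf_per_ghz(2714, 4.4)       # figures from the post

print(f"M1:     {m1:.1f} points/GHz")
print(f"M3 Pro: {m3_pro:.1f} points/GHz")
print(f"M4:     {m4:.1f} points/GHz")
print(f"M4 vs M3 Pro: {gain_percent(m4, m3_pro):.1f}%")
print(f"M4 vs M1:     {gain_percent(m4, m1):.1f}%")
```

With these inputs the M4 lands a bit over 6% above M3 Pro and roughly 13-14% above M1, in the same ballpark as the figures above. The exact M1 result depends on which score and clock are plugged in.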

Eug said:
Oh so close to 4000/15000:

iPad16,4 - Geekbench

Benchmark results for an iPad16,4 with an ARM processor.

browser.geekbench.com

Less than 2% away from 4000, and only 0.5% away from 15000.
Maybe the M4 MacBook Pro in the coming year will achieve it.

BTW, Geekbench appears to have removed search access to the Geekbench 5 results.

Eug · May 12, 2024

Eug said:
It seems there is a 9-core review unit out there. Either that or the bench is coming from Apple itself. There was a Geekbench score uploaded today with 3 performance cores and 8 GB RAM:

Ooh... Interesting.

It was pointed out by someone elsewhere that the 9-core M4 benchmark is running iPadOS 17.6. All the other 10-core benchmarks are running 17.4 (current iPadOS) or 17.5 (beta iPadOS). iPadOS 17.6 isn't even available as a beta yet, which makes me think this leaked 9-core score is direct from Apple, which would also explain why there is only one leaked score so far.

StinkyPinky · May 12, 2024

I'm always suspicious about these synthetic benchmarks.

Let's see the real world benefits these chips show, I bet it is less pronounced.

Eug · May 12, 2024

StinkyPinky said:
I'm always suspicious about these synthetic benchmarks.

Let's see the real world benefits these chips show, I bet it is less pronounced.

Of course. But what it does illustrate at least is that there is an incremental and significant improvement in the performance.

I do find it interesting though that Apple was willing to spend a billion bux or whatever to tape out the M3 series, but it seems they will not bother to use it across the entire lineup.

We shall see, but I think at this point most people believe that M3 Ultra will never appear. There were hints ahead of time though, so it's not as if it was a complete surprise.
