Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,986
1,596
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

moinmoin

Diamond Member
Jun 1, 2017
5,193
8,328
136
With GB6 MT even nowadays gaming performance is pretty comparable to those MT results.
Anyway, game engines are targeting a limited number of cores and have a potential hard bottleneck out of their control (the GPU and its drivers), so it's unsurprising they don't scale well beyond a point.
Sorry to say I completely disagree with both of these takes. Both again assume a clean-room setting, which is exactly why GB6's "MT" is so misleading as an MT score.

While gaming doesn't (yet) scale much beyond 6-8 cores, it is highly sensitive to background noise, i.e. random other loads running in the background. This is exactly where GB6's "MT" fails and where more typical N-rate MT benchmarks give a better view of the achievable performance of a given system. This is how 4-core CPUs, and 6-core CPUs without SMT, became unusable for gaming. If you can isolate the game from any interference it may be fine, but random loads will appear and will lead to stutters to some degree. If you want to avoid those, you either look for more cores or for significantly more cache; GB6's "MT" represents neither well, while the former is covered well enough by traditional N-rate MT benchmarks.

And since we are in the Apple Silicon SoC thread, may I remind you that the whole discussion started with GB6's "MT"'s hilarious misrepresentation of the MT difference between M3 Ultra and M4 Max?


According to GB6's "Multi-Core Score" 16 cores in M4 Max are within 93.7% of the performance of 32 cores in M3 Ultra. What real use does that 💩 comparison have at all?
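For context, a quick bit of arithmetic shows what that 93.7% figure implies about per-core scaling. The scores below are illustrative placeholders chosen only so the ratio matches the quoted percentage, not real GB6 results:

```python
def scaling_efficiency(score_small, cores_small, score_big, cores_big):
    """Fraction of ideal linear scaling the bigger chip achieves,
    measured as its per-core score relative to the smaller chip's."""
    per_core_small = score_small / cores_small
    per_core_big = score_big / cores_big
    return per_core_big / per_core_small

# Hypothetical scores: the 16-core chip is set to 25000, and the
# 32-core chip is scaled so the small chip lands at 93.7% of it.
m4_max_score = 25000                  # 16 cores (illustrative)
m3_ultra_score = m4_max_score / 0.937 # 32 cores, only ~6.7% higher total

eff = scaling_efficiency(m4_max_score, 16, m3_ultra_score, 32)
print(f"per-core scaling efficiency: {eff:.1%}")  # prints: per-core scaling efficiency: 53.4%
```

In other words, taken at face value the comparison says each M3 Ultra core contributes barely half of what linear scaling from the M4 Max would predict, which is the distortion being objected to.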
 

Nothingness

Diamond Member
Jul 3, 2013
3,239
2,293
136
While gaming doesn't (yet) scale much beyond 6-8 cores, it is highly sensitive to background noise, i.e. random other loads running in the background. This is exactly where GB6's "MT" fails and where more typical N-rate MT benchmarks give a better view of the achievable performance of a given system. This is how 4-core CPUs, and 6-core CPUs without SMT, became unusable for gaming. If you can isolate the game from any interference it may be fine, but random loads will appear and will lead to stutters to some degree. If you want to avoid those, you either look for more cores or for significantly more cache; GB6's "MT" represents neither well, while the former is covered well enough by traditional N-rate MT benchmarks.
I partially agree. But what do you propose? To use unrealistic benchmarks that scale perfectly to assess the performance of gaming with low-priority background tasks running? N-rate MT benchmarks represent that extremely poorly (SPECrate comes to mind). Processor benchmarking isn't everyday usage, and I agree everyday usage is what should matter to any end user. No benchmark, as far as I know, represents that well, and GB6 is no exception.

And since we are in the Apple Silicon SoC thread, may I remind you that the whole discussion started with GB6's "MT"'s hilarious misrepresentation of the MT difference between M3 Ultra and M4 Max?

According to GB6's "Multi-Core Score" 16 cores in M4 Max are within 93.7% of the performance of 32 cores in M3 Ultra. What real use does that 💩 comparison have at all?
I've been saying the same since the very beginning: if someone isn't educated enough to take aggregated scores with a grain of salt, nothing can be done. If someone is looking at CB24 scores to know how well their system will behave in their usage of Word or games, nothing can be done.

You can repeat ad nauseam that GB6 MT stinks (and it does, especially compared to GB5 MT); it doesn't change the problem that no benchmark represents every end user's experience. I'm no fan of GB6 MT, but repeating the same stance of "GB6 is horrible, trash it" doesn't move things forward.
 

moinmoin

Diamond Member
Jun 1, 2017
5,193
8,328
136
I partially agree. But what do you propose? To use unrealistic benchmarks that scale perfectly to assess the performance of gaming with low-priority background tasks running?
There is no clear solution yet, since it's not seen as a big enough issue to solve. The majority is simply not aware this is an issue at all.

Ideally there would be a standardized benchmark routine, e.g. running all benchmarks with video encoding/playback or a browser test in the background, and showing the standard deviation of those runs relative to clean-room runs.
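The reporting side of such a routine could be as simple as collecting scores under both conditions and comparing means and run-to-run standard deviations. A minimal sketch, with made-up scores purely for illustration:

```python
from statistics import mean, stdev

def summarize(label, runs):
    """Report the mean score and run-to-run standard deviation."""
    m, s = mean(runs), stdev(runs)
    print(f"{label}: mean={m:.1f}, stdev={s:.2f}")
    return m, s

# Made-up benchmark scores; the point is the variance, not the values.
clean_room = [101.2, 100.8, 101.5, 100.9, 101.1]   # nothing else running
with_background = [98.4, 101.0, 93.7, 99.2, 90.5]  # e.g. video encode running

summarize("clean room", clean_room)
summarize("background load", with_background)
```

A reviewer following this scheme would report both numbers per CPU: the background-load runs should show both a lower mean and a much larger spread on core-starved configurations.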

There are precious few examples of people looking at the issue. I remember Computerbase having done it once, but for the life of me I can't find it anymore. Another nice example is the following old video about the Ryzen 1600X, showing how 6C/12T has an advantage over 4C (oh how time flies...):


You can repeat ad nauseam that GB6 MT stinks (and it does, especially compared to GB5 MT); it doesn't change the problem that no benchmark represents every end user's experience. I'm no fan of GB6 MT, but repeating the same stance of "GB6 is horrible, trash it" doesn't move things forward.
I'll have the urge to repeat it every time somebody posts a comparison of two many-core CPUs using GB6's "MT", since the danger is that somebody else sees it and draws the easy but wrong conclusion that GB6's "MT" reflects the CPUs' multi-core capability, and that in this case the 16 additional cores of the M3 Ultra add barely anything on top of the M4 Max's 16 cores.

I'm not sure how to move forward from this. Suggest that admins ban and remove postings of GB6's "MT" in such contexts? Keep silent and let it fester?
 

name99

Senior member
Sep 11, 2010
585
489
136
M3 Ultra is a bargain for those wanting to run DeepSeek or other large LLM models at home.

Even compared to workstations such as the ones Puget Systems sells, it's a bargain.

A 32-core Threadripper + 512 GB of RAM + RTX 4060 Ti is already $12k.

Meanwhile, an M3 Ultra has a 32-core CPU and an 80-core GPU with 512 GB of 819 GB/s RAM for $9.5k.

View attachment 119473

This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
 

Doug S

Diamond Member
Feb 8, 2020
3,059
5,290
136
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...

Yeah if he'd said "university or corporate researcher who wants to run Deepseek in their office" then one might buy it, as there is at least a conceivable market there (albeit not nearly big enough for Apple to orient marketing towards)

Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
 

The Hardcard

Senior member
Oct 19, 2021
311
395
106
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
The AI market is far from nonexistent. Peruse the LocalLLaMA subreddit, YouTube, X, and Medium, among others. This is a rapidly growing community, already tens of thousands strong. And Apple is making measurable revenue. The persistent talk about LLMs in the last several chip launches is not some marketing fantasy. Macs are moving in volume within this market.

I don’t know what the original purpose of the ever-wider memory interfaces on the Pro, Max, and Ultra variants was, but they make Macs the bargain choice for high-parameter LLMs. The Max and Ultra series can not only run larger LLMs than the alternatives, but they are also easier to deal with.

The huge fly in the ointment is the severely reduced compute. Apple GPUs are huge for integrated graphics, but still not nearly enough. The current NPU design is bandwidth-starved: it can barely get 120 GB/s in any M3 or M4, and half that on earlier generations.

But with the unmatched combination of memory capacity and bandwidth, it is a popular choice. In fact, Studio clusters are a thing.

Yeah if he'd said "university or corporate researcher who wants to run Deepseek in their office" then one might buy it, as there is at least a conceivable market there (albeit not nearly big enough for Apple to orient marketing towards)

Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
I don’t get what you mean. Apple provided very little detail about other uses for the M3 Ultra. LLMs are the main focus of the Ultra marketing and the only expressed market target for 512 GB of RAM. The market is more than big enough.

Memory bandwidth dictates the speed of LLM token generation. That Threadripper system not only costs more than the M3 Ultra, it is slower. Standard bandwidth is 332 GB/s, a little more than an M4 Pro. You can get up to 461 GB/s with overclocked memory, still less than an M4 Max.

The only way to have faster token generation than Macs is to have the model in GPU VRAM. The options to attach enough GPUs to match memory capacity of M4 Max or M3 Ultra cost a lot more, not to mention the space, power and heat.
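The bandwidth argument comes down to a rough rule of thumb: for bandwidth-bound token generation, each token requires streaming the active weights from memory once, so tokens/s is capped at roughly bandwidth divided by weight footprint. A minimal sketch (it ignores compute limits, KV-cache traffic, and MoE sparsity; the 400 GB figure is an illustrative assumption for a large quantized model, not a measured size):

```python
def max_tokens_per_s(mem_bandwidth_gb_s, weights_read_gb):
    """Rough bandwidth-bound ceiling on token generation: each token
    streams the active weights from memory once per forward pass."""
    return mem_bandwidth_gb_s / weights_read_gb

# Illustrative: M3 Ultra's 819 GB/s against ~400 GB of active weights.
print(f"{max_tokens_per_s(819, 400):.1f} tok/s upper bound")  # prints: 2.0 tok/s upper bound
```

The same formula explains the ranking in the thread: at ~332 GB/s the Threadripper's ceiling is well under half the M3 Ultra's for the same model, regardless of how many CPU cores it has.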

For 512 GB of VRAM, you need 5 H100s ($125,000), 11 professional Nvidia cards ($66,000), or 16 5090s ($32,000), and that's just the GPUs, not the system that will hold and power them.

22 3090s are only about $17,000, again without the rack system needed to hold, power, and host all those GPUs.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
I didn't say the market was huge. Calm down.

That said, the market is likely bigger than you think - especially if they also use the chip for internal AI needs.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
What?

Do you have any idea how LLM inferencing works? If not, this is a pointless discussion.

I don’t get what you mean. Apple provided very little detail about other uses for the M3 Ultra. LLMs are the main focus of the Ultra marketing and the only expressed market target for 512 GB of RAM. The market is more than big enough.

Memory bandwidth dictates the speed of LLM token generation. That Threadripper system not only costs more than the M3 Ultra, it is slower. Standard bandwidth is 332 GB/s, a little more than an M4 Pro. You can get up to 461 GB/s with overclocked memory, still less than an M4 Max.

The only way to have faster token generation than Macs is to have the model in GPU VRAM. The options to attach enough GPUs to match memory capacity of M4 Max or M3 Ultra cost a lot more, not to mention the space, power and heat.

For 512 GB of VRAM, you need 5 H100s ($125,000), 11 professional Nvidia cards ($66,000), or 16 5090s ($32,000), and that's just the GPUs, not the system that will hold and power them.

22 3090s are only about $17,000, again without the rack system needed to hold, power, and host all those GPUs.
Exactly.

M3 Ultra is a bargain for what it actually provides. It's a unique computer. If you have a need to run DeepSeek R1 locally, there is no better value.
 

name99

Senior member
Sep 11, 2010
585
489
136

I didn't say the market was huge. Calm down.

That said, the market is likely bigger than you think - especially if they also use the chip for internal AI needs.
And I didn't say anything about the "AI" market. Or the M3 Ultra market.
I specifically attacked the phrase "run DeepSeek at home".

As others have pointed out, there's a legitimately interesting market for these things as AI workstations in businesses and colleges (and for other use cases like video/movie editing workstations).

My ONLY complaint was with this perennial fantasy that people can, should, and will run "servers" at home.
 
Mar 23, 2007
31
17
81
This:
View attachment 119680
Looks backward.
If it isn’t, what is the x-axis?
And, the laptop the Intel chip is in is doing a great job dissipating the heat at 25W, whereas the M4 is struggling at 17W all the way down to 9.5W. And, it still looks to be throttling beyond that.

Anyone know the differences in the heatsink design and layout of these two laptops? Is the YOGA Air heavier?
How much time are we talking about here with 15 rounds?
 

poke01

Diamond Member
Mar 8, 2022
3,301
4,546
106
And, the laptop the Intel chip is in is doing a great job dissipating the heat at 25W, whereas the M4 is struggling at 17W all the way down to 9.5W. And, it still looks to be throttling beyond that.

Anyone know the differences in the heatsink design and layout of these two laptops? Is the YOGA Air heavier?
How much time are we talking about here with 15 rounds?
The M4 is passively cooled. The Yoga is actively cooled, which makes it even more embarrassing for Intel.
 