Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,986
1,596
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

moinmoin

Diamond Member
Jun 1, 2017
5,193
8,328
136
With GB6 MT even nowadays gaming performance is pretty comparable to those MT results.
Anyway, game engines are targeting a limited number of cores and have a potential hard bottleneck out of their control (the GPU and its drivers), so it's unsurprising they don't scale well beyond a point.
Sorry to say I completely disagree with both of these takes. Both again assume a clean-room setting, which is exactly why GB6's "MT" is so misleading as an MT score.

While gaming doesn't (yet) scale much beyond 6-8 cores, it is highly sensitive to background noise, i.e. random other loads running in the background. This is exactly where GB6's "MT" fails and where more typical N-rate MT benchmarks give a better view of the achievable performance of a given system. This is how 4-core CPUs, and 6-core CPUs without SMT, became unusable for gaming. If you can isolate the game from any interference it may be fine, but random loads will appear and will lead to stutters to some degree. If you want to avoid those, you either look for more cores or for significantly more cache; GB6's "MT" represents neither well, while the former is covered well enough by traditional N-rate MT benchmarks.

And since we are in the Apple Silicon SoC thread, may I remind you that the whole discussion started with GB6's "MT"'s hilarious misrepresentation of the MT difference between M3 Ultra and M4 Max?


According to GB6's "Multi-Core Score" 16 cores in M4 Max are within 93.7% of the performance of 32 cores in M3 Ultra. What real use does that 💩 comparison have at all?
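For context, a quick bit of arithmetic shows what that 93.7% figure implies about per-core scaling. The scores below are illustrative placeholders chosen only so the ratio matches the quoted percentage, not real GB6 results:

```python
def scaling_efficiency(score_small, cores_small, score_big, cores_big):
    """Fraction of ideal linear scaling the bigger chip achieves,
    measured as its per-core score relative to the smaller chip's."""
    per_core_small = score_small / cores_small
    per_core_big = score_big / cores_big
    return per_core_big / per_core_small

# Hypothetical scores: the 16-core chip is set to 25000, and the
# 32-core chip is scaled so the small chip lands at 93.7% of it.
m4_max_score = 25000                  # 16 cores (illustrative)
m3_ultra_score = m4_max_score / 0.937 # 32 cores, only ~6.7% higher total

eff = scaling_efficiency(m4_max_score, 16, m3_ultra_score, 32)
print(f"per-core scaling efficiency: {eff:.1%}")  # prints: per-core scaling efficiency: 53.4%
```

In other words, taken at face value the comparison says each M3 Ultra core contributes barely half of what linear scaling from the M4 Max would predict, which is the distortion being objected to.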
 

Nothingness

Diamond Member
Jul 3, 2013
3,239
2,293
136
While gaming doesn't (yet) scale much beyond 6-8 cores, it is highly sensitive to background noise, i.e. random other loads running in the background. This is exactly where GB6's "MT" fails and where more typical N-rate MT benchmarks give a better view of the achievable performance of a given system. This is how 4-core CPUs, and 6-core CPUs without SMT, became unusable for gaming. If you can isolate the game from any interference it may be fine, but random loads will appear and will lead to stutters to some degree. If you want to avoid those, you either look for more cores or for significantly more cache; GB6's "MT" represents neither well, while the former is covered well enough by traditional N-rate MT benchmarks.
I partially agree. But what do you propose? To use unrealistic benchmarks that scale perfectly to assess the performance of gaming with low-priority background tasks running? N-rate MT benchmarks represent that extremely poorly (SPECrate comes to mind). Processor benchmarking isn't everyday usage, and I agree everyday usage is what should matter to any end user. No benchmark, as far as I know, represents that well, and GB6 is no exception.

And since we are in the Apple Silicon SoC thread, may I remind you that the whole discussion started with GB6's "MT"'s hilarious misrepresentation of the MT difference between M3 Ultra and M4 Max?

According to GB6's "Multi-Core Score" 16 cores in M4 Max are within 93.7% of the performance of 32 cores in M3 Ultra. What real use does that 💩 comparison have at all?
I've been saying the same since the very beginning: if someone isn't educated enough to take aggregated scores with a grain of salt, nothing can be done. If someone is looking at CB24 scores to know how well their system will behave in their usage of Word or games, nothing can be done.

You can repeat ad nauseam that GB6 MT stinks (and it does, especially compared to GB5 MT); it doesn't change the problem that no benchmark represents every end user's experience. I'm no fan of GB6 MT, but repeating the same stance of "GB6 is horrible, trash it" doesn't move things forward.
 

moinmoin

Diamond Member
Jun 1, 2017
5,193
8,328
136
I partially agree. But what do you propose? To use unrealistic benchmarks that scale perfectly to assess the performance of gaming with low-priority background tasks running?
There is no clear solution yet, since it's not seen as a big enough issue to solve. The majority is simply not aware this is an issue at all.

Ideally there would be a standardized benchmark routine, e.g. running all benchmarks with video encoding/playback or a browser test in the background, and showing the standard deviation of those runs relative to clean-room runs.
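The reporting side of such a routine could be as simple as collecting scores under both conditions and comparing means and run-to-run standard deviations. A minimal sketch, with made-up scores purely for illustration:

```python
from statistics import mean, stdev

def summarize(label, runs):
    """Report the mean score and run-to-run standard deviation."""
    m, s = mean(runs), stdev(runs)
    print(f"{label}: mean={m:.1f}, stdev={s:.2f}")
    return m, s

# Made-up benchmark scores; the point is the variance, not the values.
clean_room = [101.2, 100.8, 101.5, 100.9, 101.1]   # nothing else running
with_background = [98.4, 101.0, 93.7, 99.2, 90.5]  # e.g. video encode running

summarize("clean room", clean_room)
summarize("background load", with_background)
```

A reviewer following this scheme would report both numbers per CPU: the background-load runs should show both a lower mean and a much larger spread on core-starved configurations.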

There are precious few examples of people looking at the issue. I remember Computerbase having done it once, but for the life of me I can't find it anymore. Another nice example is the following old video about the Ryzen 1600X, showing how 6C/12T has an advantage over 4C (oh how time flies...):


You can repeat ad nauseam that GB6 MT stinks (and it does, especially compared to GB5 MT); it doesn't change the problem that no benchmark represents every end user's experience. I'm no fan of GB6 MT, but repeating the same stance of "GB6 is horrible, trash it" doesn't move things forward.
I'll have the urge to repeat it every time somebody posts a comparison of two many-core CPUs using GB6's "MT", since the danger is that somebody else sees it and draws the easy but wrong conclusion that GB6's "MT" reflects the CPUs' multi-core capability, and that in this case the 16 additional cores of the M3 Ultra add barely anything on top of the M4 Max's 16 cores.

I'm not sure how to move forward from this. Suggest that admins ban and remove postings of GB6's "MT" in such contexts? Keep silent and let it fester?
 

name99

Senior member
Sep 11, 2010
585
489
136
M3 Ultra is a bargain for those wanting to run DeepSeek or other large LLM models at home.

Even compared to workstations such as the ones Puget Systems sells, it's a bargain.

A 32-core Threadripper + 512 GB of RAM + RTX 4060 Ti is already $12k.

Meanwhile, an M3 Ultra has a 32-core CPU and an 80-core GPU with 512 GB of 819 GB/s RAM for $9.5k.

View attachment 119473

This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
 

Doug S

Diamond Member
Feb 8, 2020
3,059
5,290
136
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...

Yeah if he'd said "university or corporate researcher who wants to run Deepseek in their office" then one might buy it, as there is at least a conceivable market there (albeit not nearly big enough for Apple to orient marketing towards)

Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
 

The Hardcard

Senior member
Oct 19, 2021
311
395
106
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
The AI market is far from nonexistent. Peruse the LocalLLaMA subreddit, YouTube, X, and Medium, among others. This is a rapidly growing community, already tens of thousands strong. And Apple is making measurable revenue. The persistent talk about LLMs in the last several chip launches is not some marketing fantasy. Macs are moving in volume within this market.

I don’t know what the original purpose of the ever-wider memory interfaces on the Pro, Max, and Ultra variants was, but they make Macs the bargain choice for high-parameter LLMs. The Max and Ultra series can not only run larger LLMs than the alternatives, but they are also easier to deal with.

The huge fly in the ointment is the severely reduced compute. Apple GPUs are huge for integrated graphics, but still not nearly enough. The current NPU design is bandwidth-starved: it can barely get 120 GB/s in any M3 or M4, and half that on earlier generations.

But with the unmatched combination of memory capacity and bandwidth, it is a popular choice. In fact, Studio clusters are a thing.

Yeah if he'd said "university or corporate researcher who wants to run Deepseek in their office" then one might buy it, as there is at least a conceivable market there (albeit not nearly big enough for Apple to orient marketing towards)

Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
I don’t get what you mean. Apple provided very little detail about other uses for the M3 Ultra. LLMs are the main focus of the Ultra marketing and the only expressed market target for 512 GB of RAM. The market is more than big enough.

Memory bandwidth dictates the speed of LLM token generation. That Threadripper system not only costs more than the M3 Ultra, it is slower. Standard bandwidth is 332 GB/s, a little more than an M4 Pro. You can get up to 461 GB/s with overclocked memory, still less than an M4 Max.

The only way to have faster token generation than Macs is to have the model in GPU VRAM. The options to attach enough GPUs to match memory capacity of M4 Max or M3 Ultra cost a lot more, not to mention the space, power and heat.
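The bandwidth argument comes down to a rough rule of thumb: for bandwidth-bound token generation, each token requires streaming the active weights from memory once, so tokens/s is capped at roughly bandwidth divided by weight footprint. A minimal sketch (it ignores compute limits, KV-cache traffic, and MoE sparsity; the 400 GB figure is an illustrative assumption for a large quantized model, not a measured size):

```python
def max_tokens_per_s(mem_bandwidth_gb_s, weights_read_gb):
    """Rough bandwidth-bound ceiling on token generation: each token
    streams the active weights from memory once per forward pass."""
    return mem_bandwidth_gb_s / weights_read_gb

# Illustrative: M3 Ultra's 819 GB/s against ~400 GB of active weights.
print(f"{max_tokens_per_s(819, 400):.1f} tok/s upper bound")  # prints: 2.0 tok/s upper bound
```

The same formula explains the ranking in the thread: at ~332 GB/s the Threadripper's ceiling is well under half the M3 Ultra's for the same model, regardless of how many CPU cores it has.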

For 512 GB of VRAM, you need 5 H100s ($125,000), 11 professional Nvidia cards ($66,000), or 16 5090s ($32,000), and that's just the GPUs, not the system that will hold and power them.

22 3090s are only about $17,000, again without the rack system needed to hold, power, and host all those GPUs.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
This market is as non-existent as the market of people who wish to run a home email server or home web server.

There's something deeply broken inside the vast mass of much of the internet that they UTTERLY refuse to learn from the past, no matter how often the same thing occurs... The same people who (justifiably) consider it a hassle to run a home (anything else) server will consider it a hassle to run a home AI server.

M3 Ultra is a nice system for a certain type of work. And it's interesting to see how the prices compare at this high end!

But let's not pretend that there's this huge new, underserved, TAM for it...
I didn't say the market was huge. Calm down.

That said, the market is likely bigger than you think - especially if they also use the chip for internal AI needs.
 

mikegg

Golden Member
Jan 30, 2010
1,881
490
136
Worse though, just showing us that the M3 Ultra is cheaper than ONE competitive x86 configuration, without even providing benchmarks of how each performs in AI-related tasks, makes his claim kind of pointless. I could probably put together a Core i3 with sufficient RAM for running AI models and claim that's a much cheaper alternative, if we aren't going to show any numbers about how any of this performs.
What?

Do you have any idea how LLM inferencing works? If not, this is a pointless discussion.

I don’t get what you mean. Apple provided very little detail about other uses for the M3 Ultra. LLMs are the main focus of the Ultra marketing and the only expressed market target for 512 GB of RAM. The market is more than big enough.

Memory bandwidth dictates the speed of LLM token generation. That Threadripper system not only costs more than the M3 Ultra, it is slower. Standard bandwidth is 332 GB/s, a little more than an M4 Pro. You can get up to 461 GB/s with overclocked memory, still less than an M4 Max.

The only way to have faster token generation than Macs is to have the model in GPU VRAM. The options to attach enough GPUs to match memory capacity of M4 Max or M3 Ultra cost a lot more, not to mention the space, power and heat.

For 512 GB of VRAM, you need 5 H100s ($125,000), 11 professional Nvidia cards ($66,000), or 16 5090s ($32,000), and that's just the GPUs, not the system that will hold and power them.

22 3090s are only about $17,000, again without the rack system needed to hold, power, and host all those GPUs.
Exactly.

M3 Ultra is a bargain for what it actually provides. It's a unique computer. If you have a need to run DeepSeek R1 locally, there is no better value.
 

name99

Senior member
Sep 11, 2010
585
489
136

I didn't say the market was huge. Calm down.

That said, the market is likely bigger than you think - especially if they also use the chip for internal AI needs.
And I didn't say anything about the "AI" market. Or the M3 Ultra market.
I specifically attacked the phrase "run DeepSeek at home".

As others have pointed out, there's a legitimately interesting market for these things as AI workstations in businesses and colleges (and for other use cases like video/movie editing workstations).

My ONLY complaint was with this perennial fantasy that people can, should, and will run "servers" at home.
 
Mar 23, 2007
31
17
81
This:
View attachment 119680
Looks backward.
If it isn’t, what is the x-axis?
And, the laptop the Intel chip is in is doing a great job dissipating the heat at 25W, whereas the M4 is struggling at 17W all the way down to 9.5W. And, it still looks to be throttling beyond that.

Anyone know the differences in the heatsink design and layout of these two laptops? Is the YOGA Air heavier?
How much time are we talking about here with 15 rounds?
 

poke01

Diamond Member
Mar 8, 2022
3,301
4,546
106
And, the laptop the Intel chip is in is doing a great job dissipating the heat at 25W, whereas the M4 is struggling at 17W all the way down to 9.5W. And, it still looks to be throttling beyond that.

Anyone know the differences in the heatsink design and layout of these two laptops? Is the YOGA Air heavier?
How much time are we talking about here with 15 rounds?
The M4 is passively cooled. The Yoga is actively cooled, which makes it even more embarrassing for Intel.
 