Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,986
1,596
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
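
A quick sanity check on the GPU figures above, as a rough sketch rather than anything official. The lane count per execution unit and the 2-FLOPs-per-FMA convention are assumptions that do not come from the list, so the implied clock is only an estimate.

```python
# Figures quoted in the spec list above.
EXECUTION_UNITS = 128
MAX_CONCURRENT_THREADS = 24576
PEAK_TFLOPS = 2.6
GIGATEXELS_PER_S = 82
GIGAPIXELS_PER_S = 41

# Assumptions (not from the list): 8 FP32 lanes per execution unit,
# and one fused multiply-add counted as 2 FLOPs per lane per clock.
LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CLOCK = 2

# Thread slots per execution unit, derived purely from the quoted numbers.
threads_per_eu = MAX_CONCURRENT_THREADS // EXECUTION_UNITS

# FLOPs the whole GPU can retire per clock under the assumptions above.
flops_per_clock = EXECUTION_UNITS * LANES_PER_EU * FLOPS_PER_LANE_PER_CLOCK

# Clock speed implied by the quoted peak throughput.
implied_clock_ghz = PEAK_TFLOPS * 1e12 / flops_per_clock / 1e9

# Ratio of texture fill rate to pixel fill rate.
texels_per_pixel = GIGATEXELS_PER_S / GIGAPIXELS_PER_S

print(threads_per_eu)               # 192
print(flops_per_clock)              # 2048
print(round(implied_clock_ghz, 2))  # 1.27
print(texels_per_pixel)             # 2.0
```

If the lane assumption is off, only the implied clock changes; the 192 threads per execution unit and the 2:1 texel-to-pixel ratio follow directly from the quoted figures.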

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

joshua95

Junior Member
Jan 6, 2025
5
0
11
24GB RAM in the base MacBook Air... £1299 with education pricing compared to £1499 for the MBP with only 16GB... Now that is tempting.

I do wonder why they've boosted the base RAM... Perhaps to make it more of a competitive purchase vs the M3 Air
 

Eug

Lifer
Mar 11, 2000
23,986
1,596
126
24GB RAM in the base MacBook Air... £1299 with education pricing compared to £1499 for the MBP with only 16GB... Now that is tempting.

I do wonder why they've boosted the base RAM... Perhaps to make it more of a competitive purchase vs the M3 Air
They boosted the RAM because of Apple Intelligence, and they boosted the M2 and M3 at the same time, i.e. both the base M2 MBA and the base M3 MBA have been 16 GB for months.

It did a real number on 8 GB MBA resale values. The latest M4 release will also push down M2/M3 resale values, because the M4 16/256 model got a drop in price. All of a sudden all the sale prices on the M2/M3 at retail in the last couple of months don't seem very good. So, I'm glad I waited for M4.

I will purchase an M4 16/256 or M4 16/512 when the Back-To-School Promotion starts in a few months.

The only problem with the M4 16/512 is that it's an expensive upgrade from 16/256. To mitigate the price shock they bundle in an SoC upgrade (from 8-core GPU to 10-core) and a higher end USB-C adapter (from single 30 W to dual 35 W or single 70 W).
 
Reactions: Mopetar

mvprod123

Senior member
Jun 22, 2024
237
271
96
Apple says the Ultra will skip certain generations.

That leaves few options for the Mac Pro:
1. Upgrade to the M3 Ultra, which will look ridiculous when the M5 generation launches in the fall.
2. Upgrade to M5 Ultra. It would be fair, but is the deployed capacity of the N3P node enough now?
 
Reactions: jdubs03

name99

Senior member
Sep 11, 2010
585
489
136
Damn, it was actually an M3 Ultra. That’s so bizarre.
It's CALLED an M3 Ultra. That's a slightly different point.
If we accept that the chips are subject to on-going improvement, then what you call a chip that has any particular mix of features (GPU from class N, CPU from class M, ANE from class L, ....) is something of a choice.

My *guess* is that the specific CPU in the M3 Ultra, while possibly having a few tweaks and fixes relative to that in the M3 Max, does NOT have SME (obviously it will have AMX) and that (this time round...) that's considered important enough as far as a developer is concerned that it should be defined as M3 class.

But overall, as others are pointing out, it's how the two SoCs are joined together (both physically and logically) that's the most interesting element here. We appear to have something different with the memory controllers (ie memory capacity is more than twice what the M3 or M4 Max controllers could support).
Do we have something physically like UltraFusion, or a different physical connector?
And, given the delay, we PROBABLY have a whole lot of subtle work to fix the various disappointments in the M1 and M2 Ultra, the cases where either we did not get 2x scaling, or we couldn't even try for 2x scaling.

For example we probably get the new coherence protocol that should speed up the performance of tasks. We probably get tweaks to the GPU that allow it to schedule more effectively over two (and multiple...) "GPU clusters". We probably get something that allows two (and multiple) ANE's to work together. Probably some way of distributing interrupts across multiple SoCs effectively. etc etc.

It feels like the orders, after getting M2 Ultra out the door, were something like "OK, take whatever IP you want from the M3 and M4 bins, and modify them however necessary to get better scaling to not just 2 but also 4 chiplets, and when you're done we'll roll back the relevant changes into the main-line (hopefully M5, maybe only M6)".
 

GC2:CS

Member
Jul 6, 2018
34
20
81
So are there 64GB stacks with a 128-bit bus? Or do we have 16 channels by 32 GB?

I guess that has to mean a lot of power dedicated to memory when fully specced out.

Also my bet is on the entry level iPad having only one P core. I “foresaw” this years ago. The Ps are so strong you do not need two of them in entry level stuff. Now I can start predicting a fully E core SoC in phones, tablets and entry level notebooks.
 

name99

Senior member
Sep 11, 2010
585
489
136
"Mac Studio with M3 Ultra enables
  • Up to 1.1x faster basecalling for DNA sequencing in Oxford Nanopore MinKNOW when compared to Mac Studio with M1 Ultra, and up to 21.1x faster when compared to the 16-core Intel-based Mac Pro with Radeon Pro W5700X.
"

1.1x is, let's face it, pretty pathetic!

What exactly limits the performance of "DNA sequencing in Oxford Nanopore MinKNOW" that it runs like a beast compared to the Intel Mac Pro, but then speeds up nothing compared to an M1 Ultra?

I'm guessing this runs on the GPU (and that's why they specified the Radeon)?
Is this a very rare very special case where Dynamic Caching provides absolutely zero benefit to the code, but the overhead of Dynamic Caching cuts into the speedups you'd expect from more GPU cores running at higher frequency?
 
Reactions: igor_kavinski

name99

Senior member
Sep 11, 2010
585
489
136
So are there 64GB stacks with a 128-bit bus? Or do we have 16 channels by 32 GB?

I guess that has to mean a lot of power dedicated to memory when fully specced out.

Also my bet is on the entry level iPad having only one P core. I “foresaw” this years ago. The Ps are so strong you do not need two of them in entry level stuff. Now I can start predicting a fully E core SoC in phones, tablets and entry level notebooks.
You are asking two questions: how the extra DRAM is handled LOGICALLY, and how it is handled PHYSICALLY.

Note that the bandwidth does not go up at the maximum capacity, and the bandwidth at all capacities is the same, basically 820GB/s.

So my guess is that, for the first time with Apple Silicon, Apple is using the "rank" functionality of DRAM. Effectively "rank" refers to the two sides of a DRAM DIMM.
Imagine a DIMM with chips on one side. There is a single set of wires that goes to the DIMM, so you have a specific fixed bandwidth, and address pins. Now imagine you put a second set of DRAM chips on the other side. All you need to do is add a single additional wire and you can address either one side of the DIMM or the other (you have doubled capacity) but note that you have NOT doubled bandwidth. Of course your memory controller also needs a few tweaks to understand this setup (and, for example, to make sure that it doesn't try to send commands to side A when side B is active and using the DRAM bus...)

Apple has never (as far as I know) used this functionality before for Apple Silicon, but a year or so ago I pointed out a patent (and was mocked for it by certain people...) that described tweaks made to the memory controller to optimize scheduling when using ranks....

As for how this is handled physically, my guess is that, although rank does not HAVE TO mean "side" (of a DIMM or otherwise) doing things two sided is generally the easiest solution. So imagine the traditional M2 Ultra setup
as seen here:
with the DRAM forming wings on the side. We could imagine a second set of DRAM chips on the other side, below the visible "wings".

Of course the DRAMs COULD be stacked, likewise sharing a single set of wires. But that's more expensive, and seems pointless given the easier solution.
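
To make the rank idea concrete, here is a minimal sketch of a single DRAM channel with one or two ranks. Everything in it is illustrative: the `DramChannel` class, the 64-bit bus, the 6400 MT/s rate and the 8 GiB rank size are hypothetical values chosen for the example, not Apple's actual configuration. It also puts the rank select in the top address bit for clarity, whereas real controllers typically interleave addresses.

```python
from dataclasses import dataclass
from typing import Tuple

GIB = 2**30


@dataclass
class DramChannel:
    """Toy model of one DRAM channel whose ranks share a single data bus."""
    bus_width_bits: int        # data wires, shared by every rank
    transfers_per_sec_m: int   # transfer rate in mega-transfers per second
    rank_capacity_gib: int     # capacity of one rank
    ranks: int = 1             # number of ranks behind the same wires

    def capacity_gib(self) -> int:
        # Each extra rank adds capacity.
        return self.rank_capacity_gib * self.ranks

    def bandwidth_gb_per_s(self) -> float:
        # Ranks do not appear here: only one rank drives the bus at a time.
        return self.bus_width_bits * self.transfers_per_sec_m / 8 / 1000

    def decode(self, address: int) -> Tuple[int, int]:
        """Split a byte address into (rank select, offset within the rank)."""
        rank_bytes = self.rank_capacity_gib * GIB
        if not 0 <= address < rank_bytes * self.ranks:
            raise ValueError("address out of range")
        return address // rank_bytes, address % rank_bytes


# Hypothetical single-rank and dual-rank versions of the same channel.
single = DramChannel(64, 6400, 8, ranks=1)
dual = DramChannel(64, 6400, 8, ranks=2)

print(single.capacity_gib(), dual.capacity_gib())              # 8 16
print(single.bandwidth_gb_per_s(), dual.bandwidth_gb_per_s())  # 51.2 51.2
print(dual.decode(8 * GIB + 5))                                # (1, 5)
```

The second rank doubles the addressable capacity while the bandwidth stays put, which is the behaviour described above: one extra select signal, same set of data wires.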
 
Reactions: igor_kavinski

The Hardcard

Senior member
Oct 19, 2021
311
395
106
"Mac Studio with M3 Ultra enables
  • Up to 1.1x faster basecalling for DNA sequencing in Oxford Nanopore MinKNOW when compared to Mac Studio with M1 Ultra, and up to 21.1x faster when compared to the 16-core Intel-based Mac Pro with Radeon Pro W5700X.
"

1.1x is, let's face it, pretty pathetic!

What exactly limits the performance of "DNA sequencing in Oxford Nanopore MinKNOW" that it runs like a beast compared to the Intel Mac Pro, but then speeds up nothing compared to an M1 Ultra?

I'm guessing this runs on the GPU (and that's why they specified the Radeon)?
Is this a very rare very special case where Dynamic Caching provides absolutely zero benefit to the code, but the overhead of Dynamic Caching cuts into the speedups you'd expect from more GPU cores running at higher frequency?
Memory bandwidth. The M3 Ultra has the same 800 GB/s bandwidth as the previous Ultras.
 

Doug S

Diamond Member
Feb 8, 2020
3,059
5,290
136
Wait, I thought die photos had determined there were no Fusion I/O pads on M3 Max (or M4 Max) so how they are using it to make M3 Ultra? Are they using different dies to make it than the Max? If so, what's the point of doing the fusion thing? Avoiding a reticle sized die?

Something isn't adding up here.
 

name99

Senior member
Sep 11, 2010
585
489
136
I've done the comparison, so you don't need to.
I've been holding off buying an M4 class machine because I wasn't sure if a maxed out mini Pro would be a better deal than a Studio. And I was correct to do so!

Suppose you want a high end mini Pro.
This means what I wanted included
- 10G ethernet
- 10+4 CPU cores
- more than 24GB

The first two take us to $1700, if we want 48GB that adds $400

Now compare with the M4 Max Studio
- 10G ethernet built in
- 10+4 cores at the lowest tier
- 36GB

This costs $2000.
If we had a 36GB mini Pro option, that would cost an extra $200, taking us to $1900.

So for an extra $100 we're getting
- 32 vs 20 GPU cores
- 410 vs 270 GB/s DRAM bandwidth
- probably 64 vs 32MB of SLC (hard to get anything definitive about SLC in the M4 generation!)
- more IO (I probably won't use)
- more screens (I probably won't use)

This seems like a slam dunk! $100 for a lot better GPU and mildly better everything else (larger SLC and bandwidth, better thermals).
Even if you want to go to the 48GB tier (I don't) there's an M4 Max Studio option available, at the same $200 per 12GB upgrade cost as the mini Pro.

So to me it looks clear -- if you want a high end mini pro and don't need an especially small machine, just get the lowest end Studio.
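
The arithmetic behind that comparison, as a small sketch using only the prices quoted in the post. The 36GB mini Pro does not exist as a configuration, so its price is extrapolated from the $200-per-12GB step, and the variable names are made up for the example.

```python
# Prices in USD as quoted in the post.
MINI_PRO_BASE = 1700        # mini Pro with 10G ethernet and 10+4 CPU cores, 24GB
STUDIO_M4_MAX_36GB = 2000   # lowest-tier M4 Max Studio, 36GB
PRICE_PER_12GB = 200        # RAM upgrade step quoted for both machines

# Hypothetical 36GB mini Pro: one 12GB step above the 24GB configuration.
mini_pro_36gb = MINI_PRO_BASE + PRICE_PER_12GB

# 48GB tier on both machines.
mini_pro_48gb = MINI_PRO_BASE + 2 * PRICE_PER_12GB
studio_48gb = STUDIO_M4_MAX_36GB + PRICE_PER_12GB

print(mini_pro_36gb)                       # 1900
print(STUDIO_M4_MAX_36GB - mini_pro_36gb)  # 100
print(mini_pro_48gb, studio_48gb)          # 2100 2200
print(studio_48gb - mini_pro_48gb)         # 100
```

At equal memory the Studio premium stays at $100 in both tiers, which is the basis for the "slam dunk" conclusion.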
 

mvprod123

Senior member
Jun 22, 2024
237
271
96
Wait, I thought die photos had determined there were no Fusion I/O pads on M3 Max (or M4 Max) so how they are using it to make M3 Ultra? Are they using different dies to make it than the Max? If so, what's the point of doing the fusion thing? Avoiding a reticle sized die?

Something isn't adding up here.

It seems that Apple built a new version of the M3 Max on the new N3E node with UltraFusion attached just for the M3 Ultra Mac Studio, with Thunderbolt 5 controllers.
 
Reactions: Doug S

oak8292

Member
Sep 14, 2016
139
152
116
It seems that Apple built a new version of the M3 Max on the new N3E node with UltraFusion attached just for the M3 Ultra Mac Studio, with Thunderbolt 5 controllers.
Based on guesses about volume this ‘M3 Max’ version 2 must be going into internal servers. The increased capability of the I/O for both DRAM and Thunderbolt would make it a good server chip. They need to have volume for this die and it isn’t going to come from the Studio. I will speculate that going forward all of the Ultras will serve double duty in Apple server farms.

“The Apple silicon servers that form the foundation of Private Cloud Compute provide unprecedented cloud security.”

 

jdubs03

Golden Member
Oct 1, 2013
1,206
848
136
I've done the comparison, so you don't need to.
I've been holding off buying an M4 class machine because I wasn't sure if a maxed out mini Pro would be a better deal than a Studio. And I was correct to do so!

Suppose you want a high end mini Pro.
This means what I wanted included
- 10G ethernet
- 10+4 CPU cores
- more than 24GB

The first two take us to $1700, if we want 48GB that adds $400

Now compare with the M4 Max Studio
- 10G ethernet built in
- 10+4 cores at the lowest tier
- 36GB

This costs $2000.
If we had a 36GB mini Pro option, that would cost an extra $200, taking us to $1900.

So for an extra $100 we're getting
- 32 vs 20 GPU cores
- 410 vs 270 GB/s DRAM bandwidth
- probably 64 vs 32MB of SLC (hard to get anything definitive about SLC in the M4 generation!)
- more IO (I probably won't use)
- more screens (I probably won't use)

This seems like a slam dunk! $100 for a lot better GPU and mildly better everything else (larger SLC and bandwidth, better thermals).
Even if you want to go to the 48GB tier (I don't) there's an M4 Max Studio option available, at the same $200 per 12GB upgrade cost as the mini Pro.

So to me it looks clear -- if you want a high end mini pro and don't need an especially small machine, just get the lowest end Studio.
Yeah the higher end Mac Mini at this point makes no sense. Probably why they released this 3 months after.
 

kingsleyopara

Junior Member
May 7, 2024
5
0
11
So my guess is that, for the first time with Apple Silicon, Apple is using the "rank" functionality of DRAM. Effectively "rank" refers to the two sides of a DRAM DIMM.
Imagine a DIMM with chips on one side. There is a single set of wires that goes to the DIMM, so you have a specific fixed bandwidth, and address pins. Now imagine you put a second set of DRAM chips on the other side. All you need to do is add a single additional wire and you can address either one side of the DIMM or the other (you have doubled capacity) but note that you have NOT doubled bandwidth. Of course your memory controller also needs a few tweaks to understand this setup (and, for example, to make sure that it doesn't try to send commands to side A when side B is active and using the DRAM bus...)
Should we expect the M5 Max in the fall to double the maximum supported memory to 256GB?
 

SpudLobby

Golden Member
May 18, 2022
1,039
699
106
They boosted the RAM because of Apple Intelligence, and they boosted the M2 and M3 at the same time, i.e. both the base M2 MBA and the base M3 MBA have been 16 GB for months.

It did a real number on 8 GB MBA resale values. The latest M4 release will also push down M2/M3 resale values, because the M4 16/256 model got a drop in price. All of a sudden all the sale prices on the M2/M3 at retail in the last couple of months don't seem very good. So, I'm glad I waited for M4.

I will purchase an M4 16/256 or M4 16/512 when the Back-To-School Promotion starts in a few months.

The only problem with the M4 16/512 is that it's an expensive upgrade from 16/256. To mitigate the price shock they bundle in an SoC upgrade (from 8-core GPU to 10-core) and a higher end USB-C adapter (from single 30 W to dual 35 W or single 70 W).
I think they boosted the base amount not only because of that but because Apple is not *totally* immune from competition in the Windows sector. It's not like iPhones/Android stuff; people use these things to work (even students). 8GB of RAM was just absurd, and a 16GB standard was long overdue. The price cut and M4 standard at $999 I think also show that Apple pays at least some attention to the wider market.
 

Doug S

Diamond Member
Feb 8, 2020
3,059
5,290
136
I think they boosted the base amount not only because of that but because Apple is not *totally* immune from competition in the Windows sector. It's not like iPhones/Android stuff; people use these things to work (even students). 8GB of RAM was just absurd, and a 16GB standard was long overdue. The price cut and M4 standard at $999 I think also show that Apple pays at least some attention to the wider market.

The price cut was a little surprising to me given the uncertainty (to say the least) around the tariff situation. I guess the M4 should be a little cheaper from TSMC than the M3 so if other than RAM everything else stayed pretty much the same their costs may have gone down enough to reduce the price while maintaining their margin. If tariffs hit hard then everyone will be raising prices so I suppose there's no point in trying to plan for something they have no control over.
 
Reactions: SpudLobby

name99

Senior member
Sep 11, 2010
585
489
136
Should we expect the M5 Max in the fall to double the maximum supported memory to 256GB?
That's a business question, not a technical question.
Do they have a large enough pool of customers who want the extra DRAM but NOT the extra compute or bandwidth of an M5 Ultra? I have no idea.
 

Doug S

Diamond Member
Feb 8, 2020
3,059
5,290
136
Ming-Chi Kuo claims that it isn't so difficult to just implement mmWave support per se, but it's hard to do it well with low power consumption. Also, he says the process nodes for a revised C1 (C1X?!?) in 2026 will be:

The reason mmWave takes up more power is that it supports really wide channels to enable crazy high bandwidth. So per megabit of download I doubt there's a whole lot of difference between traditional cellular bands used for 5G and mmWave used for 5G (assuming relatively similar signal strengths when measured in dBm above the noise floor).

But if you're on mmWave and you download gigabyte after gigabyte, yeah, it'll burn down your battery quickly, faster than you would on slower 5G, because on slower 5G it would take longer to download the same amount of stuff.
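
A toy model of that argument, with made-up numbers: the throughput and radio power figures below are purely hypothetical and are chosen so that energy per bit is identical on both links, which is the assumption being made above. The function name is likewise invented for the sketch.

```python
def download_energy_joules(size_megabits: float,
                           throughput_mbps: float,
                           radio_power_watts: float) -> float:
    """Energy spent by the radio to move a payload: power times transfer time."""
    seconds = size_megabits / throughput_mbps
    return radio_power_watts * seconds


# Hypothetical links: mmWave is 10x faster and draws 10x the power.
SUB6_MBPS, SUB6_WATTS = 200.0, 1.0
MMWAVE_MBPS, MMWAVE_WATTS = 2000.0, 10.0

ONE_GIGABYTE_MEGABITS = 8000.0

sub6_energy = download_energy_joules(ONE_GIGABYTE_MEGABITS, SUB6_MBPS, SUB6_WATTS)
mmwave_energy = download_energy_joules(ONE_GIGABYTE_MEGABITS, MMWAVE_MBPS, MMWAVE_WATTS)

print(sub6_energy, mmwave_energy)  # 40.0 40.0 -> same energy for the same gigabyte
print(MMWAVE_WATTS / SUB6_WATTS)   # 10.0 -> but 10x the drain rate while active
```

Under these assumptions the same download costs the same energy either way; the battery only drains faster on mmWave because ten times as much data moves in the same wall-clock time.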
 
Reactions: name99

jdubs03

Golden Member
Oct 1, 2013
1,206
848
136
M3 Ultra doesn't do so hot in Geekbench 6 CPU, vs. M4 Max. I guess the real test will be for the GPU.

M3 Ultra:


View attachment 119156

M4 Max vs. M3 Ultra:


View attachment 119157
Wow. Really the only advantage per se would be the GPU performance.
For reference the M3 Pro/Max scores around 3130 in 1T. Will be interesting to see the power draw too for nT. I doubt we’ll be whelmed though.
 

repoman27

Senior member
Dec 17, 2018
382
537
136
You are asking two questions: how the extra DRAM is handled LOGICALLY, and how it is handled PHYSICALLY.

Note that the bandwidth does not go up at the maximum capacity, and the bandwidth at all capacities is the same, basically 820GB/s.

So my guess is that, for the first time with Apple Silicon, Apple is using the "rank" functionality of DRAM. Effectively "rank" refers to the two sides of a DRAM DIMM.
Imagine a DIMM with chips on one side. There is a single set of wires that goes to the DIMM, so you have a specific fixed bandwidth, and address pins. Now imagine you put a second set of DRAM chips on the other side. All you need to do is add a single additional wire and you can address either one side of the DIMM or the other (you have doubled capacity) but note that you have NOT doubled bandwidth. Of course your memory controller also needs a few tweaks to understand this setup (and, for example, to make sure that it doesn't try to send commands to side A when side B is active and using the DRAM bus...)

Apple has never (as far as I know) used this functionality before for Apple Silicon, but a year or so ago I pointed out a patent (and was mocked for it by certain people...) that described tweaks made to the memory controller to optimize scheduling when using ranks....

As for how this is handled physically, my guess is that, although rank does not HAVE TO mean "side" (of a DIMM or otherwise) doing things two sided is generally the easiest solution. So imagine the traditional M2 Ultra setup
as seen here:
with the DRAM forming wings on the side. We could imagine a second set of DRAM chips on the other side, below the visible "wings".

Of course the DRAMs COULD be stacked, likewise sharing a single set of wires. But that's more expensive, and seems pointless given the easier solution.
The M3 Ultra has a 1024-bit (64-channel) LPDDR5-6400 memory interface. There is no inherent issue with dual-ranked LPDDR; it's actually quite common. Because the M-series memory interfaces are stupid wide, dual-ranking generally isn't necessary. Also, the DRAM is on package, which places constraints on the footprint and z-height of the memory modules. Most LPDDR comes in multi-die packages with the dies stacked and wire-bonded to an organic substrate. Apple was already using quad or octal-die packages containing one or two 4-high stacks just to fully populate the memory bus with a single rank. With the M4 generation, and quite possibly before that, they introduced 8-high stacks in a dual-rank configuration. It turns out that all of the different memory capacities Apple offered for the M4 SoCs were achieved using just two densities of DRAM dies—12 Gbit and 16 Gbit. The M3 Ultra hits 512 GB by utilizing 32 Gbit dies.

I was very skeptical that Apple had shifted to 8-high stacks because it's very difficult to maintain signal integrity when stacking that high using conventional wire bonding. We also knew that 32 Gbit LPDDR5-6400 dies were a thing because NVIDIA used them to hit 512 GB (480 GB addressable + 32 GB for inline ECC) with the Grace CPU. Also, I had never seen conclusive evidence that anyone was using 8-high stacks in a shipping product.

While placing DRAM chips on opposite sides of a PCB in clamshell fashion is a common topology for dual-rank memory, all you really need is the extra dies and chip selects to switch between them.
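
The numbers in this post can be cross-checked with a few lines of arithmetic. This is a sketch using only the figures quoted here (1024-bit bus, 64 channels, LPDDR5-6400, 512 GB from 32 Gbit dies); the 8-high stacking used for the last line is an assumption carried over from the M4 discussion, not a confirmed detail of the M3 Ultra package.

```python
# Figures quoted in the post.
BUS_WIDTH_BITS = 1024
CHANNELS = 64
TRANSFER_RATE_MTPS = 6400   # LPDDR5-6400: mega-transfers per second per pin
MAX_CAPACITY_GB = 512
DIE_DENSITY_GBIT = 32

# Width of each channel.
bits_per_channel = BUS_WIDTH_BITS // CHANNELS

# Peak bandwidth: bits per transfer * transfers per second, converted to GB/s.
peak_bandwidth_gb_s = BUS_WIDTH_BITS * TRANSFER_RATE_MTPS / 8 / 1000

# Number of DRAM dies needed to reach the maximum capacity.
dies_needed = MAX_CAPACITY_GB * 8 // DIE_DENSITY_GBIT

# Assumption: dies arranged in 8-high stacks, as described for the M4 generation.
stacks_if_8_high = dies_needed // 8

print(bits_per_channel)     # 16
print(peak_bandwidth_gb_s)  # 819.2
print(dies_needed)          # 128
print(stacks_if_8_high)     # 16
```

The 819.2 GB/s result lines up with the "basically 820GB/s" figure mentioned earlier in the thread, and it is independent of capacity: adding dies behind the same 1024 bits changes the 512 GB number, not the bandwidth.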

 
Reactions: poke01