Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,752
1,309
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).
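As a rough sanity check on the GPU figures above, the 2.6 Teraflops number lines up with the 128 execution units if you assume 8 FP32 ALUs per EU and FMA counting (both my assumptions, not from Apple's spec sheet), which implies a clock of roughly 1.27 GHz:

```python
# Rough sanity check on the M1 GPU figures quoted above.
# Assumptions (not from Apple's spec sheet): 8 FP32 ALUs per execution unit
# and 2 FLOPs per ALU per clock (fused multiply-add).
execution_units = 128          # from the spec list above
alus_per_eu = 8                # assumption
flops_per_alu_per_clock = 2    # assumption: one FMA counts as 2 FLOPs
peak_tflops = 2.6              # from the spec list above

# Solve for the implied GPU clock.
implied_clock_ghz = peak_tflops * 1e12 / (execution_units * alus_per_eu * flops_per_alu_per_clock) / 1e9
print(f"Implied GPU clock: {implied_clock_ghz:.2f} GHz")  # ~1.27 GHz
```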

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, HEVC (h.265), and ProRes
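The 100 GB/s figure above follows from a 128-bit LPDDR5 interface if you assume the commonly reported 6400 MT/s transfer rate (not stated in the spec list):

```python
# Back-of-the-envelope check of the M2's "100 GB/s" unified memory bandwidth.
# Assumption: LPDDR5-6400 (6400 MT/s), which the spec list above does not state.
bus_width_bits = 128
transfer_rate_mts = 6400                 # mega-transfers per second (assumed)
bytes_per_transfer = bus_width_bits / 8  # 16 bytes per transfer

bandwidth_gbs = transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9
print(f"{bandwidth_gbs:.1f} GB/s")       # 102.4 GB/s, marketed as ~100 GB/s
```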

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,765
4,668
136
M1 Ultra 64C is such a waste of money!
IIRC M1 Ultra 48c and 64c have the same number of CPU cores so if Lightroom and HEVC aren't GPU accelerated it's expected they get no speed up.
OTOH I'd expect a better scaling on Blender.
Comparing M1 Max vs M1 Ultra the speedup is higher. But in that case beyond the doubling of GPU cores, there's a doubling in memory bandwidth (and a doubling in number of cores but that is a GPU benchmark so the impact should not be large). So I'll put the blame on memory bandwidth and/or a driver issue.
M1 Ultra has a TLB issue, which made scaling pretty much impossible, and very often you would not see any meaningful performance uplift from the M1 Max chip to the M1 Ultra, despite it having two M1 Max chips in a single package.

M2 Max and Ultra completely solved this problem, which is why we see such large performance uplifts despite a measly 25% GPU core count increase.
 

SpudLobby

Senior member
May 18, 2022
961
656
106
DDR5 also has 32-bit data channels, meaning that every DIMM has 2 channels and a dual-DIMM 128-bit configuration has 4 channels. Wide memory channels lose bandwidth efficiency compared to narrower channels.
Yeah, this is my understanding. Bandwidth is an ideal cap, but utilization can vary, and splitting the channels up further improves this, along with improving power efficiency.

Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.
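For reference, a quick tally of the widths being compared in this exchange (channel counts taken from the posts above, nothing beyond that):

```python
# Total interface width for the configurations mentioned above.
configs = {
    "phone SoC":         (4, 16),   # 4 x 16-bit LPDDR channels
    "M1/M2 (as asked)":  (8, 16),   # 8 x 16-bit channels
    "alternative 4x32":  (4, 32),   # the 4 x 32-bit layout questioned above
    "DDR5 dual-DIMM":    (4, 32),   # 2 DIMMs x 2 x 32-bit sub-channels
}
for name, (channels, width) in configs.items():
    print(f"{name:18s}: {channels} x {width}b = {channels * width} bits total")
# All of the 128-bit configurations have the same total width; they differ
# only in how finely that width is split into independent channels.
```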
 

Eug

Lifer
Mar 11, 2000
23,752
1,309
126
IIRC M1 Ultra 48c and 64c have the same number of CPU cores so if Lightroom and HEVC aren't GPU accelerated it's expected they get no speed up.
OTOH I'd expect a better scaling on Blender.
Comparing M1 Max vs M1 Ultra the speedup is higher. But in that case beyond the doubling of GPU cores, there's a doubling in memory bandwidth (and a doubling in number of cores but that is a GPU benchmark so the impact should not be large). So I'll put the blame on memory bandwidth and/or a driver issue.
Yep, the GPU core count is largely irrelevant here.

Lightroom in this bench is likely mostly CPU.
HEVC in this bench is likely mostly neither CPU nor GPU, but the media engine.
 
Reactions: Nothingness

smalM

Member
Sep 9, 2019
63
66
91
Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.
M1 and M2 are just the successors of the A12X.
 

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.


I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
 
Reactions: SpudLobby

naukkis

Senior member
Jun 5, 2002
782
637
136
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.

No they won't. With 64-byte cache lines and a 128-bit interface, the DRAM burst length needed to fill a cache line is just 4 transfers. High-speed DRAM isn't designed for such short bursts - LPDDR5's minimum burst length is 16 and its preferred length is 32. So to fill a 64-byte cache line, the preferred memory configuration is 16-bit-wide channels; with any wider interface, the memory cannot reach its peak bandwidth unless the memory controller is able to combine memory requests. The DDR family ran into this problem too - with DDR4 it resulted in suboptimal memory performance, where the theoretical peak for random cache-line fills was only half of the total memory throughput. DDR5 corrected the problem by splitting the memory channels, making cache-line-sized accesses a better match for the burst length.

Apple's SoC designs have 64-byte cache lines for L1 and 128-byte cache lines for L2. The phone-derived SoCs are probably optimized for filling an L1 line from memory over 16-bit channels, while the bigger configurations are optimized for more overall memory bandwidth by filling a whole L2 line with a single memory request.
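A small worked example of the burst-length math above (the 64-byte line, 128-bit interface, and LPDDR5 BL16/BL32 figures come from the post; the channel widths are just the cases being discussed):

```python
# How many bytes one DRAM burst delivers per channel, versus a 64-byte cache line.
CACHE_LINE_BYTES = 64

def bytes_per_burst(channel_width_bits, burst_length):
    """Bytes transferred by a single burst on one channel."""
    return channel_width_bits // 8 * burst_length

for width in (16, 32, 128):      # channel widths discussed above
    for bl in (16, 32):          # LPDDR5 minimum and preferred burst lengths
        burst = bytes_per_burst(width, bl)
        lines = burst / CACHE_LINE_BYTES
        print(f"{width:3d}-bit channel, BL{bl:2d}: {burst:4d} B per burst "
              f"= {lines:.2f} cache lines")
# A 16-bit channel with BL32 delivers exactly one 64-byte line per burst,
# while a 128-bit channel at BL16 delivers 256 B (4 lines) per burst, which is
# wasted unless the controller can combine adjacent requests.
```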
 
Last edited:
Reactions: smalM and SpudLobby

SpudLobby

Senior member
May 18, 2022
961
656
106
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
That seems completely wrong... I have no idea what you are talking about operating as a single 128-bit-wide channel - maybe I am wrong though. Bandwidth utilization still generally improves with more channels. Phones operate as 4x16b anyway, afaict.
 
Last edited:

SteinFG

Senior member
Dec 29, 2021
523
615
106
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
LPDDR5 doesn't work that way. It's 16 bits per channel. Apple doesn't have control over that, as the channel logic is also inside the memory chips.
 
Reactions: SpudLobby

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
I have no idea what Doug meant or where he got that honestly, curious if he has a clarification

I've been trying to find what I read but it was quite some time ago.

From what I can remember, they were talking about the interface between LPDDR and the SLC, with each 128-bit-wide group of controllers and the custom package it interfaces with working in parallel as a single unit. I don't know the SLC's line size, but it wouldn't be limited to the line size of L1.

It is possible I'm remembering something incorrectly or the person who wrote it was mistaken - he was referring to Apple patents which may or may not reflect the reality "on the ground" of Apple's actual implementations.
 

smalM

Member
Sep 9, 2019
63
66
91
All CPUs use all channels in parallel, that's how you get the speed.
Maybe it's a non-native speaker problem:
My question was about combining channels for a 32b or even a 64b RAM access.
Especially by CPUs which support DDR and LPDDR RAM.
Do they access RAM with 32b channels for DDR5 and 16b channels for LPDDR5?
Or do they combine 2 LPDDR5 channels and so always access RAM with 32b channels?

@Doug S
Looking at the die shots it seems one such block consists of 4 memory controllers.
 
Last edited:

oak8292

Member
Sep 14, 2016
88
69
91
I am not claiming any expertise here, but I have always thought that Apple probably purchased DRAM PHY IP from either Synopsys or Cadence. As an extremely large purchaser they probably have some influence over the design, but they aren't doing it alone. Cadence has a TSMC 5 nm PHY available.


Synopsys also has memory controller IP available, but a less friendly website.


I believe there is a fair amount of ‘generic’ IP that Apple probably purchases to keep costs in check.
 

SpudLobby

Senior member
May 18, 2022
961
656
106
I've been trying to find what I read but it was quite some time ago.

From what I can remember they were talking about the interface between LPDDR and the SLC, with each group of 128 bits wide of controllers and the custom package it interfaces with working in parallel as a single unit. I don't know the SLC's line size but it wouldn't be limited to the line size of L1.

It is possible I'm remembering something incorrectly or the person who wrote it was mistaken - he was referring to Apple patents which may or may not reflect the reality "on the ground" of Apple's actual implementations.
Well, at any rate, I’m pretty sure the channels are 16b for the M1 and at worst 32b for the M2 (though I think it’s just 16b again; I don’t know why they’d change it), and they work that way because this is, afaict, the most efficient use of bandwidth as opposed to wider implementations (of the channels themselves). Theoretical max bandwidth doesn’t change of course, but within that maximum, utilization improves to my understanding, as well as offering power benefits.

Could have been Maynard exaggerating or daydreaming I assume re: patents. He’s hit the mark before but he’s also into speculation, so.
 

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
Well, at any rate, I’m pretty sure the channels are 16b for the M1 and at worst 32b for the M2 (though I think it’s just 16b again; I don’t know why they’d change it), and they work that way because this is, afaict, the most efficient use of bandwidth as opposed to wider implementations (of the channels themselves). Theoretical max bandwidth doesn’t change of course, but within that maximum, utilization improves to my understanding, as well as offering power benefits.

Could have been Maynard exaggerating or daydreaming I assume re: patents. He’s hit the mark before but he’s also into speculation, so.

FWIW it wasn't Maynard that wrote what I'm referring to. Can't remember who it was but it wasn't him - though I can see why you'd assume that since he loves to speculate based on Apple's patents.
 
Reactions: SpudLobby

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
I like to buy base MacBook Airs/Pros and use x86 for desktop. Best of both worlds.
Games are really the only thing keeping me off Macs since the M-series SoCs came out. But with the Unreal Engine now available for M-series development, that may well change enough for me in 2-3 years' time.
 

Mopetar

Diamond Member
Jan 31, 2011
8,015
6,465
136
I'll believe that Apple/Mac is serious about PC gaming when there's a mature, officially supported Steam client for it.

Huh? I have Steam installed on my M1 MBP and haven't noticed anything odd about it. I can't recall it crashing or acting up outside of it not wanting to render the store page, but swapping to library and back again fixes whatever causes that.
 
Reactions: scannall

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
Interesting. I haven't fooled with it much since early last year. I have been told that it's a "try it and see if it works" kind of thing. For those of you that have Steam on Mac, how good is game compatibility/performance?
 