Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,752
1,309
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).
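As a rough sanity check on the GPU figures above, the 2.6 Teraflops number lines up with the 128 execution units if you assume 8 FP32 ALUs per EU and FMA counting (both my assumptions, not from Apple's spec sheet), which implies a clock of roughly 1.27 GHz:

```python
# Rough sanity check on the M1 GPU figures quoted above.
# Assumptions (not from Apple's spec sheet): 8 FP32 ALUs per execution unit
# and 2 FLOPs per ALU per clock (fused multiply-add).
execution_units = 128          # from the spec list above
alus_per_eu = 8                # assumption
flops_per_alu_per_clock = 2    # assumption: one FMA counts as 2 FLOPs
peak_tflops = 2.6              # from the spec list above

# Solve for the implied GPU clock.
implied_clock_ghz = peak_tflops * 1e12 / (execution_units * alus_per_eu * flops_per_alu_per_clock) / 1e9
print(f"Implied GPU clock: {implied_clock_ghz:.2f} GHz")  # ~1.27 GHz
```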

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, HEVC (h.265), and ProRes
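The 100 GB/s figure above follows from a 128-bit LPDDR5 interface if you assume the commonly reported 6400 MT/s transfer rate (not stated in the spec list):

```python
# Back-of-the-envelope check of the M2's "100 GB/s" unified memory bandwidth.
# Assumption: LPDDR5-6400 (6400 MT/s), which the spec list above does not state.
bus_width_bits = 128
transfer_rate_mts = 6400                 # mega-transfers per second (assumed)
bytes_per_transfer = bus_width_bits / 8  # 16 bytes per transfer

bandwidth_gbs = transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9
print(f"{bandwidth_gbs:.1f} GB/s")       # 102.4 GB/s, marketed as ~100 GB/s
```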

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,765
4,668
136
M1 Ultra 64C is such a waste of money!
IIRC M1 Ultra 48c and 64c have the same number of CPU cores so if Lightroom and HEVC aren't GPU accelerated it's expected they get no speed up.
OTOH I'd expect a better scaling on Blender.
Comparing M1 Max vs M1 Ultra the speedup is higher. But in that case beyond the doubling of GPU cores, there's a doubling in memory bandwidth (and a doubling in number of cores but that is a GPU benchmark so the impact should not be large). So I'll put the blame on memory bandwidth and/or a driver issue.
M1 Ultra has a TLB issue, which made scaling pretty much impossible, and very often you would not see any meaningful performance uplift from the M1 Max chip to the M1 Ultra, despite it having two M1 Max chips in a single package.

M2 Max and Ultra completely solved this problem, which is why we see such large performance uplifts despite a measly 25% GPU core count increase.
 

SpudLobby

Senior member
May 18, 2022
961
656
106
DDR5 also has 32-bit data channels, meaning that every DIMM has 2 channels and a dual-DIMM 128-bit configuration has 4 channels. Wide memory channels lose bandwidth efficiency compared to narrower channels.
Yeah, this is my understanding. Bandwidth is an ideal cap, but utilization can vary, and splitting the channels up further improves this, along with improving power efficiency.

Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.
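For reference, a quick tally of the widths being compared in this exchange (channel counts taken from the posts above, nothing beyond that):

```python
# Total interface width for the configurations mentioned above.
configs = {
    "phone SoC":         (4, 16),   # 4 x 16-bit LPDDR channels
    "M1/M2 (as asked)":  (8, 16),   # 8 x 16-bit channels
    "alternative 4x32":  (4, 32),   # the 4 x 32-bit layout questioned above
    "DDR5 dual-DIMM":    (4, 32),   # 2 DIMMs x 2 x 32-bit sub-channels
}
for name, (channels, width) in configs.items():
    print(f"{name:18s}: {channels} x {width}b = {channels * width} bits total")
# All of the 128-bit configurations have the same total width; they differ
# only in how finely that width is split into independent channels.
```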
 

Eug

Lifer
Mar 11, 2000
23,752
1,309
126
IIRC M1 Ultra 48c and 64c have the same number of CPU cores so if Lightroom and HEVC aren't GPU accelerated it's expected they get no speed up.
OTOH I'd expect a better scaling on Blender.
Comparing M1 Max vs M1 Ultra the speedup is higher. But in that case beyond the doubling of GPU cores, there's a doubling in memory bandwidth (and a doubling in number of cores but that is a GPU benchmark so the impact should not be large). So I'll put the blame on memory bandwidth and/or a driver issue.
Yep, the GPU core count is largely irrelevant here.

Lightroom in this bench is likely mostly CPU.
HEVC in this bench is likely mostly neither CPU nor GPU, but the media engine.
 
Reactions: Nothingness

smalM

Member
Sep 9, 2019
63
66
91
Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.
M1 and M2 are just the successors of the A12X.
 

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
Here’s a question though: how does the M-series 128-bit stuff have an 8x16b LPDDR4X or LPDDR5 interface? Shouldn't it be 4x32b? On the other hand, this is basically just doubling a phone's 4x16b interface, so maybe that's straightforward.


I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
 
Reactions: SpudLobby

naukkis

Senior member
Jun 5, 2002
782
637
136
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.

No they won't. With 64-byte cache lines and a 128-bit interface, the DRAM burst length needed to fill a cache line is just 4 transfers. High-speed DRAM isn't designed for such short bursts - LPDDR5's minimum burst length is 16 and its preferred length is 32. So to fill a 64-byte cache line, the preferred memory configuration is 16-bit-wide channels; with any wider interface, the memory cannot reach its peak bandwidth unless the memory controller is able to combine memory requests. The DDR family ran into this problem too - with DDR4 it resulted in suboptimal memory performance, where the theoretical peak for random cache-line fills was only half of the total memory throughput. DDR5 corrected the problem by splitting the memory channels, making cache-line-sized accesses a better match for the burst length.

Apple's SoC designs have 64-byte cache lines for L1 and 128-byte cache lines for L2. The phone-derived SoCs are probably optimized for filling an L1 line from memory over 16-bit channels, while the bigger configurations are optimized for more overall memory bandwidth by filling a whole L2 line with a single memory request.
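A small worked example of the burst-length math above (the 64-byte line, 128-bit interface, and LPDDR5 BL16/BL32 figures come from the post; the channel widths are just the cases being discussed):

```python
# How many bytes one DRAM burst delivers per channel, versus a 64-byte cache line.
CACHE_LINE_BYTES = 64

def bytes_per_burst(channel_width_bits, burst_length):
    """Bytes transferred by a single burst on one channel."""
    return channel_width_bits // 8 * burst_length

for width in (16, 32, 128):      # channel widths discussed above
    for bl in (16, 32):          # LPDDR5 minimum and preferred burst lengths
        burst = bytes_per_burst(width, bl)
        lines = burst / CACHE_LINE_BYTES
        print(f"{width:3d}-bit channel, BL{bl:2d}: {burst:4d} B per burst "
              f"= {lines:.2f} cache lines")
# A 16-bit channel with BL32 delivers exactly one 64-byte line per burst,
# while a 128-bit channel at BL16 delivers 256 B (4 lines) per burst, which is
# wasted unless the controller can combine adjacent requests.
```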
 
Last edited:
Reactions: smalM and SpudLobby

SpudLobby

Senior member
May 18, 2022
961
656
106
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
That seems completely wrong... I have no idea what you are talking about operating as a single 128-bit-wide channel - maybe I am wrong though. Bandwidth utilization still generally improves with more channels. Phones operate as 4x16b anyway, afaict.
 
Last edited:

SteinFG

Senior member
Dec 29, 2021
523
615
106
I don't think it matters, since it acts as a single 128 bit wide channel. I can't find it but I remember reading at some point that it does not operate in smaller chunks.

Which would make sense for Apple since the reason you want more channels (i.e. DDR5 going to 2x32 bit channels instead of DDR4's single 64 bit channel) is for servers that have a lot of concurrent memory heavy processes. Phones/Macs/PCs - especially those using main memory for graphics - perform better with fewer wider memory channels.
LPDDR5 doesn't work that way. It's 16 bits per channel. Apple doesn't have control over that, as the channel logic is also inside the memory chips.
 
Reactions: SpudLobby

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
I have no idea what Doug meant or where he got that honestly, curious if he has a clarification

I've been trying to find what I read but it was quite some time ago.

From what I can remember, they were talking about the interface between LPDDR and the SLC, with each 128-bit-wide group of controllers and the custom package it interfaces with working in parallel as a single unit. I don't know the SLC's line size, but it wouldn't be limited to the line size of L1.

It is possible I'm remembering something incorrectly or the person who wrote it was mistaken - he was referring to Apple patents which may or may not reflect the reality "on the ground" of Apple's actual implementations.
 

smalM

Member
Sep 9, 2019
63
66
91
All CPUs use all channels in parallel, that's how you get the speed.
Maybe it's a non-native speaker problem:
My question was about combining channels for a 32b or even a 64b RAM access.
Especially by CPUs which support DDR and LPDDR RAM.
Do they access RAM with 32b channels for DDR5 and 16b channels for LPDDR5?
Or do they combine 2 LPDDR5 channels and so always access RAM with 32b channels?

@Doug S
Looking at the die shots it seems one such block consists of 4 memory controllers.
 
Last edited:

oak8292

Member
Sep 14, 2016
88
69
91
I am not claiming any expertise here, but I have always thought that Apple probably purchased DRAM PHY IP from either Synopsys or Cadence. As an extremely large purchaser they probably have some influence over the design, but they aren't doing it alone. Cadence has a TSMC 5 nm PHY available.


Synopsys also has memory controller IP available, but a less friendly website.


I believe there is a fair amount of ‘generic’ IP that Apple probably purchases to keep costs in check.
 

SpudLobby

Senior member
May 18, 2022
961
656
106
I've been trying to find what I read but it was quite some time ago.

From what I can remember they were talking about the interface between LPDDR and the SLC, with each group of 128 bits wide of controllers and the custom package it interfaces with working in parallel as a single unit. I don't know the SLC's line size but it wouldn't be limited to the line size of L1.

It is possible I'm remembering something incorrectly or the person who wrote it was mistaken - he was referring to Apple patents which may or may not reflect the reality "on the ground" of Apple's actual implementations.
Well, at any rate, I’m pretty sure the channels are 16b for the M1 and at worst 32b for the M2 (though I think it’s just 16b again; I don’t know why they’d change it), and they work that way because this is, afaict, the most efficient use of bandwidth as opposed to wider implementations (of the channels themselves). Theoretical max bandwidth doesn’t change of course, but within that maximum, utilization improves to my understanding, as well as offering power benefits.

Could have been Maynard exaggerating or daydreaming I assume re: patents. He’s hit the mark before but he’s also into speculation, so.
 

Doug S

Platinum Member
Feb 8, 2020
2,507
4,101
136
Well, at any rate, I’m pretty sure the channels are 16b for the M1 and at worst 32b for the M2 (though I think it’s just 16b again; I don’t know why they’d change it), and they work that way because this is, afaict, the most efficient use of bandwidth as opposed to wider implementations (of the channels themselves). Theoretical max bandwidth doesn’t change of course, but within that maximum, utilization improves to my understanding, as well as offering power benefits.

Could have been Maynard exaggerating or daydreaming I assume re: patents. He’s hit the mark before but he’s also into speculation, so.

FWIW it wasn't Maynard that wrote what I'm referring to. Can't remember who it was but it wasn't him - though I can see why you'd assume that since he loves to speculate based on Apple's patents.
 
Reactions: SpudLobby

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
I like to buy base MacBook Airs/Pros and use x86 for desktop. Best of both worlds.
Games are really the only thing keeping me off Macs since the M-series SoCs came out. But with the Unreal Engine now available for M-series development, that may well change enough for me in 2-3 years' time.
 

Mopetar

Diamond Member
Jan 31, 2011
8,015
6,465
136
I'll believe that Apple/Mac is serious about PC gaming when there's a mature, officially supported Steam client for it.

Huh? I have Steam installed on my M1 MBP and haven't noticed anything odd about it. I can't recall it crashing or acting up outside of it not wanting to render the store page, but swapping to library and back again fixes whatever causes that.
 
Reactions: scannall

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
Interesting. I haven't fooled with it much since early last year. I have been told that it's a "try it and see if it works" kind of thing. For those of you that have Steam on Mac, how good is game compatibility/performance?
 