Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,808
1,387
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24,576 concurrent threads
2.6 teraflops (see the quick check below the spec list)
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
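For the curious, the 2.6 teraflops figure lines up with the EU count above if you assume the commonly reported 8 FP32 ALUs per execution unit and a ~1.28 GHz GPU clock; neither number is in Apple's official spec, so treat this as a back-of-envelope sketch:

```python
# Back-of-envelope check of M1's 2.6 TFLOPS GPU figure.
# Assumed (commonly reported, not official Apple spec): 8 FP32 ALUs per
# execution unit and a ~1.278 GHz GPU clock; an FMA counts as 2 FLOPs.
execution_units = 128
alus_per_eu = 8            # assumption
clock_hz = 1.278e9         # assumption (~1278 MHz)
tflops = execution_units * alus_per_eu * 2 * clock_hz / 1e12
print(f"{tflops:.2f} TFLOPS")  # -> 2.62, matching the quoted ~2.6
```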

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock-speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s (see the bandwidth check below)
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC, and ProRes
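The 100 GB/s figure is just bus width times transfer rate; M2 is widely reported to pair a 128-bit interface with LPDDR5-6400 (neither number is stated above, so a sketch):

```python
# Where M2's "100 GB/s" comes from: 128-bit bus x LPDDR5-6400
# (both widely reported figures, not from the spec list above).
bus_bits = 128
mega_transfers = 6400      # LPDDR5-6400
gbs = bus_bits / 8 * mega_transfers * 1e6 / 1e9
print(f"{gbs:.1f} GB/s")   # -> 102.4, marketed as "100 GB/s"
```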

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

soresu

Diamond Member
Dec 19, 2014
3,212
2,483
136
I'm not sure those NAND chips are necessarily comparable. Unlike in a phone, in an SSD power isn't too much of a concern.
I doubt that NAND chips differ as much from one manufacturer to another as you seem to believe.

Controller chips yes, actual NAND Flash memory dies that they control not so much.

They all share the same basic, flawed design - something easily noticed when you feel a USB thumb drive heat up to uncomfortable levels merely from light use.

Flash is overdue to be replaced as the SSD memory of choice not just for its wear mechanism, but also for its basic power consumption, which even after decades of R&D investment is still terrible.

I was hoping the great Stuart Parkin's racetrack-memory saviour tech would have materialised by now, but alas, not a peep in years 😭
 

Doug S

Platinum Member
Feb 8, 2020
2,756
4,685
136
I doubt that NAND chips differ as much from one manufacturer to another as you seem to believe.

Controller chips yes, actual NAND Flash memory dies that they control not so much.

They all share the same basic, flawed design - something easily noticed when you feel a USB thumb drive heat up to uncomfortable levels merely from light use.

Flash is overdue to be replaced as the SSD memory of choice not just for its wear mechanism, but also for its basic power consumption, which even after decades of R&D investment is still terrible.

I was hoping the great Stuart Parkin's racetrack-memory saviour tech would have materialised by now, but alas, not a peep in years 😭


I'm not saying they differ between manufacturers, but between different SKUs from the same manufacturer. You know, like how there are lower-power DRAM chips, lower-power CPUs, lower-power displays, and so forth. Phones have a very different power budget, so it would make sense that if NAND can be designed to require less power to read/write/erase, even if it is more expensive, it would be desirable in smartphones, while making zero sense in an SSD going into a PC plugged into the wall (or even a laptop that sometimes, but not always, runs on battery, since that battery is over an order of magnitude larger than a smartphone's).

And, you know, the reason you observe a USB thumb drive "heating up to uncomfortable levels from light use" might have to do with the choice of NAND chips, not just the choice of controller. In a market where price is all that matters to 99% of buyers (who is buying a high-end USB thumb drive?), you are going to use whatever is cheapest, even if it has very poor power efficiency and runs hot, because advertising a lower price per gigabyte is all that matters in that market.
 
Reactions: Tlh97 and scannall

soresu

Diamond Member
Dec 19, 2014
3,212
2,483
136
I'm not saying they differ between manufacturers, but between different SKUs from the same manufacturer. You know, like how there are lower-power DRAM chips
Still nope - binning affects power draw a bit, but nowhere near that much.

The tech is fundamentally flawed, and I cannot emphasize that point enough - it is not a mobile-friendly memory tech at all in terms of power consumption and thermal power density.

The real difference is basically in controller efficiency and IO bus where power draw is concerned.

It's not unlike the dark silicon problem made popular by ARM: if you are going to read a solid-state memory chip at any significant speed, you are activating as many parallel areas of the die at the same time as possible, which significantly increases the power density per unit area. Push that further with the higher frequencies needed to saturate PCIe 5.0 bandwidth, and you are asking for trouble without active cooling.
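To put a very rough number on that, treat array power as energy-per-bit times throughput. The 50 pJ/bit below is an invented round figure for the whole read path, purely to show the scale, not a measured value for any real flash die:

```python
# Illustrative only: NAND array power when saturating a PCIe 5.0 x4 link.
# power (W) = throughput (bits/s) x energy per bit (J/bit)
energy_per_bit = 50e-12     # assumed ~50 pJ/bit for the whole read path
throughput_bytes = 14e9     # ~14 GB/s usable on PCIe 5.0 x4
watts = throughput_bytes * 8 * energy_per_bit
print(f"~{watts:.1f} W")    # -> ~5.6 W, concentrated in a few small dies
```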

If not for the huge investment the industry has already sunk into NAND R&D and production, it would be moving a lot faster on a replacement. Alas, while they have something working with NAND they will exploit it until it is no longer viable to continue 3D multilayer or node-pitch area scaling - and I believe that area scaling is pretty much done at this point.

And, you know, the reason you observe a USB thumb drive "heating up to uncomfortable levels from light use" might have to do with the choice of NAND chips, not just the choice of controller
No - it's to do with the fact that most thumb drives are just chunks of plastic surrounding the circuit board.

Plastic isn't exactly famous for being the best thermal conductor or radiator.

Though yes, cost does mean a lower standard of controller too, which doesn't help.
 
Last edited:

Doug S

Platinum Member
Feb 8, 2020
2,756
4,685
136
Still nope - binning affects power draw a bit, but nowhere near that much.

So you are saying you KNOW FOR A FACT that one cannot design NAND arrays differently so they consume less power, something we know can be done for DRAM? LPDDR chips are different from standard DRAM chips and operate at lower voltages. You are saying the same is impossible for NAND?
 

soresu

Diamond Member
Dec 19, 2014
3,212
2,483
136
LPDDR chips are different than standard DRAM chips and operate on lower voltages
Unless I have been woefully misinformed, LPDDR is the IO bus that transfers data between the DRAM chips on the DIMM itself and the CPU socket.

LPDDR DIMMs operate at lower voltages by using narrower channel widths vs regular DDR memory standards.

As quoted from the Synopsys website:

LPDDR DRAM channels are typically 16- or 32-bits wide, in contrast to the typical standard DDR DRAM channels which are 64-bit wide.

So while you are getting what looks like greater MT/s vs regular DDR in cases like the GPD WIN4 with LPDDR5-7500, in reality this is more like DDR5-3750 in terms of actual performance. So APUs running on AM4/AM5 with DDR4/5 definitely have a serious advantage in bandwidth, on top of clocking potential in a less constrained thermal environment.

There are no free lunches I'm afraid while running the same basic memory technology.*

There's a cost one way or another - either in performance with LPDDR, or in economy with HBM, where you need lots of pins to support a wider bus per stack running inside the socket package itself for minimal operating power at point-blank distance from the processor.

*Not without a sea change in how DRAM is designed overall, such as the upcoming shift to capacitorless DRAM devices required for scaling to sub-10 nm nodes. That shift increases DRAM data retention time to >400 s, reducing refresh frequency for idle memory cells - which should reduce power significantly for mobile devices, especially ones not using their radios. I'm not sure if this change means partial non-volatility if the power is cut temporarily 🤔
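A quick sense of scale for that refresh saving, assuming the conventional 64 ms refresh window (the usual JEDEC figure) against the >400 s retention above:

```python
# How much less often a capacitorless cell would need refreshing,
# assuming the conventional 64 ms DRAM refresh window (JEDEC tREFW).
conventional_window_s = 0.064
capacitorless_retention_s = 400.0
factor = capacitorless_retention_s / conventional_window_s
print(f"~{factor:.0f}x fewer refreshes")  # -> ~6250x for idle cells
# Refresh is only part of total DRAM power, so the device-level saving
# is smaller, but for mostly idle mobile memory it should be significant.
```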
 
Jul 27, 2020
19,850
13,608
146
Plastic isn't exactly famous for being the best thermal conductor or radiator.
I have had a few metal USB drives fail after overheating (SanDisk and Kingston), so I guess they are saving pennies on the thermal interface material. Or the generated heat really needs a bigger heatsink with more surface area.
 
Reactions: soresu
Jul 27, 2020
19,850
13,608
146
Damn, hope there was nothing critical on them.
My very first metal SanDisk failure was in 2008. It hurt. After that, I stopped trusting SanDisk USB drives and always backed up the data on them to my HDDs. The Kingston UFD failure was surprising because it was really well built and I expected more from it, given their good reputation in RAM products. Thankfully, it only contained some downloaded videos. I got myself a SATA SSD in an enclosure after my last UFD failure (a stupid Lexar 256 GB). I think the data survived, but the drive went into permanent write-protect mode, which probably means the NAND wore out from too much writing (again, it was used primarily for downloaded videos).
 
Reactions: soresu

ikjadoon

Senior member
Sep 4, 2006
220
480
146
So while you are getting what looks like greater MT/s vs regular DDR in cases like GPD WIN4 with LPDDR5-7500, in reality this is more like DDR5-3750 in terms of actual performance, so APUs running on AM4/5 with DDR4/5 definitely have a serious advantage in terms of bandwidth on top of clocking potential in a less constrained thermal environment.

Just to share: LPDDR5 is designed with [many] more channels than socketed DDR5: 4x 16-bit channels in mobile and 8x 16-bit channels on laptops. So LPDDR5 bandwidth is similar to, and sometimes notably higher than, DDR5.

Anandtech: So for a high-end phone where 64-bit memory buses are common, we’d be looking at over 50GB/sec of memory bandwidth, and over 100GB/sec for a standard 128-bit bus PC.
Anandtech: One large feature of both chips is their much-increased memory bandwidth and interfaces – the M1 Pro features 256-bit LPDDR5 memory at 6400MT/s speeds, corresponding to 204GB/s bandwidth. This is significantly higher than the M1 at 68GB/s, and also generally higher than competitor laptop platforms which still rely on 128-bit interfaces.

...


Apple [M1 Max] also doubles up on the memory interfaces, using a whopping 512-bit wide LPDDR5 memory subsystem – unheard of in an SoC and even rare amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth – how this bandwidth is accessible to the various IP blocks on the chip is one of the things we’ll be investigating today.

Mobile phones & some laptops: 64-bit width
Most laptops & all consumer desktops: 128-bit width (similar to DDR5 dual-channel)
M1 Pro: 256-bit width (similar to DDR5 quad-channel)
M1 Max: 512-bit width (similar to DDR5 octa-channel)
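Those widths line up exactly with the quoted bandwidth numbers once you multiply by the transfer rate (M1's LPDDR4X-4266 rate is the commonly reported figure, not stated in the quotes above):

```python
# Peak bandwidth = bus width (bytes) x transfer rate.
def bandwidth_gbs(bus_bits: int, mtps: int) -> float:
    return bus_bits / 8 * mtps * 1e6 / 1e9

print(bandwidth_gbs(128, 4266))  # M1, LPDDR4X-4266    -> ~68.3 GB/s
print(bandwidth_gbs(256, 6400))  # M1 Pro, LPDDR5-6400 -> 204.8 GB/s
print(bandwidth_gbs(512, 6400))  # M1 Max, LPDDR5-6400 -> 409.6 GB/s
```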

You're right on the free lunch: LPDDR5 takes a hit to latencies, wake-up time, etc. due to its longer refresh interval & voltages versus socketed DDR5. It's just that LPDDR5's tested bandwidth is comparable to DDR5, and real.

Some of NVIDIA's higher-end GPU-CPU platforms rely on LPDDR5, instead of DDR5.

NVIDIA: Compared to an eight-channel DDR5 design, the NVIDIA Grace CPU LPDDR5X memory subsystem provides up to 53% more bandwidth at one-eighth the power per gigabyte per second while being similar in cost. An HBM2e memory subsystem would have provided substantial memory bandwidth and good energy efficiency but at more than 3x the cost-per-gigabyte and only one-eighth the maximum capacity available with LPDDR5X.

//

I have had a few metallic USBs fail after overheating (Sandisk and Kingston) so I guess they are saving pennies on the thermal interface material. Or the generated heat really needs a bigger heatsink with more surface area.

On this note, I'd agree size is a major difference, too: a USB drive is notably smaller than a phone & the USB protocol overhead may be a factor, too.

Likewise, the workload is often quite different.

USB drive: pummeled with 10s of GBs, read at ~2 GB/s from an NVMe SSD
Phone: cheerfully taking in ~1 GB at ~100 MB/s (bottlenecked by network speed)

Phone storage (with a larger surface area) is less abused than USB storage (with a smaller surface area).

Now, it'd be neat to compare the joules consumed in transferring 10 GB over USB 3.2 Gen 2 on Android vs a high-end USB drive. Though we'd need an Android phone, as iPhones are frustratingly limited to the 480 Mbps USB 2.0 protocol.
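A hypothetical version of that comparison, just to show the method; both speeds and power draws below are invented placeholders, and real numbers would need a power meter:

```python
# Hypothetical joules-per-transfer comparison (method only; the speeds
# and power draws are invented placeholders, not measurements).
def transfer_joules(size_gb: float, speed_gb_s: float, avg_power_w: float) -> float:
    return size_gb / speed_gb_s * avg_power_w  # energy = power x time

print(transfer_joules(10, 1.0, 4.0))  # fast USB drive: 10 s at 4 W -> 40 J
print(transfer_joules(10, 1.0, 2.0))  # phone storage:  10 s at 2 W -> 20 J
```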

Anecdotally, my USB 2.0 drives always seem to run quite cool, even in plastic enclosures.
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
Ooof, that's an expensive habit even so 🤑

It's sad that phone manufacturers are so damn cheap though - and that's not an Apple-specific comment, it's an industry-wide grift.

You can get a 2 TB external SSD for less than £100 these days - to charge so much for a mere 1/8th the space is truly miserly.
FWIW my phone (combined with my watch/AirPods) basically powers everything from meetings to exercise. It is a mission-critical device for me, so $699 every year or so is well worth the investment.
 

ikjadoon

Senior member
Sep 4, 2006
220
480
146
So even a 4 DIMM slot mobo still has just 2x 64 bit channels?

You got it right. Adding in the 3rd & 4th DIMM increases capacity, but it can't increase bandwidth. The 3rd DIMM shares one of the 64-bit channels with the 1st, and the 4th shares the other with the 2nd.

1DPC = 1 DIMM per channel
2DPC = 2 DIMM per channel

1DPC
64-bit channel 1: 1st DIMM
64-bit channel 2: 2nd DIMM

2DPC
64-bit channel 1: 1st and 3rd DIMM
64-bit channel 2: 2nd and 4th DIMM

In that case, 4x 8GB is akin to 2x 16GB; with 4x 8GB, each pair of 8GB DIMMs shares one 64-bit channel.

AnandTech tested DDR5 1DPC vs 2DPC, where bandwidth is confirmed to be virtually the same, as the number of channels is set by the CPU's IMC + the motherboard; the RAM just slots in wherever.

This is all assuming the motherboard + CPU here are dual-channel maximally (which is most consumer CPUs). Thus, almost all consumer CPUs are 128-bit DDR5 (dual-channel 64-bit) or 128-bit LPDDR5 (octa-channel 16-bit). But, in total, it's still 128-bit.
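A toy model of that point: extra DIMMs per channel add capacity but never bus width (DDR5's 2x 32-bit sub-channels per DIMM are ignored here for simplicity):

```python
# Toy model: capacity scales with DIMM count, bandwidth only with channels.
def config(channels: int, dimms_per_channel: int, dimm_gb: int, mtps: int):
    capacity_gb = channels * dimms_per_channel * dimm_gb
    bandwidth_gbs = channels * 64 / 8 * mtps * 1e6 / 1e9
    return capacity_gb, bandwidth_gbs

print(config(2, 1, 16, 5600))  # 2x 16GB, 1DPC -> (32 GB, 89.6 GB/s)
print(config(2, 2, 8, 5600))   # 4x 8GB, 2DPC  -> (32 GB, 89.6 GB/s), same bandwidth
```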

//

So how can LPDDR5 hit ludicrous speeds that we hardly see on desktop DDR5? I'm thinking of LPDDR5T-9600 stock, while no desktop CPU is rated over DDR5-5600. LPDDR5 has smaller capacities → fewer packages per channel, and it is always soldered → shorter interconnects:

Synopsys: Since SoCs for such applications tend to have fewer memory devices on each channel and shorter interconnects, the LPDDR DRAMs can run faster than the standard DDR DRAMs, (for example, LPDDR4/4X DRAMs run at up to 4267 Mbps and standard DDR4 DRAMs run up to 3200 Mbps), thereby providing higher performance.

Phone with 16GB LPDDR5T (PoP): 1x DRAM package → CPU pins
Desktop with a 16GB DDR5 (DIMM): 8x DRAM packages → DIMM PCB → DIMM pins → motherboard pins → motherboard PCB → CPU pins

LPDDR5 is actually not a derivative of DDR5; each is developed independently by JEDEC. I believe that gives JEDEC + RAM manufacturers much more flexibility to optimize for each use-case without worrying about the other spec's design at all.

//

One final thing that I've re-learned: LPDDR5 devices can also be designed for 32-bit channels (vs the typical 16-bit), but I've never confirmed which LPDDR5 devices have 16-bit vs 32-bit channels.
 
Reactions: soresu

soresu

Diamond Member
Dec 19, 2014
3,212
2,483
136
LPDDR5 has smaller capacities → fewer packages per channel and then always soldered → shorter interconnects
Yeah, shorter distances providing power consumption benefits is a given, as with HBM.

🤤 I dream of capacitorless 3D DRAM with cell-layer counts like those seen in current 3D NAND, so that we can finally break free of this super slow growth in system/VRAM memory density since DRAM hit 1x nm nodes years ago.

Combine that with HBM production scaling up for economy and true mass adoption.....

Geeb eet toooo meeeeeehh 😂
 
Last edited:

Doug S

Platinum Member
Feb 8, 2020
2,756
4,685
136
Is it? It looks like some of those benchmarks aren’t scaling with core count.

I am curious how it scales in other benchmarks.

The question is WHY is it not scaling with core count. Is that all the cores it can use (i.e. does it do similarly on x86 CPUs) or is it hitting some other limit like memory bandwidth?

Depending on what exactly it is doing, if it has system calls happening in each process it could be a scheduling limitation or lock contention. It is just a supposition, but OS X is probably not as well optimized for operating with many cores as Windows, let alone Linux, because there is little history of high-core-count Macs. Basically the only Macs with a double-digit number of cores pre-Apple Silicon were the last two generations of x86 Mac Pro. You have to run stuff on hardware with a lot of cores before you can run into the issues that show you what needs to be fixed. Certainly no one reviewing Macs would have run into this, because they probably weren't reviewing the Mac Pro, and even if they were, very few reviewers are thorough enough to even realize this is a thing, let alone know how to go about testing it.
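One crude way to probe that question on any machine (a minimal sketch, not a substitute for the real benchmark): time a fixed pile of CPU-bound work across increasing worker counts. Near-linear time reduction means the cores are usable; an early plateau points at a shared bottleneck like memory bandwidth, locks, or the scheduler:

```python
# Minimal core-scaling probe: constant total work, growing worker count.
import time
from concurrent.futures import ProcessPoolExecutor

def burn(n: int) -> int:
    return sum(i * i for i in range(n))  # pure-CPU busy work, no I/O

if __name__ == "__main__":  # guard required for process spawning on macOS
    jobs = [2_000_000] * 64
    for workers in (1, 2, 4, 8, 16):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(burn, jobs))
        print(f"{workers:2d} workers: {time.perf_counter() - start:.2f}s")
```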
 
Reactions: ashFTW

Nothingness

Diamond Member
Jul 3, 2013
3,072
2,066
136
IIRC the M1 Ultra 48c and 64c have the same number of CPU cores, so if Lightroom and HEVC aren't GPU-accelerated it's expected that they get no speed-up.
OTOH I'd expect better scaling on Blender.
Comparing M1 Max vs M1 Ultra, the speedup is higher. But in that case, beyond the doubling of GPU cores, there's a doubling in memory bandwidth (and a doubling in the number of CPU cores, but this is a GPU benchmark, so that impact should not be large). So I'll put the blame on memory bandwidth and/or a driver issue.

EDIT: see @Glo. post down below about TLB which very likely explains the GPU scaling issue.
 
Last edited:
Reactions: Mopetar and Eug

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
Unless I have been woefully misinformed, LPDDR is the IO bus that transfers data between the DRAM chips on the DIMM itself and the CPU socket.

LPDDR DIMMs operate at lower voltages by using narrower channel widths vs regular DDR memory standards.

As quoted from the Synopsys website:

LPDDR DRAM channels are typically 16- or 32-bits wide, in contrast to the typical standard DDR DRAM channels which are 64-bit wide.

So while you are getting what looks like greater MT/s vs regular DDR in cases like the GPD WIN4 with LPDDR5-7500, in reality this is more like DDR5-3750 in terms of actual performance. So APUs running on AM4/AM5 with DDR4/5 definitely have a serious advantage in bandwidth, on top of clocking potential in a less constrained thermal environment.

There are no free lunches I'm afraid while running the same basic memory technology.*

There's a cost one way or another - either in performance with LPDDR, or in economy with HBM, where you need lots of pins to support a wider bus per stack running inside the socket package itself for minimal operating power at point-blank distance from the processor.

*Not without a sea change in how DRAM is designed overall, such as the upcoming shift to capacitorless DRAM devices required for scaling to sub-10 nm nodes. That shift increases DRAM data retention time to >400 s, reducing refresh frequency for idle memory cells - which should reduce power significantly for mobile devices, especially ones not using their radios. I'm not sure if this change means partial non-volatility if the power is cut temporarily 🤔
LPDDRx interfaces are usually quad-channel where DDRx interfaces are dual-channel. They're still 128b wide in both instances.

To phrase things differently: what we usually refer to as "dual channel" is actually a 128b memory bus. So when we're talking about what APUs support, they support running quad-channel LPDDRx (4x32b) and dual-channel DDRx (2x64b) using the same physical memory interface on die (which is 128b).
 

naukkis

Senior member
Jun 5, 2002
896
779
136
LPDDRx interfaces are usually quad-channel where DDRx interfaces are dual-channel. They're still 128b wide in both instances.

To phrase things differently: what we usually refer to as "dual channel" is actually a 128b memory bus. So when we're talking about what APUs support, they support running quad-channel LPDDRx (4x32b) and dual-channel DDRx (2x64b) using the same physical memory interface on die (which is 128b).

DDR5 also has 32-bit data channels, meaning every DIMM has 2 channels and a dual-DIMM 128-bit configuration has 4 channels. Wide memory channels lose bandwidth efficiency compared to narrower channels.
 
Reactions: SpudLobby

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
DDR5 also has 32 bit data channels meaning that every dimm has 2-channels and dual-dimm 128 bit configuration has 4 channels. Wide memory channels lose bandwidth efficiency compares to narrower channels.
And that is why I wrote DDRx and not DDR5

Wanted to keep things simple
 
Reactions: SpudLobby