Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,749
1,281
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from the number of GPU cores). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock-speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, H.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

okoroezenwa

Member
Dec 22, 2020
54
52
61
You can't expect me to notice something that obvious. 🤣

Based on that, I measured 13.10 mm x 12.71 mm = 166.5 mm², which is right in the same neighborhood as what y'all came up with already.


The Thunderbolt and PCIe blocks are super easy to pick out. I didn't spend much time on this, so apologies if it's hard to read.

The blue blocks are what I suspect to be the display controllers. It looks like there might be three this go round, but we'll probably have to wait for an M4 Mac to know for sure.
Interesting. If all this speculation is correct, it looks like they’re tackling the weaknesses in the M3 and below. I wonder if they’ll budge on RAM + storage in that case. I would guess the M4 MBP finally gets the 16GB starting RAM it deserves, but Apple can always surprise in the worst way ¯\_(ツ)_/¯
 

Doug S

Platinum Member
Feb 8, 2020
2,478
4,035
136
Has anyone seen any information on whether the neural engine has access to all the memory bandwidth available on the larger chips? Mac Studios have been selling a lot to machine learning researchers because of the ability to run large models on Max and Ultra devices. But these are running on the GPU, because language models especially are bandwidth limited, in addition to the actual compute available.

But looking at the dies, the ANE appears to be on the far side of the CPU L2 cache. If Apple intends to significantly boost AI capabilities, it seems to me that they would need to make sure those IP blocks have access to full memory bandwidth. Do they have such access?

I doubt the NPU has sufficient computational resources to use all the memory bandwidth that exists. If you look at Nvidia compared to Apple's NPU I think you'd have to multiply the Apple NPU's computational power by more times than their memory bandwidth to reach the level of Nvidia's high end stuff. That tells me that the memory bandwidth of even a non Pro M3/M4 is likely more than the NPU is able to use.

When they double/triple the size of that NPU as I expect they will in the near future, then maybe it will be able to, but unless they add more NPU cores in the Pro/Max/Ultra like they do with CPU and GPU cores it won't matter, because it certainly couldn't soak up much of the total memory bandwidth in a Max, let alone Ultra.
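A rough way to sanity-check this is to compare compute-to-bandwidth ratios (operations available per byte moved). The figures below are approximate public numbers, not from this thread, so treat them as illustrative assumptions:

```python
# Rough compute-to-bandwidth comparison. All figures are approximate
# public/marketing numbers and are assumptions for illustration only.
chips = {
    "M3 NPU (base M3)": {"tops": 18,  "gbps": 100},   # ~18 INT8 TOPS, ~100 GB/s
    "RTX 4090":         {"tops": 660, "gbps": 1008},  # dense INT8 TOPS, GDDR6X
}

for name, c in chips.items():
    ratio = c["tops"] / c["gbps"]  # TOPS per GB/s of memory bandwidth
    print(f"{name:18s} {ratio:5.2f} TOPS per GB/s")
```

By this crude measure the NPU would run out of compute long before it saturated even a base chip's bus, which is the point above.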
 
Reactions: SpudLobby

SpudLobby

Senior member
May 18, 2022
961
655
106
I doubt the NPU has sufficient computational resources to use all the memory bandwidth that exists. If you look at Nvidia compared to Apple's NPU I think you'd have to multiply the Apple NPU's computational power by more times than their memory bandwidth to reach the level of Nvidia's high end stuff. That tells me that the memory bandwidth of even a non Pro M3/M4 is likely more than the NPU is able to use.

When they double/triple the size of that NPU as I expect they will in the near future, then maybe it will be able to, but unless they add more NPU cores in the Pro/Max/Ultra like they do with CPU and GPU cores it won't matter, because it certainly couldn't soak up much of the total memory bandwidth in a Max, let alone Ultra.
Yep
 

FlameTail

Diamond Member
Dec 15, 2021
3,144
1,790
106
How much memory bandwidth does the NPU actually use.

Chips and Cheese should do an article comparing the NPUs of AMD vs Qualcomm vs Apple vs Intel
 

The Hardcard

Member
Oct 19, 2021
124
177
86
I doubt the NPU has sufficient computational resources to use all the memory bandwidth that exists. If you look at Nvidia compared to Apple's NPU I think you'd have to multiply the Apple NPU's computational power by more times than their memory bandwidth to reach the level of Nvidia's high end stuff. That tells me that the memory bandwidth of even a non Pro M3/M4 is likely more than the NPU is able to use.

When they double/triple the size of that NPU as I expect they will in the near future, then maybe it will be able to, but unless they add more NPU cores in the Pro/Max/Ultra like they do with CPU and GPU cores it won't matter, because it certainly couldn't soak up much of the total memory bandwidth in a Max, let alone Ultra.
Yes, I was referring to your wager of doubling or tripling the neural engine in the M5. What I am thinking about is the future as well; I also don’t see a present problem.

The issue is that I think Apple’s future chips are also going to lean harder into matrix units and compute TOPS. I feel, though, that those units will need access to the full memory bus, and I am wondering if just supersizing the ANE in the current layout would allow for that.

On the A and base M chips it is not a problem, as it appears the CPU block can saturate a 128-bit bus. But the solution needs to scale to the Pro, Max, and Ultra (Extreme?). While certain inferencing tasks are bandwidth limited on Max and Ultra, there are other tasks that are severely compute constrained. For LLMs, token generation is tolerable and already pretty much what the bandwidth will allow, even on the M1 series.

But for complex prompts and extended contexts, the delay in time to first token is severely compute starved. The Ultras can generate tokens at more than half the speed of a 4090 with LPDDR5-6400, and moving to faster memory will boost that higher.

But even for 128K context windows and huge prompts, the 4090 has the compute resources to maintain truly interactive sessions, with time to first token in 10 to 20 seconds, whereas even Ultras can take 20 to 30 minutes. If Apple could mitigate that delay by boosting the compute resources, with full memory bandwidth access and even just LPDDR-9600 (1200 GB/s bandwidth on the Ultra), it would be a game changer.

Mac Studio AI clusters are showing near-linear scaling so far, up to 4 boxes (768 GB of GPU-accelerated RAM; an M3 Ultra cluster would have had 1 TB of RAM). If Apple boosted the compute to more closely match the bandwidth, they could pull certain niche AI markets from Nvidia. Not only could they move heavily into the AI and data science research markets, but other companies could deploy Mx Ultras in datacenters, especially as the Macs are massively more energy efficient than Nvidia's hardware.

A proper GPU-bus-side compute solution could make the Mac Studio Apple’s second-biggest revenue generator, as well as boost sales of top-end MacBook Pros. But would just boosting the ANE resources, in the die space you suggested the M5 series might allocate, allow for that? Not impossible if it has GPU bus access. But that was my question about its location: for future potential.
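The bandwidth-bound side of this is easy to put numbers on: during token generation, each generated token has to stream the full weight set through memory at least once, so bandwidth sets a hard ceiling on tokens per second. A minimal sketch (model size, quantization, and bandwidth figures are illustrative assumptions):

```python
def decode_tokens_per_sec(params_billions, bytes_per_param, bandwidth_gbs):
    """Upper bound on decode speed when each token streams all weights once.
    Ignores KV-cache traffic, so real numbers land somewhat lower."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# Hypothetical: a 70B-parameter model quantized to 4 bits (~0.5 bytes/param)
for name, bw in [("800 GB/s (M2 Ultra class)", 800),
                 ("1200 GB/s (the LPDDR-9600 Ultra above)", 1200)]:
    print(f"{name}: ~{decode_tokens_per_sec(70, 0.5, bw):.0f} tok/s ceiling")
```

Prompt processing (time to first token) is compute bound instead, since the whole context can be batched through the weights, which is why it doesn't benefit from extra bandwidth the same way.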
 

Eug

Lifer
Mar 11, 2000
23,749
1,281
126
Apple's Back-To-School Promotion is now live in several countries, like the US, Canada, Thailand, etc.

I just ordered an M4 iPad Pro. It's crazy that my fanless 11" 5.3 mm thick tablet will be the fastest computing device in my house by a huge margin.
 
Reactions: Mopetar

Doug S

Platinum Member
Feb 8, 2020
2,478
4,035
136
If Apple could mitigate that delay by boosting the compute resources, with full memory bandwidth access and even just LPDDR-9600 (1200 GB/s bandwidth on the Ultra), it would be a game changer.

Well, they've gone to LPDDR5X-7500 (or whatever it is) with the M4 in the iPad Pro. It may be clocked faster in other M4/M4P/M4M implementations, which are a bit less power/heat limited, and LPDDR6 is on the horizon for M6 if they want to be more aggressive about adopting newer memory standards (which maybe they will be, if they see a valid reason to do so).
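For reference, peak LPDDR bandwidth follows directly from the data rate and bus width: bandwidth = (MT/s × bus bits) / 8 bytes. A quick sketch using commonly reported bus widths (the configurations are assumptions, not from this thread):

```python
def peak_bandwidth_gbs(mt_per_s, bus_bits):
    # Each transfer moves bus_bits / 8 bytes; divide by 1000 for GB/s
    return mt_per_s * bus_bits / 8 / 1000

# Commonly reported configurations (bus widths assumed per tier)
for name, mt, bits in [("M4 iPad Pro, LPDDR5X-7500, 128-bit", 7500, 128),
                       ("M1 Max, LPDDR5-6400, 512-bit", 6400, 512)]:
    print(f"{name}: {peak_bandwidth_gbs(mt, bits):.1f} GB/s")
```

This lines up with the roughly 120 GB/s quoted for the M4 and the roughly 400 GB/s quoted for the M1 Max.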
 

Doug S

Platinum Member
Feb 8, 2020
2,478
4,035
136
Geekerwan says they overclocked LPDDR5 instead of going with LPDDR5X, for latency reasons...

I wish there were some source of LPDDR latencies available somewhere. I've always wondered how they compare, but other than knowing LPDDR5 is slower than DDR5, we don't really know the details. If LPDDR5X is slower (in wall-clock time) than LPDDR5, that's news to me, but I guess that's one way it might deliver the promised power benefits.
 
Reactions: Tlh97 and SpudLobby

SpudLobby

Senior member
May 18, 2022
961
655
106
M4 DRAM latency did reduce yes? If so then that’s actually believable, but “overclocked” LPDDR5 is something I didn’t know existed, that seems weird. I would think some of the node stuff with LPDDR5x is also how it hits those data rates at acceptable power, but maybe there’s some compromise Apple worked out with Micron and also the power gains — or gains by going to 8500 MT/s — just weren’t worth it vs cost and latency for Apple.

Very weird still all around if true.
 

Doug S

Platinum Member
Feb 8, 2020
2,478
4,035
136
M4 DRAM latency did reduce yes? If so then that’s actually believable, but “overclocked” LPDDR5 is something I didn’t know existed, that seems weird. I would think some of the node stuff with LPDDR5x is also how it hits those data rates at acceptable power, but maybe there’s some compromise Apple worked out with Micron and also the power gains — or gains by going to 8500 MT/s — just weren’t worth it vs cost and latency for Apple.

Very weird still all around if true.

I wouldn't call it "overclocked", but "not JEDEC standard".

I imagine with iterations in DRAM processes over the last few years that at least some DRAM OEMs are able to produce LPDDR5 that can clock higher than the highest JEDEC speed of 6400 MT/s, just like they are able to produce DDR5 that clocks above whatever the fastest JEDEC speed is there. That latter is well known, because gamers/enthusiasts are always in the market for faster RAM.

I guess Apple was too, because they must have wanted more memory bandwidth for M4, but whether they chose this route because they didn't want to compromise on latency as LPDDR5X apparently would require, or a more prosaic reason like not having LPDDR5X controllers ready in time to make the M4 tape out, who knows.

This also explains why it had that odd -7700 speed or whatever it was, instead of -8533 which seems to be a widely available JEDEC standard LPDDR5X speed. It didn't make a lot of sense why they'd underclock LPDDR5X like that.
 
Reactions: Tlh97 and SpudLobby

SpudLobby

Senior member
May 18, 2022
961
655
106
I wouldn't call it "overclocked", but "not JEDEC standard".

I imagine with iterations in DRAM processes over the last few years that at least some DRAM OEMs are able to produce LPDDR5 that can clock higher than the highest JEDEC speed of 6400 MT/s, just like they are able to produce DDR5 that clocks above whatever the fastest JEDEC speed is there. That latter is well known, because gamers/enthusiasts are always in the market for faster RAM.

I guess Apple was too, because they must have wanted more memory bandwidth for M4, but whether they chose this route because they didn't want to compromise on latency as LPDDR5X apparently would require, or a more prosaic reason like not having LPDDR5X controllers ready in time to make the M4 tape out, who knows.
Yeah, actually also a good point. Possible
This also explains why it had that odd -7700 speed or whatever it was, instead of -8533 which seems to be a widely available JEDEC standard LPDDR5X speed. It didn't make a lot of sense why they'd underclock LPDDR5X like that.
Yeah, this makes sense to me. Good clarification on overclocked vs. non-JEDEC standard.
 

trivik12

Senior member
Jan 26, 2006
319
288
136
It would be weird to see the M4 used in just iPad Pros at this point. What a waste. I hope they release some Macs with this chip. Rumors have the next update being just for the MBP near the end of the year. That would be 6-7 months after the iPad release.
 

Eug

Lifer
Mar 11, 2000
23,749
1,281
126
I think the M4 MBP will get 16/512 as base, however I still see 8/512 for the Air at its current price. Just doubtful they’ll move to 16/512 there.
12 GB is a decent possibility. In fact, that 12 GB prediction is one reason I’ve held off buying a new MB Air.
 