Discussion Apple Silicon SoC thread

Page 233

Eug

Lifer
Mar 11, 2000
23,752
1,284
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4
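
A quick back-of-the-envelope check of the 2.6 teraflop GPU figure above - the 8 FP32 ALUs per execution unit and the ~1.28 GHz clock used below are commonly reported third-party numbers, not Apple-published specs:

    # Rough FP32 throughput check for the 8-core M1 GPU (assumed ALU count and clock)
    alus = 128 * 8                         # 128 execution units, assumed 8 FP32 ALUs each
    clock_ghz = 1.278                      # commonly reported GPU clock, not confirmed by Apple
    tflops = alus * 2 * clock_ghz / 1000   # 2 FLOPs per ALU per cycle (fused multiply-add)
    print(round(tflops, 2))                # ~2.62, in line with the quoted 2.6 teraflops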

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options: 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock speed differences).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


M2
Second-generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes

M3 Family discussion here:


M4 Family discussion here:

 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,047
136
Yaya N3E is better but if everyone moves to N3E, then what becomes of the N3B production lines? Are they going to just sit and rot? Which is why I think TSMC would be incentivised to have Intel using N3B.

No, they won't sit and rot; they would be repurposed to provide additional N3E capacity. The process flow is different, but the equipment is all the same.
 
Reactions: Saylick

Doug S

Platinum Member
Feb 8, 2020
2,486
4,047
136
That's why I asked about it in the first place. With yields like that, it's a bit concerning that Intel would finally start taking N3B wafers.

Well those were the reported yields when they started mass production for Apple, and that was after a year of risk production. Maybe they have a breakthrough that allows them to significantly increase yields, but it seems more likely that any improvement would be minor and N3B would have yields far below what TSMC normally considers acceptable for mass production.

Without knowing the details of the deal between TSMC and Intel, which we never will, all we'll ever be able to do is guess. Heck, I don't think we know for SURE whether Intel will be using N3B. Even after they start shipping chips, unless TSMC or Intel makes a clear statement about what they're using, we'll never know until someone like TechInsights gets out the electron microscope or whatever to find out.
 

DrMrLordX

Lifer
Apr 27, 2000
21,802
11,157
136
Heck, I don't think we know for SURE whether Intel will be using N3B. Even after they start shipping chips, unless TSMC or Intel makes a clear statement about what they're using, we'll never know until someone like TechInsights gets out the electron microscope or whatever to find out.
Also a valid point, and it'll probably be well-discussed by the time Arrow Lake finally hits the scene. Among other things.
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
So A17P really has double the NPU resources of M3? That seems like an odd decision given how small the NPU is.
Yes. The specs are easy to find on Apple's website. Apple cut the NPU in half for the M3 series.

The NPU is not that small in die area - it's about the size of four e-cores. It's probably a conscious decision to reduce cost on the M series. iPhones need more NPU power anyway, both for the plethora of ML tasks such as the camera pipeline and for the need to save power. Meanwhile, Macs don't have much ML work - that is, until local LLMs, Stable Diffusion, etc. become common. I'm sure Apple made the decision to cut the NPU in half for the M3 series before the explosion of LLMs and Gen AI.
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Haha. I am chuckling now.

Intel and AMD are trying to one-up each other's AI capabilities by boasting about who has more TOPS, while a smartphone chip casually has more TOPS than either of them.
Intel and AMD's AI accelerators are mostly just marketing speak right now. Windows has very few ML tasks that can be offloaded to an NPU. Even macOS, which Apple fully controls, doesn't have that much except for local Siri and some small tasks like organizing photos.

I think it's going to be a few years at least before NPUs on laptops/desktops become useful.
 

Eug

Lifer
Mar 11, 2000
23,752
1,284
126
Intel and AMD's AI accelerators are mostly just marketing speak right now. Windows has very few ML tasks that can be offloaded to an NPU. Even macOS, which Apple fully controls, doesn't have that much except for local Siri and some small tasks like organizing photos.

I think it's going to be a few years at least before NPUs on laptops/desktops become useful.
Hmmm... Photo organization and background processing in Apple Photos on the Mac sometimes can take far, far too long even on M-series chips. I don't know how much Photos uses the NPU though.

I attribute it to problematic programming though. For example, take a plain export of native images from Photos: if I try to export, say, 10,000 photos, the machine gobbles up all the memory, grinds away at the task for an extended period, and then crashes. Memory leak? Remember, there is no image processing here at all (no NPU involved), since it's just a straight export of the originals.

OTOH, there are cheap third-party apps that will do the export from the Photos database quickly with minimal memory usage.
 

FlameTail

Diamond Member
Dec 15, 2021
3,155
1,804
106
Both the Apple M2 and the Radeon 780M have up to 100 GB/s of memory bandwidth (LPDDR5-6400, 128-bit).

Yet the Apple GPU is a bit more powerful.



I believe the reason is the tile-based architecture of Apple's GPU, which lets them pack more GPU performance into the same memory bandwidth.

I guess this applies to Qualcomm as well.
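
As a sanity check, that shared 100 GB/s figure falls straight out of the LPDDR5-6400 spec and the 128-bit bus; nothing Apple- or AMD-specific is assumed here:

    # Peak bandwidth of LPDDR5-6400 on a 128-bit bus
    transfers_per_s = 6400e6                                 # 6400 MT/s
    bus_width_bits = 128
    gb_per_s = transfers_per_s * bus_width_bits / 8 / 1e9
    print(gb_per_s)                                          # 102.4, i.e. the quoted ~100 GB/s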
 
Jul 27, 2020
17,906
11,682
116
I don't know how much Photos uses the NPU though.
Does that app have the ability to recognize people's faces, let you assign names to each face and then let you search all photos of a particular person? I think that functionality would depend heavily on the NPU and could be much faster than doing it on the CPU.
 
Reactions: Eug
Jul 27, 2020
17,906
11,682
116
Is Control running natively there?
Probably using the Apple Game Porting Toolkit.

Apple would need to release a cheap game console (under $600) to get studios to take its graphics horsepower seriously. Most gamers don't buy a MacBook for gaming when there are superior, cheaper options available with far more RAM and storage.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,272
136
Hmm. Did you read the semianalysis article I linked?


Average cost? So then the price-per-wafer is not fixed, and varies?

Then what the article is saying is very relevant. More lithography time significantly increases the lithography cost, and in their worst-case-scenario example, the lithography cost is up by $2,000!
TSMC charges based on machine time and volume. Each contract is also different, so two companies with the exact same requirements may be charged very different prices.
Nah it's overused on TikTok and elsewhere.
💀☠️
Intel and AMD's AI accelerators are mostly just marketing speak right now. Windows has very few ML tasks that can be offloaded to an NPU. Even macOS, which Apple fully controls, doesn't have that much except for local Siri and some small tasks like organizing photos.

I think it's going to be a few years at least before NPUs on laptops/desktops become useful.
Windows 11 has some AI stuff (generative AI in Paint), but Windows 12 will change everything.

Windows 12 will also probably require a subscription for the advanced stuff. There were rumors of tying it to Office 365, but I believe Microsoft backtracked on that at some point.

It will be a long time before most PCs without giant GPUs can perform complex tasks without a cloud GPU farm supporting them somewhere.
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,047
136
Almost certainly, because the game is not optimized for ARM and Metal. Even if it uses Metal, it's probably using MoltenVK, which translates Vulkan to Metal.

True that most GPU benchmarks tend to understate what Apple's GPU is really capable of because very little is truly natively designed and tuned for Metal and TBDR, but that just reflects reality.

What would be interesting to see would be benchmarks that compare results using a version that's highly tuned for Apple's GPU and also a version ported using MoltenVK with no special tuning. How much does Apple lose? Is it 5% or is it 50%? I have no idea.
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Windows 11 has some AI stuff (generative AI in Paint), but Windows 12 will change everything.

Windows 12 will also probably require a subscription for the advanced stuff. There were rumors of tying it to Office 365, but I believe Microsoft backtracked on that at some point.

It will be a long time before most PCs without giant GPUs can perform complex tasks without a cloud GPU farm supporting them somewhere.
FYI, the NPUs are probably way slower than GPUs at ML acceleration. The advantage of NPUs is just efficiency. On Windows, laptops and desktops don't need to conserve as much energy as phones do. That's why Apple cut the NPU in half for the M3. It's just not needed right now.

When LLMs or Gen AI get big, I could see the NPUs becoming huge as well. That's why I wrote this: [forum thread ...us-with-a-cpu-and-gpu-attached-to-it.2611174/]
 

FlameTail

Diamond Member
Dec 15, 2021
3,155
1,804
106


Exynos 2200. You can clearly see the tag logic for the SLC marked here.

Where is the tag logic for the SLC of the A17 Pro? I guess it's the structure in between the two SLC slices?
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,047
136
Interesting that A17P is 8.18 mm x 12.69 mm, considering that the reticle is 33 x 26 mm - 8 dies just barely, but quite neatly, fit.

Both A15 and A16 were just enough larger that 8 dies could not fit - pretty sure only 6 would but I don't have their exact dimensions. I wonder if knowing about N3B's issues informed their decision on how efficiently to fill the reticle and minimize scans per wafer?
 

FlameTail

Diamond Member
Dec 15, 2021
3,155
1,804
106
Interesting that A17P is 8.18 mm x 12.69 mm, considering that the reticle is 33 x 26 mm - 8 dies just barely, but quite neatly, fit.
Yup.

33 × 26 = 1 chip
16.5 × 13 = 4 chips
11 × 13 = 6 chips
8.25 × 13 = 8 chips

11 × 13 = 143 mm².

The M3's estimated size is 146 mm². This means it just barely couldn't fit 6 dies per reticle!
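
A minimal sketch of that dies-per-reticle arithmetic, assuming the standard 26 x 33 mm field and ignoring scribe lines and edge margins; the M3 dimensions below are hypothetical, with only the ~146 mm² area estimate taken from the post:

    import math

    # Whole dies per 26 x 33 mm exposure field, trying both orientations
    def dies_per_field(die_w_mm, die_h_mm, field_w=33.0, field_h=26.0):
        upright = math.floor(field_w / die_w_mm) * math.floor(field_h / die_h_mm)
        rotated = math.floor(field_w / die_h_mm) * math.floor(field_h / die_w_mm)
        return max(upright, rotated)

    print(dies_per_field(8.18, 12.69))   # A17 Pro: 4 across x 2 down = 8
    print(dies_per_field(12.0, 12.2))    # hypothetical ~146 mm^2 M3-sized die: 4, not 6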
Both A15 and A16 were just enough larger that 8 dies could not fit - pretty sure only 6 would but I don't have their exact dimensions. I wonder if knowing about N3B's issues informed their decision on how efficiently to fill the reticle and minimize scans per wafer?
I was asking a question related to that in the foundry thread. TSMC has fixed prices per wafer, right? So if a chip takes more time in lithography, the cost doesn't get passed on to the customer, does it?
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,047
136
Yup.

33 × 26 = 1 chip
16.5 × 13 = 4 chips
11 × 13 = 6 chips
8.25 × 13 = 8 chips

11 × 13 = 143 mm².

The M3's estimated size is 146 mm². This means it just barely couldn't fit 6 dies per reticle!

I was asking a question related to that in the foundry thread. TSMC has fixed prices per wafer, right? So if a chip takes more time in lithography, the cost doesn't get passed on to the customer, does it?


I don't know whether "fixed price per wafer" is true. We see claimed prices now and then but it isn't like those come directly from TSMC, and we don't know if there are some caveats to those numbers. I wouldn't be surprised if those who know the full details of TSMC's pricing are under NDA, but if anyone can comment please do.

Whether it "takes more time in lithography" depends on more than the overall fill percentage. Recall that EUV machines scan across the mask with a 33 mm wide slit, so I have to wonder how much of a difference it makes if you scan the full 26 mm long path possible versus scanning only, say, 18 mm. The "scan distance" to cover the full wafer would be basically the same - it would have to step/reset to start another scan more often, but I'm pretty sure that's much quicker than the actual scanning (because the speed at which that occurs depends on getting enough photons onto the wafer, not on how fast the tool can move the slit). So, all in all, forcing it to step/reset more often would impact total lithography time a lot less than not utilizing the full slit width, because that underutilization of the slit width would increase your "scan distance".

If I'm right about that, then it would be much more important to fit as close to 33 mm as possible, so you'd see designers trying to ensure their chip dimensions were divisible into 33 mm while considering it less important that they are divisible into 26. That's why that article talking about the problems of high NA mentioned this - because with high-NA litho you'll have to ensure your chips are divisible into 16.5, which, other than for chips much smaller than 100 mm², means you basically have 8.25 and 16.5 as your only choices for one of the dimensions if you want to maximize your litho efficiency.
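
A toy model of that argument, purely under the assumptions in this post (33 mm slit width, up to 26 mm scan length) and with made-up scan-speed and step/reset numbers, just to show which kind of underfill would hurt more:

    # Toy wafer-exposure-time model; every constant here is illustrative, not a real tool spec
    WAFER_AREA_MM2 = 60000.0    # rough exposable area of a 300 mm wafer
    SCAN_MM_PER_S = 250.0       # hypothetical dose-limited scan speed
    STEP_S = 0.02               # hypothetical step/reset time between scans

    def exposure_time_s(slit_used_mm, scan_used_mm):
        fields = WAFER_AREA_MM2 / (slit_used_mm * scan_used_mm)
        scan_time = fields * scan_used_mm / SCAN_MM_PER_S    # total scan distance / speed
        return scan_time + fields * STEP_S

    print(exposure_time_s(33, 26))   # full field:        ~8.7 s
    print(exposure_time_s(33, 18))   # short scan path:   ~9.3 s (only more step/resets)
    print(exposure_time_s(24, 26))   # unused slit width: ~11.9 s (scan distance grows)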
 

Eug

Lifer
Mar 11, 2000
23,752
1,284
126
Does that app have the ability to recognize people's faces, let you assign names to each face and then let you search all photos of a particular person? I think that functionality would depend heavily on the NPU and could be much faster than doing it on the CPU.
Yes, Photos does this, both on the Mac and on the iPhone/iPad.
 