Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,809
1,388
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
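
As a rough sanity check on the GPU figures above, here is a minimal sketch of how they relate to each other. The 8 FP32 lanes per execution unit and the "FMA counts as 2 FLOPs" convention are assumptions, not something stated in this post, so the implied clock is only an estimate.

```python
# Figures quoted in the spec list above.
EXECUTION_UNITS = 128
CONCURRENT_THREADS = 24576
PEAK_FLOPS = 2.6e12   # 2.6 teraflops
TEXEL_RATE = 82e9     # texels per second
PIXEL_RATE = 41e9     # pixels per second

# Assumptions (not from the post): 8 FP32 lanes per execution unit,
# and one fused multiply-add counted as 2 floating-point operations.
ASSUMED_LANES_PER_EU = 8
ASSUMED_FLOPS_PER_LANE_PER_CYCLE = 2

threads_per_eu = CONCURRENT_THREADS // EXECUTION_UNITS
texels_per_pixel = TEXEL_RATE / PIXEL_RATE
flops_per_cycle = (
    EXECUTION_UNITS * ASSUMED_LANES_PER_EU * ASSUMED_FLOPS_PER_LANE_PER_CYCLE
)
# Clock needed to reach the quoted peak under the assumptions above.
implied_clock_ghz = PEAK_FLOPS / flops_per_cycle / 1e9

print(f"threads per execution unit: {threads_per_eu}")
print(f"texels per pixel: {texels_per_pixel:.1f}")
print(f"implied GPU clock: {implied_clock_ghz:.2f} GHz")
```

Under those assumptions the quoted 2.6 teraflops works out to a GPU clock of a bit under 1.3 GHz.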

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors
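
For what it's worth, the 100 GB/s figure above can be roughly reproduced from bus width and transfer rate. This is only a sketch: the 128-bit bus and the 6400 MT/s LPDDR5 speed are assumptions, not numbers given in this post.

```python
# Assumptions (not from the post): a 128-bit memory bus and LPDDR5
# running at 6400 mega-transfers per second.
ASSUMED_BUS_WIDTH_BITS = 128
ASSUMED_TRANSFERS_PER_SECOND = 6400e6


def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


m2_bandwidth = peak_bandwidth_gb_s(
    ASSUMED_BUS_WIDTH_BITS, ASSUMED_TRANSFERS_PER_SECOND
)
print(f"theoretical peak: {m2_bandwidth:.1f} GB/s")
```

With those inputs the theoretical peak lands at about 102 GB/s, which is consistent with the rounded 100 GB/s marketing number.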

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K H.264, HEVC (H.265), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

Bam360

Member
Jan 10, 2019
30
58
61
According to that reddit post, the 4800U @14.3W achieves a score of 6374, which isn't higher than the MBA at 9.2W (full package power). Unless you're referring to different results?

I am comparing the MBP and Mini M1 vs the MBA M1, but I'm not sure about the numbers anymore, as I see different scores on different sites. The problem is that there is no fan, so it depends on the test duration: the MBA reduces power consumption the longer the MT test takes to complete. There is this video where you can compare power consumption of the big cluster vs frequency when it throttles:


So 3-3.2 GHz is probably not that inefficient a point on the M1's curve, although efficiency obviously improves the lower the frequency goes. The good news is that this probably means there is decent headroom to increase frequency. I don't see how 300-400 MHz more, at the very least, isn't possible. It might cost double the power consumption, but it is absolutely feasible if Apple were to introduce something like a single-core turbo mode, or a more finely tuned algorithm like the one Ryzen uses.
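
To put a rough number on the "maybe double the power" guess, here is a toy sketch using the textbook dynamic power relation P ~ C * V^2 * f. The voltage/frequency relationship is an assumption (the real M1 curve is not given in this thread), so `voltage_exponent` is a hypothetical knob, not a measured value.

```python
def relative_power(f_new_ghz, f_old_ghz, voltage_exponent=1.0):
    """Relative dynamic power after a clock change.

    Uses P ~ C * V^2 * f and assumes V ~ f^voltage_exponent,
    so P ~ f^(1 + 2 * voltage_exponent). This is a toy model,
    not a measured M1 voltage/frequency curve.
    """
    ratio = f_new_ghz / f_old_ghz
    return ratio ** (1 + 2 * voltage_exponent)


BASE_GHZ = 3.2  # top of the 3-3.2 GHz range discussed above
for extra_mhz in (300, 400):
    target_ghz = BASE_GHZ + extra_mhz / 1000
    gentle = relative_power(target_ghz, BASE_GHZ, voltage_exponent=1.0)
    steep = relative_power(target_ghz, BASE_GHZ, voltage_exponent=2.0)
    print(
        f"{target_ghz:.1f} GHz: x{gentle:.2f} power if V ~ f, "
        f"x{steep:.2f} power if V ~ f^2"
    )
```

In this model, +400 MHz costs somewhere between about 1.4x and 1.8x the power depending on how steeply voltage has to rise, so "double" is plausible as a pessimistic case.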
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
That's difficult to do on battery-powered devices.
And that depends on what you want to measure. If you're interested in CPU perf/W in particular, you'd rather exclude other sources of variation.
Measuring total laptop power usage is repeatable and done commonly - cf. notebookcheck.com's power consumption methodology. In a laptop, this is more important than package power or CPU power used, because any number of other areas can cause increased power usage. Laptops are largely not upgradeable, and so total power draw of the entire laptop is most important when considering battery life.

As a technical matter, I can see the interest in measuring CPU perf/W. But we don't need to in this case. It's absolutely clear that the M1 has a lead over anything else in perf/W and I don't see any reason to re-confirm that.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
I am comparing the MBP and Mini M1 vs the MBA M1, but I'm not sure about the numbers anymore, as I see different scores on different sites. The problem is that there is no fan, so it depends on the test duration: the MBA reduces power consumption the longer the MT test takes to complete. There is this video where you can compare power consumption of the big cluster vs frequency when it throttles:

I'm curious what the 4800U Lenovo laptop does after several subsequent CB23 MT runs on that 14.3W intelligent power scheme.
 

bigggggggg

Junior Member
Nov 27, 2020
18
12
41

4900HS + RTX 2060 Super vs MBP M1. It seems the CPU performs really well (faster in Lightroom). I don't know if the software uses some kind of specialized hardware, but it runs through Rosetta 2.

CB23 results (M1 vs 4900HS):
- 7686 vs 9988
- 13 W total cluster power (8 CPU cores) vs 53 W peak and 35 W sustained consumption.
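
A quick sketch of what those CB23 numbers mean in points per watt. It only uses the scores and wattages quoted above. Note the M1 figure is cluster power, and it is an assumption here that the 4900HS figures are comparable, so treat the ratio as approximate.

```python
def points_per_watt(score, watts):
    """Benchmark points delivered per watt of power drawn."""
    return score / watts


# Scores and power figures as quoted in the post above.
m1 = points_per_watt(7686, 13)
r9_sustained = points_per_watt(9988, 35)
r9_peak = points_per_watt(9988, 53)

print(f"M1:                 {m1:.0f} pts/W")
print(f"4900HS (sustained): {r9_sustained:.0f} pts/W")
print(f"4900HS (peak):      {r9_peak:.0f} pts/W")
print(f"M1 advantage vs sustained: x{m1 / r9_sustained:.2f}")
```

So the 4900HS scores about 30% higher, but on these figures the M1 delivers roughly twice the points per watt against the sustained number.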
 

biostud

Lifer
Feb 27, 2003
18,683
5,416
136

dmens

Platinum Member
Mar 18, 2005
2,274
959
136
Why m1 is beaten by x86 in single core benchmarks.

This is one of the dumbest things I have ever read and demonstrates an absolute ignorance of how SMT is actually implemented, specifically, what is replicated and what is shared for SMT.
 

jeanlain

Member
Oct 26, 2020
159
136
86
Why m1 is beaten by x86 in single core benchmarks.
It's a question of semantics. The distinction is between single-core and single-thread.
They say
benchmark vendors need to move to SMT-enabled single-core tests
This is silly. Single-thread tests are not designed to evaluate the performance of a core per se, but performance of a CPU running a given single-threaded task as fast as possible. Many tasks use a single main thread.
We could also say that, because the M1 has two types of cores, we should compare an SMT x86 core running two threads to a high-performance + a high-efficiency core.

BTW, I wonder how much the M1 performance cores could benefit from SMT, since they're quite wide.
 
Reactions: Tlh97 and Saylick

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Why m1 is beaten by x86 in single core benchmarks.
If the question is "how fast can the core do a single task" then the single-threaded benchmark is fine and is a fair representation.

I think you could make the argument that the uarch design decisions to support SMT use up transistor space that could be spent on logic to increase IPC for single-threaded tasks, but clearly for the target market of each chip, they've made design decisions that they feel best serve their customers.
 
Reactions: Tlh97 and Saylick

Bam360

Member
Jan 10, 2019
30
58
61
Not to mention the cringeworthy "exclusive" thing; there is nothing exclusive there. I remember reading this debate years ago. It has some merit, because technically a core includes hyperthreading. However, the purpose of the test is to assess performance when there is no parallelism at all, because there are still tasks that are not parallelizable, and HT/SMT requires parallelism in the code (I think?).
Besides, it is evident the article is more interested in discrediting the single-core performance of the M1 than in having a rational discussion about why the test should be changed to include HT/SMT.
 

teejee

Senior member
Jul 4, 2013
361
199
116
This is one of the dumbest things I have ever read and demonstrates an absolute ignorance of how SMT is actually implemented, specifically, what is replicated and what is shared for SMT.

I agree with you. He is inventing a new concept of "single core" performance, letting x86 run two threads but the M1 only one, and claiming that is fair.

But I do believe that not having SMT is actually one of the most important success factors of Apple's CPU. SMT used to be a big success, that is for sure. But with the very advanced "high IPC cores" we have today, it is probably becoming a burden that complicates the design and makes it more difficult to increase IPC.
 

jeanlain

Member
Oct 26, 2020
159
136
86
4900HS + RTX 2060 Super vs MBP M1. It seems the CPU performs really well (faster in Lightroom). I don't know if the software uses some kind of specialized hardware, but it runs through Rosetta 2.
The Lightroom results are impressive. Almost too good to be true considering it's an x86 app.
 

nxre

Member
Nov 19, 2020
60
103
66
Qualcomm just announced the Snapdragon 888: 1 X1 at 2.84 GHz + 3 A78 at 2.4 GHz + 4 A55 at 1.8 GHz.
Eh, I don't think we're going to see an ARM CPU competing with the Apple M1 anytime soon. The X1 was meant to match A13 single-core performance, but it was meant to peak at 3 GHz; at 2.84 GHz it will likely stay behind it. The A55 cores are also a tragedy: for power-efficient cores they are next to useless, being neither powerful nor efficient. Maybe Samsung can become competitive again now that they have abandoned their custom cores, but afaik they have never made a laptop Exynos CPU. I think the main problem is that making a new mask and design for bleeding-edge processes is extremely expensive, and only makes sense if you expect to sell a lot of devices. Which is why we never saw a tablet-oriented chip on the Android side to compete with the AX chips; they just don't sell the volume to make up the cost.
Microsoft seems interested in moving to ARM on their Surface devices, but I don't think they have the volume that would justify a vendor making a custom CPU for them. Samsung makes their own notebooks, but idk how big a market share they have. So yeah, I think we are stuck with AMD and Intel for the near future for non-Apple laptops.
I wonder who will be the first to make a dedicated ARM laptop chip (that is not a rebranded phone CPU).
 
Reactions: Etain05 and Viknet

coercitiv

Diamond Member
Jan 24, 2014
6,630
14,065
136
Why m1 is beaten by x86 in single core benchmarks.
Wccftech has no idea what they're talking about. The mere fact that this idea was never brought up until Apple introduced the M1 should highlight just how ridiculous it is.

I mean, let's run only half a thread on Bulldozer cores for floating-point benchmarks since the FPU doesn't really belong to a core.
 

bigggggggg

Junior Member
Nov 27, 2020
18
12
41
The Lightroom results are impressive. Almost too good to be true considering it's an x86 app.
Well, x86 apps running through Rosetta could use specialized hardware from what I understand, but I don't know if this is the case here, considering it isn't doing any particular task on those images. And I found out that Lightroom can take advantage of CUDA to accelerate tasks, so if that were the case, the 4900HS + 2060 Super would certainly have won.
Anyway, AnandTech's analysis of multi-core performance shows that in some tasks the M1 SoC is faster than the 4900HS. In particular, it is faster on average than the 4900HS in floating-point tasks, so it could be true that in certain real-world tasks the M1 is faster than the 4900HS.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,337
5,456
136
Qualcomm just announced the Snapdragon 888: 1 X1 at 2.84 GHz + 3 A78 at 2.4 GHz + 4 A55 at 1.8 GHz.
...
I wonder who will be the first to make a dedicated ARM laptop chip (that is not a rebranded phone CPU).

That is strange. Just a single X1 core makes it look like they are including it just to win single core benchmarks. I guess X1 is too power hungry for phones even when they have efficiency cores to fall back on. Ouch.

A purpose built laptop chip should include 4 X1 cores. If they just jam an 888 with only 1 X1 core, in laptops, they aren't really trying.
 

Bam360

Member
Jan 10, 2019
30
58
61
That is strange. Just a single X1 core makes it look like they are including it just to win single core benchmarks. I guess X1 is too power hungry for phones even when they have efficiency cores to fall back on. Ouch.

A purpose built laptop chip should include 4 X1 cores. If they just jam an 888 with only 1 X1 core, in laptops, they aren't really trying.

Yeah, obviously what they are trying to do is close the gap as much as possible in single thread performance while still tying or winning in multi threaded tasks (at least compared to A14), thanks to having more cores that, especially the A78 on a newer node, sip power. Besides, the key thing here is that it's much cheaper to cram 3 A78 cores instead of 3 X1 cores, because they need much less area, X1 may be more power hungry, like 50% more power at same GHz, but you would probably get lower power at the same performance by downclocking the X1. I think it is the right choice for Smartphone chips, maybe 2x X1 max, but 4 makes no sense for the added cost and the ridiculously low thermal envelope that a Smartphone allows, we are talking sub 5W. For laptops however, yeah, 4 X1 at the very least, and Cortex A55 just has to die, too slow and not really that efficient for the little performance it has.
 

Doug S

Platinum Member
Feb 8, 2020
2,759
4,697
136
This is one of the dumbest things I have ever read and demonstrates an absolute ignorance of how SMT is actually implemented, specifically, what is replicated and what is shared for SMT.

Not only that, but there is still plenty of code that is single thread, and doesn't benefit at all from multiple cores - or even if it does, there is one thread that takes up the lion's share of the CPU time meaning overall performance is still limited to some degree by single thread performance.
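
Amdahl's law is the usual way to put numbers on this. A minimal sketch follows; the parallel fractions in the loop are made-up illustrative values, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work scales.

    parallel_fraction is the share of runtime that can be spread across
    cores; the remainder stays on a single thread.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: the fractions below are illustrative only.
for parallel_fraction in (0.0, 0.5, 0.9):
    speedup = amdahl_speedup(parallel_fraction, cores=8)
    print(f"{parallel_fraction:.0%} parallel on 8 cores: x{speedup:.2f}")
```

Even with half the work perfectly parallel, 8 cores give less than a 2x speedup, so the single main thread still dominates.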
 

thunng8

Member
Jan 8, 2013
167
72
101
That is strange. Just a single X1 core makes it look like they are including it just to win single core benchmarks. I guess X1 is too power hungry for phones even when they have efficiency cores to fall back on. Ouch.

A purpose built laptop chip should include 4 X1 cores. If they just jam an 888 with only 1 X1 core, in laptops, they aren't really trying.
First Geekbench results are here of the 888 compared to A12:


Seems like it ekes past the A12 for single core performance (still well below A13)

I'll wait for more results to be certain (this could be an abnormal result)
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,337
5,456
136
Windows ARM virtualized on M1 Mac:


Highlights for me:

M1 DESTROYS Qualcomm based ARM Windows performance. Almost Twice as fast at x86 geekbench. Not a surprise though.

But this is a surprise: x86 Geekbench runs faster on M1 Macs under Virtualized ARM Windows with Microsoft x86 emulation (ST ~1500), than it does in MacOS using Rosetta 2 (ST ~1300).

Maybe Microsoft x86 emulation is not so shabby after all.
 
Reactions: Mopetar

bigggggggg

Junior Member
Nov 27, 2020
18
12
41
But this is a surprise: x86 Geekbench runs faster on M1 Macs under Virtualized ARM Windows with Microsoft x86 emulation (ST ~1500), than it does in MacOS using Rosetta 2 (ST ~1300).
It seems those results come from the native Geekbench 5 for WoA, not from his "windows-emulated" version. The guy chose AArch64 from the menu.
 
Reactions: Viknet

Heartbreaker

Diamond Member
Apr 3, 2006
4,337
5,456
136
It seems those results come from the native Geekbench 5 for WoA, not from his "windows-emulated" version. The guy chose AArch64 from the menu.

Ok, that explains it. I thought he said it was emulated. I wonder if he is comparing to emulated results on the Qualcomm HW.

Edit:
Looks like those were Qualcomm native results as well. Which shows how far ARM competitors have to go. Heck Qualcomm and Microsoft were supposed to have worked together to achieve that, and it's the only ARM chip Microsoft currently allows to license ARM-Windows.

M1 even while running virtualized ARM-Windows absolutely crushes the purpose built Surface Pro X.

They need a new chip with at least 4 Cortex X1 cores, just to reduce that down to just soundly thrashed, instead of crushed.
 
Reactions: bigggggggg

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
It's not just the architecture but the whole platform, from SoC to drivers/API to the OS. Apple controls it all and there is a very limited number of configs. It's essentially like it used to be with consoles. If you control everything, you can extract far more from the hardware, especially dedicated hardware that gets called by the APIs automatically without the developer having to do anything special. Basically AMD's Fusion dream that never really came to fruition: instead of AVX, the GPU could be used transparently for matrix calculations. But that simply doesn't work in the x86 world.

I agree, that Apple's top down method is a lot more efficient (your analogy with the consoles is a good one), but when it comes to raw power, the PC has the advantage.......just like it does with consoles.

And you saw that with the benchmarks that @senttoschool posted with the Intel Mac Mini and the 5700XT. The M1 is obviously not going to compare in terms of hardware accelerated performance with a much more powerful Ampere or RDNA2 class GPU.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Where are they "so far behind"? Unless you are comparing them against PCs with 8 big cores (and probably SMT enabled as well) they aren't far behind.

Yes, I was comparing them against the stereotypical big multicore x86-64 CPUs, which I realize is not totally fair. Then again, we aren't going to get a truly fair comparison until we see AMD's offerings on 5nm as well. At any rate, I see the M1 as a legit 8 core CPU. The big/little distinction is irrelevant to me, as those Icestorm cores are CPU cores that do actual work.

Are we going to come up with excuses when Golden Cove launches and it loses to the 5950x in multicore, because the "little cores" aren't real cores?

And the "massacre" would come when new generations of the M* come with more big cores, eventually scaling up to the Mac Pro in a couple years. I expect at least 32 big cores in the high end there, maybe more. You're going to have to be comparing with some awfully big (and expensive) x86 hardware to put that "far behind".

This I would love to see. Like I said before, I may not like Apple, but I cannot deny that they do push the industry forward in many ways. That said, I would imagine that a theoretical M* CPU with that many cores would probably look completely different from the current M1 in terms of its cache structure and other features, so using the M1 to extrapolate to a more advanced M* class CPU seems like it would be unproductive.
 