Discussion Apple Silicon SoC thread


Eug

Lifer
Mar 11, 2000
23,752
1,309
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s
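
As a rough sanity check, the GPU figures above can be related to each other. This is only a sketch: the 8 FP32 lanes per execution unit and the 2 FLOPs per lane per cycle (fused multiply-add) are assumptions not stated in this post, and the clock is back-computed from them rather than taken from any official spec.

```python
# Figures taken from the M1 spec list above.
EXECUTION_UNITS = 128
MAX_THREADS = 24576
TFLOPS = 2.6
GTEXELS_PER_S = 82
GPIXELS_PER_S = 41

# Assumptions (not from the post): 8 FP32 lanes per EU,
# 2 FLOPs per lane per cycle (one fused multiply-add).
LANES_PER_EU = 8
FLOPS_PER_LANE_PER_CYCLE = 2

# Concurrent threads each execution unit would have to hold.
threads_per_eu = MAX_THREADS // EXECUTION_UNITS

# Ratio of texture fill rate to pixel fill rate.
texel_to_pixel_ratio = GTEXELS_PER_S / GPIXELS_PER_S

# Clock implied by the quoted TFLOPS under the assumptions above.
flops_per_cycle = EXECUTION_UNITS * LANES_PER_EU * FLOPS_PER_LANE_PER_CYCLE
implied_clock_ghz = TFLOPS * 1e12 / flops_per_cycle / 1e9

print(threads_per_eu)
print(texel_to_pixel_ratio)
print(round(implied_clock_ghz, 2))
```

If the lane count or FLOPs-per-cycle assumption is wrong, only the implied clock changes; the threads-per-EU and texel-to-pixel ratios come straight from the listed numbers.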

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as they do with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), ProRes

M3 Family discussion here:


M4 Family discussion here:

 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The very fact that you can say such a thing "If it's doing work, it should be counted!" shows how UTTERLY clueless you are.
Quick question. How many CPUs do you think exist on an M1? 8? 15? 50?

According to Apple themselves, it's 8. But I suppose in your "supreme knowledge" you have probably convinced yourself that you know more about the M1 than the people that actually designed it.

The 8‑core CPU in M1 is by far the highest‑performance CPU we’ve ever built. Designed to crush tasks using the least amount of power, M1 features two types of cores: high performance and high efficiency. So from editing family photos to exporting iMovie videos for the web to managing huge RAW libraries in Lightroom to checking your email, M1 blazes right through it all — without blazing through battery life.

Apple M1

Like I said, you can choose to be on the team that cares about *understanding* the technology. Or you can choose to be on the team that thinks scoring points with words counts as an important achievement.

I don't think being a zealot is going to enhance my understanding of technology
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Now here's an interesting thought. Despite the almost universal derision that many of us here have with WCCFTech, they posted an interesting article today.

Exclusive: Why Apple M1 Single "Core" Comparisons Are Fundamentally Flawed (With Benchmarks) (wccftech.com)

The article claims that unlike the M1, modern SMT capable x86-64 cores cannot be completely saturated when running purely single threaded code. Therefore, any comparative benchmarks between the M1 and say, Tiger Lake or Zen 3 in single threaded benchmarks are not accurate.

They did some benchmarks to prove their point. They ran Cinebench R23, in both pure single core mode, and single core mode with one additional thread running, effectively being 1 core + SMT. This is their result. As you can see, performance increases by a significant amount, allowing the much older Zen 2 and Skylake cores to beat the M1 in CB R23.

They did the same thing in Geekbench to a similar effect with a 9980xe, with performance increasing by about 19%.

System manufacturer System Product Name - Geekbench Browser

So what do you guys think? Do you believe these assertions have any merit? To me it makes sense. Modern x86-64 CPUs are designed to be run with SMT, so it makes sense that they need an additional thread in "single threaded mode" to become completely saturated.
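
The "1 core + SMT" adjustment described above comes down to comparing a one-thread score with the combined score of two threads sharing one core. A minimal sketch of that arithmetic follows; the function name and the example scores are hypothetical, chosen only to reproduce the roughly 19% figure quoted for the 9980XE.

```python
def smt_uplift_percent(one_thread_score: float, two_thread_score: float) -> float:
    """Percent gain from running a second thread on the same core."""
    if one_thread_score <= 0:
        raise ValueError("one_thread_score must be positive")
    return (two_thread_score / one_thread_score - 1.0) * 100.0


# Hypothetical scores, not measurements from the article.
print(round(smt_uplift_percent(1000, 1190), 1))
```

Note that this number describes the throughput of the whole core, not how quickly either of the two threads finishes its own work.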



This is the new score with Tiger Lake. Tiger Lake sees a healthy boost, putting it significantly ahead of the M1.

 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
I'm not sure there's a "right" answer.

Intel and AMD have chosen to have their cores do SMT and designed around it. Not utilizing SMT artificially decreases the amount of work the core can do, because it's not fully exercising the core's abilities.

Apple have chosen not to have their core do SMT, and instead used those transistors to focus on other areas.

Both sides on this argument have merit. I guess if we wanted to compare "how much work can Apple's core do compared to AMD or Intel's core" then we have to include the design decision to have SMT enabled on the chips that can do it. But if we are wondering which chip can handle a single-threaded task the fastest, that's another question.

Someone else mentioned that we could consider running Zen3 core with SMT on --- against an M1 core with a Firestorm and Icestorm enabled, and see how things compare. But they're not the same thing, they use different die areas (that is, the die area for a single Zen3 core running with SMT on equals that of the core running with SMT off, whereas Firestorm alone has x die area, whereas Firestorm + Icestorm has x + y die area). Additionally, the Firestorm and Icestorm have different memory subsystem designs, etc., which allow further increases in cache availability which is important especially on L1/L2. We already know that 2 cores is better than 1 core with SMT; why would we re-hash that discussion?
 

IvanKaramazov

Member
Jun 29, 2020
56
102
66
So what do you guys think? Do you believe these assertions have any merit? To me it makes sense. Modern x86-64 CPUs are designed to be run with SMT, so it makes sense that they need an additional thread in "single threaded mode" to become completely saturated.

I must admit it makes no sense to me. The only possible reason to even care about ST performance is because many tasks are not optimized for multiple threads, or else are so sequential they’re run best on a single thread. It’s not too much of a simplification to say that the only two types of performance that matter are 1) total multithreaded performance of a processor and 2) the speed at which a core can run a single threaded instruction. SMT doesn’t run any single instruction faster, it simply uses an underutilized core to run a second thread. In other words, SMT can aid in boosting the first type of performance above but is largely useless for the second. Why would one even be interested in the performance of two threads run on the single core? The “adjusted” scores for the x86 chips tell you literally nothing about their capacity to execute a single thread performantly.

Am I missing something here? I know I’m an amateur with this stuff, but this article strikes me as deeply missing the point.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
My thoughts:

This whole conversation could be settled simply by running a diverse set of benchmarks and leaving it at that. When you go to buy a CPU, you find the one that runs the programs you want the fastest. That's it.

However irrelevant, though, it's still fun to find out who's the winner.

There are innumerable things you can evaluate:

1. Fastest CPU
2. Fastest core
3. Fastest handling of a single-threaded activity (1T)
4. Fastest handling of a multi-threaded activity (which you could further restrict or release, from 2T ... nT)

The chip with the fastest handling of a single-threaded activity therefore is not necessarily the chip with the core that can do work the fastest, unless the workload is strictly single-threaded. Similarly, the chip with the fastest handling of a multithreaded activity is not necessarily the chip with the core that can do work the fastest. And for each of those scenarios, you can further delineate based on power consumption. Just with these examples, then, there are 8+ different test scenarios.

Which one matters to the real world? Probably just #1 -- for the benchmarks you care about for your workload. And #1, with power consumption, if you're on a laptop.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Both sides on this argument have merit. I guess if we wanted to compare "how much work can Apple's core do compared to AMD or Intel's core" then we have to include the design decision to have SMT enabled on the chips that can do it. But if we are wondering which chip can handle a single-threaded task the fastest, that's another question.

This is a great point, and emphasizes the differences in platforms that both architectures attempt to tackle. Multithreaded code is of course much more prevalent in desktop/workstation/server applications than mobile, which is presumably why Intel and AMD have designed their CPUs with SMT and Apple have not. Also, since both Intel and AMD use the same core across multiple platforms including mobile, it stands to reason that their CPUs would never be as energy efficient as Apple's, as the focus just isn't as intense in that regard.

Which further raises the question of whether the mobile-focused M1 can ever be accurately compared against an x86-64 desktop chip to begin with. In the end I suppose it doesn't really matter. Most people are always going to get the chip/product that gives them the best performance at an acceptable price.

Someone else mentioned that we could consider running Zen3 core with SMT on --- against an M1 core with a Firestorm and Icestorm enabled, and see how things compare. But they're not the same thing, they use different die areas (that is, the die area for a single Zen3 core running with SMT on equals that of the core running with SMT off, whereas Firestorm alone has x die area, whereas Firestorm + Icestorm has x + y die area). Additionally, the Firestorm and Icestorm have different memory subsystem designs, etc., which allow further increases in cache availability which is important especially on L1/L2. We already know that 2 cores is better than 1 core with SMT; why would we re-hash that discussion?

I wouldn't mind seeing a Zen 3 based 4 core CPU with SMT in a multithreaded application, just for shits and grins. But I agree with you, as it would be mostly unproductive.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Am I missing something here? I know I’m an amateur with this stuff, but this article strikes me as deeply missing the point.

I suppose the point the author was making was that since modern x86-64 CPUs are all SMT capable, they cannot be fully saturated in a purely single threaded environment and thus it is inaccurate to say that the M1 has a higher single core performance than many x86-64 desktop CPUs. So to the author, there is a distinction between single thread and single core performance. And this is probably correct, because as I alluded to above, Intel and AMD design their CPUs for various platforms to include desktop, workstation and servers in addition to mobile. This is in direct contrast to Apple which focuses on mobile entirely, so outfitting their cores with SMT might be counterproductive.

One big question I have is, are there any taxing purely single threaded applications left? The most popular and common desktop applications by far are browsers, and every single one of them is multithreaded these days. It seems to me that single threaded performance is glorified by Apple proponents, but when was the last time you saw a performance intensive application running on a single thread?

High single threaded performance is only important in the context that you can achieve higher overall throughput. But single threaded performance in and of itself is to me quite irrelevant because I don't know of any high performance application that runs off a single thread.

If there are applications like that, I definitely want to know.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
The M1 handles the multi-threadedness of browsers and other light loads with its efficiency cores and fast ramp-up as well, combined with massive single-thread performance. IIRC much like games, browsers lean heavily on the first core (though I'm having trouble finding a source on that right now - admittedly, not looking too hard at the moment though!). I can't imagine too many browser tasks that would tax a 3300X, let alone a 4+4 M1. Plus, SMT costs more from a power standpoint than the M1's efficiency cores, which, when all four are 99+% active, consume a total of 1.32W.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
Since the author is at WCCFtech, I think the point was click bait riding on recent Apple M1 popularity. "Exclusive everyone but me is doing benchmarks wrong! M1 really sucks! Click to see important details!"

The point of ST benchmarks was never to saturate anything. IBM has a CPU with up to 8-way SMT; would you consider it appropriate to run tests only with a minimum of 8 threads?

ST benchmarks, full MT benchmarks, and real-world applications more than cover all the important cases, and that testing is already done.

There is no real need to suddenly run 2-thread-specific benchmarks; they really won't tell you anything you don't already get from the current testing.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Since the author is at WCCFtech, I think the point was click bait riding on recent Apple M1 popularity.

The point of ST benchmarks was never to saturate anything. IBM has a CPU with up to 8-way SMT; would you consider it appropriate to run tests only with a minimum of 8 threads?

ST benchmarks, full MT benchmarks, and real-world applications more than cover all the important cases, and that testing is already done.

There is no real need to suddenly run 2-thread-specific benchmarks; they really won't tell you anything you don't already get from the current testing.
A 2-threaded benchmark, an 8-threaded benchmark, and a 1-threaded benchmark are all equally valid.

They are all only useful practically if you know your use case uses x number of threads and ONLY x number of threads, and there doesn't exist an analogous real-world benchmark for that use case.

But I disagree that the 1-core (rather than 1-thread) benchmark won't tell you anything you don't get from the current testing. It will tell you who has designed the core that can process a given amount of work the fastest. That is an importantly different question than asking who has designed the core that can process a single-threaded workload the fastest.
 

thunng8

Member
Jan 8, 2013
165
69
101
Now here's an interesting thought. Despite the almost universal derision that many of us here have with WCCFTech, they posted an interesting article today.

Exclusive: Why Apple M1 Single "Core" Comparisons Are Fundamentally Flawed (With Benchmarks) (wccftech.com)

The article claims that unlike the M1, modern SMT capable x86-64 cores cannot be completely saturated when running purely single threaded code. Therefore, any comparative benchmarks between the M1 and say, Tiger Lake or Zen 3 in single threaded benchmarks are not accurate.

They did some benchmarks to prove their point. They ran Cinebench R23, in both pure single core mode, and single core mode with one additional thread running, effectively being 1 core + SMT. This is their result. As you can see, performance increases by a significant amount, allowing the much older Zen 2 and Skylake cores to beat the M1 in CB R23.

They did the same thing in Geekbench to a similar effect with a 9980xe, with performance increasing by about 19%.

System manufacturer System Product Name - Geekbench Browser

So what do you guys think? Do you believe these assertions have any merit? To me it makes sense. Modern x86-64 CPUs are designed to be run with SMT, so it makes sense that they need an additional thread in "single threaded mode" to become completely saturated.



This is the new score with Tiger Lake. Tiger Lake sees a healthy boost, putting it significantly ahead of the M1.

It is quite complicated. For many end-user computing tasks, a single thread can dominate the performance characteristics of the application, so single-threaded performance is still very relevant. For many applications that are multithreaded, the SMT score is quite a useful metric. But keep in mind Apple is heavily focused on performance/watt, so, for example, Apple can field 2 or more high-performance cores within the same power envelope as a single SMT-2 core (especially in comparison with Intel). I suspect we will soon see this with the upcoming 'm1x', where it will start dominating the mid-to-high-end laptops in benchmarks.

FYI, if you consider the best performance per core, IBM's POWER9, first released way back in 2017, is still the king on benchmarks optimized for the platform, and there are many examples where it still far outpaces any x86 processor in commercial server-type benchmarks. It can implement up to SMT-8. The POWER9, however, has significantly lower single-thread performance compared to AMD or Intel. IBM gets approx 3X speedup going from a single thread to SMT-8 on a single core.

e.g. a 40-core POWER9 system still scores higher than the latest 64-core EPYC in specint_rate (475 vs 385) and specfp_rate

and for SAP both running db2 10.5:
192 core power9 - 205,000 users

128 core EPYC - 59,000 users
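
Dividing the figures quoted above by core count gives the per-core comparison being argued here. This is a small sketch that uses only the numbers in this post; the variable and function names are illustrative, and the scores themselves are taken as quoted, not verified.

```python
# (system score, core count) as quoted in the post above.
specint_rate = {"POWER9": (475, 40), "EPYC": (385, 64)}
sap_users = {"POWER9": (205_000, 192), "EPYC": (59_000, 128)}


def per_core(results):
    """Return score per core for each system."""
    return {name: score / cores for name, (score, cores) in results.items()}


specint_per_core = per_core(specint_rate)
sap_per_core = per_core(sap_users)

for name in specint_per_core:
    print(f"{name}: {specint_per_core[name]:.2f} specint_rate/core, "
          f"{sap_per_core[name]:.0f} SAP users/core")

# The quoted ~3x speedup from one thread to SMT-8 means each of the
# eight threads averages about 3/8 of the single-thread rate.
per_thread_fraction = 3 / 8
print(per_thread_fraction)
```

The last line is why high per-core throughput and high single-thread performance are separate questions: under SMT-8 the core does about three times the work, but each individual thread runs slower than it would alone.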


And POWER10, to be released next year, should extend the lead even further.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
A 2-threaded benchmark, an 8-threaded benchmark, and a 1-threaded benchmark are all equally valid.

They are all only useful practically if you know your use case uses x number of threads and ONLY x number of threads, and there doesn't exist an analogous real-world benchmark for that use case.

But I disagree that the 1-core (rather than 1-thread) benchmark won't tell you anything you don't get from the current testing. It will tell you who has designed the core that can process a given amount of work the fastest. That is an importantly different question than asking who has designed the core that can process a single-threaded workload the fastest.

If you run a fully MT Embarrassingly Parallel benchmark, you can figure out the scaling you get from SMT, without any need to run a separate 2 thread benchmark.

Plus you would have to go through the trouble of disabling the other cores to run a 2 thread benchmark confined to a single core.
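
On Linux, confining a benchmark to one physical core does not strictly require disabling the other cores; pinning the process to that core's sibling logical CPUs is an alternative. A minimal sketch, assuming a Linux system that exposes the sysfs topology file; `parse_cpu_list` and `pin_to_core_of` are hypothetical helper names, and nothing here reflects how the article actually ran its tests.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,8' or '0-1' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_core_of(cpu=0):
    """Pin the caller to all SMT siblings of the given logical CPU (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        siblings = parse_cpu_list(f.read())
    os.sched_setaffinity(0, siblings)  # 0 means the caller itself
    return siblings
```

Calling `pin_to_core_of(0)` before launching the workload should keep a two-thread run on one core, since threads and child processes started afterwards inherit the affinity mask.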
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
If you run a fully MT Embarrassingly Parallel benchmark, you can figure out the scaling you get from SMT, without any need to run a separate 2 thread benchmark.

Plus you would have to go through the trouble of disabling the other cores to run a 2 thread benchmark confined to a single core.
The scaling is non-linear, so you need more than 2 data points.

Edit: To get more detailed -- extremely few workloads are perfectly parallel to infinity, so for each additional core, there is degradation of benefit, and that degradation is non-linear. And when you add SMT to the picture, its benefits vary depending on workload and # of cores as well, and that degradation of benefit is also non-linear.
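
One common way to model the diminishing returns described above is Amdahl's law. This is an illustrative simplification, not a measurement from this thread, and the 95% parallel fraction is an arbitrary assumption.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Assumed 95% parallel workload: each doubling of cores gains less than the last.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With a parallel fraction of exactly 1.0 the same formula gives perfectly linear scaling, so the disagreement comes down to how close a given benchmark is to that ideal.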
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
The scaling is non-linear, so you need more than 2 data points.

CB is embarrassingly parallel and scales essentially linearly.

Double the cores of the same kind and speed, and you will get double the performance. You will hit a wall eventually, but not at the scale of these processors.

It will tell you who has designed the core that can process a given amount of work the fastest.

Edit: Revising this. Why is this important?

ST performance is still important. Because outside of the Embarrassingly parallel code, most software still has significant ST bottlenecks. No amount of extra cores/SMT will speed those bottlenecks.

Where do we ever encounter a situation where we are confined to a single core with multiple threads, to make this way of benchmarking suddenly important?

It seems its main importance would be soothing egos bruised by M1 ST performance. Brilliance on WCCFTech's part. Not only will this click bait work today, but it will get brought up repeatedly by people who want to change the long-held standard of benchmarking ST performance to single-core performance, and they will link back to WCCFTech for more clicks...
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
CB is embarrassingly parallel and scales essentially linearly.

Double the cores of the same kind and speed, and you will get double the performance. You will hit a wall eventually, but not at the scale of these processors.
No, it does not have linear scaling. And no, you won't get double performance from double cores or double threads. This was demonstrated within this very thread, just two pages ago.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Revising this. Why is this important?

ST performance is still important. Because outside of the Embarrassingly parallel code, most software still has significant ST bottlenecks. No amount of extra cores/SMT will speed those bottlenecks.

Where do we ever encounter a situation where we are confined to a single core that makes this arbitrary "do multiple threads, but do them only on this core" suddenly important?

It seems its main importance would be soothing egos bruised by M1 ST performance. Brilliance on WCCFTech's part. Not only will this click bait work today, but it will get brought up repeatedly by people who want to change the long-held standard of benchmarking ST performance to single-core performance, and they will link back to WCCFTech for more clicks...
Here is what I've said just on this page alone:

"When you go to buy a CPU, you find the one that runs the programs you want the fastest. That's it."

"Which one matters to the real world? Probably just #1 ["Fastest CPU"] -- for the benchmarks you care about for your workload. And #1, with power consumption, if you're on a laptop."

"A 2-threaded benchmark, an 8-threaded benchmark, and a 1-threaded benchmark are all equally valid. They are all only useful practically if [...]"

So, I agree: it is not important practically. Confining a process to a single core would be interesting only as a technical investigation.

Also, let me take your question and ask a similar one: Where do we ever encounter a situation where we are confined to a single THREAD that makes this arbitrary "disregard SMT and the rest of the chip" suddenly important?

The only reason we're talking about pure ST performance, PPC/IPC, etc. is as a technical investigation. And in such a case, purely as a technical investigation, asking "which core can handle a single-threaded workload fastest" is just as valid as asking "which core can handle any load the fastest".

As for soothing egos, why would that be necessary? Whose ego is tied to the M1, Apple, AMD, Intel, x86, Arm, or anything else? Why are you being so antagonistic toward investigation and discussion?
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
This then leads to other questions about benchmarks in general:

Should we find a way to disable SMT on the 4800u/4900h when we test single threaded benchmarks to make sure that that one thread has the core entirely to itself?

Should we not use specially imaged computers that have sanitized images with absolutely nothing else running on them because we NEVER actually see computers used like that in the wild?

Should we even run single thread only benchmarks at all because few programs these days are exclusively single threaded?

The list goes on.

There is no perfect way of doing this stuff. You run a variety of benchmarks and publish your results and methods, hoping that your work helps and informs others.

Personally, I view the M1 as just being yet another way of extracting maximum efficiency from silicon. AMD tried it with CMT with the construction cores, and their implementation sucked. Intel went with SMT, extracting 20-25% more performance per core from it. AMD went back and got a few more percent from a similar approach. ARM figured that you would do best with dedicated smaller, high efficiency cores and larger, high performance cores instead of going with even larger and more complex SMT cores.

So, in my book, the M1 is comparable to Tiger Lake, but instead of having eight threads spread across four big SMT cores, it's four HP cores and four HE cores handling eight threads. By that metric, it is even more impressive.
 

Saylick

Diamond Member
Sep 10, 2012
3,389
7,154
136
I must admit it makes no sense to me. The only possible reason to even care about ST performance is because many tasks are not optimized for multiple threads, or else are so sequential they’re run best on a single thread. It’s not too much of a simplification to say that the only two types of performance that matter are 1) total multithreaded performance of a processor and 2) the speed at which a core can run a single threaded instruction. SMT doesn’t run any single instruction faster, it simply uses an underutilized core to run a second thread. In other words, SMT can aid in boosting the first type of performance above but is largely useless for the second. Why would one even be interested in the performance of two threads run on the single core? The “adjusted” scores for the x86 chips tell you literally nothing about their capacity to execute a single thread performantly.

Am I missing something here? I know I’m an amateur with this stuff, but this article strikes me as deeply missing the point.
The real amateur is the author of that WCCFTech article. He comes off as if he had found a big "gotcha" moment in Apple's advertising, and came up with some artificial "true" ST score multiplier that should be applied to ST benchmarks for SMT cores to make the comparison more apples to apples, when all he did was invent some academic, but practically useless, criteria to judge different CPU cores across entirely different ISAs. My bias is obvious now, but this article is typical of WCCFTech, making a mountain out of a molehill. Usman needs to learn to stay in his lane and just spit out what AIBs leak to him, because his lack of a software or semiconductor engineering background really shows in articles like these. It is almost as egregious as his article on why he thought the PS5's clockspeeds couldn't be maintained at the 2.3 GHz maximum: https://wccftech.com/sony-ps5-vs-xbox-series-x-analysis/ Lo and behold, we have Big Navi easily maintaining clocks north of 2.3 GHz and he had the nerve to post crap like that.

This Medium article does a far better job at explaining the differences between the cores without the author coming off as haughty or self-righteous: Why is Apple’s M1 Chip So Fast?
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
The real amateur is the author of that WCCFTech article. He comes off as if there was a big "gotcha" moment in Apple's advertising and came up with some artificial "true" ST score multiplier that should be applied to ST benchmarks for SMT cores to make the comparison more apples to apples, when all he did was invent an academic, but practically useless, criterion to judge different CPU cores across entirely different ISAs. My bias is obvious now, but this article is typical of WCCFTech, making a mountain out of a molehill. Usman needs to learn to stay in his lane and just spit out what AIBs leak to him, because his lack of a software or semiconductor engineering background really shows in articles like these. It is almost as egregious as his article on why he thought the PS5's clockspeeds couldn't be maintained at the 2.3 GHz maximum: https://wccftech.com/sony-ps5-vs-xbox-series-x-analysis/ Lo and behold, we have Big Navi easily maintaining clocks north of 2.3 GHz and he had the nerve to post crap like that.

This Medium article does a far better job at explaining the differences between the cores without the author coming off as haughty or self-righteous: Why is Apple’s M1 Chip So Fast?
While it explains the differences better, it has so many glaring errors that it's hard to take seriously.

The funniest one was claiming that AMD's APUs don't have IO controllers. Even the "CPU" from AMD has an IO controller.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
No, it does not have linear scaling. And no, you won't get double performance from double cores or double threads. This was demonstrated within this very thread, just two pages ago.

I have seen Cinebench scale linearly up to 24 threads on a Xeon processor. It's embarrassingly parallel. If it goes non-linear before that, it's most likely the processor throttling power at higher thread counts, or hitting some other processor issue (NUMA, inter-CCX, bandwidth).

Also, let me take your question and ask a similar one: Where do we ever encounter a situation where we are confined to a single THREAD that makes this arbitrary "disregard SMT and the rest of the chip" suddenly important?

As I already said, most software isn't embarrassingly parallel; there are single-threaded bottlenecks throughout that will only be sped up by ST performance. No amount of cores/SMT will speed up the ST bottlenecks that exist in most applications.

And it's not disregard for SMT and the rest of the chip. I said that with ST, full MT, and application testing, we have everything we need to know.

Not antagonistic to investigation, just to clickbait sites like WCCFTech, and the motivation behind this.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
I have seen it scale linearly up to 24 threads on a Xeon processor. It's embarrassingly parallel. If it goes non-linear before that, it's most likely the processor throttling power at higher thread counts, or hitting some other processor issue (NUMA, inter-CCX, bandwidth).
It doesn't scale linearly.

As I already said, most software isn't embarrassingly parallel; there are single-threaded bottlenecks throughout that will only be sped up by ST performance. No amount of cores/SMT will speed up the ST bottlenecks that exist in most applications.

Not antagonistic to investigation, just to clickbait sites like WCCFTech, and the motivation behind this.
There are also bottlenecks even at 2 threads, though fewer bottlenecks than at 1 thread. There are still fewer bottlenecks at 4 threads. But bottlenecks don't just stop at 1 thread.
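One way to frame the disagreement above (linear scaling versus bottlenecks at every thread count) is Amdahl's law. The sketch below is only an idealized model: the serial fractions are illustrative assumptions, not measurements of Cinebench or any real application, and the function name is made up for this example.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup on `threads` workers when `serial_fraction`
    of the work cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Assumed serial fractions: 0% (embarrassingly parallel), 5%, 50%.
    for serial in (0.0, 0.05, 0.5):
        row = ", ".join(
            f"{n}T: {amdahl_speedup(serial, n):.2f}x" for n in (1, 2, 4, 8, 24)
        )
        print(f"serial fraction {serial:.2f} -> {row}")
```

With a serial fraction of zero the model scales linearly, which is the embarrassingly parallel case. Any non-zero serial fraction bends the curve at every thread count, and the only way to shrink the serial part's runtime is a faster single thread.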
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The point of ST benchmarks was never to saturate anything. IBM has a CPU with up to 8-way SMT; would you consider it only appropriate to run tests with a minimum of 8 threads?

An interesting question, which kind of exposes the hilarity of this debate. If throughput and end performance are what matters the most, why even bother with single threaded benchmarks since they are artificially limited?

I mean, why bother running Geekbench in single thread mode, when it is clearly much faster in multithreaded mode for instance?

There is no real need to suddenly run 2 thread specific benchmarks, it really won't tell you anything you don't get from the currently done testing.

I can't say I agree here. Running a benchmark in single threaded mode, dual thread, quad thread etcetera can all be relevant depending on what you are attempting to accomplish.

For example, if you want to see how well an application can scale with multithreaded performance, many reviewers will benchmark the CPU in different configurations.
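As a rough sketch of what reviewers do with such runs: given the wall-clock time of the same workload at several thread counts, speedup and per-thread efficiency fall out directly. The timings below are invented purely for illustration and do not come from any real CPU; `scaling_table` is a hypothetical helper name.

```python
def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run.

    `timings` maps a thread count to the wall-clock seconds of one run.
    """
    if 1 not in timings:
        raise ValueError("need a 1-thread baseline run")
    baseline = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = baseline / timings[threads]
        efficiency = speedup / threads  # 1.0 would be perfectly linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
    for threads, speedup, efficiency in scaling_table(example):
        print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

The 1-thread row is the usual ST result and the highest row is the usual MT result; the rows in between show where scaling starts to fall off.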
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
It is quite complicated. For many end user computing tasks, a single thread can dominate the performance characteristics of the application - so single threaded performance is still very relevant.

I was certainly not trying to imply that single threaded performance is not relevant or important. It is EXTREMELY important.

My point was that it has no intrinsic meaning outside of how it contributes to the overall throughput of a CPU, because no intensive applications are purely single threaded anymore. Even on mobile, multicore is leveraged quite heavily, though not nearly as much as in other platforms like desktop/workstation/server due to power constraints.

For many applications that are multithreaded, the SMT score is quite a useful metric. But keep in mind that Apple is heavily focused on performance/watt, so, for example, Apple can field 2 or more high-performance cores (especially in comparison with Intel) within the same power envelope as a single SMT-2 core. I suspect we will soon see this with the upcoming 'm1x', where it will start dominating the mid-to-high-end laptops in benchmarks.

I agree that for Apple, it makes no sense to implement SMT.

FYI, if you consider the best performance per core - IBM's Power9, first released way back in 2017 is still the king on optimized benchmarks for the platform and there are many examples where it still far outpaces any x86 processor in commercial server type benchmarks. It can implement up to SMT-8. The Power9 however, has significantly lower single thread performance compared to AMD or Intel. IBM gets approx 3X speedup going from single thread to SMT-8 on a single core.

This ties into what I was saying earlier, about how various CPU manufacturers design their CPUs around their target workloads and platforms. With that said, why is it controversial if Apple creates a CPU that has the fastest single thread performance, if that type of workload is more prevalent in the mobile platform?
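To put the Power9 figure quoted above into per-thread terms, here is a small arithmetic sketch. It uses only the approximate 3X SMT-8 speedup mentioned in the post; the function name is made up, and the result is a back-of-the-envelope average, not a measurement.

```python
def per_thread_share(core_speedup: float, smt_ways: int) -> float:
    """Average throughput of each SMT thread, as a fraction of what one
    thread achieves when it has the whole core to itself."""
    if smt_ways < 1:
        raise ValueError("smt_ways must be at least 1")
    return core_speedup / smt_ways


if __name__ == "__main__":
    # Approximate figure from the post: ~3x core throughput at SMT-8.
    share = per_thread_share(3.0, 8)
    print(f"Each of the 8 threads runs at about {share:.3f} of lone-thread speed")
```

So the core as a whole gets roughly three times the work done, while each individual thread advances at well under half the speed it would have alone. That is why SMT throughput and single-thread speed answer different questions.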
 

Nicola Telecco

Junior Member
Nov 22, 2020
3
6
36
My free thoughts:

It is up to the OS kernel's load balancing algorithms, together with the "special sauce" built into the SoC power management circuitry, to maximize thread execution efficiency while preserving system integrity and responsiveness within all the defined thermal boundaries.

Since temperature and bias play a major role in silicon aging, the modern kernels of MP systems do not lock the execution of single threads to individual cores, unless the threads are short "enough" in time (for true efficient "burst" execution).
"enough" is clearly not an absolute and constant number, as it is calibrated based on the system and its boundaries, although the order of magnitude to keep in mind "here" (portable personal computing) may be ms, certainly not tens of seconds or minutes (...)

I'm not a "pro" developer, but I don't know (at least on macOS) of any way for a developer or a user to pin a thread's execution to a fixed CPU core. There may be private APIs to do so (...), or it may be solely the kernel's prerogative (...).
I am certainly curious to learn more about the efficient thread management "secrets" in macOS on AS systems.
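For comparison, here is a minimal sketch of what explicit pinning looks like where the OS does expose it. Python's `os.sched_setaffinity` exists only on some Unix platforms such as Linux and is not available on macOS, so the helper below (a hypothetical name) simply reports whether pinning was possible. As far as I know, macOS steers threads through quality-of-service classes rather than core masks, but treat that as an assumption, not a statement about Apple's scheduler internals.

```python
import os


def try_pin_to_one_cpu() -> bool:
    """Try to pin the calling process to a single CPU, then undo it.

    Returns True if the OS allowed the pin, False if the platform has
    no such call (for example macOS) or the request was refused.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # no affinity API exposed on this platform
    try:
        original = os.sched_getaffinity(0)  # CPUs we may currently run on
        target = min(original)              # pick any one allowed CPU
        os.sched_setaffinity(0, {target})
        pinned = os.sched_getaffinity(0) == {target}
        os.sched_setaffinity(0, original)   # restore the previous mask
    except OSError:
        return False
    return pinned


if __name__ == "__main__":
    print("pinning supported:", try_pin_to_one_cpu())
```

A benchmark that pins itself like this really does measure one core, while an unpinned "single thread" run measures whatever the kernel decides to do with that thread.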

As a result, with the traditional benchmarking of an MP system's "single core" performance, we are really benchmarking the kernel and the whole system, not just a single core. From this perspective, it may be more semantically correct to refer to "single thread" mode when benchmarking an MP system using long single-thread loads, instead of the commonly used "single core" mode. Of course, let's be clear: semantics do not change the overall perf/W stories (...)

Furthermore, unless they are calibrated to be suitable for true burst mode execution across the different systems, single thread benchmark loads may take different levels of power management resources and energy to be moved around across the cpu cores. For example, the higher the # of cpu cores "exposed" to the thread, the higher the system overhead. From this perspective, we could argue that long single thread benchmarks may tend to penalize higher core count MP systems (...).
 

jeanlain

Member
Oct 26, 2020
159
136
86
They changed it (likely after pushback) because when they announced it, they said it was the fastest CPU core and they didn't offer any specificity beyond that.

Now, they should have said "at single threaded tasks" to account for SMT, but I suppose everyone understood that.
Note also that the tests Apple relies on were performed in October with commercially available CPUs. At that time, the fastest core, including desktop CPUs, was Intel's (the 10900K I suppose), which is beaten by the M1 at almost every single-threaded task. So Apple's claim appears quite conservative. Specifying "when it comes to low-power silicon" makes the claim valid today.
Still, the M1 trades blows with the current best desktop CPU core in ST SPEC tests.
 