Question Zen 6 Speculation Thread


esquared

Forum Director & Omnipotent Overlord
Forum Director
Oct 8, 2000
23,769
4,962
146
Way too many reported posts.

The Intel-talk needs to stop unless it is absolutely relevant to the topic at hand.
Which I will remind you is a Zen 6 speculation thread.

Continuing to do so after this official post will most likely result in infractions and vacations


esquared
Anandtech Forum Director
 

Anhiel

Member
May 12, 2022
69
27
61
Well, it's good Intel provided some official data of their own. The numbers are slightly worse than what I saw in past AnandTech articles.
I have used that past data, with the exact claimed numbers, in various calculations I've posted before. So nothing world-changing.

Anyhow, by comparison the Zen architecture is superior, but also somewhat worse depending on how you interpret things.

There's only a 2% difference in power. You could say it's bad because it needed to keep a lot powered regardless of usage.
On the other hand the Zen architecture on average has a higher SMT gain (1.25x) than Intel (1.15x).

What most here don't realize or understand is that performance has to be evaluated differently for SMT vs. ST.
Also, to keep things simple, we only care about performance and maybe energy consumption, but not area etc.

In SMT-2 the work is divided into 2 threads, and each contributes a different amount of performance. The difference relative to iso-ST is the main reason for the gain or loss with or without SMT.
With SMT, Zen has a 12% latency penalty, so the 1st thread runs at 1/1.12 = 0.893 ≈ 90%; the 2nd thread contributes 1.25 - 0.90 = 0.35 = 35%.
As you can see, without SMT it would only gain 10% while losing 25%.
Intel gains 10% and loses 15%.
That's why I said more than a year ago that AMD doesn't benefit from losing SMT the way Intel does.
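Restating that arithmetic as a quick sketch (only the assumed numbers above go in: the 12% latency penalty and the 1.25x/1.15x aggregate gains; reusing the same penalty for Intel is purely for illustration):

```python
# Per-thread split under the post's assumed numbers (nothing measured):
# a latency penalty on the 1st thread plus an aggregate SMT gain.
def per_thread_split(aggregate_gain, latency_penalty=0.12):
    t1 = 1 / (1 + latency_penalty)    # 1st thread, slowed by the SMT penalty
    t2 = aggregate_gain - t1          # 2nd thread gets the rest of the aggregate
    return t1, t2

print(per_thread_split(1.25))   # Zen:   (~0.89, ~0.36) of lone-thread speed
print(per_thread_split(1.15))   # Intel: (~0.89, ~0.26), same penalty assumed
```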
 

naukkis

Senior member
Jun 5, 2002
768
633
136
In SMT-2 the work is divided into 2 threads, and each contributes a different amount of performance. The difference relative to iso-ST is the main reason for the gain or loss with or without SMT.
With SMT, Zen has a 12% latency penalty, so the 1st thread runs at 1/1.12 = 0.893 ≈ 90%; the 2nd thread contributes 1.25 - 0.90 = 0.35 = 35%.
As you can see, without SMT it would only gain 10% while losing 25%.
Intel gains 10% and loses 15%.
That's why I said more than a year ago that AMD doesn't benefit from losing SMT the way Intel does.

SMT stands for symmetrical multithreading. So both threads run at similar speeds; with a 25% SMT gain, that means both threads run at 62.5% of 1T speed.
 
Reactions: Bigos

StefanR5R

Elite Member
Dec 10, 2016
5,680
8,223
136
Think of the process scheduler (task scheduler) of an operating system kernel.
  • Coming up with a most-of-the-time-near-optimum scheduling policy on homogeneous multicore SMT processors is hard.
  • On heterogeneous non-SMT processors it is even harder.
  • Forget about optimum scheduling on heterogeneous SMT processors, outside of a narrow band of load scenarios which the scheduler architects researched deeply enough, on the given hardware (which varies between CPU models of a generation/ varies by whole system performance characteristics, e.g. RAM performance/ and varies between CPU generations).
Edit, PS:
AMD introduced heterogeneity with Zen 4 Phoenix 2, but in a way which is *way* easier to deal with in real software than on Intel's heterogeneous client CPUs. Allegedly, AMD will go another step into heterogeneity with Zen 5 Strix Halo, but in a way which is targeted at very specific load scenarios. (A low-power island for background stuff/ always-on stuff...? Thus, probably not posing the same degree of SMT conundrums as Intel's current client CPUs.) Let's see to what degree heterogeneity will be present in Zen 6. My guess is that heterogeneity won't be seen in either AMD's or Intel's server CPUs, just as at present.
 
Last edited:
Reactions: Tlh97

naukkis

Senior member
Jun 5, 2002
768
633
136
Simultaneous.
SMP is symmetric multiprocessing, though.

Simultaneous and symmetric multithreading are synonyms, both SMT. There's no way in SMT to split running threads unequally; both threads share resources equally. The opposite is asymmetric or non-simultaneous multithreading, which is not implemented in any x86 CPU.
 
Reactions: Tlh97 and Bigos

naukkis

Senior member
Jun 5, 2002
768
633
136
That's definitely not true for SMT on Zen - resources can be dynamically partitioned between the two threads.

Internal structures are split according to the threads' needs, but the threads are equal - there's no way to differentiate the speed of the executed threads.
 

naukkis

Senior member
Jun 5, 2002
768
633
136
No need to speculate, the SOG says they are shared dynamically. Only 3 resources are statically partitioned; the rest are not.

Dynamically shared doesn't mean that threads are not symmetrical. Both threads have equal rights to all CPU structures and caches, and those are given to whichever thread needs them. By sharing them dynamically the CPU makes the best use of its resources. Both running threads still have equal rights to resources - and if they run exactly the same code, everything should be shared evenly.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
Dynamically shared doesn't mean that threads are not symmetrical. Both threads have equal rights to all CPU structures and caches, and those are given to whichever thread needs them. By sharing them dynamically the CPU makes the best use of its resources. Both running threads still have equal rights to resources - and if they run exactly the same code, everything should be shared evenly.
Even when you share code, the data used will be different, so miss traffic will be different, so the stalls that allow switching between HW threads will occur at different times. So no, you can't deduce that being 25% faster in SMT means each thread runs at 62.5% of ST.
 

Bigos

Member
Jun 2, 2019
137
320
136
If both threads run roughly the same code that might be an approximation but even then it might not be exactly true.

What I do not like is people thinking there is a "main thread" and "SMT thread" and the "main thread" is 100% and the "SMT thread" is 25% or whatever the speed-up from SMT for a particular workload is. That is not how the force works.

For a +25% speed-up it means the two threads have an average speed of 62.5% of a lone thread, but one can be faster than the other and this can vary over time.
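A trivial numeric sketch of that point, with made-up splits: any pair of per-thread speeds summing to 1.25 yields the same +25% aggregate.

```python
# Made-up per-thread speeds (fractions of lone-thread speed). Each pair sums
# to 1.25, i.e. the same +25% aggregate SMT gain, yet the split differs.
for a, b in [(0.625, 0.625), (0.75, 0.50), (0.90, 0.35)]:
    print(f"thread A {a:.1%}, thread B {b:.1%} -> aggregate {a + b:.2f}x")
```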
 

naukkis

Senior member
Jun 5, 2002
768
633
136
Even when you share code, the data used will be different, so miss traffic will be different, so the stalls that allow switching between HW threads will occur at different times. So no, you can't deduce that being 25% faster in SMT means each thread runs at 62.5% of ST.

You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.
You can't deduce how SMT performs by considering the deficient first implementation Intel made.

For the rest of your claim, I won't try to disprove it as my experience with SMT implementation is not deep enough, though I think I know a little more than you about it.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
If both threads run roughly the same code that might be an approximation but even then it might not be exactly true.

What I do not like is people thinking there is a "main thread" and "SMT thread" and the "main thread" is 100% and the "SMT thread" is 25% or whatever the speed-up from SMT for a particular workload is. That is not how the force works.

For a +25% speed-up it means the two threads have an average speed of 62.5% of a lone thread, but one can be faster than the other and this can vary over time.
It might be an interesting experiment to run the exact same 1T program on two SMT threads and see how it goes. Of course one would have to force the OS scheduler not to move threads by using taskset. But life's too short 😀
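For what it's worth, a rough sketch of that experiment on Linux, pinning with os.sched_setaffinity instead of taskset; it assumes logical CPUs 0 and 1 are SMT siblings of one core, and the busy loop is only a stand-in workload:

```python
import os
import time
from multiprocessing import Process

def busy_work(cpu, n=20_000_000):
    # Pin this process to one logical CPU, then run a CPU-bound loop.
    os.sched_setaffinity(0, {cpu})
    t0 = time.perf_counter()
    x = 0
    for i in range(n):
        x += i * i
    print(f"cpu{cpu}: {time.perf_counter() - t0:.2f} s")

if __name__ == "__main__":
    # Baseline: one instance alone on logical CPU 0.
    p = Process(target=busy_work, args=(0,))
    p.start(); p.join()

    # SMT case: two instances at the same time on logical CPUs 0 and 1
    # (assumed siblings of the same physical core; check
    # /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
    procs = [Process(target=busy_work, args=(cpu,)) for cpu in (0, 1)]
    for p in procs: p.start()
    for p in procs: p.join()
```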
 

SarahKerrigan

Senior member
Oct 12, 2014
585
1,397
136
You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.

There is switching in pretty much the whole frontend in most SMT implementations - i.e., hardly anyone does concurrent fetch across two threads. Additionally, in some SMT implementations there are more threads than there are backend resources, so switching happens there too (usually with a thread-pick stage of some kind).
 

blackangus

Member
Aug 5, 2022
115
161
76
So yes, both threads execute at about half the speed of one thread running on that same core.
With a 62.5% average speed from your quote, wouldn't that be more like they execute both threads at about 2/3rds the speed?
Or is my medicine addled brain playing tricks on me?
 

naukkis

Senior member
Jun 5, 2002
768
633
136
There is switching in pretty much the whole frontend in most SMT implementations - i.e., hardly anyone does concurrent fetch across two threads. Additionally, in some SMT implementations there are more threads than there are backend resources, so switching happens there too (usually with a thread-pick stage of some kind).

In x86 implementations the decoder decodes from a different thread's feed every other clock cycle. But that's 100% equal thread balancing without switching execution from one thread to the other - execution from that point on is buffered, both threads are treated equally, and every clock cycle both threads' ops are executed if the threads aren't stalled. SMT is an implementation where threads run concurrently without switching - if thread-execution switching is happening, that multi-threading scheme is something other than SMT.
 

naukkis

Senior member
Jun 5, 2002
768
633
136
With a 62.5% average speed from your quote, wouldn't that be more like they execute both threads at about 2/3rds the speed?
Or is my medicine addled brain playing tricks on me?

Nearer to half than to full speed. With SMT4 you only get one fourth of single-thread speed when all threads are in use. The point is, for usual MT workloads you need the best possible per-thread execution performance to get the best possible MT scaling, due to Amdahl's law. Using SMT to reduce thread execution speed is mostly harmful to usual workloads; it's not a free lunch. And Intel states that they got a 15% performance/watt uplift from ditching SMT altogether.
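A neutral Amdahl's-law sketch of that trade-off (every number below is an assumption, none comes from AMD or Intel): the extra SMT threads help the parallel phase, but the result depends on how much time is spent with few runnable threads at reduced per-thread speed.

```python
# Relative execution time under Amdahl's law (lower is better). Assumptions:
# serial fraction s runs on one thread at ser_speed, the parallel part is
# spread over `threads` threads each running at par_speed (both speeds are
# fractions of a lone full-speed thread).
def amdahl_time(s, threads, par_speed, ser_speed=1.0):
    return s / ser_speed + (1 - s) / (threads * par_speed)

s, cores = 0.10, 8                                        # assumed 10% serial, 8 cores
print(amdahl_time(s, cores, 1.0))                         # no SMT: 0.2125
print(amdahl_time(s, 2 * cores, 0.625))                   # SMT, serial at full speed: 0.19
print(amdahl_time(s, 2 * cores, 0.625, ser_speed=0.625))  # SMT, serial also slowed: 0.25
```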
 

naukkis

Senior member
Jun 5, 2002
768
633
136
It might be an interesting experiment to run the exact same 1T program on two SMT threads and see how it goes. Of course one would have to force the OS scheduler not to move threads by using taskset. But life's too short 😀

That's not a time-consuming experiment. Actually, I have to ask: why haven't you tried it? Check for yourself: take any program, run two instances, give them affinity to both threads of one core, and see how they perform - within about 99% of each other's speed. A multi-threading implementation that could produce super-slow threads would actually be the worst possible configuration for user-oriented use cases like desktops and laptops. It would also ruin MT performance for most workloads that aren't hand-tuned to keep such low-speed threads in mind.
 

StefanR5R

Elite Member
Dec 10, 2016
5,680
8,223
136
Intel states that they got a 15% performance/watt uplift from ditching SMT altogether.
It's a null statement, pretty much useless to discuss. Completely rhetorical questions:
To which workloads did Intel refer? (Applications and their configs/ datasets, and operating system, including kernel version.)
Was that for heterogeneous cores or homogeneous cores?
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
It's a null statement, pretty much useless to discuss. Completely rhetorical questions:
To which workloads did Intel refer? (Applications and their configs/ datasets, and operating system, including kernel version.)
Was that for heterogeneous cores or homogeneous cores?
And as I read the slide, it's almost impossible to say the 15% decrease in power is due only to SMT removal. To have the correct figure you'd need two designs, both properly tuned for the feature or for its absence. Economically this makes no sense. So they have to sell their decision.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
That's not a time-consuming experiment. Actually, I have to ask: why haven't you tried it? Check for yourself: take any program, run two instances, give them affinity to both threads of one core, and see how they perform - within about 99% of each other's speed. A multi-threading implementation that could produce super-slow threads would actually be the worst possible configuration for user-oriented use cases like desktops and laptops. It would also ruin MT performance for most workloads that aren't hand-tuned to keep such low-speed threads in mind.
Thing is I'm not really interested in the experiment, it's just a thought exercise. OTOH I was not the one making the original claim, so I think the burden of proof is on you 😉
 