Question Zen 6 Speculation Thread


esquared

Forum Director & Omnipotent Overlord
Forum Director
Oct 8, 2000
23,769
4,962
146
Way too many reported posts.

The Intel-talk needs to stop unless it is absolutely relevant to the topic at hand.
Which I will remind you is a Zen 6 speculation thread.

Continuing to do so after this official post will most likely result in infractions and vacations


esquared
Anandtech Forum Director
 

Anhiel

Member
May 12, 2022
69
27
61
Well, it's good Intel provided some official data of their own. The numbers are slightly worse than what I saw in past AnandTech articles.
I have used that past data, with the exact claimed numbers, in various calculations I've posted before. So nothing world-changing.

Anyhow, by comparison the Zen architecture is superior, but also somewhat worse depending on how you interpret things.

There's only a 2% difference in power. You could say it's bad because it needed to keep a lot powered regardless of usage.
On the other hand the Zen architecture on average has a higher SMT gain (1.25x) than Intel (1.15x).

What most here don't realize or understand is that performance has to be evaluated differently for SMT vs. ST.
Also, to keep things simple, we only care about performance and maybe energy consumption, but not area etc.

In SMT-2 the work is divided into 2 threads, and each contributes a different amount of performance. The difference relative to iso-ST is the main reason for the gain or loss with or without SMT.
With SMT, Zen has a 12% latency penalty, so the 1st thread runs at 1/1.12 = 0.893 ≈ 90%; the 2nd thread contributes 1.25 - 0.90 = 0.35 = 35%.
As you can see, without SMT it would only gain 10% while losing 25%.
Intel gains 10% and loses 15%.
That's why I said more than a year ago that AMD doesn't benefit from losing SMT the way Intel does.
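Restating that arithmetic as a quick sketch (only the assumed numbers above go in: the 12% latency penalty and the 1.25x/1.15x aggregate gains; reusing the same penalty for Intel is purely for illustration):

```python
# Per-thread split under the post's assumed numbers (nothing measured):
# a latency penalty on the 1st thread plus an aggregate SMT gain.
def per_thread_split(aggregate_gain, latency_penalty=0.12):
    t1 = 1 / (1 + latency_penalty)    # 1st thread, slowed by the SMT penalty
    t2 = aggregate_gain - t1          # 2nd thread gets the rest of the aggregate
    return t1, t2

print(per_thread_split(1.25))   # Zen:   (~0.89, ~0.36) of lone-thread speed
print(per_thread_split(1.15))   # Intel: (~0.89, ~0.26), same penalty assumed
```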
 

naukkis

Senior member
Jun 5, 2002
768
633
136
In SMT-2 the work is divided into 2 threads, and each contributes a different amount of performance. The difference relative to iso-ST is the main reason for the gain or loss with or without SMT.
With SMT, Zen has a 12% latency penalty, so the 1st thread runs at 1/1.12 = 0.893 ≈ 90%; the 2nd thread contributes 1.25 - 0.90 = 0.35 = 35%.
As you can see, without SMT it would only gain 10% while losing 25%.
Intel gains 10% and loses 15%.
That's why I said more than a year ago that AMD doesn't benefit from losing SMT the way Intel does.

SMT stands for symmetrical multithreading. So both threads run at similar speeds; with a 25% SMT gain, that means both threads run at 62.5% of 1T speed.
 
Reactions: Bigos

StefanR5R

Elite Member
Dec 10, 2016
5,680
8,223
136
Think of the process scheduler (task scheduler) of an operating system kernel.
  • Coming up with a most-of-the-time-near-optimum scheduling policy on homogeneous multicore SMT processors is hard.
  • On heterogeneous non-SMT processors it is even harder.
  • Forget about optimum scheduling on heterogeneous SMT processors, outside of a narrow band of load scenarios which the scheduler architects researched deeply enough, on the given hardware (which varies between CPU models of a generation/ varies by whole system performance characteristics, e.g. RAM performance/ and varies between CPU generations).
Edit, PS:
AMD introduced heterogeneity with Zen 4 Phoenix 2, but in a way which is *way* easier to deal with in real software than on Intel's heterogeneous client CPUs. Allegedly, AMD will go another step into heterogeneity with Zen 5 Strix Halo, but in a way which is targeted at very specific load scenarios. (A low-power island for background stuff/ always-on stuff...? Thus, probably not posing the same degree of SMT conundrums as Intel's current client CPUs.) Let's see to what degree heterogeneity will be present in Zen 6. My guess is that heterogeneity won't be seen in either AMD's or Intel's server CPUs, just as at present.
 
Last edited:
Reactions: Tlh97

naukkis

Senior member
Jun 5, 2002
768
633
136
Simultaneous.
SMP is symmetric multiprocessing, though.

Simultaneous and symmetric multithreading are synonyms, both SMT. There's no way in SMT to split running threads unequally; both threads share resources equally. The opposite is asymmetric or non-simultaneous multithreading, which is not implemented in any x86 CPU.
 
Reactions: Tlh97 and Bigos

naukkis

Senior member
Jun 5, 2002
768
633
136
That's definitely not true for SMT on Zen - resources can be dynamically partitioned between the two threads.

Internal structures are split according to the threads' needs, but the threads are equal - there's no way to differentiate the speed of the executed threads.
 

naukkis

Senior member
Jun 5, 2002
768
633
136
No need to speculate, the SOG says they are shared dynamically. Only 3 resources are statically partitioned; the rest are not.

Dynamically shared doesn't mean that threads are not symmetrical. Both threads have equal rights to all CPU structures and caches, and those are given to whichever thread needs them. By sharing them dynamically the CPU makes the best use of its resources. Both running threads still have equal rights to resources - and if they run exactly the same code, everything should be shared evenly.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
Dynamically shared doesn't mean that threads are not symmetrical. Both threads have equal rights to all CPU structures and caches, and those are given to whichever thread needs them. By sharing them dynamically the CPU makes the best use of its resources. Both running threads still have equal rights to resources - and if they run exactly the same code, everything should be shared evenly.
Even when you share code, the data used will be different, so miss traffic will be different, so the stalls that allow switching between HW threads will occur at different times. So no, you can't deduce that being 25% faster in SMT means each thread runs at 62.5% of ST.
 

Bigos

Member
Jun 2, 2019
137
320
136
If both threads run roughly the same code that might be an approximation but even then it might not be exactly true.

What I do not like is people thinking there is a "main thread" and "SMT thread" and the "main thread" is 100% and the "SMT thread" is 25% or whatever the speed-up from SMT for a particular workload is. That is not how the force works.

For a +25% speed-up it means the two threads have an average speed of 62.5% of a lone thread, but one can be faster than the other and this can vary over time.
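A trivial numeric sketch of that point, with made-up splits: any pair of per-thread speeds summing to 1.25 yields the same +25% aggregate.

```python
# Made-up per-thread speeds (fractions of lone-thread speed). Each pair sums
# to 1.25, i.e. the same +25% aggregate SMT gain, yet the split differs.
for a, b in [(0.625, 0.625), (0.75, 0.50), (0.90, 0.35)]:
    print(f"thread A {a:.1%}, thread B {b:.1%} -> aggregate {a + b:.2f}x")
```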
 

naukkis

Senior member
Jun 5, 2002
768
633
136
Even when you share code, the data used will be different, so miss traffic will be different, so the stalls that allow switching between HW threads will occur at different times. So no, you can't deduce that being 25% faster in SMT means each thread runs at 62.5% of ST.

You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.
You can't deduce how SMT performs by considering the deficient first implementation Intel made.

For the rest of your claim, I won't try to disprove it as my experience with SMT implementation is not deep enough, though I think I know a little more than you about it.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
If both threads run roughly the same code that might be an approximation but even then it might not be exactly true.

What I do not like is people thinking there is a "main thread" and "SMT thread" and the "main thread" is 100% and the "SMT thread" is 25% or whatever the speed-up from SMT for a particular workload is. That is not how the force works.

For a +25% speed-up it means the two threads have an average speed of 62.5% of a lone thread, but one can be faster than the other and this can vary over time.
It might be an interesting experiment to run the exact same 1T program on two SMT threads and see how it goes. Of course one would have to force the OS scheduler not to move threads by using taskset. But life's too short 😀
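For what it's worth, a rough sketch of that experiment on Linux, pinning with os.sched_setaffinity instead of taskset; it assumes logical CPUs 0 and 1 are SMT siblings of one core, and the busy loop is only a stand-in workload:

```python
import os
import time
from multiprocessing import Process

def busy_work(cpu, n=20_000_000):
    # Pin this process to one logical CPU, then run a CPU-bound loop.
    os.sched_setaffinity(0, {cpu})
    t0 = time.perf_counter()
    x = 0
    for i in range(n):
        x += i * i
    print(f"cpu{cpu}: {time.perf_counter() - t0:.2f} s")

if __name__ == "__main__":
    # Baseline: one instance alone on logical CPU 0.
    p = Process(target=busy_work, args=(0,))
    p.start(); p.join()

    # SMT case: two instances at the same time on logical CPUs 0 and 1
    # (assumed siblings of the same physical core; check
    # /sys/devices/system/cpu/cpu0/topology/thread_siblings_list).
    procs = [Process(target=busy_work, args=(cpu,)) for cpu in (0, 1)]
    for p in procs: p.start()
    for p in procs: p.join()
```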
 

SarahKerrigan

Senior member
Oct 12, 2014
585
1,397
136
You don't get what simultaneous means. Threads won't switch - both threads are executed concurrently all the time. So yes, both threads execute at about half the speed of one thread running on that same core. That was a bit of a surprise to a few of my friends when the P4 with HT arrived and Seti@home was a thing - Seti running on an HT P4 actually slowed the system down, which didn't happen on single-thread systems, since idle-priority threads only used idle CPU time.

And it seems that many people don't get what SMT actually does - it slows thread execution speed to about half. That's just the opposite of what most people need and want from a fast system.

There is switching in pretty much the whole frontend in most SMT implementations - i.e., hardly anyone does concurrent fetch across two threads. Additionally, in some SMT implementations there are more threads than there are backend resources, so switching happens there too (usually with a thread-pick stage of some kind).
 

blackangus

Member
Aug 5, 2022
115
161
76
So yes, both threads execute at about half the speed of one thread running on that same core.
With a 62.5% average speed from your quote, wouldn't that be more like they execute both threads at about 2/3rds the speed?
Or is my medicine addled brain playing tricks on me?
 

naukkis

Senior member
Jun 5, 2002
768
633
136
There is switching in pretty much the whole frontend in most SMT implementations - i.e., hardly anyone does concurrent fetch across two threads. Additionally, in some SMT implementations there are more threads than there are backend resources, so switching happens there too (usually with a thread-pick stage of some kind).

In x86 implementations the decoder decodes from a different thread's feed every other clock cycle. But that's 100% equal thread balancing without switching execution from one thread to the other - execution from that point on is buffered, both threads are treated equally, and every clock cycle both threads' ops are executed if the threads aren't stalled. SMT is an implementation where threads run concurrently without switching - if thread-execution switching is happening, that multi-threading scheme is something other than SMT.
 

naukkis

Senior member
Jun 5, 2002
768
633
136
With a 62.5% average speed from your quote, wouldn't that be more like they execute both threads at about 2/3rds the speed?
Or is my medicine addled brain playing tricks on me?

Nearer to half than to full speed. With SMT4 you only get one fourth of single-thread speed when all threads are in use. The point is, for usual MT workloads you need the best possible per-thread execution performance to get the best possible MT scaling, due to Amdahl's law. Using SMT to reduce thread execution speed is mostly harmful to usual workloads; it's not a free lunch. And Intel states that they got a 15% performance/watt uplift from ditching SMT altogether.
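A neutral Amdahl's-law sketch of that trade-off (every number below is an assumption, none comes from AMD or Intel): the extra SMT threads help the parallel phase, but the result depends on how much time is spent with few runnable threads at reduced per-thread speed.

```python
# Relative execution time under Amdahl's law (lower is better). Assumptions:
# serial fraction s runs on one thread at ser_speed, the parallel part is
# spread over `threads` threads each running at par_speed (both speeds are
# fractions of a lone full-speed thread).
def amdahl_time(s, threads, par_speed, ser_speed=1.0):
    return s / ser_speed + (1 - s) / (threads * par_speed)

s, cores = 0.10, 8                                        # assumed 10% serial, 8 cores
print(amdahl_time(s, cores, 1.0))                         # no SMT: 0.2125
print(amdahl_time(s, 2 * cores, 0.625))                   # SMT, serial at full speed: 0.19
print(amdahl_time(s, 2 * cores, 0.625, ser_speed=0.625))  # SMT, serial also slowed: 0.25
```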
 

naukkis

Senior member
Jun 5, 2002
768
633
136
It might be an interesting experiment to run the exact same 1T program on two SMT threads and see how it goes. Of course one would have to force the OS scheduler not to move threads by using taskset. But life's too short 😀

That's not a time-consuming experiment. Actually, I have to ask: why haven't you tried it? Check for yourself: take any program, run two instances, give them affinity to both threads of one core, and see how they perform - within about 99% of each other's speed. A multi-threading implementation that could produce super-slow threads would actually be the worst possible configuration for user-oriented use cases like desktops and laptops. It would also ruin MT performance for most workloads that aren't hand-tuned to keep such low-speed threads in mind.
 

StefanR5R

Elite Member
Dec 10, 2016
5,680
8,223
136
Intel states that they got a 15% performance/watt uplift from ditching SMT altogether.
It's a null statement, pretty much useless to discuss. Completely rhetorical questions:
To which workloads did Intel refer? (Applications and their configs/ datasets, and operating system, including kernel version.)
Was that for heterogeneous cores or homogeneous cores?
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
It's a null statement, pretty much useless to discuss. Completely rhetorical questions:
To which workloads did Intel refer? (Applications and their configs/ datasets, and operating system, including kernel version.)
Was that for heterogeneous cores or homogeneous cores?
And as I read the slide, it's almost impossible to say the 15% decrease in power is due only to SMT removal. To have the correct figure you'd need two designs, both properly tuned for the feature or for its absence. Economically this makes no sense. So they have to sell their decision.
 

Nothingness

Platinum Member
Jul 3, 2013
2,717
1,347
136
That's not a time-consuming experiment. Actually, I have to ask: why haven't you tried it? Check for yourself: take any program, run two instances, give them affinity to both threads of one core, and see how they perform - within about 99% of each other's speed. A multi-threading implementation that could produce super-slow threads would actually be the worst possible configuration for user-oriented use cases like desktops and laptops. It would also ruin MT performance for most workloads that aren't hand-tuned to keep such low-speed threads in mind.
Thing is I'm not really interested in the experiment, it's just a thought exercise. OTOH I was not the one making the original claim, so I think the burden of proof is on you 😉
 