Question Zen 6 Speculation Thread

gdansk · Aug 12, 2024

carancho said:
And why wouldn't it be possible to achieve the ST performance levels of Apple? Without resorting to discredited ISA superiority arguments....

Relatively? They stand no chance - Apple is on a 12 month cadence, has a higher R&D budget, monopolizes leading nodes, and ships far more units than AMD.

Absolutely? They are losing to designs that neither have nor need a uop cache, and save area and power as a result.

But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation. Unless Intel can deliver another Skymont-like upgrade (and I'm not convinced) it seems we're at the whimpering end of the less competitive x64 market. There's no way Zen 6 is competitive without more cores in 2026.

poke01 · Aug 12, 2024

gdansk said:
Relatively? They stand no chance - Apple is on a 12 month cadence, has a higher R&D budget, monopolizes leading nodes, and ships far more units than AMD.

Absolutely? They are losing to designs that neither have nor need a uop cache, and save area and power as a result.

But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation. Unless Skymont successor solves the x64 front end problem (and I'm not convinced)

Before that, AMD needs to fix the memory bandwidth issue as well. Dual-channel DDR5 isn't enough. We need more bandwidth on desktop, and AMD needs to upgrade the old fabric!

gdansk · Aug 12, 2024

poke01 said:
Before that, AMD needs to fix the memory bandwidth issue as well. Dual-channel DDR5 isn't enough. We need more bandwidth on desktop, and AMD needs to upgrade the old fabric!

For a useful part. But that doesn't matter for Cinebench. And that's enough to sell it a bit.

poke01 · Aug 13, 2024

LightningZ71 said:
The ACHIEVABLE triangle (performance, efficiency, density) of N4P is better than N5P in every way. This does NOT preclude a customer from using a dense N5P configuration that has greater density than a performance, relaxed density product on N4P.

While I don't have hard figures in front of me, it is entirely possible that the density and efficiency focused M2 product could be denser than the performance and efficiency focused Strix Point product, even if N4P is a better node than N5P.

My point is there is nothing stopping AMD from designing a SoC/APU like the M Pro. The die size is comparable to Strix Point so it shouldn’t cost more.

The way AMD and Intel design cores for laptops is not good. Clocking to 5.1GHz and using 20 watts to score lower in Cinebench 2024 ST than an M2 @ 3.6GHz from 2022.

When do we get OoO superscalar architectures from AMD? Is Sound Wave custom ARM IP?

gdansk · Aug 13, 2024

poke01 said:
The way AMD and Intel design cores for laptops is not good. Clocking to 5.1GHz and using 20 watts to score lower in Cinebench 2024 ST than an M2 @ 3.6GHz from 2022.

That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.

FlameTail · Aug 13, 2024

gdansk said:
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.

This has been on my mind for a while too. I was writing it off as AMD having the benefit of SMT.

poke01 · Aug 13, 2024

gdansk said:
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.

SMT plays a huge part in AMD's MT score. Around 25-26% is due to SMT.

We found the Missing Performance: Zen 5 Tested with SMT Disabled

Reviews of AMD’s Zen 5 processors this week surprised many, with lower-than-expected results. After some investigation, we discovered that turning off Simultaneous Multithreading (SMT) can yield notable performance gains, particularly in gaming. This article presents our findings, including...

www.techpowerup.com

gdansk · Aug 13, 2024

poke01 said:
SMT plays a huge part in AMD's MT score. Around 25-26% is due to SMT.

We found the Missing Performance: Zen 5 Tested with SMT Disabled

Reviews of AMD’s Zen 5 processors this week surprised many, with lower-than-expected results. After some investigation, we discovered that turning off Simultaneous Multithreading (SMT) can yield notable performance gains, particularly in gaming. This article presents our findings, including...

www.techpowerup.com

Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt? If it was some bandwidth limit SDXE has the advantage there too and it was my understanding CB isn't that memory bandwidth sensitive.

poke01 · Aug 13, 2024

gdansk said:
Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt?

Well, not in all cases. In Blender, SMT uses no extra power. More than Zen 5 itself, it's AMD's version of SMT that impresses me.

Edit: added full app chart

FlameTail · Aug 13, 2024

Another thing to note is that Strix Point has 8 Zen 5c cores, and those are more efficient than standard Zen 5 cores.

gdansk · Aug 13, 2024

poke01 said:
Well, not in all cases. In Blender, SMT uses no extra power. More than Zen 5 itself, it's AMD's version of SMT that impresses me.

Edit: added full app chart

Hmm, the full chart gives a totally different impression than David Huang's test. It includes many tests and in none(?) of them is the power going up 30-40% like Huang measured. I assume that's because with SMT disabled these parts then boost slightly higher to consume the available power? And Huang's tests were at a fixed clock rate.
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?

Secondly (and this may be a joke) why isn't AMD now pursuing SMT4 for Zen 6? Diminishing returns?

poke01 · Aug 13, 2024

gdansk said:
Secondly (and this may be a joke) why isn't AMD now pursuing SMT4 for Zen 6? Diminishing returns?

Relying on SMT too much is also a bad thing for gaming.

It’s better for AMD to eke as much IPC out of Zen as possible and remove bottlenecks from the core.

It’s going to be much harder to increase single-core performance without thinking outside the box. Node progress has slowed down, and chip designers must seek ingenious solutions.

FlameTail · Aug 13, 2024

gdansk said:
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?

It's less of an x86 thing and more of an AMD thing. It's been known for a while that AMD's SMT implementation is better than Intel's.

moinmoin · Aug 13, 2024

gdansk said:
People are buying 24 core 14900K/13900K despite all its faults.

Doesn't seem to be too many people though going by sales charts, even before the current issues.

gdansk said:
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.

Besides SMT, another factor to keep in mind is the base power use of the uncore. Traditionally, AMD has had pretty bad idle and base power use, but the excellent efficiency of their cores can make up for it. That's what's happening here as well: ST looks bad due to the worse starting point, but in MT the cores' efficiency can catch up, masking the uncore handicap.
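
A rough way to see the uncore effect is a toy model in which package power is a fixed uncore term plus a per-core term. This is only a sketch: the `Chip` class and every number in it are made up for illustration, they are not measurements of any real part, and the model ignores clock and voltage scaling entirely.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    uncore_w: float    # fixed package power outside the cores (hypothetical)
    core_w: float      # power per active core (hypothetical)
    core_score: float  # score contributed by each active core (hypothetical)

    def perf_per_watt(self, active_cores: int) -> float:
        # Total power is the fixed uncore cost plus the active cores.
        power = self.uncore_w + active_cores * self.core_w
        return active_cores * self.core_score / power


# Illustrative numbers only: one chip with a heavy uncore but frugal cores,
# one with a light uncore but hungrier cores.
high_uncore = Chip(uncore_w=6.0, core_w=2.0, core_score=1.0)
low_uncore = Chip(uncore_w=1.0, core_w=2.5, core_score=1.0)

for n in (1, 12):
    ratio = high_uncore.perf_per_watt(n) / low_uncore.perf_per_watt(n)
    print(f"{n:2d} active cores: relative perf/W = {ratio:.2f}")
```

With these invented figures the heavy-uncore chip is well behind in perf/W with one active core and slightly ahead with twelve, because the fixed uncore cost is spread over more work.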

Tuna-Fish · Aug 13, 2024

gdansk said:
Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt? If it was some bandwidth limit SDXE has the advantage there too and it was my understanding CB isn't that memory bandwidth sensitive.

SMT increases power consumption linearly with the performance it adds. Raising clocks increases power consumption with the second or third power of the performance gained. If you get 1.3x perf with 1.3x power from SMT, and then drop clocks (and voltage with them) until your power is back at 1x, your perf will still be above 1x.
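
The arithmetic can be sketched as follows. The assumptions are loud ones: performance scales linearly with clock, power scales with clock to a fixed exponent (2 or 3), and SMT adds the same factor to both perf and power at a given clock. The function name `iso_power_perf` is just for this example.

```python
def iso_power_perf(smt_gain: float, exponent: float) -> float:
    """Relative perf after enabling SMT, then lowering clocks until
    power is back at the SMT-off baseline of 1.0."""
    # With SMT on at unchanged clocks, perf and power are both smt_gain.
    # Pick a clock scale s so that smt_gain * s**exponent == 1.0.
    clock_scale = (1.0 / smt_gain) ** (1.0 / exponent)
    # Perf is assumed to scale linearly with clock.
    return smt_gain * clock_scale


for k in (2, 3):
    print(f"power ~ clock^{k}: iso-power perf = {iso_power_perf(1.3, k):.3f}x")
```

Under these assumptions a 1.3x SMT gain leaves roughly 1.14x perf at equal power with a square law and roughly 1.19x with a cube law. With an exponent of 1 (power linear in clock) the gain would vanish, which is the whole point of the argument.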

poke01 said:
Well, not in all cases. In Blender, SMT uses no extra power.

This looks to me like both cases are up against a fixed power limit and throttling to maintain it.

StefanR5R · Aug 13, 2024

gdansk said:
You know your "just add more nodes" is specious. Adding more nodes doesn't change anything. It just doubles or triples any throughput/$ disadvantage.

Side note: As long as the application scales with node count¹, throughput/$ doesn't increase or decrease with node count, it remains constant.

________
¹) and if it doesn't require increasingly complex cluster interconnect topology

gdansk · Aug 13, 2024

StefanR5R said:
Side note: As long as the application scales with node count¹, throughput/$ doesn't increase or decrease with node count, it remains constant.

________
¹) and if it doesn't require increasingly complex cluster interconnect topology

Yes, exactly. Adding nodes does nothing to help. Adding more nodes with a worse throughput/$ simply wastes more money in total.

And since core spam products usually have good throughput/$ for these workloads they are, in fact, complementary (to node spam).

StefanR5R · Aug 13, 2024

Yes, exactly: It does not double or triple any throughput/$ disadvantage. :-)

gdansk · Aug 13, 2024

StefanR5R said:
Yes, exactly: It does not double or triple any throughput/$ disadvantage. :-)

How do you want me to say it? Instead of being down $200 you're down $400. The amplitude of the waste simply grows larger with more nodes.

It does nothing to help and only makes things worse in an absolute scale. It makes a small difference in throughput/$ into a larger difference in $. It simply multiplies the consequences of selecting a part with less throughput/$.

You clearly understand that adding more nodes doesn't solve any deficit. It isn't a solution, only a multiplier of AM5's fewer-core problem for these types of workloads.
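
Both statements in this exchange can hold at once, which a small sketch makes visible. It assumes perfect scaling with node count (the footnote's condition). The node prices and throughput are hypothetical, chosen only so the per-node gap matches the $200 figure above.

```python
def cluster(nodes: int, node_cost: float, node_throughput: float):
    """Return (total cost, total throughput, throughput per dollar),
    assuming throughput scales perfectly with node count."""
    cost = nodes * node_cost
    throughput = nodes * node_throughput
    return cost, throughput, throughput / cost


# Hypothetical parts: same throughput per node, $200 apart in price.
CHEAP_NODE = (1000.0, 100.0)   # (cost in $, throughput units)
PRICEY_NODE = (1200.0, 100.0)

for n in (1, 2):
    cheap_cost, _, cheap_tpd = cluster(n, *CHEAP_NODE)
    pricey_cost, _, pricey_tpd = cluster(n, *PRICEY_NODE)
    print(f"{n} node(s): throughput/$ ratio = {pricey_tpd / cheap_tpd:.3f}, "
          f"extra spend = ${pricey_cost - cheap_cost:.0f}")
```

The throughput/$ ratio between the two options is the same for one node and for two, while the absolute extra spend goes from $200 to $400.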

StefanR5R · Aug 13, 2024

At this point I am completely lost as to what people really want. Seemingly not just more cores, but gratis cores.

I thoroughly regret having engaged in this discussion and apologize for my part in bringing down the S/N ratio of this thread.

inquiss · Aug 13, 2024

StefanR5R said:
At this point I am completely lost as to what people really want. Seemingly not just more cores, but gratis cores.

I thoroughly regret having engaged in this discussion and apologize for my part in bringing down the S/N ratio of this thread.

You've nailed it. People in this thread seem to want the option to buy higher core count CPUs to use in sockets that will be murdered by low memory bandwidth. And they think AMD should either hamper the cost of mainstream by increasing channels on the mainstream platform or they just want them free and hampered because reasons.

gdansk · Aug 13, 2024

Not at all. I said that you cannot discount AMD making an 8+16 part. Even if memory bandwidth doesn't increase, they did so on Strix Point already. It isn't a fairy tale or Santa's wishlist; it is literally one AMD exec wanting to increase their client group ASP by 0.1% away from existing. If Intel provides the motivation... so it may be.

And in Zen 6 it is *inevitable* that the core count increases, even if some SKUs launch on AM5 with the same memory bandwidth. A 10% IPC generation has to deliver something.

maddie · Aug 13, 2024

gdansk said:
How do you want me to say it? Instead of being down $200 you're down $400. The amplitude of the waste simply grows larger with more nodes.

It does nothing to help and only makes things worse in an absolute scale. It makes a small difference in throughput/$ into a larger difference in $. It simply multiplies the consequences of selecting a part with less throughput/$.

You clearly understand that adding more nodes doesn't solve any deficit. It isn't a solution, only a multiplier of AM5's fewer-core problem for these types of workloads.

Waste? Don't you get double the throughput for double the money?

Fjodor2001 · Aug 13, 2024

maddie said:
Waste? Don't you get double the throughput for double the money?

You have to pay overhead for each additional node. Additional PSU, chassis, motherboard, etc.

Better to have a single node with X cores, than 2 nodes with X/2 cores.
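
As a rough illustration of the overhead point, here is a sketch that charges a fixed cost per node on top of the cores. The function `system_cost` and all prices are hypothetical, and it assumes cost per core is the same regardless of how many cores sit in one socket, which real product stacks do not guarantee.

```python
def system_cost(nodes: int, cores_per_node: int,
                cost_per_core: float, node_overhead: float) -> float:
    """Total cost when every node carries a fixed overhead
    (PSU, chassis, motherboard, etc.) on top of its cores."""
    return nodes * (node_overhead + cores_per_node * cost_per_core)


# Hypothetical prices: $25 per core, $400 of per-node overhead.
one_big = system_cost(nodes=1, cores_per_node=32,
                      cost_per_core=25.0, node_overhead=400.0)
two_small = system_cost(nodes=2, cores_per_node=16,
                        cost_per_core=25.0, node_overhead=400.0)

print(f"1 x 32C: ${one_big:.0f}")
print(f"2 x 16C: ${two_small:.0f}")
print(f"extra for splitting: ${two_small - one_big:.0f}")
```

Under these assumptions the two-node setup costs exactly one extra node overhead for the same core count, before counting any configuration or interconnect hassle.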

Also, not all workloads even support or are suitable for multiple nodes. So it's DOA for those use cases. Additionally, a lot of people think it's too much of a hassle to bother with multiple nodes: messier to configure, takes up more space, latency when communicating between nodes, etc.

If someone has use cases where they really want a huge number of cores, then I can understand that multiple nodes could be a good solution. Or go cloud and rent whatever you like. But not if you're looking for a 24/32C type of system (or even 64C).

StefanR5R · Aug 13, 2024

After we concluded that gratis cores must be provided, does it follow that we are entitled to get host consolidation for free too?
