> Of course designs are simulated, at about 100% accuracy nowadays.
My day job is writing these kinds of simulators. "Tweaking" them to study microarchitecture ideas is extremely costly: as costly as writing the RTL once the direction is known, if not more, because you have to test several ideas. And once you're done with that, you can't run large workloads, because simulation is slow.
And I have horrible news for you: these simulators are never 100% accurate. Quite close, but untested areas are bound to be inaccurate, and those inaccuracies propagate all over the place.
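To make the propagation point concrete, here's a deliberately minimal toy in Python (nothing like a production simulator; the single shared port and all the latencies are made up): two SMT-style threads share one issue port, and a latency error that is local to one thread shows up almost entirely in the other thread's projected runtime.

    def simulate(lat_a, lat_b, ops_per_thread):
        """Two threads of dependent ops share one issue port (1 cycle/op).
        Returns the cycle at which each thread retires its last op."""
        ready = {"A": 0, "B": 0}   # cycle at which each thread's next op may issue
        left = {"A": ops_per_thread, "B": ops_per_thread}
        lat = {"A": lat_a, "B": lat_b}
        done, port_free = {}, 0
        while left["A"] or left["B"]:
            # oldest-ready thread with work left wins the port
            t = min((x for x in ("A", "B") if left[x]), key=lambda x: ready[x])
            issue = max(ready[t], port_free)
            port_free = issue + 1          # port busy one cycle per op
            ready[t] = issue + lat[t]      # dependent op waits the full latency
            left[t] -= 1
            if left[t] == 0:
                done[t] = ready[t]
        return done["A"], done["B"]

    real = simulate(lat_a=3, lat_b=2, ops_per_thread=1000)    # ground truth
    model = simulate(lat_a=2, lat_b=2, ops_per_thread=1000)   # A's latency off by 1
    for t, r, m in zip("AB", real, model):
        print(f"thread {t}: real {r} cycles, model {m}, error {100 * (r - m) / r:.1f}%")

Thread B's latency is modeled perfectly, yet its projected runtime is off by roughly a third, purely because the mis-modeled thread changes how the two interleave on the shared port. Now scale that to a full core model with hundreds of interacting structures.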
So no, simulation is not a miracle solution, in particular when you're playing with features that have system-wide impact, such as SMT. You can write a prototype, but until you spend time tuning it and discussing it with RTL engineers, you don't know whether it makes sense or whether the projected performance is meaningful in any way. Yes, that's extremely costly.
Of course none of that applies when you're playing with a toy uarch like the ones found in low-end CPUs, where basically nothing happens. But that's not what we're discussing here.
And then power simulation is often very inaccurate (10% error or more is common) and runs so slowly that you can't simulate long sequences of code. You can extrapolate statistical results, but that's it; it gives you rough ideas.
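By "extrapolate statistical results" I mean something in the spirit of interval sampling (think SMARTS-style sampling; the numbers below are invented): simulate a handful of short intervals in detail and put error bars on the mean, instead of running the whole workload.

    import random, statistics

    random.seed(0)
    # Pretend these are detailed per-interval power numbers (watts) from the
    # slow simulator; in practice every one of them costs hours of runtime.
    sampled_watts = [random.gauss(12.0, 1.5) for _ in range(30)]

    mean = statistics.mean(sampled_watts)
    sem = statistics.stdev(sampled_watts) / len(sampled_watts) ** 0.5
    print(f"avg power ~ {mean:.2f} W +/- {1.96 * sem:.2f} W (95% CI)")

That gives you an average with a confidence interval, which is fine for ballpark comparisons and useless for anything that depends on cycle-by-cycle behavior.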
I'm not saying Intel didn't do that. I'm saying it's so costly that doing a very thorough, pertinent analysis wouldn't have made sense.