Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

FlameTail · Aug 8, 2024

JustViewing said:
As I said before, we need to wait for APX instruction set implementation before we see a huge IPC increase.

It's not a given.

DisEnchantment · Aug 8, 2024

CouncilorIrissa said:
We'll have our preview with STX Halo soon enough I guess.
The uncore is just pure cope at this point.

Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is very much used in the shillicon twitter universe.

I am saying this based on the visible improvements in SPEC int, at fixed clock, when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks which 3D V-Cache attempts to work around would improve the situation, until the next bottleneck at least.
Also Z4 in MI300A benefits from the LLC prefetching as per AMD themselves.

CouncilorIrissa · Aug 8, 2024

DisEnchantment said:
Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is very much used in the shillicon twitter universe.

I am saying this based on the visible improvements in SPEC int, at fixed clock, when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks which 3D V-Cache attempts to work around would improve the situation, until the next bottleneck at least.
Also Z4 in MI300A benefits from the LLC prefetching as per AMD themselves.

I don't mean that discussing it is cope, it isn't. I meant that the uncore is just poor. It's just downright funny that a CCD is unable to use all of the memory bandwidth because of a single GMI3 link.
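
To put a rough number on that point, here is a back-of-the-envelope sketch. It assumes a CCD read path of 32 B per fabric clock, an FCLK of 2000 MHz and dual-channel DDR5-6000. None of these figures are stated in this thread; they are assumptions for illustration only, and the function names are hypothetical.

```python
# Rough peak-bandwidth comparison. All figures are illustrative assumptions,
# not measurements and not numbers quoted in this thread.

def link_read_gbs(fclk_mhz, bytes_per_cycle=32):
    """Peak read bandwidth of one CCD link in GB/s (assumed 32 B per FCLK cycle)."""
    return fclk_mhz * bytes_per_cycle / 1000

def dram_peak_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (assumed 64-bit channels)."""
    return mt_per_s * bytes_per_transfer * channels / 1000

link = link_read_gbs(2000)   # assumed FCLK of 2000 MHz
dram = dram_peak_gbs(6000)   # assumed dual-channel DDR5-6000
print(f"one CCD link: {link:.0f} GB/s, DRAM peak: {dram:.0f} GB/s, "
      f"ratio: {link / dram:.2f}")
```

Under those assumptions a single CCD tops out at roughly two thirds of the theoretical DRAM peak for reads, which is the kind of gap being described here.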

DisEnchantment · Aug 8, 2024

BTW, there are changes in the prefetch logic across L2 and L3 in Zen 5. Not sure if the motivation is to enable it to work better in different configurations, but it might have added some regressions.
Could be to handle bigger CCXs sharing L3 and snooping other CCXs.

DisEnchantment said:
Additionally, they have prefetching updates for L1/L2 instead of just stream, stride, burst, nextline

[PATCH 1/4] perf vendor events amd: Add Zen 5 core events - Sandipan Das

moinmoin · Aug 8, 2024

DisEnchantment said:
Well, it is not exactly stellar; saying it is a mild improvement is being too generous considering the time frame involved.

I am mostly looking at Alexander Yee's blog to make this statement.

Other than AVX512 there is not much improvement

Yee did point out huge improvements in scalar integer though. It's everything in-between that's stagnating.

The bigger issue is that AMD continues to slip on its cadence; it was to be under 18 months. But competition in DC appears not to be strong enough for AMD to keep that up, causing it to lag more and more in mobile (and desktop as far as Apple and ARM can be considered competition there already). Wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to get worse instead.

inquiss · Aug 8, 2024

JustViewing said:
So this is another typical AMD launch. A couple of users overhype the product. Others fall for this hype. When the product is actually released, everyone feels disappointed. For me the performance meets the expectation from the architectural perspective.
It is well known that it is very difficult to increase integer IPC. The number of general purpose registers is a bottleneck. More read/write ports will help, but it may also increase power usage. As I said before, we need to wait for APX instruction set implementation before we see a huge IPC increase.
Having said that, there is still a lot of potential left in AVX. With AVX512 they can probably go over 16 execution units.

My real disappointment is there is no 24/32 core AM5 Zen5 CPU.

How would you feed these cores in AM5?

Hitman928 · Aug 8, 2024

Looking at Phoronix results today, seems like Turin-D is going to be really, really good for cloud customers. It's a shame these kinds of improvements didn't translate well to the consumer side (excluding a couple of areas like browsers).

gdansk · Aug 8, 2024

moinmoin said:
Wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to get worse instead.

It's about the same time-between-releases as Zen 4. But for this length of time people expect bigger gains (even if the process uplift was less).

DisEnchantment · Aug 8, 2024

moinmoin said:
Yee did point out huge improvements in scalar integer though

But it is constrained by memory bandwidth; that is, they have throughput only as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
But they kept the L2 at 1MiB and kept the L2 to L3 at 32B/cycle. So no respite there either.

moinmoin · Aug 8, 2024

gdansk said:
It's about the same time-between-releases as Zen 4.

Which was known to be delayed to account for CXL. So Zen 5 spending the same time means it is actually doubly delayed instead of catching up with the intended cadence.

DisEnchantment said:
But it is constrained by memory bandwidth; that is, they have throughput only as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
But they kept the L2 at 1MiB and kept the L2 to L3 at 32B/cycle. So no respite there either.

That was to be expected though considering we already knew bigger uncore/IO changes would only happen with Zen 6 going by previous gens.

gdansk · Aug 8, 2024

moinmoin said:
Which was known to be delayed to account for CXL. So Zen 5 spending the same time means it is actually doubly delayed instead of catching up with the intended cadence.

Let's just put it this way. Only once did a Zen land on time and that was Zen 3. And Zen 5 is right on average. If all Zens but one are delayed then well, what's the exception? It isn't Zen 5.

DrMrLordX · Aug 8, 2024

HurleyBird said:
Maybe you need to know your stuff to interpret what any one particular result means, but if you're just looking at the average it really doesn't matter. When it comes to distilling performance to a single number, regression to the mean is real, and sample size is king.

When you have 50 or more individual benchmarks with bizarre performance profiles that don't match anything you do irl, it taints the geomean. Phoronix has been like this for years.

Hitman928 said:
Looking at Phoronix results today, seems like Turin-D is going to be really, really good for cloud customers. It's a shame these kinds of improvements didn't translate well to the consumer side (excluding a couple of areas like browsers).

No surprises on server side, but also consider how many office monkeys spend all day running bloated javascript/electron crap. It will be the GOAT for those people. And there are a LOT of them. AMD knew exactly what they were doing. Once again, AMD produces products that are not "for us".

JustViewing · Aug 8, 2024

inquiss said:
How would you feed these cores in AM5?

If DDR4 is enough for 16 cores, I am sure DDR5 with double the bandwidth is enough for 32 cores. At least it should be enough for 24 cores.

Timmah! · Aug 8, 2024

Tuna-Fish said:
Soo... hype train back on tracks?

Yes, my body is ready again.

inquiss · Aug 8, 2024

For Zen 3 DDR4 4000 was the sweet spot, for Zen 5 it's 6000. So that's 50% more bandwidth. The cores are also higher performance so need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth you go to TR.

del42sa · Aug 8, 2024

coercitiv said:
More good news about the 9600X.

AMD just released their new architecture, ZEN 5%

JustViewing · Aug 8, 2024

inquiss said:
For Zen 3 DDR4 4000 was the sweet spot, for Zen 5 it's 6000. So that's 50% more bandwidth. The cores are also higher performance so need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth you go to TR.

I am using DDR4 3200 with 5950X, so DDR5 6400 should be enough for 24 cores if not 32 Zen 5 cores. Remember L2 was increased in Zen 4, so it will relieve some of the memory pressure. Sure, more memory bandwidth will help, but what matters is whether 24/32 cores will outperform 16 cores in multithreaded applications. If, even with limited bandwidth, a 24 core Zen 5 beats a 16 core Zen 5 within the same power/bandwidth envelope, it is a win for the user.
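
The per-core arithmetic behind this exchange can be sketched roughly as below. It only computes theoretical peak numbers, assuming a dual-channel setup with 64-bit (8 byte) channels; the function names and the list of scenarios are made up for illustration, and real sustained bandwidth is lower than these peaks.

```python
# Theoretical peak DRAM bandwidth per core for the memory speeds mentioned
# in this exchange. Assumes dual-channel, 8 bytes per transfer per channel.

def peak_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mt_per_s * bytes_per_transfer * channels / 1000

def per_core_gbs(mt_per_s, cores):
    """Peak bandwidth divided evenly across all cores, in GB/s."""
    return peak_gbs(mt_per_s) / cores

scenarios = [
    ("DDR4-3200, 16 cores", 3200, 16),
    ("DDR4-4000, 16 cores", 4000, 16),
    ("DDR5-6000, 16 cores", 6000, 16),
    ("DDR5-6000, 24 cores", 6000, 24),
    ("DDR5-6400, 24 cores", 6400, 24),
    ("DDR5-6400, 32 cores", 6400, 32),
]

for label, speed, cores in scenarios:
    print(f"{label}: {peak_gbs(speed):.1f} GB/s total, "
          f"{per_core_gbs(speed, cores):.2f} GB/s per core")
```

By this measure 32 cores on DDR5-6400 get the same 3.2 GB/s per core as 16 cores on DDR4-3200, and 24 cores on DDR5-6000 match 16 cores on DDR4-4000 at 4 GB/s. It says nothing about whether a faster core needs more than that, which is the other half of the argument.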

Timmah! · Aug 8, 2024

inquiss said:
For Zen 3 DDR4 4000 was the sweet spot, for Zen 5 it's 6000. So that's 50% more bandwidth. The cores are also higher performance so need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth you go to TR.

You know you don't have to buy a hypothetical 24 core product if you feel it's too constrained by memory bandwidth, right?

inquiss · Aug 8, 2024

JustViewing said:
I am using DDR4 3200 with 5950X, so DDR5 6400 should be enough for 24 cores if not 32 Zen 5 cores. Remember L2 was increased in Zen 4, so it will relieve some of the memory pressure. Sure, more memory bandwidth will help, but what matters is whether 24/32 cores will outperform 16 cores in multithreaded applications. If, even with limited bandwidth, a 24 core Zen 5 beats a 16 core Zen 5 within the same power/bandwidth envelope, it is a win for the user.

It would be a pointless product that nobody would buy. Partly because no one buys the 16 core chips anyway, but additionally because they would be severely bandwidth constrained. If you've read this thread you can see that the current chips seem bandwidth constrained as they are already.

inquiss · Aug 8, 2024

Timmah! said:
You know you don't have to buy a hypothetical 24 core product if you feel it's too constrained by memory bandwidth, right?

I wouldn't, and I'm not going to because AMD isn't releasing it. How many people are going to be interested in an underperforming 24 or 32 core chip that's so memory constrained? Either you want the chip to work (get EPYC or Threadripper) or you don't. Not many people are in the "I want to buy a high core count processor and would buy it even if it's memory starved" camp.

igor_kavinski · Aug 8, 2024

AMD could still surprise us with a single CCD 16C/32T Zen5c model with the full width AVX-512! It could turn out to be the most power efficient cinememe CPU ever.

Timmah! · Aug 8, 2024

inquiss said:
I wouldn't, and I'm not going to because AMD isn't releasing it. How many people are going to be interested in an underperforming 24 or 32 core chip that's so memory constrained? Either you want the chip to work (get EPYC or Threadripper) or you don't. Not many people are in the "I want to buy a high core count processor and would buy it even if it's memory starved" camp.

Aren't you constrained by memory bandwidth only when you saturate the entire available memory?

inquiss · Aug 8, 2024

Timmah! said:
Aren't you constrained by memory bandwidth only when you saturate the entire available memory?

No? You're memory bandwidth constrained when you can't get things out of the memory fast enough.

Timmah! · Aug 8, 2024

inquiss said:
No? You're memory bandwidth constrained when you can't get things out of the memory fast enough.

All right then. There are still tasks, like 3D rendering, that do not benefit significantly from faster RAM but would benefit immensely from additional cores. RAM speed perhaps becomes an important factor when you run out of it and data needs to be fetched from the drive, but that's better resolved by more RAM anyway.

CakeMonster · Aug 8, 2024

Back during Z3 I was all aboard the MOAR CORES train, as it seemed even games were rapidly using 50% of threads, suggesting maxing the main cores and reaching into SMT. But the later generations have proved that better cores do make up for a lot of those scenarios. I think we'll be perfectly fine with 16c/32t for the duration of Z6 (up to 44~48 months from now). However, if the Z6 release slips, or it does not improve much IPC-wise, I could turn out to be wrong. I'm much more worried about the IPC race and cache now after initial Z5 results; hopefully we'll get 12c and 16c X3D models without the heterogeneous cores and the mess that is thread prioritization of those now.
