Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Gideon · Oct 28, 2024

MS_AT said:
It does have the similar latency in cycles, but worse absolute latency [ns].

Yeah, that's true. I wish we had a (relatively) apples-to-apples latency comparison between M4 @ 4.4 GHz and Zen 5 to see what the actual latencies are. The only info chipsandcheese has up is a quite out-of-date 7950X vs M1 comparison (from here):

Roughly 5.4ns for M1 vs 2.4ns for 7950X. So yeah, a very significant difference of over 2x.

But M1 only clocked up to 3.2 GHz, while M4 clocks up to 4.4 GHz. I'm more than certain Apple relaxed the L2 latency by a few cycles to get there, but I'd still very much like to see where they ended up (and I hope reviewers measure it).
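To make the cycles-vs-nanoseconds point concrete, here is a rough sketch of the conversion. The 5.4 ns and 3.2 GHz figures are the M1 numbers above; the "same cycle count at 4.4 GHz" case is purely a hypothetical lower bound, not a measurement, and the function names are made up for illustration.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert an absolute latency in ns to core cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core cycles back to ns at the given clock."""
    return cycles / clock_ghz


# M1 L2 latency from the post: ~5.4 ns at up to 3.2 GHz.
m1_l2_cycles = ns_to_cycles(5.4, 3.2)

# Hypothetical: M4 keeps the exact same cycle count at 4.4 GHz.
# The post expects Apple to have added a few cycles, so treat this as a floor.
m4_l2_ns_if_same_cycles = cycles_to_ns(m1_l2_cycles, 4.4)

print(f"M1 L2: {m1_l2_cycles:.1f} cycles")
print(f"M4 L2 at the same cycle count: {m4_l2_ns_if_same_cycles:.2f} ns")
```

Under these assumptions M1's L2 works out to about 17 cycles, and an M4 that kept the same cycle count would land near 3.9 ns; any cycles Apple added would push that higher.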

Tuna-Fish said:
The AMD L2 is extremely tight, you are not increasing its size at all without a latency regression. You are absolutely not sharing it with anything without a latency regression.

That's true, it's not possible without some latency regression. My whole point was that with the ever-more-prevalent 3D cache, there is a growing gap between the rather small 1MB L2 and the gigantic 96MB L3.

Take techpowerup reviews as an example (as they use the same mobo and ram configs):

For the Ryzen 9700X they registered 7.7ns L3 latency
For the Ryzen 7700X it was 9.9ns L3 latency
For the Ryzen 7800X3D it was 12.7ns L3 latency for a 3x bigger cache

A <30% regression for 3x the size. Looks to be a pretty decent tradeoff (and I expect it to be less for 9800X3D as it clocks higher!).

But then again, the latency gap between L2 and L3 went from 3x to 4x.
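A quick sanity check of those ratios, as a sketch. The L3 latencies are the techpowerup numbers listed above; the 3 ns L2 latency is an assumption (it is the round figure used further down in this post), not a value from those reviews.

```python
# L3 latencies in ns, as quoted from the techpowerup reviews above.
l3_latency_ns = {"9700X": 7.7, "7700X": 9.9, "7800X3D": 12.7}

# Assumption: roughly 3 ns L2 latency, the round number used later in this post.
ASSUMED_L2_NS = 3.0


def regression_pct(base_ns: float, new_ns: float) -> float:
    """Percentage increase in latency going from base_ns to new_ns."""
    return (new_ns / base_ns - 1.0) * 100.0


# Same generation (Zen 4), 32MB vs 96MB of L3.
x3d_regression = regression_pct(l3_latency_ns["7700X"], l3_latency_ns["7800X3D"])

# How many times slower L3 is than the assumed L2.
gap_without_vcache = l3_latency_ns["7700X"] / ASSUMED_L2_NS
gap_with_vcache = l3_latency_ns["7800X3D"] / ASSUMED_L2_NS

print(f"L3 regression for 3x the size: {x3d_regression:.1f}%")
print(f"L3/L2 latency gap: {gap_without_vcache:.1f}x -> {gap_with_vcache:.1f}x")
```

With the assumed L2 figure this gives a regression of about 28% and a gap of roughly 3.3x growing to 4.2x, in line with the "3x to 4x" above.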

As many consumer applications are heavily cache/memory bound, there seems to be performance there, waiting to be extracted.

What options are there to do that?

1. Adding extra cache layers - possible, but with numerous other significant drawbacks
2. Upsizing the private L2 to 2MB or 3MB (as Intel did) - this is the easiest solution, but even with "just" 2MB of L2, we'd use 16MB of the CCD's SRAM budget on L2, while limiting the amount a single thread can use to 2MB. Going beyond that (3MB for 24MB total) seems insanely wasteful to me.
3. Sharing the L2 between cores - a much more complex solution with obvious latency regressions, as you stated
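For a sense of scale, the SRAM budget in option 2 can be tallied like this. The 8 cores per CCD is an assumption inferred from the 2MB -> 16MB figure above, and the 32MB base L3 is the current CCD's L3 size rather than a number from this post.

```python
CORES_PER_CCD = 8   # assumption: implied by "2MB of L2 -> 16MB" in the post
BASE_L3_MB = 32     # assumption: L3 on a current non-V-cache CCD


def total_private_l2_mb(per_core_mb: float, cores: int = CORES_PER_CCD) -> float:
    """Total SRAM spent on private L2 across the whole CCD."""
    return per_core_mb * cores


# Total L2 per CCD for each per-core size discussed above.
l2_budget = {size: total_private_l2_mb(size) for size in (1, 2, 3)}

for per_core_mb, total_mb in l2_budget.items():
    share_of_l3 = total_mb / BASE_L3_MB
    print(f"{per_core_mb}MB/core -> {total_mb:.0f}MB total "
          f"({share_of_l3:.0%} of a {BASE_L3_MB}MB L3), "
          f"but one thread still sees only {per_core_mb}MB")
```

The point of the tally is that the total grows with core count while the capacity visible to any single thread does not.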

Extrapolating from what AMD did with L3, it should be possible to go from 1MB to 3MB with a 30% latency increase (3ns -> 4ns). Actually, I think AMD would do better, as AFAIK going from 512KB to 1MB AMD managed to regress much less than that!

TL;DR: So a private 2MB L2 is indeed the most obvious solution to address this.

It's just that in my La La land, I'd like to see a shared L2 solution where the banks next to the core have almost no latency regression and the ones further away have 20-30% but allow a core to use up to 8MB of L2 instead of "just" 2MB.

The intriguing alternative is to keep the L2 at 2MB on the base SKU and take the "2-3 cycle hit" on 3D cache parts by also doubling the private L2 on the V-cache die to 4MB (keeping the relative latency between L2 and L3 the same).

LightningZ71 · Oct 28, 2024

If Zen continues to get wider and slower, then I can see them using the reduced clockspeed targets to double the size of the L2 while keeping the same number of cycles of latency. That should help with throughput a bit.

The next iteration of cache die will likely be on TSMC N4C, as SRAM scaling falls off a cliff after N5 and won't really recover another increment of shrinking until BSPD and GAA get applied to it. That is going to be a few years after those things are in volume in the core dies, as they will be very expensive processes on a per-wafer basis. N4C should allow a bit of a shrink, especially if it's a cache-targeted chip design, while allowing better performance/power curves. If the CCD stays roughly its current size, then they could likely fit more cache on the cache die, so a 50-100% increase wouldn't be unreasonable. It's possible that they may decouple it into an L4 cache to allow the first 32MB of L3 to have a lower latency, at the expense of a few extra cycles of RAM latency.

Mopetar · Oct 28, 2024

Joe NYC said:
BTW, if it is true and L3 die is below, then why not make SRAM amount > 64 MB? There would be room for more on the die.

Increase in hit time may not be worth the added capacity for most apps. There are already a lot that don't gain anything from the v-cache and increasing the latency hurts the performance for all of those.
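One common way to frame this tradeoff is average memory access time (AMAT = hit time + miss rate x miss penalty). The sketch below uses the 9.9 ns and 12.7 ns L3 figures quoted earlier in the thread as hit times; the miss rates and the 70 ns miss penalty are invented purely for illustration, and the single-level model ignores everything above and below L3.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one cache level."""
    return hit_time_ns + miss_rate * miss_penalty_ns


SMALL_L3_HIT_NS = 9.9    # from the thread: 7700X L3 latency
BIG_L3_HIT_NS = 12.7     # from the thread: 7800X3D L3 latency
MISS_PENALTY_NS = 70.0   # hypothetical trip to DRAM

# App A (hypothetical): working set already fits, so the miss rate stays at 5%.
app_a_small = amat_ns(SMALL_L3_HIT_NS, 0.05, MISS_PENALTY_NS)
app_a_big = amat_ns(BIG_L3_HIT_NS, 0.05, MISS_PENALTY_NS)

# App B (hypothetical): the extra capacity cuts the miss rate from 30% to 10%.
app_b_small = amat_ns(SMALL_L3_HIT_NS, 0.30, MISS_PENALTY_NS)
app_b_big = amat_ns(BIG_L3_HIT_NS, 0.10, MISS_PENALTY_NS)

print(f"App A: {app_a_small:.1f} ns -> {app_a_big:.1f} ns (slower)")
print(f"App B: {app_b_small:.1f} ns -> {app_b_big:.1f} ns (faster)")
```

An app whose miss rate does not move only pays the longer hit time, which is the case described above.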

sl0519 said:
So 5.2 max boost is pretty much confirmed. If the thermal restraint was lifted, why is boost still 0.3 GHz down from the non-V-cache model?

It may still be voltage constrained, limiting the clock speed.

Another possibility is binning/market segmentation. If you want the faster boost you'll have to shell out for a 9900X3D or a 9950X3D.

Josh128 · Oct 28, 2024

9800X3D Blender Open Data entry. 11% faster than 9700X. OC maybe? Or maybe it's due to its 120W TDP vs the original 65W TDP of 9700X. It's massively faster than 7800X3D.

Blender - Open Data

Blender Open Data is a platform to collect, display and query the results of hardware and software performance tests - provided by the public.

opendata.blender.org

StefanR5R · Oct 28, 2024

Gideon said:
Roughly 5.4ns for M1 [= 12 MB shared L2$] vs 2.4ns for 7950X [= 1 MB private L2$].

Or 3.8 ns for Telum [= 32 MB private L2$¹]
But that's when neither die area nor power consumption are of immediate concern.²

________
¹) of which parts can be dynamically repurposed into shared virtual L3$ (12 ns on average) or shared virtual L4$ even (which is off-chip cache).
²) almost a square inch of 7nm Samsung silicon for an 8-core chiplet, with 200 W power budget — but this is a real and honest way to obtain a ticket to La La Land. ;-)

Gideon · Oct 28, 2024

LightningZ71 said:
It's possible that they may decouple it into an L4 cache to allow the first 32MB of L3 to have a lower latency, at the expense of a few extra cycles of RAM latency.

In client (where you can probably only afford one design, with modifications, for both mobile and desktop) I'd much rather have an SLC instead of L4.

Each layer of cache adds extra complications, more tags to keep track of, etc.

That's one of the reasons Apple and Qualcomm forgo L3. Unless you can afford to make the L3 big enough (say 24GB - 32GB+) you might be better off with bigger shared L2 caches and an SLC, that also benefits the GPU, NPU ...

inquiss · Oct 28, 2024

Gideon said:
In client (where you can probably only afford one design, with modifications, for both mobile and desktop) I'd much rather have an SLC instead of L4.

Each layer of cache adds extra complications, more tags to keep track of, etc.

That's one of the reasons Apple and Qualcomm forgo L3. Unless you can afford to make the L3 big enough (say 24GB - 32GB+) you might be better off with bigger shared L2 caches and an SLC, that also benefits the GPU, NPU ...

That's a giant L3 you're not gonna see in a long time...

igor_kavinski · Oct 28, 2024

inquiss said:
That's a giant L3 you're not gonna see in a long time...

Maybe not an L3, but it could be an L5 made up of really fast, highly parallel Optane memory.

inquiss · Oct 28, 2024

igor_kavinski said:
Maybe not an L3, but it could be an L5 made up of really fast, highly parallel Optane memory.

Nah

Gideon · Oct 28, 2024

inquiss said:
That's a giant L3 you're not gonna see in a long time...

Yeah, that's why you're not gonna see L3 on Qualcomm / Apple SoCs.

At least as long as they are 90% mobile focused. Apple's rumored server SKUs might actually have L3, and it might trickle down to higher-end desktop / M Max SKUs.

inquiss · Oct 28, 2024

Gideon said:
Yeah, that's why you're not gonna see L3 on Qualcomm / Apple SoCs.

At least as long as they are 90% mobile focused. Apple's rumored server SKUs might actually have L3, and it might trickle down to higher-end desktop / M Max SKUs.

Well, yeah, that and how big gigabyte-sized L3s would be. It's absurd.

yuri69 · Oct 28, 2024

LightningZ71 said:
If Zen continues to get wider and slower, then I can see them using the reduced clockspeed targets to double the size of the L2 while keeping the same number of cycles of latency. That should help with throughput a bit.

From my layman's PoV, investing in a bigger L2 does not yield much in terms of general-purpose IPC. Zen 4 doubled Zen 3's relatively small 512kB L2, yet that change trailed the rest of the "major IPC contributors" with a sub-2% IPC contribution. Intel has gone down the same odd L2-growing route since Willow... 512kB -> 1.25MB -> 2MB -> 2.5/3MB.

Josh128 · Oct 28, 2024

CopeFrameX is saying on Xitter that 9800X3D is looking strong in reviews. "Better than 8% in demanding scenarios".

Y'all don't think AMD will price this thing above $449, do you?

gdansk · Oct 28, 2024

Josh128 said:
CopeFrameX is saying on Xitter that 9800X3D is looking strong in reviews. "Better than 8% in demanding scenarios".

Y'all don't think AMD will price this thing above $449, do you?

It'll be $450 for a few reasons. Competition isn't one of them.

gaav87 · Oct 28, 2024

Steve will run out of chart space for the 9800X3D release. I think it will look something like this:

gdansk · Oct 28, 2024

gaav87 said:
Steve will run out of chart space for the 9800X3D release. I think it will look something like this:

View attachment 110469

It'll be lower

gaav87 · Oct 28, 2024

gdansk said:
It'll be lower

You will need an ultrawide screen to see this chart. Trust me bro.

Timorous · Oct 28, 2024

gaav87 said:
You will need an ultrawide screen to see this chart. Trust me bro.

Maybe in ACC or Flight Sim.

Not so sure on average.

gaav87 · Oct 28, 2024

Timorous said:
Maybe in ACC or Flight Sim.

Not so sure on average.

Even 10% is close to off screen on this chart xD

igor_kavinski · Oct 28, 2024

gaav87 said:
Even 10% is close to off screen on this chart xD

Not an issue if you can view the chart in VR. Then just tilt your head a bit to the right and you will see the bar poking out in 3D.

Gideon · Oct 28, 2024

inquiss said:
Well, yeah, that and how big gigabyte-sized L3s would be. It's absurd.

Oof, yeah, I wrote that without double checking. Obviously I meant 24 - 32+ MB. My bad.

Det0x · Oct 28, 2024

Josh128 said:
9800X3D Blender Open Data entry. 11% faster than 9700X. OC maybe? Or maybe it's due to its 120W TDP vs the original 65W TDP of 9700X. It's massively faster than 7800X3D.

Blender - Open Data

Blender Open Data is a platform to collect, display and query the results of hardware and software performance tests - provided by the public.

opendata.blender.org

View attachment 110463

View attachment 110464

More optimized V/F curve.
X3D has always had a better one, but it has been capped by temperature and voltage limits in the past (look at 7950X3D vs 7950X at lower PPT limits (sub-160W) and compare the efficiency).

Now that Z5X3D is unhindered by temperature and nearly all voltage limits are lifted, the new V/F curve finally gets its time to shine... You will see the Z5X3D models beat vanilla Zen 5 in pretty much all MT workloads @ stock PPT limits.
The only remaining place where regular Zen 5 wins is in light ST workloads, since the V-cache can't handle much more than 5.7GHz (when the silicon is pushed to the limit).

CouncilorIrissa · Oct 28, 2024

Josh128 said:
CopeFrameX is saying on Xitter that 9800X3D is looking strong in reviews. "Better than 8% in demanding scenarios".

Y'all don't think AMD will price this thing above $449, do you?

No. But they won't price it any lower either.

SteinFG · Oct 29, 2024

If it's priced higher than 450 it's getting negative reviews for sure.
7800X3D was selling in large numbers at $350 just a year or so after its launch. I think AMD can just price it at 450 and give smaller discounts later, it's better long-term.
But it's AMD, they like to miss.

Joe NYC · Oct 29, 2024

Josh128 said:
Y'all don't think AMD will price this thing above $449, do you?

If the past is a predictor of the future, AMD will price it at $499 to get all the hate in the first reviews, and then within 2 months will discount it to $449.
