Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Gideon · Oct 31, 2024

SteinFG said:
+200 MHz and +7% to MSRP, sad.
No meaningful frequency boost plus it's increased in price.

According to the video, the base clock is 500 MHz higher and overclocking is enabled. The average gaming uplift was 8%, so the MSRP hike (while not fun) makes sense.

And I'm pretty sure there is some headroom (based on the leak). They'll almost certainly clock the 7900X3D and 7950X3D higher, as there would be more reasons to buy them.

I'm not a fan of the price hikes, considering that you could buy a 7800X3D for 380€ (with 20% VAT) in my country, but now it's about 480€. But that's just the way it is with no competition. Gen on gen, that hike makes a lot of sense, particularly because of the upped productivity performance as well.

Hans de Vries · Oct 31, 2024

Very nice: The 3D V-Cache is now as wide as the CCD and indeed below it!

Hitman928 · Oct 31, 2024

Joe NYC said:
So is the die below? Andreas Schilling thinks so:

https://twitter.com/x/status/1851975685071421659

According to the animation (which may not be technically accurate but usually AMD doesn't miss that badly with this stuff), yes, the cache die is on bottom and all of the signaling and power is coming through the cache chip as well. In the animation, you can see TSVs distributed all throughout the die which is why the cache die is the same size as the CCD now. I thought I was misremembering this fact earlier when discussing this as I couldn't find the evidence again, but it turns out I wasn't, the TSVs seem to be everywhere now. It also shows the CCD is not flip chip.

I mentioned previously if the top die is CCD and not flip chip, this means that AMD figured something out in getting the heat out without a flip chip orientation. Either that or the chip runs cool enough that the increased hotspotting from not being flip chip doesn't limit the chip. Probably a bit of both. I'm thinking this has been born out of research AMD has done in BSPD designs where the top die isn't flip chip and is a good sign that they will be able to transition to BSPD without issue.

Joe NYC · Oct 31, 2024

Hans de Vries said:
Very nice: The 3D V-Cache is now as wide as the CCD and indeed below it!

This could, in theory, allow Wafer on Wafer packaging, meaning much greater capacity to produce these chips in volume.

Hitman928 · Oct 31, 2024

Someone else pointed out earlier that the Vcache dies are a different stepping than the vanilla dies. I wonder if they did this to change the power and ground routing and they are actually distinct dies, or if they leave both paths on the chip and the additional stepping was to fix something for the Vcache dies but vanilla dies will move to the new stepping as well?

Hans de Vries · Oct 31, 2024

Joe NYC said:
This could, in theory, allow Wafer on Wafer packaging, meaning much greater capacity to produce these chips in volume.

Now wondering if they could possibly have a united 3D V-Cache under both CCD's for the 9950X3D and 9900X3D, with direct L3 connect and bypassing the serial interconnect via the IOD die....

That would resolve a perceived weakness of the current design.

And it would also be an explanation why they did go through this more complex design change, instead of just extra tweaking the old concept for better performance.

sl0519 · Oct 31, 2024

Joe NYC said:
The claim is:
- 8% faster than 7800x3d
- 20% faster than Arrow Lake

Which sounds credible.

HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

*edit: They tested with 4090 according to the end notes. If 78X3D was 20% faster than U9 285K, wouldn't that put it around the same performance as the 98X3D? Sounds very fishy to me!
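The arithmetic behind that suspicion can be sketched as follows. This is only an illustration, assuming the two percentages could be chained at all, i.e. that HU's 7800X3D-vs-285K result and AMD's 9800X3D-vs-7800X3D claim came from the same games and the same GPU, which is not established here. The helper name `compound_uplift_percent` is made up for this example.

```python
def compound_uplift_percent(*uplifts_percent):
    """Chain relative uplifts (in percent) and return the combined uplift in percent."""
    factor = 1.0
    for uplift in uplifts_percent:
        factor *= 1.0 + uplift / 100.0
    return (factor - 1.0) * 100.0


# Assumption: 7800X3D is +20% over the 285K (HU's number, as quoted in the post)
# and 9800X3D is +8% over the 7800X3D (AMD's claim), measured on comparable setups.
implied_vs_285k = compound_uplift_percent(20.0, 8.0)
print(f"Implied 9800X3D lead over 285K: {implied_vs_285k:.1f}%")
```

Chaining the two numbers would imply a lead of roughly 30% rather than the claimed 20%, which is the mismatch being pointed out. It only says something if both comparisons share a test setup.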

Hitman928 · Oct 31, 2024

Hans de Vries said:
Now wondering if they could possibly have a united 3D V-Cache under both CCD's for the 9950X3D and 9900X3D, with direct L3 connect and bypassing the serial interconnect via the IOD die....

So making a 16 core CCD basically? I don't think that's possible with the current cores, but it might be possible for a future product. I think it would probably be after a cache hierarchy rework and the bottom cache acts as a SLC though rather than a multi-die CCD.

Joe NYC · Oct 31, 2024

Hans de Vries said:
Now wondering if they could possibly have a united 3D V-Cache under both CCD's for the 9950X3D and 9900X3D, with direct L3 connect and bypassing the serial interconnect via the IOD die....

I think Strix Halo might be a better candidate for that, with Fanout RDL packaging.

Joe NYC · Oct 31, 2024

sl0519 said:
HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

The chart shown did not reflect +8% and +20% average. It had games performing better. But AMD is only claiming +8% and +20%, which seems like a safe claim for gaming.

maddie · Oct 31, 2024

Hans Gruber said:
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.

This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.

If they throw in the new v-cache stacking used on Zen 5.

This is hilarious.

OneEng2 · Oct 31, 2024

Hans Gruber said:
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.

This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.

This is nonsensical. Zen 5 X3D absolutely has many architectural improvements over Zen 4 X3D. Just shrinking Zen 4 will not give you Zen 5 performance.

gdansk · Oct 31, 2024

OneEng2 said:
This is nonsensical. Zen 5 X3D absolutely has many architectural improvements over Zen 4 X3D. Just shrinking Zen 4 will not give you Zen 5 performance.

Yes, TSMC's performance statements aren't regarding designs already at the very edge of their process. Shrink a 5GHz design and you are not getting 5.5GHz. If you look into it TSMC are comparing a ~3GHz test chip.

But it won't stop Hans "It's all process" Gruber.

stayfrosty · Oct 31, 2024

The SRAM chiplet basically doubled in size (36 mm² -> 70 mm²). I can't imagine this only being due to more TSVs. I bet there's actually more than 64 MB of cache on it, if only to keep yields high. Maybe Turin-X will get some SKUs with a bit more cache... maybe +96 MB instead of +64 MB.

Wafer on wafer packaging is probably more economical even with the extra 6nm die space. I can't imagine they would only do this for heat/power reasons...
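A rough back-of-the-envelope version of that area argument, as a sketch only. It assumes the 36 mm² and 70 mm² figures from the post, assumes the older die carried 64 MB, and naively treats the whole die as SRAM at the old density. It ignores the area taken by TSVs, power routing and any node or layout changes, so the result is an upper bound on the idea, not an estimate of the real die.

```python
OLD_AREA_MM2 = 36.0   # previous cache die area, as quoted in the post
NEW_AREA_MM2 = 70.0   # new cache die area, as quoted in the post
OLD_CACHE_MB = 64.0   # assumption: capacity of the previous cache die

# How much the die grew.
area_growth = NEW_AREA_MM2 / OLD_AREA_MM2

# Naive density of the old die, counting every mm² as SRAM.
old_density_mb_per_mm2 = OLD_CACHE_MB / OLD_AREA_MM2

# Capacity the new die could hold at that same naive density.
naive_capacity_mb = old_density_mb_per_mm2 * NEW_AREA_MM2

# Share of the new die that 64 MB would need at the old density.
share_needed_for_64mb = OLD_CACHE_MB / naive_capacity_mb

print(f"Area growth: {area_growth:.2f}x")
print(f"Naive capacity at old density: {naive_capacity_mb:.0f} MB")
print(f"Share of new die needed for 64 MB: {share_needed_for_64mb:.0%}")
```

Under these assumptions, 64 MB would occupy only about half of the new die. That is why the extra area needs some other explanation: more TSVs and routing, spare SRAM, or both.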

Hitman928 · Oct 31, 2024

stayfrosty said:
The SRAM chiplet basically doubled in size (36 mm² -> 70 mm²). I can't imagine this only being due to more TSVs. I bet there's actually more than 64 MB of cache on it, if only to keep yields high. Maybe Turin-X will get some SKUs with a bit more cache... maybe +96 MB instead of +64 MB.

Wafer on wafer packaging is probably more economical even with the extra 6nm die space. I can't imagine they would only do this for heat/power reasons...

It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.

StefanR5R · Oct 31, 2024

gdansk said:
If you look into it TSMC are comparing a ~3GHz test chip.

Plus, TSMC's figure of +11% is for core logic performance, whereas AMD's figure of +8% is for average frames per second in video games, i.e. whole-system performance.

maddie · Oct 31, 2024

Hitman928 said:
It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.

Surely they have room now to migrate the entire L3 off the core die. Probably hedged this round by seeing if the cache-under layout worked. Expect Zen 6 to remedy this.

inquiss · Oct 31, 2024

igor_kavinski said:
Let's suppose it doesn't help in bandwidth. There's still the possibility of running the CUDIMM at lower latencies at 6400 MT/s, like CL26 or even lower. The stabilized signal integrity of CUDIMM should help there.

Sure, but why invest the resources when you already have the biggest stick, you know? Does this really help, relative to the engineering effort, compared to, say, improving the future IOD or working on other products? These X3D chips mitigate the need for faster RAM; they're already the fastest.

igor_kavinski · Oct 31, 2024

inquiss said:
Does this really help, relative to the engineering effort, compared to, say, improving the future IOD or working on other products?

Depends on how much effort it is to enable CUDIMM support. I hope it's not a lot.

igor_kavinski · Oct 31, 2024

Hitman928 said:
So making a 16 core CCD basically? I don't think that's possible with the current cores

Why is this not possible with the current cores? If they use a single large V-cache die for 9900/9950X3D, I suppose it would only take a microcode update or a slightly updated stepping for all the cores to use the unified L3 cache. And if that's too complex, how about the V-cache acting as a victim L4 cache for both CCDs' L3 caches?

Joe NYC · Oct 31, 2024

Hitman928 said:
It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.

The amount of L3 in 9800x3d is known to be 64MB (in V-Cache), but if there is room for more SRAM, there could possibly be a different model in the future.

The spec is here:

https://www.amd.com/en/products/processors/desktops/ryzen/9000-series/amd-ryzen-7-9800x3d.html

Jan Olšan · Oct 31, 2024

Hans Gruber said:
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.

This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.

maddie said:
If they throw in the new v-cache stacking used on Zen 5.

This is hilarious.

Process nodes don't work like that. You seem to be thinking that N4P gives you an 11% frequency boost at the very top - that would mean non-X3D Zen 4 being able to run at 6300 MHz (6450 MHz unofficial Fmax) and X3D Zen 4 at 5550 MHz (unofficial Fmax 5600-5650). That's obviously not happening.
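For reference, this is the naive calculation being argued against, as a minimal sketch. The starting clocks are assumptions for illustration (5700 MHz for the top non-X3D Zen 4 boost, 5000 MHz for the 7800X3D boost), and `scaled_clock_mhz` is a made-up helper.

```python
def scaled_clock_mhz(clock_mhz, uplift_percent=11.0):
    """Apply a flat percentage uplift to a clock, as the naive reading would."""
    return clock_mhz * (1.0 + uplift_percent / 100.0)


# Assumed official boost clocks, for illustration only.
ASSUMED_BOOST_MHZ = {
    "Zen 4 non-X3D": 5700.0,
    "Zen 4 X3D (7800X3D)": 5000.0,
}

for name, clock in ASSUMED_BOOST_MHZ.items():
    print(f"{name}: {clock:.0f} MHz -> {scaled_clock_mhz(clock):.0f} MHz")
```

Applying the 11% at the top of the curve lands in the 5.5 to 6.3 GHz range, which is the result called implausible here.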

The small print on these performance claims is usually that they apply to some lower voltage range. Most likely, it is the possibility of raising frequency at the same or lower voltage *while being iso power*, but that only happens at some lower point of the voltage curve, not at the point where you are maxing out the clocks.

If you want an illustration, the claim basically is something like: "Let's say we have chips made from the same IP and we want to run them at 50W power. N4 will allow you to hit 3.6 GHz which needs 1.17 V, in that envelope. N4P can do 4.0 GHz in 50W envelope, and that is thanks to being able to achieve that clock at 1.15 V" [not mentioned: maximum clock for the IP still is 5.7 GHz at 1.45 V for both, or just marginally higher on N4P - here I honestly dunno]. (Made up numbers just for example.)

Hans Gruber said:
with nice efficiency gains as well

Also keep in mind that when TSMC/Intel/Samsung list the benefits of a new node, it is never the performance and lower power consumption as well, it is one or the other. Either you get higher frequency (but not at the maximum end of the curve) at the same power, or you select the same frequency and then you end up with lower power.
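The "one or the other" trade-off can be sketched with the made-up numbers from the illustration above (3.6 GHz at 1.17 V on N4, 4.0 GHz at 1.15 V on N4P, both in 50 W). The sketch assumes the textbook dynamic-power approximation P ≈ k·V²·f and ignores leakage; the constant `k` and the function names are hypothetical. For the same-frequency case it conservatively keeps N4P at 1.15 V, although in practice the voltage could probably drop further.

```python
def fit_k(power_w, voltage_v, freq_ghz):
    """Solve P = k * V^2 * f for k, given one operating point."""
    return power_w / (voltage_v ** 2 * freq_ghz)


def dynamic_power_w(k, voltage_v, freq_ghz):
    """Dynamic power approximation P = k * V^2 * f (leakage ignored)."""
    return k * voltage_v ** 2 * freq_ghz


POWER_BUDGET_W = 50.0
N4_FREQ_GHZ = 3.6                          # made-up N4 point (at 1.17 V)
N4P_FREQ_GHZ, N4P_VOLTAGE_V = 4.0, 1.15    # made-up N4P point

# Option A: same power, take the gain as frequency.
freq_gain_percent = (N4P_FREQ_GHZ / N4_FREQ_GHZ - 1.0) * 100.0

# Option B: same frequency as N4, take the gain as lower power.
k_n4p = fit_k(POWER_BUDGET_W, N4P_VOLTAGE_V, N4P_FREQ_GHZ)
iso_freq_power_w = dynamic_power_w(k_n4p, N4P_VOLTAGE_V, N4_FREQ_GHZ)
power_saving_percent = (1.0 - iso_freq_power_w / POWER_BUDGET_W) * 100.0

print(f"Iso-power: +{freq_gain_percent:.1f}% frequency at {POWER_BUDGET_W:.0f} W")
print(f"Iso-frequency: {iso_freq_power_w:.1f} W at {N4_FREQ_GHZ} GHz "
      f"({power_saving_percent:.1f}% lower power)")
```

With these illustrative numbers you get either about 11% more frequency at 50 W, or roughly 10% less power at 3.6 GHz, but not both at once.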

tsamolotoff · Oct 31, 2024

igor_kavinski said:
Depends on how much effort it is to enable CUDIMM support. I hope it's not a lot.

Just read what Intel enthusiasts have posted with regard to CUDIMM in the ARL topic: it's basically useless unless the delta is > 2000 MT/s and you need bandwidth, otherwise unbuffered DIMMs are better. Don't hype yourself needlessly.

Hitman928 · Oct 31, 2024

maddie said:
Surely they have room now to migrate the entire L3 off the core die. Probably hedged this round by seeing if the cache-under layout worked. Expect Zen 6 to remedy this.

They have the room and it's theoretically possible to move it all off die, but they also have vanilla designs that need on-die L3. I think maybe what they will do is have a smaller on-die L3, like they put on the mobile chips, on all of the dies, but then the higher end desktop dies all become V-cache dies. So, rather than like they have now with 3 different levels of L3, where there are mobile (and desktop APU) and desktop dies with different L3s and then desktop with V-cache, you get mobile/desktop with smaller L3 on-die, and then higher end desktop with V-cache. That would allow them to merge all the client designs into essentially V-cache and non-Vcache designs, at least for mainstream and higher markets.

igor_kavinski said:
Why is this not possible with the current cores? If they use a single large V-cache die for 9900/9950X3D, I suppose it would only take a microcode update or a slightly updated stepping for all the cores to use the unified L3 cache. And if that's too complex, how about the V-cache acting as a victim L4 cache for both CCDs' L3 caches?

Because the cores right now are all CCX=CCD=one "ring" for communication. You can't just hook the rings together through a V-cache and hope for the best. What you said about an L4 is basically what I was saying about the cache in a base tile being used for an SLC which could be used by all the cores and potentially the GPU and an NPU as well.

gaav87 · Oct 31, 2024

sl0519 said:
HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

*edit: They tested with 4090 according to the end notes. If 78X3D was 20% faster than U9 285K, wouldn't that put it around the same performance as the 98X3D? Sounds very fishy to me!

No, they tested 7800x3d vs 9800x3d with a 7900xtx, and 9800x3d vs 285k with a 4090.
