Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 865

Gideon

Golden Member
Nov 27, 2007
1,842
4,379
136
+200 MHz and +7% to MSRP, sad.
No meaningful frequency boost plus it's increased in price.
According to the video, the base clock is 500 MHz higher and overclocking is enabled. The average gaming uplift was 8%, so the MSRP hike (while not fun) makes sense.

And I'm pretty sure there is some headroom (based on the leak). They'll almost certainly clock the 7900X3D and 7950X3D higher, as there would be more reasons to buy them.

I'm not a fan of the price hikes, considering that you could buy a 7800X3D for 380€ (with 20% VAT) in my country, but now it's about 480€. But that's just the way it is with no competition. Gen on gen, that hike makes a lot of sense, particularly because of the upped productivity performance as well.
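For anyone who wants to sanity-check the percentages, here is a minimal Python sketch. It only uses the figures quoted in this post (+7% MSRP, 8% average gaming uplift, 380€ to 480€ street price). The function names are made up for illustration and nothing here is measured data.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


def perf_per_price_ratio(perf_uplift_pct: float, price_uplift_pct: float) -> float:
    """Relative performance per unit of price, new vs old (1.0 = unchanged)."""
    return (1.0 + perf_uplift_pct / 100.0) / (1.0 + price_uplift_pct / 100.0)


if __name__ == "__main__":
    # Figures quoted in the post above; treat them as claims, not measurements.
    street_hike = pct_change(380.0, 480.0)
    msrp_value = perf_per_price_ratio(8.0, 7.0)
    print(f"Street price change 380 -> 480 EUR: {street_hike:+.1f}%")
    print(f"Gaming performance per MSRP, gen on gen: {msrp_value:.3f}x")
```

On those numbers the street price moved by roughly 26%, while performance per MSRP stays about flat gen on gen (8% more performance for 7% more money).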
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
So is the die below? Andreas Schilling thinks so:


According to the animation (which may not be technically accurate but usually AMD doesn't miss that badly with this stuff), yes, the cache die is on bottom and all of the signaling and power is coming through the cache chip as well. In the animation, you can see TSVs distributed all throughout the die which is why the cache die is the same size as the CCD now. I thought I was misremembering this fact earlier when discussing this as I couldn't find the evidence again, but it turns out I wasn't, the TSVs seem to be everywhere now. It also shows the CCD is not flip chip.

I mentioned previously if the top die is CCD and not flip chip, this means that AMD figured something out in getting the heat out without a flip chip orientation. Either that or the chip runs cool enough that the increased hotspotting from not being flip chip doesn't limit the chip. Probably a bit of both. I'm thinking this has been born out of research AMD has done in BSPD designs where the top die isn't flip chip and is a good sign that they will be able to transition to BSPD without issue.
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
Someone else pointed out earlier that the Vcache dies are a different stepping than the vanilla dies. I wonder if they did this to change the power and ground routing and they are actually distinct dies, or if they leave both paths on the chip and the additional stepping was to fix something for the Vcache dies but vanilla dies will move to the new stepping as well?
 

Hans de Vries

Senior member
May 2, 2008
340
1,156
136
www.chip-architect.com
This could, in theory, allow Wafer on Wafer packaging, meaning much greater capacity to produce these chips in volume.

Now wondering if they could possibly have a united 3D V-Cache under both CCD's for the 9950X3D and 9900X3D, with direct L3 connect and bypassing the serial interconnect via the IOD die....

That would resolve a perceived weakness of the current design.

And it would also explain why they went through this more complex design change, instead of just tweaking the old concept further for better performance.
 

sl0519

Member
Aug 10, 2024
46
128
66
The claim is:
- 8% faster than 7800x3d
- 20% faster than Arrow Lake

Which sounds credible.

HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

*edit: They tested with 4090 according to the end notes. If 78X3D was 20% faster than U9 285K, wouldn't that put it around the same performance as the 98X3D? Sounds very fishy to me!
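To spell out why the numbers look inconsistent: relative uplifts multiply, they don't add. A minimal Python sketch, assuming (and this is a big assumption) that both percentages came from the same games, settings and GPU, which is not established here. `compose_uplifts` is a hypothetical helper name.

```python
def compose_uplifts(*uplifts_pct: float) -> float:
    """Chain relative uplifts (in percent) and return the combined uplift in percent.

    Example chain: A is x% faster than B, B is y% faster than C,
    so A vs C is (1 + x/100) * (1 + y/100) - 1.
    """
    ratio = 1.0
    for uplift in uplifts_pct:
        ratio *= 1.0 + uplift / 100.0
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # HU figure quoted above: 78X3D vs 285K = +20%
    # AMD claim quoted above: 98X3D vs 78X3D = +8%
    implied = compose_uplifts(20.0, 8.0)
    print(f"Implied 98X3D vs 285K if both held in one test setup: {implied:+.1f}%")
```

If both figures applied to the same test suite, the implied 98X3D lead over the 285K would be nearer 30% than the claimed 20%, so either the suites differ or the claim is conservative.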
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
Now wondering if they could possibly have a united 3D V-Cache under both CCD's for the 9950X3D and 9900X3D, with direct L3 connect and bypassing the serial interconnect via the IOD die....

So making a 16 core CCD basically? I don't think that's possible with the current cores, but it might be possible for a future product. I think it would probably come after a cache hierarchy rework, with the bottom cache acting as an SLC rather than forming a multi-die CCD.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,672
3,839
106
HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

The chart shown did not reflect +8% and +20% average. It had games performing better. But AMD is only claiming +8% and +20%, which seems like a safe claim for gaming.
 

maddie

Diamond Member
Jul 18, 2010
4,932
5,075
136
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.


This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.
If they throw in the new v-cache stacking used on Zen 5.

This is hilarious.
 

OneEng2

Senior member
Sep 19, 2022
259
358
106
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.


This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.
This is nonsensical. Zen 5 X3D absolutely has many architectural improvements over Zen 4 X3D. Just shrinking Zen 4 will not give you Zen 5 performance.
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
This is nonsensical. Zen 5 X3D absolutely has many architectural improvements over Zen 4 X3D. Just shrinking Zen 4 will not give you Zen 5 performance.
Yes, TSMC's performance statements aren't about designs already at the very edge of their process. Shrink a 5 GHz design and you are not getting 5.5 GHz. If you look into it, TSMC are comparing a ~3 GHz test chip.

But it won't stop Hans "It's all process" Gruber.
 

stayfrosty

Junior Member
Apr 4, 2024
13
48
51
The SRAM chiplet basically doubled in size (36mm² -> 70mm²). I can't imagine this only being due to more TSVs. I bet there's actually more than 64 MB of cache on it, if only to keep yields high. Maybe Turin-X will get some SKUs with a bit more cache... maybe +96 MB instead of +64 MB.

Wafer on wafer packaging is probably more economical even with the extra 6nm die space. I can't imagine they would only do this for heat/power reasons...
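A back-of-the-envelope version of the area argument, as a Python sketch. It assumes the old 36mm² die was essentially all SRAM holding 64 MB and that the new die has the same SRAM density. Both are simplifications made here for illustration, not known facts, so the result is only an upper bound on what the extra area could hold.

```python
OLD_AREA_MM2 = 36.0      # old cache die size quoted above
NEW_AREA_MM2 = 70.0      # new cache die size quoted above
OLD_CAPACITY_MB = 64.0   # V-Cache capacity on the old die


def area_ratio(old_area: float, new_area: float) -> float:
    """How many times larger the new die is."""
    return new_area / old_area


def naive_capacity_mb(old_capacity: float, old_area: float, new_area: float) -> float:
    """Capacity if the whole new die were SRAM at the old density (an upper bound)."""
    return old_capacity * new_area / old_area


def area_share_for_old_capacity(old_area: float, new_area: float) -> float:
    """Fraction of the new die needed for the old capacity at the old density."""
    return old_area / new_area


if __name__ == "__main__":
    print(f"Area ratio: {area_ratio(OLD_AREA_MM2, NEW_AREA_MM2):.2f}x")
    print(f"Naive upper bound: {naive_capacity_mb(OLD_CAPACITY_MB, OLD_AREA_MM2, NEW_AREA_MM2):.0f} MB")
    print(f"Share of new die for 64 MB: {area_share_for_old_capacity(OLD_AREA_MM2, NEW_AREA_MM2):.0%}")
```

Under those assumptions 64 MB would need only about half of the new die. The rest could be TSVs, power routing, spare SRAM or simply empty area, and the arithmetic alone can't tell which.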
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
The SRAM chiplet basically doubled in size (36mm² -> 70mm²). I can't imagine this only being due to more TSVs. I bet there's actually more than 64 MB of cache on it, if only to keep yields high. Maybe Turin-X will get some SKUs with a bit more cache... maybe +96 MB instead of +64 MB.

Wafer on wafer packaging is probably more economical even with the extra 6nm die space. I can't imagine they would only do this for heat/power reasons...

It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.
 

maddie

Diamond Member
Jul 18, 2010
4,932
5,075
136
It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.
Surely they have room now to migrate the entire L3 off the core die. Probably hedged this round by seeing if the cache-under layout worked. Expect Zen 6 to remedy this.
 

inquiss

Senior member
Oct 13, 2010
250
354
136
Let's suppose it doesn't help in bandwidth. There's still the possibility of running the CUDIMM at lower latencies at 6400 MT/s, like CL26 or even lower. The stabilized signal integrity of CUDIMM should help there.
Sure, but why invest the resources when you already have the biggest stick, you know? Does this really help enough to justify the engineering effort, compared to, say, improving the future IOD or working on other products? This X3D chip mitigates the need for faster RAM; it's already the fastest.
 
Jul 27, 2020
20,917
14,491
146
So making a 16 core CCD basically? I don't think that's possible with the current cores
Why is this not possible with the current cores? If they use a single large V-cache die for 9900/9950X3D, I suppose it would only take a microcode update or a slightly updated stepping for all the cores to use the unified L3 cache and if that's too complex, how about the V-cache acting as a victim L4 cache for both CCD's L3 caches?
 

Joe NYC

Platinum Member
Jun 26, 2021
2,672
3,839
106
It's not for heat/power reasons, it's so you can distribute the power and signals through the bottom die to the top. We'll have to wait and see on the L3 amount, but I'm pretty sure we have leaked screenshots and reports that the amount hasn't increased.

The amount of L3 in 9800x3d is known to be 64MB (in V-Cache), but if there is room for more SRAM, there could possibly be a different model in the future.

The spec is here:

 

Jan Olšan

Senior member
Jan 12, 2017
427
776
136
I heard an 8% performance increase over the 7800x3D. If that is correct, AMD could release a 7800x3D on N4P and it would be equal to the 9800x3D with nice efficiency gains as well. If they throw in the new v-cache stacking used on Zen 5. We could have an old 7800x3D being the king of gaming again. I know that is not what AMD wants but that is what it looks like on paper. N4P is good silicon and gives a nice efficiency gain and performance gains close to the performance difference between the 7800x3D and 9800x3D.


This is straight from TSMC.

N4P offers an 11% performance boost compared to N5 and entered risk production in July 2022. The N4 and N4P nodes are design rules-compatible with 5nm technology for easy design migration. The 5nm (N5) node is optimized for both mobile and High Performance Computing (HPC) applications.

If they throw in the new v-cache stacking used on Zen 5.

This is hilarious.
Process nodes don't work like that. You seem to be thinking that N4P gives you an 11% frequency boost at the very top - that would mean non-X3D Zen 4 being able to run at 6300 MHz (6450 MHz unofficial Fmax) and X3D Zen 4 at 5550 MHz (unofficial Fmax 5600-5650). That's obviously not happening.
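The arithmetic behind those numbers, as a small Python sketch. The starting clocks (5700 MHz for non-X3D Zen 4, 5000 MHz for X3D Zen 4) are assumed here from the advertised boost clocks of the 7950X and 7800X3D, and the names are made up for illustration. The point is only to show what a naive "11% at the top" reading would imply.

```python
N4P_CLAIMED_UPLIFT = 0.11  # TSMC's quoted N4P vs N5 performance figure

# Assumed starting points: advertised max boost clocks in MHz.
ASSUMED_FMAX_MHZ = {
    "Zen 4 non-X3D": 5700,
    "Zen 4 X3D": 5000,
}


def naive_scaled_clock(fmax_mhz: float, uplift: float = N4P_CLAIMED_UPLIFT) -> float:
    """Clock you would get if the node claim applied directly to peak frequency."""
    return fmax_mhz * (1.0 + uplift)


if __name__ == "__main__":
    for name, fmax in ASSUMED_FMAX_MHZ.items():
        scaled = naive_scaled_clock(fmax)
        print(f"{name}: {fmax} MHz -> {scaled:.0f} MHz under the naive reading")
```

Clocks in the 5.5 to 6.3 GHz range are well beyond what these designs reach, which is why the 11% can't be read as a peak-frequency gain.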

The small print on these performance claims is usually that it applies to some lower voltage range or, most likely, that it is the possibility to raise frequency at the same/lower voltage *while being iso-power*, but that only happens at some lower point of the voltage curve, not at the point where you are maxing the clocks.

If you want an illustration, the claim basically is something like: "Let's say we have chips made from the same IP and we want to run them at 50W power. N4 will allow you to hit 3.6 GHz which needs 1.17 V, in that envelope. N4P can do 4.0 GHz in 50W envelope, and that is thanks to being able to achieve that clock at 1.15 V" [not mentioned: maximum clock for the IP still is 5.7 GHz at 1.45 V for both, or just marginally higher on N4P - here I honestly dunno]. (Made up numbers just for example.)

with nice efficiency gains as well
Also keep in mind that when TSMC/Intel/Samsung list the benefits of a new node, it is never the performance gain and the lower power consumption together; it is one or the other. Either you get higher frequency (but not at the maximum end of the curve) at the same power, or you select the same frequency and end up with lower power.
 

Hitman928

Diamond Member
Apr 15, 2012
6,391
11,392
136
Surely they have room now to migrate the entire L3 off the core die. Probably hedged this round by seeing if the cache-under layout worked. Expect Zen 6 to remedy this.

They have the room and it's theoretically possible to move it all off die, but they also have vanilla designs that need on-die L3. I think maybe what they will do is have a smaller on-die L3, like they put on the mobile chips, on all of the dies, but then the higher end desktop dies all become V-cache dies. So, rather than like they have now with 3 different levels of L3, where there are mobile (and desktop APU) and desktop dies with different L3s and then desktop with V-cache, you get mobile/desktop with smaller L3 on-die, and then higher end desktop with V-cache. That would allow them to merge all the client designs into essentially V-cache and non-Vcache designs, at least for mainstream and higher markets.

Why is this not possible with the current cores? If they use a single large V-cache die for 9900/9950X3D, I suppose it would only take a microcode update or a slightly updated stepping for all the cores to use the unified L3 cache and if that's too complex, how about the V-cache acting as a victim L4 cache for both CCD's L3 caches?

Because the cores right now are all CCX=CCD=one "ring" for communication. You can't just hook the rings together through a V-cache and hope for the best. What you said about an L4 is basically what I was saying about the cache in a base tile being used for an SLC which could be used by all the cores and potentially the GPU and an NPU as well.
 

gaav87

Member
Apr 27, 2024
180
380
96
HU had 78X3D 20% faster than 285K already, but with a 4090. With those claims, how much of a boost can we expect out of a 4090? Do you guys think they've learned their lessons from the disastrous Zen 5 launch?

*edit: They tested with 4090 according to the end notes. If 78X3D was 20% faster than U9 285K, wouldn't that put it around the same performance as the 98X3D? Sounds very fishy to me!
No, they tested 7800x3d vs 9800x3d with a 7900xtx, and 9800x3d vs 285k with a 4090.
 