Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 772

Mahboi

Golden Member
Apr 4, 2024
1,007
1,831
96
AMD's response to this inquiry will be very telling. I doubt they will admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch date goal, or a silicon-level bug that might or might not be fixable by a new stepping or microcode?

More and more it looks like desktop Zen 5 should have just been delayed, even if it would be a 6-month-plus delay, to get this and other performance anomalies ironed out. I can't wait to see the core latencies on Zen 5c 3nm Turin, which is rumored to have the fabled 16-core CCX.
It was delayed. Yes, technically AMD never respected the 18-month target, but 22 months is already effectively beyond 3 months late, and with the extra time to actually come out, it's closer to 6 months late. From late September 2022 to mid-August 2024.
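
For anyone who wants to check the cadence arithmetic, here is a minimal sketch. The exact dates are assumptions standing in for "late September 2022" and "mid August 2024", and the 18-month target is taken from the post, not from any AMD statement.

```python
from datetime import date

# Assumed dates: hypothetical stand-ins for the "late September 2022"
# and "mid August 2024" launches mentioned in the post.
ZEN4_LAUNCH = date(2022, 9, 27)
ZEN5_LAUNCH = date(2024, 8, 15)
TARGET_CADENCE_MONTHS = 18  # cadence target as stated in the post


def months_between(start: date, end: date) -> float:
    """Approximate month count: whole calendar months plus leftover days / 30."""
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    leftover_days = end.day - start.day
    return whole + leftover_days / 30


gap = months_between(ZEN4_LAUNCH, ZEN5_LAUNCH)
slip = gap - TARGET_CADENCE_MONTHS
print(f"gap: {gap:.1f} months, slip vs. target: {slip:.1f} months")
```

With these assumed dates the gap comes out between 22 and 23 months, so the slip against an 18-month target is a bit over 4 months.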
 

inf64

Diamond Member
Mar 11, 2011
3,863
4,540
136
Another thought which someone probably already posted: what if the inter CCD latency increase was just a conscious tradeoff to save power? I doubt they will come out and just admit that, but it's possible this was a design choice to increase the power budget.
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
I hope not. Any Zen 5 + launch would delay Zen 6 by at least 8 months. I say screw it, just ride it out with better AGESA/driver updates on X3D stack until Zen 6 on a new platform. Admit failure and move on.

If you go through AT's articles about TSMC you'll notice that N4P provided only 11% better perf/watt than N5P, which was used by Zen 4, yet AMD managed to increase performance by 10% at 5-10% lower power. That's quite a remarkable achievement. Now compare N7 to N5P and you'll understand why things were so easy for Zen 4, even with a smaller IPC uplift, so talking of failure is quite a stretch.
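
A quick sketch of the perf/watt arithmetic behind that claim. All inputs are the figures quoted in the post (11% from the process, 10% more performance at 5-10% lower power), not measurements, and the function name is just illustrative.

```python
def perf_per_watt_gain(perf_gain: float, power_change: float) -> float:
    """Relative perf/W change for a fractional perf gain and fractional power change."""
    return (1 + perf_gain) / (1 + power_change) - 1


PROCESS_GAIN = 0.11  # N4P over N5P perf/W, as quoted in the post

# 10% more performance at 5% and at 10% lower power (figures from the post).
low = perf_per_watt_gain(0.10, -0.05)
high = perf_per_watt_gain(0.10, -0.10)

print(f"perf/W gain: {low:.1%} to {high:.1%} vs. {PROCESS_GAIN:.0%} from the process alone")
```

If the quoted figures hold, the combined perf/watt gain lands above what the process step alone would account for, which is the point being made.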
 

Josh128

Senior member
Oct 14, 2022
296
409
96
Another thought which someone probably already posted: what if the inter CCD latency increase was just a conscious tradeoff to save power? I doubt they will come out and just admit that, but it's possible this was a design choice to increase the power budget.

It was me that posted that idea. And by now we know it's not related to the separate CCDs, it's related to the separate CCXs (looking at monolithic Strix). It seems true that Zen 5 cores are super power-hungry and scale perf well past what Zen 4 does, and scale much worse than Zen 4 at very low power. Reducing cross-CCX traffic frequency/speed by 2.5X could indeed have a power benefit at the cost of high latency.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
If you go through AT's articles about TSMC you'll notice that N4P provided only 11% better perf/watt than N5P, which was used by Zen 4, yet AMD managed to increase performance by 10% at 5-10% lower power. That's quite a remarkable achievement. Now compare N7 to N5P and you'll understand why things were so easy for Zen 4, even with a smaller IPC uplift, so talking of failure is quite a stretch.
Zen 4 was on N5.
 

inf64

Diamond Member
Mar 11, 2011
3,863
4,540
136
If you go through AT's articles about TSMC you'll notice that N4P provided only 11% better perf/watt than N5P, which was used by Zen 4, yet AMD managed to increase performance by 10% at 5-10% lower power. That's quite a remarkable achievement. Now compare N7 to N5P and you'll understand why things were so easy for Zen 4, even with a smaller IPC uplift, so talking of failure is quite a stretch.
It is a failure in a broader sense: huge resource investment in the FP/AVX512 while integer was left behind, launch was delayed as they KNEW something is off with the chips/drivers/AGESA/Windows support, very poor marketing and communication from AMD's side (vague materials, no Zen 5 concrete benchmark numbers, deliberately omitting the important comparison points such as 7700X, no feedback from AMD to reviewers about the poor results they got before the NDA, etc.).

This whole launch reminds me of RDNA3 to the T. Starting with misleading/untrue performance claims, to bad drivers and poor communication from AMD. It is very underwhelming.
They should have just waited for X3D and launched the full stack with slides showing the true performance with claims that they focused on power efficiency and gamers have a choice to pick one of the X3D variants. It would have worked out MUCH better than this hot mess.
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
Zen 4 was on N5.

I linked the AT article where Lisa Su says that it's N5P.
Dr Su reinforced that technology roadmaps are all about making the right choices and the right junctures, and explicitly stated that our 5nm technology is highly optimized for high-performance computing – it’s not necessarily the same as some other 5nm technologies out there.


It is a failure in a broader sense: huge resource investment in the FP/AVX512 while integer was left behind, launch was delayed as they KNEW something is off with the chips/drivers/AGESA/Windows support, very poor marketing and communication from AMD's side (vague materials, no Zen 5 concrete benchmark numbers, deliberately omitting the important comparison points such as 7700X, no feedback from AMD to reviewers about the poor results they got before the NDA, etc.).

INT has historically been much more difficult to improve than FP. You can check the Cinebench scores of all previous gens from both AMD and Intel and compare them with 7-Zip, which is representative of INT perf: you'll see that the former improved by 60% over three gens while the latter improved by barely 15% with the same CPUs.

Foremost, Zen 5 is process limited. If they had shrunk the 7950X to N4P they could have either increased performance by 5.7% or reduced power by 14%, think about it, the two options being mutually exclusive.
 
Reactions: lightmanek

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
The problem is that article is from January 2022 and Zen 4 was released in September 2022. The plan may have been for N5P. Ultimately, Zen 4 used N5 silicon. I remember AMD said that N5 was good enough when Zen 4 was released.

In January 2022 they already had Zen 4 QSs on hand, otherwise they couldn't have released it in September of the same year.

For a release in September, mass production should be launched roughly 4 to 5 months before the first sales; it takes two months from wafers entering the fab to get the packaged dies.
 
Reactions: lightmanek

LightningZ71

Golden Member
Mar 10, 2017
1,785
2,139
136
I think that the big bump on Zen6 will be a MALL cache. With the overhaul of the platform and I/O structure and no real progress on DDR throughput, that's the only way forward.
 
Reactions: inf64

Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
In January 2022 they already had Zen 4 QSs on hand, otherwise they couldn't have released it in September of the same year.

For a release in September, mass production should be launched roughly 4 to 5 months before the first sales; it takes two months from wafers entering the fab to get the packaged dies.
Zen 5 was supposed to be on TSMC N3 (3nm). The plan changed. There is not a significant difference between N5 and N5P because it's based on the 5nm process that Zen 4 was designed for. There is a huge difference between 5nm and 3nm.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
Another thought which someone probably already posted: what if the inter CCD latency increase was just a conscious tradeoff to save power? I doubt they will come out and just admit that, but it's possible this was a design choice to increase the power budget.

I've been beating this drum basically from the beginning, lol. All the way back on page 759 I pointed this possibility out and that TPU's testing showed a large reduction in 1T power consumption.

I'm guessing it's related to the new core parking which is turning off the cores on 1 CCD. This has the effect of reducing lightly threaded power consumption, see the improvements shown in TPU's testing. Theoretically it should also help gaming performance by keeping all the game threads on 1 CCD, but I think that has been a mixed bag according to reviewer numbers, but who knows if some reviewers have the driver installed correctly or not.

View attachment 105296

View attachment 105298

It was also postulated in the very AT article that exposed the high latency:

Our current working theory is that this is a side-effect of AMD's core parking changes for Ryzen 9000. That cores are being aggressively put to sleep, and that as a result, it's taking an extra 100ns to wake them up. If that is correct, then our core-to-core latency test is just about the worst case scenario for that strategy, as it's sending data between cores in short bursts, rather than running a sustained workload that keeps the cores alive over the long-haul.

But then multiple people just jumped in that it is clearly a bug and Zen 5 is a dumpster fire, etc., etc., and off to the races we went with that. Now, I'm not saying we know for sure it was a design choice, but it appears that way to me. There is no increased latency from the CCDs to the memory, so it's not a reduction in latency or bandwidth in the connections themselves. Even if it is a weird bug, it appears that this has basically no effect on performance, as there don't seem to be any actual benchmarks that are affected. Hopefully we get a good explanation of this behavior at some point from AMD, but from what I've seen, this appears to be a deliberate choice to save power, only shows up in a negative way in a purely synthetic test, and was probably primarily designed for the mobile market (see upcoming STX Halo and Fire Range, which are multi-CCD products on mobile, and STX, which has 2 clusters with independent L3s even though it is monolithic).
 
Jul 27, 2020
19,613
13,481
146
I don't know. We have no idea how aggressively the unused cores are being put to sleep. What kind of core utilization algorithm are they using to decide when to put a mostly under-utilized CCD to sleep? Suppose there are only 17 active threads and one thread has nowhere to go on the active die so it has to go to CCD2. Now how much does the utilization of that thread have to drop (like waiting for something from RAM?) where the CCD is put to sleep? If this is happening often enough, all those increased latency cycles will add up over the duration of whatever the 17th thread is doing. At least AMD should give a BIOS option. People who want this behavior, can turn it on while others can turn it off.
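
To put rough numbers on "will add up", here is a toy model. It assumes the extra 100ns wake-up cost from the AT theory quoted earlier in the thread, and the wake-up rates are purely hypothetical, since nobody outside AMD knows how often the CCD is actually parked.

```python
WAKE_PENALTY_NS = 100  # extra wake-up latency suggested in the quoted AT theory


def added_time_ms(wakeups_per_second: float, duration_s: float,
                  penalty_ns: float = WAKE_PENALTY_NS) -> float:
    """Total extra time (ms) if every wake-up costs `penalty_ns` and none overlap."""
    return wakeups_per_second * duration_s * penalty_ns / 1e6


# Hypothetical wake-up rates for the stray 17th thread over a 60-second run.
for rate in (1_000, 100_000, 1_000_000):
    extra = added_time_ms(rate, duration_s=60)
    print(f"{rate:>9,} wake-ups/s -> {extra:,.0f} ms extra over 60 s")
```

Under these assumptions the penalty only becomes a meaningful share of the run at very high wake-up rates, so the answer hinges on exactly the unknown raised above: how aggressively the cores are parked.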
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
In January 2022 they already had Zen 4 QSs on hand, otherwise they couldn't have released it in September of the same year.

For a release in September, mass production should be launched roughly 4 to 5 months before the first sales; it takes two months from wafers entering the fab to get the packaged dies.

2 months is extremely optimistic for retail production.
 

LightningZ71

Golden Member
Mar 10, 2017
1,785
2,139
136
I'm willing to go out on a limb and suggest that Linux isn't being as aggressive as Windows is on power management with the cores and clusters. That difference of 4% is similar to the suggestions that I've seen of core parking having a roughly 3% performance hit.

I still think there's something to the idea that there is a further complication in inter-CCX communications due to the new requirement to support non-symmetric CCX layouts.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
I'm willing to go out on a limb and suggest that Linux isn't being as aggressive as Windows is on power management with the cores and clusters. That difference of 4% is similar to the suggestions that I've seen of core parking having a roughly 3% performance hit.

I still think there's something to the idea that there is a further complication in inter-CCX communications due to the new requirement to support non-symmetric CCX layouts.

Could be. They changed up the physical structure of the L3 cache, which I think most just assumed was done by the PD team to reduce area, but maybe they made more changes to accommodate the new CCX options. It still seems to be too excessively high to me for that to be the reason (way higher than latency to RAM), but maybe that's part of it.
 

Josh128

Senior member
Oct 14, 2022
296
409
96
But then multiple people just jumped in that it is clearly a bug and Zen 5 is a dumpster fire, etc., etc., and off to the races we went with that. Now, I'm not saying we know for sure it was a design choice, but it appears that way to me. There is no increased latency from the CCDs to the memory, so it's not a reduction in latency or bandwidth in the connections themselves. Even if it is a weird bug, it appears that this has basically no effect on performance, as there don't seem to be any actual benchmarks that are affected. Hopefully we get a good explanation of this behavior at some point from AMD, but from what I've seen, this appears to be a deliberate choice to save power, only shows up in a negative way in a purely synthetic test, and was probably primarily designed for the mobile market (see upcoming STX Halo and Fire Range, which are multi-CCD products on mobile, and STX, which has 2 clusters with independent L3s even though it is monolithic).

lol, I never claimed it was a bug, but I did call it a dumpster fire (and I, along with many others, still believe it is). Don't conflate the two!

The latencies are specifically CCX, not CCD, related, and it is 99.9% probable that it was a design choice. I disagree with you, however, that it only shows up in a negative way in a purely synthetic test. It's clearly negative to gaming. If that wasn't the case, AMD would not have required the use of the PPM provisioning driver for dual CCD/CCX Zen 5 desktop parts, which is itself a negative requirement, IMO.

 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,065
15,205
136
lol, I never claimed it was a bug, but I did call it a dumpster fire (and I, along with many others, still believe it is). Don't conflate the two!

The latencies are definitely CCX, not CCD, related, and it is 99.9% probable that it was a design choice. I disagree with you, however, that it only shows up in a negative way in a purely synthetic test. It's clearly negative to gaming. If that wasn't the case, AMD would not have required the use of the PPM provisioning driver for dual CCD/CCX Zen 5 desktop parts, which is itself a negative requirement, IMO.

By the way, if you use an app that makes use (maybe extensive use) of AVX-512, so far the testing done on that chip is showing over 30% improvement over the 7950X. There have already been benchmarks presented here (or in the 9700X review thread) where a very heavy AVX-512 workload was getting a 98% improvement. So I guess it depends on how extensively it is used. Since Zen 5 is "server first" I would not say it's a dumpster fire. It's just not that much better in games or office/utility tasks. In anything scientific, it's 10-98% better.

I would not call that a dumpster fire.
 