Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
AMD's response to this inquiry will be very telling. I doubt they will admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch date goal, or is it a silicon-level bug that might or might not be fixable by a new stepping or microcode?

More and more it looks like desktop Zen 5 should have just been delayed, even if it meant a delay of six months or more, to get this and other performance anomalies ironed out. I can't wait to see the core latencies on Zen 5C 3nm Turin, which is rumored to have the fabled 16-core CCX.
I found some leaks of Arrow Lake 5 series CPU Geekbench results. Zen 5 beat Intel Arrow Lake significantly in single thread and got smoked in the multi-thread results. I can almost guarantee you 100% that AMD's problem is in their AGESA BIOS. I will not even hazard a guess as to what the real Zen 5 performance should/will be when they sort out their mess.

It took AMD 6 months or more to get Zen 4 ironed out, and out of the box Zen 4 was pretty good to begin with. With all the testing anomalies reviewers ran into, it seems obvious that AMD released a completely flawed AGESA BIOS with Zen 5.

AMD really needs to fix their DDR5 bandwidth limits. They should have upgraded their Infinity Fabric controller with Zen 4 and certainly by Zen 5. That is holding back the significant performance gains faster RAM would provide.
 

tsamolotoff

Member
May 19, 2019
177
306
136
AMD really needs to fix their DDR5 bandwidth limits. They should have upgraded their Infinity Fabric controller with Zen 4 and certainly by Zen 5. That is holding back the significant performance gains faster RAM would provide.
While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory in a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).
 

marees

Senior member
Apr 28, 2024
374
436
96
While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory in a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).
Wouldn't AMD have caught this in their internal testing?
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
While I completely agree, the issue here isn't fabric or IMC, it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. As Ryan noted, it's slower than memory access and on par with the typical access time of non-local memory in a 2S/4S/8S system, which should never happen considering this is the same IO die as AMD had with Zen 4 (and as we see, this 'far cache' access time was roughly the same for all CPUs since Zen 2).
From my understanding, Zen 4 CPUs are only stable up to DDR5-6000, in some cases DDR5-6400. With Zen 5 it's more of the same. Newegg has memory kits that go up to 8400 MT/s. I am adding the DDR5 memory issues with Ryzen to the Zen 5 problem. I figured they would have solved that problem. Lisa said that Zen 4 was going to be a memory OCer's dream. They said the sweet spot was 6400 MT/s. All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.

There is no advancement in DDR5 memory support with Zen 5. The memory market went from DDR5-7200 kits in the early days of Zen 4 to DDR5-8400 kits now. Zen 5 cannot even support DDR5-7000 or better. There should be no limitation on memory speeds in a Zen 5 system or a Zen 4 system. The sky should be the limit.
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
Zen 5 cannot even support DDR5-7000 or better. There should be no limitation on memory speeds in a Zen 5 system or a Zen 4 system. The sky should be the limit.

In their Zen 5 gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

 

Attachments

  • Screenshot 2024-08-16 at 10-16-04 Ryzen 9 9950X & 9900X Gaming-Benchmarks Speicherskalierung m...png
    71.8 KB · Views: 29

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Maybe Linux slants more towards the performance side of power management (more aggressive with clock ramping, which would help Geekbench) and Windows slants more towards saving power.
First of all, this is highly configurable, both under Windows and under Linux. But there may well be a slant, in that while the Windows scheduler is about the worst one out there, energy saving features are usually supported earlier under Windows than under Linux. But I doubt Ryzen 9000's performance oddities can be ascribed to (correctly working) energy saving features.

If benchmarks were to include joules used one could see if power usage differs along the performance, or whether something else must be amiss.
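
For anyone who wants to try the joules idea, here is a minimal Python sketch: integrate logged package power over the benchmark run to get energy, then relate the score to it. The sample format (seconds, watts), the function names and the numbers in the usage lines are placeholders of mine, not measurements from any review.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, package power in watts)

def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule (1 W * 1 s = 1 J)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total

def score_per_kilojoule(score: float, samples: Sequence[Sample]) -> float:
    """Benchmark score divided by the energy the run consumed, in kJ."""
    return score / (energy_joules(samples) / 1000.0)

# Placeholder power log and score, purely for illustration.
log = [(0.0, 90.0), (10.0, 140.0), (20.0, 140.0), (30.0, 95.0)]
print(energy_joules(log), score_per_kilojoule(20000.0, log))
```

Two runs with the same score but different energy would point at power management; the same energy with a lower score would suggest something else is amiss.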
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
In their Zen 5 gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA bios updates.
 

tsamolotoff

Member
May 19, 2019
177
306
136
All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.
Not all, and not really. Also, you can't really compare AMD's 1:1 mode to anything Intel does, as Intel always has 1:2 mode enabled. And in 1:2 mode, it's not hard to get to 7800-8000, sometimes even with a 2DPC board. In any case, it's not relevant to this interchiplet communication issue, slower or faster RAM shouldn't affect this at all, as far as I can understand.

In any case, my current setup (cheap $110 24 Gbit Chinese sticks) works just fine in memory-intensive calculations (an actual speedup in terms of nanoseconds compared to 6400 in 1:1 mode with the same timings).
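
To make the "in terms of nanoseconds" comparison concrete, a small Python sketch follows. It assumes the usual DDR convention that the memory clock in MHz is half the data rate in MT/s, and that "1:1" and "1:2" describe the divisor between memory clock and memory controller clock; the function names are mine.

```python
def timing_ns(cycles: int, data_rate_mts: int) -> float:
    """Convert a timing given in memory clock cycles to nanoseconds."""
    mclk_mhz = data_rate_mts / 2  # DDR: two transfers per memory clock
    return cycles * 1000 / mclk_mhz

def controller_clock_mhz(data_rate_mts: int, divisor: int) -> float:
    """Memory controller clock for a 1:divisor ratio (1 for 1:1, 2 for 1:2)."""
    return data_rate_mts / 2 / divisor

# Kits mentioned in this thread: DDR5-6000 CL30 and DDR5-8000 CL38.
print(timing_ns(30, 6000), timing_ns(38, 8000))
print(controller_clock_mhz(6400, 1), controller_clock_mhz(8000, 2))
```

The point of the conversion is that a higher CL number at a higher data rate can still be a shorter delay in absolute time, while the 1:2 mode leaves the controller itself running slower than at 6400 in 1:1.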

 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA bios updates.
For the time being there are no memory issues, since the maximum RAM frequency has been extended significantly; the only issue so far is the inter-CCD latency, which is unexpected and still unexplained.


If benchmarks were to include joules used one could see if power usage differs along the performance, or whether something else must be amiss.

Looking at the ComputerBase numbers, AMD did a very good job on the efficiency front, and comparisons with the 7000 series are pointless since that's not their competition.

At 142W PPT the 9700X is 7% faster than the 14600K, which pulls 191W; it would actually need something like 112W to just match the Intel chip, which is 70% better perf/watt at isoperf.

The 9900X at 162W is just 3% below the 14700K, which pulls 266W, so 180W would be enough to match the Intel opponent, and that would amount to 47% better perf/watt at isoperf.

Last but not least, at 200W the 9950X is 10% faster than the 14900K@275W, and 150W is enough to match Intel's top contender, which would be a massive 83% better perf/watt at isoperf.
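
The isoperf percentages above are just power ratios at equal performance, which a few lines of Python can reproduce. The wattages are the ones quoted in this post (the AMD figures being estimates of the power needed to match the Intel chip); the function name is made up for illustration.

```python
def isoperf_gain_percent(rival_watts: float, own_watts: float) -> float:
    """Perf/watt advantage in percent when both chips deliver the same performance."""
    return (rival_watts / own_watts - 1.0) * 100.0

pairs = {
    "9700X vs 14600K": (191.0, 112.0),
    "9900X vs 14700K": (266.0, 180.0),
    "9950X vs 14900K": (275.0, 150.0),
}
for name, (rival, own) in pairs.items():
    print(f"{name}: {isoperf_gain_percent(rival, own):.1f}% better perf/watt")
```

The results land within a point of the 70%, 47% and 83% quoted above; the small differences are rounding.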
 

MS_AT

Senior member
Jul 15, 2024
210
507
96
In any case, it's not relevant to this interchiplet communication issue, slower or faster RAM shouldn't affect this at all, as far as I can understand.
Actually, somebody with a 2-CCD Zen 5 chip could try varying the RAM timings while keeping the IF frequency constant, and run a core-to-core latency test. If the curve weren't flat, that would mean the CCDs are synced through RAM, which would be surprising to say the least.
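
A rough Python sketch of how the result of such an experiment could be judged: collect the cross-CCD latency measured at each RAM timing setting and check whether the spread stays within run-to-run noise. The 5% tolerance, the function names and the example numbers are assumptions for illustration only; nobody in this thread has posted such data.

```python
from statistics import mean
from typing import Sequence

def relative_spread(latencies_ns: Sequence[float]) -> float:
    """(max - min) / mean of the cross-CCD latencies across RAM settings."""
    return (max(latencies_ns) - min(latencies_ns)) / mean(latencies_ns)

def curve_is_flat(latencies_ns: Sequence[float], tolerance: float = 0.05) -> bool:
    """True if latency barely moves as RAM timings change (assumed noise band)."""
    return relative_spread(latencies_ns) <= tolerance

# Hypothetical cross-CCD latencies (ns) at tight, stock and loose RAM timings.
independent_of_ram = [181.0, 180.0, 183.0]
tracks_ram = [170.0, 185.0, 205.0]
print(curve_is_flat(independent_of_ram), curve_is_flat(tracks_ram))
```

A flat curve would say the cross-CCD path does not go through DRAM; a curve that follows the RAM timings would be the surprising case described above.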
 

sl0519

Junior Member
Aug 10, 2024
20
51
46
These are the ARL preliminary benchmarks, from Jaykihn at Xwitter.
https://x.com/jaykihn0



As a comparison, I pulled Uniko's testing of the recent 9950X.



Looking at GB 5.4.5, the QS scored 2455, which trails the 9950X by around 5%. In multicore, however, the QS beats the 9950X by almost 8%, but also keep in mind that GB doesn't scale well with more cores (correct me if I'm wrong). It's impressive if we consider the fact that the score was obtained without HT, but for people looking for a generational improvement, this is exactly Zen 5% lol.
 

Jan Olšan

Senior member
Jan 12, 2017
400
689
136
First of all, this is highly configurable, both under Windows and under Linux. But there may well be a slant, in that while the Windows scheduler is about the worst one out there, energy saving features are usually supported earlier under Windows than under Linux. But I doubt Ryzen 9000's performance oddities can be ascribed to (correctly working) energy saving features.

If benchmarks were to include joules used one could see if power usage differs along the performance, or whether something else must be amiss.
It could also be the scheduler never giving a single process maximum cycles available, so that QoS is better for multitasking/GUI responsiveness.

Also Windows probably does more work in background compared to a Linux system and I think that is a factor in lots of the "game X runs faster on Linux OMG" articles. IMHO these differences should not be automatically ascribed to "bad scheduler".

I used to hear the same about memory management. Then I read a writeup by somebody who actually knew how memory management works in Windows and Linux, the strengths/weaknesses and it was not a simple story, at all.

Internet randos also routinely spit on NTFS but it's a FS that is so ridiculously reliable IMHO (while I recall data-endangering issues being discussed for almost any Linux FS.. except JFS which went out of fashion by the time I started following this.)
 

Det0x

Golden Member
Sep 11, 2014
1,232
3,883
136
In their Zen 5 gaming review, ComputerBase used RAM at up to 8000 MT/s and say that it ran smoothly at this speed, and that's with a 2 x 24 GB kit; even the 7950X3D gets up to 7200.

Seems like mainstream reviewers have finally started to run the higher speeds that "we overclockers" have run for like 1.5 years already 👍
Just too bad it's slow EXPO timings atm
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136

Why, is the 7000 series manufactured and sold by Intel?

Seems like mainstream reviewers have finally started to run the higher speeds that "we overclockers" have run for like 1.5 years already 👍
Just too bad it's slow EXPO timings atm
These are the timings he apparently used:
DDR5-8000 with timings 38-48-48-98

He also said that the 9600X/9700X/9950X worked flawlessly but the 9900X did not. He had no time to tweak the timings since he was hard pressed by the short time available before releasing the review, so he used only out-of-the-box timings, and he added that the 9900X could eventually work at 8000 MT/s as well with some tweaks.
 

Det0x

Golden Member
Sep 11, 2014
1,232
3,883
136
Why, is the 7000 series manufactured and sold by Intel?


These are the timings he apparently used:


He also said that the 9600X/9700X/9950X worked flawlessly but the 9900X did not. He had no time to tweak the timings since he was hard pressed by the short time available before releasing the review, so he used only out-of-the-box timings, and he added that the 9900X could eventually work as well with some tweaks.
tRFC + tREFI are the timings that will give the highest performance gain; sub-timings matter too! ;-)
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
tRFC is the timing that will give the highest performance gain; sub-timings matter too! ;-)
I'd say that RFC, REFI, RRDL/RRDS/FAW and SCLs are more important for gaming performance than primaries

If the reviewers keep the chips for some time, we'll surely get revised reviews for the tests with overclocked RAM. The current reviews were rushed: at ComputerBase the CPU performance reviewer, Volker Rißka, said on their forum that he got only 3 days to make all the performance measurements; I guess it was no different for other outlets.
 

majord

Senior member
Jul 26, 2015
491
622
136
These are the ARL preliminary benchmarks, from Jaykihn at Xwitter.
https://x.com/jaykihn0



As a comparison, I pulled Uniko's testing of the recent 9950X.



Looking at GB 5.4.5, the QS scored 2455, which trails the 9950X by around 5%. In multicore, however, the QS beats the 9950X by almost 8%, but also keep in mind that GB doesn't scale well with more cores (correct me if I'm wrong). It's impressive if we consider the fact that the score was obtained without HT, but for people looking for a generational improvement, this is exactly Zen 5% lol.
Actually, by the numbers you posted there, it's Zen 13%
 

Det0x

Golden Member
Sep 11, 2014
1,232
3,883
136
If the reviewers keep the chips for some time, we'll surely get revised reviews for the tests with overclocked RAM. The current reviews were rushed: at ComputerBase the CPU performance reviewer, Volker Rißka, said on their forum that he got only 3 days to make all the performance measurements; I guess it was no different for other outlets.
Yeah, it's kinda strange: I got 2 months to play with my sample before release, and reviewers got ~1 week 🤔
 

Josh128

Senior member
Oct 14, 2022
296
410
96
Hmm.. has nobody else yet made the correlation between the horrible cross-CCX latency on Strix and the horrible cross-CCD/CCX latency on Granite Ridge? They are very similar, with GNR having some worse outliers. I think this puts to bed the idea that it has anything to do with AGESA or packaging. This is a conscious design decision.

Crazy thought, but what if it has something to do with power efficiency management? Lower-speed cross-CCX comms could possibly mean lower power consumption? If that's the case it seems like a ridiculous choice, but if mobile was the priority, it might be. Who knows?



 

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,764
136
From what we have so far, the driver does not do anything for parking the cores, that behavior is controlled by the CPU/firmware itself. The driver is there to prevent thread migration from the active CCD to the inactive CCD unnecessarily.
Isn't "core parking" an interaction between the Windows process scheduler, power management driver, and CPU? (That is, the process scheduler avoids putting tasks onto a subset of logical CPUs, and idle CPUs go into a low-power state. The latter was the whole reason why "core parking" was invented back in the day; now it is used/abused for quite different purposes, as processors have become increasingly complex, making it ever more difficult for OSs' process schedulers and power managers to make good decisions.)

The 7950x wouldn't park cores on the 2nd CCD.
Although maybe it could if one wanted to(?).

Benefits of spreading a lowly…medium parallel workload onto both CCXs of dual-CCX Ryzens:
  • More cache is available.
  • More GMI link width is available.
Benefits of concentrating a lowly…medium parallel workload onto one of the CCXs of dual-CCX Ryzens:
  • Threads which happen to interact share cache lines.
The latter is important in such cases as — for example — increasing lo-res/low-detail video game FPS from 450 to 600, or to efficiently run multithreaded FFTs to search for large primes.
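
For experimenting with the two placements on Linux, a minimal Python sketch follows. It assumes a dual-CCX part with 8 cores per CCX and the common Linux enumeration in which the SMT siblings are numbered after all first threads (so core 0 is logical CPUs 0 and 16 on a 16-core part); check `lscpu -e` on the actual machine, since this layout is an assumption and not something stated in this thread. The helper names are mine.

```python
import os
from typing import Set

def ccx_cpus(ccx_index: int, cores_per_ccx: int = 8,
             ccx_count: int = 2, smt: bool = True) -> Set[int]:
    """Logical CPU ids of one CCX under the assumed enumeration."""
    total_cores = cores_per_ccx * ccx_count
    first = ccx_index * cores_per_ccx
    cpus = set(range(first, first + cores_per_ccx))
    if smt:
        # Assumed: sibling thread of core N is logical CPU N + total_cores.
        cpus |= {cpu + total_cores for cpu in cpus}
    return cpus

def pin_current_process(cpus: Set[int]) -> None:
    """Restrict the calling process to the given logical CPUs (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)

# Concentrate on CCX 0, or spread over both CCXs by taking the union.
concentrated = ccx_cpus(0)
spread = ccx_cpus(0) | ccx_cpus(1)
print(sorted(concentrated), len(spread))
```

Running the same workload once pinned to `concentrated` and once to `spread` would show which of the two sets of benefits dominates for it.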

What I would like is AMD to come out and explain how they got the application Performance numbers they showed in the charts (which is not reproducible by anyone else right now). There is no way they achieved those numbers and everyone else is doing something wrong.
First of all, it was borderline impossible to reproduce any of this for launch day reviews: In the tight time frame from sample reception until end of the review embargo, reviewers must have had their hands full with going through multiple BIOS updates and Windows reinstallations, perhaps going through questions and answers with AMD or other reviewers because they have a hard time to make things work, and getting as much of their own planned set of benchmarks as possible done and written up.

However, now that the rush to publish is over, somebody could take the sparse details which are given in the infamous endnotes GNR-01…GNR-04, tell themselves "imagine you had to produce some bar charts for a presentation which your very CEO is going to give at Computex", and then see if they could achieve what AMD subtitled with "* all results are 'up to'". There is certainly quite some room to optimize here and un-optimize there; it's mostly a question of whether you are more afraid of getting fired in May because the numbers aren't looking good enough, or of getting fired in August because 3rd parties won't match these numbers on different setups.

AMD's response to this inquiry will be very telling. I doubt they will admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch date goal, or is it a silicon-level bug that might or might not be fixable by a new stepping or microcode?
One thing which they appear to have done in Zen 5 relative to Zen 3/4, according to Granite Ridge die shots, is to implement the L3 cache within a considerably reduced area. Right now I can't see though how this could affect penalties to cross-CCX traffic.

The other obvious change in Zen 5 is that they co-designed for 8c and 16c CCXs. But again it is not obvious to me how this pertains to traffic outside a CCX.

Third, Turin has got a new IOD, which notably has to support a) more GMI links than Genoa's IOD, b) up to 16 rather than 8 cores per CCX. Maybe this had repercussions on what they did with the CCD's GMI links or on how L3 tags are managed, while they insisted on not updating the client IOD alongside.

Wouldn't AMD have caught this in their internal testing?
Likely they have; the question is what impact on real workloads they determined this to have.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,298
1,212
136
Isn't "core parking" an interaction between the Windows process scheduler, power management driver, and CPU? (That is, the process scheduler avoids putting tasks onto a subset of logical CPUs, and idle CPUs go into a low-power state. The latter was the whole reason why "core parking" was invented back in the day; now it is used/abused for quite different purposes, as processors have become increasingly complex, making it ever more difficult for OSs' process schedulers and power managers to make good decisions.)


Although maybe it could if one wanted to(?).

Benefits of spreading a lowly…medium parallel workload onto both CCXs of dual-CCX Ryzens:
  • More cache is available.
  • More GMI link width is available.
Benefits of concentrating a lowly…medium parallel workload onto one of the CCXs of dual-CCX Ryzens:
  • Threads which happen to interact share cache lines.
The latter is important in such cases as — for example — increasing lo-res/low-detail video game FPS from 450 to 600, or to efficiently run multithreaded FFTs to search for large primes.


First of all, it was borderline impossible to reproduce any of this for launch day reviews: In the tight time frame from sample reception until end of the review embargo, reviewers must have had their hands full with going through multiple BIOS updates and Windows reinstallations, perhaps going through questions and answers with AMD or other reviewers because they have a hard time to make things work, and getting as much of their own planned set of benchmarks as possible done and written up.

However, now that the rush to publish is over, somebody could take the sparse details which are given in the infamous endnotes GNR-01…GNR-04, tell themselves "imagine you had to produce some bar charts for a presentation which your very CEO is going to give at Computex", and then see if they could achieve what AMD subtitled with "* all results are 'up to'". There is certainly quite some room to optimize here and un-optimize there; it's mostly a question of whether you are more afraid of getting fired in May because the numbers aren't looking good enough, or of getting fired in August because 3rd parties won't match these numbers on different setups.



One thing which they appear to have done in Zen 5 relative to Zen 3/4, according to Granite Ridge die shots, is to implement the L3 cache within a considerably reduced area. Right now I can't see though how this could affect penalties to cross-CCX traffic.

The other obvious change in Zen 5 is that they co-designed for 8c and 16c CCXs. But again it is not obvious to me how this pertains to traffic outside a CCX.

Third, Turin has got a new IOD, which notably has to support a) more GMI links than Genoa's IOD, b) up to 16 rather than 8 cores per CCX. Maybe this had repercussions on what they did with the CCD's GMI links or on how L3 tags are managed, while they insisted on not updating the client IOD alongside.


Likely they have; the question is what impact on real workloads they determined this to have.
So what you are saying, in a nutshell: a lot of people are getting fired at AMD for Zen 5 in September.
 