Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


inquiss

Senior member
Oct 13, 2010
352
527
136
It's only marginally faster (frequency-wise) in Zen 5, and no, that's not better: the large discrepancy in cache size means those cores are not the same, which presents a problem for the scheduler.

I am arguing for two chiplets with 3D cache rated to the same frequency, as close to identical as possible to reduce scheduling issues; plus, NUMA mode should be activatable in BIOS so that the OS knows they are different cache domains.
It doesn't solve any scheduling issues. If you want something with a large cache, pin it to the same CCD. AMD does this in software now.

You still don't want cross-CCD talk, ideally, whether they both have cache, both don't have cache, or are asymmetric. What you want is for games to pin to one CCD, and then spill over to the other only if its threads are full. What games need more than 8 cores? Even with two V-cache CCDs you'd pin the game to the first 8 cores. You're in the same position.
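As a minimal sketch of the pinning described above, assuming Linux and assuming a hypothetical CPU numbering in which CCD0 holds logical CPUs 0-7 with SMT siblings 16-23 (the real numbering has to be checked on the machine in question), a process could be restricted to one CCD roughly like this. The helper names `ccd_cpus` and `pin_to_ccd` are illustrative only and not part of any AMD or OS tool.

```python
import os


def ccd_cpus(ccd, cores_per_ccd=8, ccd_count=2, smt=True):
    """Return the logical CPU ids assumed to belong to one CCD.

    Assumption (not guaranteed on every system): physical cores are
    numbered 0..N-1 and their SMT siblings follow as N..2N-1.
    """
    total_cores = cores_per_ccd * ccd_count
    first = ccd * cores_per_ccd
    cpus = set(range(first, first + cores_per_ccd))
    if smt:
        cpus |= {c + total_cores for c in cpus}
    return cpus


def pin_to_ccd(pid, ccd):
    """Restrict process `pid` to one CCD; return the CPU set applied.

    Returns an empty set (and changes nothing) when the platform has no
    sched_setaffinity or when none of the assumed CPUs are available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()
    target = ccd_cpus(ccd) & os.sched_getaffinity(pid)
    if target:
        os.sched_setaffinity(pid, target)
    return target
```

Calling `pin_to_ccd(0, 0)` would pin the calling process itself to the assumed first CCD. Whether both CCDs carry V-cache makes no difference to this step, which is the point being argued.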
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,638
136
I also doubt whether AMD would throw two good-bin X3D CCDs into the same part. You may get one mid X3D CCD mixed with a nice one, like the 9950X3D has. You're not getting uniformity unless AMD gives them an Epyc haircut.

Maybe still desirable to some.
 
Jul 27, 2020
23,517
16,526
146
The problem is they are NOT even trying. They could release a Ryzen PRO version that's priced maybe $250 higher. OK, maybe that's not enough for them. How about $500 higher? Let people discover what workloads are accelerated with V-cache on both CCDs. We have seen from both AMD and Intel that they are clueless about what people actually use for their workflow. AMD included Geekbench 5 in their Zen 5 slides and Intel keeps presenting Cinebench scores. In reality, there could be hundreds of applications benefitting from V-cache which AMD can't test unless they create a whole new division called Special V-cache Software Testing and Benchmarking Division.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
Sorry I thought we were discussing why there is no client halo part with 2 x3d chiplets.
Perhaps; I didn't pay close attention to all this circular discussion of dual V-cache. My own comments were on homogeneous vs. heterogeneous CPUs. And I let myself get pulled into this only because I too easily get irritated by apologism of heterogeneous CPUs.

vcache on both chiplets does not help with the scheduler issues!
Since the frequency difference is small enough with Zen 5, the X3D chiplet is usually the better choice.
It is not this easy.
I assume the owner bought a 16c/32t CPU in order to frequently use 16c/32t. (It's a 2x 8c/16t CPU actually but limiting the thread pool size per computational subtask is something which operators have been doing for ages now; not just because machine topology may be asking for it but also plainly because of Amdahl's law.)
– Now what do you do when you have a homogeneous workload, CFD for instance? You want a homogeneous CPU.
– Or what do you do when you have a heterogeneous workload? Either you profile it sufficiently and give the necessary hints to the OS how to fit it onto the heterogeneous machine. Or you simply take a homogeneous machine if you can get your hands on one.
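The Amdahl's law remark above can be made concrete with a small sketch. The parallel fraction of 0.9 used in the usage note is a made-up illustrative value, not a measurement of any workload discussed in this thread.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the runtime that scales with thread count.
    threads: number of threads working on the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)
```

With the assumed fraction of 0.9, `amdahl_speedup(0.9, 8)` and `amdahl_speedup(0.9, 16)` show how little the second group of eight threads adds, which is one reason operators cap the thread pool per subtask regardless of machine topology.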

The problem is that the round-trip from one CCD to another takes forever, not just that only one of the CCDs has vcache. If they both had vcache, you would still take a horrible penalty if game threads got scheduled across the split.
"The problem" which you are stating only exists if threads share large hot data.
This describes all games.
I take your word for it, as I last ran computer games myself (and even wrote one) in the 1990s.

[However, if a game is CPU intensive and is parallel enough to make good use of more than eight cores, then everything which is old knowledge in the HPC world can and should be applied in game engines too — but isn't, because performance optimization is typically last in line for budget allocation.]
 

GTracing

Senior member
Aug 6, 2021
442
1,041
106
No, what I want is 3D cache on both chiplets, rated to the same frequency (so no B-grade 2nd chiplet). That's what I want and am prepared to pay for ($1K max).

What's the problem? It's an upsell to AMD easily done.
The CCD with the 3D V-cache runs slower because of the cache, not because of binning. See the 9800X3D clock speeds versus the 9700X clock speeds.

If AMD made a CPU where both CCDs had a cache die, it would have two slow CCDs, not two fast CCDs.
 

inquiss

Senior member
Oct 13, 2010
352
527
136
No, what I want is 3D cache on both chiplets, rated to the same frequency (so no B-grade 2nd chiplet). That's what I want and am prepared to pay for ($1K max).

What's the problem? It's an upsell to AMD easily done.
Sure. So, in the cases where one core is maxed out in something, the other has a vcache and can perform those functions quicker if it's sensitive to vcache and slower if it's not. That's all well and good.

Doesn't help with scheduling though.

I guess all I can say is that people with those wants exist. You're one of them. But I can't think of a benefit to making the SKU besides pleasing some forum dwellers (said affectionately) who can't monetise this new product. Maybe when games use more than 16 threads it will have a market.
 
Reactions: Thibsie

Win2012R2

Senior member
Dec 5, 2024
792
795
96
The CCD with the 3D V-cache runs slower because of the cache, not because of binning
Obviously, but in Zen 5 the clocks of the 3D version are close enough to the non-3D version that the difference is immaterial, in my view. What is material is that the chiplets are uneven in both frequency and cache. This may have been the only way in Zen 4, but now having a non-3D chiplet with 3% faster clocks is a downside, not an upside.

Anything that is not even makes scheduling harder.
 

CouncilorIrissa

Senior member
Jul 28, 2023
620
2,405
96
Obviously, but in Zen 5 the clocks of the 3D version are close enough to the non-3D version that the difference is immaterial, in my view. What is material is that the chiplets are uneven in both frequency and cache. This may have been the only way in Zen 4, but now having a non-3D chiplet with 3% faster clocks is a downside, not an upside.

Anything that is not even makes scheduling harder.
It's not only about clock speed. The V$ die isn't universally faster than the non-V$ die even at the same clock, because the larger cache incurs (well, at least on Zen 4 it did) a 4-cycle penalty for accessing L3. So if your workload fits within 32 MB, it would be faster on the non-V$ die even if both clocked the same.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
You're not getting uniformity unless AMD gives them an Epyc haircut.
I agree. Alas, it's anyone's guess whether or not AMD is ever going to treat AM5 EPYC (alias EPYC 4000) better than a least-effort Ryzen derivative.

[…] plus NUMA mode should be activatable in BIOS so that OS knows it's different cache domains.
Actually the operating system is well aware of cache topology. (Linux is; I suppose Windows is too.) It just does not apply a cache-aware scheduling policy by default. I suspect this is because the kernel authors do not believe that a generally good enough default policy exists. This is in contrast to non-uniform main memory access (NUMA): Operating systems (Linux at least, I suppose several Windows flavors too) actually do apply a default NUMA-aware scheduling policy. (1. Try to keep a process, including all of its subthreads, running on one NUMA node, such that the process accesses mostly near memory. 2. Spread the overall load from different processes across NUMA nodes. This is a good NUMA related policy in many but not all cases.)

Now, this BIOS option which you mentioned — which lets the BIOS tell the OS that each last-level cache domain is a NUMA node — is actually a bit of a hack:
– It tricks the OS into applying its default NUMA-aware scheduling policy as if it was a cache-domain-aware scheduling policy.
– It tricks NUMA-aware userspace tools and settings to function like cache-domain-aware tools and settings.
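To illustrate that the cache topology is already visible without the BIOS option, here is a minimal sketch that groups logical CPUs by shared L3 on Linux. It assumes the usual sysfs layout under `/sys/devices/system/cpu/cpu*/cache/index*/` with `level` and `shared_cpu_list` files; the function names are my own and nothing here is specific to AMD.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def l3_domains(root="/sys/devices/system/cpu"):
    """Return a sorted list of L3 domains, each a sorted list of CPU ids.

    Returns an empty list when the sysfs files are absent (non-Linux
    systems, or containers that hide them).
    """
    domains = set()
    pattern = os.path.join(root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue  # only last-level (L3) entries are of interest
        shared = os.path.join(os.path.dirname(level_path), "shared_cpu_list")
        with open(shared) as f:
            cpus = parse_cpu_list(f.read())
        if cpus:
            domains.add(frozenset(cpus))
    return sorted(sorted(d) for d in domains)
```

On a dual-CCD part one would expect `l3_domains()` to return two groups, one per CCD. A userspace launcher could feed such a group to an affinity call, which is roughly what the NUMA-per-L3 BIOS hack gets the OS to do implicitly.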

If AMD made a CPU where both CCDs had a cache die, it would have two slow CCDs, not two fast CCDs.
a) He referred to performance determinism, not to extreme peak performance.
b) f_max = 5.2 GHz (9800X3D) or 5.55 GHz (9950X3D) is not "slow".

2 slow CCDs and cost $100 more, think of the value lol
a) They are not slow. b) Performance is workload dependent, and thereby is value.

The whole discussion about determinism using two more similar CCDs with X3D is very silly, because perfect determinism is already defeated by the fact that the cores on the same die already have different max f and f/v behavior.
This is wrong if you consider the particular workloads in which 96 MB L3$/CCX actually makes a difference compared to 32 MB L3$/CCX.

And servers achieve more determinism by generally clocking lower.
A dual-CCD CPU which runs twelve or more computationally intense threads does not run at f_max either.

if your workload fits within 32MB,
you buy a vanilla CPU.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,811
15,815
136
Question: why is a 16-core EPYC 4004 (4.5 GHz) plus motherboard ($399 + $299 = $698 total) close to the price of a 9950X (4.3 GHz, $542), when the 4004 runs faster?
 

inquiss

Senior member
Oct 13, 2010
352
527
136
It's not only about clock speed. The V$ die isn't universally faster than the non-V$ die even at the same clock, because the larger cache incurs (well, at least on Zen 4 it did) a 4-cycle penalty for accessing L3. So if your workload fits within 32 MB, it would be faster on the non-V$ die even if both clocked the same.
I think this is a red herring? I think the point here is that, even if the cores were exactly the same, going across to the other CCD will incur a penalty whether it's the same or not. You want all threads on one CCD when you can. If everything was identical, you'd still want to pin the game to one CCD.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,995
2,946
146
Question: why is a 16-core EPYC 4004 (4.5 GHz) plus motherboard ($399 + $299 = $698 total) close to the price of a 9950X (4.3 GHz, $542), when the 4004 runs faster?
What generation is the 4004? I don't keep track of all the server parts, but if it is an older generation like Zen 2 or 3 etc, it may be faster in rated frequency but still be using an older architecture, thus often slower clock per clock. Also, in actual benchmarks/usage, it may be slower.
 

gdansk

Diamond Member
Feb 8, 2011
4,030
6,638
136
What generation is the 4004? I don't keep track of all the server parts, but if it is an older generation like Zen 2 or 3 etc, it may be faster in rated frequency but still be using an older architecture, thus often slower clock per clock. Also, in actual benchmarks/usage, it may be slower.
It's Zen 4. The 4564P he's talking about is a 7950X by another name.
 
Reactions: SteinFG

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,995
2,946
146
OK, so that helps answer the question: the 4004 is a generation older, though still not much older. You may be able to get it at a bit of a discount though. And where are you seeing it for this price? Prices will vary by seller, and can be especially low for used parts on eBay and similar.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,811
15,815
136
It's Zen 4. The 4564P he's talking about is a 7950X by another name.
Thanks, I missed that. It says gen 4 in the description of the product. No way I want it then.
Ok so that helps answer the question...the 4004 is a generation older, though still not much older. You may be able to get it at a bit of a discount though. And where are you seeing it for this price? Prices will vary by seller, and especially can be lower from used on Ebay and similar.
see above.
 

MS_AT

Senior member
Jul 15, 2024
555
1,168
96
It is not this easy.
I am afraid we are arguing about two different things. All I am saying is that the 9950X3D has nullified the weakness of the 7950X3D by minimising the frequency difference between the vanilla and X3D CCDs, to the point where betting on the X3D CCD as the default one is a set-and-forget solution, as it won't hurt your general performance meaningfully. That could not be said about the 7950X3D. Since the OS cannot know whether the apps it is running prefer MHz or MBs of cache until they tell it, the 9950X3D is easier for the scheduler to handle than the 7950X3D in the general case. Adding a second X3D CCD wouldn't make its job meaningfully easier, as the biggest problem with 2 CCDs is that they are 2 CCDs. But yes, some workloads would see better performance from 2x X3D CCDs. It's just orthogonal to the scheduling problem, IMO.
 

dr1337

Senior member
May 25, 2020
449
731
136
I think this is a red herring? I think the point here is that, even if the cores were exactly the same, going across to the other CCD will incur a penalty whether it's the same or not. You want all threads on one CCD when you can. If everything was identical, you'd still want to pin the game to one CCD.
It's quite a penalty, both in latency and in absolute bandwidth. Current GMI3 links are at 36 GB/s, but looking at AIDA64 results for 9800X3Ds on Google, L3 bandwidth clocks in at over 700 GB/s. So that's an order of magnitude reduction (and then some) in I/O speed just to request data from the cache on a different CCD.
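The "order of magnitude (and then some)" estimate can be checked with a few lines of arithmetic. The two bandwidth figures are the ones quoted in the post above; since the L3 figure is "over 700 GB/s", the ratio is a lower bound. The function name is illustrative.

```python
import math


def bandwidth_ratio(local_gb_s, remote_gb_s):
    """Return (ratio, orders_of_magnitude) between two bandwidths."""
    ratio = local_gb_s / remote_gb_s
    return ratio, math.log10(ratio)


# Figures from the post: on-die L3 vs. GMI3 link to the other CCD.
ratio, orders = bandwidth_ratio(local_gb_s=700.0, remote_gb_s=36.0)
print(f"L3 is at least {ratio:.1f}x the link bandwidth "
      f"({orders:.2f} orders of magnitude)")
```

This compares peak bandwidth only; the latency penalty mentioned in the same post is a separate effect and is not modelled here.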
 

MadRat

Lifer
Oct 14, 1999
11,961
278
126
It's quite a penalty, both in latency and in absolute bandwidth. Current GMI3 links are at 36 GB/s, but looking at AIDA64 results for 9800X3Ds on Google, L3 bandwidth clocks in at over 700 GB/s. So that's an order of magnitude reduction (and then some) in I/O speed just to request data from the cache on a different CCD.
Seems like cache communication between cores is limited by the laws of physics. So until there's a new law of physics, focus on what is one step off from the current market. So much speculation in the thread appears to throw the baby out with the bathwater. There is no magic what-if coming in Zen 5. It's awesome that AMD broke the decorum demanding no big caches for each successive generation. But the innovation today is pretty cool: talk about L1, L2, etc. now focuses on whole chips being added on to the package.
 

fastandfurious6

Senior member
Jun 1, 2024
439
586
96
1) Medusa 12core CCD = easier for scheduler to fit more stuff in each

2) OS scheduler gets smarter until H2 '26

3) 3D cache production ramped up for next gen so Medusa likely has x2 3Dcache models
 