Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Jul 27, 2020
17,967
11,709
116
Windows-key + G --> "This is a game" keeps the program running on CCD0 (no scheduling issues)
Needs an up-to-date Game Bar and AMD Windows driver package, plus the Balanced power plan, to work properly
It just sucks that AMD got tricked into promoting Microsoft's garbage software.

https://answers.microsoft.com/en-us...r/873f97d2-e5a7-4017-b2c4-c53e2dd18f2f?page=1

That Game Bar update annoyance only got fixed last December, but there's no telling when it could cause problems again in the future. And knowing Microsoft, there's no telling when they might decide to demote Game Bar to legacy software status and stop updating it altogether. Anyone remember GfW?

AMD ideally should've integrated this proper scheduling functionality into Ryzen Master.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,945
106
The supply of V-cache dies being a limiting factor, using two of them in a single CPU prevents the "birth" of another CPU.

We really don't know that to be the case. There have been 3 years of capacity ramping, and V-Cache sales are still tiny. I really don't think capacity is a problem.

If capacity were still the bottleneck, meaning TSMC's SoIC stacking capacity only just covers AMD's V-Cache sales, then TSMC might as well scrap SoIC from their website and give up.

Something TSMC is not doing...

An alternative (and far more plausible) theory is that AMD is currently demand constrained, not supply constrained, in all its CPU products.
 
Reactions: Tlh97 and Timmah!

QuickyDuck

Junior Member
Nov 6, 2023
10
13
41
We really don't know that to be the case. There have been 3 years of capacity ramping, and V-Cache sales are still tiny. I really don't think capacity is a problem.

If capacity were still the bottleneck, meaning TSMC's SoIC stacking capacity only just covers AMD's V-Cache sales, then TSMC might as well scrap SoIC from their website and give up.

Something TSMC is not doing...

An alternative (and far more plausible) theory is that AMD is currently demand constrained, not supply constrained, in all its CPU products.
Just to remind you, 3D V-Cache isn't the only product that leverages SoIC; the MI300 series also relies on it.

IMO, there isn't a dual X3D consumer CPU because of marketing shenanigans.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,945
106
Look on eBay:

View attachment 102159

View attachment 102160

When the price difference between identical core count server CPUs due to V-cache is more than $2000, no way AMD is just gonna give away their V-cache dies. It also cuts into their server marketshare because people and even companies could start using the dual V-cache CPUs for their commercial workloads instead of investing in a server.

I think the mistake / wrong assumption you keep making is that there is some sort of shortage and there is a tradeoff AMD is constantly making due to shortage of capacity.

TSMC has had plenty of unused capacity for the last 18-24 months. There may be some capacity tightness going into H2 in some nodes, but not in N6/N7, which is used for V-Cache and where TSMC is drowning in overcapacity, and it is unlikely in SoIC packaging, which TSMC has been ramping for 3 years now.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,945
106
Just to remind you, 3D V-Cache isn't the only product that leverages SoIC; the MI300 series also relies on it.

IMO, there isn't a dual X3D consumer CPU because of marketing shenanigans.

I don't think so. The reason there was no dual V-Cache 7950X3D is that V-Cache parts have a ~500 MHz clock speed deficit, and software that does not benefit from V-Cache just runs ~10% slower.

The 7950X3D was meant to be the best of both worlds, which is why AMD did not put V-Cache on the 2nd CCD and instead maximized its clock speed.

So there were some technical reasons behind the decision to have an asymmetrical CPU. If the technical challenges (of clock speed regression) are overcome, then there is no longer any reason to have a complicated asymmetrical CPU.
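As a rough sanity check on the "~500 MHz deficit, ~10% slower" estimate, here is a minimal sketch. Both clock figures are assumptions picked for illustration (only the 500 MHz gap comes from the post), and the model assumes a workload that scales purely with frequency, which real software rarely does.

```python
# Rough check of the "~500 MHz deficit => ~10% slower" estimate.
# Both clock figures are assumptions for illustration, not measurements.

def clock_bound_slowdown(regular_mhz: float, vcache_mhz: float) -> float:
    """Fractional slowdown for a workload that scales purely with clock speed."""
    return (regular_mhz - vcache_mhz) / regular_mhz

regular_ccd_mhz = 5700.0  # assumed peak boost of the non-V-Cache CCD
vcache_ccd_mhz = 5200.0   # assumed peak boost of the V-Cache CCD (500 MHz lower)

slowdown = clock_bound_slowdown(regular_ccd_mhz, vcache_ccd_mhz)
print(f"{slowdown:.1%}")  # prints 8.8%
```

So under these assumed clocks the frequency-only penalty lands a little under 10%, in the same ballpark as the figure above.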
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,945
106
Just to remind you, 3D V-Cache isn't the only product that leverages SoIC; the MI300 series also relies on it.

IMO, there isn't a dual X3D consumer CPU because of marketing shenanigans.

BTW, good point about MI300.

But regarding MI300, Lisa Su explicitly said that AMD is NOT capacity constrained (in H2 2024) and can manufacture and sell a lot more than is currently being projected. Meaning there is going to be spare capacity for MI300, so MI300 is not constraining the capacity for SoIC packaging.
 
Jul 27, 2020
17,967
11,709
116
By the way, is there a reason why 8 core die with V-Cache + 6 core die is not possible?
None I guess. But this is the reason I believe AMD is capacity constrained with regards to V-cache. If they had abundant supply, they would have a LOT more SKUs, like maybe so:

4-core V-cache single CCD (Ryzen 3 X3D!)
6-core V-cache single CCD (Ryzen 5 X3D!)
8-core V-cache CCD + 4 core CCD
8-core V-cache CCD + 6 core CCD
4-core V-cache CCD + 4-core V-cache CCD
and who knows how many more!

Please understand that V-cache is a big CACHE die, more prone to defects than logic dies. We have no idea what the V-cache yield rate is, and attaching V-cache to the CCD is no simple matter: it slows production down enough that they can only do something like 40,000 V-cache CPUs a month (last I heard; not sure about their latest production figures).
 

coercitiv

Diamond Member
Jan 24, 2014
6,400
12,849
136
By the way, is there a reason why 8 core die with V-Cache + 6 core die is not possible?
Possible and probable, the two frenemies. Good 8 core dies already have their place in 8-core and 16-core SKUs, meanwhile 6-core and 12-core SKUs do a good job catching all kinds of imperfect dies.

8+6 is possible, but it ain't probable. The same could probably be said about 7+7 or 5+5.
 

PJVol

Senior member
May 25, 2020
622
556
136
So there were some technical reasons behind the decision to have an asymmetrical CPU. If the technical challenges (of clock speed regression) are overcome, then there is no longer any reason to have a complicated asymmetrical CPU.
I'm not sure if two V-cache CCDs are needed at all.
If we assume they managed to get the V-cache CCD Fmax equal to that of regular CCDs, you'll still end up with an inter-CCD penalty. Different CCD Fmax will make things even worse.

P.S. Unless there's a single big L3 chunk attached via TSVs atop both CCDs, with 16 slices forming a ring (or whatever) bus common to both CCDs?
 

tsamolotoff

Member
May 19, 2019
64
99
91
It's just a false impression (along with the 'zen4 has thick ihs' FUD) that sprang to life at the X3D release because AMD for some reason decided to make CCD1 the high-priority chiplet by default in CPPC (despite the fact that, according to ACPI, the X3D cores have the higher rating / quality / priority etc). I had to reinstall Game Bar and related services three or four times to make it work, and then I found out that only 3 or 4 games I own benefit from frequency rather than cache (mostly old games like RE5 and TF2), one of which (CSGO) was obsoleted a few months after the X3D launch (and in CS2, X3D is 30%+ faster than vanilla Zen 4).

So all in all, it'd be much better if AMD just kept CCD0 as the default high-priority chiplet and left the option to use CCD1 as the high-priority one for those who need it (I don't know, people who need the highest scores in browser benchmarks or those who play old games).
 
Jul 27, 2020
17,967
11,709
116
(unless there's a big single L3 chunk glued atop both CCDs, with 16 slices forming a ring bus common to both CCDs?)
I recall adroc saying something to the effect that cache coherency issues would come into play with a single shared large L3 across both CCDs. Also, not even sure if they can do that single shared L3. There might be issues with that thin V-cache die spreading across the boundaries of the two CCDs.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,945
106
I'm not sure if two v-cache CCDs are needed at all.
If we assume they managed to get the V-cache CCD Fmax equal to that of regular CCDs, you'll still end up with inter-CCD penalty (unless there's a big L3 chunk glued atop of both CCDs). Different CCD Fmax will make things even worse.
It's not much different from 7950x. The number of cases where 7700x is faster than 7950x is limited.

I am puzzled why many reviews analyze core-to-core latencies, as if that were a common usage. The vast majority of communication is core to memory, or core to L3 to memory.

So in the scenario where a thread jumps to the other CCD, leaving its data behind, the thread just has to go back to memory to re-fetch the data.

Another bad scenario is when a number of cores from different CCDs all work on the same data, in which case L3 ends up not really being used and memory has to resolve these accesses.

These are 2 corner cases that may amount to a low percentage of scenarios, probably low single digits. The most typical case is that a thread mostly uses its own data and does not jump from CCD to CCD.
 

PJVol

Senior member
May 25, 2020
622
556
136
I recall adroc saying something to the effect that cache coherency issues would come into play with a single shared large L3 across both CCDs.
If two CCDs are placed very close together and are restricted to run at the same frequency, idk why there should be any issues. Also, the cores probably need to be arranged as close as possible to one of the chiplet edges.

But anyway, I agree with Joe NYC's thoughts above.
 
Reactions: Tlh97 and Joe NYC
Jul 27, 2020
17,967
11,709
116
I am puzzled why many reviews analyze core-to-core latencies, as if that were a common usage. The vast majority of communication is core to memory, or core to L3 to memory.
It's due to user threads not being able to execute code without the involvement of a kernel thread. The kernel thread is in control of allowing the user thread to execute. So there's a LOT of inter-thread communication going on and core to core latencies need to be low to reduce that communication overhead.
 

Gideon

Golden Member
Nov 27, 2007
1,714
3,937
136
If two CCDs are placed very close together and are restricted to run at the same frequency, idk why there should be any issues
The issue is server chips that have more than 2 CCDs, IMO. All this extra complexity, which only really helps desktop use cases with 2 CCDs, makes it really unlikely AMD would design something like that.

With Zen 6, where presumably more complex chiplet layouts are used and client is separated from server, we are more likely to see such setups. A Strix Halo-like chip with a huge LLC instead of a supersized GPU, etc ...
 
Jul 27, 2020
17,967
11,709
116
Most typical case is that a thread mostly uses its own data, and typically, it does not jump from CCD to CCD.
That's the problem. It's working on data in the shared memory space and the different threads have to behave by communicating with each other so that one thread does not corrupt any other thread's data in that shared memory space.
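To make the "threads have to communicate so they don't corrupt each other's data" point concrete, here is a minimal sketch of several threads updating one shared value under a lock. The function name and counts are made up for illustration. CPython's GIL means this shows only the synchronization pattern, not the hardware cost; on real multi-core code the cache line holding the lock and the shared data has to move between the cores that touch it, which is where core-to-core latency comes in.

```python
import threading

def run_shared_counter(num_threads: int, increments: int) -> int:
    """Have several threads increment one shared counter, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may read-modify-write the shared value.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

total = run_shared_counter(num_threads=4, increments=10_000)
print(total)  # prints 40000
```

Every pass through the lock is a hand-off between threads, so the more often threads on different CCDs contend for the same data, the more the inter-CCD hop matters.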
 

tsamolotoff

Member
May 19, 2019
64
99
91
Cache and memory coherency (as well as bandwidth in the case of Zen 4) is not an issue if the OS and/or the software you use is NUMA (NUCA in this case) aware, but most of the time it's not. Also, the coherency issues that tank performance in games typically appear when the game spreads threads beyond a single CCD in a latency-bound scenario (check FPS in CS2 on a 7950X3D with thread affinity locked to CCD0 only versus 'normal' mode, for example), either on purpose, by chance, or by mistake (as in some old games, Stalker CoP for example).
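For anyone who wants to try the affinity comparison, here is a minimal sketch of building a CCD0-only CPU set and mask. It assumes 8 cores per CCD with SMT and logical CPUs numbered contiguously per CCD (so CCD0 is logical CPUs 0-15), which may not match every system; the helper names are made up for illustration. The pinning helper relies on `os.sched_setaffinity`, which exists only on Linux, so on Windows you would feed the mask to whatever affinity tool you normally use.

```python
import os

def ccd_logical_cpus(ccd_index: int, cores_per_ccd: int = 8, smt: int = 2) -> set:
    """Logical CPU ids of one CCD, ASSUMING contiguous numbering per CCD."""
    per_ccd = cores_per_ccd * smt
    start = ccd_index * per_ccd
    return set(range(start, start + per_ccd))

def affinity_mask(cpus: set) -> int:
    """Bitmask with one bit set per logical CPU id."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def pin_current_process(cpus: set) -> bool:
    """Restrict this process to the given CPUs (Linux only); False if not possible."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    usable = cpus & os.sched_getaffinity(0)  # ignore CPUs this machine lacks
    if not usable:
        return False
    os.sched_setaffinity(0, usable)
    return True

ccd0 = ccd_logical_cpus(0)
mask = affinity_mask(ccd0)
print(hex(mask))  # prints 0xffff
```

Run the game once restricted to that set and once unrestricted, and compare FPS in the same scene.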

Also, if the solution were an obvious thing (like putting more cores in a CCX on one (bidirectional) ring bus), then Intel and AMD wouldn't both spend money and effort on mesh interconnects and IF architectures.
 