Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,597
136
I think the main problem is that APUs have so far been monolithic and made for a very cost-competitive market, where the extra cache has a minimal impact relative to the added area (e.g. a 16MB IC on Phoenix would have increased performance, but by how much, and at what economic cost? Would differentiating and validating a new product have added value for AMD?). In the future, with MCM/tiled APUs, the possibility to have different versions of an APU with more cache would be easier to realize. If AMD is willing to do it for their mainstream products and not for the Halo (niche) product line it is another matter and I am very unsure AMD would do that.
 

Rekluse

Member
Sep 16, 2022
36
46
51
I'm curious whether there's a good benchmark for seeing how well a microarchitecture handles VM CPU over-provisioning. It would be very interesting to see the Zen 1->2->3->4->4c->5 evolution in those terms, including a comparison with Intel E-cores, and whether a single fat Zen 5 core can basically be the equivalent of two Zen 3 cores.

Essentially the SPECperf equivalent of this test:
 
Jul 27, 2020
17,800
11,599
106
But this would introduce more latency as CCDs would need to reach IOD and back to CCD.
Put the V-cache between the CCD and the IOD. Have some sort of latch in there so that, when needed, CCD traffic can snoop the V-cache to see if the data is available there. For streaming sequential data access, there's no need to bother checking the V-cache. Engineers can do almost anything if sufficiently motivated, or if mandated to by their management. Or create a 16-core CCD to bury the whole issue once and for all. Tons of people must've crossed the 7950X3D off their wish list because of the cache discrepancy between the two CCDs. It's just very, very wasteful. I can see they went with it because it was the quickest thing for them to do, but it's very suboptimal in my opinion. There are benchmarks where the 7950X beats the 7950X3D. That should NOT be happening.
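A rough sketch of the "skip the V-cache for streaming access" idea, assuming a hypothetical LRU side cache sitting between CCD and IOD and a simple detector that treats a run of consecutive cache-line addresses as a stream. The class names, the stride-of-one rule, the threshold of 4 and the line-granular addresses are all illustrative assumptions, not how AMD's hardware actually works.

```python
from collections import OrderedDict


class SideCache:
    """Hypothetical LRU 'V-cache' between CCD and IOD (illustrative model only)."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line address -> None, ordered by recency

    def lookup(self, addr: int) -> bool:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh recency on a hit
            return True
        return False

    def fill(self, addr: int) -> None:
        self.lines[addr] = None
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


class StreamDetector:
    """Flags an access as streaming after `threshold` consecutive +1-line strides."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold
        self.last = None
        self.run = 0

    def is_streaming(self, addr: int) -> bool:
        if self.last is not None and addr == self.last + 1:
            self.run += 1
        else:
            self.run = 0
        self.last = addr
        return self.run >= self.threshold


def simulate(trace, capacity=1024, threshold=4):
    cache, detector = SideCache(capacity), StreamDetector(threshold)
    stats = {"hits": 0, "misses": 0, "bypassed": 0}
    for addr in trace:
        if detector.is_streaming(addr):
            # Sequential stream: neither check nor fill the side cache.
            stats["bypassed"] += 1
            continue
        if cache.lookup(addr):
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache.fill(addr)
    return stats
```

Reused addresses hit in the side cache, while a long sequential run stops touching it after a few lines, so streaming traffic neither pays the extra lookup nor evicts useful data.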
 

MadRat

Lifer
Oct 14, 1999
11,923
259
126
AMD needs to have a shared L4 V-cache for both CCDs.

An intelligent memory controller that can juggle non-standard memory would be better than the complexity of additional cache levels. The memory controller doesn't need to load balance across its memory banks, but it should report intricate working details to the OS so that the software does the actual load balancing to suit the OS design. If you wanted one specific app to focus its work inside the fastest memory bank(s), then it's possible. The OS could then transfer unimportant workloads to banks of slower memory. They already do this with storage and in high-end systems, so it's just a natural evolution of those methods to the consumer level.
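A minimal sketch of the OS-side placement described above, assuming the controller exposes each bank's latency and capacity. The `Bank`/`Workload` types, the priority scale and the greedy policy are hypothetical illustrations; no existing OS API is implied.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    name: str
    latency_ns: float   # hypothetical figure reported by the memory controller
    capacity_gb: float
    used_gb: float = 0.0


@dataclass
class Workload:
    name: str
    priority: int       # higher = more latency-sensitive
    footprint_gb: float


def place(workloads, banks):
    """Greedy policy: highest-priority work goes to the fastest bank with room."""
    placement = {}
    fastest_first = sorted(banks, key=lambda b: b.latency_ns)
    for w in sorted(workloads, key=lambda w: w.priority, reverse=True):
        for bank in fastest_first:
            if bank.capacity_gb - bank.used_gb >= w.footprint_gb:
                bank.used_gb += w.footprint_gb
                placement[w.name] = bank.name
                break
        else:
            placement[w.name] = None  # nothing fits; a real OS would swap or split
    return placement
```

With an 8 GB fast bank and a 32 GB slow bank, a 6 GB high-priority game lands in the fast bank and the lower-priority browser and backup jobs spill to the slow one, which is the "app in the fastest bank, unimportant work elsewhere" behaviour the post describes.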
 
Jul 27, 2020
17,800
11,599
106
That's called DDR5 I think 😄
That's a sick joke, sorry. I would rather it was eDRAM or even super low latency DDR3-2400 than high latency DDR5. DDR5 bursts are like the Deathstar firing a powerful barrage of data at the CPU. Excellent except when that's not what you need. Sometimes you just need a quick turbolaser from a Tie Interceptor or a Tie Defender.
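To put numbers on the DDR3 vs DDR5 latency point: CAS latency in nanoseconds is CL cycles times the clock period, and a DDR clock period is 2000 / data rate (MT/s) ns. The timings below (DDR3-2400 CL11, DDR5-6000 CL30) are example kit timings, not measurements, and CAS is only one part of real load-to-use latency, which also includes the memory controller and fabric.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    # DDR transfers twice per clock, so the clock runs at data_rate / 2 MHz
    # and one clock period lasts 2000 / data_rate nanoseconds.
    return cl_cycles * 2000 / data_rate_mts


def burst_bytes(burst_length: int, channel_width_bits: int) -> int:
    # Bytes delivered by one burst on one (sub)channel.
    return burst_length * channel_width_bits // 8


ddr3 = cas_latency_ns(11, 2400)  # about 9.17 ns
ddr5 = cas_latency_ns(30, 6000)  # 10.0 ns
ddr4_burst = burst_bytes(8, 64)  # DDR4: BL8 on a 64-bit channel
ddr5_burst = burst_bytes(16, 32)  # DDR5: BL16 on a 32-bit subchannel
```

Both generations deliver a 64-byte cache line per burst; DDR5 gets there with a longer burst on a narrower subchannel, while first-word CAS latency stays roughly in the same range as fast DDR3.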
 

SteinFG

Senior member
Dec 29, 2021
520
610
106
APUs, the possibility to have different versions of an APU with more cache would be easier to realize.
There are fewer than 100 people who are going to buy what you're talking about. CPU+dGPU wins hands down every time. APU on desktop will always be a low-end budget option; you won't upsell people who buy them to a "bigger cache" version.
 

MadRat

Lifer
Oct 14, 1999
11,923
259
126
That's a sick joke, sorry. I would rather it was eDRAM or even super low latency DDR3-2400 than high latency DDR5. DDR5 bursts are like the Deathstar firing a powerful barrage of data at the CPU. Excellent except when that's not what you need. Sometimes you just need a quick turbolaser from a Tie Interceptor or a Tie Defender.
So basically SDR using DDR5/6 manufacturing technology. If the OS could access SDR and DDR banks separately, then you'd truly have something special: SDR for complex random accesses of small data sets, DDR for rapid streams of larger data sets. It only works if both memory technologies share an interface.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,214
1,152
136
If we assume Zen 5 is released in late July, that would give AMD the performance title over Intel across all metrics for at least 3 months. One thing Intel never seems to struggle with is IPC gains over generations. They struggled with power consumption and inferior silicon nodes. Arrow Lake is going to be on 5nm (20A), down from 10nm for Alder Lake. Raptor Lake is a fake node, so I do not count that as a generation.

I think there will be a Zen 5+ between Zen 5 and Zen 6 to introduce 3nm, probably in late 2025. It depends on how well Arrow Lake performs. The scary part for AMD should be the efficiency gains Intel will bring with 5nm. The rumor is that the standard Arrow Lake CPUs will have a TDP of 65W and the K series parts will be 125W. I think the i5 K series will probably be less than 100W and more than 65W.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,597
136
There are fewer than 100 people who are going to buy what you're talking about. CPU+dGPU wins hands down every time. APU on desktop will always be a low-end budget option; you won't upsell people who buy them to a "bigger cache" version.
In fact, it is what I said ->

"If AMD is willing to do it for their mainstream products and not for the Halo (niche) product line it is another matter and I am very unsure AMD would do that."

Strix Halo seems to have IC, so for these niche products there is a very thin possibility that something like V-cache, or different versions with different GPU dies (differing in CU count + IC size), could be adopted in the future for these very premium, very high-priced parts. And I insist on the "very thin" part.
 
Jul 27, 2020
17,800
11,599
106
There's 2 cases where V$ does nothing at all:
- high computation, low data amounts
Normal cache works just fine for those and the extra cache is never needed
- low computation, very high data amounts
Even V$ will get overloaded and ends up being no more useful than normal cache
You're forgetting the case of extremely large binary executables. Facebook used to have >1GB PHP binaries on their servers (I don't know if that's changed). Such a huge binary would benefit immensely, as the CPU would not need to access main memory or storage for the parts of the executable code that cannot fit inside standard-sized L1/L2/L3.
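A toy model of the three regimes discussed here, using Zen 4 capacities (1 MB L2 per core, 32 MB L3 per standard CCD, 96 MB L3 on a V-cache CCD such as the 7800X3D's). It only asks which level the hot part of a working set fits in; associativity, sharing between cores and access patterns are deliberately ignored, so treat it as a sketch rather than a performance predictor.

```python
from __future__ import annotations


def fitting_level(working_set_mb: float, levels: list[tuple[str, float]]) -> str:
    """Return the first (smallest) cache level whose capacity holds the working set."""
    for name, size_mb in levels:
        if working_set_mb <= size_mb:
            return name
    return "DRAM"


ZEN4_CCD = [("L2", 1.0), ("L3", 32.0)]
ZEN4_CCD_VCACHE = [("L2", 1.0), ("L3", 96.0)]

# Small hot set: fits either way, so V-cache adds nothing.
small = (fitting_level(0.5, ZEN4_CCD), fitting_level(0.5, ZEN4_CCD_VCACHE))
# Mid-sized hot set (e.g. the hot part of a huge binary): only V-cache holds it.
mid = (fitting_level(64, ZEN4_CCD), fitting_level(64, ZEN4_CCD_VCACHE))
# Very large streaming set: both spill to DRAM.
huge = (fitting_level(1024, ZEN4_CCD), fitting_level(1024, ZEN4_CCD_VCACHE))
```

The V-cache only changes the outcome in the middle band, which is why the argument turns on how large the frequently reused portion of the code or data is rather than the total binary size.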
 

SteinFG

Senior member
Dec 29, 2021
520
610
106
"It looks like synchronizing 6k8. There seems to be a lot of changes on the if side, and the data is much higher than the crippled z4."
If it's actually running in 1:1 mode, I guess they've increased the memory controller speed by about 13% (6800/6000), which isn't much, but still OK. Maybe a new IOD revision with higher clocks.
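The 13% figure follows from the memory clock being half the DDR data rate and, in 1:1 mode, the memory controller clock (UCLK) matching it. A quick check, with the caveat that on Zen 4 the Infinity Fabric clock (FCLK) is set separately, so a faster UCLK does not by itself mean a faster fabric:

```python
def memclk_mhz(data_rate_mts: int) -> float:
    # DDR moves data on both clock edges, so MEMCLK is half the MT/s rating.
    return data_rate_mts / 2


def uclk_gain_pct(new_rate_mts: int, old_rate_mts: int) -> float:
    # In 1:1 mode UCLK == MEMCLK, so the controller clock scales with the data rate.
    return (memclk_mhz(new_rate_mts) / memclk_mhz(old_rate_mts) - 1) * 100


gain = uclk_gain_pct(6800, 6000)  # about 13.3%: 3400 MHz vs 3000 MHz
```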
 

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,597
136
Citation(s) needed.
Current CPUs/APUs already have large L3 sizes. As seen with Zen3/Zen4, consumer applications that take advantage of this added cache (the V-cache) are limited essentially to gaming.
For an L4 cache to be meaningful, its size should not be negligible (in most cases bigger than the L3, which means a lot of silicon dedicated to it and more complex packaging, even if on a less expensive process), and even then not all applications would take advantage of it, just as the larger L3 benefits only some workloads.
It will be done when the performance increase justifies its adoption.
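The cost/benefit argument can be made concrete with the standard average memory access time (AMAT) recurrence. All latencies and miss rates below are made-up illustrative numbers, not Zen measurements; the point is that an L4 only pays off when its own miss rate is low enough to offset the extra lookup.

```python
def amat_ns(t_l3, miss_l3, t_dram, t_l4=None, miss_l4=None):
    """Average latency of an access that reaches L3, with an optional L4 before DRAM.

    Without L4: t_l3 + miss_l3 * t_dram
    With L4:    t_l3 + miss_l3 * (t_l4 + miss_l4 * t_dram)
    """
    if t_l4 is None:
        return t_l3 + miss_l3 * t_dram
    return t_l3 + miss_l3 * (t_l4 + miss_l4 * t_dram)


def l4_break_even_miss_rate(t_l4, t_dram):
    """L4 helps only if t_l4 + miss_l4 * t_dram < t_dram, i.e. miss_l4 < 1 - t_l4 / t_dram."""
    return 1 - t_l4 / t_dram


# Hypothetical figures: 10 ns L3, 30% L3 misses, 30 ns L4, 80 ns DRAM.
no_l4 = amat_ns(10, 0.3, 80)              # 34 ns
good_l4 = amat_ns(10, 0.3, 80, 30, 0.5)   # 31 ns: L4 catches half the misses
poor_l4 = amat_ns(10, 0.3, 80, 30, 0.8)   # 38.2 ns: slower than no L4 at all
```

With these numbers the L4 must catch more than 37.5% of L3 misses just to break even, which is why it needs to be large (and therefore costly) to matter.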
 
Jul 27, 2020
17,800
11,599
106
Current CPUs/APUs already have large L3 sizes. As seen with Zen3/Zen4, consumer applications that take advantage of this added cache (the V-cache) are limited essentially to gaming.
Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance hit benchmark methodology where he measures the impact of a compute intensive background application on the benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc would pay royalties to Ganesh or whatever for using his testing methodology), we would know for sure if V-cache has a measurable impact in daily workload scenarios. I would love to test this myself but no one here wants to donate a V-cache CPU to me
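A rough sketch of the general foreground-vs-background idea: time a foreground task alone, then again while other processes hammer a buffer, and report the slowdown. This is not Ganesh's actual methodology, whose details aren't given here; the `foreground` task, the 32 MB buffer and the cap of 4 background workers are hypothetical choices. Python's interpreter overhead means the background touches memory far more slowly than native code would, so treat any result as a crude proxy.

```python
import multiprocessing as mp
import os
import time

LINE = 64  # assumed cache-line size in bytes


def background_load(stop, working_set_mb):
    """Crude cache-pressure load: walk a buffer one cache line at a time until stopped."""
    buf = bytearray(working_set_mb * 1024 * 1024)
    i = 0
    while not stop.is_set():
        for _ in range(4096):  # check the stop flag only every 4096 touches
            buf[i] = (buf[i] + 1) & 0xFF
            i = (i + LINE) % len(buf)


def foreground(n=2_000_000):
    """Hypothetical stand-in benchmark; replace with a real workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def perf_hit_pct(alone_s, loaded_s):
    """Slowdown of the foreground run under background load, in percent."""
    return (loaded_s / alone_s - 1) * 100


def measure(background_procs=None, working_set_mb=32):
    if background_procs is None:
        background_procs = max(1, min(4, (os.cpu_count() or 2) - 1))
    alone = timed(foreground)
    stop = mp.Event()
    procs = [mp.Process(target=background_load, args=(stop, working_set_mb))
             for _ in range(background_procs)]
    for p in procs:
        p.start()
    time.sleep(0.5)  # give the background workers time to start up
    try:
        loaded = timed(foreground)
    finally:
        stop.set()
        for p in procs:
            p.join()
    return perf_hit_pct(alone, loaded)
```

Call `measure()` from inside an `if __name__ == "__main__":` block (needed on platforms that spawn rather than fork processes). Running the same script on an X3D part and its non-X3D sibling, with the background working set sized between their L3 capacities, is one way to probe whether the extra cache cushions the foreground task.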

And I blame AMD for not having 7800X3D available for sale when I bought my used 12700K!
 

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,597
136
Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance hit benchmark methodology where he measures the impact of a compute intensive background application on the benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc would pay royalties to Ganesh or whatever for using his testing methodology), we would know for sure if V-cache has a measurable impact in daily workload scenarios. I would love to test this myself but no one here wants to donate a V-cache CPU to me

And I blame AMD for not having 7800X3D available for sale when I bought my used 12700K!
Are there practical cases where you want to run several compute-intensive applications in parallel, or is it simply "for science"?
 
Jul 27, 2020
17,800
11,599
106
If AMD figured either of:
a) cooling of the chip if entire CCD is covered by V-Cache
b) put V-Cache under the CCD

Then Wafer on Wafer stacking could be used, and it would overcome challenges you mentioned.

As far as "known good die", if V-Cache was so highly redundant that it would be close to 100%, yield would be affected minimally.
Maybe harder to do on desktop, but in laptops AMD could make a dual-stacked die that is cooled on both sides (top and bottom), with the connections to the motherboard on the four sides of the die. It's doable, even if it adds a little thickness to the laptop. Most 15.6-inch and 17.3-inch laptops could accommodate that.
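On the known-good-die point quoted above: with wafer-on-wafer stacking there is no chance to discard bad dies before bonding, so a stacked pair works only if both dies work, and the pair yield is the product of the two. Redundancy raises the cache die's yield because a few failed banks can be mapped out. The sketch assumes independent bank failures and an illustrative bank count, which is a simplification of real repair schemes.

```python
from math import comb


def stacked_pair_yield(logic_yield, cache_yield):
    """Wafer-on-wafer: no known-good-die sort, so a pair works only if both dies do."""
    return logic_yield * cache_yield


def cache_die_yield(p_bank_ok, banks_needed, spare_banks):
    """Probability that at least `banks_needed` of `banks_needed + spare_banks` banks work,
    assuming each bank fails independently (a simplification)."""
    n = banks_needed + spare_banks
    return sum(comb(n, k) * p_bank_ok**k * (1 - p_bank_ok) ** (n - k)
               for k in range(banks_needed, n + 1))


no_spares = cache_die_yield(0.99, 8, 0)    # about 0.923
two_spares = cache_die_yield(0.99, 8, 2)   # close to 1
pair = stacked_pair_yield(0.9, two_spares)  # barely below the logic die's own yield
```

With enough spares the cache die's yield approaches 1, so the stacked pair's yield converges on the logic die's yield, which is the "affected minimally" case in the quote.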
 
Jul 27, 2020
17,800
11,599
106
Are there practical cases where you want to run several compute-intensive applications in parallel, or it is simply "for science"?
It would make browsing with multiple tabs and Microsoft Teams snappier, and may make other background workloads like a long running video encode job faster. Serious professionals may have a lot more usecases.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,597
136
I'd say serious professionals don't browse while their machine is doing critical work. Also, I would like to know how much of a performance difference the cases you listed would make, and whether it would be noticeable in practical use.
 