Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Thibsie · May 28, 2024

igor_kavinski said:
AMD needs to have a shared L4 V-cache for both CCDs.

But this would introduce more latency as CCDs would need to reach IOD and back to CCD.

leoneazzurro · May 28, 2024

I think the main problem is that APUs have so far been monolithic and made for a very cost-competitive market, where extra cache had minimal impact relative to the added area (e.g. a 16MB IC on Phoenix would have increased performance, but by how much, and at what economic cost? Would having to differentiate and validate a new product have added value for AMD?). In the future, with MCM/tiled APUs, the possibility to have different versions of an APU with more cache would be easier to realize. If AMD is willing to do it for their mainstream products and not for the Halo (niche) product line it is another matter and I am very unsure AMD would do that.

leoneazzurro · May 28, 2024

igor_kavinski said:
AMD needs to have a shared L4 V-cache for both CCDs.

Substantial added costs with low return, it may be done when the economic return will be >0.

Rekluse · May 28, 2024

I'm curious if there's a good benchmark for seeing how well a microarch does with VM CPU over-provisioning. It would be very interesting to see the Zen 1->2->3->4->4c->5 evolution in those terms, including a comparison with Intel E-cores and whether a single fat Zen 5 core can basically be the equivalent of 2x Zen 3 cores.

Essentially the SPECperf equivalent of this test:

SteinFG · May 28, 2024

igor_kavinski said:
AMD needs to have a shared L4

That's called DDR5 I think 😄

igor_kavinski · May 28, 2024

Thibsie said:
But this would introduce more latency as CCDs would need to reach IOD and back to CCD.

Put the V-cache between the CCD and the IOD. Have some sort of latch in there so that, if needed, CCD traffic can snoop the V-cache to see if data is available there. In case of streaming sequential data access, no need to bother with checking V-cache. Engineers can do almost anything if sufficiently motivated or mandated to do so by their management. Or create a 16-core CCD to bury the whole issue once and for all. Tons of people must've crossed the 7950X3D off their wanted list because of the cache discrepancy between the two CCDs. It's just very, very wasteful. I can see they went with it coz it was the quickest thing to do for them but it's very suboptimal in my opinion. There are benchmarks where the 7950X beats the 7950X3D. That should NOT be happening.

MadRat · May 28, 2024

igor_kavinski said:
AMD needs to have a shared L4 V-cache for both CCDs.

An intelligent memory controller that can juggle non-standard memory would be better than the complexity of additional cache levels. The memory controller doesn't need to load balance across its memory banks, but should report intricate working details to the OS so that the software does the actual load balancing to suit the OS design. If you wanted one specific app to focus its work inside the fastest memory bank(s), then it's possible. The OS could then transfer unimportant workloads to banks of slower memory. They already do this with storage and in high-end systems, so it's just a natural evolution of the methods to the consumer level.
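
To make the idea concrete, here is a minimal Python sketch of the kind of placement policy an OS could run if the memory controller exposed a fast bank and a slow bank. Everything in it is hypothetical: the `Page` record, the `place_pages` helper, the access counts and the two-page fast bank are made up for illustration, and no real OS or memory controller interface is being described.

```python
from dataclasses import dataclass


@dataclass
class Page:
    owner: str      # hypothetical application name
    accesses: int   # access count in the last sampling window (made up)


def place_pages(pages, fast_capacity, pinned_owner=None):
    """Split pages into (fast, slow) lists.

    Pages owned by `pinned_owner` are ranked first, then everything else
    by descending access count, until the fast bank is full.
    """
    ranked = sorted(
        pages,
        key=lambda p: (p.owner != pinned_owner, -p.accesses),
    )
    return ranked[:fast_capacity], ranked[fast_capacity:]


pages = [
    Page("game", 900),
    Page("game", 40),
    Page("browser", 500),
    Page("updater", 5),
]
fast, slow = place_pages(pages, fast_capacity=2, pinned_owner="game")
print("fast bank:", [p.owner for p in fast])
print("slow bank:", [p.owner for p in slow])
```

With "game" pinned, both of its pages land in the fast bank even though the browser page is hotter than one of them, which is the "one specific app" case. Drop `pinned_owner` and the split is purely by access count.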

igor_kavinski · May 28, 2024

SteinFG said:
That's called DDR5 I think 😄

That's a sick joke, sorry. I would rather it was eDRAM or even super low latency DDR3-2400 than high latency DDR5. DDR5 bursts are like the Deathstar firing a powerful barrage of data at the CPU. Excellent except when that's not what you need. Sometimes you just need a quick turbolaser from a Tie Interceptor or a Tie Defender.

SteinFG · May 28, 2024

leoneazzurro said:
APUs, the possibility to have different versions of an APU with more cache would be easier to realize.

There's less than 100 people who are going to buy what you're talking about. CPU+dGPU wins hands down every time. APU on desktop will always be a low-end budget option, you won't upsell people who buy them to a "bigger cache" version

MadRat · May 28, 2024

igor_kavinski said:
That's a sick joke, sorry. I would rather it was eDRAM or even super low latency DDR3-2400 than high latency DDR5. DDR5 bursts are like the Deathstar firing a powerful barrage of data at the CPU. Excellent except when that's not what you need. Sometimes you just need a quick turbolaser from a Tie Interceptor or a Tie Defender.

So basically an SDR using DDR5/6 manufacturing technology. If the OS could access SDR and DDR banks separately then you'd truly have something special. SDR for complex random accesses of small data sets. DDR for rapid streams of more complex data sizes. Only works if both memory technologies share an interface.
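
The trade-off in this exchange (small random accesses versus long streams) can be sketched with a first-order model where fetch time is a fixed latency plus size divided by bandwidth. The latency and bandwidth figures below are invented for illustration and are not measurements of any DDR generation. The `fetch_time_ns` helper is hypothetical too.

```python
def fetch_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """First-order fetch time: fixed latency plus transfer time."""
    # 1 GB/s equals 1 byte per nanosecond, so size / bandwidth is in ns.
    return latency_ns + size_bytes / bandwidth_gb_s


# Hypothetical memory types, not real parts.
LOW_LATENCY = {"latency_ns": 50, "bandwidth_gb_s": 20}
HIGH_BANDWIDTH = {"latency_ns": 80, "bandwidth_gb_s": 60}

for size in (64, 1024 * 1024):  # one cache line vs. a 1 MiB stream
    a = fetch_time_ns(size, **LOW_LATENCY)
    b = fetch_time_ns(size, **HIGH_BANDWIDTH)
    winner = "low-latency" if a < b else "high-bandwidth"
    print(f"{size:>8} B: {a:10.1f} ns vs {b:10.1f} ns -> {winner}")
```

Under these made-up numbers the low-latency memory wins the 64-byte fetch and the high-bandwidth memory wins the 1 MiB stream. That is the split being argued for here, and it only pays off if software can steer each access pattern to the right bank.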

vanplayer · May 28, 2024

IF (Infinity Fabric) has changed a lot, memory reaches 6800MHz?

"This generation is actually very interesting. The IF bus has changed a lot, and the memory is very interesting." "It looks like it is synchronizing at 6k8. There seem to be a lot of changes on the IF side, and the data is much higher than the crippled Z4."

Hans Gruber · May 28, 2024

If we assume Zen 5 is released in late July, that would give AMD the performance title across all metrics for at least 3 months over Intel. One thing Intel never seems to struggle with is IPC gains over generations. They struggled with power consumption and inferior silicon nodes. Arrow Lake is going to be on 5nm (20A) down from 10nm for Alder Lake. Raptor Lake is a fake node so I do not count that as a generation.

I think there will be a Zen 5+ between Zen 5 and Zen 6 to introduce 3nm, probably late 2025. It depends on how well Arrow Lake performs. The scary part for AMD should be the efficiency gains Intel will bring with 5nm. The rumor is the standard Arrow Lake CPUs will have a TDP of 65W and the K series parts will be 125W. I think the i5 K series will probably be less than 100W and more than 65W.

leoneazzurro · May 28, 2024

SteinFG said:
There's less than 100 people who are going to buy what you're talking about. CPU+dGPU wins hands down every time. APU on desktop will always be a low-end budget option, you won't upsell people who buy them to a "bigger cache" version

In fact, it is what I said ->

"If AMD is willing to do it for their mainstream products and not for the Halo (niche) product line it is another matter and I am very unsure AMD would do that."

Strix Halo seems to have IC so, for these niche products, there is a very thin possibility that something like V-cache or different versions with different GPU dies (differing in CU count + IC size) could be adopted in the future for these very premium/very high price parts. And I insist on the "very thin" part.

igor_kavinski · May 28, 2024

@MadRat is full of ideas that could make a bored engineer go, Hmmmmm why didn't I think of that?

Dude, one word. PATENT!

igor_kavinski · May 28, 2024

Mahboi said:
There's 2 cases where V$ does nothing at all:
- high computation, low data amounts
Normal cache works just fine for those and the extra cache is never needed
- low computation, very high data amounts
Even V$ will get overloaded and end up being no more useful than normal cache

Forgetting the case of extremely large binary executables. Facebook used to have >1GB PHP binaries on their servers (don't know if that's changed). Such a huge binary would benefit immensely as the CPU would not need to access main memory or storage for parts of the executable code that cannot fit inside standard sized L1/L2/L3.
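
The quoted cases, plus the in-between one where V-cache does pay off, can be sketched with a toy LRU model. This is a deliberately crude illustration and not how a real L3 behaves: `lru_hit_rate` is a hypothetical helper, the access pattern is a pure repeated sweep, and the capacities 32 and 96 are arbitrary units that only loosely echo a 32 MB versus 96 MB L3.

```python
from collections import OrderedDict


def lru_hit_rate(working_set_lines, cache_lines, passes=4):
    """Hit rate of an LRU cache under repeated sequential sweeps."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


BASE, STACKED = 32, 96  # arbitrary units, assumed for illustration
for working_set in (16, 64, 256):
    print(
        working_set,
        lru_hit_rate(working_set, BASE),
        lru_hit_rate(working_set, STACKED),
    )
```

In this model a working set of 16 hits equally well in both caches, 256 thrashes both, and only the middle case (64) separates them. Real replacement policies and real access patterns soften those cliffs, so treat it as a picture of the argument, not a prediction.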

igor_kavinski · May 28, 2024

leoneazzurro said:
Substantial added costs with low return, it may be done when the economic return will be >0.

Citation(s) needed.

SteinFG · May 28, 2024

vanplayer said:
"It looks like synchronizing 6k8. There seems to be a lot of changes on the if side, and the data is much higher than the crippled z4."

If it's actually working at 1:1 mode, I guess they've increased memory controller speed by about 13% (6800/6000), which isn't much, but still ok. Maybe new IOD revision with higher clocks
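
For anyone checking the arithmetic, a quick Python sketch. It assumes "1:1" means the memory controller clock equals the memory clock, and uses the fact that DDR transfers data twice per clock. The `memclk_mhz` helper name is made up.

```python
def memclk_mhz(mt_per_s):
    # DDR moves data on both clock edges, so MT/s divided by 2 is the clock in MHz.
    return mt_per_s / 2


old, new = 6000, 6800
uplift = new / old - 1

print(f"memory clock: {memclk_mhz(old):.0f} MHz -> {memclk_mhz(new):.0f} MHz")
print(f"uplift: {uplift:.1%}")
```

That works out to 3000 MHz going to 3400 MHz, an uplift of about 13.3%.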

igor_kavinski · May 28, 2024

SteinFG said:
If it's actually working at 1:1 mode, that means they increased memory controller speed by about 13% (6800/6000), which isn't much, but still ok. Maybe new io die revision with higher clocks

Arrow Lake could possibly have an edge there.

leoneazzurro · May 28, 2024

igor_kavinski said:
Citation(s) needed.

Current CPUs/APUs already have large L3 sizes. As seen with Zen3/Zen4, consumer applications that take advantage of this added cache (the V-cache) are limited essentially to gaming.
For an L4 cache to be meaningful, its size should not be negligible (in most cases bigger than the L3, which means a lot of silicon dedicated to it and more complex packaging, even if on a less expensive process), and even then not all applications would take advantage of it, just as the larger L3 affects only some workloads.
It will be done when the performance increase will justify its adoption.
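
A rough way to put numbers on this is the usual average memory access time (AMAT) estimate. The sketch below assumes the L4 is looked up serially, before going to DRAM, so an L4 miss pays both latencies. All latencies and hit rates are invented for illustration, and `amat_ns` is a hypothetical helper.

```python
def amat_ns(l3_ns, l3_miss, dram_ns, l4_ns=None, l4_hit=0.0):
    """Average memory access time seen from L3, optionally with a serial L4."""
    if l4_ns is None:
        return l3_ns + l3_miss * dram_ns
    # An L3 miss pays the L4 lookup, and an L4 miss also pays DRAM.
    return l3_ns + l3_miss * (l4_ns + (1 - l4_hit) * dram_ns)


# Made-up figures: 10 ns L3, 20% L3 miss rate, 80 ns DRAM, 30 ns L4.
no_l4 = amat_ns(10, 0.2, 80)
small_l4 = amat_ns(10, 0.2, 80, l4_ns=30, l4_hit=0.2)
large_l4 = amat_ns(10, 0.2, 80, l4_ns=30, l4_hit=0.6)
print(f"no L4: {no_l4:.1f} ns")
print(f"L4 with 20% hit rate: {small_l4:.1f} ns")
print(f"L4 with 60% hit rate: {large_l4:.1f} ns")
```

In this model the L4 only helps when its hit rate exceeds `l4_ns / dram_ns` (37.5% with these figures). Below that it makes the average worse, which is why a small or slow L4 is hard to justify.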

igor_kavinski · May 28, 2024

leoneazzurro said:
Current CPUs/APUs already have large L3 sizes. As seen with Zen3/Zen4, consumer applications that take advantage of this added cache (the V-cache) are limited essentially to gaming.

Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance hit benchmark methodology where he measures the impact of a compute intensive background application on the benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc would pay royalties to Ganesh or whatever for using his testing methodology), we would know for sure if V-cache has a measurable impact in daily workload scenarios. I would love to test this myself but no one here wants to donate a V-cache CPU to me

And I blame AMD for not having 7800X3D available for sale when I bought my used 12700K!

leoneazzurro · May 28, 2024

igor_kavinski said:
Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance hit benchmark methodology where he measures the impact of a compute intensive background application on the benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc would pay royalties to Ganesh or whatever for using his testing methodology), we would know for sure if V-cache has a measurable impact in daily workload scenarios. I would love to test this myself but no one here wants to donate a V-cache CPU to me

And I blame AMD for not having 7800X3D available for sale when I bought my used 12700K!

Are there practical cases where you want to run several compute-intensive applications in parallel, or is it simply "for science"?

igor_kavinski · May 28, 2024

Joe NYC said:
If AMD figured either of:
a) cooling of the chip if entire CCD is covered by V-Cache
b) put V-Cache under the CCD

Then Wafer on Wafer stacking could be used, and it would overcome challenges you mentioned.

As far as "known good die", if V-Cache was so highly redundant that it would be close to 100%, yield would be affected minimally.

Maybe harder to do on desktop, but in laptops AMD can make a dual-stacked die that needs to be cooled on both sides (top and bottom), with the connections to the motherboard on the four sides of the die. It's doable, even if it will add a little thickness to the laptop. Most 15.6 inch and 17.3 inch laptops could accommodate that.

igor_kavinski · May 28, 2024

leoneazzurro said:
Are there practical cases where you want to run several compute-intensive applications in parallel, or is it simply "for science"?

It would make browsing with multiple tabs and Microsoft Teams snappier, and may make other background workloads like a long-running video encode job faster. Serious professionals may have a lot more use cases.

leoneazzurro · May 28, 2024

I'd say serious professionals don't browse when their machine is doing critical work. Also, I would like to know how much of a performance difference the cases you listed would make, and whether it would be noticeable in practical use.

Aapje · May 28, 2024

igor_kavinski said:
Engineers can do almost anything if sufficiently motivated or mandated to do so by their management.

Physics disagrees (and it doesn't listen to management).
