> AMD needs to have a shared L4 V-cache for both CCDs.

But this would introduce more latency, as requests from one CCD would need to reach the IOD and then travel back to the other CCD.
> AMD needs to have a shared L4 V-cache for both CCDs.

Substantial added cost with low return; it may be done when the economic return is positive.
> AMD needs to have a shared L4 V-cache for both CCDs.

That's called DDR5, I think 😄
> But this would introduce more latency as CCDs would need to reach IOD and back to CCD.

Put the V-cache between the CCD and the IOD. Have some sort of latch in there so that, if needed, CCD traffic can snoop the V-cache to see whether the data is available there. In the case of streaming sequential data access, there's no need to bother checking the V-cache. Engineers can do almost anything if sufficiently motivated or mandated by their management.

Or create a 16-core CCD to bury the whole issue once and for all. Tons of people must have crossed the 7950X3D off their wish list because of the cache discrepancy between the two CCDs. It's just very, very wasteful. I can see they went with it because it was the quickest thing for them to do, but it's very suboptimal in my opinion. There are benchmarks where the 7950X beats the 7950X3D. That should NOT be happening.
AMD needs to have a shared L4 V-cache for both CCDs.
> That's called DDR5 I think 😄

That's a sick joke, sorry. I would rather it were eDRAM, or even super-low-latency DDR3-2400, than high-latency DDR5. DDR5 bursts are like the Death Star firing a powerful barrage of data at the CPU. Excellent, except when that's not what you need. Sometimes you just need a quick turbolaser shot from a TIE Interceptor or a TIE Defender.
> …APUs, the possibility to have different versions of an APU with more cache would be easier to realize.

There are fewer than 100 people who are going to buy what you're talking about. CPU + dGPU wins hands down every time. An APU on the desktop will always be a low-end budget option; you won't upsell people who buy them to a "bigger cache" version.
> That's a sick joke, sorry. I would rather it was eDRAM or even super low latency DDR3-2400 than high latency DDR5. DDR5 bursts are like the Deathstar firing a powerful barrage of data at the CPU. Excellent except when that's not what you need. Sometimes you just need a quick turbolaser from a Tie Interceptor or a Tie Defender.

So basically SDR built on DDR5/6 manufacturing technology. If the OS could access the SDR and DDR banks separately, then you'd truly have something special: SDR for complex random accesses to small data sets, DDR for rapid streams of larger data. It only works if both memory technologies share an interface.
> There's less than 100 people who are going to buy what you're talking about. CPU+dGPU wins hands down every time. APU on desktop will always be a low-end budget option, you won't upsell people who buy them to a "bigger cache" version

In fact, that is what I said.
> There's 2 cases where V$ does nothing at all:
> - high computation, low data amounts
> Normal cache works just fine for those and the extra cache is never needed
> - low computation, very high data amounts
> Even V$ will get overloaded and ends up being no more or less useful than normal cache

You're forgetting the case of extremely large binary executables. Facebook used to have >1 GB PHP binaries on their servers (I don't know whether that's still the case). Such a huge binary would benefit immensely, as the CPU would not need to go to main memory or storage for the parts of the executable code that cannot fit inside standard-sized L1/L2/L3.
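The two regimes above come down to where the working set lands relative to the cache sizes. A toy sketch of that argument (my own illustration; the 32 MB and 96 MB figures assume a Zen 4 CCD without and with stacked V-cache):

```python
# Toy model: extra V-cache only helps when the working set fits in the
# enlarged L3 but not in the standard one. Sizes are assumptions for a
# Zen 4 CCD (32 MB base L3, 96 MB with stacked V-cache).
L3_MB = 32
L3_PLUS_VCACHE_MB = 96

def vcache_helps(working_set_mb: float) -> bool:
    """True only in the middle regime: too big for plain L3,
    small enough to fit once V-cache is added."""
    return L3_MB < working_set_mb <= L3_PLUS_VCACHE_MB

print(vcache_helps(8))     # high compute, small data  -> False
print(vcache_helps(64))    # fits only with V-cache    -> True
print(vcache_helps(1024))  # huge streaming data set   -> False
```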
> Substantial added costs with low return, it may be done when the economic return will be >0.

Citation(s) needed.
> "It looks like synchronizing 6k8. There seems to be a lot of changes on the IF side, and the data is much higher than the crippled Z4."

If it's actually working in 1:1 mode, I guess they've increased the memory controller speed by about 13% (6800/6000), which isn't much, but still OK. Maybe a new IOD revision with higher clocks.
> If it's actually working at 1:1 mode, that means they increased memory controller speed by about 13% (6800/6000), which isn't much, but still ok. Maybe new io die revision with higher clocks

Arrow Lake could possibly have an edge there.
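A quick back-of-the-envelope check of that 13% figure, assuming DDR5-6000 today versus a rumored DDR5-6800, with the memory controller clock (UCLK) kept 1:1 with the memory clock:

```python
# Assumed speeds: DDR5-6000 (current 1:1 sweet spot) vs. DDR5-6800
# (rumored), with UCLK running 1:1 against MCLK in both cases.
old_mt_s = 6000
new_mt_s = 6800

uplift = new_mt_s / old_mt_s - 1
print(f"memory controller clock uplift: {uplift:.1%}")  # -> 13.3%
```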
> Citation(s) needed.

Current CPUs/APUs already have large L3 sizes, and as seen with Zen 3/Zen 4, the consumer applications that take advantage of the added cache (the V-cache) are essentially limited to gaming.
> Current CPUs/APUs already have large L3 sizes. As seen with Zen3/Zen4, consumer applications that take advantage of this added cache (the V-cache) are limited essentially to gaming.

Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance-hit benchmark methodology, in which he measures the impact of a compute-intensive background application on a benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc could pay royalties to Ganesh, or whatever, for using his testing methodology), we would know for sure whether V-cache has a measurable impact in daily-workload scenarios. I would love to test this myself, but no one here wants to donate a V-cache CPU to me.
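A minimal sketch of that kind of test, assuming nothing about Ganesh's actual harness: time a foreground workload alone, then again while a busy-loop process keeps a core saturated in the background, and report the slowdown.

```python
import multiprocessing as mp
import time

def background_load(stop):
    # Keep one core busy until told to stop.
    while not stop.is_set():
        sum(i * i for i in range(10_000))

def foreground_benchmark() -> float:
    # Stand-in workload; a real test would run the actual application.
    t0 = time.perf_counter()
    sum(i * i for i in range(2_000_000))
    return time.perf_counter() - t0

if __name__ == "__main__":
    baseline = foreground_benchmark()

    stop = mp.Event()
    worker = mp.Process(target=background_load, args=(stop,))
    worker.start()
    loaded = foreground_benchmark()
    stop.set()
    worker.join()

    print(f"slowdown under background load: {loaded / baseline:.2f}x")
```

The idea is that a cache-sensitive CPU should show a smaller slowdown here, since the foreground task's working set is less likely to be evicted by the background process.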
> Disagree. Regular review sites don't know how to measure V-cache impact. AT writer Ganesh has an excellent multi-core performance hit benchmark methodology where he measures the impact of a compute intensive background application on the benchmark running in the foreground. If Gavin would care to learn that (or maybe AT or Future plc would pay royalties to Ganesh or whatever for using his testing methodology), we would know for sure if V-cache has a measurable impact in daily workload scenarios. I would love to test this myself but no one here wants to donate a V-cache CPU to me

Are there practical cases where you want to run several compute-intensive applications in parallel, or is it simply "for science"?
And I blame AMD for not having 7800X3D available for sale when I bought my used 12700K!
> If AMD figured out either of:
> a) cooling the chip if the entire CCD is covered by V-Cache
> b) putting the V-Cache under the CCD
> then Wafer-on-Wafer stacking could be used, and it would overcome the challenges you mentioned.
> As far as "known good die" goes, if the V-Cache were so highly redundant that its yield were close to 100%, overall yield would be affected minimally.

Maybe that's harder to do on desktop, but in laptops AMD could make a dual-stacked die that is cooled on both sides (top and bottom), with the connections to the motherboard on the four sides of the die. It's doable, even if it adds a little thickness to the laptop. Most 15.6-inch and 17.3-inch laptops could accommodate that.
> Are there practical cases where you want to run several compute-intensive applications in parallel, or it is simply "for science"?

It would make browsing with multiple tabs plus Microsoft Teams snappier, and it may make other background workloads, like a long-running video-encode job, faster. Serious professionals may have many more use cases.