Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

igor_kavinski · Mar 13, 2025

adroc_thurston said:
MALL only makes sense for GPUs.

Broadwell, Crystal Well etc. showed that it can matter for certain applications especially with limited data sets that need to do repetitive computations, like game engines.

But I think MALLs can matter more if a hardware/software profiling solution is developed so the same required data does not keep on getting evicted and reloaded over and over again just because there are short periods where that data isn't needed.

adroc_thurston · Mar 13, 2025

igor_kavinski said:
Broadwell, Crystal Well etc. showed that it can matter for certain applications especially with limited data sets that need to do repetitive computations, like game engines.

Good news everyone! V$ exists.

igor_kavinski · Mar 13, 2025

adroc_thurston said:
Good news everyone! V$ exists.

Yeah but we need 512MB of that. At least I won't be satisfied till that happens.

adroc_thurston · Mar 13, 2025

igor_kavinski said:
Yeah but we need 512MB of that.

no you don't.

igor_kavinski said:
At least I won't be satisfied till that happens.

Too many sets for too many pains. Do not.

LightningZ71 · Mar 13, 2025

igor_kavinski said:
Yeah but we need 512MB of that. At least I won't be satisfied till that happens

In general, the miss rate on a last level cache halves as the size of the cache quadruples. For example, if your hit rate on a 512 KB cache was 90%, your miss rate would be 10%. If you doubled that cache twice, to 2 MB, you would improve the miss rate to 5% and the hit rate to 95%. It would make a noticeable difference only in programs that have a hot working set that now fits in the expanded cache, but spilled before. Those are very general numbers for x86 code as the effect is still HIGHLY dependent on the hot working set size of each program.
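The rule of thumb above (miss rate halves when capacity quadruples, so miss rate scales roughly with 1/sqrt(size)) can be sketched in a few lines of Python. This only illustrates the stated rule and is not a model of any real CPU; the function name `scaled_miss_rate` is made up for the example.

```python
import math

def scaled_miss_rate(base_miss_rate, base_size_kb, new_size_kb):
    """Estimate miss rate under the rule of thumb: miss rate ~ 1 / sqrt(size)."""
    return base_miss_rate * math.sqrt(base_size_kb / new_size_kb)

# Example from the post: 10% misses at 512 KB, then the cache is doubled twice to 2 MB.
# The last entry (512 MB) naively extends the same rule much further.
for size_kb in (512, 1024, 2048, 512 * 1024):
    miss = scaled_miss_rate(0.10, 512, size_kb)
    print(f"{size_kb:>7} KB: miss rate {miss:.2%}, hit rate {1 - miss:.2%}")
```

Each further quadrupling buys half as many percentage points as the previous one, which is the diminishing-returns argument in numeric form. Real workloads will deviate from this curve depending on their hot working set.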

Be aware that, for every doubling of cache size, you are going to introduce additional access latency as well as additional latency in any memory operations that result from a miss when seen from the program itself, OR, you will make the design of the cache more complex, taking up more area, resulting in additional product cost. Eventually, you just aren't making any useful impact in working set latencies and will have to resort to LOTS of predictive extra data loads from main memory to attempt to preload the cache with data that you think that the program will need next. This burns up a lot of energy making memory calls that are often unneeded.
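The latency trade-off described above can be illustrated with the textbook average memory access time formula, AMAT = hit latency + miss rate x miss penalty. All latencies below are invented round numbers chosen only to show the shape of the trade-off; they are not measurements of any real part.

```python
def amat(hit_latency_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit latency plus the expected cost of a miss."""
    return hit_latency_ns + miss_rate * miss_penalty_ns

# Hypothetical baseline cache.
small = amat(hit_latency_ns=10.0, miss_rate=0.10, miss_penalty_ns=80.0)
# 4x the capacity: miss rate halves (rule of thumb), but hits and misses both get slower.
large = amat(hit_latency_ns=12.0, miss_rate=0.05, miss_penalty_ns=82.0)
# 16x the capacity: miss rate halves again, latencies grow again.
larger = amat(hit_latency_ns=14.0, miss_rate=0.025, miss_penalty_ns=84.0)

print(f"baseline: {small:.1f} ns, 4x: {large:.1f} ns, 16x: {larger:.1f} ns")
```

With these made-up inputs the first quadrupling helps (18.0 ns down to 16.1 ns) and the second one gains nothing, because the extra hit latency cancels out the saved misses. Where that break-even point sits on real hardware depends entirely on the actual latencies and the workload.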

I think that AMD is currently happy with their L3 cache ratio and may look to maintain that ratio into larger CCXs with respect to VCache packages.

igor_kavinski · Mar 13, 2025

LightningZ71 said:
Eventually, you just aren't making any useful impact in working set latencies and will have to resort to LOTS of predictive extra data loads from main memory to attempt to preload the cache with data that you think that the program will need next. This burns up a lot of energy making memory calls that are often unneeded.

This should be exposed as a BIOS option and let the users make that call. I personally have no issue burning a few extra watts for maximum performance.

LightningZ71 · Mar 13, 2025

I vaguely remember from long ago that there were processors that had BIOS settings where you could turn cache prefetch on and off. It's been a minute, I've slept since then, and there may have been an alcohol or two in my system along the way, so that's about all I have at the moment.

igor_kavinski · Mar 13, 2025

Cache prefetching is available in some Intel BIOS (turning it off causes a serious performance hit). I haven't seen it in the Epyc BIOS though.

MS_AT · Mar 13, 2025

LightningZ71 said:
I vaguely remember from long ago that there were processors that had BIOS settings where you could turn cache prefetch on and off. It's been a minute, I've slept since then, and there may have been an alcohol or two in my system along the way, so that's about all I have at the moment.

It should be available on AM5. Usually the option can be found in the AMD-specific menu, but your mileage may vary depending on the manufacturer.

igor_kavinski · Mar 13, 2025

MS_AT said:
It should be available on AM5. Usually the option can be found in the AMD-specific menu, but your mileage may vary depending on the manufacturer.

Guess I'll find out when I get my 9900X up and running, provided some gremlin doesn't steal it on its way to me.

EDIT: They sent me a 7600X so I initiated a return

auvix · Mar 17, 2025

That's great. But where's the money to buy it now?

csbin · Mar 18, 2025

Zen 5 Mesh

https://twitter.com/x/status/1901932243842687454

fastandfurious6 · Mar 18, 2025

csbin said:
Zen 5 Mesh

https://twitter.com/x/status/1901932243842687454

it's from here https://ieeexplore.ieee.org/document/10904529

how do we get the document?

MS_AT · Mar 18, 2025

fastandfurious6 said:
how do we get the document?

From the TechTechPotato stream, or somebody shared a link to a Japanese site around the same time that reprinted most of the slides, if I remember correctly. In the stream you have both the article (skip backward) and the presentation slides.

yuri69 · Mar 18, 2025

As usual, a subset of AMD's ISSCC slides are available at PC Watch.

Markfw · Mar 18, 2025

MS_AT said:
From the TechTechPotato stream, or somebody shared a link to a Japanese site around the same time that reprinted most of the slides, if I remember correctly. In the stream you have both the article (skip backward) and the presentation slides.

Bottom line, which is better, 18A or N2?

igor_kavinski · Mar 18, 2025

AMD Ryzen 9 9950X3D Review

In our AMD Ryzen 9 9950X3D mini review, we see what the new 16-core 32-thread 3D V-Cache parts offer compared to previous generations

www.servethehome.com

With all of this said, if AMD gave us the option to have 3D V-Cache on both CCDs so that we could avoid having one set of lower cache and one set of higher cache cores, I think a lot of folks would be interested.

fastandfurious6 · Mar 18, 2025

I went ahead and tried out on-CPU AI performance, mostly just for the fun of it. Armed with 64GB of RAM, I was able to test deepseek-r1:32b at an average token rate of 3.346 tokens/s.

not bad at all

Hail The Brain Slug · Mar 18, 2025

igor_kavinski said:
With all of this said, if AMD gave us the option to have 3D V-Cache on both CCDs so that we could avoid having one set of lower cache and one set of higher cache cores, I think a lot of folks would be interested.

But... anyone who thinks they want that is ignorant and a fool.

Thunder 57 · Mar 18, 2025

Hail The Brain Slug said:
But... anyone who thinks they want that is ignorant and a fool.

I never said that. I know a lot of people want to see it. I just don't think it'll end up resulting in what they expect.

Look at the 9950X3D. It's no better in productivity than a 9950X in just about everything. Why would a dual Vcache version be any different? It would just be a waste of Vcache when AMD can't keep them in stock.

Hail The Brain Slug · Mar 18, 2025

Thunder 57 said:
I never said that. I know a lot of people want to see it. I just don't think it'll end up resulting in what they expect.

Look at the 9950X3D. It's no better in productivity than a 9950X in just about everything. Why would a dual Vcache version be any different? It would just be a waste of Vcache when AMD can't keep them in stock.

That you feel singled out here is extremely telling.

MS_AT · Mar 18, 2025

Btw, for people who would like to have 2 x3D CCDs, what are your expectations? What would be better in your opinion?

igor_kavinski · Mar 18, 2025

I think power use would go down even further in multitasking scenarios as memory trips would get reduced a lot.

fastandfurious6 · Mar 19, 2025

A few more new slides from the AMD AI Summit in Beijing

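AMD analyzes AI trends in personal computers at its AI PC Innovation Summit and previews the upcoming mobile Ryzen 9 9955HX3D processor

AMD held its AI PC Innovation Summit in Beijing, sharing its analysis of AI trends in personal computers and showcasing the Ryzen 9000HX series processors designed for gaming laptops, as well as the Ryzen 9 9955HX3D processor with 3D V-Cache.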

www.techbang.com

Win2012R2 · Mar 19, 2025

MS_AT said:
Btw, for people who would like to have 2 x3D CCDs, what are your expectations? What would be better in your opinion?

Determinism.

Chiplets should be exactly the same - including max frequencies, cache levels etc.
