> RaptorLake has no inter-chiplet latency problem and the RAM controller is on the same chip

Yes.
> Thanks to 2MB L2 instead of 1.25MB, RaptorLake gains approximately 4-5% higher IPC.

In select* workloads.
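A claim like "4-5% IPC from a bigger L2" can be sanity-checked with a textbook CPI/stall model. A minimal sketch, where the base CPI, memory references per instruction, miss rates, and miss penalty are all illustrative assumptions rather than measured Raptor Lake numbers:

```python
# Toy CPI model illustrating how a lower L2 miss rate can buy a few
# percent of IPC. Every number below is an illustrative assumption,
# not a measured Raptor Lake figure.

def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty):
    """Cycles per instruction including stall cycles from cache misses."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty

# Hypothetical: the bigger L2 trims the miss rate from 5.0% to 4.4%.
cpi_small_l2 = effective_cpi(0.55, 0.25, 0.050, 30)
cpi_big_l2   = effective_cpi(0.55, 0.25, 0.044, 30)

# IPC = 1 / CPI, so the relative IPC gain is CPI_old / CPI_new - 1.
ipc_gain = cpi_small_l2 / cpi_big_l2 - 1
print(f"IPC gain: {ipc_gain:.1%}")
```

With these assumed numbers a modest drop in L2 miss rate yields roughly a 5% IPC gain, the right order of magnitude for the quoted claim, and workload-dependent by construction.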
> Zen has a RAM controller on a separate IOD

The d2d penalty is tiny.
> so it needs larger L3 to compensate

It needs larger L3 to compensate for a skinnier core with less reordering capacity.
> and this is mainly why it benefits from large L3 + VCache in games.

It's a skinnier core.
> Moreover, communication of one CCD with the neighboring CCD is via IOD.

Not that relevant.
> Zen 6 is expected to introduce a single CCD with 16 cores and a shared 64MB L3.

WRONG
> shared 64MB L3

Really? On the same chip or another? Seems huge. People say SRAM scaling stalled on N5/N3. And if I'm eyeballing it correctly, that'd be like 70% of a current CCD's size just for the 64MB L3.
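The eyeballing can be made concrete with published bitcell sizes. A back-of-envelope sketch, assuming an N5-class high-density SRAM bitcell of about 0.021 µm² and a 2x overhead factor for tags, ECC, and periphery (both figures are assumptions, not die-shot measurements):

```python
# Back-of-envelope die area for a 64MB L3. The bitcell size (~0.021 um^2,
# roughly TSMC N5-class high-density SRAM) and the 2x overhead factor for
# tags, ECC and periphery are both assumptions, not die-shot measurements.

BITCELL_UM2 = 0.021
OVERHEAD = 2.0

bits = 64 * 1024 * 1024 * 8          # 64 MB of data bits
raw_mm2 = bits * BITCELL_UM2 / 1e6   # 1 mm^2 = 1e6 um^2
total_mm2 = raw_mm2 * OVERHEAD

print(f"raw arrays ~{raw_mm2:.0f} mm^2, ~{total_mm2:.0f} mm^2 with overhead")
```

Against a Zen 4 CCD of roughly 70 mm², that lands closer to a third of the die than 70% — though with SRAM barely shrinking on N5/N3, the share is unlikely to fall on newer nodes.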
> Guys, what processor is this: https://browser.geekbench.com/v6/cpu/6030993
> EDIT: It's Phoenix (not Zen5). But the score is a bit high (for 4.1GHz!)
> Any info about ROBs and op-cache in Zen5?

It's Zen 4; family 1A is Zen 5 afaik (26 in decimal).
> Guys, what processor is this: https://browser.geekbench.com/v6/cpu/6030993
> EDIT: It's Phoenix (not Zen5). But the score is a bit high (for 4.1GHz!)
> Any info about ROBs and op-cache in Zen5?

It's Zen 4; Zen 5 has 48 KB L1D.
No, something must have been mixed up there.

The following is ONLY from a distributed-computing perspective. That means two things: computing power and efficiency. We use all cores and, most of the time, SMP.
Zen 1: way better than Bulldozer in all respects, and if I remember correctly cheaper and more efficient than the Intel counterparts.
Zen 2: small improvements in performance, about the same efficiency as Zen 1.
Zen 3: MUCH better performance AND efficiency compared to Zen 2. The larger L3 cache made a big difference in some apps.
Zen 4: MUCH better performance than Zen 3, but about the same efficiency. In apps that use AVX-512, though, nothing could touch the performance. For primegrid, we had to disable SMT and pin cores to a CCX for maximum performance, but when we do, nothing that Intel has comes close.
> For primegrid, we had to disable SMT and pin cores to a CCX for maximum performance,

Actually, SMT does measurably improve throughput in PrimeGrid on Zen 4, desktop and server, and does improve perf/W slightly. In contrast, on Zen 2 and Zen 3, SMT in PrimeGrid provides no or sometimes a small host-throughput advantage but always reduces perf/W. (PrimeGrid is vectorized FP with a large cache footprint, but not too large on Zen 3 and 4 if the user gives hints to the OS's process scheduler. Zen 2's cache is too small in many, but not all, of PrimeGrid's currently active projects.)
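The "SMT raises throughput but lowers perf/W" pattern described for Zen 2/3 is just arithmetic on two measurements: if enabling SMT costs more power than it adds in throughput, perf/W drops even though the host completes more work. A hypothetical illustration — the task and power figures are invented, not PrimeGrid data:

```python
# How SMT can raise throughput yet lower perf/W: if the power cost of the
# second thread exceeds its throughput gain, perf/W drops even though the
# host completes more work per day. All figures below are hypothetical.

def perf_per_watt(tasks_per_day, watts):
    return tasks_per_day / watts

smt_off = perf_per_watt(tasks_per_day=100.0, watts=140.0)
smt_on  = perf_per_watt(tasks_per_day=108.0, watts=160.0)  # +8% work, +14% power

print(f"SMT off: {smt_off:.3f} tasks/day/W, SMT on: {smt_on:.3f} tasks/day/W")
```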
> Zen 5 in Distributed Computing? I trust that AMD carves out a decent perf/W update once again, despite only a minor manufacturing-node update. But how much? Various hints earlier in this thread sounded promising to me. Though so far, 1T or/and iso-clock or/and integer performance characteristics have been more of a focus in this thread, rather than nT iso-power FP.

PS: in the context of this specific application scenario, I expect that the top-end Zen 5 desktop SKUs with a 230 W PPT limit (170 W TDP) will make considerably more efficient use of this power budget than their direct predecessors, which would make them more attractive to somebody like myself who is interested in perf/Watt and in perf/host.
No, they benefit from total cache capacity.
(They also love fat L2's, as you've seen in RPL).
> Separate cache pools waste most of cache capacity to duplication

Ughh, not quite.
> And AMD did make it clear that the unified 32MB cache pool of Zen3 is responsible for most of the game speed-up.

Not even.
> That's mainly for a different reason - that is, eliminating the inter-CCX penalty that made Zen2 suffer

Not even that. Zen3 was a major improvement in other ways, most notably branch prediction.
Kinda wish C&C did more profile testing on games where Z3 is miles ahead of Z2. I think people have assumed it is the cache when it could more so be something core specific.
> I have noticed that there is less contribution from the technically inclined posters (including myself), so a separation of threads and focus may help to keep like-minded forum members engaged.

Sorta OT, but interesting stuff is happening at Chips & Cheese:
> The first project we have been working on is a new microbenchmark framework. This new framework will hopefully allow for standardization between different tests to keep things as consistent as possible. In the short term, this will also allow folks other than the Chips and Cheese team to add to the Chips and Cheese test suite.
>
> In the long term, we hope that this framework will allow for more tests to be written than the current Chips and Cheese team could ever write on its own, along with diversifying test authors.
>
> As for the members of the board:
> ...
> - George J. Cozma: President and Chairman
> - Dr. Ian Cutress: Vice President
> - Ryan Mull: Treasurer
> - ...
> So we now have:
> - 9800X, 8 cores, 170W TDP
> - Clock regression, ~100MHz
> - IPC, ~10% compared to Zen4 <NEW>

It's probably an invented leak by Micro Center because they want to clear their Zen4 stock before the Zen5 announcement at Computex. 🤣
OMG. I strongly recommend that Mike Clark not wake up anytime soon and keep sleeping until Zen6.
> Kinda wish C&C did more profile testing on games where Z3 is miles ahead of Z2. I think people have assumed it is the cache when it could more so be something core specific.

Core-architecture changes + the unified L3 cache act as a whole. I don't know how you can still think that L3 was completely irrelevant to IPC.
> F.P.S. ≠ I.P.C.

Not true. If core A at 4GHz + V-Cache gets +15% more FPS than core B at 4GHz without V-Cache, that is an increase in the processor's IPC.
> "Instructions Per Cycle" means instructions per cycle. Is that so hard to memorize?

A processor with a 0.2 ms latch can process more instructions per cycle than a processor with a 0.3 ms latch.

Edit: as an example, when one processor spins on a lock for 0.2 ms and the other for 0.3 ms, which of the two processors got the higher Instructions Per Cycle count?
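Taken literally, the spin-lock example shows exactly why FPS and IPC diverge: a spinning core retires its test loop at full tilt, so its measured IPC can be high while useful work is zero. A minimal sketch with hypothetical counter values:

```python
# IPC is retired instructions divided by elapsed core cycles, nothing more.
# A core spinning on a lock retires its test loop constantly, so it can
# report high IPC while doing zero useful work. Counts are hypothetical.

def ipc(instructions_retired, cycles):
    return instructions_retired / cycles

spinning  = ipc(instructions_retired=3_200_000, cycles=1_000_000)  # tight spin loop
real_work = ipc(instructions_retired=1_800_000, cycles=1_000_000)  # miss-bound work

print(f"spinning: {spinning:.1f} IPC, real work: {real_work:.1f} IPC")
```

This is also why a V-Cache FPS gain and an IPC gain can coincide: fewer stall cycles per instruction of real work does raise IPC, but a high IPC number alone does not imply progress.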
> Will you continue to try to distort reality?

If you need to know: I can't continue something I never started. Just take what I wrote and avoid adding meaning that is not there.