It is just really strange, as it seems like a single thread cannot utilise all core resources. Did AMD design what is effectively Bulldozer 2: Electric Boogaloo?
> Yep, ST is very important. All things considered a mid release on the mobile platform in regards to that. However +66% MT is no joke, there will be people who want that.

In a laptop? Who? lol
> In a laptop? Who? lol

People who don't like desktops (not me, I love both)? Strix is essentially a 5950X in terms of MT.
To give them ST credit, they're at M3-family levels of performance in Geekbench ST (~3000). So basically second best already (second only to M4).
> People who don't like desktops (not me, I love both)? Strix is essentially a 5950X in terms of MT.

I guess. For laptops I figured most people would be interested in the M4/Lunar Lake type chips.
> I guess. For laptops I figured most people would be interested in the M4/Lunar Lake type chips.

Yep, most are. It's a very, very tiny space.
> To give them ST credit, they're at M3-family levels of performance in Geekbench ST (~3000). So basically second best already (second only to M4).

Yes, but they did it with a 1 GHz boost compared to M3, and we'll have to see power numbers; I don't think they will be close to M3.
9.71% in SIR 2017.
"In this test, a single Zen 5 thread still performs like a 4-decode x86 core. But when we enable two SMT threads for testing, we can see that the throughput doubles, and the instruction throughput reaches 8 in the L1-L2 and even L3 ranges, and in the DRAM range it returns to the same normal level as Zen 4."
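The quoted claim, that throughput roughly doubles once a second SMT thread shares the core, is the kind of thing you can sanity-check with a crude pin-and-time experiment. A minimal sketch, assuming a Linux box where CPUs 0 and 1 are SMT siblings of one core (check `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`); wall-clock timing of a Python loop is a blunt stand-in for the perf-counter IPC measurements the article actually ran:

```python
# Crude sanity check of SMT throughput scaling: time the same CPU-bound loop
# run by one process vs. two processes pinned to CPUs 0 and 1, which are
# ASSUMED here to be SMT siblings of one physical core.
import os
import time
from multiprocessing import Process

ITERS = 5_000_000  # enough work to dwarf process start-up cost

def spin(cpu: int) -> None:
    """Pin to `cpu` (best effort) and burn through a dependency-light loop."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # CPU may not exist; run unpinned rather than crash
    acc = 0
    for i in range(ITERS):
        acc += i

def timed(cpus) -> float:
    """Wall time for one spin() process per CPU in `cpus`, run concurrently."""
    procs = [Process(target=spin, args=(c,)) for c in cpus]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    one = timed([0])
    two = timed([0, 1])
    # On true SMT siblings, `two` should land between 1x and 2x of `one`:
    # 2x would mean zero SMT benefit, ~1x would mean no resource sharing.
    print(f"1 proc: {one:.2f}s  2 procs: {two:.2f}s  ratio: {two / one:.2f}x")
```

If the ratio lands well under 2x on a sibling pair, the two threads really are sharing backend resources productively; on two separate physical cores you'd trivially see ~1x, so getting the CPU numbering right matters.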
@SarahKerrigan you were onto something, I guess
> However +66% MT is no joke, there will be people who want that.

Where does the +66% MT number come from? I've not seen it mentioned previously in the thread.
Yeah, the Bulldozer comparisons are stupid because Bulldozer was drastically behind Intel on single-thread perf. That's not the case here - not even close.
If they actually managed a huge jump in MT from replicated frontends, and also a small ST bump in the same gen, without blowing out area, that's interesting.
> they spoiled us with previous Zen iterations

To be fair, they also led us on a merry hype train with the RDNA3 dual-issue thing, reporting twice the FLOPS/CU/clk despite that only being relevant to specific use cases.
> as it seems like a single thread cannot utilise all core resources

What did you think SMT was for?
> What did you think SMT was for?

Yeah, and this isn't built like SMT-centric throughput cores (hello POWER) anyway.
> Where does the +66% MT number come from? I've not seen it mentioned previously in the thread.

Over Hawk Point? I think in Cinebench.
Huang's tests also don't show the 2x bandwidth in L1 cache, though L2 does show ~60% improvement with a weird spike up to 90% as L2 starts to get saturated.
> I still don't see how any IPC increase is Bulldozer 2.

I'm not saying it is bad, but it's a different take on a similar goal of nT spam.
> This may blow your mind, but a bunch of structures have been statically partitioned for a while.

Oh I know, but 1t mode seemingly has more static partitions than Z4 had.
(Also, it's entirely possible that in 1t mode, the two frontends work like they do with Atom - early fetch/decode of branch targets.)
> What did you think SMT was for?

SMT is great, but you want to try to avoid having net core performance reliant on SMT use.
In GB6 text processing he measures about 10%, while AMD states it's 19%.
In GB5 and AES-XTS he measures about 12-13%, while AMD states it's 35%, so I don't know how valid his tests are or whether the frequencies were accurate.
His tests are actually fine, considering that Zen 5 in Strix is castrated in a few ways. I wouldn't be surprised if Granite Ridge gets around 5% more IPC in 1T versus Strix, which should put it close to the ~16% figure AMD showed. Interesting that SMT might bring a bigger uplift.
> Core should have high resource utilisation in 1t mode

The problem is that such considerations assume coding and compiler output are optimal, and we all know that very often that is anything but the case, unfortunately.