> $649, $499, $349, $269 would be my guess.

Well, I was $50 off on the double-CCD chips.
> 279 and 359 don't make much sense to me to fit in with the 449 and 599 pricing - it's $7-9 more per core than the dual-CCD SKUs. I would have expected something more like 249/329/449/599.

They want you to pay a premium for having the latest and greatest (?) uarch.
> They want you to pay a premium for having the latest and greatest (?) uarch.

Added bonus: your chip doesn't spontaneously combust.
> 279 and 359 don't make much sense to me to fit in with the 449 and 599 pricing - it's $7-9 more per core than the dual-CCD SKUs. I would have expected something more like 249/329/449/599.

You still have to pay for an IO die even with 1 CCD…
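The "$7-9 more per core" claim above can be checked quickly. A small sketch, assuming the single-CCD parts are the 6- and 8-core SKUs at $279/$359 and the dual-CCD parts are the 12- and 16-core SKUs at $449/$599 (core counts are inferred from context, not stated in the comment):

```python
# Rumored SKU prices from the thread, with assumed core counts:
# single-CCD: 6c at $279, 8c at $359; dual-CCD: 12c at $449, 16c at $599.
skus = {6: 279, 8: 359, 12: 449, 16: 599}

per_core = {cores: price / cores for cores, price in skus.items()}
for cores, ppc in per_core.items():
    print(f"{cores:2d} cores: ${ppc:.2f} per core")

# Premium of the single-CCD parts over the cheapest dual-CCD per-core price:
dual_ccd_ppc = min(per_core[12], per_core[16])
for cores in (6, 8):
    print(f"{cores}c premium: ${per_core[cores] - dual_ccd_ppc:.2f} per core")
# → 6c premium: $9.08 per core, 8c premium: $7.46 per core
```

Which lands exactly in the $7-9 per core range the comment complains about.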
> Regarding Apple and Windows ARM performance, I would assume that when applications are compiled for them, they fully use the available instruction set, and therefore perform near the theoretical maximum. However, most x64 Windows applications will only use the lowest common denominator, which is SSE2. You're lucky to get 128-bit AVX usage. Even then, Intel and AMD optimizations are not similar.

macOS will use whatever is baked into the chips, thanks to vertical integration. For the other ARM chips, everybody still sticks to NEON; afaik Qualcomm disables SVE2 on their chips even if the support is baked into the silicon, so the lowest common denominator is the baseline. But yes, there is much less fragmentation than what Intel did... [I wouldn't be surprised if the AVX-512 family had more extensions than the whole of ARMv8.]
> Biggest problem for mobile isn't performance but battery life. Having a separate low-power cluster will extend battery life, as it can be optimized better for low power, and the high-performing parts can be clock-gated when not needed. But AMD's Strix Point version of that idea is not ideal - why have 8 cores on the low-power cluster, making it use more power while also lowering the high-performance mode?

The design for Strix Point doesn't make a whole lot of sense unless it was intended for N3 with FinFlex. The difference between the Zen5 and Zen5c cores could have been greater, with the Zen5 cores able to clock higher and the Zen5c cores having lower leakage. That also gives a bit of support to their decision to raise the size of the iGPU to 16 CU while keeping memory bandwidth as limited as it is. With FinFlex, they could have targeted lower-leakage fin layouts for the iGPU transistors while not sacrificing performance.
> Software using NEON plus hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should drop it from their silicon. CPUs should execute existing software fast, not implement new extensions that no software uses now and probably won't use later, like what happened on x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.

Both AVX-512 and NEON bring things to the table, with the register width being the least interesting of them... It is only thanks to the fragmentation Intel forced that 128-bit AVX-512 lags behind 512-bit AVX-512... And ARM is doing the right thing by uniformly pushing SVE in place of NEON. They are trying to avoid the ecosystem fragmentation that is the real downside of AVX-512 and x64.
The power cost balance between having two completely different CCXs - one with 4 Zen5 cores and another with 4 Zen5c cores - instead of a single CCX with all 8 cores on it would still seem not to favor two separate CCXs. The Zen5c cores are dense-first, not low power. This isn't some low-power island situation. They are there to boost MT scores against a competitor that was running away with it on an inferior process node. Combining the cores on a single CCX seems to have worked well enough on Phoenix2. I don't see the advantage of increasing the uncore data transfer demands by having two CCXs.
> For the thin laptop segment, MT performance is irrelevant - battery life isn't. Intel is starting to get their solution right with Lunar Lake, and AMD needs to as well, or the situation might soon be that AMD solutions have only half the battery life of their rivals. Apple showed how to implement a proper laptop CPU, and rivals are starting to catch up.

MT performance in MOST laptops is PRACTICALLY irrelevant. Intel released laptop processors with 6 performance cores and 8 E-cores, and IIRC has some with 16 E-cores. Those are MT monsters that AMD has to match from a marketing perspective WITHOUT having to use their repurposed Dragon Range desktop processors. Strix Point, in its current form, is a marketing exercise first and foremost.
> @Jan Olšan great article (I think you are the author, right?). It goes in depth and is nicely explained.

Thanks.
Strix Point seems worse and worse. It's ridiculous that it can't even sustain max ST boost clocks on sub-30W devices - what does a full Zen5 core at 5.7 GHz consume, a full 100 W in ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.
Hey at least they aren't self-destructing.
> Which is why an 8+4 setup would have been perfect. The 8 Zen5 cluster is used for maximum performance (both ST and MT), while the 4 Zen5c cluster would be used as a low-power island for power efficiency. This is what ARM SoC vendors do: use the small cores for power efficiency, and use the large cores for performance scaling.
>
> By putting in 8 Zen5c cores, it feels like AMD is instead following Intel's playbook of "E-core spam". Which is ironic, considering AMD took a dig at Intel with the "economy cores" joke.

Neither Intel's nor AMD's small cores are designed to be low-power cores in the way that various ARM designers' little cores are. I think an 8+4 design would just take up a lot more space without much difference in either performance or efficiency.
> Software using NEON plus hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should drop it from their silicon. CPUs should execute existing software fast, not implement new extensions that no software uses now and probably won't use later, like what happened on x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.

SVE brings predicates and first-fault loads/stores to the table, which can be quite useful for autovectorization. Some of these features were available starting with AVX-512 and were also added to Intel's new AVX10.
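To make the predication point concrete, here is a rough Python sketch of what SVE-style predicates buy a vectorizing compiler: the last partial vector of a loop needs no scalar epilogue, because a per-lane mask simply disables the lanes past the end of the array. The function names mimic SVE concepts (WHILELT builds the predicate) but are purely illustrative, not real intrinsics:

```python
VL = 4  # pretend vector length in lanes (SVE leaves this implementation-defined)

def whilelt(i, n):
    """Build a predicate: lane k is active while i + k < n (like SVE's WHILELT)."""
    return [i + k < n for k in range(VL)]

def masked_add(pred, a, b, i):
    """Predicated vector add: inactive lanes contribute nothing."""
    return [a[i + k] + b[i + k] for k in range(VL) if pred[k]]

def vector_add(a, b):
    """Add two equal-length arrays of any length with no scalar tail loop."""
    out, i = [], 0
    while i < len(a):
        pred = whilelt(i, len(a))  # partial predicate on the final iteration
        out.extend(masked_add(pred, a, b, i))
        i += VL
    return out

print(vector_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))
# → [11, 22, 33, 44, 55, 66]
```

On the second iteration the predicate is [True, True, False, False], so the two out-of-bounds lanes are never touched - that is the behavior NEON cannot express and SVE can.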