Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


JustViewing

Senior member
Aug 17, 2022
216
382
106
Regarding Apple and Windows ARM performance, I would assume that when applications are compiled for them, they fully use the available instruction set and therefore perform near the theoretical maximum. However, most x64 Windows applications only use the lowest common denominator, which is SSE2. You are lucky to get 128-bit AVX usage. Even then, Intel and AMD optimizations are not similar.
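To make the "lowest common denominator" point concrete, here is a minimal sketch of runtime dispatch, the usual alternative to shipping only an SSE2 code path. The kernel table and the `pick_kernel` name are hypothetical and only illustrate the selection logic; real code would query CPUID or a compiler/OS helper to get the feature set.

```python
# Hypothetical kernel table, ordered from most to least demanding.
# Each entry lists the CPU features that code path requires.
KERNELS = [
    ("avx2", {"sse2", "avx", "avx2"}),
    ("avx", {"sse2", "avx"}),
    ("sse2", {"sse2"}),  # SSE2 is part of the x86-64 baseline
]


def pick_kernel(cpu_features):
    """Return the name of the first (widest) kernel the CPU can run."""
    available = set(cpu_features)
    for name, required in KERNELS:
        if required <= available:  # every required feature is present
            return name
    raise RuntimeError("no usable kernel: SSE2 baseline missing")
```

An application that skips this step and compiles one path for everyone ends up on the last row of the table regardless of what the CPU supports.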
 

MS_AT

Senior member
Jul 15, 2024
210
504
96
Regarding Apple and Windows ARM performance, I would assume that when applications are compiled for them, they fully use the available instruction set and therefore perform near the theoretical maximum. However, most x64 Windows applications only use the lowest common denominator, which is SSE2. You are lucky to get 128-bit AVX usage. Even then, Intel and AMD optimizations are not similar.
MacOS will use whatever is baked into the chips, thanks to vertical integration. For the other ARM chips, everybody still sticks to NEON; afaik Qualcomm disables SVE2 on their chips even if the support is baked into the silicon, so the lowest common denominator is the baseline. But yes, there is much less fragmentation than what Intel did... [I wouldn't be surprised if the AVX512 family had more extensions than the whole of ARMv8]
 

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
The biggest problem for mobile isn't performance but battery life. Having a separate low-power cluster will extend battery life, as it can be optimized better for low power and the high-performance parts can be clock gated when not needed. But AMD's Strix Point version of that idea is not ideal - why have 8 cores on the low-power cluster, making it use more power while at the same time lowering the high-performance mode?
The design for Strix Point doesn't make a whole lot of sense unless it was intended for N3 with FinFlex. The difference between the Zen5 and Zen5C cores could have been greater, with the Zen5 cores able to clock higher and the Zen5C cores having lower leakage. That also gives a bit of support to their decision to raise the size of the iGPU to 16CU while keeping memory bandwidth as limited as it is. With FinFlex, they could have targeted lower leakage fin layouts for the iGPU transistors while not sacrificing performance.
 
Reactions: Tlh97 and Joe NYC

naukkis

Senior member
Jun 5, 2002
878
755
136
MacOS will use whatever is baked into the chips, thanks to vertical integration. For the other ARM chips, everybody still sticks to NEON; afaik Qualcomm disables SVE2 on their chips even if the support is baked into the silicon, so the lowest common denominator is the baseline. But yes, there is much less fragmentation than what Intel did... [I wouldn't be surprised if the AVX512 family had more extensions than the whole of ARMv8]

Software using NEON and hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should also drop it from their silicon. CPUs should execute existing software fast - not implement new extensions which no software uses now and probably never will, like what has happened in x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.
 

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
The power cost balance between having two completely different CCXs, one with 4 Zen5 cores and another with 4 Zen5C cores, instead of a single CCX with all 8 cores on it would seem to still not favor two separate CCXs. The Zen5C cores are dense first, not low power. This isn't some low-power island situation. They are there to boost MT scores against a competition that was running away with it with an inferior process node. Combining the cores on a single CCX seems to have worked well enough on Phoenix2. I don't see the advantage of increasing the uncore data transfer demands by having two CCXs.
 

MS_AT

Senior member
Jul 15, 2024
210
504
96
Software using NEON and hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should also drop it from their silicon. CPUs should execute existing software fast - not implement new extensions which no software uses now and probably never will, like what has happened in x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.
Both AVX512 and NEON bring things to the table, with the register width being the least interesting of them... It is only thanks to the fragmentation Intel forced that 128b AVX512 is behind 512b AVX512... And ARM is doing the right thing by uniformly pushing SVE in place of NEON. They are trying to avoid the ecosystem fragmentation that is the real downside of AVX512 and x64.
 
Reactions: FlameTail

naukkis

Senior member
Jun 5, 2002
878
755
136
The power cost balance between having two completely different CCXs, one with 4 Zen5 cores and another with 4 Zen5C cores, instead of a single CCX with all 8 cores on it would seem to still not favor two separate CCXs. The Zen5C cores are dense first, not low power. This isn't some low-power island situation. They are there to boost MT scores against a competition that was running away with it with an inferior process node. Combining the cores on a single CCX seems to have worked well enough on Phoenix2. I don't see the advantage of increasing the uncore data transfer demands by having two CCXs.

For the thin laptop segment, MT performance is irrelevant - battery life isn't. Intel is starting to get their solution right with Lunar Lake, and AMD needs to as well, or the situation might soon be that AMD solutions have only half the battery life of their rivals. Apple showed how to implement a proper laptop CPU, and rivals will start to catch up soon.
 

naukkis

Senior member
Jun 5, 2002
878
755
136
Both AVX512 and NEON bring things to the table, with the register width being the least interesting of them... It is only thanks to the fragmentation Intel forced that 128b AVX512 is behind 512b AVX512... And ARM is doing the right thing by uniformly pushing SVE in place of NEON. They are trying to avoid the ecosystem fragmentation that is the real downside of AVX512 and x64.

Yeah, and no mobile hardware maker is even offering SVE CPUs, because they want to offer performance to their consumers - who are running NEON software - and not some fairytale performance for spec nerds. Intel did far worse than just AVX512: half of their CPUs still don't support even plain AVX, supporting only SSE4.2. So the x86 platform won't get AVX as its software baseline in the next 10 years - probably never.
 
Reactions: carancho

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
For the thin laptop segment, MT performance is irrelevant - battery life isn't. Intel is starting to get their solution right with Lunar Lake, and AMD needs to as well, or the situation might soon be that AMD solutions have only half the battery life of their rivals. Apple showed how to implement a proper laptop CPU, and rivals will start to catch up soon.
MT performance in MOST laptops is PRACTICALLY irrelevant. Intel released laptop processors with 6 performance cores and 8 E-cores and IIRC has some with 16 E-cores. Those are MT monsters that AMD has to match from a marketing perspective WITHOUT having to use their Dragon Range repurposed desktop processors. Strix Point, in its current form, is a marketing exercise first and foremost.
 

naukkis

Senior member
Jun 5, 2002
878
755
136
Trolling/flame bait is not permitted. Please read and adhere to the forum rules.
Strix Point seems worse and worse. It's ridiculous that it cannot even sustain max ST boost clocks on sub-30W devices - what does a full Zen5 @ 5.7GHz consume, a full 100W in ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,397
136
Strix Point seems worse and worse. It's ridiculous that it cannot even sustain max ST boost clocks on sub-30W devices - what does a full Zen5 @ 5.7GHz consume, a full 100W in ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.

Are you sure it’s a power limit issue and not particular to that laptop design?
 

The Hardcard

Member
Oct 19, 2021
199
288
106
Which is why an 8+4 setup would have been perfect. 8 Zen5 cluster is used for maximum performance (both ST and MT), while the 4 Zen5C cluster would be used as a low power island for power efficiency. This is what ARM SoC vendors do. Use the small cores for power efficiency, and use the large cores for performance scaling.

By putting 8 Zen5C, it feels like AMD is instead following Intel's playbook of "E-core spam". Which is ironic, considering AMD took a dig at Intel with the "economy cores" joke.
Neither Intel’s nor AMD’s small cores are designed to be low-power cores in the way that various ARM designers’ little cores are. I think an 8+4 design would just take up a lot more space without much difference in either performance or efficiency.

AMD’s c cores are just another innovative approach to throughput cores. They will by and large serve similarly to Intel’s E cores; both play a different role altogether than Apple’s E cores. AMD’s dig at Intel’s cores is because of the instruction set version mismatch that affects Intel’s design, not the use of smaller cores itself.

The key question, in my opinion, as to whether or not they are a success is this: limit yourself to how many regular Zen cores would fit in the same area as the Zen 5c cores, and then, given the multi-core frequency limits, ask which set gets more work done. Just looking, without taking the time to do the calculations, it looks to me like the 8 Zen 5c cores would get a lot more work done than what would in practice be only 4 Zen 5 cores. I don’t think six cores would fit, and 5 is not a practical amount; that’s why I think it’s 8 Zen 5c versus 4 Zen 5.
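The calculation skipped above can be roughed out as a break-even check. This is only a sketch under a simplifying assumption that is not from the post: MT throughput scales as cores x sustained clock x IPC, with perfect scaling across cores. The function name and the `ipc_ratio` parameter are made up for illustration, and no measured clocks are plugged in.

```python
def break_even_clock_ratio(dense_cores, big_cores, ipc_ratio=1.0):
    """Minimum (dense clock / big clock) ratio at which the dense cluster
    matches the big cluster's MT throughput.

    Assumes throughput ~ cores * clock * IPC (an idealized model).
    ipc_ratio is dense-core IPC divided by big-core IPC.
    """
    if dense_cores <= 0 or big_cores <= 0 or ipc_ratio <= 0:
        raise ValueError("core counts and ipc_ratio must be positive")
    return big_cores / (dense_cores * ipc_ratio)


# 8 Zen 5c vs 4 Zen 5 in the same area, assuming equal IPC:
ratio = break_even_clock_ratio(8, 4)
```

Under that model the 8 Zen 5c cores come out ahead whenever their sustained multi-core clock is more than `ratio` times what the 4 Zen 5 cores could hold, so the comparison hinges on the real MT frequency limits of each cluster.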
 

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,972
136
Software using NEON and hardware made to execute NEON fast is best possible combination. SVE doesn't bring anything to table - ARM should also drop them from their silicon. CPU's should execute existing software fast - not implement new extensions which no software use now and probably won't use newer like what is happened in x86. They need fast SSEx FPUs and possibly 128 bit AVX any effort to wider SIMD is just wasted silicon space on consumer side.
SVE brings predicates and first-fault ld/st to the table, which can be quite useful for autovectorization. Some of these features were available starting with AVX-512 and were also added to Intel's new AVX10.
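For anyone wondering why predicates help autovectorization, here is a scalar Python emulation of a predicated loop. It is not real SVE code: `predicated_add` and the `vl` parameter are made up for illustration, and first-fault loads are not modeled. The per-lane mask plays the role that a predicate generated by SVE's WHILELT plays, so the tail of the array needs no separate scalar cleanup loop.

```python
def predicated_add(a, b, vl=4):
    """Add two equal-length lists in chunks of `vl` lanes, masking the tail."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    for base in range(0, n, vl):
        # Predicate: lane i is active only while base + i < n.
        pred = [base + i < n for i in range(vl)]
        for i, active in enumerate(pred):
            if active:  # inactive lanes neither load nor store
                out[base + i] = a[base + i] + b[base + i]
    return out
```

With 5 elements and `vl=4`, the second iteration runs with only one active lane, which is the case a fixed-width NEON or SSE loop has to handle with extra tail code.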

That said, I agree that most of the time hand-tuned NEON code is as fast as 128-bit SVE. I still think the sweet spot is at 256 bits wide, aka AVX2 or AVX10.2 with 256-bit vectors. And if area/power matters that much, do as AMD did on Zen4 for AVX-512: use multiple uops on narrower paths; that did well on Zen4.
 