> $649, $499, $349, $269 would be my guess.

Well, I was $50 off on the double-CCD chips.
> 279 and 359 don't make much sense to me to fit in with the 449 and 599 pricing - it's $7-9 more per core than the dual-CCD SKUs. I would have expected something more like 249/329/449/599.

They want you to pay a premium for having the latest and greatest (?) uarch.
> They want you to pay a premium for having the latest and greatest (?) uarch.

Added bonus: your chip doesn't spontaneously combust.
> 279 and 359 don't make much sense to me to fit in with the 449 and 599 pricing - it's $7-9 more per core than the dual-CCD SKUs. I would have expected something more like 249/329/449/599.

You still have to pay for an IO die even with 1 CCD…
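The "$7-9 more per core" claim above can be checked quickly. A small sketch, assuming the single-CCD parts are the 6- and 8-core SKUs at $279/$359 and the dual-CCD parts are the 12- and 16-core SKUs at $449/$599 (core counts are inferred from context, not stated in the comment):

```python
# Rumored SKU prices from the thread, with assumed core counts:
# single-CCD: 6c at $279, 8c at $359; dual-CCD: 12c at $449, 16c at $599.
skus = {6: 279, 8: 359, 12: 449, 16: 599}

per_core = {cores: price / cores for cores, price in skus.items()}
for cores, ppc in per_core.items():
    print(f"{cores:2d} cores: ${ppc:.2f} per core")

# Premium of the single-CCD parts over the cheapest dual-CCD per-core price:
dual_ccd_ppc = min(per_core[12], per_core[16])
for cores in (6, 8):
    print(f"{cores}c premium: ${per_core[cores] - dual_ccd_ppc:.2f} per core")
# → 6c premium: $9.08 per core, 8c premium: $7.46 per core
```

Which lands exactly in the $7-9 per core range the comment complains about.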
> Regarding Apple and Windows ARM performance, I would assume that when applications are compiled for them, they fully use the available instruction set, and therefore perform near the theoretical maximum. However, most x64 Windows applications will only use the lowest common denominator, which is SSE2. You're lucky to get 128-bit AVX usage. Even then, Intel and AMD optimizations are not similar.

macOS will use whatever is baked into the chips, thanks to vertical integration. For the other ARM chips, everybody still sticks to NEON; afaik Qualcomm disables SVE2 on their chips even if the support is baked into the silicon, so the lowest common denominator is the baseline. But yes, there is much less fragmentation than what Intel did... [I wouldn't be surprised if the AVX-512 family had more extensions than the whole of ARMv8.]
> Biggest problem for mobile isn't performance but battery life. Having a separate low-power cluster will extend battery life, as it can be optimized better for low power, and the high-performing parts can be clock-gated when not needed. But AMD's Strix Point version of that idea is not ideal - why have 8 cores on the low-power cluster, making it use more power while also lowering the high-performance mode?

The design for Strix Point doesn't make a whole lot of sense unless it was intended for N3 with FinFlex. The difference between the Zen5 and Zen5c cores could have been greater, with the Zen5 cores able to clock higher and the Zen5c cores having lower leakage. That also gives a bit of support to their decision to raise the size of the iGPU to 16 CU while keeping memory bandwidth as limited as it is. With FinFlex, they could have targeted lower-leakage fin layouts for the iGPU transistors while not sacrificing performance.
> Software using NEON plus hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should drop it from their silicon. CPUs should execute existing software fast, not implement new extensions that no software uses now and probably won't use later, like what happened on x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.

Both AVX-512 and NEON bring things to the table, with the register width being the least interesting of them... It is only thanks to the fragmentation Intel forced that 128-bit AVX-512 lags behind 512-bit AVX-512... And ARM is doing the right thing by uniformly pushing SVE in place of NEON. They are trying to avoid the ecosystem fragmentation that is the real downside of AVX-512 and x64.
The power cost balance between having two completely different CCXs - one with 4 Zen5 cores and another with 4 Zen5c cores - instead of a single CCX with all 8 cores on it would still seem not to favor two separate CCXs. The Zen5c cores are dense-first, not low power. This isn't some low-power island situation. They are there to boost MT scores against a competitor that was running away with it on an inferior process node. Combining the cores on a single CCX seems to have worked well enough on Phoenix2. I don't see the advantage of increasing the uncore data transfer demands by having two CCXs.
> For the thin laptop segment, MT performance is irrelevant - battery life isn't. Intel is starting to get their solution right with Lunar Lake, and AMD needs to as well, or the situation might soon be that AMD solutions have only half the battery life of their rivals. Apple showed how to implement a proper laptop CPU, and rivals are starting to catch up.

MT performance in MOST laptops is PRACTICALLY irrelevant. Intel released laptop processors with 6 performance cores and 8 E-cores, and IIRC has some with 16 E-cores. Those are MT monsters that AMD has to match from a marketing perspective WITHOUT having to use their repurposed Dragon Range desktop processors. Strix Point, in its current form, is a marketing exercise first and foremost.
> @Jan Olšan great article (I think you are the author, right?). It goes in depth and is nicely explained.

Thanks.
Strix Point seems worse and worse. It's ridiculous that it can't even sustain max ST boost clocks on sub-30W devices - what does a full Zen5 core at 5.7 GHz consume, a full 100 W in ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.
Hey at least they aren't self-destructing.
> Which is why an 8+4 setup would have been perfect. The 8 Zen5 cluster is used for maximum performance (both ST and MT), while the 4 Zen5c cluster would be used as a low-power island for power efficiency. This is what ARM SoC vendors do: use the small cores for power efficiency, and use the large cores for performance scaling.
>
> By putting in 8 Zen5c cores, it feels like AMD is instead following Intel's playbook of "E-core spam". Which is ironic, considering AMD took a dig at Intel with the "economy cores" joke.

Neither Intel's nor AMD's small cores are designed to be low-power cores in the way that various ARM designers' little cores are. I think an 8+4 design would just take up a lot more space without much difference in either performance or efficiency.
> Software using NEON plus hardware made to execute NEON fast is the best possible combination. SVE doesn't bring anything to the table - ARM should drop it from their silicon. CPUs should execute existing software fast, not implement new extensions that no software uses now and probably won't use later, like what happened on x86. They need fast SSEx FPUs and possibly 128-bit AVX; any effort toward wider SIMD is just wasted silicon space on the consumer side.

SVE brings predicates and first-fault loads/stores to the table, which can be quite useful for autovectorization. Some of these features were available starting with AVX-512 and were also added to Intel's new AVX10.
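To make the predication point concrete, here is a rough Python sketch of what SVE-style predicates buy a vectorizing compiler: the last partial vector of a loop needs no scalar epilogue, because a per-lane mask simply disables the lanes past the end of the array. The function names mimic SVE concepts (WHILELT builds the predicate) but are purely illustrative, not real intrinsics:

```python
VL = 4  # pretend vector length in lanes (SVE leaves this implementation-defined)

def whilelt(i, n):
    """Build a predicate: lane k is active while i + k < n (like SVE's WHILELT)."""
    return [i + k < n for k in range(VL)]

def masked_add(pred, a, b, i):
    """Predicated vector add: inactive lanes contribute nothing."""
    return [a[i + k] + b[i + k] for k in range(VL) if pred[k]]

def vector_add(a, b):
    """Add two equal-length arrays of any length with no scalar tail loop."""
    out, i = [], 0
    while i < len(a):
        pred = whilelt(i, len(a))  # partial predicate on the final iteration
        out.extend(masked_add(pred, a, b, i))
        i += VL
    return out

print(vector_add([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))
# → [11, 22, 33, 44, 55, 66]
```

On the second iteration the predicate is [True, True, False, False], so the two out-of-bounds lanes are never touched - that is the behavior NEON cannot express and SVE can.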