Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 698

naukkis

Senior member
Jun 5, 2002
878
757
136
I've heard exactly the same argument about AMD FSA/HSA years back and it never amounted to anything.

It's still just GPGPU compute when you ignore the fluff.

The x86 ecosystem won't have a working solution. Apple does: every Apple CPU has a GPU attached and a working driver model to use it. AVX512, by contrast, however well it's implemented in Zen 5, brings absolutely zero performance increase to desktop/mobile use cases, because it isn't used for anything, as most of the installed CPU base doesn't support it.
 

FlameTail

Diamond Member
Dec 15, 2021
3,772
2,226
106
x86 is moving towards wider SIMD units. The ARM side has more 128-bit execution pipes and load/store ports. Apple and Cortex-X925, with its six 128-bit NEON pipes, are anything but weak in SIMD execution. x86 targets high clocks, and many SIMD units and load/store ports are hard to make work together, but Intel E-cores take that same ARM approach, with more 128-bit execution ability, and the performance seems to be there for x86 too.
There's certainly some interesting discussions to be had about design choices in modern CPUs.

Wider SIMD units vs More SIMD units

Monolithic Decoder vs Clustered Decoders

uOP caches vs No uOP cache

L1/L2 cache vs L0/L1/L2/L3 cache
 

DavidC1

Senior member
Dec 29, 2023
782
1,241
96
To the extent it failed at all (which it didn't, really, but it obviously isn't with us anymore) - there are reasons way beyond that; every gen late, most of them missing clock targets, IBM having damn near miraculous execution in the early 2000s, general decline of the RISC/UNIX market after K8, perception by potential Itanium OEMs that HP was structurally privileged in IPF, etc, etc, etc, etc
Intel probably didn't really believe in the project as much as they claim they did. Because the money flow never existed with Itanium to really matter, and a goal of business is to make money.

There are many projects that continued despite the reluctance of the creator because they turned out to be massively financially successful, since even the creator of the project did not foresee what the market saw. Intel has demonstrated time and time again that when the money is at stake, they eventually come around.

And successful projects also require feedback, which you receive in bounds with x86, and almost nothing with Itanium.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Intel probably didn't really believe in the project as much as they claim they did. Because the money flow never existed with Itanium to really matter, and a goal of business is to make money.

There are many projects that continued despite the reluctance of the creator because they turned out to be massively financially successful, since even the creator of the project did not foresee what the market saw. Intel has demonstrated time and time again that when the money is at stake, they eventually come around.

And successful projects also require feedback, which you receive in bounds with x86, and almost nothing with Itanium.

Intel wasn't the creator.
 
Reactions: Nothingness
Jul 27, 2020
19,613
13,481
146
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?
Nah since he showed M2 failing miserably in those Linux benchmarks, MacOS must be cheating and "detecting" SPEC runs

Their black optimization art is really just a SPEC detector built into their silicon!

And should Apple wish to defend their honor, they better fix whatever is causing their SoCs to tank under Linux!
 

desrever

Member
Nov 6, 2021
170
451
106
Nah since he showed M2 failing miserably in those Linux benchmarks, MacOS must be cheating and "detecting" SPEC runs

Their black optimization art is really just a SPEC detector built into their silicon!
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?
Explain in depth why there is a difference between different OSes in SPEC, then?
 
Reactions: dr1337

DavidC1

Senior member
Dec 29, 2023
782
1,241
96
Could you elaborate on why that is the case?
This is the SpecCPU suite: https://www.spec.org/cpu2017/Docs/overview.html#benchmarks

Integer: Compiler, Artificial Intelligence, Route Planning, Compression

All require "smarts" to advance. And they are latency sensitive. An easy example is games: when you are playing, the CPU needs to respond to you. Does bandwidth matter? Only to a point. Does latency? Always. Branch prediction accuracy? Absolutely. More units? Yes. More cache? Yes.

FP: A lot of them are 3D modelers, where the work is highly parallelizable. It likes latency, branch prediction, and more units too, but not as much. The first widely used consumer FP use was for 3D acceleration in games (before Voodoo 3D cards).

The thing is, improving Integer performance gives gains in all areas. AI, Cryptography, web browsing, encoding, games, everything.

Despite the simplicity of the explanation, the fact is that before the Intel 80486 and its variants, FP units were discrete coprocessor chips: an accelerator. Why did Skymont get an extra 30% in FP over integer simply by doubling the number of vector units?
Intel wasn't the creator.
Well, they were co-creators, were they not? They were eventually saddled with it anyway. Even if that's the case, HP themselves pretty much gave it away, which kinda speaks for itself.
 
Reactions: FlameTail

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,973
136
Try that again in FP/SIMD and M4 dies a horrible death 🤣

4x 128 bit vs 4x 512 bit isn't even a competition.
Apple units are symmetric, so you get 4x128 FMA. That's not the case with Zen 5, where you get 2x256/512 FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.
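
As a rough back-of-the-envelope sketch of that comparison: the pipe counts and widths below are just the figures quoted in this post, and the model is an assumption of mine that ignores clocks, any separate FADD pipes, and memory bandwidth. It only counts peak FP32 FMA FLOPs per cycle for code vectorized at a given width.

```python
def peak_flops_per_cycle(fma_pipes, pipe_width_bits, vector_width_bits, element_bits=32):
    """Peak FLOPs/cycle when code is vectorized at vector_width_bits.

    Simplifying assumption: one vector instruction occupies one FMA pipe per
    cycle, so narrow code leaves the upper part of a wide pipe idle.
    """
    usable_bits = min(pipe_width_bits, vector_width_bits)
    lanes = usable_bits // element_bits
    return fma_pipes * lanes * 2  # one FMA counts as 2 FLOPs (multiply + add)


# Figures taken from the post: 4x128-bit FMA vs 2x512-bit FMA.
apple_128 = peak_flops_per_cycle(fma_pipes=4, pipe_width_bits=128, vector_width_bits=128)
zen5_512 = peak_flops_per_cycle(fma_pipes=2, pipe_width_bits=512, vector_width_bits=512)
zen5_256 = peak_flops_per_cycle(fma_pipes=2, pipe_width_bits=512, vector_width_bits=256)
zen5_128 = peak_flops_per_cycle(fma_pipes=2, pipe_width_bits=512, vector_width_bits=128)

print(apple_128, zen5_512, zen5_256, zen5_128)  # 32 64 32 16
```

Under these assumptions the 2x512 design only pulls ahead once the code actually uses 512-bit vectors; at 256-bit the two are level, and at 128-bit the four narrow pipes come out ahead.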
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Well, they were co-creators, were they not? They were eventually saddled with it anyway. Even if that's the case, HP themselves pretty much gave it away, which kinda speaks for itself.

HP developed essentially the whole thing as PA-WideWord before Intel joined the project in 1995ish and canceled their own RISC project. The Itanium2 core came out of HP's design group in Fort Collins; the Poulson core that came out in 2012 (and proved to be the last Itanium) came from Intel Hudson.

HP didn't "give it away." There was a calculated move to transfer ownership of the Itanium product line to Intel, even though design work was continuing to be done in Fort Collins, to make it a "neutral" merchant platform where OEMs would not be competing with their CPU vendor.
 
Reactions: igor_kavinski

DavidC1

Senior member
Dec 29, 2023
782
1,241
96
Apple units are symmetric, so you get 4x128 FMA. That's not the case with Zen 5, where you get 2x256/512 FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.
Doubling FP units versus going to wider SIMD is an art of compromise between area, total available market, and power use.
HP developed essentially the whole thing as PA-WideWord before Intel joined the project in 1995ish and canceled their own RISC project. The Itanium2 core came out of HP's design group in Fort Collins; the Poulson core that came out in 2012 (and proved to be the last Itanium) came from Intel Hudson.
Ok, but doesn't that strengthen the idea that Intel had no motivation, financial or otherwise, to make it a real good chip? When Intel first got in the market, the computer market was nonexistent, so it was a much easier decision to make which route to go, and it came to be x86. If they had EPIC then, they could have done it.

Same with the iAPX or whatever it was called that Intel tried to push.

Only in benchmarks we can do Apple vs. Apple comparisons. In the real world we can't always normalize them.
 

naukkis

Senior member
Jun 5, 2002
878
757
136
Apple units are symmetric, so you get 4x128 FMA. That's not the case with Zen 5, where you get 2x256/512 FMA. That's still twice as much, but only for code that can vectorize to 512-bit. I wouldn't bet on who will win in the end for general FP code.
That's easy. All Apple code uses the same 128-bit NEON instructions and has full FP SIMD power. For Zen 5 the situation is different. 99% of the codebase uses 128-bit SSEx and 1% is 256-bit AVX2. Zero percent uses AVX512. For general code, Apple's solution is so much better that it ain't funny. The x86 codebase is badly fragmented and ain't moving forward, because Intel doesn't want to support the newest instructions on all models. Intel will focus in the next generations on bringing up performance of those 128-bit units, which actually have software support. Extensions without proper top-to-bottom installed CPU support are useless.
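
To make the fragmentation point concrete, here is a minimal sketch of the runtime dispatch an x86 application needs if it wants anything beyond the baseline. The flag strings match the names Linux reports in /proc/cpuinfo; the kernel names are hypothetical placeholders, and this is not how any particular application does it.

```python
def pick_kernel(cpu_flags):
    """Return the widest SIMD code path the CPU supports.

    cpu_flags: set of feature flag strings (e.g. parsed from /proc/cpuinfo).
    The kernel names are hypothetical; each one stands for a separately
    written, separately tested code path.
    """
    preference = [
        ("avx512f", "kernel_avx512"),  # 512-bit path, only on some CPUs
        ("avx2", "kernel_avx2"),       # 256-bit path
        ("sse2", "kernel_sse2"),       # 128-bit baseline for x86-64
    ]
    for flag, kernel in preference:
        if flag in cpu_flags:
            return kernel
    return "kernel_scalar"


print(pick_kernel({"sse2", "avx2", "avx512f"}))  # kernel_avx512
print(pick_kernel({"sse2", "avx2"}))             # kernel_avx2
print(pick_kernel({"sse2"}))                     # kernel_sse2
```

Every entry in that list is extra code to write and validate, so software that skips the dispatch ends up on the 128-bit baseline path. On a platform where every chip supports the same 128-bit instructions there is nothing to dispatch on.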
 
Reactions: Nothingness

Doug S

Platinum Member
Feb 8, 2020
2,711
4,603
136
The longer you stay there, the more screwed you will get. Need to invest again in max 7 years or maybe earlier when RAM/SSD limitations start troubling you or when the SSD writes get exhausted. No incremental upgrades. There's no guarantee that your laptop will even work flawlessly that long unless you are "subscribed" to AppleCare. RAID 1? What's that, asks Apple? Cheaper workstation laptops give you that option. ECC? Again, what's that, Apple asks? You use what we give you, now shut up and go back to saving money for your next whole laptop upgrade, Apple tells you. Yes, I'm sure it's Paradise on your side of the world.

Show me the x86 laptop that takes ECC RAM. And I mean a REAL laptop, not some 7 lb "desktop replacement" brick with multiple fans that are spinning so fast for anything beyond idle they sound like dentist drills!
 
Jul 27, 2020
19,613
13,481
146
Show me the x86 laptop that takes ECC RAM. And I mean a REAL laptop, not some 7 lb "desktop replacement" brick with multiple fans that are spinning so fast for anything beyond idle they sound like dentist drills!
Not in the mood to search for one (there are a LOT more x86 laptop makers, niche ones).

Pity you don't have an Apple 7 pounder that doesn't make drill sounds. I feel sorry for you.
 

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,973
136
Apple can optimize their whole stack from the OS level down, including their browser. And for some benchmarks, they can optimize the entire benchmark too. There just won't be any other CPU that beats them at performance/watt, simply due to this. Nobody really knows how much black magic optimization is happening for Apple devices running Apple silicon. Here is Apple silicon on Linux:

View attachment 104180
M2 gets absolutely trashed by Zen 4. M4 vs Strix would probably be similar. Of course, optimization in Linux for Apple silicon isn't perfect, so there are gains to be had there, but my point is the hardware isn't the whole thing. The software matters, probably way more than people give it credit for.
This just shows the issue with Phoronix when doing cross architecture comparisons: the software they use will surely have much heavier hand tuning on x86 than on Arm.

Also Linux on Apple wasn't likely properly tuned back then (I'm not sure it is now).

Add to that Phoronix mixes in a single score ST and MT workloads and you have the perfect tool to provide useless comparisons.

In the meantime, everyone properly compiling and running an OS/ISA-agnostic benchmark such as SPEC shows the same thing: Apple is at the top of the ST charts.
 

Doug S

Platinum Member
Feb 8, 2020
2,711
4,603
136
Computerbase's test shows that the Apple core doesn't manage to hold its max frequency for more than 10s, after which it throttles at half the power, so the ST perf is actually 25-30% lower than what is displayed. So much for the 9950X delivering less perf.

Edit: That's why Geekbench pauses between each test. That's a blatant help for Apple, as their core wouldn't yield good numbers if the tests were run without delays.

Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint. What's your excuse going to be then? Are you going to say that integer benchmarks are useless, and we should only look at FP? I'm sure if you look hard enough you can find a benchmark that tells you what you want to hear, and can come up with an explanation why that's the One True Benchmark and all the rest have been paid off by Apple to favor them.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint. What's your excuse going to be then? Are you going to say that integer benchmarks are useless, and we should only look at FP? I'm sure if you look hard enough you can find a benchmark that tells you what you want to hear, and can come up with an explanation why that's the One True Benchmark and all the rest have been paid off by Apple to favor them.

Yeah, I have no use for Apple products myself but the constant drumbeat of excuses that have been made for the last ten years about how Apple cores aren't really as fast as they appear is starting to get irritating, especially since the same dumb arguments are increasingly being applied to Cortex.
 