I've heard exactly the same argument about AMD FSA/HSA years back and it never amounted to anything.
It's still just GPGPU compute when you ignore the fluff.
It can be optimized down to the individual tasks pretty easily, why would you even think this?
This is a dubious argument. SPEC cannot be made magically faster by an OS.
There are certainly some interesting discussions to be had about design choices in modern CPUs.
x86 is trending towards wider SIMD units. The ARM side has more 128-bit execution pipes and load/store ports. Apple and the Cortex-X925, with its six 128-bit NEON pipes, are anything but weak in SIMD execution. x86 targets high clocks, where many SIMD units and load/store ports are hard to make work together, but Intel E-cores take that same ARM approach (more 128-bit execution ability), and the performance seems to be there for x86 too.
It can be optimized down to the individual tasks pretty easily, why would you even think this?
Even SPEC itself says to test with multiple OS and compilers.
Intel probably really didn't believe in the project as they claim they did. Because the money flow never existed with Itanium to really matter, and a goal of business is to make money.
To the extent it failed at all (which it didn't, really, but it obviously isn't with us anymore) - there are reasons way beyond that; every gen late, most of them missing clock targets, IBM having damn near miraculous execution in the early 2000s, general decline of the RISC/UNIX market after K8, perception by potential Itanium OEMs that HP was structurally privileged in IPF, etc, etc, etc, etc
Intel probably really didn't believe in the project as they claim they did. Because the money flow never existed with Itanium to really matter, and a goal of business is to make money.
There are many projects that continued despite the reluctance of the creator because they turned out to be massively financially successful, since even the creator of the project did not foresee what the market saw. Intel demonstrated time and time again that when money is at stake, they eventually come around.
And successful projects also require feedback, which you receive in bounds with x86, and almost nothing with Itanium.
Nah since he showed M2 failing miserably in those Linux benchmarks, MacOS must be cheating and "detecting" SPEC runs
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?
Nah since he showed M2 failing miserably in those Linux benchmarks, MacOS must be cheating and "detecting" SPEC runs
Their black optimization art is really just a SPEC detector built into their silicon!
Explain in depth why there is a difference between different OSes in SPEC then?
Nah, explain in depth. What "optimization" can an OS do to make SPEC faster?
This is the SpecCPU suite: https://www.spec.org/cpu2017/Docs/overview.html#benchmarks
Could you elaborate on why that is the case?
Well, they were co-creators, were they not? They were eventually saddled with it anyway. Even if that's the case, HP themselves pretty much gave it away, which kinda speaks for itself.
Intel wasn't the creator.
Apple units are symmetric so you get 4x128 FMA. That's not the case with Zen5 where you get 2x256/512 FMA. That's still twice more but only for code that can vectorize to 512-bit. I wouldn't bet who will win in the end for general FP code.
Try that again in FP/SIMD and M4 dies a horrible death 🤣
4x 128 bit vs 4x 512 bit isn't even a competition.
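For anyone who wants to sanity-check the peak numbers in this exchange, here is a minimal back-of-the-envelope sketch. It only counts FMA pipes and vector width as quoted above (4x128 for Apple, 2x512 for Zen 5) and assumes FP32 elements and one FMA issued per pipe per cycle. The function name and the 128-bit Zen 5 row (SSE-width code still limited to two FMA pipes) are illustrative assumptions, not measured figures.

```python
def peak_flops_per_cycle(fma_pipes: int, vector_bits: int, element_bits: int = 32) -> int:
    """Theoretical peak FLOPs per cycle for a set of FMA pipes.

    Assumes every pipe issues one full-width FMA each cycle, and that an
    FMA counts as 2 floating-point operations (multiply + add) per lane.
    """
    lanes = vector_bits // element_bits
    return fma_pipes * lanes * 2


# Configurations as described in the thread (not vendor-verified here).
apple_4x128 = peak_flops_per_cycle(fma_pipes=4, vector_bits=128)
zen5_2x512 = peak_flops_per_cycle(fma_pipes=2, vector_bits=512)

# Assumption: code compiled for 128-bit SSE uses only 128 bits of each
# of the two FMA pipes, so the wide datapath goes mostly unused.
zen5_2x128_code = peak_flops_per_cycle(fma_pipes=2, vector_bits=128)

print("Apple 4x128 FMA:        ", apple_4x128, "FP32 FLOPs/cycle")
print("Zen 5 2x512 FMA:        ", zen5_2x512, "FP32 FLOPs/cycle")
print("Zen 5, 128-bit code only:", zen5_2x128_code, "FP32 FLOPs/cycle")
```

Under these assumptions the 2x advantage only shows up for code that actually vectorizes to 512-bit; at 128-bit width the ordering flips. Clock speed, load/store bandwidth, and the extra FADD pipes are all ignored here, so this says nothing about real benchmark results.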
And if somebody is on a toy platform without 1st party compiler support for one of the three languages in SPEC, they can omit 1 int and 7 fp subtests to keep it easy and quick.
SPEC takes maybe two hours to set up, including installing and writing up a config.
Doubling FP units versus going to wider SIMD is an art of compromise between area, total available market, and power use.
Apple units are symmetric so you get 4x128 FMA. That's not the case with Zen5 where you get 2x256/512 FMA. That's still twice more but only for code that can vectorize to 512-bit. I wouldn't bet who will win in the end for general FP code.
Ok, but doesn't that strengthen the idea that Intel had no motivation, financially or otherwise, to make it a really good chip? When Intel first got into the market, the computer market was nonexistent, so it was a much easier decision which route to go, and it came to be x86. If they had had EPIC then, they could have done it.
HP developed essentially the whole thing as PA-WideWord before Intel joined the project in 1995ish and canceled their own RISC project. The Itanium 2 core came out of HP's design group in Fort Collins; the Poulson core that came out in 2012 (and proved to be the last Itanium) came from Intel Hudson.
That's easy. All Apple code uses the same 128-bit NEON instructions and has full FP SIMD power. For Zen5 the situation is different. 99% of the codebase uses 128-bit SSEx and 1% is 256-bit AVX2. Zero percent uses AVX512. For general code Apple's solution is so much better that it ain't funny. The x86 codebase is badly fragmented and ain't moving forward because Intel won't support the newest instructions on all models. In the next generations Intel will focus on bringing up the performance of those 128-bit units, which actually have software support. Extensions without proper top-to-bottom support across the installed CPU base are useless.
Apple units are symmetric so you get 4x128 FMA. That's not the case with Zen5 where you get 2x256/512 FMA. That's still twice more but only for code that can vectorize to 512-bit. I wouldn't bet who will win in the end for general FP code.
The longer you stay there, the more screwed you will get. Need to invest again in max 7 years or maybe earlier when RAM/SSD limitations start troubling you or when the SSD writes get exhausted. No incremental upgrades. There's no guarantee that your laptop will even work flawlessly that long unless you are "subscribed" to AppleCare. RAID 1? What's that, asks Apple? Cheaper workstation laptops give you that option. ECC? Again, what's that, Apple asks? You use what we give you, now shut up and go back to saving money for your next whole laptop upgrade, Apple tells you. Yes, I'm sure it's Paradise on your side of the world.
Not in the mood to search for one (there are a LOT more x86 laptop makers, niche ones).
Show me the x86 laptop that takes ECC RAM. And I mean a REAL laptop, not some 7 lb "desktop replacement" brick with multiple fans that are spinning so fast for anything beyond idle they sound like dentist drills!
This just shows the issue with Phoronix when doing cross architecture comparisons: the software they use will surely have much heavier hand tuning on x86 than on Arm.
Apple can optimize their whole stack from the OS level down, including their browser. And for some benchmarks, they can optimize the entire benchmark too. There just won't be any other CPU that beats them at performance/watt simply due to this. Nobody really knows how much black magic optimization is happening for Apple devices running Apple silicon. Here is Apple silicon on Linux:
View attachment 104180
M2 gets absolutely trashed by Zen 4. M4 vs Strix would probably be similar. Of course optimization in Linux for Apple silicon isn't perfect, so there are gains to be had under Linux, but my point is the hardware isn't the whole thing. The software matters, probably way more than people give it credit for.
ComputerBase's test shows that the Apple core doesn't manage to hold its max frequency for more than 10 s, after which it throttles to half the power, so the ST perf is actually 25-30% lower than what is displayed. So much for the 9950X delivering less perf.
Edit: That's why Geekbench pauses between each test. That's a blatant help for Apple, as their core wouldn't yield good numbers if the tests were run without delays.
Oh yay! People with SPEC fetish have M4 to play with and grin about! The world is finally a bed of roses now.
Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint.
Well good news, SPEC doesn't pause and even passively cooled M4 beats everyone there in 1T SPECint. What's your excuse going to be then? Are you going to say that integer benchmarks are useless, and we should only look at FP? I'm sure if you look hard enough you can find a benchmark that tells you what you want to hear, and can come up with an explanation why that's the One True Benchmark and all the rest have been paid off by Apple to favor them.
AT isn't a hobby???
Get a hobby.
So I am not the only one who's baffled. It seems Igor has turned into a different person starting this month!
At this point, I genuinely cannot tell if you are this clueless or if you're doing some kind of bizarre performance art.