Question Incredible Apple M4 benchmarks...


Nothingness

Diamond Member
Jul 3, 2013
Excellent point. So if there is evidence that GB is favoring or has favored certain vendors in certain tests, is it a relevant benchmark, and should people be excited over the score of a version bump released not too long ago?
That was irony. I think it's fair to use all the features a CPU has; one just has to be aware of what impacts the score of a test (and yes, I guess the original GB document would need to be updated to clarify which tests use Apple AMX/SME).
 

SarahKerrigan

Senior member
Oct 12, 2014
This thread is rapidly getting a bit silly, so I'll just make a couple of comments and then I'm probably going to step away from it a little.

  • Cinebench is a potato. It may be useful for... something... and I know Chipsandcheese uses it periodically, but I've never seen it act as a reliable proxy for any larger set of workloads that I'd consider particularly meaningful.
  • Geekbench is deeply flawed but at least does attempt to be a proxy for other things, and is better than most other free options. Unfortunately, being a benchmark that is publicly available only in binary form creates problems like we're seeing now, where there's ambiguity over whether their particular configuration favors one vendor or another. (It also doesn't break out int and FP as distinct scores for some reason.)
  • I like SPEC. I run SPEC or its subtests when I need a quick and dirty performance number that I find meaningful. SPEC is nowhere near a turnkey "fire and forget" bench like GB, though, and I can't imagine getting it running on Android or iPhone OS is particularly fun. (Even getting spectools running on new ISAs is surprisingly annoying at times.) More to the point, it's also expensive; I have a license because I use it for work but it is never going to be broadly popular among folks who just want to get an idea of how fast their iPad is.
  • And, lastly, a spicy take: anything that can get twice as fast in a single gen based on relatively niche ISA changes simply doesn't belong in a set of general-purpose proxy benchmarks. The function of something like SPEC is to be able to predict random "normal" code's performance. I haven't seen it ever gain a 100%+ intergenerational win on a single subtest without compiler shenanigans, because in general, that's not how real applications will behave - especially because SPEC (correctly) does not include platform-specific backends. I realize not everyone will agree; I invite them to run MLperf or something if this kind of code stream is important to them.
    • This does not mean that I hate Apple or am biased toward x86 or whatever. As a general rule, I'd apply the same standard there. I feel about matrix accelerators boosting Object Detection basically the same way as I'd feel about using tuned assembly backends or fixed-function codec blocks in SPEC's x264_s - doubtless those are useful things with a real impact on user experience, but that is not really what a general-purpose benchmark is for.
 

Nothingness

Diamond Member
Jul 3, 2013
Thanks. Apparently, people have different interpretations of the results.
The main issue with these results is power consumption. No matter how these were measured, comparing an iPad and an iPhone is meaningless (and do we even know how the poster measures power?).
That said, as @SarahKerrigan wrote, it looks like this shows a measurable IPC improvement.
 

Doug S

Platinum Member
Feb 8, 2020
YES! That's not really an expense for a company worth trillions. They could get their interns to do that, WITHOUT PAY, as a project to prove their worth for a permanent position.


Yes, I'm sure companies love to have unpaid interns write software that they will then turn around and support for years. Apple sells Macs as an integrated product, hardware and software. Maybe if Qualcomm is able to prove there really is a market for ARM Windows, Apple will decide it is worth doing what you want. But they aren't going to half-ass it with drivers written by interns so that Windows boots but suffers crashes and poor performance. That would reflect poorly on Apple; people would blame the hardware for the poor performance, etc. If they were going to do it, they'd have to commit to long-term support, and that's not something that's "not really an expense" or something they can get unpaid interns to do.


And Apple takes no blame? Why do you think Microsoft went with Qualcomm? Coz Apple won't give them ARM chips for Surface devices, and especially not at the prices Qualcomm is providing. Do you forget that it was an Apple engineer who, in his spare time, got macOS working on x86 hardware? Steve Jobs saw it, flew to Japan to meet with Sony's top dog, and pitched the idea of macOS running on Sony VAIO laptops. It shouldn't be a big task for a few of their engineers to get Windows on ARM running on an Mx device and then just announce that MacBooks are versatile enough to run Windows and literally every useful application in existence. Wouldn't that be a big selling point?


Not saying Apple takes no blame. But why are you complaining that Apple won't sell Microsoft chips to make a Surface? Do you really believe they should be obligated to sell their chips on the open market? But the biggest roadblock to running Windows on an Apple Silicon Mac up until today has been Microsoft and Qualcomm, not Apple:

https://www.tomshardware.com/pc-com...ty-agreement-with-microsoft-expires-this-year


If Apple isn't promoting the fact that their bootloader is unlocked, there's no guarantee that it will stay unlocked. Maybe they are curious to see how far hackers get with running a functional Linux on their hardware. If they get too close for comfort, there is nothing stopping Apple from locking everything down, coz we all know how much Apple fears competition and does literally everything under the sun to prevent anyone from getting in on their side of the fence. An unlocked bootloader may also be a talent-scouting tactic on Apple's part: search GitHub for projects that hack away at Apple hardware, then scoop those people up, coz if they are doing that much for free, imagine what they could do with proper guidance and creature comforts.


So now you're just assuming Apple is evil, and has unlocked their bootloader just to fool people into wasting time porting Linux only to pull the rug out from under them at the last minute? If you assume Apple might do that, why in the world would you want the ability to run Windows on a Mac? Because an evil Apple could pull the rug out from under you as far as running Windows on a Mac.
 

poke01

Platinum Member
Mar 8, 2022
If Apple isn't promoting the fact that their bootloader is unlocked, there's no guarantee that it will stay unlocked
They promoted the unlocked bootloader in their dev presentations.

The Mac also has an unlocked bootloader so it can run previous versions of macOS that are no longer supported by Apple, which in turn enables other operating systems as well.

An M1 Mac can be restored to its original OS that came out in 2020. You can’t do that with an iPhone, as you can only run signed iOS versions.
 

ikjadoon

Senior member
Sep 4, 2006
Looking at this another way, how helpful is SME in a consumer benchmark? Helpful as in, "Is CPU Y faster in these workloads than CPU X?".
  1. Where does Geekbench use SME?
  2. Do users commonly use those workloads?
  3. How often do those workloads activate the CPU?
  4. How much "better" is a CPU that can run these workloads 2x or 10x or 100x faster?
Question 1: Where does Geekbench use SME?

Geekbench 6.3 (finally current) uses Arm's SME on these subtests: Photo Library, Object Detection, and Background Blur. An important corollary is whether Geekbench's usage of SME is representative of how consumer applications and OSes use SME: that I do not know.
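
For anyone who wants to poke at this themselves, here's a minimal sketch in Swift for checking whether a machine reports SME to software, assuming macOS exposes it through a "hw.optional.arm.FEAT_SME" sysctl in line with Apple's existing hw.optional.arm.FEAT_* feature flags. The key names are my assumption, not something Geekbench documents:

```swift
import Foundation

// Minimal sketch: query an ISA feature flag via sysctl on macOS.
// The key names below follow Apple's hw.optional.arm.FEAT_* convention;
// treat them as an assumption rather than a documented guarantee.
func hasFeature(_ name: String) -> Bool {
    var value: Int32 = 0
    var size = MemoryLayout<Int32>.size
    // sysctlbyname returns 0 on success; a missing key returns -1.
    guard sysctlbyname(name, &value, &size, nil, 0) == 0 else { return false }
    return value == 1
}

print("SME: \(hasFeature("hw.optional.arm.FEAT_SME"))")
print("SME2: \(hasFeature("hw.optional.arm.FEAT_SME2"))")
```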

Question 2: Do users commonly use those workloads?

On-device object detection, photo library classification, and background blur do get used frequently, especially in mobile. These are not rare workloads:
  • Photo capture: face recognition, scene recognition, object tracking
  • Security: facial recognition
  • Videoconferencing: background blur, object detection
  • Photo library classification
  • Apple's "remove background" feature
Question 3: How often do those workloads activate the CPU (vs GPU or NPU)?

I don't know, and this is the crucial piece, IMO. For reference, on both Android and iOS, even though we've had NPUs for generations, the CPU still remains part of the puzzle. But everyone is a little vague about how much they rely on it, and on the face of it, we'd expect much of this work to have shifted to NPUs already. The little info I've found:

Qualcomm's take (for short bursts of small models and, seemingly, lower latency, use the CPU):

As previously mentioned, most generative AI use cases can be categorized into on-demand, sustained, or pervasive. For on-demand applications, latency is the KPI since users do not want to wait. When these applications use small models, the CPU is usually the right choice. When models get bigger (e.g., billions of parameters), the GPU and NPU tend to be more appropriate.
A personal assistant that offers a natural voice user interface (UI) to improve productivity and enhance user experiences is expected to be a popular generative AI application. The speech recognition, LLM, and speech models must all run with some concurrency, so it is desirable to split the models between the NPU, GPU, CPU, and the sensor processor. For PCs, agents are expected to run pervasively (always-on), so as much of it as possible should run on the NPU for performance and power efficiency.

Apple's take (for background apps or GPU-intensive tasks, use the CPU; why not CPU+NPU, aka the ANE? I don't know):

Use MLComputeUnits.cpuOnly to restrict the model to the CPU, if your app might run in the background or runs other GPU intensive tasks.

Notably, Apple's Core ML requires that the CPU be allowed to run the workload; it cannot be excluded, whereas developers can selectively exclude the GPU and NPU.
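
To make that concrete, here's roughly what this looks like from the developer side; a minimal Swift sketch using Core ML's MLModelConfiguration (the "ObjectDetector" model name is hypothetical):

```swift
import CoreML

// Restrict inference to the CPU, per Apple's guidance for apps that may
// run in the background or are already saturating the GPU.
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly   // .all / .cpuAndGPU / .cpuAndNeuralEngine also exist

// Hypothetical compiled Core ML model bundled with the app.
guard let url = Bundle.main.url(forResource: "ObjectDetector", withExtension: "mlmodelc") else {
    fatalError("model not found in bundle")
}
let model = try MLModel(contentsOf: url, configuration: config)
print("loaded:", model.modelDescription)
```

Note that the enum offers no GPU-only or ANE-only choice, which is exactly the asymmetry described above.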



I'd really like to see Google's take with Android (as Android will be the majority ML workload by volume of users and workloads). Unfortunately, I haven't found much. For now, this older 2019 paper (which uses even older CPUs) is only slightly helpful:

Even though most mobile inference workloads run on CPUs, optimization of ML workloads with accelerators hoards most of the attention. There is a lot of room for optimization on mobile CPUs to enable ML applications across different mobile platforms. [based on this even older 2018 data]
CPUs provide both the worst energy efficiency and the worst throughput among all components. Still, they are critical for inferencing because they are commonly present across all mobile devices. Low-end mobile SoCs may lack accelerators like NPUs. They may contain a low-end GPU, but it may be missing OpenCL support and thereby lack any inferencing capability. Network inference on CPU is inevitable and demands optimization considerations.

Question 4: How much "better" is a CPU that can run these workloads 2x or 10x or 100x faster?

Another crucial question, and I don't know; it depends on the answer to #3. If it's only used once every 100 workloads (let's call one workload = one action in an app), being 100x faster verges on irrelevance. And with Geekbench seemingly weighting each subtest equally, would we make the same call? Is Photo Library really the same weight as HTML5? I don't think so.
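
As a rough feel for the weighting question: if the composite score is a geometric mean of N equally weighted subtests (a simplification of GB6's actual scoring on my part), a gain on one subtest lifts the composite by gain^(1/N). A quick Swift sketch, assuming around a dozen subtests:

```swift
import Foundation

// Composite uplift when one of n equally weighted subtests gets `gain`
// times faster and subtests combine via geometric mean: gain^(1/n).
// Both the equal weighting and the subtest count are simplifying
// assumptions about GB6's scoring, not its documented formula.
func compositeUplift(gain: Double, subtests n: Int) -> Double {
    pow(gain, 1.0 / Double(n))
}

for gain in [2.0, 10.0, 100.0] {
    let pct = (compositeUplift(gain: gain, subtests: 12) - 1.0) * 100.0
    print(String(format: "%6.0fx on one subtest -> +%4.1f%% composite", gain, pct))
}
// Roughly: 2x -> +6%, 10x -> +21%, 100x -> +47%.
```

So even a 100x speedup on one subtest moves an equally weighted geometric-mean composite by well under 50%, which is why the weighting choice matters so much.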
 

Eug

Lifer
Mar 11, 2000
Is Photo Library really the same weight as HTML5? I don't think so.
I don't know how much Apple's Photos application leverages this, but Photos' services take a fair amount of CPU cycles on Apple's consumer (and Pro) machines, doing face recognition in the background and stuff like that, whether or not the actual application is running.

My Photos library is about 60,000 images and videos, and is over 600 GB in size.

On my slow Intel MacBook, sometimes Photos can take forever to complete its background tasks, esp. after an OS version upgrade. It's faster on the Apple Silicon Macs, but it's not exactly always snappy. If M4 significantly sped this up, that would be quite welcome.

Note that my Photos library is identical on my Intel MacBook, M1 Mac mini, iPhone, and iPad, because it's all sync'd through iCloud. Only the M1 Mac mini (along with iCloud) has all the original files, but all the other devices have thumbnails/reduced size images with automatic cloud access to the originals. Plus my iPhone has a ton of the originals on it locally, since that's what I use to take most of the photos and videos in the first place. It seems a lot of the processing is local however, so it would make sense for it to be as fast as possible on both iDevices and Macs.

P.S. Photos seems to have a bad memory leak when exporting original files. I tried on my 24 GB Intel iMac to export ~50,000 original files with no processing, and it freaked Photos right out. I could only accomplish this task by doing about 3,000 files at a time. Photos needs to be overhauled.

 

SarahKerrigan

Senior member
Oct 12, 2014
From Zen3 to Zen4, Object Detection score increased by 2.5x in GB6. That's more than the 2x of M3 to M4.

Where was the outrage?


Nobody is outraged. Nobody is saying Apple somehow cheated. People are trying to figure out what kind of general-purpose improvements can be expected from this microarchitecture. Early SPEC results have made that far clearer.

Nobody is persecuting you or Apple, and your (repetitive) reaction to people examining the benchmarks is misplaced.
 

poke01

Platinum Member
Mar 8, 2022
From Zen3 to Zen4, Object Detection score increased by 2.5x in GB6. That's more than the 2x of M3 to M4.

Where was the outrage?


I've learnt now that extensions do accelerate part of a workload when supported, but it's not the CPU core itself that's doing the work.

SME can be implemented by other ARM vendors too; it's not specific to the M4 (well, it won't be in a few years).

The best real-world tests are Blender and code compiling, which the M3 already excelled at.
The M4 family should be even faster.
 

branch_suggestion

Senior member
Aug 4, 2023
What really matters is the net ST uplift across a full spectrum of workloads.
As it currently stands, it seems to be in the mid-teens. Solid, but nothing crazy like some are claiming. N3E is the single largest contributor to getting the extra clocks without killing power; uArch gains are small and have been small since Firestorm.
 

Eug

Lifer
Mar 11, 2000
Besides Photos, for example, where else could this be used? Final Cut Pro?

They made a big deal of the subject isolation feature in the keynote for example. Does it apply here or no?

 

poke01

Platinum Member
Mar 8, 2022
N3E is the single largest contributor to getting the extra clocks without killing power; uArch gains are small and have been small since Firestorm.
Yeah, it looks like Apple is using these node advancements while they get their CPU team in order.
It also helps that Firestorm is still relevant today.
 

mikegg

Golden Member
Jan 30, 2010
Nobody is outraged. Nobody is saying Apple somehow cheated. People are trying to figure out what kind of general-purpose improvements can be expected from this microarchitecture. Early SPEC results have made that far clearer.

Nobody is persecuting you or Apple, and your (repetitive) reaction to people examining the benchmarks is misplaced.
Calm down.

There are clearly biases here. The point is that Intel and AMD scores have also benefited from major increases in Object Detection performance in the past. It's clearly the kind of acceleration that chip designers are focusing on due to a drastic increase in demand from applications.
 


Eug

Lifer
Mar 11, 2000
This belongs here as well. Lol
Post in thread 'Apple Silicon SoC thread'
https://forums.anandtech.com/threads/apple-silicon-soc-thread.2587205/post-41210527


I guess it’s not easy to sustain 4.4 GHz in an iPad that’s 5.1 mm thin. Bodes well for the 16” MacBooks and the desktop Macs that have fans to hit that clock speed.
Besides breaking the 4000 score barrier in single-core, it’s also the fastest M4 9-core multi-core score to date.

Hmm... not sure it's a good idea to tell a lady to calm down.
Off topic but I wondered if the SK name referred to the game character.
 
Jul 27, 2020

GB 6.0 vs. 6.3 for Ryzen 7950X

[chart: single-threaded subtest scores]

[chart: multi-threaded subtest scores]

Object Detection suffered a minor regression (and some other MT tests suffered greater ones), whereas Background Blur sees a massive optimization that even the regressive changes of GB 6.3 aren't able to diminish.

The ASUS system could have faster RAM, but if so, that doesn't explain how the MSI system with slower RAM is outperforming it in one particular test.

Wish Geekbench would capture more system info parameters like RAM speed and latency. The Geekbench Browser also needs an advanced search with better filtering options.
 