Question Incredible Apple M4 benchmarks...


SpudLobby

Senior member
May 18, 2022
991
684
106
Pretty significant increase in core power there between M3 and M4
Yeah, I'm assuming he's using those internal APIs or something, and the 11 W figure is the full number someone else measured externally. M3, for example, certainly consumes more than 4.76 W in ST. Regardless, yeah, they blew up core power, though at least it's still way better than competitors.
 
Reactions: exquisitechar

poke01

Platinum Member
Mar 8, 2022
2,584
3,410
106
Last one: looking at thermals versus the previous iPad Pro, M4 uses less power while offering more FPS.

 

moinmoin

Diamond Member
Jun 1, 2017
5,145
8,226
136
Why would they actually redesign the core though if the basics still work?
In short: It's the very definition of resting on one's laurels.

It's exploiting existing capabilities but doesn't prepare for future bigger improvements. Making the most of what's already there makes it seem like the product keeps up well in the competitive environment, but bigger jumps are essentially delayed up to the point where work on redesigns resumes (and those actually turn out to be successful, which is never a given). Intel did it with all the Skylake derivatives and lost its IPC lead over that. Apple seems to be on the way to doing the same right now unless there is a major redesign coming soon for one of the following gens.
 

FlameTail

Diamond Member
Dec 15, 2021
4,238
2,594
106
there is a major redesign coming soon for one of the following gens.
We thought that would happen with N3B/A17/M3. But it didn't.

Then we thought it would happen with N3E/A18/M4. But that also hasn't happened.

So when?
 

moinmoin

Diamond Member
Jun 1, 2017
5,145
8,226
136
We thought that would happen with N3B/A17/M3. But it didn't.

Then we thought it would happen with N3E/A18/M4. But that also hasn't happened.

So when?
In the worst case Apple is actively looking for competent staff but so far has "only" found ones capable of making the most of existing core designs. Redesigning cores to not only perform at the same level as the previous ones but significantly above that, with added room for further improvements, is kind of hard after you let all the staff previously involved in such evolution leave.
 
Reactions: Orfosaurio

poke01

Platinum Member
Mar 8, 2022
2,584
3,410
106
We thought that would happen with N3B/A17/M3. But it didn't.

Then we thought it would happen with N3E/A18/M4. But that also hasn't happened.

So when?
According to a mobile chip expert on Weibo, M3 was supposed to be the core redesign, but Apple wasn't happy with the performance uplift, i.e. the IPC improvement, so they scrapped it.

Who knows when the next one will be.
 

SpudLobby

Senior member
May 18, 2022
991
684
106
In short: It's the very definition of resting on one's laurels.

It's exploiting existing capabilities but doesn't prepare for future bigger improvements. Making the most of what's already there makes it seem like the product keeps up well in the competitive environment, but bigger jumps are essentially delayed up to the point where work on redesigns resumes (and those actually turn out to be successful, which is never a given). Intel did it with all the Skylake derivatives and lost its IPC lead over that. Apple seems to be on the way to doing the same right now unless there is a major redesign coming soon for one of the following gens.
You have no idea what I meant. My point is that Apple isn’t going to change their design philosophies which are targeted towards low power cores and running wide, and they’ve been doing this for *years* now, since inception.

I’m not advocating for doing what Intel did with Skylake (and yes, I know that story). Just pushing back against “wow, they only did wider decode, new branch prediction and reduced memory latency”. That’s fine; what they need is that on a scale that actually matches the changes they were previously making in terms of performance.

I don’t care how they get there, but I know what it’s probably going to look like, and they’re not moving fast enough. My point is they don’t need to be *sexy*.
 
Reactions: Orfosaurio

SpudLobby

Senior member
May 18, 2022
991
684
106
At any rate I do fundamentally agree Apple is Skylaking it and have said the same myself. Maybe less bad than Skylake, but still not far off.
 

moinmoin

Diamond Member
Jun 1, 2017
5,145
8,226
136
You have no idea what I meant. My point is that Apple isn’t going to change their design philosophies which are targeted towards low power cores and running wide, and they’ve been doing this for *years* now, since inception.
Maybe you should write what you meant then? I'm just reading what you wrote, and you wrote "Why would they actually redesign the core". Redesigning the core every couple of generations is a prerequisite to always increase the room available for improvements, essentially rebalancing bottlenecks. This has nothing to do with changing some design philosophies or some such.

Reading your further response it seems we actually agree, so I'll stop there.

Edit: And going by Geekerwan's microarchitecture analysis discussed in the other thread M4 may well constitute a redesigned core. Let's see.
 

SpudLobby

Senior member
May 18, 2022
991
684
106
Maybe you should write what you meant then? I'm just reading what you wrote, and you wrote "Why would they actually redesign the core". Redesigning the core every couple of generations is a prerequisite to always increase the room available for improvements, essentially rebalancing bottlenecks. This has nothing to do with changing some design philosophies or some such.

Reading your further response it seems we actually agree, so I'll stop there.

Edit: And going by Geekerwan's microarchitecture analysis discussed in the other thread M4 may well constitute a redesigned core. Let's see.
I mean, “redesigning the core” is not at all something clear in these corners. People literally believe in “clean sheet designs”, which is a very loose term these days.

But we do agree lol
 

moinmoin

Diamond Member
Jun 1, 2017
5,145
8,226
136
I mean, “redesigning the core” is not at all something clear in these corners.
That's really unfortunate as that's a rather basic step in evolutionary (not revolutionary, that'd be "clean sheet") development of silicon. To have technical discussions everybody should (try to) be on the same page. Recent discussions about odd and even Zen gens showed that some people don't seem to grasp the difference between designing schematic diagrams and allocating resources to units and connections on it.
 
Reactions: Nothingness

trivik12

Senior member
Jan 26, 2006
343
317
136
x86 is in trouble if Apple and probably Qualcomm chips are this far ahead on performance per watt and on single-threaded numbers as well. It's ridiculous how far ahead Apple is currently. Of course this chip is almost useless for iPadOS. I hope they release some Mac with it soon.
 

Panino Manino

Senior member
Jan 28, 2017
876
1,136
136
It doesn't look like Apple has the know-how anymore to redesign the core. Every generation after M1 has been using the transistor budget to just widen existing structures. Like Jim Keller said, every so often you have to re-write the core and rebalance the structures. Even if the underlying building blocks are the same, how you use them to construct the core matters a lot more. AMD did it with Zen 3 and now they will redo it with Zen 5. It doesn't appear Apple did it at all within the last 3 years.

But isn't having as wide a machine as possible the way, the only way some may say, to achieve this great performance at lower clocks and power? Would it be possible to achieve the same performance with a narrower core?
 

FlameTail

Diamond Member
Dec 15, 2021
4,238
2,594
106
Apple has the most efficient E-core in the world

It is already light years ahead of Arm's Cortex little cores in terms of performance-per-watt.

Intel/AMD low-power core designs have no hope of beating this.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
But isn't having as wide a machine as possible the way, the only way some may say, to achieve this great performance at lower clocks and power? Would it be possible to achieve the same performance with a narrower core?

Likely. There are lots of tricks that boost your general core perf without going wider.
  • Deeper structures - schedulers, ROB, PRF size
  • Larger or faster caches (like Z, which has usually not been on the cutting edge of width but has generally had very large L1/L2)
  • Non-committing runahead (like Power6) and its advanced cousin, speculative multithreading (like Rock)
  • Branch predictor improvements
  • Prefetch improvements
  • Reduced op latencies
  • For multithreaded performance - SMT, unusual shared cache configurations (like later SPARC gens), unusual shared core-resource configurations (like Bulldozer family or Freescale e6500)
Just a question of what tradeoffs you're willing to make for your target workloads.
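
A rough way to see why the first bullet (deeper structures) can matter as much as width is Little's law: sustained throughput cannot exceed the number of instructions in flight divided by how long each one stays in flight. The Python sketch below is a first-order illustration only. The function name `sustained_ipc` and all numbers are made up for the example; they are not measurements of any real core.

```python
def sustained_ipc(width, window_entries, avg_latency_cycles):
    """First-order IPC ceiling for one core (toy model, not a simulator).

    width: max instructions issued per cycle.
    window_entries: instructions the core can keep in flight (ROB-like depth).
    avg_latency_cycles: average cycles an instruction occupies the window.
    """
    # Little's law: throughput = items in flight / time each item stays in flight.
    latency_bound = window_entries / avg_latency_cycles
    # The core can never exceed its issue width either.
    return min(width, latency_bound)


# Hypothetical cores in a memory-latency-heavy phase (made-up numbers).
wide_shallow = sustained_ipc(width=10, window_entries=200, avg_latency_cycles=40)
narrow_deep = sustained_ipc(width=6, window_entries=400, avg_latency_cycles=40)

print(f"wide/shallow: {wide_shallow:.1f} IPC, narrow/deep: {narrow_deep:.1f} IPC")
```

Under these assumed numbers the 10-wide core is capped by its window, while the 6-wide core with the deeper window is capped only by its width, so the narrower machine comes out ahead. With short latencies the picture flips, which is the tradeoff-per-workload point.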
 

Panino Manino

Senior member
Jan 28, 2017
876
1,136
136
Likely. There are lots of tricks that boost your general core perf without going wider.
  • Deeper structures - schedulers, ROB, PRF size
  • Larger or faster caches (like Z, which has usually not been on the cutting edge of width but has generally had very large L1/L2)
  • Non-committing runahead (like Power6) and its advanced cousin, speculative multithreading (like Rock)
  • Branch predictor improvements
  • Prefetch improvements
  • Reduced op latencies
  • For multithreaded performance - SMT, unusual shared cache configurations (like later SPARC gens), unusual shared core-resource configurations (like Bulldozer family or Freescale e6500)
Just a question of what tradeoffs you're willing to make for your target workloads.

Possible, yes.
Is there any team able to build this unicorn of a narrower machine that matches Apple?
 

FlameTail

Diamond Member
Dec 15, 2021
4,238
2,594
106
So there's wide cores and narrow cores, deep cores and shallow cores (?).

Can someone further explain?
 

naukkis

Senior member
Jun 5, 2002
962
829
136
But isn't having as wide a machine as possible the way, the only way some may say, to achieve this great performance at lower clocks and power? Would it be possible to achieve the same performance with a narrower core?

No. You either execute as many instructions as possible simultaneously or clock higher. Of course going wider isn't easy - done wrong, it will clock so low that it isn't competitive. But Apple's design is brilliant: they went even wider and increased clocks considerably at the same time. Also note that Apple isn't pushing clocks as high as the silicon could go, like Intel does, and still beats them - on a tablet platform. That's such a massive lead in performance that it ain't even funny.
 

naukkis

Senior member
Jun 5, 2002
962
829
136
In short: It's the very definition of resting on one's laurels.

It's exploiting existing capabilities but doesn't prepare for future bigger improvements. Making the most of what's already there makes it seem like the product keeps up well in the competitive environment, but bigger jumps are essentially delayed up to the point where work on redesigns resumes (and those actually turn out to be successful, which is never a given). Intel did it with all the Skylake derivatives and lost its IPC lead over that. Apple seems to be on the way to doing the same right now unless there is a major redesign coming soon for one of the following gens.

It seems that Apple redesigned the core from the top down, going from 9-issue to 10-issue. That's a really major redesign, especially since they could simultaneously clock it higher. So yes, one could hope for bigger jumps, but when they are at the top of what anyone has done, it isn't a walk in the park.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
No. You either execute as many instructions as possible simultaneously or clock higher.

This isn't true at all.

One of the largest performance limiters in modern microarchitecture is correct branch prediction, which isn't about executing more ops at all, but rather about running ops as early as possible and wasting as little work as possible.

Cortex-X4 is wider than M3, and slower.
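
To put rough numbers on the branch prediction point, here is a toy CPI model in Python: cycles per instruction are the ideal cost plus the average flush cost from mispredicted branches. This is a sketch under simple assumptions (fixed flush penalty, no overlap of stalls). The function name `effective_ipc` and every parameter value are hypothetical and do not describe Cortex-X4, M3, or any other real core.

```python
def effective_ipc(ideal_ipc, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Toy model: IPC once branch misprediction flushes are accounted for.

    ideal_ipc: IPC the core would sustain with perfect prediction.
    branch_fraction: share of instructions that are branches.
    mispredict_rate: share of branches that are mispredicted.
    flush_penalty_cycles: cycles lost per misprediction (assumed constant).
    """
    base_cpi = 1.0 / ideal_ipc
    # Average cycles lost per instruction due to pipeline flushes.
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical comparison: a wider core with a weaker predictor
# versus a narrower core with a better one (made-up numbers).
wider_core = effective_ipc(ideal_ipc=10, branch_fraction=0.2,
                           mispredict_rate=0.03, flush_penalty_cycles=14)
narrower_core = effective_ipc(ideal_ipc=8, branch_fraction=0.2,
                              mispredict_rate=0.01, flush_penalty_cycles=14)

print(f"wider core: {wider_core:.2f} IPC, narrower core: {narrower_core:.2f} IPC")
```

With these assumed inputs the narrower core ends up with the higher effective IPC, purely because it wastes less work on wrong-path execution.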
 

naukkis

Senior member
Jun 5, 2002
962
829
136
This isn't true at all.

One of the largest performance limiters in modern microarchitecture is correct branch prediction, which isn't about executing more ops at all, but rather about running ops as early as possible and wasting as little work as possible.

Cortex-X4 is wider than M3, and slower.

Wider as in the CPU's ability to extract IPC from code. The technical details of how that is achieved are irrelevant; stating that Cortex-X4 is "wider" than M3 is a massive oversimplification. A minor performance increase can be found by widening any stage of the CPU pipeline, but as increasing execution area will also increase power logarithmically, a good CPU designer should instead minimize every stage of their design to find optimal performance/power and extract max clocks. Apple clearly does that better than any of their rivals today. It's actually a very curious case: they don't need to be fastest, they really don't chase maximum performance from their silicon, and still they have the fastest CPU design. They sure make their rivals look incompetent.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Wider as in the CPU's ability to extract IPC from code. The technical details of how that is achieved are irrelevant; stating that Cortex-X4 is "wider" than M3 is a massive oversimplification. A minor performance increase can be found by widening any stage of the CPU pipeline, but as increasing execution area will also increase power logarithmically, a good CPU designer should instead minimize every stage of their design to find optimal performance/power and extract max clocks. Apple clearly does that better than any of their rivals today. It's actually a very curious case: they don't need to be fastest, they really don't chase maximum performance from their silicon, and still they have the fastest CPU design. They sure make their rivals look incompetent.

Nobody in industry describes "wider" as anything but "more op throughput and execution resources" - there are clearly many ways to scale performance without increasing those, as enumerated above.

Not going to respond to the Apple hagiography because I don't think it's very relevant.
 