Discussion Apple Silicon SoC thread

Eug · Nov 10, 2020

M1
5 nm
Unified memory architecture - LP-DDR4
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-effiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 Teraflops
82 Gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core number). Basically, Apple is taking the same approach which these chips as they do the iPhones and iPads. Just one SKU (excluding the X variants), which is the same across all iDevices (aside from maybe slight clock speed differences occasionally).

EDIT:

M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:

Page 78 - Discussion - Apple Silicon SoC thread

Page 78 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M1 Ultra discussion here:

Page 109 - Discussion - Apple Silicon SoC thread

Page 109 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M2 discussion here:

Page 127 - Discussion - Apple Silicon SoC thread

Page 127 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.264, ProRes

M3 Family discussion here:

Page 215 - Discussion - Apple Silicon SoC thread

Page 215 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

M4 Family discussion here:

Page 263 - Discussion - Apple Silicon SoC thread

Page 263 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

forums.anandtech.com

Doug S · Dec 1, 2020

Roland00Address said:
The problem with apple's making cheap laptops that are $600 is apple does not want to cannibalize sales and thus it does artificial market segmentation where they try to make multiple categories, where they have the best device in that category, but not a "mainstream multi use device."

Instead of selling a device that is "abc" they rather sell you device "A"+"B"+"C"

How do you explain the iPhone SE, then? Surely that cannibalizes sales that would have gone to higher priced iPhones? They do it because it expands their market - and by bringing people into the ecosystem, they increase installed base and those iPhone SE buyers might buy higher end iPhones next time.

I think they will do the same with the Mac down the road. Once they update the physical design the "SE" can use the old physical design - because the equipment used to make those components will have been fully amortized. It'll use a not quite new CPU (like the SE2 uses the A13 which was six months old at the time of release, and will be several years old by the time they quit selling this iteration of the SE) which again will leverage depreciated assets to reduce cost - TSMC's depreciated assets in this case.

Their first task is to get the whole Mac line on ARM. They've said that will take a couple years, so by summer/fall 2022. So maybe we'll see a Macbook SE and Mini SE in spring 2023. Alternatively, they could sell "last year's model" for less like they also do for the iPhone. But there will be some lower price points to expand the reach once they've got the hard work of getting all the ARM Macs out the door done.

Eug · Dec 1, 2020

Viper GTS said:
He is referring to this:

https://twitter.com/x/status/1331736203402547201

I'm sure someone has done or will do a more in depth technical explanation but basically Apple is doing x86 style memory operations in hardware rather than emulating them They can do either method depending on what the application needs which gives them a massive performance advantage.

Cool.

amrnuke said:
So back to the M1, I found LTT's latest review of the Mac Mini interesting.

Very interesting the H.265 vs H.264 hardware results. This just goes along with their extreme fine-tuning with the mobile / tablet experience. The most important things for most users of the MBA, MBP, and Mac mini is going to be responsiveness and multimedia experience.

Another interesting thing he mentions is the implementation of Rosetta. I'm not sure what exactly is meant by x86 consistency model and how it affects direct execution of ARM-like x86 instructions or if it enables direct execution of certain x86 instructions. Anyone care to take a layman's jab?

Interesting. He says that running Prime95 didn't ramp up the fan. Peak temp with fan at normal 1700 rpm idle speed is 77C.

Peak system (not SoC) power usage is 32 Watts.

Is Prime95 native on Apple Silicon?

Linus doesn't seem to be just impressed with M1's performance, I think he's more impressed with its performance per watt (for good reason), as he keeps talking about how well the MacBook Air does.

They also do performance extrapolations.

And they're impressed by Rosetta 2.

name99 · Dec 1, 2020

guidryp said:
That memory ordering thing, is referring to Big Endian(Moto 68K) vs Little Endian (x86).

ARM is Bi-Endian (it goes both ways), as was PowerPC, so this isn't new, nor it is in Apple unique feature.

No it is not!

It has to do with the ordering of stores.
Suppose I have a CPU that executes code
store rA to addrA
store rB to addrB

These instructions can be executed out of order depending on the order in which their operands rA, rB, addrA, addrB become available. But effective execution on the local CPU, ie their effect on all the other instructions in the local code, will be in program order.
BUT what about other CPUs? If store B executes first, can it dump its results out to the cache and memory system, to be seen by other CPUs, before store A makes it to memory?

(a) Why does it matter?
A common parallel programming pattern is to calculate a result, store that result, then store a flag telling other CPUs "I'm done, you can read my result", something like
store result
store flag
If the other CPUs see the store flag first, before result has been pushed out to memory, they will assume the flag means the value of result in memory is valid! For correctness it is vital that flag only be visible to other CPUs AFTER result has made it out to memory.

(b) So how to handle this?
One way is to force all stores in order. This is definitely sub-optimal, and it's mostly pointless because almost every store that one CPU makes is of zero interest to any other CPU.

The alternative is to say that most of the time stores can get ordered in whatever random order they like, just the few times when the ordering actually matters to other CPUs you insert a special instruction (called a fence, or a barrier) to enforce the exact ordering you want.

This second alternative is supposed to be so complicated and difficult to understand that the x86 crowd insist every CPU should use the first alternative. But honestly, this is the voice of ignorance. It's just not that hard; all it requires is knowing WTF you are doing! The pattern is simple: whenever you "publish" a flag that tells other CPUs data is valid, you need a fence. Whenever you "subscribe" to a flag (ie test it for validity) you need a fence. Like anything in programming it's difficult at first, requires some hands on experience, then it becomes second nature.

Anyway that's the BASIC issue at hand. There are then many different details of exactly what sort of re-ordering is allowed under what conditions, and what barriers/fences are provided to enforce ordering; that's what it means to say different companies have different memory models. But what I've described above is the essence of the issue, and the essence of why it matters.

Why do you want to re-order stores? Well one reason is that it allows you to be a lot more aggressive with store coalescing. Load/store queues (that hold pending loads and stores that are still speculative and potentially could be cancelled if a branch prediction turns out to be false) are among the most expensive structures on a CPU. If you can make any single slot in the LSQ "more productive", you can hold more pending loads/stores, which in turn means your ROB can be larger.
If you are required to write all loads/stores to memory in order, you are limited in what you can do on this front. But if you are free to re-order stores, you can engage in aggressive store-coalescing -- put a sequence of stores to sequential addresses into the same LSQ entry, and treat them as a single unit for most purposes, ignoring the order in which they were put into that slot. This amplifies the capacity of your LSQ and then amplifies your bandwidth when you write that LSQ entry to cache.

Heartbreaker · Dec 1, 2020

name99 said:
No it is not!

Yeah, got that hours ago when Hitman pointed it out.

amrnuke · Dec 1, 2020

name99 said:
No it is not!

It has to do with the ordering of stores.
Suppose I have a CPU that executes code
store rA to addrA
store rB to addrB

These instructions can be executed out of order depending on the order in which their operands rA, rB, addrA, addrB become available. But effective execution on the local CPU, ie their effect on all the other instructions in the local code, will be in program order.
BUT what about other CPUs? If store B executes first, can it dump its results out to the cache and memory system, to be seen by other CPUs, before store A makes it to memory?

(a) Why does it matter?
A common parallel programming pattern is to calculate a result, store that result, then store a flag telling other CPUs "I'm done, you can read my result", something like
store result
store flag
If the other CPUs see the store flag first, before result has been pushed out to memory, they will assume the flag means the value of result in memory is valid! For correctness it is vital that flag only be visible to other CPUs AFTER result has made it out to memory.

(b) So how to handle this?
One way is to force all stores in order. This is definitely sub-optimal, and it's mostly pointless because almost every store that one CPU makes is of zero interest to any other CPU.

The alternative is to say that most of the time stores can get ordered in whatever random order they like, just the few times when the ordering actually matters to other CPUs you insert a special instruction (called a fence, or a barrier) to enforce the exact ordering you want.

This second alternative is supposed to be so complicated and difficult to understand that the x86 crowd insist every CPU should use the first alternative. But honestly, this is the voice of ignorance. It's just not that hard; all it requires is knowing WTF you are doing! The pattern is simple: whenever you "publish" a flag that tells other CPUs data is valid, you need a fence. Whenever you "subscribe" to a flag (ie test it for validity) you need a fence. Like anything in programming it's difficult at first, requires some hands on experience, then it becomes second nature.

Anyway that's the BASIC issue at hand. There are then many different details of exactly what sort of re-ordering is allowed under what conditions, and what barriers/fences are provided to enforce ordering; that's what it means to say different companies have different memory models. But what I've described above is the essence of the issue, and the essence of why it matters.

Why do you want to re-order stores? Well one reason is that it allows you to be a lot more aggressive with store coalescing. Load/store queues (that hold pending loads and stores that are still speculative and potentially could be cancelled if a branch prediction turns out to be false) are among the most expensive structures on a CPU. If you can make any single slot in the LSQ "more productive", you can hold more pending loads/stores, which in turn means your ROB can be larger.
If you are required to write all loads/stores to memory in order, you are limited in what you can do on this front. But if you are free to re-order stores, you can engage in aggressive store-coalescing -- put a sequence of stores to sequential addresses into the same LSQ entry, and treat them as a single unit for most purposes, ignoring the order in which they were put into that slot. This amplifies the capacity of your LSQ and then amplifies your bandwidth when you write that LSQ entry to cache.

Is x86 completely ordered? I thought they ditched sequential consistency as a rule some time ago?

moinmoin · Dec 1, 2020

Hitman928 said:
No, he is referring to x86 using TSO for memory consistency, it has to do with read/write conflicts for multi-core architectures. He is saying that M1 can switch between TSO when running Rosetta and whatever it uses natively. I'm less familiar with ARM but I believe they use a relaxed memory consistency model more similar to POWER than x86. I have no idea if this is true or not but that is what he is claiming. If true, I also wonder if Intel tries to sue depending on how it is being implemented in the M1 architecture as they have been quick to sue other companies that have tried to implement similar hardware features though I think that was more to do with the instruction set itself.

x64 (so technically by AMD anyway?) seems to be one of the more inflexible memory ordering implementations, would be ironic for Intel to sue for it: https://en.wikipedia.org/wiki/Memory_ordering#cite_ref-mem_ord_pdf_8-0

Hitman928 · Dec 1, 2020

amrnuke said:
Is x86 completely ordered? I thought they ditched sequential consistency as a rule some time ago?

x86-TSO is sequential on the write side I believe.

amrnuke · Dec 1, 2020

Hitman928 said:
x86-TSO is sequential on the write side I believe.

Interesting!
Taking that difference to the hardware level for their emulation is one of those things @name99 talks about where Apple are doing things smarter and more intuitively, even if it means harder work on the uarch design side.

Hitman928 · Dec 1, 2020

moinmoin said:
x64 (so technically by AMD anyway?) seems to be one of the more inflexible memory ordering implementations, would be ironic for Intel to sue for it: https://en.wikipedia.org/wiki/Memory_ordering#cite_ref-mem_ord_pdf_8-0

x64 model is just inherited from x86. The TSO used by x86 is their own version of it so it may come down to implementation and patents but I'm just thinking out loud on this, I have no idea how specific this implementation is or if there are even any patents in this regard, I wouldn't be surprised if there's nothing to sue over, I just know that Intel has been very protective of other companies trying to support any kind of x86 hardware features or emulation in the past.

Hitman928 · Dec 1, 2020

amrnuke said:
Interesting!
This is one of those things @name99 talks about where Apple are doing things smarter and more intuitively, even if it means harder work on the uarch design side.

I'm not 100% on that as Intel's TSO documentation is not very good from what I understand and leaves a lot of things ambiguous as to how things are actually handled by the hardware. "Weaker" memory models also come with their own caveats mainly in putting more responsibility on the programmers to get things right. I am not a software engineer or digital designer, but I rub shoulders with quite a few so I get bit and pieces here and there. I know fences have come up a few times and it's usually not in a favorable way, but maybe it's just the group I associate with, lol.

Hitman928 · Dec 1, 2020

I'll just say too, that weak memory models aren't new, they've been used for decades, here's a blog that has a basic rundown of where each prominent CPU architecture stands on the strong-weak memory model spectrum.

Weak vs. Strong Memory Models

There are many types of memory reordering, and not all types of reordering occur equally often. It all depends on processor you’re targeting and/or the toolchain you’ …

preshing.com

amrnuke · Dec 1, 2020

Hitman928 said:
I'll just say too, that weak memory models aren't new, they've been used for decades, here's a blog that has a basic rundown of where each prominent CPU architecture stands on the strong-weak memory model spectrum.

Weak vs. Strong Memory Models

There are many types of memory reordering, and not all types of reordering occur equally often. It all depends on processor you’re targeting and/or the toolchain you’ …

preshing.com

I'll do some late-night reading tonight, and hopefully stay awake for it!

moinmoin · Dec 1, 2020

Hitman928 said:
x64 model is just inherited from x86. The TSO used by x86 is their own version of it so it may come down to implementation and patents but I'm just thinking out loud on this, I have no idea how specific this implementation is or if there are even any patents in this regard, I wouldn't be surprised if there's nothing to sue over, I just know that Intel has been very protective of other companies trying to support any kind of x86 hardware features or emulation in the past.

I'd think any patent Intel could sue over is already lapsed, AMD64 is by AMD, it mandates SSE2 for which, being introduced back in 2000, any linked patents incidentally should have lapsed this year.

Hitman928 said:
I'll just say too, that weak memory models aren't new, they've been used for decades, here's a blog that has a basic rundown of where each prominent CPU architecture stands on the strong-weak memory model spectrum.

Weak vs. Strong Memory Models

There are many types of memory reordering, and not all types of reordering occur equally often. It all depends on processor you’re targeting and/or the toolchain you’ …

preshing.com

That's essentially the table on the wikipedia page I linked before.

Here's AMD64 Architecture Programmer’s Manual Volume 2: System Programming, memory ordering starts at page 176. Here are Intel's 64 and IA-32 Architectures Software Developer Manuals. In Intel® 64 and IA-32 architectures software developer's manual volume 3A: System programming guide, part 1 Intel is pretty explicit that earlier x86 had a slightly weaker memory model that shouldn't be relied on anymore on x64 (page Vol. 3A 8-6):
"The Pentium and Intel486 processors follow the processor-ordered memory model; however, they operate as strongly-ordered processors under most circumstances. Reads and writes always appear in programmed order at the system bus—except for the following situation where processor ordering is exhibited. Read misses are permitted to go ahead of buffered writes on the system bus when all the buffered writes are cache hits and, there-fore, are not directed to the same address being accessed by the read miss.

In the case of I/O operations, both reads and writes always appear in programmed order.

Software intended to operate correctly in processor-ordered processors (such as the Pentium 4, Intel Xeon, and P6 family processors) should not depend on the relatively strong ordering of the Pentium or Intel486 processors. Instead, it should ensure that accesses to shared variables that are intended to control concurrent execution among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”)."

Nicola Telecco · Dec 2, 2020

Reddit post: 4800U Cinebench R23 Scores/Power-level/#Threads

Interesting power efficiency data points. While I can neither confirm nor deny the poster's chart, that is in line with my expectations.

The ~10W MT 5500 score is of particular significance, if you consider that the ~10W M1 Air's score has been reported to drop down to 6400 level at thermal equilibrium, due to its fan-less design (Youtube link: Apple M1 MacBook Air - Thermals, Benchmarks & x86 Apps!)

jeanlain · Dec 2, 2020

Nicola Telecco said:
Reddit post: 4800U Cinebench R23 Scores/Power-level/#Threads

Interesting power efficiency data points. While I can neither confirm nor deny the poster's chart, that is in line with my expectations.

The ~10W MT 5500 score is of particular significance, if you consider that the ~10W M1 Air's score has been reported to drop down to 6400 level at thermal equilibrium, due to its fan-less design (Youtube link: Apple M1 MacBook Air - Thermals, Benchmarks & x86 Apps!)

The 14.6W setting is comparable to the M1 mini, which reportedly consumes about 15W on cinebench multi (not sure it that includes the RAM), and scores ~7800. The M1 has better perf/W here, but the difference is not as high as in other tests, in particular single-threaded tasks. It would have been nice to have the power usage of the 4800U in cinebench ST.

moinmoin · Dec 2, 2020

jeanlain said:
The 14.6W setting is comparable to the M1 mini, which reportedly consumes about 15W on cinebench multi (not sure it that includes the RAM), and scores ~7800. The M1 has better perf/W here, but the difference is not as high as in other tests, in particular single-threaded tasks. It would have been nice to have the power usage of the 4800U in cinebench ST.

As the table lists 1-16 threads, ST in essentially included:

So ST is between 1078 (9.8W), 1161 (14.3W) and 1183 (26W).

Nicola Telecco · Dec 2, 2020

jeanlain said:
The 14.6W setting is comparable to the M1 mini, which reportedly consumes about 15W on cinebench multi (not sure it that includes the RAM), and scores ~7800. The M1 has better perf/W here, but the difference is not as high as in other tests, in particular single-threaded tasks. It would have been nice to have the power usage of the 4800U in cinebench ST.

I agree it would be nice to have the power associated to the ST R23 bench. I will report if I find out.

jeanlain · Dec 2, 2020

moinmoin said:
So ST is between 1078 (9.8W), 1161 (14.3W) and 1183 (26W).

The CPU is unlikely to consume 26W on only one thread. However, the fact that it performs better at 14.3W than at 9.8W suggests that a single cinebench thread makes the CPU consume more than 9.8W, when unconstrained (the difference in performance between 14.3W and 26W may not be significant).
The M1 consumes 3.8W in cinebench ST (not sure if it's the whole package though), for a score of 1522.

amrnuke · Dec 2, 2020

jeanlain said:
The CPU is unlikely to consume 26W on only one thread. However, the fact that it performs better at 14.3W than at 9.8W suggests that a single cinebench thread makes the CPU consume more than 9.8W, when unconstrained (the difference in performance between 14.3W and 26W may not be significant).
The M1 consumes 3.8W in cinebench ST (not sure if it's the whole package though), for a score of 1522.

Both Andrei and the Lenovo benchmarker are just using package power reported, not measuring with a dedicated device. For the M1 that reported package power includes DRAM and other things but as you can see on Andrei's tweets, the contribution of the neural engine, DRAM, etc is minimal. The vast majority of power consumption is from the cores when running CB23. I imagine the same is largely true of the Renoir chips.

It would be far more fair, IMO, to compare an MBA to the Lenovo and measure at the wall power in CB23.

jeanlain · Dec 2, 2020

amrnuke said:
It would be far more fair, IMO, to compare an MBA to the Lenovo and measure at the wall power in CB23.

That's difficult to do on battery-powered devices.
And that depends on what you want to measure. If you're interested in CPU perf/W in particular, you'd rather exclude other sources of variation.

Bam360 · Dec 2, 2020

A CPU is always more efficient, in an embarrasingly parallel task like Cinebench, at the same power limit when it has more cores , because of the relationship between voltage and frequency, and the quadratic dependence that voltage has on power consumption. This can be seen in that test, where the 4800U keeps losing points when it has less cores or threads in use, even though the power consumption is the same (except 1 or 2T maybe, where it may not reach the full 15W for example). This is because it downclocks more the more threads or cores it has in use, reducing the voltage necessary to achieve that frequency.

Assuming The Stilt's voltage-frequency curve of Zen2 desktop is similar for Renoir, a 4/8 core at 3GHz and 0,81V would consume a similar amout of power as a 8/16 core at 2,3GHz and 0,66V, so you could have around 50% higher performance at the same power level in that scenario. A similar story happens with Intel laptop, where it was well known in reviews like Hardware Unboxed, where a 6 core was much more efficient in multithreaded tasks than the 4-core version, same will happen with the 8-core version.

All of this is to say, the only reason Renoir is remotely competitive in multithreaded eficiency, when you asses the massive delta in single threaded efficiency, is because:
-4800U has more or less double the effective cores than M1, and it downclocks very badly to be in the best sweet spot of the efficiency curve.
-M1 MBP and Mini don't even do this, they can maintain similar max ST frequency for all the cores in CB(big cores), so they are far from the sweet spot, the efficiency also increases when downclocking, as seen in MBA, the big boys get around 13% higher score but consuming 40-50% more (10-11W vs 15W).

You can compare multithreaded efficiency of CPU with different number of cores to assess efficiency difference between two CPU, but you can't do it to compare architectural efficiency, ie, you can compare M1 vs 4800U to compare efficiency between M1 and 4800U, but not to compare efficiency between Firestorm/Icestorm and Zen2.

dark zero · Dec 2, 2020

I need to say this but Apple expected to tests the waters with their M1, but on CPU side, it literally changed the game. On GPU side, only resisted the laptops with decent dedicated GPU. But seeing that the next generation might focus on GPU it won't last long.

jeanlain · Dec 2, 2020

Bam360 said:
-M1 MBP and Mini don't even do this, they can maintain similar max ST frequency for all the cores in CB(big cores), so they are far from the sweet spot, the efficiency also increases when downclocking, as seen in MBA, the big boys get around 13% higher score but consuming 40-50% more (10-11W vs 15W).

According to that reddit post, the 4800U @14.3W achieves a score of 6374, which isn't higher than the MBA at 9.2W (full package power). Unless you're referring to different results?

IvanKaramazov · Dec 2, 2020

jeanlain said:
According to that reddit post, the 4800U @14.3W achieves a score of 6374, which isn't higher than the MBA at 9.2W (full package power). Unless you're referring to different results?

I believe @Bam360 is comparing the M1 in the MBP and Mini to the same chip in the Air, where efficiency gains are larger than performance loss when downclocked (correct me if I'm wrong). The implication is that an 8+4 M1X or M2X or whatever with 8 performance cores could theoretically downclock all 8 cores to hit 15w and outperform a 4+4 M1 also running at 15w, because Cinebench is extremely parallel. Though I don't expect the 8+4 chip will do that, as it will probably only exist in chassis where ~28w sustained makes sense.

In other words, if you take the 4800U results at 14.3W and add 19% for Zen3 IPC improvements and 15% for a pretend move to 5nm, that would suggest a theoretical 5nm 5800U at 15W would outperform the M1 at 15W (6374*1.29=8,222 v ~7800 for the M1). However, that doesn't necessarily tell you anything about the relative architectural "goodness" of M1 v Zen 3 because Cinebench is so massively parallel, performing better on 8 cores at 15w than it does on 4 of those same cores running faster at 15w.

jeanlain · Dec 2, 2020

IvanKaramazov said:
In other words, if you take the 4800U results at 14.3W and add 19% for Zen3 IPC improvements and 15% for a pretend move to 5nm, that would suggest a theoretical 5800U at 15W would outperform the M1 at 15W (6374*1.29=8,222 v ~7800 for the M1)

1.19 * 1.15 = 1.36.

Discussion Apple Silicon SoC thread

Lifer

Diamond Member

Lifer

Senior member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Junior Member

Member

Diamond Member

Junior Member

Member

Golden Member

Member

Member

Platinum Member

Member

Member

Member