Discussion Apple Silicon SoC thread

Page 206 - AnandTech forums

Eug

Lifer
Mar 11, 2000
23,752
1,284
126
M1
5 nm
Unified memory architecture - LPDDR4X
16 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 12 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache
(Apple claims the 4 high-efficiency cores alone perform like a dual-core Intel MacBook Air)

8-core iGPU (but there is a 7-core variant, likely with one inactive core)
128 execution units
Up to 24576 concurrent threads
2.6 teraflops
82 gigatexels/s
41 gigapixels/s

16-core neural engine
Secure Enclave
USB 4

Products:
$999 ($899 edu) 13" MacBook Air (fanless) - 18 hour video playback battery life
$699 Mac mini (with fan)
$1299 ($1199 edu) 13" MacBook Pro (with fan) - 20 hour video playback battery life

Memory options 8 GB and 16 GB. No 32 GB option (unless you go Intel).

It should be noted that the M1 chip in these three Macs is the same (aside from GPU core count). Basically, Apple is taking the same approach with these chips as it does with the iPhones and iPads: just one SKU (excluding the X variants), which is the same across all iDevices (aside from occasional slight clock-speed differences).
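Apple's 2.6-teraflop figure for the 8-core GPU can be reproduced from the published execution-unit count. A minimal sketch, assuming 8 FP32 ALUs per execution unit and a ~1.278 GHz GPU clock (both widely reported figures, not official Apple specs):

```python
# Rough FP32 throughput estimate for the M1 GPU.
eus = 128            # execution units (official Apple figure)
alus_per_eu = 8      # FP32 ALUs per EU (assumption, based on common reporting)
clock_ghz = 1.278    # GPU clock in GHz (assumption; Apple does not publish this)

flops_per_cycle = eus * alus_per_eu * 2   # 2 FLOPs per FMA (multiply + add)
tflops = flops_per_cycle * clock_ghz * 1e9 / 1e12
print(f"{tflops:.2f} TFLOPs")  # -> 2.62 TFLOPs, matching Apple's 2.6 claim
```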

EDIT:



M1 Pro 8-core CPU (6+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 14-core GPU
M1 Pro 10-core CPU (8+2), 16-core GPU
M1 Max 10-core CPU (8+2), 24-core GPU
M1 Max 10-core CPU (8+2), 32-core GPU

M1 Pro and M1 Max discussion here:


M1 Ultra discussion here:


M2 discussion here:


Second Generation 5 nm
Unified memory architecture - LPDDR5, up to 24 GB and 100 GB/s
20 billion transistors

8-core CPU

4 high-performance cores
192 KB instruction cache
128 KB data cache
Shared 16 MB L2 cache

4 high-efficiency cores
128 KB instruction cache
64 KB data cache
Shared 4 MB L2 cache

10-core iGPU (but there is an 8-core variant)
3.6 Teraflops

16-core neural engine
Secure Enclave
USB 4

Hardware acceleration for 8K h.264, h.265 (HEVC), and ProRes
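The M2's 3.6-teraflop figure scales the same way as the M1's. A sketch assuming 128 FP32 ALUs per GPU core and a ~1.398 GHz clock (both commonly reported, neither published by Apple):

```python
# Rough FP32 throughput estimate for the M2 GPU.
cores = 10            # GPU cores (official Apple figure)
alus_per_core = 128   # FP32 ALUs per core (assumption, based on common reporting)
clock_ghz = 1.398     # GPU clock in GHz (assumption; Apple does not publish this)

tflops = cores * alus_per_core * 2 * clock_ghz / 1000  # 2 FLOPs per FMA
print(f"{tflops:.2f} TFLOPs")  # -> 3.58 TFLOPs, in line with Apple's 3.6 claim
```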

M3 Family discussion here:


M4 Family discussion here:

 
Last edited:

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
One thing I haven't seen discussion of that probably deserves it is the name. Apple specifically chose to name it "A17 Pro".

That kind of implies to me that the non-Pro iPhone 16s will not get this SoC next year. If not, what will they get? Will it be as simple as one less GPU core (which they have binned on already) or will there be larger differences between A17 Pro and next year's "A17 Fusion" or whatever it may be called in the iPhone 16? Is it possible it will be a separate SoC design, and if so what might they differentiate on?
Good question. Of course, Apple could just call it the 'A17' and drop the Pro moniker. Marketing FTW. 99.99% of iPhone buyers won't even notice.
 

Jan Olšan

Senior member
Jan 12, 2017
312
402
136
For example the 525.x264 benchmark tests video compression.
Can't speak for the other ones, but the x264 subtest is total garbage if you expect it should be doing that.

x264 is SIMD-heavy, hand-optimised code written for yasm (over 50% of runtime is spent in SIMD functions). It also has ARM assembly for running on ARM. But what does the SPEC bench do? Compile it with --disable-asm. That completely cripples the encoder: it disables all of the asm, and as a result the binary uses hardly any *relevant* SIMD. You end up benchmarking scalar C code, because this isn't code that autovectorizes; the compiler might try to SIMD some parts, but in some stupid way.

I suppose the SPEC authors did this to level the playing field between x86 and other platforms, but it makes the benchmark utterly useless if you want to claim it represents anything from reality. This is not even close to what an H.264 encoding workload looks like.

I can't speak about the other workloads of the suite, maybe some are similarly idiotic too, maybe this is an exception.
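The gap described above (scalar C vs. hand-written SIMD) is easy to picture with the kernel at the heart of x264's motion estimation: the sum of absolute differences (SAD). A hypothetical Python sketch, with numpy standing in for the hand-vectorized asm path; both functions compute the same thing, but only one is data-parallel:

```python
import numpy as np

def sad_scalar(a, b):
    """Scalar SAD, element by element -- analogous to the un-vectorized C path."""
    total = 0
    for x, y in zip(a, b):
        total += abs(int(x) - int(y))
    return total

def sad_simd(a, b):
    """Vectorized SAD -- standing in for the hand-written SIMD asm path."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

rng = np.random.default_rng(0)
block_a = rng.integers(0, 256, 16 * 16, dtype=np.uint8)  # one 16x16 macroblock
block_b = rng.integers(0, 256, 16 * 16, dtype=np.uint8)
assert sad_scalar(block_a, block_b) == sad_simd(block_a, block_b)
```

On real hardware the hand-vectorized SAD in x264 is many times faster than the scalar C fallback, which is the gap the post is describing.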
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
Can't speak for the other ones, but the x264 subtest is total garbage if you expect it should be doing that.

x264 is SIMD-heavy, hand-optimised code written for yasm (over 50% of runtime is spent in SIMD functions). It also has ARM assembly for running on ARM. But what does the SPEC bench do? Compile it with --disable-asm. That completely cripples the encoder: it disables all of the asm, and as a result the binary uses hardly any *relevant* SIMD. You end up benchmarking scalar C code, because this isn't code that autovectorizes; the compiler might try to SIMD some parts, but in some stupid way.

I suppose the SPEC authors did this to level the playing field between x86 and other platforms, but it makes the benchmark utterly useless if you want to claim it represents anything from reality. This is not even close to what an H.264 encoding workload looks like.

I can't speak about the other workloads of the suite, maybe some are similarly idiotic too, maybe this is an exception.

For a CPU test you'd want to do that. Otherwise an SoC would just use its dedicated hardware encoder, which is great if you want to compare the SoC's hardware encoder/decoder against something else, but that's obviously no longer a CPU test. The dedicated hardware path will smoke a CPU software path. Once upon a time my iPad with hardware x264 encoding could render videos faster than a multicore Xeon workstation Mac that cost over 10x as much.

Of course, SoCs don't have hardware encoders/decoders for everything, so it's useful to see how they do with a software encoder/decoder that hasn't been tuned for the CPU. Otherwise the results would be biased, which makes it less useful as a benchmark to compare CPU performance.
 
Reactions: Tlh97

Doug S

Platinum Member
Feb 8, 2020
2,486
4,048
136
For a CPU test you'd want to do that. Otherwise an SoC would just use its dedicated hardware encoder, which is great if you want to compare the SoC's hardware encoder/decoder against something else, but that's obviously no longer a CPU test. The dedicated hardware path will smoke a CPU software path. Once upon a time my iPad with hardware x264 encoding could render videos faster than a multicore Xeon workstation Mac that cost over 10x as much.

Of course, SoCs don't have hardware encoders/decoders for everything, so it's useful to see how they do with a software encoder/decoder that hasn't been tuned for the CPU. Otherwise the results would be biased, which makes it less useful as a benchmark to compare CPU performance.

Back in the day SPEC was specific about stating it was not intended as a CPU test, but as a system test.

Thus the large memory footprints relative to common memory sizes (at least back in the SPEC heyday of the 90s and 00s), which brought the memory subsystem to the forefront. If it had been designed primarily as a CPU benchmark, you wouldn't want tests that are as sensitive to memory bandwidth as much of the FP suite, or to memory latency like some of the INT suite.

I'm not saying it would provide more useful results if it allowed use of dedicated hardware, but that absolutely would be keeping in spirit with SPEC's OG philosophy.
 

Nothingness

Platinum Member
Jul 3, 2013
2,750
1,396
136
Why don't people go to the source? SPEC makes SPEC CPU goals clear:

SPEC CPU 2017 focuses on compute intensive performance, which means these benchmarks emphasize the performance of:
  • Processor - The CPU chip(s).
  • Memory - The memory hierarchy, including caches and main memory.
  • Compilers - C, C++, and Fortran compilers, including optimizers.
SPEC CPU 2017 intentionally depends on all three of the above - not just the processor.
SPEC CPU 2017 is not intended to stress other computer components such as networking, graphics, Java libraries, or the I/O system. Note that there are other SPEC benchmarks that focus on those areas.

Various other characteristics are described on these pages including what each workload does.

I suppose the SPEC authors did this to level the playing field between x86 and other platforms, but it makes the benchmark utterly useless if you want to claim it represents anything from reality. This is not even close to what an H.264 encoding workload looks like.

I can't speak about the other workloads of the suite, maybe some are similarly idiotic too, maybe this is an exception.
Rather than making accusations (highlights are mine), you could propose benchmarks to SPEC if you think you're qualified. The window for SPECv8 is likely closed (cf https://www.spec.org/cpuv8/) but they will probably make a call for SPECv9 which will give you the opportunity to propose something.

As far as x264 in SPEC goes it's interesting for two reasons: it shows how well a compiler can use SIMD instructions and it shows how well a rather high IPC workload performs. Various low level characteristics are described here: https://research.spec.org/icpe_proceedings/2019/proceedings/p285.pdf

For memory usage, all the CPU design teams I know of run SPEC CPU rate with a single instance, which significantly reduces the requirements. It's also how most review sites do it (in particular for phones, where memory is limited).
 

jeanlain

Member
Oct 26, 2020
159
136
86
I would really love for Apple to make a bold move and produce a "premium" Steam Deck-style console. With the whole emulation-on-Linux thing going on and Apple's efforts, I think we are not far from a revolution in pseudo-cross-platform gaming.
Performance under emulation will be degraded. Apple compared the performance of their D3DMetal translator (game porting toolkit) to a native version of "The Medium" (which doesn't appear to be well optimized); the native version ran at double the frame rate.

The fact that the A17 beats the Steam Deck GPU (before throttling) is expected, since the latter has about 63-65% of the FLOPs, texture and pixel fill rates of the M1 (the only Apple GPU whose official metrics we know), and the A17 achieves ~85% of the M1's performance in Metal benchmarks. Also, 85% is not far from 3.77GHz*6 / (3.2GHz*8), assuming the GPU clock speed is proportional to the P-core clock speed.
By the same extrapolation, the A17 should have about half the TFLOPs, pixel and texture fill rates of the GTX 1060. That it achieves half the performance of the GTX 1060 in RE Village is consistent.
What's impressive is that it consumes <10W.
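The clock-and-core-count scaling above works out like this (all figures are the post's own assumptions, not official GPU specs):

```python
# A17 Pro GPU vs M1 GPU, scaled by (assumed) clock x core count.
a17_clock_ghz, a17_cores = 3.77, 6   # assumption: GPU clock tracks P-core clock
m1_clock_ghz, m1_cores = 3.20, 8

ratio = (a17_clock_ghz * a17_cores) / (m1_clock_ghz * m1_cores)
print(f"{ratio:.2f}")  # -> 0.88, close to the ~85% seen in Metal benchmarks
```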
 
Last edited:

Eug

Lifer
Mar 11, 2000
23,752
1,284
126
According to the Wall Street Journal, insiders are claiming Apple’s modem is 3 years behind Qualcomm’s.


At this rate, maybe we won’t see an Apple modem for several years.

That said, just a few months ago, Qualcomm’s own CEO predicted Apple may start using its own modem as soon as 2024.

 
Reactions: Ajay

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
Back in the day SPEC was specific about stating it was not intended as a CPU test, but as a system test.

In a way that's correct, and you could just as well use it to test different compilers as you could different CPUs, but there's a reason that SPEC has so many different individual tests. Any one of them in isolation isn't going to be a good system test unless the purpose of the system is very narrow in scope.

Thus the large memory footprints relative to common memory sizes (at least back in the SPEC heyday of the 90s and 00s), which brought the memory subsystem to the forefront. If it had been designed primarily as a CPU benchmark, you wouldn't want tests that are as sensitive to memory bandwidth as much of the FP suite, or to memory latency like some of the INT suite.

The memory subsystem is still a part of the CPU and if it's holding back the rest of the processor, it's not much different than putting rubbish tires on a Ferrari.

There are some benchmarks that will try to exclude that aspect of a CPU and just see how much raw throughput it is capable of even if it can't really be tapped because most programs will bottleneck at some other point such that the execution units can't be fed.

SPEC as a whole has a lot of different tests that will exercise different parts of the CPU and expose areas where it's strong and others where it's weak.

I'm not saying it would provide more useful results if it allowed use of dedicated hardware, but that absolutely would be keeping in spirit with SPEC's OG philosophy.

In some ways the result is useful. If you want to do encoding, knowing how much better the dedicated hardware path is than another system without one is important. It really depends on what perspective you have.

For AT's reviews it was about exploring the differences in the CPU cores between various ARM and x86 processors and less about the products that contained those chips.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Some sources say it's regular N4.

Anyway, N4P was available last year?

The Dimensity 9200 and SD8G2 use regular N4, no?
Hmm, some sites do say N4, but that could be because N4P is just a variant of N4. It was available in time to manufacture the A16 Bionic. As for the others, I only pay attention to Apple's SoCs (well, and Qualcomm's and occasionally Samsung's).
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,048
136
According to the Wall Street Journal, insiders are claiming Apple’s modem is 3 years behind Qualcomm’s.


At this rate, maybe we won’t see an Apple modem for several years.

That said, just a few months ago, Qualcomm’s own CEO predicted Apple may start using its own modem as soon as 2024.



I saw a story elsewhere earlier today claiming Apple will be using their modem in the new iPhone SE next spring.

The claim was that since the SE won't support mmWave 5G, and that's apparently where some of their issues lie, it would work OK for that iPhone and give them plenty of real-world data (which would help identify corner cases in the baseband software).

Just because Apple extended the deal for three years from now doesn't necessarily mean Apple is three years behind Qualcomm, as that article implies. The contract was originally a set number of years (I think 4?) plus a two-year option. There would presumably be some date by which Apple would have to inform Qualcomm whether or not they would exercise that option. So presumably that date approached, Apple was not ready, and they told Qualcomm they would exercise it. They would have to be 100% sure they were ready to start using their own modem in next year's iPhone if they did not exercise that option, and clearly they did not feel they were.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Performance under emulation will be degraded. Apple compared the performance of their D3DMetal translator (game porting toolkit) to a native version of "The Medium" (which doesn't appear to be well optimized); the native version ran at double the frame rate.

The fact that the A17 beats the Steam Deck GPU (before throttling) is expected, since the latter has about 63-65% of the FLOPs, texture and pixel fill rates of the M1 (the only Apple GPU whose official metrics we know), and the A17 achieves ~85% of the M1's performance in Metal benchmarks. Also, 85% is not far from 3.77GHz*6 / (3.2GHz*8), assuming the GPU clock speed is proportional to the P-core clock speed.
By the same extrapolation, the A17 should have about half the TFLOPs, pixel and texture fill rates of the GTX 1060. That it achieves half the performance of the GTX 1060 in RE Village is consistent.
What's impressive is that it consumes <10W.
They don’t need an emulator. The games are coming to them. Looking at the upcoming games on the app store, there are some popular games coming out such as Warframe and Shapez. I suspect there will be more soon.
 

Doug S

Platinum Member
Feb 8, 2020
2,486
4,048
136
They don’t need an emulator. The games are coming to them. Looking at the upcoming games on the app store, there are some popular games coming out such as Warframe and Shapez. I suspect there will be more soon.

Games coming to the iPhone or Mac doesn't mean there won't be some emulation overhead. There are plenty of Mac games that take a hit from an internal DirectX to Metal translation layer, plus a hit from not being designed for the Apple GPU's deferred rendering.

I would assume games on iOS don't suffer from those issues, unless they are ports from PC/Xbox (in which case the Android versions probably suffer similar internal emulation/translation from DirectX calls)
 

H433x0n

Golden Member
Mar 15, 2023
1,068
1,272
96

Geekerwan released English speaking version of their review.
He got a new subscriber, really impressed by this video.

I’m pretty convinced that TSMC N3 got shipped prematurely. I get why Apple was allegedly paying for known good die now, the economics of N3B wouldn’t make sense otherwise. The N3B node seems to be a practical joke.
 

poke01

Golden Member
Mar 8, 2022
1,395
1,611
106
The ENG version of A17 Pro from Geekerwan is much better for western audiences.

Also, M3 is going to be REALLY powerful if A17 Pro is anything to go by. 9-wide decode, and those new E-cores are packing a punch.
 

trivik12

Senior member
Jan 26, 2006
321
288
136
M3 looks really promising. The question is when they will release it. Since it has way more performance cores, TDP will have to go up relative to M2, but these chips are so efficient relative to x86 chips that it doesn't matter at all for the laptop form factor. I only hope base RAM goes up from 8 GB (I'm not too optimistic).
 

FlameTail

Diamond Member
Dec 15, 2021
3,157
1,804
106
M3 looks really promising. The question is when they will release it. Since it has way more performance cores, TDP will have to go up relative to M2, but these chips are so efficient relative to x86 chips that it doesn't matter at all for the laptop form factor. I only hope base RAM goes up from 8 GB (I'm not too optimistic).
Ye. Base RAM should become 12 GB.
 