Lion Cove's true L1 is what they call the L0; the naming is basically marketing. The L1i is only 64KB with 5-cycle latency, and the so-called "L1" has 9-cycle latency. That's L2-class latency from the Nehalem days.
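(For reference, load-to-use numbers like those usually come from a pointer-chase microbenchmark. A minimal sketch below; the buffer size, iteration count, and timing approach are illustrative assumptions, not how any particular reviewer measured these parts.)

/* Minimal pointer-chase sketch for estimating load-to-use latency.
 * Real measurements also need core pinning, warm-up, and cycle counters. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ENTRIES (48 * 1024 / sizeof(void *)) /* assumed ~48 KiB, sized to sit in an L0/L1-class cache */
#define ITERS   (100 * 1000 * 1000UL)

int main(void)
{
    void **buf = malloc(ENTRIES * sizeof(void *));
    size_t *idx = malloc(ENTRIES * sizeof(size_t));
    if (!buf || !idx) return 1;

    /* Build a random cyclic permutation so the prefetcher can't guess
     * the next address; every load depends on the previous one. */
    for (size_t i = 0; i < ENTRIES; i++) idx[i] = i;
    for (size_t i = ENTRIES - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < ENTRIES; i++)
        buf[idx[i]] = &buf[idx[(i + 1) % ENTRIES]];

    struct timespec t0, t1;
    void **p = &buf[idx[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < ITERS; i++)
        p = (void **)*p;                 /* serialized dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%p\n", (void *)p);           /* keep the chain from being optimized away */
    printf("avg load-to-use: %.2f ns (multiply by GHz for cycles)\n", ns / ITERS);

    free(idx);
    free(buf);
    return 0;
}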
ARM's approach to caches and clocks is far superior. Apple achieves better performance at much lower power, and the other ARM chips aren't that far behind either. It is embarrassing.
The people designing these chips have long noted that the power required for memory accesses is a major limiter on performance. A large L1 is a very good basic idea, but it can't be done on a 5.7GHz processor without insane latency.
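(Rough arithmetic on that clock-vs-latency point. The 9-cycle figure is the one quoted above; the clock speeds and the 4-cycle ARM-side figure are assumptions for illustration.)

#include <stdio.h>

int main(void)
{
    double x86_ghz = 5.7, arm_ghz = 4.4;   /* assumed peak clocks */

    /* a fixed cycle count turns into wall-clock time through the clock period */
    printf("9 cycles at %.1f GHz = %.2f ns\n", x86_ghz, 9.0 / x86_ghz);   /* ~1.58 ns */
    printf("4 cycles at %.1f GHz = %.2f ns\n", arm_ghz, 4.0 / arm_ghz);   /* ~0.91 ns */

    /* flipped around: hitting ~0.9 ns load-to-use at 5.7 GHz means the
     * cache has to answer in about 5 cycles, which caps how big it can be */
    printf("0.9 ns at %.1f GHz = %.1f cycles\n", x86_ghz, 0.9 * x86_ghz); /* ~5.1 */
    return 0;
}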
The difference between a 19-stage processor and a 9-stage one is only 27%, and the x86 vendors have to pull out every trick to get there: small, high-latency cache levels, lowered uncore clocks, stability issues. It's not worth it, and that gap is only going to shrink further. At some point you have to ask: is it worth it?
Then you go look at ARM servers doing high core counts... oh look, far worse cache implementations...
So if I approach this the way you do:
How many 144+ core Apple SoCs are there? Zero, so the x86 cache implementation = infinitely better.
How many 144+ core AWS SoCs are there? Zero, so the x86 cache implementation = infinitely better.
Let's look at Graviton 4: only 32KB L1D, only 2MB L2 per core, and almost zero L3, which means lots of extra memory accesses compared to modern x86 server cores...
Let's look at the AmpereOne A192: 64KB L1D, but it's write-through (great for power, right?), 2MB L2 per core, and zero L3, just a memory-controller-side cache. Which means lots of extra memory accesses compared to modern x86 server cores (back-of-envelope sketch below), and that totally doesn't show up in benchmarks like
https://www.servethehome.com/ampere...permicro-nvidia-broadcom-kioxia-server-cpu/2/
Man, ARM is so far behind, it's
EMBARRASSING
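(Since "lots of extra memory accesses" is doing a lot of work in both of those jabs, here's the back-of-envelope average-memory-access-time arithmetic referenced above. Every hit rate and latency in it is an assumed illustrative number, not a measurement of Graviton 4, the AmpereOne, or any x86 part.)

#include <stdio.h>

/* AMAT with per-level hit rates: each rate applies to the accesses that
 * actually reach that level. */
static double amat(double l1_hit, double l1_lat,
                   double l2_hit, double l2_lat,
                   double llc_hit, double llc_lat,
                   double mem_lat)
{
    return l1_hit * l1_lat
         + (1 - l1_hit) * (l2_hit * l2_lat
         + (1 - l2_hit) * (llc_hit * llc_lat
         + (1 - llc_hit) * mem_lat));
}

int main(void)
{
    /* "big shared L3" hierarchy vs "no real L3, straight to DRAM";
     * latencies in ns, all values assumed for illustration */
    double with_llc = amat(0.95, 1.0, 0.80, 3.0, 0.60, 12.0, 90.0);
    double no_llc   = amat(0.95, 1.0, 0.80, 3.0, 0.00, 12.0, 90.0);

    printf("AMAT with a shared LLC:      %.2f ns\n", with_llc); /* ~1.5 ns */
    printf("AMAT going straight to DRAM: %.2f ns\n", no_llc);   /* ~2.0 ns */
    return 0;
}

With these made-up numbers the no-L3 design eats roughly 30% more average latency and correspondingly more DRAM traffic, which is the claimed effect; whether that actually dominates real workloads is exactly what the benchmarks have to settle.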
Or let's not be idiots and actually evaluate these things with consideration for their TAM/target markets; different vendors are making different trade-offs.