Info TOP 20 of the World's Most Powerful CPU Cores - IPC/PPC comparison


Richie Rich

Senior member
Jul 28, 2019
470
229
76
Added cores:
  • A53 - little core used in some low-end smartphones in 8-core configs (Snapdragon 450)
  • A55 - used as the little core in every modern Android SoC
  • A72 - "high-end" Cortex core used in the Snapdragon 650 or Raspberry Pi 4
  • A73 - "high-end" Cortex core
  • A75 - "high-end" Cortex core
  • Bulldozer - infamous AMD core
Geekbench 5.1 PPC chart 6/23/2020:

Pos | Man | CPU | Core | Year | ISA | GB5 Score | GHz | PPC (score/GHz) | Relative to 9900K | Relative to Zen3
1 | Nuvia | (Est.) | Phoenix (Est.) | 2021 | ARMv9.0 | 2001 | 3.00 | 667.00 | 241.0% | 194.1%
2 | Apple | A15 (est.) | (Est.) | 2021 | ARMv9.0 | 1925 | 3.00 | 641.70 | 231.8% | 186.8%
3 | Apple | A14 (est.) | Firestorm | 2020 | ARMv8.6 | 1562 | 2.80 | 558.00 | 201.6% | 162.4%
4 | Apple | A13 | Lightning | 2019 | ARMv8.4 | 1332 | 2.65 | 502.64 | 181.6% | 146.3%
5 | Apple | A12 | Vortex | 2018 | ARMv8.3 | 1116 | 2.53 | 441.11 | 159.4% | 128.4%
6 | ARM Cortex | V1 (est.) | Zeus | 2020 | ARMv8.6 | 1287 | 3.00 | 428.87 | 154.9% | 124.8%
7 | ARM Cortex | N2 (est.) | Perseus | 2021 | ARMv9.0 | 1201 | 3.00 | 400.28 | 144.6% | 116.5%
8 | Apple | A11 | Monsoon | 2017 | ARMv8.2 | 933 | 2.39 | 390.38 | 141.0% | 113.6%
9 | Intel | (Est.) | Golden Cove (Est.) | 2021 | x86-64 | 1780 | 4.60 | 386.98 | 139.8% | 112.6%
10 | ARM Cortex | X1 | Hera | 2020 | ARMv8.2 | 1115 | 3.00 | 371.69 | 134.3% | 108.2%
11 | AMD | 5900X (Est.) | Zen 3 (Est.) | 2020 | x86-64 | 1683 | 4.90 | 343.57 | 124.1% | 100.0%
12 | Apple | A10 | Hurricane | 2016 | ARMv8.1 | 770 | 2.34 | 329.06 | 118.9% | 95.8%
13 | Intel | 1065G7 | Icelake | 2019 | x86-64 | 1252 | 3.90 | 321.03 | 116.0% | 93.4%
14 | ARM Cortex | A78 | Hercules | 2020 | ARMv8.2 | 918 | 3.00 | 305.93 | 110.5% | 89.0%
15 | Apple | A9 | Twister | 2015 | ARMv8.0 | 564 | 1.85 | 304.86 | 110.1% | 88.7%
16 | AMD | 3950X | Zen 2 | 2019 | x86-64 | 1317 | 4.60 | 286.30 | 103.4% | 83.3%
17 | ARM Cortex | A77 | Deimos | 2019 | ARMv8.2 | 812 | 2.84 | 285.92 | 103.3% | 83.2%
18 | Intel | 9900K | Coffee Lake-R | 2018 | x86-64 | 1384 | 5.00 | 276.80 | 100.0% | 80.6%
19 | Intel | 10900K | Comet Lake | 2020 | x86-64 | 1465 | 5.30 | 276.42 | 99.9% | 80.5%
20 | Intel | 6700K | Skylake | 2015 | x86-64 | 1032 | 4.00 | 258.00 | 93.2% | 75.1%
21 | ARM Cortex | A76 | Enyo | 2018 | ARMv8.2 | 720 | 2.84 | 253.52 | 91.6% | 73.8%
22 | Intel | 4770K | Haswell | 2013 | x86-64 | 966 | 3.90 | 247.69 | 89.5% | 72.1%
23 | AMD | 1800X | Zen 1 | 2017 | x86-64 | 935 | 3.90 | 239.74 | 86.6% | 69.8%
24 | Apple | A13 | Thunder | 2019 | ARMv8.4 | 400 | 1.73 | 231.25 | 83.5% | 67.3%
25 | Apple | A8 | Typhoon | 2014 | ARMv8.0 | 323 | 1.40 | 230.71 | 83.4% | 67.2%
26 | Intel | 3770K | Ivy Bridge | 2012 | x86-64 | 764 | 3.50 | 218.29 | 78.9% | 63.5%
27 | Apple | A7 | Cyclone | 2013 | ARMv8.0 | 270 | 1.30 | 207.69 | 75.0% | 60.5%
28 | Intel | 2700K | Sandy Bridge | 2011 | x86-64 | 723 | 3.50 | 206.57 | 74.6% | 60.1%
29 | ARM Cortex | A75 | Prometheus | 2017 | ARMv8.2 | 505 | 2.80 | 180.36 | 65.2% | 52.5%
30 | ARM Cortex | A73 | Artemis | 2016 | ARMv8.0 | 380 | 2.45 | 155.10 | 56.0% | 45.1%
31 | ARM Cortex | A72 | Maya | 2015 | ARMv8.0 | 259 | 1.80 | 143.89 | 52.0% | 41.9%
32 | Intel | E6600 | Core2 | 2006 | x86-64 | 338 | 2.40 | 140.83 | 50.9% | 41.0%
33 | AMD | FX-8350 | BD | 2011 | x86-64 | 566 | 4.20 | 134.76 | 48.7% | 39.2%
34 | AMD | Phenom 965 BE | K10.5 | 2006 | x86-64 | 496 | 3.70 | 134.05 | 48.4% | 39.0%
35 | ARM Cortex | A57 (est.) | Atlas | n/a | ARMv8.0 | 222 | 1.80 | 123.33 | 44.6% | 35.9%
36 | ARM Cortex | A15 (est.) | Eagle | n/a | ARMv7 32-bit | 188 | 1.80 | 104.65 | 37.8% | 30.5%
37 | AMD | Athlon 64 X2 3800+ | K8 | 2005 | x86-64 | 207 | 2.00 | 103.50 | 37.4% | 30.1%
38 | ARM Cortex | A17 (est.) | n/a | n/a | ARMv7 32-bit | 182 | 1.80 | 100.91 | 36.5% | 29.4%
39 | ARM Cortex | A55 | Ananke | 2017 | ARMv8.2 | 155 | 1.60 | 96.88 | 35.0% | 28.2%
40 | ARM Cortex | A53 | Apollo | 2012 | ARMv8.0 | 148 | 1.80 | 82.22 | 29.7% | 23.9%
41 | Intel | Pentium D | P4 | 2005 | x86-64 | 228 | 3.40 | 67.06 | 24.2% | 19.5%
42 | ARM Cortex | A7 (est.) | Kingfisher | n/a | ARMv7 32-bit | 101 | 1.80 | 56.06 | 20.3% | 16.3%
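For reference, a minimal sketch (Python) of how the derived columns above appear to be computed: PPC = GB5 score / clock, and the two relative columns divide by the 9900K and the Zen 3 (estimated) PPC. The three rows below are copied straight from the table:

```python
# Derived columns of the PPC table: PPC = GB5 score / clock,
# relative columns = PPC / reference PPC (9900K and the Zen 3 estimate).
rows = {  # name: (GB5 score, GHz) -- values copied from the table above
    "Apple A13 (Lightning)":     (1332, 2.65),
    "Intel 9900K (Coffee Lake)": (1384, 5.00),
    "AMD 5900X (Zen 3, est.)":   (1683, 4.90),
}
ppc = {name: score / ghz for name, (score, ghz) in rows.items()}
ref_9900k = ppc["Intel 9900K (Coffee Lake)"]
ref_zen3 = ppc["AMD 5900X (Zen 3, est.)"]
for name, value in ppc.items():
    print(f"{name}: PPC {value:.2f}, "
          f"{value / ref_9900k:.1%} of 9900K, {value / ref_zen3:.1%} of Zen 3")
```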









TOP 10 - Performance Per Area comparison at ISO-clock (PPA/GHz)

Copied from the locked thread. They're trying to keep people from seeing this comparison of how bad x86 looks.

Pos | Man | CPU | Core | Core Area (mm²) | Year | ISA | SPEC PPA/GHz | Relative
1 | ARM Cortex | A78 | Hercules | 1.33 | 2020 | ARMv8 | 9.41 | 100.0%
2 | ARM Cortex | A77 | Deimos | 1.40 | 2019 | ARMv8 | 8.36 | 88.8%
3 | ARM Cortex | A76 | Enyo | 1.20 | 2018 | ARMv8 | 7.82 | 83.1%
4 | ARM Cortex | X1 | Hera | 2.11 | 2020 | ARMv8 | 7.24 | 76.9%
5 | Apple | A12 | Vortex | 4.03 | 2018 | ARMv8 | 4.44 | 47.2%
6 | Apple | A13 | Lightning | 4.53 | 2019 | ARMv8 | 4.40 | 46.7%
7 | AMD | 3950X | Zen 2 | 3.60 | 2019 | x86-64 | 3.02 | 32.1%
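The PPA/GHz column is not defined explicitly in the post; the numbers are consistent with reading it as SPEC score divided by (core area x clock), so here is a hedged sketch of that reading. The SPEC score used below is purely illustrative, not a value taken from the table:

```python
# Assumed reading of the PPA/GHz column: SPEC score / (core area in mm^2 * clock in GHz).
# The score below is a made-up illustrative value, not data from the table.
def ppa_per_ghz(spec_score: float, core_area_mm2: float, clock_ghz: float) -> float:
    """Performance per area at iso-clock, under the assumed definition."""
    return spec_score / (core_area_mm2 * clock_ghz)

# Example: a hypothetical core scoring 37.5 on SPEC, 1.33 mm^2 in size, running at 3.0 GHz
print(round(ppa_per_ghz(37.5, 1.33, 3.0), 2))   # ~9.40, in the range of the top rows above
```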



It's impressive how fast the generic Cortex cores are evolving:
  • The A72 (2015), which can be found in most SBCs, has 1/3 of the IPC of the new Cortex-X1 - they tripled IPC in just 5 years.
  • The A73 and A75 (2017), which sit inside the majority of Android smartphones today, have 1/2 the IPC of the new Cortex-X1 - they doubled IPC in 3 years.

Comparing x86 vs. Cortex cores:
  • The A75 (2017) compared to Zen 1 (2017) loses a massive 34% in PPC to x86. As expected.
  • The A77 (2019) compared to Zen 2 (2019) closed the gap and is equal in PPC. Surprising - the Cortex cores have caught the x86 cores.
  • The X1 (2020) is another +30% IPC over the A77. Zen 3 needs to bring a 30% IPC jump to stay on par with the X1.

Comparison to Apple cores:
  • AMD's Zen 2 core is slower than Apple's A9 from 2015... so AMD is 4 years behind Apple.
  • Intel's Sunny Cove core in Ice Lake is slower than Apple's A10 from 2016... so Intel is 3 years behind Apple.
  • The Cortex-A77 core is slower than Apple's A9 from 2015... but
  • the new Cortex-X1 core is slower than Apple's A11 from 2017, so Arm Ltd. is 3 years behind Apple and getting closer.



GeekBench 5.1 comparison from 6/22/2020:
  • added Cortex-X1 and A78 performance projections from Andrei here
  • 2020: awaiting the new Apple A14 Firestorm core and the Zen 3 core
Updated:



EDIT:
Please note, to stop the endless discussion about PPC frequency scaling: to keep the comparison fair and clean, I will use only the top (highest-clocked) version of each core as the representative of its peak performance.
 
Last edited:
Reactions: chechito

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Wow Richie... Just wow. The frequency cap of any given node (for each architecture) is related to the maximum stable operating frequency of the slowest critical path. It is instruction-set agnostic. However, I will grant you that having a front end that has to decode CISC instructions into essentially a bunch of RISC instructions, to then send the final micro-ops into the various execution units, serves as a complexity barrier, creating complex critical paths of its own that can put a cap on frequencies. So, yes, x86, being more complex, can effectively limit frequency on a given node, but it is not, itself, the cause of that cap.
No, x86 is NOT about the critical path anymore. Since Sandy Bridge reached 5.0 GHz on 32nm, every subsequent node (22nm, 14nm) has brought faster and more efficient transistor switching, so the critical path can clock higher and higher. Other IPs like the ARM cores are increasing frequency just fine, so we know the physics works. But the reality is that 5 GHz is still the cap for x86. How is that?

Maybe ask any overclocker about thermal limits and degradation.
Maybe think about the fact that every new node allows twice as many transistors in the same area while decreasing power consumption by only ~30% - which leads to higher heat density per mm².

As an example, TSMC 5nm:
  • +84% more transistors, -30% power consumption or +15% frequency gain
  • at iso-clock 5 GHz: 1.84 * 0.7 = 1.288x higher heat density per mm²
  • at +15% frequency gain (5.75 GHz -> power 1.15^3 = 1.52x higher power consumption): 1.84 * 1.52 = 2.8x higher heat density per mm² (see the sketch at the end of this post)


This is the answer to why x86 cannot go above 5 GHz, and why a new node like Intel's 10nm shows a massive frequency dip. Even AMD's Norrod mentioned that we can expect decreasing frequencies. They are too lazy to rebuild the core to be more power efficient, able to cope with the heat density, and scale up to 6 GHz. So they have been stuck at 5 GHz for a DECADE, jeez!

Meanwhile the super-efficient ARM cores are not heat-limited yet, so they keep increasing frequency with every new node DESPITE massive IPC gains:
  • A11@10nm... 2.4 GHz
  • A12@7nm ... 2.54 GHz
  • A13@N7P.... 2.64 GHz

In other words, a power-efficient ARM core designed for high frequency would run around 6 GHz today (a core that is twice as power efficient gets 2^(1/3) = 1.26x higher frequency, i.e. 5 * 1.26 = 6.3 GHz, when hitting the same thermal limit). x86's inefficiency hit thermal limits a decade ago, not speed limits. Laziness, and earning easy money while AMD was down with Bulldozer.
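For reference, a minimal sketch that reproduces the arithmetic above under the post's own assumptions (dynamic power scaling roughly with f^3 for a fixed design, and the quoted TSMC 5nm figures); the 6.3 GHz number additionally assumes a core that is exactly twice as power efficient:

```python
# Heat-density and frequency-headroom arithmetic from the post above.
# Assumptions (not measured data): power ~ f^3 at fixed design;
# TSMC 5nm: 1.84x density, 0.7x power at iso-clock, or +15% frequency instead.

density_gain = 1.84          # 84% more transistors per mm^2
iso_clock_power = 0.70       # -30% power at the same clock

# Heat density per mm^2 relative to the previous node:
print(density_gain * iso_clock_power)       # ~1.29x at iso-clock (5 GHz)
freq_gain = 1.15
print(density_gain * freq_gain ** 3)        # ~2.8x if the +15% clock is taken instead

# Frequency headroom of a 2x more power-efficient core at the same thermal limit:
efficiency_gain = 2.0
print(5.0 * efficiency_gain ** (1 / 3))     # ~6.3 GHz vs. a 5 GHz baseline
```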
 
Reactions: Antey

LightningZ71

Golden Member
Mar 10, 2017
1,651
1,937
136
What does SMT performance have to do with single-threaded performance, which is what we are discussing? All your anecdote about SMT4 says is that SMT (even SMT2) is a stupid idea -- which I have been saying for years!
(There IS a very specific way to implement something that superficially looks like SMT, but which avoids the flaws of standard SMT, but that's for another thread.)

It is certainly true that extracting more performance from going wider requires exponentially more transistors. However since every process advance (for those of us still ON process advances) provides exponentially more transistors, this is not exactly a problem.

Will it last forever? No, of course not. No-one is claiming that.
What's being claimed is that RIGHT NOW, and for the foreseeable future, going wider is a better bet than higher frequency.

As for Intel, you are letting them off way too easy. Intel's failures are the result of a deliberate choice to prioritize finance and marketing over daring engineering. A strategy that works great if your goal is to make lots of money today -- and not so great if your goal is to keep the company relevant for the next twenty years...

FOR EXAMPLE Intel had the chance, when they decided to create a mobile chip, to
- go with full x86
- go with a simplified x86; call it x86v8! Something that would make the job of porting compilers, libraries, OS's easy, but not be binary compatible
- go with a start from scratch modern-design ISA
They CHOSE the first option. No-one forced them to. There was no body of existing code that they had to support. They could have done anything, but they chose the cheap, easy option. Plenty of us at the time said that was stupid, said it would fail in exactly the ways it did fail.
This was not "who could possibly have predicted?", it was very stupid very greedy management making bad decisions.

SMT performance certainly impacts single thread performance in moderate to heavy load situations. In situations where the second thread isn't scheduled, the CPU's own handling of the second thread won't cause a noticeable performance hit. However, in situations where both threads are at least moderately utilized, overhead certainly creeps into the picture. That overhead negatively impacts the compute throughput of each thread. There is contention for execution units and memory ports. All of that adds up. In addition, with SMT enabled, the various execution units are more heavily utilized, increasing heat generation. That heat can only leave the core so fast, and, as a result, negatively affects the ability of the processor to boost (assuming that it's operating near its thermal limits, which is often not the case in server CPUs). Those execution units being more utilized also eats up power budget, which can also affect boosting behavior.

SMT is not a free lunch. Its main purpose was twofold: to make better utilization of the extra execution units in overly wide CPUs, and to hide memory access latency (when one thread stalls on a memory read, the other thread can be dynamically allocated more execution units and continue working while the first thread is context-switched out, or waits for the read from cache). In situations where there is high near-cache locality for both threads, and both threads are highly branchy or are contending for a limited execution-unit resource, the net effect can be to reduce performance for both threads below 50% of the core's single-thread capability, due to the admin overhead of managing the threads.

As for going wider not being an issue, even with the ever increasing transistor counts for those cores, it very much IS a problem. Those transistors are NOT free. They all represent heat sources, power sinks, AND they continue to increase the relative size of the whole processor. One of the advantages given for new nodes is that you can get more of the same die design off of a given wafer. If you continue to make the CPU larger with respect to transistor count, you negate that advantage. Each node is getting more complex and more expensive per wafer. You have to get higher yield from each wafer to be able to break even. Every additional transistor is also a new potential point of failure, meaning that you are under more and more yield pressure. Again, NO FREE LUNCH.

Can going wider show an improvement right now? Yes. I'm not disputing that. I'm challenging the notion that going VERY wide and trying to make SMT4 work well is the way to go. People dump on x86-64 because of how complex the front end of the CPU has to be to get good performance. You don't think that trying to do SMT4 and having to deal with essentially a doubling of the front end and loads more buffers is going to come for free? There's a reason that there are not a whole lot of SMT ARM implementations out there, and it's not just because the companies are short sighted...
 
Reactions: Tlh97 and Carfax83

LightningZ71

Golden Member
Mar 10, 2017
1,651
1,937
136
No, x86 is NOT about the critical path anymore. Since Sandy Bridge reached 5.0 GHz on 32nm, every subsequent node (22nm, 14nm) has brought faster and more efficient transistor switching, so the critical path can clock higher and higher. Other IPs like the ARM cores are increasing frequency just fine, so we know the physics works. But the reality is that 5 GHz is still the cap for x86. How is that?

Maybe ask any overclocker about thermal limits and degradation.
Maybe think about the fact that every new node allows twice as many transistors in the same area while decreasing power consumption by only ~30% - which leads to higher heat density per mm².

As an example, TSMC 5nm:
  • +84% more transistors, -30% power consumption or +15% frequency gain
  • at iso-clock 5 GHz: 1.84 * 0.7 = 1.288x higher heat density per mm²
  • at +15% frequency gain (5.75 GHz -> power 1.15^3 = 1.52x higher power consumption): 1.84 * 1.52 = 2.8x higher heat density per mm²


This is the answer to why x86 cannot go above 5 GHz, and why a new node like Intel's 10nm shows a massive frequency dip. Even AMD's Norrod mentioned that we can expect decreasing frequencies. They are too lazy to rebuild the core to be more power efficient, able to cope with the heat density, and scale up to 6 GHz. So they have been stuck at 5 GHz for a DECADE, jeez!

Meanwhile the super-efficient ARM cores are not heat-limited yet, so they keep increasing frequency with every new node DESPITE massive IPC gains:
  • A11@10nm... 2.4 GHz
  • A12@7nm ... 2.54 GHz
  • A13@N7P.... 2.64 GHz

In other words, a power-efficient ARM core designed for high frequency would run around 6 GHz today (a core that is twice as power efficient gets 2^(1/3) = 1.26x higher frequency, i.e. 5 * 1.26 = 6.3 GHz, when hitting the same thermal limit). x86's inefficiency hit thermal limits a decade ago, not speed limits. Laziness, and earning easy money while AMD was down with Bulldozer.

Richie, you just referenced asking an overclocker about thermal limits and degradation in a thread about 5 GHz being a hard frequency cap for x86, yet it's overclockers that have been demonstrating that, with LN2, x86 can run at 8+ GHz, and they did this years ago. The limit isn't the x86 instruction set. The limits are the lithography - how efficient it is at dissipating heat and how effective it is at preventing electromigration - and the absolute switching speed of the various transistors, coupled with how well the design manages the flow of electrons across its medium. x86 has lived in a world where power is abundant and cooling is effective and ACTIVE, not passive, for all but its lowest-power designs. Given that world, they designed for that sort of target, using the fact that the silicon was still capable of reliably switching at high speeds at those higher thermal and power levels on the silicon power/performance curve. Apple ARM has been focusing on staying on the lower end of that curve because they don't have active cooling in any of the devices their chips have been targeted at.

The two different approaches are targeted at two different design targets. Intel learned that you can try to push this too far with the Pentium IV, and had to go wider and slower with the Core architecture. They're realizing that they have to do it again, as their next generation of architectures are also looking to be wider.

Apple can, and likely will, make the decision to optimize future A-series processors for higher thermal envelope targets, while ALSO keeping a line for mobile devices that are entirely thermally constrained and passively cooled. There are tweaks that can be made to the libraries used during the development phase for each node that can enhance frequency, or density, or the power/thermal profile. While you can also enable higher frequencies by lengthening pipelines, I doubt that Apple will do that for their desktop-specific SKUs, as it would have other effects on software performance profiling that would incur other costs during software development.

To your point about a power-efficient ARM core designed for high frequency being capable of 6 GHz: there's nothing in the instruction set that makes that more or less possible. Could someone design an ARM core that can reach 6 GHz with active cooling? If thermals and power were no object, then sure. However, looking at the leading-edge nodes that are out there, the power and thermal curve gets really, really ugly that far over to the right. That's a function of the lithography. Your math completely overlooks the fact that the power/performance curve is just that, a curve, not a straight line. To give you an example: when you look at power vs. clock curves for AMD processors, you see that, for each 100 MHz over around 4.2 GHz, power consumption and heat generation increase at an ever-increasing rate. The processor isn't magically growing new transistors to consume that power; it's just getting that much harder to make the transistors switch faster and faster. You can relax this somewhat by loosening up the design rules for the node, but there are limits.

Apple, Samsung, Mediatek, they've all been living on the left side of the power/performance curve for well over a decade in an all-in effort to keep power use and thermal output under tight control. You can see some of the late model "high performance" flagship handsets push this curve a bit too far and start to suffer from heat soak and throttling in several of Anandtech's reviews. You sing the praises of ARM without taking into account the targeted platform for most of the designs that you are trotting out as paragons of circuit design virtue.

If you want to keep power and heat in check, you go wider and wider, and add more and more cores at those low clock frequencies to get the work done that you need to. Eventually, you run out of parallel workloads. What then? Why is it that Apple's leading mobile SoCs have had only two HP cores for a while now and 4 LP cores? It's because they optimize for the platform, phone handsets are highly single app focused. You don't need more and more HP cores to make a single app perform. So, if you're only going to have two cores, and you can't go very far to the right on the power/performance curve, the only thing left that you can do is deepen and enlarge the caches as well as throw the kitchen sink at the HP core by widening it as much as is practical to get the IPC sky high. And, here we are.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Richie, you just referenced asking an overclocker about thermal limits and degradation in a thread about 5 GHz being a hard frequency cap for x86, yet it's overclockers that have been demonstrating that, with LN2, x86 can run at 8+ GHz, and they did this years ago. The limit isn't the x86 instruction set. The limits are the lithography - how efficient it is at dissipating heat and how effective it is at preventing electromigration - and the absolute switching speed of the various transistors, coupled with how well the design manages the flow of electrons across its medium. x86 has lived in a world where power is abundant and cooling is effective and ACTIVE, not passive, for all but its lowest-power designs. Given that world, they designed for that sort of target, using the fact that the silicon was still capable of reliably switching at high speeds at those higher thermal and power levels on the silicon power/performance curve. Apple ARM has been focusing on staying on the lower end of that curve because they don't have active cooling in any of the devices their chips have been targeted at.

The two different approaches are targeted at two different design targets. Intel learned that you can try to push this too far with the Pentium IV, and had to go wider and slower with the Core architecture. They're realizing that they have to do it again, as their next generation of architectures are also looking to be wider.

Apple can, and likely will, make the decision to optimize future A-series processors for higher thermal envelope targets, while ALSO keeping a line for mobile devices that are entirely thermally constrained and passively cooled. There are tweaks that can be made to the libraries used during the development phase for each node that can enhance frequency, or density, or the power/thermal profile. While you can also enable higher frequencies by lengthening pipelines, I doubt that Apple will do that for their desktop-specific SKUs, as it would have other effects on software performance profiling that would incur other costs during software development.

To your point about a power-efficient ARM core designed for high frequency being capable of 6 GHz: there's nothing in the instruction set that makes that more or less possible. Could someone design an ARM core that can reach 6 GHz with active cooling? If thermals and power were no object, then sure. However, looking at the leading-edge nodes that are out there, the power and thermal curve gets really, really ugly that far over to the right. That's a function of the lithography. Your math completely overlooks the fact that the power/performance curve is just that, a curve, not a straight line. To give you an example: when you look at power vs. clock curves for AMD processors, you see that, for each 100 MHz over around 4.2 GHz, power consumption and heat generation increase at an ever-increasing rate. The processor isn't magically growing new transistors to consume that power; it's just getting that much harder to make the transistors switch faster and faster. You can relax this somewhat by loosening up the design rules for the node, but there are limits.

Apple, Samsung, Mediatek, they've all been living on the left side of the power/performance curve for well over a decade in an all-in effort to keep power use and thermal output under tight control. You can see some of the late model "high performance" flagship handsets push this curve a bit too far and start to suffer from heat soak and throttling in several of Anandtech's reviews. You sing the praises of ARM without taking into account the targeted platform for most of the designs that you are trotting out as paragons of circuit design virtue.

If you want to keep power and heat in check, you go wider and wider, and add more and more cores at those low clock frequencies to get the work done that you need to. Eventually, you run out of parallel workloads. What then? Why is it that Apple's leading mobile SoCs have had only two HP cores for a while now and 4 LP cores? It's because they optimize for the platform, phone handsets are highly single app focused. You don't need more and more HP cores to make a single app perform. So, if you're only going to have two cores, and you can't go very far to the right on the power/performance curve, the only thing left that you can do is deepen and enlarge the caches as well as throw the kitchen sink at the HP core by widening it as much as is practical to get the IPC sky high. And, here we are.
Too much writing trying to hide garbage under the carpet.

Answer this:

"Double power efficient core designed for high frequency would run around 6 GHz today (twice power efficient core 2^(1/3) = 1.26x higher freq = 5*1.26 = 6.3 Ghz before hitting same thermal limit as inefficient core at 5 GHz)."

Is that true or not?
 

Hitman928

Diamond Member
Apr 15, 2012
5,527
8,602
136
Too much writing trying to hide garbage under the carpet.

Answer this:

"Double power efficient core designed for high frequency would run around 6 GHz today (twice power efficient core 2^(1/3) = 1.26x higher freq = 5*1.26 = 6.3 Ghz before hitting same thermal limit as inefficient core at 5 GHz)."

Is that true or not?

Extremely unlikely.
 

naukkis

Senior member
Jun 5, 2002
754
625
136
As for going wider not being an issue, even with the ever increasing transistor counts for those cores, it very much IS a problem. Those transistors are NOT free. They all represent heat sources, power sinks, AND they continue to increase the relative size of the whole processor. One of the advantages given for new nodes is that you can get more of the same die design off of a given wafer. If you continue to make the CPU larger with respect to transistor count, you negate that advantage. Each node is getting more complex and more expensive per wafer. You have to get higher yield from each wafer to be able to break even. Every additional transistor is also a new potential point of failure, meaning that you are under more and more yield pressure. Again, NO FREE LUNCH.

Yep, for the Core 2 design Intel stated that they had a rule that anything implemented in silicon needed to be power efficient - something like: for every 1% more power used, they wanted at least 2% more performance, or so.

-> With Sunny Cove they did not use that method. It's an extremely deep core - a 50% larger ROB compared to Skylake, Zen 2 and the fastest ARMs, which surely increases the active transistor count per cycle a lot, plus many other structures that have grown similarly - and yet the IPC growth was only 18%. It's not exactly a surprise that a core designed like that isn't very power efficient. And Intel won't stop there; as Keller stated, future Cove revisions will go as deep as an 800-instruction ROB.

-> There might again need to be some serious rethinking at Intel - extracting IPC with an insane number of transistors won't bring them anything but useless cores. Like what happened before - the too-fat iAPX 432, Tejas, EV8 and AMD K8 designs were axed by someone who actually simulated those designs and found them either too big to operate at the needed frequencies or too power hungry.
 

Doug S

Platinum Member
Feb 8, 2020
2,430
3,933
136
That was my point. I have never heard of a high core count CPU with just an L1 and L2 cache. They always add an L3/L4 or something.

That has nothing to do with high core count requiring L3/L4, it has to do with having the massive transistor budget in the billions required to put many cores on the same CPU. Those transistor budgets have also expanded the levels of cache because you can design the different levels with different specialties, optimized for less latency in the lower levels and greater density in the higher levels.

Yes, it also helps to manage access from all the cores, but that can be, and has been, done with cores having direct access to main memory in the past. But if you have billions of transistors, it improves performance to have a cache that helps reduce those latencies on data being shared between cores.

This is once again arguing something like "I have never seen a car with airbags that didn't also have antilock brakes". There's a very good reason for that, but it doesn't indicate that airbags only work in a car with ABS.

If you consider "high core count CPUs" to include CPUs designed to go into servers with a lot of sockets from the old days of one CPU core per chip then you will find many such examples of CPUs with only L1 or only L1/L2, a couple of which I provided.
 
Reactions: Carfax83

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,722
14,754
136
Too much writing trying to hide garbage under the carpet.

Answer this:

"Double power efficient core designed for high frequency would run around 6 GHz today (twice power efficient core 2^(1/3) = 1.26x higher freq = 5*1.26 = 6.3 Ghz before hitting same thermal limit as inefficient core at 5 GHz)."

Is that true or not?
I have refrained from posting here. But in light of the most recent posts, you really need to read what others have said. The biggest thing wrong with your posts is that you cannot extrapolate between small, wide cores and big, narrower cores, or between low power usage and high clock speed. Everything you are trying to say is based on extrapolation AND YOU CAN'T DO THAT.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
I have refrained from posting here. But in light of the most recent posts, you really need to read what others have said. The biggest thing wrong with your posts is that you cannot extrapolate between small, wide cores and big, narrower cores, or between low power usage and high clock speed. Everything you are trying to say is based on extrapolation AND YOU CAN'T DO THAT.
You will find that HE CAN DO THAT.
 

Rigg

Senior member
May 6, 2020
474
1,000
136
This thread had me curious, so I downloaded Geekbench 5.2 and ran it in Windows 10 with some different fixed clock multipliers on my Ryzen 5 3600.

2.2 Ghz - 310.91 PPC


2.65 Ghz - 309.06 PPC


4.55 Ghz (CCX0) 4.5 Ghz (CCX1) 302.42 PPC


I'm not sure what can be drawn from this other than:
A) using 4.7 GHz for the 3950X to calculate the PPC metric is flawed
B) Geekbench scales pretty linearly

Clearly Ryzen CPUs don't hold a steady max boost in single-threaded workloads. Anyone who has owned one could tell you that. This blatant oversight doesn't diminish the fact that the A13 is pretty impressive in this test, though. Now I can brag to my friends about how my new iPhone SE slays in Geekbench.
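For reference, a minimal sketch of what those three runs imply about clock scaling, using only the PPC figures reported above (the underlying scores are back-calculated as PPC x clock, so treat them as approximate):

```python
# Back-of-envelope check of how close Geekbench 5 scaling is to linear,
# using only the PPC (points/GHz) figures reported above for the Ryzen 5 3600.
runs = {2.20: 310.91, 2.65: 309.06, 4.55: 302.42}  # GHz -> points per GHz

base_ppc = runs[2.20]
for clock, ppc in runs.items():
    implied_score = ppc * clock                 # approximate GB5 ST score
    vs_linear = ppc / base_ppc                  # 1.0 would be perfectly linear scaling
    print(f"{clock:.2f} GHz: ~{implied_score:.0f} pts, {vs_linear:.1%} of the 2.2 GHz PPC")
```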
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
This thread had me curious, so I downloaded Geekbench 5.2 and ran it in Windows 10 with some different fixed clock multipliers on my Ryzen 5 3600.

2.2 Ghz - 310.91 PPC


2.65 Ghz - 309.06 PPC


4.55 Ghz (CCX0) 4.5 Ghz (CCX1) 302.42 PPC


I'm not sure what can be drawn from this other than:
A) using 4.7 GHz for the 3950X to calculate the PPC metric is flawed
B) Geekbench scales pretty linearly

Clearly Ryzen CPUs don't hold a steady max boost in single-threaded workloads. Anyone who has owned one could tell you that. This blatant oversight doesn't diminish the fact that the A13 is pretty impressive in this test, though. Now I can brag to my friends about how my new iPhone SE slays in Geekbench.
It's NOT flawed. Your 302 pts/GHz vs. the 286 pts/GHz in my table is a marginal difference (5.5%). There is also a big question about memory (frequency, latency). There are many ways to manually tweak a system in the BIOS to get peak performance while sacrificing stability.

BTW, I used 4.6 GHz for the 3950X, as Andrei suggested during his SPEC testing. It's also mentioned in the table. Please read first before you talk garbage.

Your tweaked 302 pts/GHz doesn't help much. The Apple A13 still has 502 pts/GHz, light years away from any x86 system. Even with 302 pts/GHz, Ryzen cannot move above the Apple A9 from 2015. What a shame.

Pos | Man | CPU | Core | Year | ISA | GB5 Score | GHz | PPC (score/GHz) | Relative to A13 | Relative to 9900K
1 | Apple | A13 | Lightning | 2019 | ARMv8 | 1332 | 2.65 | 502.64 | 100% | 182%
2 | Apple | A12 | Vortex | 2018 | ARMv8 | 1116 | 2.53 | 441.11 | 88% | 159%
3 | Apple | A11 | Monsoon | 2017 | ARMv8 | 933 | 2.39 | 390.38 | 78% | 141%
4 | ARM Cortex | X1 | Hera | 2020 | ARMv8 | 1115 | 3.00 | 371.69 | 74% | 134%
5 | Apple | A10 | Hurricane | 2016 | ARMv8 | 770 | 2.34 | 329.06 | 65% | 119%
6 | Intel | 1065G7 | Icelake | 2019 | x86-64 | 1252 | 3.90 | 321.03 | 64% | 116%
7 | ARM Cortex | A78 | Hercules | 2020 | ARMv8 | 918 | 3.00 | 305.93 | 61% | 111%
8 | Apple | A9 | Twister | 2015 | ARMv8 | 564 | 1.85 | 304.86 | 61% | 110%
9 | AMD | 3950X | Zen 2 | 2019 | x86-64 | 1317 | 4.60 | 286.30 | 57% | 103%
10 | ARM Cortex | A77 | Deimos | 2019 | ARMv8 | 812 | 2.84 | 285.92 | 57% | 103%
11 | Intel | 9900K | Skylake | 2018 | x86-64 | 1384 | 5.00 | 276.80 | 55% | 100%
12 | AMD | 1800X | Zen 1 | 2017 | x86-64 | 1073 | 3.90 | 275.13 | 55% | 99%
13 | ARM Cortex | A76 | Enyo | 2018 | ARMv8 | 720 | 2.84 | 253.52 | 50% | 92%
14 | Intel | 4770K | Haswell | 2013 | x86-64 | 966 | 3.90 | 247.69 | 49% | 89%
15 | Apple | A8 | Typhoon | 2014 | ARMv8 | 323 | 1.40 | 230.71 | 46% | 83%
16 | Intel | 3770K | Ivy Bridge | 2012 | x86-64 | 764 | 3.50 | 218.29 | 43% | 79%
17 | Apple | A7 | Cyclone | 2013 | ARMv8 | 270 | 1.30 | 207.69 | 41% | 75%
18 | Intel | 2700K | Sandy Bridge | 2011 | x86-64 | 723 | 3.50 | 206.57 | 41% | 75%
19 | ARM Cortex | A75 | Prometheus | 2017 | ARMv8 | 505 | 2.80 | 180.36 | 36% | 65%
20 | ARM Cortex | A73 | Artemis | 2016 | ARMv8 | 380 | 2.45 | 155.10 | 31% | 56%
21 | Intel | E6600 | Core2 | 2006 | x86-64 | 338 | 2.40 | 140.83 | 28% | 51%
22 | AMD | FX-8350 | BD | 2011 | x86-64 | 566 | 4.20 | 134.76 | 27% | 49%
23 | AMD | Phenom 965 BE | K10.5 | 2006 | x86-64 | 496 | 3.70 | 134.05 | 27% | 48%
24 | ARM Cortex | A72 | Maya | 2015 | ARMv8 | 260 | 2.00 | 130.00 | 26% | 47%
25 | AMD | Athlon 64 X2 3800+ | K8 | 2005 | x86-64 | 207 | 2.00 | 103.50 | 21% | 37%
26 | ARM Cortex | A55 | Ananke | 2017 | ARMv8 | 178 | 1.80 | 98.67 | 20% | 36%
27 | ARM Cortex | A53 | Apollo | 2012 | ARMv8 | 148 | 1.80 | 82.22 | 16% | 30%
28 | Intel | Pentium D | P4 | 2005 | x86-64 | 228 | 3.40 | 67.06 | 13% | 24%






I'm afraid those graphs show the massive problems that x86 is facing now and in the near future:
  • This year Apple's A14 will take the lead in absolute ST performance.
  • Next year the generic Cortex-X2, based on Matterhorn with SVE2 vectors up to 2048-bit wide, will also reach desktop levels.
  • Look at the difference in development rate. The ARM IPs are delivering performance gains 3x faster than x86.
  • Look at how Ice Lake delivered 18% more IPC while using 38% more transistors - this inefficient, brute-force approach resulted in a massive clock decrease.

 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
And take a look at the CPU frequency evolution. The lazy x86 vendors have been stuck in the 4-5 GHz range for a decade, while the ARM cores are increasing clocks constantly.

  • We can extrapolate that in 2-3 years ARM CPUs will reach the 4 GHz barrier, thanks to efficient and smart design and hence no thermal limitation.
  • We can extrapolate that x86 CPUs will decrease frequency down to 4 GHz due to massive thermal-density problems and an inability to re-design the microarchitecture (an inability to change the philosophy of their CPU design targets).
  • With both x86 and ARM running around 4 GHz and ARM having 2x the IPC, there is no doubt who has a future and who is going to die like PowerPC.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
I have refrained from posting here. But in light of the most recent posts, you really need to read what others have said. The biggest thing wrong with your posts is that you cannot extrapolate between small, wide cores and big, narrower cores, or between low power usage and high clock speed. Everything you are trying to say is based on extrapolation AND YOU CAN'T DO THAT.
Mark, Mark, Mark. We all know that you were wrong in the Graviton2 thread. And you are wrong again. That equation is based on physical law. I know that some people ignore physical laws and claim the Earth is flat etc., but that's their problem, not mine. And with knowledge of the laws you can extrapolate. The whole of engineering is based on extrapolation.

In the same way, most people were saying that Apple's IPC cannot be compared, or that SPEC/GeekBench cross-ISA performance comparisons are flawed, or that Apple would never move to ARM because there is no real performance there. Well, today we know Apple is moving its ENTIRE billion-dollar business to ARM.

x86 world = body-builder competition
ARM world = MMA UFC cage

Intel was earning huge money thanks to its monopoly, doing nothing, eating cheeseburgers and getting lazy instead of working out in the gym. AMD was attending the gym every day and finally got into good shape again. But the ARM cores are fighting very hard in the MMA cage (tight TDP limits, a big push for efficiency) and training BJJ three times a day. Each world had its champion, and that was fine while they were separated. But now, when you put a body-builder in the UFC cage, he will receive an RNC or arm bar very fast. And that is what's happening right now in servers, and what Apple is doing with its entire business. ARM is doing smart and deadly jiu-jitsu moves, and x86 has no clue how to defend because simple high-frequency brute force doesn't work anymore.

Everyone who trains BJJ knows the difference.
 
Last edited:

name99

Senior member
Sep 11, 2010
429
324
136
I have refrained from posting here. But in light of the most recent posts, you really need to read what others have said. The biggest thing wrong with your posts is that you cannot extrapolate between small, wide cores and big, narrower cores, or between low power usage and high clock speed. Everything you are trying to say is based on extrapolation AND YOU CAN'T DO THAT.

You don't need extrapolation to
- compare benchmark numbers from existing iPad Pro against existing Macs (from MBA up to iMac Pro and Mac Pro)
- to compare the power numbers in each case

We know how this plays out!
- iPad Pro single threaded beats almost any Intel CPU for most tasks. Intel still has a small advantage if your form factor can sustain running at 5GHz+ (about 4GHz+ for Ice Lake), or if you can make aggressive use of AVX512, or if you make aggressive use of AES.
- iPad Pro scales slightly better than Intel up to the 4+4 level (compared to say Intel 4 core+SMT)

Here's an example:

OK, that's state of the art today. Extrapolation comes from
- can Apple add more cores? Why not?
- will adding more cores hurt them? Maybe, but the evidence TODAY is that whatever they need to scale from one to many-core (cache structure, memory controller, NoC, memory ordering constraints, ...) works better than Intel's equivalent on the same problem.
- can Apple boost performance for the A14X/Z? Why not? There's no reason they can't pick up the 20% or so that's already there in the A13 over the A12; presumably they can (out of sheer laziness) just pick up the iso-power faster transistors of TSMC 5nm.

- can Apple boost performance via IPC? OK, that's the most risky extrapolation, but of course the most interesting. If they have, say, 80% more 5nm transistors, can they do something useful with them? I cannot PROVE that they can. What I can do is say
+ they have done so reliably for quite a few years now
+ there are probably still a few known techniques for improving IPC that they are not yet using. It's hard to be sure because we have no idea quite what they are using today. But there is a constant stream of good ideas (that "just" require more transistors) coming out of academia for better cache placement/replacement algorithms, better prefetch, better branch prediction, compressed LLC, long-term parking, ...
This is even apart from the obvious wins they could (probably have) added like SVE, and AMX (still there on A13, still, apparently unused by anything -- I'm guessing LLVM support is still not yet quite ready for the public.)
+ Apple chose this year to make the ARM switch. They didn't have to do that. They could have delayed. Which suggests they know how well A14X/Z stacks up against Ice Lake (and probably Tiger Lake)...

- can Intel boost performance via frequency? Once again, who knows? But once again, look at the tape. If you think Apple's track record is irrelevant, then tomorrow everything could change, and likewise for Intel.
If you think that tomorrow Intel will be shipping 5 GHz on 10nm, and in 2022 6 GHz on 7nm, while Apple's IPC for the A14 will be no more than, say, 5% over the A13 - sure, go right ahead, tell us that. But hemming and hawing about "you can't extrapolate, no-one knows the future!", give me a break. We ALL KNOW the future is uncertain. But we also all make plans based on reasonable extrapolations...
 
Reactions: Richie Rich

name99

Senior member
Sep 11, 2010
429
324
136
This thread had me curious, so I downloaded Geekbench 5.2 and ran it in Windows 10 with some different fixed clock multipliers on my Ryzen 5 3600.

2.2 Ghz - 310.91 PPC


2.65 Ghz - 309.06 PPC


4.55 Ghz (CCX0) 4.5 Ghz (CCX1) 302.42 PPC


I'm not sure what can be drawn from this other than:
A) using 4.7 GHz for the 3950X to calculate the PPC metric is flawed
B) Geekbench scales pretty linearly

Clearly Ryzen CPUs don't hold a steady max boost in single-threaded workloads. Anyone who has owned one could tell you that. This blatant oversight doesn't diminish the fact that the A13 is pretty impressive in this test, though. Now I can brag to my friends about how my new iPhone SE slays in Geekbench.

This insistence that "you can't scale IPC across frequencies" has been the last refuge of people who don't like what reality is telling them for a few years now.
Sure, IN THEORY you can't do it. And there are some benchmarks for which this is very obvious. SPEC is specifically designed to include some of these, both benchmarks constrained by memory latency (basically impossible to solve) and by bandwidth (can be solved by more memory controllers, but most CPUs don't do that because memory channels are expensive).

A paper like this gives it all in huge detail.

But that doesn't change the fact that CACHES WORK! Even for most SPEC code.
That's why we use them, and keep growing them!!!
And to the extent that caches work, you can, in fact, do a pretty good job of maintaining IPC across 2 or 3x frequency range.
 

name99

Senior member
Sep 11, 2010
429
324
136
It's NOT flawed. Your 302 pts/GHz vs. the 286 pts/GHz in my table is a marginal difference (5.5%). There is also a big question about memory (frequency, latency). There are many ways to manually tweak a system in the BIOS to get peak performance while sacrificing stability.

BTW, I used 4.6 GHz for the 3950X, as Andrei suggested during his SPEC testing. It's also mentioned in the table. Please read first before you talk garbage.

Your tweaked 302 pts/GHz doesn't help much. The Apple A13 still has 502 pts/GHz, light years away from any x86 system. Even with 302 pts/GHz, Ryzen cannot move above the Apple A9 from 2015. What a shame.



I'm afraid those graphs show the massive problems that x86 is facing now and in the near future:
  • This year Apple's A14 will take the lead in absolute ST performance.
  • Next year the generic Cortex-X2, based on Matterhorn with SVE2 vectors up to 2048-bit wide, will also reach desktop levels.
  • Look at the difference in development rate. The ARM IPs are delivering performance gains 3x faster than x86.
  • Look at how Ice Lake delivered 18% more IPC while using 38% more transistors - this inefficient, brute-force approach resulted in a massive clock decrease.

I agree with most of what you say, but that last comment:
  • Ice Lake delivered 18% more IPC while using 38% more transistors - this inefficient, brute-force approach resulted in a massive clock decrease
is a bad comment.
- Using more transistors is the whole point of the exercise. Transistors are basically free. Intel SHOULD be using more of them. Apple certainly is for every core. And yes, it's a brute force approach, in the sense that doubling the number of transistors only gets you maybe 20% higher IPC. But WHO CARES? Transistors are free! Design for that fact.

- We do not know why Ice Lake runs at lower frequencies. It COULD be because they increased the FO4 depth of each pipeline stage (i.e. it's a consequence of whatever changes they made to go after higher IPC). But it could also be process-related (the 10nm transistors just cannot switch as fast as the 14nm++ transistors). Or it could be thermal-related (the transistors can switch faster if you give them more power, but Intel right now would prefer to present these cores as appropriate for mobile, and so is not targeting them at those higher thermal levels).

This is, of course, the same argument we have about Apple's cores. Right now today, if I pumped more voltage into an A12Z, could it run at 4GHz (ie the transistors are capable of that switching speed; just run very very hot)? Or would it run very very hot but with negligible frequency improvement? No-one outside Apple and TSMC knows.
What we DO know (because we have the frequency/power curves from a variety of CPUs, including Apple's) is just how fast power rises to eke out minor frequency gains, which is why I (and you) keep pushing the point that IPC is a more sustainable route to performance for the foreseeable future.
 
Reactions: Arkaign and Tlh97

Rigg

Senior member
May 6, 2020
474
1,000
136
It's NOT flawed. Your 302 pts/GHz vs. the 286 pts/GHz in my table is a marginal difference (5.5%). There is also a big question about memory (frequency, latency). There are many ways to manually tweak a system in the BIOS to get peak performance while sacrificing stability.

BTW, I used 4.6 GHz for the 3950X, as Andrei suggested during his SPEC testing. It's also mentioned in the table. Please read first before you talk garbage.

My bad on the 4.7 thing. Still, it seems you don't know what speed the 3950X core was running at in your table and you are just guessing. Do you know what speed the MCLK and FCLK were running at in your data? If this is a "big question", maybe you should factor it in.

5.5% is more than the difference between the 1800X and the 3950X in your table. That alone should call into question the usefulness of your table and of Geekbench PPC as a metric.

Your tweaked 302 pts/GHz doesn't help much. The Apple A13 still has 502 pts/GHz, light years away from any x86 system. Even with 302 pts/GHz, Ryzen cannot move above the Apple A9 from 2015. What a shame.

Which I fully acknowledged at the end of my post.

This blatant oversight doesn't diminish the fact that the A13 is pretty impressive in this test, though. Now I can brag to my friends about how my new iPhone SE slays in Geekbench.

I bought an iPhone SE in part because it has an A13. Please read first before you talk garbage.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,793
4,071
136
Holy hell this thread has gone to hell.


Where do you get this shit stuff? Do you really believe Ice Lake has less ST performance than Skylake? I thought the internet strongman was juanrga, but at least he seems to have gone away. I guess you didn't read what I suggested. Apple goes all in for ST performance. AMD and Intel build server chips which don't rely on ST performance. Yet you still seem to believe Apple can scale up to the server market and kill x86.

I will quote from that article again:

At the same time, however, Apple isn’t shifting to ARM in a year, the way it did with x86 chips. Instead, Apple hopes to be done within two years. One way to read this decision is to see it as a reflection of Apple’s long-term focus on mobile. Scaling a 3.9W iPhone chip into a 15-25W laptop form factor is much easier than scaling it into a 250W TDP desktop CPU socket with all the attendant chipset development required to support things like PCIe 4.0 and standard DDR4 / DDR5 (depending on launch window).
 
Last edited:
Reactions: Tlh97 and Markfw

SAAA

Senior member
May 14, 2014
541
126
116
Being real here, I'd consider these two things, Richie:

1) The x86 vendors in your charts go too far into the past; they both suffered failures, such as being stuck on 14nm for way too long on Intel's side and being stuck on a bad arch with extremely low IPC on AMD's side. Looking only at the past two years and the next two would change things considerably; in particular, the steepness of the curves would move closer to ARM's if you consider Zen 1 to Zen 4 for AMD and Ice Lake to Ocean Cove for Intel.

2) While I do believe in the advantage the ARM cores have for the time being, especially Apple's big lead in IPC, there isn't really anything to prevent the other teams from catching up and doing the same tricks, especially now that it's been proven possible, and with the competitive push that will grow as desktops/laptops run ARM silicon in the coming years.

With that said, 100% IPC above Skylake becoming a thing with the A14? Cool: then I can only wait for Intel's answer after 5 years of slumber (actually closer to 6), not to mention whatever AMD might come up with if they keep up the pace with Zen and their engineering renaissance.

As for the argument about clocks, I don't buy x86 being stuck: stock speeds have been slowly growing since the fall from P4 to Core, with the few exceptions being initial process issues, never arch-related.
It looks like there's a wall at 5 GHz... not really; it is currently being passed even by the then-"fat" Skylake core. Tiger Lake samples on 10nm have leaked with 4.7 GHz speeds (and possibly +20-25% IPC) - not bad for a "dead" node with initial speeds of 3 GHz on Cannon Lake...

What I'm saying is that I can see future 3 GHz Apple cores with 200% of Skylake's IPC (so old it has become a metric now xD), but also 5 GHz Golden Coves with 150% IPC, and Zen 3 at about the same. You do the math and tell me who still leads on absolute performance then, around 2022.
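Doing that math, a tiny sketch using the hypothetical 2022 numbers from the paragraph above (these are the post's guesses, not measurements; single-thread performance is approximated as IPC x clock):

```python
# Hypothetical 2022 single-thread comparison from the post above:
# relative ST performance ~ (IPC relative to Skylake) x (clock in GHz).
contenders = {
    "Apple core":  (2.00, 3.0),   # 200% of Skylake IPC at 3 GHz (guess)
    "Golden Cove": (1.50, 5.0),   # 150% of Skylake IPC at 5 GHz (guess)
    "Zen 3-ish":   (1.50, 5.0),   # "about the same" per the post (guess)
}
for name, (ipc_rel, ghz) in contenders.items():
    print(f"{name}: {ipc_rel * ghz:.1f} Skylake-GHz equivalents")
# -> 6.0 vs. 7.5: on these guesses, the 5 GHz / 150%-IPC designs still lead in absolute ST.
```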
 
Reactions: Tlh97

Cardyak

Member
Sep 12, 2018
73
161
106
I agree with most of what you say, but that last comment:
  • Ice Lake delivered 18% more IPC while using 38% more transistors - this inefficient, brute-force approach resulted in a massive clock decrease
is a bad comment.
- Using more transistors is the whole point of the exercise. Transistors are basically free. Intel SHOULD be using more of them. Apple certainly is for every core. And yes, it's a brute force approach, in the sense that doubling the number of transistors only gets you maybe 20% higher IPC.

Indeed, I seem to remember hearing about an observation in microarchitecture design.

On average if you increase the number of transistors a designer has available, the IPC gain that can be obtained from using these transistors is normally equal to the square root of the transistor increase.

So for example:

2x transistors (1 node shrink) = ~42% IPC increase
4x transistors (2 node shrinks) = ~100% IPC increase (2x)
16x transistors (4 node shrinks) = ~300% IPC Increase (4x)

There are clearly some caveats to this:

- It is not a hard and fast rule. Some designers are more competent than others, and may be able to extract more or less IPC than their competitors.
- Heat is still an issue, particularly on smaller and smaller nodes. If the transistor density doubles but heat is only reduced by 30%, then utilising all of the extra transistors isn’t really viable (Unless you are prepared to reduce clock speeds)
- This observation may not hold in the future, the process of going wider and deeper in core design may hit diminishing returns and become more inefficient moving forward.

Not sure if this is something others have heard of (Or even if this square root observation is true) but it seems to be correct looking at the history of microarchitectures, and it would certainly explain why a 38% increase in transistors for Sunny Cove resulted in only an 18% IPC uplift.
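A quick numeric check of that square-root observation (often cited as Pollack's rule); the 1.38x entry corresponds to the +38% Sunny Cove transistor figure quoted in this thread:

```python
import math

# Square-root observation above (often cited as Pollack's rule):
# expected IPC gain ~ sqrt(increase in transistor budget).
for factor in (1.38, 2, 4, 16):
    gain = math.sqrt(factor)
    print(f"{factor}x transistors -> ~{(gain - 1) * 100:.0f}% IPC increase")
# 1.38x (Sunny Cove's reported +38%) -> ~17%, close to the observed ~18% IPC uplift.
```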
 
Reactions: Tlh97