Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first built on EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024 - at least that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

 | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U only | Desktop & Mobile H/HX | Mobile U only | Mobile H
Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Desktop: Q4 2024, H/HX: Q1 2025 | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 36 MB ? | 12 MB | ?
tCPU (mm²) | 66.48 | ? | ? | ?
tGPU (mm²) | 44.45 | ? | ? | ?
SoC (mm²) | 96.77 | ? | ? | ?
IOE (mm²) | 44.45 | ? | ? | ?
Total (mm²) | 252.15 | ? | ? | ?
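The Meteor Lake column is the only one with per-tile area figures; as a quick sanity check, the listed tile areas do sum to the listed total:

```python
# Meteor Lake tile areas in mm^2, taken from the table above
tiles = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}

total = sum(tiles.values())
print(f"Total: {total:.2f} mm^2")  # 252.15, matching the Total row
```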



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


511

Golden Member
Jul 12, 2024
IIRC Glenn Hinton is still working at Intel on the next major thing. He was one of the leads on Nehalem, and he also created the ring bus. At this point Intel needs a new interconnect fabric; the ring has been working fine, but it has its limitations.
 

Hulk

Diamond Member
Oct 9, 1999
This is kind of interesting.
Intel claims for Skymont over Gracemont:
+32% for ST Int workloads
+72% for ST fp workloads

+32% for MT Int workloads
+55% for MT fp workloads

So integer-workload IPC gains scale from ST to MT, but floating-point gains do not: +72% ST fp but only +55% MT fp.

Could the memory subsystem be more constraining for fp workloads than for int workloads, thus holding back fp MT scaling a bit?

Could this also tie into the decreasing IPC performance of Gracemont in CB R24 as additional cores are added?

The final piece of information I can provide is that Gracemont IPC does not increase much as cores are added in CB R23 as compared to CB R24. Is it possible CB R24 is heavier on fp than R23?

This is the evidence. I'm interested to see what the big brains here make of it.

Quick edit. One final piece of "evidence." From what I'm seeing, Skymont is not showing the enormous IPC increase in CB R23 that it is in CB R24 - about +54% in R24 vs +35% in R23. Given Skymont's bigger fp IPC increase, could this imply that CB R24 is more floating-point heavy than R23?
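If you model the overall uplift as a linear mix of int and fp components - a crude, illustrative assumption, not how Cinebench actually weights its work - you can back out the implied fp share from the figures above:

```python
# Intel's claimed Skymont-over-Gracemont ST IPC uplifts (as fractions)
INT_UPLIFT = 0.32
FP_UPLIFT = 0.72

def implied_fp_weight(observed_uplift):
    """Solve observed = (1 - w) * INT_UPLIFT + w * FP_UPLIFT for w,
    the fraction of the score attributable to fp work."""
    return (observed_uplift - INT_UPLIFT) / (FP_UPLIFT - INT_UPLIFT)

# Observed overall uplifts from the post: ~+54% in CB R24, ~+35% in CB R23
print(f"implied fp weight, R24: {implied_fp_weight(0.54):.2f}")  # ~0.55
print(f"implied fp weight, R23: {implied_fp_weight(0.35):.2f}")  # ~0.08
```

Under this toy model R24 would be roughly half fp-bound while R23 is almost entirely int-bound, which is consistent with the hypothesis in the post.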
 

AMDK11

Senior member
Jul 15, 2019
DavidC1:
No way.

-Instruction length data stored in L1i $
-Clustered decode
-OD-ILD, which decodes instruction length on the fly

-Very wide retire, which is not just an attempt to widen for its own sake but is done because it also allows other structures to be shrunk. An ideological difference of carefully adding and removing as needed for area/power efficiency, unlike the P cores.
-More store ALUs than load ALUs. My guess is this benefits the clustered decode.
-Fast-path hardware, which can be power hungry, versus Nanocode, which adds a specific microcode ROM to each cluster. So an instruction won't suddenly become 10x faster, but now it won't block the other decode clusters, and it can be parallelized. If you were OK with an instruction being dog slow, you don't suddenly need it to be 10x faster than anything else. This is also a careful balance, unlike the brute-force approach of the P cores.
-The E cores go for many simple units, each made for a specific task, over a few powerful units that can do more, as on the P cores.
Nobody wrote that LionCove is the same in detail as Skymont. I wrote that LionCove is now more similar to Skymont in having separate schedulers for FP and for integer.

You suddenly gave details of what Skymont has more of, and without hesitation wrote that LionCove is the same as RaptorCove, which is not true. By your reasoning, why is it that in one core resources are "only added," while in the other they are added out of a deeper philosophy?

Compared to Gracemont, Skymont mainly adds resources (by your own reasoning): 3 decode clusters instead of 2 (+50%), a 416-entry ROB instead of 256 (+62.5%), 4x 128-bit FP instead of 2x (2x), 8 ALUs instead of 4 (2x), wider schedulers, larger physical register files, etc. What is that if not mainly adding resources?
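The percentage jumps quoted here are simple arithmetic on the structure sizes given in the post (no new data, just a check):

```python
# Gracemont -> Skymont structure growth, per the figures in the post
changes = {
    "decode clusters":  (2, 3),
    "ROB entries":      (256, 416),
    "128-bit FP pipes": (2, 4),
    "ALUs":             (4, 8),
}

for name, (old, new) in changes.items():
    print(f"{name}: +{(new / old - 1) * 100:.1f}%")
# decode clusters +50.0%, ROB +62.5%, FP pipes +100.0%, ALUs +100.0%
```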

Skymont is optimized to be the smallest, most energy-efficient core possible while executing a large amount of simpler code, which drives its design decisions. It is intended to take up as little die area as possible and offer a good performance-to-area ratio. Such cores can be replicated at a low area cost, significantly increasing overall MT performance.

LionCove is a heavy core designed to execute heavy code as efficiently as possible.

Two different philosophies that contradict each other, but in the end you always have to strike some balance. For this reason the two teams were combined, because these conditions cannot be met if each team has a free hand. The cores have to be quite consistent in many respects, mainly because of the ISA. This is what Keller insisted on at Intel. If the E-core team designs with full freedom and independence and you then want to put P-cores and E-cores on one die, you can only expect problems like in AlderLake, with AVX512 disabled after some time. Combining the teams gives more consistency, and both core types can be fitted into a given generation without an ISA mess.

Can anyone tell me which part of Lion Cove, or any P core since Sandy Bridge for that matter, was anything more than just add, add, add? Bigger units, more units - that has been the case since Haswell. Even Sandy Bridge had a core 50% larger than its predecessor. The saving grace for that chip was that there were innovations, not just pure expansion.

Where does the idea come from that the Sandy Bridge core is 50% larger than the Westmere core? Sandy Bridge was the first to add a decoded micro-op cache (the L0 uop cache), a physical register file, 256-bit FP vectors, and a ring bus.

Each generation introduced optimizations, not just resources as you say. Adding resources means new control logic and new or redesigned algorithms; it is not just an add-on, as you claim. Haswell added, for example, a new-generation TAGE predictor. Read about the details of the changes and optimizations in each core microarchitecture. From Conroe to Skylake, resource additions were quite conservative - mainly optimization and restructuring with relatively conservative widening. Only Haswell added a fourth ALU port.

The E core team also doesn't shy away from changing things drastically. From Goldmont to Tremont it had the L2 predecode cache. In Gracemont they took it out entirely and replaced it with the OD-ILD.

Since the L2 predecode was a rather large 128KB SRAM, taking it out was an efficient choice, and the OD-ILD isn't limited by low hit rates on large data sets, meaning it performs better. It's quite amazing to me that they removed an entire feature and added a new one while delivering a substantial performance gain overall.

These are still optimizations and design choices that save or replace more complex logic. This is nothing new.

Such optimizations always have been made and always will be. One solution is chosen for a given moment, and in future generations it may be changed back to a previous one, or to yet another, if that is more beneficial at the time.

Every subsequent microarchitecture generation uses optimization and restructuring in addition to added resources. It is not always a revolution and does not always have to bring huge IPC gains.

Even Chips and Cheese does not rule out that AMD, which currently uses a clustered decoder in Zen5, may return to a single wide decoder in the future.
 

naukkis

Senior member
Jun 5, 2002
I think it is just one of the core negatives of the ringbus. The more stops it has, the worse it performs.

In Raptor Lake the ring bus is configured a little stupidly - the E-cores are all on one side of it. When the E-cores have heavy traffic, part of it has to travel past the whole run of P-core stops, and since one ring stop serves 4 E-cores, they are more starved than the P-cores. That changed with Arrow Lake - E-cores are distributed evenly among the P-cores to balance ring bus traffic.
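The "more stops = worse" point can be illustrated with a toy model: on a bidirectional ring, the mean hop distance between two random stops grows linearly with stop count (roughly n/4). This ignores contention, slice hashing, and the fact that real stops aren't all equivalent; the stop counts below are illustrative, not the actual Raptor Lake / Arrow Lake topologies:

```python
def avg_ring_hops(n_stops: int) -> float:
    """Mean shortest-path hop count between two distinct stops
    on a bidirectional ring of n_stops stops."""
    dists = [min(d, n_stops - d) for d in range(1, n_stops)]
    return sum(dists) / len(dists)

# A short ring vs. progressively stretched ones
for n in (8, 10, 12):
    print(f"{n} stops: {avg_ring_hops(n):.2f} average hops")
```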
 

Josh128

Senior member
Oct 14, 2022
Honestly, other than the perf uplifts, he got the launch timeframe right. Was he right about the iGPU? As for the 8+32 thing, it doesn't look like that will happen - but the fact that Intel is using "285K" as its initial top SKU seems to indicate that a "295K" is either planned or was planned. A completely new and larger die just for a halo SKU with more MT seems like a giant waste of engineering resources and silicon, though, if the 285K already mostly beats the 9950X in MT.

So there actually may have been something to that at some point, or still could be.
Right on cue.

 

Hulk

Diamond Member
Oct 9, 1999
AMDK11:
Nobody wrote that LionCove is the same in detail as Skymont. I wrote that now LionCove is more similar to Skymont in terms of separate schedulers for FP and Integer. […]
Good info, thanks for taking the time to post.

Honestly, I thought that Lion Cove would be architecturally closer to Raptor Cove rather than Skymont because it seems to be the latest evolution of the fundamental "Core" design while Skymont seems to be an evolution of the fundamental "Atom" design. Obviously there is a lot more to it as you have noted than that and these designs are many iterations removed from their progenitors.

As with most nuanced discussions there seems to be more to this than meets the eye.
 

Wolverine2349

Senior member
Oct 9, 2022
I hope this isn't true as I would have hoped Intel would have learned their lesson regarding pushing clocks and silicon to the point of degradation.

Well, it does not necessarily mean degradation.

They had the KS with 12th Gen, and it was pushed hard and was fine.

The issue is with 13th and 14th Gen being pushed too hard and, worse yet, having some design flaw where they degrade easily and/or have stability issues right off the bat - and pushing them hard makes it worse, of course.

Intel's 12th Gen and everything before it, all the way back to the CPUs after the Pentium III 1.13-1.3GHz, were stable and good. And Arrow Lake is unlikely to have the 13th/14th Gen problems, as Intel would be committing suicide not to have learned, after the RPL debacle, that stability comes above all else.
 

Hulk

Diamond Member
Oct 9, 1999
Wolverine2349:
Well does not necessarily mean degradation. […]
True. I was having a little fun at Intel's expense. I hope they keep things under control.
 

511

Golden Member
Jul 12, 2024
Interesting comments going on at SemiWiki saying Intel has not placed an order for N2 - looks like they feel pretty confident about 18A.
 

cannedlake240

Senior member
Jul 4, 2024
Interesting comments going on at SemiWiki saying Intel has not placed an order for N2 - looks like they feel pretty confident about 18A.
With Intel, wait until two weeks before launch to be certain.
 

OneEng2

Senior member
Sep 19, 2022
AMDK11:
Nobody wrote that LionCove is the same in detail as Skymont. I wrote that now LionCove is more similar to Skymont in terms of separate schedulers for FP and Integer. […]
Super cool post. Thanks!

So, in your opinion, Lion Cove is more like Skymont than Raptor? More than anything, I wonder at Intel's divergent design model vs AMD's "strip down the big core" model. Sure, AMD also uses different transistors for its Zen 5c cores, but from the OS point of view, they are much the same.

For Intel and the OS, it seems like the Intel model is more complex and more difficult to achieve. It may well have design superiority in a number of instances though. What are your thoughts?
Hulk:
Good info, thanks for taking the time to post. […]
Agree. Plenty of details to be sifted through here. I am looking forward to the detailed reviews of Arrow Lake on the 24th!
I hope this isn't true as I would have hoped Intel would have learned their lesson regarding pushing clocks and silicon to the point of degradation.
Tejas again. OK, not quite so bad, maybe.

Intel has a history of pushing clock speeds though: Pentium III, P4, Raptor. This strategy served them well in the high-performance desktop market for decades.

We now live in a very different time, where the lion's share of growth and profit comes from massively parallel data center machines, so it is very clear that this old strategy will no longer work. It is for this reason that I believe Intel is on the right track with power efficiency and core density over max single-core performance. IMO, if they have to suffer for a generation to make the switch, it will be well worth it in the long run. The latest GNR vs Turin reviews seem to bear this out in spades.
With Intel, wait until two weeks before launch to be certain.
I am pretty sure Intel is going to launch 18A. Pat G has bet his career on it. There are tons of levers that can be pulled for Intel to declare 18A "launched"; however, moving those levers will affect yields, density, and thermal performance.

Still, even if process sacrifices have to be made, 18A may well be enough for Intel to get back into the IDM business and produce a product that is on-par with AMD and TSMC. To me, this is by far the most important inflection point at Intel in the next 5 years.
 

DavidC1

Golden Member
Dec 29, 2023
AMDK11:

Nobody wrote that LionCove is the same in detail as Skymont. I wrote that now LionCove is more similar to Skymont in terms of separate schedulers for FP and separate for Integer.
The separate scheduler is a minor detail compared to the rest. I would say in that aspect Lion Cove is more similar to Zen 4.
Suddenly you gave details of what is more in Skymont and without hesitation you wrote that LionCove is the same as RaptorCove, which is not true. What is it but adding resources in Skymont according to your reasoning that something is only added and in something else it is added for a deeper philosophy.
If Golden and Lion were well designed, they wouldn't have been so bloated compared to their competitor Zen 4. Zen 3 did quite well too.
Compared to Gracemont, Skymont mainly adds resources (according to your reasoning), including 3 decoding clusters (instead of 2 clusters (+50%)), ROB 416 (instead of 256 (+62.5%)), 4x FP128bit (instead of 2x FP128bit(+2x) ), 8x ALU (instead of 4xALU (+2x)), wider schedules, larger physical register files, etc. What is it if not mainly adding resources?
Despite literally doubling the FP block, the size is still quite small. In fact, in the worst case I'd estimate it's nearly a 1:1 ratio between scalar performance improvement and core size growth. It's the actual result that matters, and clearly there are many low-level details that make a big difference.

The difference is, Skymont gets a tremendous result from adding them, so it actually mattered. Unlike P cores of the past.

Also while people harp on about the number of ALUs being doubled on Skymont, they are simpler ALUs and thus quite cheap in terms of power and area, as said by the lead architect himself.
Skymont is optimized to have the smallest and energy-efficient core possible, but executes a large amount of simpler code, which involves design decisions. It is intended to take up as little space as possible on the matrix and offer a good performance-to-surface ratio. They can be easily spammed at a low cost of space, significantly increasing overall MT performance.
At a 3x core size difference, I'd say there's more going on than different design targets. One design sucks and one is really good. It's Prescott vs Conroe all over again.
Two different philosophies that contradict each other, but in the end you always have to strike some balance. For this reason, both teams were combined because these conditions cannot be met if each team has a free hand. The cores in many respects have to be quite consistent mainly because of ISA.
Sure, but the P core team/design clearly sucks. Philosophies are one thing but results say way more.
This is what Keller at Intel insisted on. If the e-Core team has freedom and independence in design and wants to connect P-Core with e-Core on one matrix, you can only expect problems such as in AlderLake and blocking of AVX512 after some time.
The leakers/rumors also say it was the E core team that was listening to Keller, while the P core team got quite arrogant about their success. Not just that - they were actively preventing anyone else from taking over their job of being the alpha-dog core, basically.
Where does the idea come from that the Sandy Bridge core is 50% larger than the Westmere core? Sandy Bridge was the first to add a decoded microinstruction cache (UOP L0), a physical register file, a 256-bit FP vector, and a ring bus.
These were mainly optimization/reconstruction with relatively conservative expansion. Only Haswell added a fourth ALU port.
I got that wrong, but post-Sandy was an era of disappointingly small increases. The fact that we're now down to such small Moore's Law improvements, while they would have had a much easier time back then yet delivered barely 10%, means that even in their best days the P core team wasn't anything special.

Again people say it's because the P core team were arrogant and believed nothing could beat them.
Even in the case of chipsandcheese, they do not rule out that Zen5, which currently offers a clustered decoder, AMD may return to a single wide one in the future.
Chips and Cheese is oddly optimistic about Lion Cove. But they have been for all the cores. The author is a glass half full guy.

I don't really care that they expanded branch capability by 8x, or added another ALU, or other such uarch details, so much as that it all delivered a meagre 9% advancement. The results speak for themselves.

Intel claims removing HT could reduce area by 15%, and it's on N3B - a process said to be not very good but quite dense, potentially 50%+ denser at the core level - and yet despite all that it's about the size of Zen 5.

I am confused how Lion Cove and the P core design can have a redeeming factor when the difference in resources is that big and the end results are nearly identical: perf/clock similar and peak clock identical to Zen 5. Lion Cove is a failure akin to Prescott. Not even Intel talks about the Lion Cove portion as a positive anymore - the Intel guy in the Arrow Lake video went straight to talking about Skymont.
 

Doug S

Platinum Member
Feb 8, 2020
Interesting comments going on at SemiWiki saying Intel has not placed an order for N2 - looks like they feel pretty confident about 18A.

There have been other rumors claiming Intel along with Apple will be a lead customer for N2. Nobody who knows for sure would be able to make a public statement, so rumors from Digitimes and random Semiwiki posters are all we can go by.

There are some facts we know, however. If they wanted to be a "lead customer" for a new TSMC node, they'd have to get that order in a couple of years in advance. Not only that, there would be some sort of minimum order commitment, and a financial penalty for order cancellation (something Intel isn't going to want to be on the hook for given their current finances). The lead time required means they would have to place that order before they could know for sure whether 18A would meet objectives in terms of performance and start of mass production at economically feasible yield.

So normally I would say there's no way Intel didn't place some N2 orders, and design for both 18A and N2 as a hedge. However, Gelsinger did say he's "bet the company" on 18A, so maybe we should take him literally. As in "if 18A doesn't work, Intel won't have anything newer than Arrow Lake to sell".
 

DavidC1

Golden Member
Dec 29, 2023
This is what David Huang says (translated):
It is difficult to imagine that, three years after Golden Cove's release, Lion Cove's branch prediction - the aspect most important to a CPU - not only did not improve, but actually regressed.
Taking into account the various development trends of the Atom microarchitecture mentioned above, I think Atom has shown the potential to become the main core microarchitecture, at least in some respects, this generation.
It has to be said that it is really awkward to see two teams with such a large gap in the same company. Maybe this is just characteristic of Intel.
That's a longwinded way of saying E core design capability >>> P core design capability.
 

AMDK11

Senior member
Jul 15, 2019
DavidC1:
The separate scheduler is a minor detail compared to the rest. I would say in that aspect Lion Cove is more similar to Zen 4. […] Lion Cove is a failure akin to Prescott.
Bloated because it can run at 5.7 GHz. Look at Zen4c, which, while maintaining the same IPC as Zen4, is much smaller than Zen4.

The same applies to Skymont, which has a much simpler design and targets lower clock speeds, which lets it occupy a much smaller area. In other words, Skymont packs much denser logic per mm².

And now Skymont from a different angle. Gracemont, without HT and at a much lower clock speed, has roughly the IPC of Skylake from 2015. Skymont has 32% higher INT IPC and 70% higher FP IPC, which after 9 years amounts to catching up to GoldenCove from 2021! Do you still say it's a breakthrough?

The year is 2024 and Skymont is at the level of 2021 GoldenCove IPC, with a much lower clock speed and no HT.

I don't think it's as groundbreaking as you make it out to be. Sure, 16 Skymont cores + 8 P-cores make for a very efficient processor in total. But painting Skymont as a revolution is a bit of a stretch. Wait for ArrowLake's independent testing and then we'll make our final assessment.

However, despite some leaks, I believe that E-Core and P-Core will remain coherent designs, each adapted to the specific geometries where it matters in the future.
 
Reactions: Tlh97 and OneEng2

cannedlake240

Senior member
Jul 4, 2024
207
111
76
Again people say it's because the P core team were arrogant and believed nothing could beat them.
And then Apple launched the A7; Zen's design probably started around that time too. Intel probably thought CPUs and Moore's Law had peaked and that they had won, hence the disbelief in EUV viability and the lackluster CPU improvements.
 

AMDK11

Senior member
Jul 15, 2019
438
360
136
Totally agree the P core needs to be restarted from the ground up, because its PPA is horrendous and it drags GNR down as well. Zen5 has a better architecture on a similar node. The Mont cores started from a lower baseline but have optimized PPA for mobile and server, where it matters the most.
Contrary to what others say, including DavidC1, I argue that LionCove is the first major redesign in the transition from a monolithic to a tiled structure, and therefore suffers added latency between tiles, in particular on access to the RAM controller. LionCove is also the first design to use new design tools and methods, along with reduced overall power consumption, for ArrowLake. We will evaluate this in independent tests.
 
Reactions: Tlh97, 511 and Hulk

OneEng2

Senior member
Sep 19, 2022
259
359
106
Despite literally doubling the FP block, the size is still quite small. In fact, in the worst case I'd estimate a nearly 1:1 ratio between scalar performance improvement and core size growth. It's the actual result that matters, and clearly there are many low-level details that make a big difference.

The difference is that Skymont gets a tremendous result from adding them, so it actually mattered, unlike the P cores of the past.

I think it is awfully easy to simply say the P core team sucks. Sure, Skymont got much bigger improvements over Gracemont; however, there is also the thought of diminishing returns in design to consider.

By your logic, it should be possible on the same process for the P core team to increase performance by 400% (4 Skymont cores fit into roughly the same area as 1 Lion Cove core, or close enough). If it were that easy, AMD, Apple, and literally everyone else would have cores that dwarfed both Intel's and AMD's current ones in single-threaded performance.
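The diminishing-returns point can be made numerically. The snippet below is only an illustrative sketch, assuming the rough 4:1 area claim from the post and using Pollack's rule of thumb (single-thread performance scales roughly with the square root of core area) as the counter-model to naive linear scaling; none of these numbers are measurements.

```python
# Illustrative sketch of the scaling argument (assumed numbers, not measurements).
area_ratio = 4.0  # rough claim: ~4 Skymont cores fit in one Lion Cove's area

# Naive linear scaling would predict 4x (400%) single-thread performance...
linear_perf = area_ratio

# ...but Pollack's rule (perf ~ sqrt(area)) predicts only ~2x, illustrating
# why nobody simply quadruples core area to quadruple ST performance.
pollack_perf = area_ratio ** 0.5

print(linear_perf)   # 4.0
print(pollack_perf)  # 2.0
```

The gap between 4x and ~2x is exactly the "diminishing returns in design" being argued here.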

The truth is likely somewhere in the middle. Yeah, it is possible that Lion Cove could use a good tweaking and that there are some bottlenecks in the design that Intel engineering did not foresee. Still, they drastically improved the power/performance over the previous architecture.

Yes, Conroe was a wonder architecture compared to P4. P4 was an exercise in chasing clock frequency, in the belief that you could stretch the pipeline out to as many stages as you needed to achieve the clock speed. Process, efficiency and thermals be damned. Remember, this is the same time window when Itanium and VLIW were going to take over the server market and RAMBUS memory was going to be shoved down the throat of all PC makers... Intel license and all.

Lion Cove isn't anywhere NEAR that kind of wrong-headed thinking. In fact, I think it may do quite well in the laptop and data center markets, and THAT is where the money is, guys, not the high-end desktop and gaming PC. Those days are gone, never to return.

It feels to me like Intel has made some good design direction decisions with the new architecture. It's going to take a year or two to see if I am right or not unfortunately.

Of course, all of this depends on Intel's ability to execute a competitive 18A process node. Otherwise we will be looking at a very different kind of "Intel" in the future.
 
Reactions: MoistOintment

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Contrary to what others say, including DavidC1, I argue that LionCove is the first major redesign in the transition from a monolithic to a tiled structure, and therefore suffers added latency between tiles, in particular on access to the RAM controller.
Are you saying Skymont, which is on the same ring and die, won't also benefit or be harmed by the changes?

The LNC 9% number and the Skymont 32%/72% numbers are on the same die, are they not?

Zen 5 clocks identically to Arrowlake at peak, both at 5.7GHz. Yet Zen 5, despite being on a significantly less dense node (50-60% difference), is about the same size as Lion Cove.
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
The discussion regarding P and E cores is interesting.
Is Lion Cove a failure because it only achieves an average 9% IPC increase? Obviously the question is subjective, but I think we need to keep in mind that Lion Cove has to deal with the penalty of moving from monolithic to tiles, and Zen 4 to Zen 5 looks like it is underperforming from an IPC point of view too. Maybe there isn't much low-hanging fruit left in these designs?

Is the huge IPC gain in moving from Crestmont/Gracemont to Skymont a sign that the basic architecture is "better", or is there simply more low-hanging fruit in that design, so it will soon top out IPC-wise as well? Skymont is looking magnificent, no doubt about it, especially considering how area-efficient it is, but it does have a 20% clock deficit vs. Lion Cove. How much smaller could LC have been if it topped out at 4.6GHz?

I believe we are witnessing the huge cost both in terms of R&D and economics (die area) of very high ST performance. The problem is that there is still plenty of software that requires performant ST cores. Do YOU need 8 of them? Or only 4? Or do you need 12? That of course depends on what each of us considers our primary workloads.

Intel has a good "solution" for MT, but alas, software development is not quite there yet. If it were, and MT were all that mattered, then we'd be seeing Arrow Lake with 40 Skymont cores.

Finally, I ask the more knowledgeable microprocessor architects here: what is the limit for ST performance? I understand this will vary with the code, but how much instruction-level parallelism exists in typical code? How much, and how accurately, can you predict what will happen down the line as instructions are processed? Looking into the future and "being ready" for what comes next seems to be getting quite costly, in terms of both transistors and ramping up clock speed, which seem to be the only ways to increase ST performance.

Theoretically speaking is there a formula that gives an indication of a limit to ST performance for microprocessors? Or a graph of some sort to indicate ST performance vs number of transistors or something like that?
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
Lion Cove may not be showing its stuff in preliminary benchmarks, or perhaps it is starved for bandwidth when it comes to gaming.

Can't wait for reviews, or better yet for some of us to test it.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Obviously the question is subjective, but I think we need to keep in mind that Lion Cove has to deal with the penalty of moving from monolithic to tiles, and Zen 4 to Zen 5 looks like it is underperforming from an IPC point of view too. Maybe there isn't much low-hanging fruit left in these designs?
Tell me this:

1. Let's assume there's a penalty in Arrowlake, tile or no tiles.
2. Lion Cove is on the same die as Skymont... no, they are both on the same ring!
3. If they are on the same ring and die, and 9% for Lion Cove is after the penalty, is it not logical to conclude that 32%/72% for Skymont is also after the penalty?
4. What is the conclusion then? That Lion Cove without the penalty would be 14%? What about Skymont then? The same ~4.5% would mean 32%/72% ends up at 38%/79%, no?
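The arithmetic in point 4 can be checked in a few lines. This is only a sketch of the poster's hypothetical, assuming the tile penalty scales both cores' IPC gains multiplicatively; the 9%/14% and 32%/72% figures are the claims from this thread, not measurements.

```python
# Sketch of the penalty arithmetic above (figures are the thread's claims).
lnc_measured   = 1.09  # Lion Cove: +9% IPC as measured on Arrow Lake
lnc_no_penalty = 1.14  # hypothetical: +14% if the tile penalty were removed

# Implied penalty factor (~1.046, i.e. roughly the "same 4.5%").
penalty = lnc_no_penalty / lnc_measured

# Applying the same factor to Skymont's measured +32% INT / +72% FP:
skt_int = 1.32 * penalty  # ~1.38 -> roughly +38% INT
skt_fp  = 1.72 * penalty  # ~1.80 -> roughly +79-80% FP

print(round(skt_int, 2), round(skt_fp, 2))
```

In other words, under a uniform multiplicative penalty, removing it would lift both cores' numbers together, which is the point being made.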
 