Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 594 - AnandTech Forums

Tigerick

Senior member
Apr 1, 2022






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction for Intel: moving to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

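| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |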



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

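| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|
| Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H |
| Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | |
| tGPU (mm²) | 44.45 | | | |
| SoC (mm²) | 96.77 | | | |
| IOE (mm²) | 44.45 | | | |
| Total (mm²) | 252.15 | | | |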
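As a quick consistency check on the Meteor Lake column, the four tile areas do sum to the listed total. A minimal Python sketch, assuming (as is usual for die sizes) that the figures are in mm²; it also computes each tile's share of the total:

```python
# Meteor Lake tile areas from the table above (assumed to be mm²).
MTL_TILES_MM2 = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}


def total_area(tiles: dict[str, float]) -> float:
    """Sum of all tile areas, rounded to 2 decimals like the table."""
    return round(sum(tiles.values()), 2)


def area_shares(tiles: dict[str, float]) -> dict[str, float]:
    """Each tile's percentage of the total area, rounded to 1 decimal."""
    total = sum(tiles.values())
    return {name: round(100 * area / total, 1) for name, area in tiles.items()}


print(total_area(MTL_TILES_MM2))  # 252.15
print(area_shares(MTL_TILES_MM2))
```

The SoC tile is the largest at roughly 38% of the total. Note this is silicon summed across four separate dies, not the area of any single die.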



Intel Core Ultra 100 - Meteor Lake



As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


DavidC1

Golden Member
Dec 29, 2023
Skymont is 1.15 mm². Without the doubled FP block it would have been under 1 mm².

Lion Cove is 3.45 mm². That's a 3:1 ratio, similar to Redwood Cove vs. Crestmont. This is despite Skymont clocking higher than its predecessor and delivering 32% higher IPC.

That's 32% vs. 9% IPC gains. Skymont clocks 4.5% higher than its predecessor while Lion Cove clocks 5% lower, both on the same die. That means Skymont closed the gap by about 33%.
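The ~33% figure checks out if per-core performance is modeled simply as IPC × clock, reading 32% and 9% as the Skymont and Lion Cove IPC gains and +4.5% / -5% as their clock changes. That model is an assumption (it ignores memory, fabric and power effects). A minimal Python sketch, which also confirms that 3.45 / 1.15 mm² is exactly a 3:1 area ratio:

```python
def perf_multiplier(ipc_gain: float, clock_change: float) -> float:
    """Generational per-core speedup, assuming performance = IPC x clock."""
    return (1 + ipc_gain) * (1 + clock_change)


skymont = perf_multiplier(0.32, 0.045)     # 1.32 x 1.045
lion_cove = perf_multiplier(0.09, -0.05)   # 1.09 x 0.95
gap_closed = skymont / lion_cove - 1       # relative gain of Skymont over Lion Cove

area_ratio = 3.45 / 1.15                   # Lion Cove vs. Skymont core area (mm²)

print(f"gap closed: {gap_closed:.1%}, area ratio: {area_ratio:.2f}:1")
```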
If there are no E-core products, this UC could be canned any moment Intel feels the financial squeeze.
Yeah, it's the P core that needs to be canned.

@AMDK11
Apple (in its M-series) and Arm add a total of 8 ALUs, gaining a significant INT IPC increase, but in your opinion Skymont adds 2x more for no reason?
No one, and I mean no one, including me, has claimed this. You said Skymont added 4 additional ALUs, and I said those are simple ALUs. And despite the addition, the core is very small.

I still want you to address the 3x core size difference, and the fact that the ring clock penalty affects Skymont the same way it affects Lion Cove.
 

AMDK11

Senior member
Jul 15, 2019
Lower clock speeds, densely packed Skymont core logic, smaller structures including buffers.

There is no great philosophy here. Look at Zen 4c compared to Zen 4: while maintaining the same IPC, Zen 4c's area is much smaller (-34%). The same was done with Skymont, plus less extensive core logic (mainly buffers) and no uop cache.
 

desrever

Senior member
Nov 6, 2021
Not sure why everyone is so into PPA for CPU cores. Ultimately, the PPA efficiency of CPU cores is pretty negligible for the products that can be built for consumers.

A single-digit mm² core size means very little when everything else on the die is still required. With the same amount of L3 and L2 cache, a CPU with only Lion Cove or only Skymont cores would not differ much in size if you target the same performance.
 

DavidC1

Golden Member
Dec 29, 2023
If you watch the YouTube videos about Tremont, there are a few done by the lead architect, Stephen Robinson.

I'm really excited about Tremont.
1:42 - It's actually great IP in Intel's portfolio,
1:45 - and it's the start of a whole new line of processors.
In the video he implies the core would be widely used. However, it ended up in just two products: Lakefield, which was basically abandoned from the start, and Snow Ridge, an edge SoC that has no Gracemont-based successor despite having a roadmap.

The team that got ignored for a decade now has a chance to change that completely and flip the script 180 degrees. They deserve it.

@desrever
Not sure why everyone is so into PPA for CPU cores. Ultimately, the PPA efficiency of CPU cores is pretty negligible for the products that can be built for consumers.
Do you think Alder Lake went hybrid for no reason? It's because their P core sucked so much that they put the E cores in there. A cluster of 4 E cores is about the same size as 1 P core. Look at the die shot and tell me it doesn't make a difference.
 

Hulk

Diamond Member
Oct 9, 1999
The fact that Gracemont (2021) reached Skylake (2015) IPC and Skymont (2024) reached Golden Cove/Raptor Cove (2021-2022) IPC is not indicative of the size of the Skymont team. If you have many years to build a core that matches the IPC of a design from several years earlier, with different solutions and philosophies (design assumptions), that does not mean it is something special. They had a lot of time and freedom to optimize the design for a specific purpose.

We are not 100% sure that the next-gen Mont/Wolf IPC increase will again bring a 30% average increase (INT).

For now, it is only a hypothesis and an assumption that subsequent generations will bring a constant, high IPC increase equal to or greater than that of the next Cove generation.

It is also uncertain whether the next generation of Cove will only achieve a slight increase in IPC.

Lion Cove is a redesign combined with the move from a monolithic die to tiles. I think Lion Cove is losing a lot because of this. Who knows what IPC gains Lion Cove would show if it were on a monolithic die like Raptor Lake.
I don't want to belabor the point, but this is what I originally responded to:

"And now Skymont from a different angle. Gracemont, without HT and at a much lower clock speed, has roughly the IPC of Skylake from 2015. Skymont has 32% higher INT IPC and 70% higher FP IPC and, 9 years later, has caught up to Golden Cove from 2021! Do you still say it's a breakthrough?

The year is 2024 and Skymont is at the level of the 2021 GoldenCove IPC, with a much lower clock speed and no HT."

I was responding that at Skylake-level IPC, Gracemont was 6 years behind the P core, and now Skymont is only 3 years behind it. I think what you are saying I'm not considering is that P-core IPC increases have slowed, so it's an easier target for Skymont to close in on? That is logical, I admit, if that is your point.
 

DavidC1

Golden Member
Dec 29, 2023
From David Huang:

Intel CPU Branch MPKI (Misses per Kilo Instructions)

AMD CPU Branch MPKI (Misses per Kilo Instructions)


His conclusions (translated):
  • Lion Cove's branch prediction accuracy is slightly lower than Redwood Cove's and close to Zen 4's;
  • Skymont's branch prediction accuracy has improved greatly, surpassing Lion Cove/Redwood Cove/Zen 4;
  • Zen 5's branch prediction accuracy remains firmly in the lead.
It is hard to imagine that three years after Golden Cove's release, Lion Cove's branch prediction, the most important aspect of a CPU, not only failed to improve but actually went backwards.
On the other hand, Skymont showed us Intel's excellent microarchitecture capabilities. Three years after Gracemont's release, with BTB specifications merely close to Zen 4's, it achieves branch prediction accuracy beyond Zen 4. Taking into account the various trends in the development of the Atom microarchitecture mentioned above, I think Atom has shown the potential to become the main-core microarchitecture, at least in some respects, in this generation.
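For anyone less familiar with the metric in these charts: branch MPKI normalizes mispredictions by instruction count, so cores with different widths and clocks land on the same scale (lower is better). A minimal sketch; the counter values are hypothetical, and on Linux such counts would typically come from `perf stat -e branch-misses,instructions`:

```python
def branch_mpki(mispredicts: int, instructions: int) -> float:
    """Branch mispredictions per 1,000 retired instructions (lower is better)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return mispredicts * 1000 / instructions


# Hypothetical counter readings, not David Huang's data.
print(branch_mpki(mispredicts=4_200_000, instructions=1_000_000_000))  # 4.2
```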
 

cannedlake240

Senior member
Jul 4, 2024
From David Huang:

Intel CPU Branch MPKI (Misses per Kilo Instructions)

AMD CPU Branch MPKI (Misses per Kilo Instructions)


His conclusions (translated):
  • Lion Cove's branch prediction accuracy is slightly lower than Redwood Cove's and close to Zen 4's;
  • Skymont's branch prediction accuracy has improved greatly, surpassing Lion Cove/Redwood Cove/Zen 4;
  • Zen 5's branch prediction accuracy remains firmly in the lead.
Time will tell; maybe the next P core can address the areas where Lion Cove fell short. I also wonder how the addition of APX, which apparently increases instruction length, will affect the E core. From testing, it seems Skymont's decode is not well suited to longer instructions.
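A simplified way to see why longer encodings could matter: if a decoder cluster can only take in a fixed number of bytes per cycle, the instructions it delivers per cycle fall once the average instruction length exceeds bytes ÷ width. A hypothetical sketch; the 16 bytes/cycle and 3-wide figures are illustrative assumptions, not Skymont's published specs, and how much APX actually lengthens code depends on the workload:

```python
def decode_rate(fetch_bytes_per_cycle: float, decode_width: int,
                avg_insn_bytes: float) -> float:
    """Instructions decoded per cycle when limited by both decoder width and
    fetch bandwidth (simplified: ignores alignment, taken branches, queues)."""
    return min(decode_width, fetch_bytes_per_cycle / avg_insn_bytes)


# Hypothetical 3-wide cluster fed 16 bytes per cycle.
for length in (4.0, 5.0, 6.0):
    print(f"{length:.0f}-byte average: {decode_rate(16, 3, length):.2f} insns/cycle")
```

Under these assumed numbers, going from a 5-byte to a 6-byte average drops a cluster from 3 to about 2.67 instructions per cycle.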
 

Hulk

Diamond Member
Oct 9, 1999
From David Huang:

Intel CPU Branch MPKI (Misses per Kilo Instructions)

AMD CPU Branch MPKI (Misses per Kilo Instructions)


His conclusions (translated):
  • Lion Cove's branch prediction accuracy is slightly lower than Redwood Cove's and close to Zen 4's;
  • Skymont's branch prediction accuracy has improved greatly, surpassing Lion Cove/Redwood Cove/Zen 4;
  • Zen 5's branch prediction accuracy remains firmly in the lead.
I readily admit that I do not have the grasp of these fundamentals that many here do, but here is what I'm wondering. Please feel free to straighten me out.

If a CPU has a wider front end, isn't it in effect "looking further ahead" than one that is queuing fewer instructions, so the two architectures can't be compared apples-to-apples on branch misses? If CPU A can decode 2 instructions at a time but CPU B can decode 8, it seems like CPU B, while having a decoding advantage, has a harder job of ordering and predicting how to execute?

I think I'm off-base here but wanted to put it out there so I can learn how to better interpret this kind of data.
 

desrever

Senior member
Nov 6, 2021
Do you think Alder Lake went hybrid for no reason? It's because their P core sucked so much that they put the E cores in there. A cluster of 4 E cores is about the same size as 1 P core. Look at the die shot and tell me it doesn't make a difference.
Going hybrid obviously saves Intel money, but it's not that much in the grand scheme of things. A cluster of 4 E cores is better at one specific task: multi-core throughput. If Intel had to do 16 P cores, they could make that CPU with something like a 15-20% increase in die size compared to their current 8+16 configuration. All 16 cores would also have fat caches, unlike the E cores, whose cache is shared and limits their performance under load, and Intel could then enable AVX-512 as well.
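To put rough numbers on the core-area side of the 16P vs. 8P+16E comparison, one can reuse the per-core figures quoted earlier in the thread (Lion Cove 3.45 mm², Skymont 1.15 mm²). A minimal sketch, assuming those figures exclude L2/L3 and fabric; how much the difference moves total die size depends on everything else on the die, which is not modeled here:

```python
# Per-core areas quoted earlier in the thread, in mm² (assumed core only).
P_CORE_MM2 = 3.45  # Lion Cove
E_CORE_MM2 = 1.15  # Skymont


def core_area(p_cores: int, e_cores: int) -> float:
    """Total core area for a given P/E mix, ignoring caches and interconnect."""
    return p_cores * P_CORE_MM2 + e_cores * E_CORE_MM2


hybrid = core_area(8, 16)  # current 8+16 layout
all_p = core_area(16, 0)   # hypothetical 16 P-core part
print(f"8P+16E: {hybrid:.1f} mm², 16P: {all_p:.1f} mm², extra: {all_p - hybrid:.1f} mm²")
```

In this core-only view, 16 P cores need about 20% more area than 8P+16E (55.2 vs. 46.0 mm²).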

With their current 8+16 implementation, they basically only match the multithreaded performance of AMD's 16-core parts. The 14900K vs. 7950X is about even depending on the benchmark, and the 285K vs. 9950X is probably going to be similar. On paper, 16 E cores with ~90% of Zen 5's IPC should blow 8 Zen 5 cores out of the water, but ARL's multithreaded performance will probably be about even.
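The "on paper" expectation corresponds to an idealized throughput model: cores × relative IPC × clock. A minimal sketch with an assumed equal clock for both sides; the gap between its 1.8x result and a roughly even real-world outcome is what this model leaves out (power and thermal limits, clock differences, SMT on Zen 5, memory and fabric bottlenecks):

```python
def naive_throughput(cores: int, rel_ipc: float, clock_ghz: float) -> float:
    """Idealized multithreaded throughput; ignores SMT, power limits, memory."""
    return cores * rel_ipc * clock_ghz


CLOCK_GHZ = 4.0  # hypothetical, identical for both so only cores x IPC differ
e_cores = naive_throughput(16, 0.90, CLOCK_GHZ)  # 16 E cores at ~90% of Zen 5 IPC
zen5 = naive_throughput(8, 1.00, CLOCK_GHZ)      # 8 Zen 5 cores
print(f"{e_cores / zen5:.2f}x")
```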
 

OneEng2

Senior member
Sep 19, 2022
Wow. Lots to unpack here.
Isn't this the first time that Intel is in serious financial difficulty together with technical problems? They could buy their way out of trouble before, but I think not this time.
I agree. Not only did they "buy" their way out of trouble, they used their monopoly power and vertical integration with chipsets, motherboards, etc. to influence OEMs and stay in the game until their designs got out of the woods again. This time is different in that respect. You have a valid point.
If gaming performance really collapsed, why would Intel even release this? Why not cancel the release or delay it?

Yes, vanilla Zen 5 was disappointing and a flop, but that's because it neither improved nor regressed gaming performance compared to vanilla Zen 4.

It's one thing if Arrow Lake brings no gaming improvement over Raptor Lake, or maybe even a slight regression, but an outright collapse?? How could Intel release such a CPU? Or they'd change the marketing toward non-gamers only, lol, if that is really the case.
... just out of curiosity, how much of the CPU market do you believe gamers comprise?

The desktop market is only 20-25% of the CPU market. High-end desktops (where gamers buy) are only 10-20% of that. Furthermore, these markets have trended decidedly down over the last decade and continue to decline year over year.
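Multiplying the two ranges gives the high-end desktop slice of the whole CPU market. A trivial sketch; the post doesn't say whether the shares are by units or revenue, so the result is in whichever terms the inputs are:

```python
def segment_share(desktop_share: float, high_end_share: float) -> float:
    """High-end desktop as a fraction of the whole CPU market."""
    return desktop_share * high_end_share


low = segment_share(0.20, 0.10)
high = segment_share(0.25, 0.20)
print(f"{low:.0%} to {high:.0%} of the CPU market")  # 2% to 5% of the CPU market
```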

In other words, in the grand scheme of things there is no reason for Intel to be very concerned at all about gaming (high-end desktop). Missing gaming improvements in the first processor of a new generation is hardly catastrophic for Intel, but boy, it sure sounds like it around this forum. I do get that gamers are disappointed, but I don't think this miss is going to have much impact on Intel financially.
It's not that dramatic. They botched the fabric for this gen, and it's likely fixed by Nova Lake or even Panther Lake. This is almost nothing compared to their fabs seemingly having no customers. Going all in on fabs is basically why they are in this situation. Funny, it's starting to look like cancelling Panther Lake-S was a mistake. If the memory latency and LLC clocks are fixed, it could've been a decent generation. Creating more 18A volume for IFS would've been a plus. Also, the LGA1851 platform wouldn't end up being retired after a single gen, which is yet another blow to Intel's mindshare.
I agree. It is likely that the 2025 server chip will include improvements in the bottlenecks seen in Arrow Lake. Clearwater Forest is a MUCH more important chip for Intel to get right. Arrow Lake and Lunar Lake are both just warm-ups for that event.
QC and Nvidia+MTK are in laptops, and Intel is not sitting idle in laptops; it's the desktop that doesn't have competition until Zen 6. As for server, yes, but GNR is not a bloodbath; it will slow AMD quite a bit.
I believe I read that GNR is generally behind by about 40% (geometric mean). The only sense in which this is not a "bloodbath" is that the gap is no longer 200% as it has been. GNR is both more expensive and less performant than Turin, and it requires a new platform, while Turin uses the same socket as its two predecessors, I believe. I would say that GNR is a good step in the right direction, but not enough to stop AMD's momentum in the data center. AMD will continue to gain market share at or above the rate of the last couple of years, IMO.
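For context on "a geometric mean of about 40%": suite-level comparisons usually take the geometric mean of per-benchmark performance ratios, so no single outlier dominates and the result is consistent whichever chip is used as the baseline. A minimal sketch with made-up ratios (not GNR vs. Turin data); Python's `statistics.geometric_mean` does the same thing:

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean, computed via logs to stay numerically stable."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-benchmark speedups of chip A over chip B.
ratios = [1.2, 1.5, 1.6, 1.3]
print(f"A leads by {geomean(ratios) - 1:.0%} (geomean)")
```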

Only Clearwater Forest can reverse the trend .... and even then, Clearwater Forest will likely be challenged pretty quickly by EPYC Zen 6 (Venice), which will have 32-core CCDs. Still, Clearwater Forest looks like it could be a very impressive chip. Much higher efficiency, much denser on 18A with BSPD .... it has all the makings of an impressive DC processor.
This is just a hypothesis, as NetBurst never had more execution units or a wider decoder in any generation. How NetBurst would behave if the next generation had a decoder width of 2 instead of 1 is pure hypothesis.

NetBurst never fought for higher IPC. NetBurst is a completely different philosophy and saw IPC drops in subsequent generations.

I wouldn't find any similarities between Lion Cove and NetBurst. Lion Cove has higher IPC, but it also loses due to lower clock speeds and suffers from the transition from monolith to tiles.

A complete picture of LionCove's IPC will become available after comprehensive analysis and testing of ArrowLake-S.
Exactly. NetBurst spent all its time recovering from branch misses (flushing out that long pipeline). I also think people forget that achieving higher clock speeds involves having more stages. Having good performance across a wide variety of loads involves having lots of complex, specialized processing units. Being able to do all this while bearing the chip-to-chip latency .... also more complicated.
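A toy model makes the deep-pipeline trade-off concrete: splitting the same logic into more stages raises the clock, but if each branch misprediction costs roughly one pipeline's worth of cycles, the flush penalty grows with depth. All parameters below are hypothetical and chosen only to make the curve visible; they are not NetBurst, Lion Cove, or Skymont figures:

```python
def pipeline_model(stages: int, logic_ns: float = 5.0, latch_ns: float = 0.15,
                   base_cpi: float = 0.5, flush_mpki: float = 20.0) -> tuple[float, float]:
    """Toy model: returns (clock in GHz, billions of instructions per second).

    Splitting `logic_ns` of logic into more stages shortens the cycle, but each
    pipeline flush (e.g. a branch mispredict) is assumed to cost ~`stages` cycles.
    All defaults are hypothetical."""
    cycle_ns = logic_ns / stages + latch_ns
    clock_ghz = 1.0 / cycle_ns
    cpi = base_cpi + (flush_mpki / 1000.0) * stages
    return clock_ghz, clock_ghz / cpi


for depth in (10, 20, 30, 60, 120):
    clock, perf = pipeline_model(depth)
    print(f"{depth:>3} stages: {clock:.2f} GHz, {perf:.2f} G instr/s")
```

With these made-up numbers the clock keeps rising with depth, but throughput peaks around 30 stages and then falls as flushes get more expensive.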

Seems like people think that you can take Skymont and just tack on a trivial amount of additional logic and surpass Lion Cove across the board. If it were that easy, seems like everyone would be doing it.... but they aren't.
Lower clock speeds, densely packed Skymont core logic, smaller structures including buffers.
There is no great philosophy here. Look at Zen 4c compared to Zen 4: while maintaining the same IPC, Zen 4c's area is much smaller (-34%). The same was done with Skymont, plus less extensive core logic (mainly buffers) and no uop cache.
Yes, but Zen 5c is still quite a bit larger than Skymont, I believe. My guess is that Turin Dense with its Zen 5c cores will pound the crap out of an equal number of Skymont cores in data center applications. We won't know for sure until Clearwater Forest, I think.
Not sure why everyone is so into PPA for CPU cores. Ultimately, the PPA efficiency of CPU cores is pretty negligible for the products that can be built for consumers.

A single-digit mm² core size means very little when everything else on the die is still required. With the same amount of L3 and L2 cache, a CPU with only Lion Cove or only Skymont cores would not differ much in size if you target the same performance.
Because the cores must run within a thermal and power limit. When you get all those cores going at the same time, the trick is to get them all up to higher clocks without hitting a thermal or power limit. Otherwise you get a core designed to go 6 GHz that can't run over 2 GHz.
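A common back-of-the-envelope model for why efficiency sets the all-core clock: dynamic power scales roughly with C·V²·f, and since voltage has to rise roughly with frequency in the normal operating range, per-core power grows roughly with f³. Under a fixed package budget, a core that burns less power at a given clock can run all-core noticeably faster. A minimal sketch; the cube law ignores leakage and voltage floors, and the wattage constants are hypothetical:

```python
def max_all_core_ghz(budget_w: float, cores: int, watts_at_1ghz: float) -> float:
    """All-core clock under a power cap, assuming per-core power ~ k * f^3,
    where k (= watts_at_1ghz) is a hypothetical per-core efficiency constant."""
    per_core_w = budget_w / cores
    return (per_core_w / watts_at_1ghz) ** (1 / 3)


# Same hypothetical 120 W budget and 16 cores; the second core type is
# assumed to need half the power at any given clock.
print(f"{max_all_core_ghz(120, 16, 1.0):.2f} GHz vs {max_all_core_ghz(120, 16, 0.5):.2f} GHz")
```

Under this model, halving power at a given clock buys only about 26% more all-core frequency (2^(1/3) ≈ 1.26).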
 

OneEng2

Senior member
Sep 19, 2022
.....
With their current 8+16 implementation, they basically only match the multithreaded performance of AMD's 16-core parts. The 14900K vs. 7950X is about even depending on the benchmark, and the 285K vs. 9950X is probably going to be similar. On paper, 16 E cores with ~90% of Zen 5's IPC should blow 8 Zen 5 cores out of the water, but ARL's multithreaded performance will probably be about even.
I have heard that Arrow Lake is "about even" with the 14900K; however, the 14900K was never a match for the 9950X in multi-threaded workloads, so I am thinking Arrow Lake will have to be a great deal faster in MT to equal the 9950X.
 

OneEng2

Senior member
Sep 19, 2022
[Attachment 109688]

Well, at least he’s more conservative with IPC increases now.
I think that unified core is the OPPOSITE of where their architecture should be headed.

It is generally true that a specific workload can be most efficiently and most quickly performed by hardware that is designed for that workload.

It is also generally true that a processor that is designed to achieve the maximum performance will do so at the expense of more transistors and less efficient transistors.

Today we have P cores, E cores, AI cores, GPU cores (did I miss any?). It is just one engineer's opinion; however, I am guessing that we will have more specialized cores in the future, not fewer, and that these cores will be sprinkled about in different combinations to achieve maximum performance for a specific kind of usage.

Digital mixers are able to do a metric crap ton of digital processing algorithms to shape sound, and they do it in real time (<1 ms from input signal to output) and produce minimal heat. They do this with specialized processors (DSP chips) with lots of hardware-specific logic in them.
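For a sense of what such a DSP is built for: a real-time audio path has a hard latency budget set by buffer size and sample rate, and the core workload is long runs of multiply-accumulates such as a FIR filter. A minimal Python sketch; the 48 kHz rate and 32-sample buffer are illustrative assumptions, not any particular mixer's settings. Dedicated DSPs execute these MACs in hardware, typically one or more per cycle:

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time to fill one processing buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def fir(samples: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR filter: each output is a run of multiply-accumulates
    over the most recent len(taps) input samples."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


print(buffer_latency_ms(32, 48_000))                 # ~0.667 ms per 32-sample block
print(fir([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))    # impulse in, taps out
```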

I think Intel has lost their minds if this is their direction.
 

GTracing

Member
Aug 6, 2021
Not sure why everyone is so into PPA for CPU cores. Ultimately PPA efficiency of CPU cores are pretty negligible on the products that can be built for consumers.

Single digit mm^2 size for core means very little when everything else on die is required still. With the same amount of L3 and L2 cache, a CPU with just Lions Cove or Skymont would be not be a big difference in size if you target the same performance.
I don't want to keep spamming the thread, but since you asked: a few mm² makes a big difference. On server chips that's 15-25% fewer cores for the same die area. On client chips, ~5 mm² per unit sold adds up to hundreds of millions of dollars. And then there's the power consumption. A core that needs more transistors to do the same work is going to use more power. I don't think I can overstate just how bad the poor PPA is for Intel.
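A rough sketch of both effects: how many cores fit in a fixed area budget, and what a few extra mm² per chip costs at volume. All inputs below (area budget, per-core areas including their share of cache and fabric, volume, cost per mm²) are hypothetical placeholders, not Intel figures:

```python
def cores_that_fit(area_budget_mm2: float, core_mm2: float) -> int:
    """Whole cores that fit in an area budget (ignores layout and yield)."""
    return int(area_budget_mm2 // core_mm2)


def extra_area_cost_usd(extra_mm2_per_unit: float, units: int,
                        usd_per_mm2: float) -> float:
    """Total cost of extra silicon area across a production run."""
    return extra_mm2_per_unit * units * usd_per_mm2


# Hypothetical server budget: 200 mm² for cores; 8 vs. 10 mm² per core.
small, large = cores_that_fit(200, 8), cores_that_fit(200, 10)
print(f"{small} vs {large} cores ({1 - large / small:.0%} fewer)")

# Hypothetical client volume and silicon cost per mm².
print(f"${extra_area_cost_usd(5, 100_000_000, 0.5):,.0f}")
```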
 

alcoholbob

Diamond Member
May 24, 2005
It is generally true that a specific workload can be most efficiently and most quickly performed by hardware that is designed for that workload.

It is also generally true that a processor that is designed to achieve the maximum performance will do so at the expense of more transistors and less efficient transistors.

Today we have P cores, E cores, AI cores, GPU cores (did I miss any?). It is just one engineer's opinion; however, I am guessing that we will have more specialized cores in the future, not fewer, and that these cores will be sprinkled about in different combinations to achieve maximum performance for a specific kind of usage.

Digital mixers are able to do a metric crap ton of digital processing algorithms to shape sound, and they do it in real time (<1 ms from input signal to output) and produce minimal heat. They do this with specialized processors (DSP chips) with lots of hardware-specific logic in them.

I think Intel has lost their minds if this is their direction.

They want to win back server, and nobody running servers wants to deal with heterogeneous architectures.
 

DavidC1

Golden Member
Dec 29, 2023
I think I'm off-base here but wanted to put it out there so I can learn how to better interpret this kind of data.
Branch MPKI measures how well the branch predictor predicts branches, normalized per 1,000 instructions, so anything after that has to be measured with a different metric, such as IPC. Skymont is simply better than Lion Cove here.
From testing, it seems Skymont's decode is not well suited to longer instructions.
From where are you getting this?

It's only slow because it's the LNL version with the dog-slow SLC cache.
 

511

Golden Member
Jul 12, 2024
From Reddit:

There's a teardown of Meteor Lake (Intel 4) and an analysis of Intel 3 based on Intel 4's performance and Intel's claimed 4->3 uplift.

Intel 3 has an optional 2-fin library, a threshold-voltage shift layer in the gate dielectric, and contacts optimized to eliminate contact-to-source/drain overlap capacitance. Both Intel 4 and Intel 3 have the 3D MIM capacitor and the more regular fin end cuts that allow denser layouts (EUV rather than SAQP). Intel 3 has a customizable number of metal layers and will support through-silicon vias in the future.

TechInsights, at the time of Intel 3's announcement, guessed comparable performance between Intel 3 and N3, even though Intel 3's density is much lower. It isn't clear to me whether the new server product actually implements all of those electrostatic improvements, so I don't know if that product is representative of the node's potential.

Intel 3 is a capable node; it's the P core dragging them down with GNR 🤣
 