Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 578

Tigerick

Senior member
Apr 1, 2022
702
632
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also brings a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of upcoming Intel's U-series CPU: Core Ultra 100U, Lunar Lake and Panther Lake

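| Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores |
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |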



Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
| Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H |
| Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU | 66.48 | | | |
| tGPU | 44.45 | | | |
| SoC | 96.77 | | | |
| IOE | 44.45 | | | |
| Total | 252.15 | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB · Views: 24,014
  • LNL.png
    881.8 KB · Views: 25,501

Magio

Member
May 13, 2024
104
111
76
That 12 Xe3 IGP looks particularly appealing, although BW will be a problem unless they put some SLC inside.

Yeah, I had long planned to upgrade to Lunar Lake when it eventually came, and I have not completely ruled that out because I still like what that platform offers. But its shortcomings (pure MT performance) and the generally poor designs from OEMs so far, in my opinion (there's not a single one that seems "MacBook tier"), are making me consider waiting for PTL-P, which has a lot of potential with 18A. Early 2026 will bring the XPS redesigns, which hopefully won't be dumb, so waiting for that is tempting.

Will at least wait until CES this January, maybe something will seem worth upgrading to there. But otherwise I'll likely wait one more year.
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
LionCove is not even roughly the same microarchitecture as RaptorCove despite a modest average 9% IPC increase. LionCove loses a lot in the overall construction of ArrowLake.

The execution engine in LionCove has been thoroughly rebuilt and now closely resembles Zen5 and Skymont.

I hope to see a detailed analysis of LionCove with ArrowLake and an IPC test with HTT disabled in Raptor.
I defer to your knowledge but looking at these block diagrams it seems as though Lion Cove is much closer architecture-wise to Raptor Cove than Skymont? What am I missing in looking at these diagrams?
 

Attachments

  • 2022 - Raptor Cove.jpg
    504.1 KB · Views: 18
  • 2024 - Lion Cove.jpg
    536.4 KB · Views: 19
  • 2024 - Skymont.jpg
    441.7 KB · Views: 19
Reactions: igor_kavinski

511

Golden Member
Jul 12, 2024
1,038
896
106
I defer to your knowledge but looking at these block diagrams it seems as though Lion Cove is much closer architecture-wise to Raptor Cove than Skymont? What am I missing in looking at these diagrams?
The splitting of the scheduler: it used to be unified, and now it's non-unified and split into integer, vector, and store/load. The front-end changes were about widening and adding an additional L0, not a repurposing of the whole backend like the one they did, and these changes are not reflected in these images.
 

511

Golden Member
Jul 12, 2024
1,038
896
106
Windows sucks vs Linux
I cross-checked PL1 = 45 W and PL2 = 115 W against the Intel spec, made sure to check RAPL in Linux, and boi it feels bad.

For some dumb reason the clocks behave weirdly and it isn't boosting in some workloads, but at 5.1 GHz in Linux 2700 is possible. Lunar Lake should easily cross 3000-3100.
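
For anyone who wants to repeat the RAPL check mentioned above, here is a minimal sketch that reads the configured package power limits from the Linux powercap sysfs interface. The function name `read_rapl_limits` is made up for this example, and the zone path in the usage note is only the usual location of the package-0 zone; it may differ on your machine or need elevated permissions. On Intel systems the `long_term` constraint normally corresponds to PL1 and `short_term` to PL2.

```python
from pathlib import Path


def read_rapl_limits(zone_dir):
    """Return {constraint_name: watts} for one powercap RAPL zone directory.

    Expects the usual powercap layout: constraint_N_name files holding a
    label and constraint_N_power_limit_uw files holding microwatts.
    """
    zone = Path(zone_dir)
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        prefix = name_file.name[: -len("_name")]  # e.g. "constraint_0"
        limit_file = zone / f"{prefix}_power_limit_uw"
        if not limit_file.exists():
            continue  # skip constraints without a power limit file
        label = name_file.read_text().strip()
        microwatts = int(limit_file.read_text().strip())
        limits[label] = microwatts / 1_000_000  # microwatts -> watts
    return limits
```

Typical usage would be `read_rapl_limits("/sys/class/powercap/intel-rapl:0")`, then comparing the returned values against the PL1/PL2 figures you expect.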
 
Reactions: igor_kavinski

Hitman928

Diamond Member
Apr 15, 2012
6,390
11,392
136
Windows sucks vs Linux
I crosschecked PL1=45W and PL2= 115W intel spec made sure to check RAPL in linux and boi it feels bad

Linux is usually faster for CPU performance but not that much. It's because the linked Windows test is running at lower frequencies. Probably the power plan is throttling performance for lower power.
 

Magio

Member
May 13, 2024
104
111
76
Windows sucks vs Linux
I crosschecked PL1=45W and PL2= 115W intel spec made sure to check RAPL in linux and boi it feels bad
For some dumb reason The clocks behave weirdly and it isn't boosting in some workload but at 5.1Ghz in linux 2700 is possible Lunar lake should easily cross 3000-3100
There are already some LNL results at 3000+ ST on Linux, yeah. This one on 258v (which on Windows tops out below 2800) for example.

I haven't seen Linux results for 268v/288v yet though. Phoronix's testing also had Linux outperform Windows on a 256v by 14% on average in CPU tasks, however their testing also showed that LNL's GPU drivers are just not good at all on Linux right now.
 

AMDK11

Senior member
Jul 15, 2019
438
360
136
I defer to your knowledge but looking at these block diagrams it seems as though Lion Cove is much closer architecture-wise to Raptor Cove than Skymont? What am I missing in looking at these diagrams?
The functional block diagram of the core microarchitecture is a far-reaching simplification and illustration of the core structure. New and more complex algorithms cannot be visualized in the logic that controls the underlying resources. You are too literally looking at the diagram and comparing it with other generations.

LionCove has a completely new front-end with, among others: new 8x wider prediction, a new 8-Wide decoder, new prefetch and a new larger (5250) + wider (12-Wide) UOPS (L0) cache.

New non-scheduling queue buffers between the scheduler and the physical register file.

Most importantly, a completely redesigned split-scheduler execution engine with separate ports for FP/VEC and separate ones for Integer. This is a radical departure from the approach used since the Pentium Pro (P6 – 1995).

Additionally, renaming L1-D to L0-D 48KB and adding L1-D 192KB (L1.5-D).




















 

OneEng2

Senior member
Sep 19, 2022
259
358
106
He already told you that in 1 socket comparisons Turin is only 18% ahead, and that includes benchmarks where Granite Rapids performs really bad, as in behind Sapphire Rapids bad. Turin does not have such issues, or nearly as bad, so what do you think will improve more? Turin that's working or Granite Rapids that's not working at all in few workloads?

Is this normal? Where 1P Sierra Forest and Sapphire Rapids is outperforming 2P Granite Rapids? Where Sierra and Sapphire improves in performance with 2 sockets but Granite loses more than 80%?


The 2P Geomean score will improve by 10% if NAMD is excluded. One benchmark underperforming is dragging the score down by that much.

Now let's chalk that up to platform maturity because a 128 core Intel 3 Redwood Cove underperforming 56 core Intel 7 Golden Cove is terrible. And NAMD is not the only benchmark.
Ok, let's actually look at the article in its entirety:

Starting at the name of the article:

AMD EPYC 9755 / 9575F / 9965 Benchmarks Show Dominating Performance

Just as with Tom's, sure, there are benchmarks where GNR does well, but the overwhelming majority of the time it gets beaten, and many times by quite a bit.
 
Reactions: lightmanek

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
The functional block diagram of the core microarchitecture is a far-reaching simplification and illustration of the core structure. New and more complex algorithms cannot be visualized in the logic that controls the underlying resources. You are too literally looking at the diagram and comparing it with other generations.

LionCove has a completely new front-end with, among others: new 8x wider prediction, new 8-Wide decoder, new prefetch and new larger (5250) + wider (12-Wide) UOPS (L0) cache.

new non-scheduling queue buffers between the scheduler and the physical register file.

Most importantly, a completely redesigned split-scheduler execution engine with separate ports for FP/VEC and separate for Integer. Which is a radical change since the Pentium Pro (P6 – 1995).

Additionally, renaming L1-D to L0-D 48KB and adding L1-D 192KB (L1.5-D).













Okay so Lion Cove architecture is closer to Skymont than Raptor Cove? I'm not arguing the point, just trying to understand.
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
Did some further CB R24 MT testing. As you add P cores strange things happen. I ran these tests at 5GHz P, 4GHz E, just to make sure there is no throttling or other "at the limit" behavior.

Anyway, assuming Raptor Cove scores 22.6 points/GHz here are the scores as you add P cores to 16 E cores during the render.
1P+16E - 15.2 points/GHz for E's
2P+16E - 15.4
4P+16E - 14.7
6P+16E - 14.0
8P+16E - 13.1

Other than the increase from 1 to 2 P's, the IPC of the E's decreases as P's are added. Anybody have any reasoning for this behavior?

It is of course possible that the IPC of the P's is also changing, or is the only thing changing, but I have found the P IPC to be relatively stable when testing various numbers of P's.

It's a rabbit hole not worth spending too much time on but I have a hard time leaving it alone...
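
For reference, the per-E-core figures above can be backed out of a total score with a few lines of arithmetic. This is only a sketch of how such a number might be derived: it assumes "points/GHz" means points per core per GHz, that the P cores contribute exactly 22.6 points/GHz each, and that whatever is left belongs to the E cores. The function names are made up, and the totals it produces are implied by the figures in this post, not measured scores.

```python
P_POINTS_PER_GHZ = 22.6  # Raptor Cove figure assumed in the post above


def e_points_per_ghz(total_score, p_cores, p_ghz=5.0, e_cores=16, e_ghz=4.0,
                     p_rate=P_POINTS_PER_GHZ):
    """Attribute the part of the score not explained by the P cores to the E cores."""
    p_share = p_cores * p_ghz * p_rate
    return (total_score - p_share) / (e_cores * e_ghz)


def implied_total(p_cores, e_rate, p_ghz=5.0, e_cores=16, e_ghz=4.0,
                  p_rate=P_POINTS_PER_GHZ):
    """Inverse of the above: the total score implied by a given E-core rate."""
    return p_cores * p_ghz * p_rate + e_cores * e_ghz * e_rate


# E-core rates quoted in the post, keyed by number of active P cores.
quoted = {1: 15.2, 2: 15.4, 4: 14.7, 6: 14.0, 8: 13.1}
for p_cores, e_rate in quoted.items():
    total = implied_total(p_cores, e_rate)
    print(f"{p_cores}P+16E: implied total {total:.1f}, "
          f"E rate {e_points_per_ghz(total, p_cores):.1f} points/GHz")
```

Note that any error in the fixed P-core figure lands entirely on the E-core number in this kind of subtraction, which is one reason the trend could partly be an artifact.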
 

AMDK11

Senior member
Jul 15, 2019
438
360
136
Okay so Lion Cove architecture is closer to Skymont than Raptor Cove? I'm not arguing the point, just trying to understand.
Definitely yes. LionCove now has an execution engine identical to that in Zen5 and very similar to Skymont.

Now Mx, Zen, Mont, Cove have a separate scheduler for FP and a separate one for Integer.

The main difference in Skymont is clustered decoding without a UOP cache. But this solution was used in Skymont not because it was better, but because it saved logic and complexity. The Skymont core still requires many compromises and savings. It has fewer stages in preparation, can run at a lower maximum clock speed, has 4x128-bit FP and 8xALU, so may exhibit similar behavior to LionCove under certain conditions.

Edit:
If I had to guess, if the Cove core were replaced with a Mont/Arctic core, it doesn't look like those cores would suddenly become lighter compared to Cove. Simply put, the new P-Core will have fewer pipeline stages, lower clock speeds, more advanced logic than LionCove, and a higher IPC.

The new core will bring out the best of Cove and Mont.

But on the other hand, I suspect the teams were merged to support each other and better align Cove with Mont, as currently both projects were very independent and this is a problem for a single CPU project. Time will tell whether this will be the case.
 

OneEng2

Senior member
Sep 19, 2022
259
358
106
Is this AMD topic or I got lost?
Didn't mean to confuse you. There is this company called AMD that is a direct competitor to Intel and is currently the biggest cause for concern for Intel with regards to the competitive landscape and Intel's potential financial success.

In the years past when Intel was very dominant in the industry, any paltry improvement the company decided to make was more than enough to keep them market dominant. Since this is no longer the case, it becomes very relevant to discuss how a completely new architecture released by Intel is going to work out for them in the competitive market. While there are some consumers that may still purchase a computer simply because it says "Intel Inside", that number is greatly reduced from years past.

Since the data center is currently the largest growing market, and provides the highest margins, how a brand new architecture from Intel fares in this market is pretty critical to deciding if Intel can continue a dominant position in the future. Second to this market is the full sized laptop market for which the newest Intel chip has yet to be released, but it will likely be close in behavior to the Arrow Lake desktop and less like the Lunar Lake thin and light release.

Hopefully this clears up why AMD benchmarks in data center are important in discussing a new Intel architecture.
Definitely yes. LionCove now has an execution engine identical to that in Zen5 and very similar to Skymont.

Now Mx, Zen, Mont, Cove have a separate scheduler for FP and a separate one for Integer.

The main difference in Skymont is clustered decoding without UOP cache. But this solution was used in Skymont not because it was better, but because it saved logic and complexity. The Skymont core still requires many compromises and savings. It has fewer stages in preparation, can run at a maximum lower clock speed, has 4x128-bit FP and 8xALU, so may exhibit similar behavior to LionCove under certain conditions.

Edit:
If I had to guess and the Cove core was replaced with a Mont/Arctic core, it wouldn't look like those cores would suddenly become lighter compared to Cove. Simply put, the new P-Core will have fewer pipeline stages, lower clock speeds, more advanced logic than LionCove, and a higher IPC.

The new core will bring out the best of Cove and Mont.

But on the other hand, I suspect the teams were merged to support each other and better align Cove with Mont, as currently both projects were very independent and this is a problem for a single CPU project. Time will tell whether this will be the case.
It is an interesting balance. The pipeline is lengthened so that each stage has less work to do, which makes higher-frequency operation easier. But the longer the pipeline is, the more transistors are used, and the more inefficient the design becomes due to pipeline flushes when a prediction is missed.

P4 jumped the gun (obviously) and got too long of a pipeline running directly into a thermal and efficiency problem.

Conroe pulled it back and Intel was back in the game.

In general, it appears to me that with higher core counts, the thermal efficiency (and efficiency in general) is playing more and more of a role in overall processor performance. With that in mind, the Lunar Lake path doesn't really seem like a bad idea so much as just a step back.
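
To put rough numbers on the pipeline-length balance described above, here is a toy model. Every parameter in it is a made-up placeholder (total logic delay, per-stage latch overhead, base CPI, mispredict rate), and it assumes a flush costs about one cycle per pipeline stage. It is a sketch of the trade-off, not a model of any real core.

```python
def time_per_instruction(depth, logic_ns=10.0, latch_ns=0.1,
                         base_cpi=0.5, mispredicts_per_instr=0.01):
    """Average time per instruction (ns) for a pipeline with `depth` stages."""
    # Splitting the logic over more stages shortens the cycle,
    # but each stage still pays a fixed latch overhead.
    cycle_ns = logic_ns / depth + latch_ns
    # Each mispredict flushes roughly `depth` cycles of work.
    cpi = base_cpi + mispredicts_per_instr * depth
    return cycle_ns * cpi


def best_depth(depths=range(5, 101), **params):
    """Depth with the lowest time per instruction under the toy model."""
    return min(depths, key=lambda d: time_per_instruction(d, **params))


good = best_depth()
bad = best_depth(mispredicts_per_instr=0.04)
print("best depth with good prediction: ", good)
print("best depth with worse prediction:", bad)
```

In this model, going deeper keeps raising the clock but stops paying off once flush cost dominates, and the worse the branch prediction, the shallower the sweet spot.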
 

AMDK11

Senior member
Jul 15, 2019
438
360
136
Didn't mean to confuse you. There is this company called AMD that is a direct competitor to Intel and is currently the biggest cause for concern for Intel with regards to the competitive landscape and Intel's potential financial success.

In the years past when Intel was very dominant in the industry, any paltry improvement the company decided to make was more than enough to keep them market dominant. Since this is no longer the case, it becomes very relevant to discuss how a completely new architecture released by Intel is going to work out for them in the competitive market. While there are some consumers that may still purchase a computer simply because it says "Intel Inside", that number is greatly reduced from years past.

Since the data center is currently the largest growing market, and provides the highest margins, how a brand new architecture from Intel fares in this market is pretty critical to deciding if Intel can continue a dominant position in the future. Second to this market is the full sized laptop market for which the newest Intel chip has yet to be released, but it will likely be close in behavior to the Arrow Lake desktop and less like the Lunar Lake thin and light release.

Hopefully this clears up why AMD benchmarks in data center are important in discussing a new Intel architecture.

It is an interesting balance. The pipeline is increased in order to allow easier alignment within a stage for higher frequency operation ..... the longer the pipeline is, the more transistors are used, and the more inefficient the design becomes due to pipeline flushes when a prediction is missed.

P4 jumped the gun (obviously) and got too long of a pipeline running directly into a thermal and efficiency problem.

Conroe pulled it back and Intel was back in the game.

In general, it appears to me that with higher core counts, the thermal efficiency (and efficiency in general) is playing more and more of a role in overall processor performance. With that in mind, the Lunar Lake path doesn't really seem like a bad idea so much as just a step back.
Pentium 4 (NetBurst) placed great emphasis on the highest possible clock speed. NetBurst had a 1-Wide decoder while Pentium III had a 3-Wide decoder. In the context of NetBurst, IPC was of secondary importance.

Conroe (Core 2) is a redesign of Yonah (Core (1) M) and a proper continuation of the high IPC cores.
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
This is kind of interesting. Tom's recently posted a Lunar Lake Dell laptop review. They mention how after 10 minutes running CB R24 the Lion Cove clock averaged 2.5GHz and the Skymont clock 2.83 and the score was 470.

I put those clocks into the Arrow Lake CB R24 "Predictor" and it came back with a score of 463. Pretty close to what was reported.

Keep in mind my "Predictor" assumes a +9% bump over Raptor Cove for Lion Cove IPC and +60% for Skymont over Gracemont.

Assuming these numbers are reasonably correct for ARL, I would expect the mobile variant of these cores to underperform in a mobile system due to the slower memory subsystem of mobile designs. In this case LL is overperforming against ARL estimates. I am thinking this could mean my estimates for IPC increases are low and/or most of CB R24 remains in the faster caches.

I'm getting schooled so much here lately I'm getting a little hesitant to post! Just kidding. If I don't put my thoughts out there I won't learn anything and we won't advance the discussion.

Anyway, it looks like +9% P and +60%E IPC increase for CB R24 might be what we'll see.

Might have been nice if those numbers were more like +20% for P and +49% for E but we get what we get. For MT work ARL looks to be quite strong, but for apps not using a ton of cores we might not see so much of a performance increase.

I do think the E cores have definitely moved permanently out of the "spam" category!
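
For anyone curious how a clock-based "Predictor" like the one described above might be put together, here is a minimal sketch. It is not the actual spreadsheet. The Raptor Cove baseline of 22.6 points/GHz is taken from the earlier CB R24 post in this thread; the Gracemont baseline is not stated anywhere here, so the 12.0 below is a placeholder picked only so the example lands near the 463 quoted above. The 4P + 4E core count for Lunar Lake comes from the table in the first post, and the function name is made up.

```python
RAPTOR_COVE_PTS_PER_GHZ = 22.6  # per core, from the earlier CB R24 post
GRACEMONT_PTS_PER_GHZ = 12.0    # placeholder assumption, NOT a figure from the thread


def predict_cb_r24(p_cores, p_ghz, e_cores, e_ghz,
                   p_uplift=0.09, e_uplift=0.60,
                   p_base=RAPTOR_COVE_PTS_PER_GHZ,
                   e_base=GRACEMONT_PTS_PER_GHZ):
    """Estimate a CB R24 MT score from core counts, clocks and IPC uplifts."""
    p_rate = p_base * (1 + p_uplift)  # Lion Cove points per core per GHz
    e_rate = e_base * (1 + e_uplift)  # Skymont points per core per GHz
    return p_cores * p_ghz * p_rate + e_cores * e_ghz * e_rate


# Lunar Lake (4P + 4E) at the sustained clocks from the review.
estimate = predict_cb_r24(p_cores=4, p_ghz=2.5, e_cores=4, e_ghz=2.83)
print(f"estimated score: {estimate:.0f}")
```

The model is purely linear in clocks and core counts, so it ignores memory and cache effects entirely, which is why a mobile part landing above the estimate is interesting.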
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
Pentium 4 (NetBurst) placed great emphasis on the highest possible clock speed. NetBurst had a 1-Wide decoder while Pentium III had a 3-Wide decoder. In the context of NetBurst, IPC was of secondary importance.

Conroe (Core 2) is a redesign of Yonah (Core (1) M) and a proper continuation of the high IPC cores.
This is true. I loved my 10GHz P4. It was great for ST work.





















Oh yeah, Intel never got there.
 

OneEng2

Senior member
Sep 19, 2022
259
358
106
Pentium 4 (NetBurst) placed great emphasis on the highest possible clock speed. NetBurst had a 1-Wide decoder while Pentium III had a 3-Wide decoder. In the context of NetBurst, IPC was of secondary importance.

Conroe (Core 2) is a redesign of Yonah (Core (1) M) and a proper continuation of the high IPC cores.
Absolutely.

I do admit that I had forgotten that Netburst had only a 1 wide decoder. Pentium III and "Core 2" were absolutely better designs.

Reading through the litany of architectural changes in Lion Cove, it certainly appears that this core should be a rocket ship ..... but as with many designs (and I have quite a few under my belt), "It sure looked good on the white board". Actually, I am of the belief that Lion Cove (and its sister designs) will grow into a very successful core design for Intel ...... in a couple of years. Unlike the disaster that was Netburst and Bulldozer, I don't see anything fundamentally mis-calculated here, only a need for both process and design optimization.

Unfortunately, these things take time. Based on the information that we have at this time, Intel will not be "back on top" again until 18A and some design tweaks come about (a couple of years I think).

For those that think me "Anti-Intel", I am absolutely not. No sane person in the world would wish for anything other than strong competition in the market. Furthermore, and on a more personal note, I happen to be a US vet and a long time CPU architecture buff. I want a strong US IP for my country, and Intel is it.

Opinion: Intel needs to fire a bunch of business majors and focus on their product strategy (vs figuring out how to better leverage their monopoly position to maximize their profit). It is my opinion that Intel stagnated under a bunch of tight neck ties and desperately needs an engineering kick in the a$$. In their engineering lethargy, Intel have allowed TSMC to flank them severely. AMD simply hit a great combination of design and lithography advances available and executed on it. In theory, having a vertically integrated design and foundry process SHOULD have allowed Intel to dominate the industry indefinitely. It is only in Intel's horrendous lack of forward vision that AMD and TSMC have unseated them. I believe Pat G can put the company back on track ........ if he gets enough time. It's really hard to work your way through an army of pencil necks. [/end rant]
 

OneEng2

Senior member
Sep 19, 2022
259
358
106
This is kind of interesting. Tom's recently posted a Lunar Lake Dell laptop review. They mention how after 10 minutes running CB R24 the Lion Cove clock averaged 2.5GHz and the Skymont clock 2.83 and the score was 470.

I put those clocks into the Arrow Lake CB R24 "Predictor" and it came back with a score of 463. Pretty close to what was reported.

Keep in mind my "Predictor" assumes a +9% bump over Raptor Cove for Lion Cove IPC and +60% for Skymont over Gracemont.

Assuming these numbers are reasonably correct for ARL, I would expect the mobile variant of these cores to underperform in a mobile system due to the slower memory subsystem of mobile designs. In this case LL is overperforming against ARL estimates. I am thinking this could mean my estimates for IPC increases are low and/or most of CB R24 remains in the faster caches.

I'm getting schooled so much here lately I'm getting a little hesitant to post! Just kidding. If I don't put my thoughts out there I won't learn anything and we won't advance the discussion.

Anyway, it looks like +9% P and +60%E IPC increase for CB R24 might be what we'll see.

Might have been nice if those numbers were more like +20% for P and +49% for E but we get what we get. For MT work ARL looks to be quite strong, but for apps not using a ton of cores we might not see so much of a performance increase.

I do think the E cores have definitely moved permanently out of the "spam" category!
With Raptor Lake, Intel held a 24 vs 16 "Full Core" advantage which, in theory, should have put that processor ahead of Zen 5 in terms of multi-threaded performance. Instead, it took a beating:


Based on your well thought out calculations, where do you see Arrow Lake performing in these multi-thread heavy workloads/benchmarks relative to Zen 5 and Raptor Lake?
 

OneEng2

Senior member
Sep 19, 2022
259
358
106
This is true. I loved my 10GHz P4. It was great for ST work.
I don't recall the P4 besting Core 2 even in single threaded benchmarks, but then again, I didn't own a P4 back then either (only a Core 2 quad).

Unfortunately for Intel, it wouldn't have really mattered if it did as AMD moved to 64 bit and dual core processors. Intel (at the time) was withholding 64 bit and multiple processors for its Itanium processor lines (another major tight neck tie moment at Intel).

I have hope that Intel's direction and underlying company philosophy have been brought in line by Pat G and that we will see a steady cadence of great advances with the new Lion Cove and Skymont architectures and that Intel will get on the process bandwagon again with 18A and beyond.
 