Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
700
615
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

(row) | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H
Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 36 MB ? | 12 MB | ?
tCPU (mm²) | 66.48 | | |
tGPU (mm²) | 44.45 | | |
SoC (mm²) | 96.77 | | |
IOE (mm²) | 44.45 | | |
Total (mm²) | 252.15 | | |
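
As a quick cross-check of the Meteor Lake column, the four tile sizes listed above can be summed and compared against the listed total. This is only a sketch: the variable names are made up for illustration, and the unit is assumed to be mm² since the table does not state one.

```python
# Meteor Lake tile sizes as listed in the table (unit assumed to be mm^2).
tile_area_mm2 = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}
listed_total_mm2 = 252.15

computed_total = sum(tile_area_mm2.values())
cpu_share = tile_area_mm2["tCPU"] / computed_total

print(f"sum of tiles: {computed_total:.2f}")  # 252.15, matches the listed total
print(f"tCPU share of total: {cpu_share:.1%}")
```

The listed total is simply the sum of the four tiles, so it does not include the Foveros base tile.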



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB · Views: 24,009
  • LNL.png
    881.8 KB · Views: 25,496

DavidC1

Golden Member
Dec 29, 2023
1,082
1,727
96
This summarizes x64 desktop in 2024.
Maybe the disappointment with both Zen 5 and Arrowlake is due to unwarranted hype.

How many believed the 30-40% per clock gains by MLID? If people here didn't believe MLID, many outside this forum definitely did.

Zen 5 was hyped because there seemed to be structures that were enlarged enough that such high numbers for gains seemed reasonable. It is a lesson that high-level disclosures are not worth much.
 

gdansk

Diamond Member
Feb 8, 2011
3,110
4,826
136
Maybe the disappointment with both Zen 5 and Arrowlake is due to unwarranted hype.

How many believed the 30-40% per clock gains by MLID? If people here didn't believe MLID, many outside this forum definitely did.

Zen 5 was hyped because there seemed to be structures that were enlarged enough that such high numbers for gains seemed reasonable. It is a lesson that high-level disclosures are not worth much.
Not me, I had hype levels much lower than that. I expected Zen 5 to be 5% faster than it is. And I expected Arrow Lake to lose to X3D but not regular Zen 5 in gaming.

I don't think either of those expectations was too hyped. They would have let 100W+ desktop x86 be as fast as a fanless tablet from a lifestyle brand at handling web bloat.
 

Meteor Late

Member
Dec 15, 2023
48
45
51
Sure it does. Because the design teams are different.

Also Skymont is pretty efficient on Lunarlake.

Much lower frequency, though. If Skymont in Lunar Lake was already less efficient than Lion Cove at the upper part of the performance/power curve (per Geekerwan), it is only getting worse with a 24% bump (3.7 to 4.6GHz) in E-core frequency from the 268V to the 285K, compared to a 14% frequency bump (5 to 5.7GHz) in the P-core.
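
For anyone who wants to check the two percentages, here is a minimal sketch using only the clocks quoted above. The function name is just for illustration.

```python
def pct_bump(old_ghz, new_ghz):
    """Relative frequency increase from old_ghz to new_ghz, in percent."""
    return (new_ghz / old_ghz - 1) * 100

e_core = pct_bump(3.7, 4.6)  # Skymont: 268V -> 285K
p_core = pct_bump(5.0, 5.7)  # Lion Cove: 268V -> 285K

print(f"E-core bump: {e_core:.1f}%")  # 24.3%
print(f"P-core bump: {p_core:.1f}%")  # 14.0%
```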
 

Joe NYC

Platinum Member
Jun 26, 2021
2,625
3,681
106
They're bleeding share in laptop and pretty aggressively so.
If you mean LNL, then it's more niche than discrete GPUs. But all tablet chips are.

This is going to be a big question. AMD is churning out notebook designs one after another, at a faster pace than Intel, and will have a design for nearly every segment of the market and price range. I don't see the 80%-20% market share in favor of Intel as sustainable.
 

gdansk

Diamond Member
Feb 8, 2011
3,110
4,826
136
Can someone expand on which factors may contribute to the 9950X and 285K having such similar IPC in 1T, when it was shown that Zen 5 operates as a 4-wide decode design outside of SMT and Lion Cove is apparently up to 8-wide decode?

Too many mispredictions? There isn't more ILP to exploit? Lion Cove is bottlenecked at execution? Then why does Lion Cove have 8 wide decode?
 

Saylick

Diamond Member
Sep 10, 2012
3,605
8,075
136
This is going to be a big question. AMD is churning out notebook designs one after another, at a faster pace than Intel, and will have a design for nearly every segment of the market and price range. I don't see the 80%-20% market share in favor of Intel as sustainable.
There just isn't the volume for AMD to grab market share, in my opinion. I won't believe it until I see businesses getting AMD CPUs in their corporate laptops as standard.
 

gdansk

Diamond Member
Feb 8, 2011
3,110
4,826
136
Gotta wait for CnC.
I read this but I'm still not sure how it ends up about the same. It seems to have more cache, bigger buffers, better rename, and in many measurements it looks better. But in real-world code and SPEC it ends up within the margin of error.

 

DavidC1

Golden Member
Dec 29, 2023
1,082
1,727
96
Too many mispredictions? There isn't more ILP to exploit? Lion Cove is bottlenecked at execution? Then why does Lion Cove have 8 wide decode?
This is difficult to know, because we're talking about high-level details. Simply put, it could be generalized as an imbalanced design and shoddy execution.

There was a post a while ago about Tesla engineers saying 8-wide decode on x86 is impossible. So the illogic of Lion Cove is that it did exactly that for only a 10% gain. One would think 7-wide would have been more than enough. I think they could have stayed at 6-wide for 10% gains.

Plus, general-purpose performance is not heavily limited by decode. Having more helps, but that alone is not enough. We stayed 4-wide from Core 2 in 2006 to Skylake in 2016, right? ILP, or instruction-level parallelism, is lower than 2 on average for lots of code. So even 4-wide is double that.

But obviously there is code that benefits from being wider, and Google mentioned this, and so did Jim Keller. And if you improve elsewhere, then the average will go up, and maybe you'll also have more cases where it can reach peak. This is why some believed back in the 2000s that it was hard to get above the original Athlon in performance. Yet here we are.

You can see from David Huang's review that Lion Cove regressed in branch MPKI by 2% over Golden Cove. But Zen 5 got a 10% improvement over Zen 4, with a 9% integer gain, right? Skymont's 32% integer gain was roughly matched by a 27% improvement in MPKI.

So Lion Cove's 10% gain should have come with a slightly better branch predictor too, not a worse one. The branch predictor is very, very important, as fewer misses mean fewer flushed pipelines, meaning more performance and less wasted power. And it improves ILP. So Lion Cove is wasting its resources elsewhere, like the overly big decode for one. Where else is LNC wasting them?
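
For reference, branch MPKI is mispredictions per thousand retired instructions, so lower is better and an "improvement" means the number goes down. A minimal sketch follows; the counter values are hypothetical, picked only to produce a 10% improvement like the Zen 4 to Zen 5 figure above, and are not measurements from any review.

```python
def mpki(mispredicts, instructions):
    """Branch mispredictions per 1000 retired instructions."""
    return mispredicts / instructions * 1000

def mpki_improvement_pct(old_mpki, new_mpki):
    """Percent reduction in MPKI; a negative result means a regression."""
    return (old_mpki - new_mpki) / old_mpki * 100

# Hypothetical counter readings over the same 1e9-instruction run.
old = mpki(5_000_000, 1_000_000_000)  # 5.0 MPKI
new = mpki(4_500_000, 1_000_000_000)  # 4.5 MPKI

print(f"old {old:.2f} MPKI, new {new:.2f} MPKI, "
      f"improvement {mpki_improvement_pct(old, new):.0f}%")
```

By the same formula, a core whose MPKI goes up by 2% gets a negative improvement, which is the Lion Cove case described above.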
 

DavidC1

Golden Member
Dec 29, 2023
1,082
1,727
96
Adding on the explanation:

To get a 100% gain, you can't rely on one unit. If that were the case, the code itself would not be general purpose but extremely specialized, such as matrix math like Linpack, or Dhrystone. Those are useful for academic purposes, if you want to see whether the execution units are functional.

For GPUs, you need more shaders, more texture units, and more memory bandwidth. Those are the three big basic ones. Then when you take all games into account, the split would be 33%/33%/33%. So if you quadruple memory bandwidth but leave the rest the same, instead of getting a 33x2 improvement, you run into diminishing returns.

In balanced code, 2x bandwidth = 33% gains in the above simplified example. So you say, "Oh, it's not bottlenecked". But that's incorrect. You are memory bandwidth bound, just not 100%, because you are running a balanced, real-world game, not a scripted benchmark that stresses only memory (that's what 100% means).

CPUs are general purpose. So instead of 3 basic units, now you need 10 basic units that each need to add 10% to the final result to get a 100% combined gain. That balance is hard, but it's the basic idea. So if you have a bad team, you might put too much effort into one area, and almost nothing in others. Lion Cove goes through the trouble of having an "impossible" 8-wide x86 decode, but doesn't improve in branch prediction. Can you say "IMBALANCE!"?
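
The simplified rule in this post can be written down in a few lines. This is a sketch of the post's own back-of-the-envelope model, not a real performance model: each unit has a share of the total, doubling a unit contributes its share, and anything beyond a doubling (where the post says diminishing returns kick in) is not modeled. The function and unit names are made up for illustration.

```python
def combined_gain(shares, doubled):
    """Sum the share of every unit that was doubled (the post's simplified rule)."""
    return sum(shares[name] for name in doubled)

gpu_shares = {"shaders": 1 / 3, "texture_units": 1 / 3, "memory_bandwidth": 1 / 3}
cpu_shares = {f"unit_{i}": 0.10 for i in range(10)}  # ten hypothetical units, 10% each

print(f"GPU, 2x bandwidth only: {combined_gain(gpu_shares, ['memory_bandwidth']):.0%}")
print(f"GPU, 2x everything:     {combined_gain(gpu_shares, gpu_shares):.0%}")
print(f"CPU, 2x one unit:       {combined_gain(cpu_shares, ['unit_0']):.0%}")
print(f"CPU, 2x all ten units:  {combined_gain(cpu_shares, cpu_shares):.0%}")
```

Under this rule, a design that pours everything into one of ten units tops out at a 10% gain from that unit, which is the imbalance argument in a nutshell.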
 

DavidC1

Golden Member
Dec 29, 2023
1,082
1,727
96
Much lower frequency, though. If Skymont in Lunar Lake was already less efficient than Lion Cove at the upper part of the performance/power curve (per Geekerwan), it is only getting worse with a 24% bump (3.7 to 4.6GHz) in E-core frequency from the 268V to the 285K, compared to a 14% frequency bump (5 to 5.7GHz) in the P-core.
So it should not reach 5.7GHz. Revolutionary idea? Perhaps...

Apple's peak chip clocks at 4.4GHz with a 9-stage pipeline and uses a mere 7W in ST. x86 cores are 14-19 stages and use more than 30W at 5.7GHz.

Less than 30% difference in clocks, for more than double worst-case pipeline stages and 5x the power use. And in Intel's case, it degrades while doing so. And in Arrowlake it had to sacrifice ring bus clocks because 4.6GHz ring was too crazy for it. And isn't Apple beating the 5.7GHz x86 chips in pure ST performance?

@adroc_thurston UOP cache hit rate is only a little higher? 14 is only on a hit. An 80% hit rate works out to 15 stages, and 60-70% is more likely for average scalar Int, which makes it about 16, which is a magic number. When it misses, it's 19 stages, because adding a uop cache adds a few stages.
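
The stage counts above are a hit-rate-weighted average of the 14-stage hit path and the 19-stage miss path. A minimal sketch, using only the numbers from this post (the function name is for illustration):

```python
def effective_stages(hit_rate, hit_stages=14, miss_stages=19):
    """Average pipeline depth for a given uop cache hit rate (0.0 to 1.0)."""
    return hit_rate * hit_stages + (1 - hit_rate) * miss_stages

for rate in (1.0, 0.8, 0.7, 0.6, 0.0):
    print(f"hit rate {rate:.0%}: {effective_stages(rate):.1f} stages")
```

At 80% this gives 15 stages, at 70% it gives 15.5, and at 60% it gives 16, which is where the "magic number" comes from.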

Magic number, because that was the number of stages before uop cache. Nehalem had 16. So one cannot help but think uop cache is there to up the clocks while minimizing the consequence of increased pipeline stages. So the real goal is subtly similar to Netburst - clockspeed focus.

(Hmm, with Intel's own x86 Optimization Manual saying that Gracemont's way of decode makes it possible for wide x86 decode without needing uop cache, I would suggest a Unified Core will NOT have uop cache. I'm also gonna say Hyperthreading is gone for UC)

However, there is MORE to 5.7GHz. Because this isn't pure clock scaling. This is insanity, pushing it using any means necessary. So I would also argue that if they change the ideology and completely redesign for low-5GHz clocks at maximum, then things like cache latencies can be improved and cache sizes enlarged quite a bit more.
 
Reactions: Tlh97 and moinmoin

H433x0n

Golden Member
Mar 15, 2023
1,222
1,599
96
ahh I know why it performs well in CB2024, it’s not cause the P core is awesome, it’s because DRAM bandwidth affects ST by a noticeable margin.

The older Cinebench versions are not dependent on memory for ST scaling.
Any way to prove that? Otherwise we’ll be dealing with conspiracy theories about Maxon and Intel for the next 2 years.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,660
5,281
96

DavidC1

Golden Member
Dec 29, 2023
1,082
1,727
96
They're at 4.5 now and 5 isn't far off.
LOL! Further proves my point. The gap is so close now that it's time to back off on deep pipelines and the high-clock-speed focus, so you can have a smaller core, be more power efficient, et cetera, et cetera.

This is a way of saying Netburst without directly saying Netburst. Abandon the 2024 Pentium 4 and go back to lower clocks.
I think Zen 5 looks pretty darn good and Lunar Lake is definitely an improvement for thin and light. ARL has some issues with some applications but I'm waiting a bit to see how things settle out.
I'd love a Lunarlake efficiency x86 laptop. FHD 11.6 inch screen would get 15 hours on a 50WHr battery and be under 2.3lbs while being a 2-in-1 convertible. It simply is not possible with other platforms.
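
To put that battery-life figure in perspective, here is the average whole-system power it implies. This is a sketch using only the 50WHr and 15-hour numbers above; the function name is made up.

```python
def average_power_w(battery_wh, runtime_h):
    """Average platform power draw needed to hit a given runtime."""
    return battery_wh / runtime_h

budget = average_power_w(50, 15)
print(f"average draw: {budget:.2f} W")  # 3.33 W for screen, SoC, and everything else
```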
 

LightningZ71

Golden Member
Mar 10, 2017
1,827
2,203
136
Back to why there is a performance hit for 8P0E, two things come to mind. First, we may be hitting a floor for available threads. Maybe the programs really do need more threads available. Second, does deactivating or just ignoring the E-core clusters negatively affect the L3 slices that they are adjacent to? Either those slices are effectively unusable, or the extra latency to cross the ring bus to the next L3 slice is so bad that it obviates their usefulness to a great degree?
 

Abwx

Lifer
Apr 2, 2011
11,591
4,408
136
Any way to prove that? Otherwise we’ll be dealing with conspiracy theories about Maxon and Intel for the next 2 years.
Maxon's historical relationship with Intel is documented, so why are you acting like it's a conspiracy theory when it's a fact?

You know better than Maxon's senior software developer?

 