Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so per Intel's roadmap it should ship in 2024. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, branded RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15-57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17-30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H/HX | Mobile U only | Mobile H |
| Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024, H/HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | | |
| tGPU (mm²) | 44.45 | | | | |
| SoC (mm²) | 96.77 | | | | |
| IOE (mm²) | 44.45 | | | | |
| Total (mm²) | 252.15 | | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB
  • LNL.png
    881.8 KB

DavidC1

Senior member
Dec 29, 2023
Twitter/X is abuzz with Zen5 benchmarks. And it's nowhere near what was suggested before. Even in gaming, it's only roughly on par with the 14th-gen Raptor refresh.

This gives even more headroom for ARL than expected.
See? I'm telling you every leak goes from Mad AMD bullish to Mad Intel bullish.
That seems like an overgeneralization to me. Plenty of people watch Netflix on their laptop in full screen.
And Netflix is Local Video Playback? LVP is the ideal scenario for battery life, because you have an optimized application playing video without needing to activate network devices, so the CPU can drop to low power between frames to save energy, something that won't be the case when streaming.

Many other people also do not.
 

SiliconFly

Golden Member
Mar 10, 2023
... before settling on the "tile latency" thing? ...
Agree. But MTL did have regression clock-for-clock. Initially, it was attributed to "tile latency", but later tests did show that MTL had other issues that may have contributed to it.

Am I not allowed to speculate now?
The best part about AT forums is the freedom to speculate! Fire away...

...They are also under fire from investors and stockholders to have more profits ...

... The only time where they'll be forced en masse when Taiwan takeover actually happens ...
Very, very true. I was about to say the exact same thing! That's exactly how the white collars work.

That's one of the primary reasons the entire Intel community/shareholder base wanted an engineer at the helm instead of bean counters and paper pushers running the show.

See? I'm telling you every leak goes from Mad AMD bullish to Mad Intel bullish.
Makes it interesting if you ask me!
 

DavidC1

Senior member
Dec 29, 2023
That seems like a over generalization to me. Plenty of people watch Netflix on their laptop in full screen.
I can tell you from my Kabylake that it struggles to reach the ideal low-power states if I have more than 2 tabs open and the computer hasn't been restarted in a few days.

The programs and the coding are far, far from perfect. So the things that need to sleep don't, and the things that need to flush memory don't.

This kind of reminds me of Icelake, which could idle so low that it got 20+ hours of idle screen-on battery, but whose bursty browser-workload gain was at best equal to its predecessor's, meaning the low idle power was useless in practice.
 

Det0x

Golden Member
Sep 11, 2014
So we get 253W instead of 353W?


Taken from here:
 

DavidC1

Senior member
Dec 29, 2023
So for those that care about the reddit stuff.

One guy who says he was on the Royal Core team refutes another guy claiming Royal Core will make both the P core and the E core obsolete. He says that just before he left, the gains on SPECint were mediocre.

I always had doubts in my mind about the claims that Royal Core would be a titanic gain. The revolutionary stuff is often abandoned because the plain, boring, regular stuff advances enough that developing what already exists is good enough.
 

DavidC1

Senior member
Dec 29, 2023
Also, people say the E core has been gaining so much because it started from the lower end of the scale. This is a lazy explanation.

Well, what do you say about the very different approach the E core team uses? There hasn't been a fundamentally new idea from the P core team since Sandy Bridge in 2011, while the E core team has been bringing new ones every 1-2 generations. And every generation has been a sweeping change.

Atom Bonnell - Macro Op execution: https://www.anandtech.com/show/2493/9
Atom Silvermont - First OoOE, proper memory subsystem
Atom Goldmont - OoOE FP, 3-way decode, 16KB pre-decode L2 cache
Atom Goldmont Plus - 4-way backend, 64KB predecode L2 cache
Atom Tremont - Clustered decode, 128KB predecode L2
Gracemont - Improved Clustered 2x3 decode with auto-balancer. OD-ILD replaces the 128KB predecode L2 and can work with large code sizes, so now the clustered decode works on all code.
Skymont - 3x3 decode, with commonly used ucode instructions per cluster to improve parallelism. This is probably an area/power-efficient way of having a Fast Path for instructions. Doubled FP/Vector units, which benefits all code out there, not just a few workloads.

They improved branch prediction, the size of the BTB, and backend width on Crestmont, a mere Tick!

Willing to try new things
-Macro Op execution, rather than decode all
-Clustered decode
-Taking out 128KB L2 Predecode and replacing it with OD-ILD
-Very wide 16-wide retire on Skymont to save resources elsewhere. Probably also benefits clustered decode
-More Stores than Loads

Different ideas at a fundamental level
-Clustered decode can execute loops and it has no Loop Stream Buffer
-Rather than shared buffers, everything is independent
-Many, many simple ports over few very powerful ones
-Doubling FP benefits everyone, over AVX needing recompile every time

This is a truly inspired and dynamic team, and this is why the future likely lies with them. AMD is using clustered decode on Zen 5, and so far David Huang isn't seeing it work on single thread. Tremont had it working better 4 years ago.
 

MoistOintment

Junior Member
Jul 31, 2024
Lunarlake is currently the only promising client part for Intel. Arrowlake seems nothing special. 5% ST? We used to get that with Ticks.

Unless Pantherlake has something that covers deficiencies, don't count on it.

Lunarlake proves that they need a separate lineup for low power. They had to do that for server right?

By that logic, LPE cores can't be "used on any workloads": they're limited by their performance. Lunarlake's two-cluster core setup can already use the E cores for boosting performance. That means the "LPE" core is going to be low frequency again.

Also, 4+8+4LPE isn't gonna be a performance leader, similar to how Meteorlake doesn't count as 6+8+2. 6+8 is already low. How will they cover the -H market in 2026?

I hoped they would continue the excellent battery life that Cherry Trail tablets had. With Braswell they completely gave that up. Why? Because by eliminating that lineup they could save an extra few hundred million. They need a specialized lineup again, not a generic one.

SoCs aimed at respective markets
-Server P
-Server E for Cloud/VM
-Client High end mobile to Client Desktop: 45W-125W
-Client ultra battery life: 9W to 35W
The rumored 5% ST improvement is for the 285K vs 14900K, which will see the biggest stagnation vs 14th gen as it will have the largest clock speed regression (6Ghz -> 5.7Ghz). Every other SKU that'll offer nearly the same clocks as its predecessor should see the 14% average IPC increase more directly translate to ST improvements.
 

SiliconFly

Golden Member
Mar 10, 2023
So we get 253W instead of 353W?

Taken from here:
Really? Is that what you see? You're so focused on the first point that you missed the rather impressive second point, which says:

"Performance details of the new gen (ARL) are confidential but are expected to be impressive."

It pretty *literally* says Arrow Lake is gonna have impressive performance, with lower power draw even at higher frequencies, and all that without the issues associated with previous-gen parts!
 

KompuKare

Golden Member
Jul 28, 2009
Not sure what you mean by frame lag, but games are very latency sensitive.

If LNC has a 5% regression in frequency, that means the performance gain drops from mid-double digits to a single-digit percentage increase. Add a potential tile latency penalty on top of that, and it may end up not being a very good gaming improvement, that's all I was saying.
Volunteers already wanted to test ARL in the Fallout 4 CPU testing thread!

Now there is an engine which is latency and memory bandwidth sensitive.
 

Hitman928

Diamond Member
Apr 15, 2012
Really? Is that what you see? You're so focused on the first point that you missed the rather impressive second point, which says:

"Performance details of the new gen (ARL) are confidential but are expected to be impressive."

It pretty *literally* says Arrow Lake is gonna have impressive performance, with lower power draw even at higher frequencies, and all that without the issues associated with previous-gen parts!

I would say that's significant, but the last bullet point says that the ASUS motherboards are, "highly impressive," so that kind of negates the meaningfulness of the comment. /s
 

DavidC1

Senior member
Dec 29, 2023
The rumored 5% ST improvement is for the 285K vs 14900K, which will see the biggest stagnation vs 14th gen as it will have the largest clock speed regression (6Ghz -> 5.7Ghz). Every other SKU that'll offer nearly the same clocks as its predecessor should see the 14% average IPC increase more directly translate to ST improvements.
This is why the desktop needs a dedicated part, one that can be used for high-end mobile too, and another one for laptops that reaches Lunarlake levels of battery life.
 

SiliconFly

Golden Member
Mar 10, 2023
The rumored 5% ST improvement is for the 285K vs 14900K, which will see the biggest stagnation vs 14th gen as it will have the largest clock speed regression (6Ghz -> 5.7Ghz). Every other SKU that'll offer nearly the same clocks as its predecessor should see the 14% average IPC increase more directly translate to ST improvements.
Considering all the leaks and data that I keep seeing, I'm still gonna stick to my original claim of a double-digit ST increase for ARL-S (a bit revised). Hopefully somewhere between 10% and 20%. Definitely not just 5%.
 

DavidC1

Senior member
Dec 29, 2023
Quoting from the Intel x86 Optimization Manual about Gracemont. Very bullish about clustered decode, not just a "cheap alternative to uop caches".
While many workloads will not notice a difference in behavior between the Gracemont and Tremont microarchitectures, large code footprint workloads may see large benefits. This overall approach to x86 instruction decoding provides a clear path forward to very wide designs without needing to cache post-decoded instructions
Regarding the clustered decode and OD-ILD
One unique performance issue for a microarchitecture of clustered decoders can occur when very long basic blocks are executed. Compilers will sometimes unroll loops of code and generate blocks that can be hundreds of instructions long, trying to provide additional parallelism and reduce the overhead of loops. This is very common for some compilers for floating point and vector processing. Since the method of clustering relies on toggle points, inserting unconditional JMP instructions to the next sequential instruction pointer could have been employed by handwritten assembly using the Tremont microarchitecture. Such insertions should no longer be necessary on Gracemont microarchitecture and beyond. Gracemont microarchitecture addresses this bottleneck by introducing a hardware load-balancer. When the hardware detects long basic blocks, additional toggle points can be created based on internal heuristics. These toggle points are added to the predictors, thus guiding the machine to toggle within the basic block.
One potential weakness can be determining the predecode bits and using those to mark the instruction boundaries. An additional change from the Tremont microarchitecture is the removal of the large (128KB) shared second level predecode cache. This cache helped seed the first level predecode cache whenever there were misses in the first level instruction cache. While this handled the majority of performant cases, loops of critical code with a footprint exceeding 1MB+ could still suffer additional front-end bottlenecks due to low decode bandwidth from incorrect predecode bits.
This could be seen via the event TOPDOWN_FE_BOUND.PREDECODE. Instead of a second level predecode cache, the Gracemont microarchitecture introduces an “on-demand” instruction length decoder (OD-ILD). This block is typically only active when new instruction bytes are brought into the instruction cache from a miss. When this happens, two extra cycles are added to the fetch pipeline in order to generate predecode bits on the fly. These are done across 16 bytes per cycle. With clustering, this means the Gracemont microarchitecture is capable of 32 bytes per cycle across the two independent OD-ILDs. While many workloads will not notice a difference in behavior between the Gracemont and Tremont microarchitectures, large code footprint workloads may see large benefits.
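To put rough numbers on the OD-ILD behavior described above, here is a toy Python model. The two-cycle fetch bubble and the 16 bytes/cycle per OD-ILD come from the quoted manual text; the function itself, and the assumption that predecode doesn't overlap other pipeline work, are illustrative simplifications, not a real performance model.

```python
import math

def odild_extra_cycles(miss_bytes: int, clusters: int = 2) -> int:
    """Toy model of Gracemont's on-demand instruction length decoder (OD-ILD).

    Per the manual text quoted above, an instruction-cache miss adds two
    extra fetch-pipeline cycles, and each OD-ILD then generates predecode
    bits at 16 bytes/cycle (32 B/cycle across the two clusters). Treating
    predecode as non-overlapped makes this an upper bound, not a real
    performance estimate.
    """
    pipeline_bubble = 2                # extra fetch stages on a miss
    throughput = 16 * clusters         # bytes predecoded per cycle
    return pipeline_bubble + math.ceil(miss_bytes / throughput)

# e.g. a 64-byte cache-line fill: 2 + ceil(64/32) = 4 cycles
print(odild_extra_cycles(64))  # prints 4
```

The point of the design, as the manual notes, is that this small on-demand cost replaced a 128KB second-level predecode cache while still covering footprints beyond 1MB.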
 

SiliconFly

Golden Member
Mar 10, 2023
That one I'm not so sure about. Again, I'm shooting in the dark here, but I think X3D parts have significantly more gaming performance. Not a single-digit but a double-digit increase, I presume. Just guessing.
 

DavidC1

Senior member
Dec 29, 2023
Considering all the leaks and data that I keep seeing, I'm still gonna stick to my original claim of a double-digit ST increase for ARL-S (a bit revised). Hopefully somewhere between 10% and 20%. Definitely not just 5%.
It's impossible to have 10-20% average ST gains when the clock speed is down by nearly 10% and Intel themselves claim only 14% per clock.

Arrowlake does not have magic over Lunarlake that gives it an additional 10%. Granite Rapids may over regular RWC, but the only notable change in Arrowlake is that L2 goes from 2.5MB to 3MB, which will give less than a 1% gain on average.

There are two reasons I can see why they are expanding the L2 cache, and this applies to Willow Cove as well:

1) It avoids the high-latency L3 cache, reducing the cases where that latency has a big impact.
2) It is more power efficient.

One of the reasons Sandy Bridge was so good is that the Ring Bus' simplicity allowed really low-latency access to the L3 cache. Since then, due to the crazy focus on clocks, they lost it.
 

cebri1

Senior member
Jun 13, 2019
That one I'm not so sure. Again, I'm shooting in the dark here, but I think X3D parts have significantly more performance in gaming. Not single, but double digit increase i presume. Just guessing.
On average, it's usually around 10% over regular Zen. Some games really benefit from the added cache, others don't notice it at all, others notice it only until the cache is filled (e.g. Factorio), etc.
 

SiliconFly

Golden Member
Mar 10, 2023
It's impossible to have 10-20% average ST gains when the clock speed is down by nearly 10% and Intel themselves claim only 14% per clock.

Arrowlake does not have magic over Lunarlake that gives it an additional 10%. Granite Rapids may over regular RWC, but the only notable change in Arrowlake is that L2 goes from 2.5MB to 3MB, which will give less than a 1% gain on average.
Clock speed is down by exactly 5% compared to the 14900K: 6 → 5.7 is only -5%. That still works out to 14 + 1 - 5 = 10% for the equivalent top-end ARL SKU.

But I'm expecting ARL's LNC to be a bit different from LNL's LNC, considering all the leaks and considering that all of Jim Keller's early work is at play here (industry-standard tools, modular cores, agnostic design, etc). It's not just feasible, it's very much possible considering the sweet time they're taking.
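As a side note, IPC and clock changes compound multiplicatively rather than additively, so the same rumored figures land a bit under 10%. A quick sanity check in Python (the 14% IPC figure and the 6GHz vs 5.7GHz clocks are the rumors discussed in this thread, not confirmed specs):

```python
# Simple model: ST performance ≈ IPC × clock (rumored figures, not specs).
ipc_gain = 1.14          # rumored ~14% average IPC uplift
clock_ratio = 5.7 / 6.0  # rumored 285K vs 14900K boost clocks

st_change = ipc_gain * clock_ratio - 1
print(f"Expected ST change: {st_change:+.1%}")  # about +8.3%
```

So the additive "14 + 1 - 5" shortcut overstates the result by a couple of points; the multiplicative estimate sits at roughly +8%.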
 

Det0x

Golden Member
Sep 11, 2014
On average, it's usually around 10% over regular Zen. Some games really benefit from the added cache, , others don't notice it at all, others notice it until the cache is filled (e.g. Factorio),etc.
So Z4X3D with its ~700MHz clock handicap is ~10% faster than vanilla Z4, correct?
(~5000MHz vs ~5700MHz)

Now comes the kicker: don't expect Z5X3D to run at the same clock speed as Z4X3D.
 

cebri1

Senior member
Jun 13, 2019
So Z4X3D with its ~700MHz clock handicap is ~10% faster than vanilla Z4, correct?
(~5000MHz vs ~5700MHz)

Now comes the kicker: don't expect Z5X3D to run at the same clock speed as Z4X3D.
The 7700X is clocked at 5400MHz, not 5700MHz. The 9700X boosts up to 5500MHz, so maybe a 100MHz bump but not much more.
 

Det0x

Golden Member
Sep 11, 2014
The 7700X is clocked at 5400MHz, not 5700MHz. The 9700X boosts up to 5500MHz, so maybe a 100MHz bump but not much more.
Okay, I was comparing the clock-speed difference between the fastest SKUs (7800X3D vs 7950X).

Anyway, in HUB's latest comparison (3 months old), the 7800X3D is 24% faster than the 7700X while having a 350MHz clock-speed deficit (5050MHz vs 5400MHz).
The clock-speed delta between Z5X3D and the 9700X will not be the same as with the 7000 series...
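Plugging the HUB numbers quoted above into a quick script shows how large the per-clock advantage from the extra cache is. This assumes gaming performance scales linearly with clock, which it doesn't quite, so treat it as a rough estimate:

```python
# 7800X3D vs 7700X, HUB gaming average (figures quoted above).
perf_ratio = 1.24          # 7800X3D is 24% faster on average
clock_ratio = 5050 / 5400  # ~6.5% clock deficit for the X3D part

per_clock_gain = perf_ratio / clock_ratio - 1
print(f"Per-clock gaming uplift from the cache: {per_clock_gain:+.1%}")  # about +32.6%
```

Which is why a Z5X3D that closes most of that clock gap would be a very different opponent than vanilla Z5.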


This is the real competition for Arrow Lake, not vanilla Z5
 

cebri1

Senior member
Jun 13, 2019
Okay, I was comparing the clock-speed difference between the fastest SKUs (7800X3D vs 7950X).

Anyway, in HUB's latest comparison (3 months old), the 7800X3D is 24% faster than the 7700X while having a 350MHz clock-speed deficit (5050MHz vs 5400MHz).
Z5X3D will not run 350MHz slower than the 9700X.

This is the real competition for Arrow Lake, not vanilla Z5



Gaming averages at 1080p. Between 10-15%, depending on the benchmarks used.
 