Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15
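
As a quick sanity check on the Meteor Lake column, the four tile figures can be summed and compared with the listed total. This is only a sketch: the table does not state units (mm² is assumed here), and the names `METEOR_LAKE_TILES`, `total_area` and `area_share` are made up for illustration.

```python
# Tile figures copied from the table above.
# Assumption: values are die areas in mm^2 (the table does not state units).
METEOR_LAKE_TILES = {
    "tCPU": 66.48,
    "tGPU": 44.45,
    "SoC": 96.77,
    "IOE": 44.45,
}


def total_area(tiles):
    """Sum of all tile areas, rounded to two decimals like the table."""
    return round(sum(tiles.values()), 2)


def area_share(tiles):
    """Percentage of the summed area taken by each tile."""
    total = sum(tiles.values())
    return {name: round(100 * area / total, 1) for name, area in tiles.items()}


if __name__ == "__main__":
    print("Total:", total_area(METEOR_LAKE_TILES))
    for name, pct in area_share(METEOR_LAKE_TILES).items():
        print(f"{name}: {pct}% of the summed tile area")
```

The sum matches the 252.15 listed in the Total row. It only covers the four listed tiles and says nothing about the Foveros base tile.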

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

lightisgood · Jun 4, 2024

invisible_city said:
Lunar Lake E Cores are now able to talk to each other through their L1 cache, which should dramatically improve core - core latency: https://hothardware.com/reviews/intel-lunar-lake-deep-dive?page=3

Meaning we avoid this in Arrow Lake: https://www.anandtech.com/show/1704...hybrid-performance-brings-hybrid-complexity/6

Previously, core-to-core communication required a trip through the ring bus or, in the case of the LP cores, Meteor Lake’s LP Scalable Fabric. See also https://chipsandcheese.com/2024/05/13/meteor-lakes-e-cores-crestmont-makes-incremental-progress/

Damned good design changes

I remember that this L1$-to-L1$ link was adopted for C2D (Merom) in 2006...
I had been thinking that Alder Lake, the 1st-gen x86 hybrid, was a very primitive design.
So, I was correct.

TESKATLIPOKA · Jun 5, 2024

I kinda don't understand why Lunar has 4P+4E cores instead of 2+8 for example.

FlameTail · Jun 5, 2024

Any estimates/guesses for Geekbench 6 ST score of Lunar Lake/Arrow Lake mobile?

tamz_msc · Jun 5, 2024

TESKATLIPOKA said:
I kinda don't understand why Lunar has 4P+4E cores instead of 2+8 for example.

Because customers don't run Cinebench on an ultrabook.

ondma · Jun 5, 2024

dullard said:
That isn't the goal with the P cores, and likely never will be. The P cores are to have a single task done ASAP at the cost of high power. Move to a new architecture, or process and the goal is still the same: complete a single task ASAP at the cost of high power. The P cores are for when you want something very responsive and fluid. But, you can't have large numbers of cores all doing tasks at the cost of high power. There is no free lunch. With 8 P cores, running at 125 W, each gets ~15.6 W. Those P cores can clock a lot faster than 16 P cores each with only ~7.8 W. No matter the architecture or process, when you split your power budget up amongst more and more cores, each core gets less and less to work with.

The E cores are designed to be the workhorses that you can spam in large numbers to do grunt work. The real issue was when the P/E core was first released, the E cores were clocked too high and there were too few of them. The result was that the first E cores were neither that efficient nor that good at grunt work. So, people got the whole idea of P and E cores backwards in their mind thinking that P cores were for the grunt work. You have to switch your mindset. You want more E cores for more work done.
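
The per-core figures in the post above are just the package power divided evenly across the cores. A minimal sketch of that arithmetic, assuming an even split with nothing reserved for uncore, fabric or graphics (real chips do not divide power this cleanly); `per_core_power` is a hypothetical helper name:

```python
def per_core_power(package_watts, cores):
    """Even split of a package power budget across cores.

    Simplifying assumption: every core gets the same share and nothing
    is reserved for uncore, fabric, memory controller or graphics.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return package_watts / cores


if __name__ == "__main__":
    for cores in (8, 16):
        watts = per_core_power(125, cores)
        print(f"{cores} cores at 125 W: {watts:.1f} W per core")
```

Doubling the core count at a fixed 125 W halves each core's share, which is the "no free lunch" point being made.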

I mean, you just gave a textbook justification of hybrid architecture. You didn't really address the point of my post, though. Sorry to keep bringing up AMD in an Intel thread, but they are able to put 16 big cores into a chip and still have excellent performance and power consumption. I guess what I am trying to say is that Lion Cove still seems behind in performance and/or power consumption, or they would not have to bother with the E cores. It is also disappointing that Lunar Lake and the most performant Arrow Lake are on a TSMC node. What happened to process leadership? I thought 20A was supposed to bring leadership. Are we depending on 18A now? And if it is simply a matter of supply, I don't consider a process leading edge if it can't provide sufficient wafers with adequate yields to satisfy production demands.

poke01 · Jun 5, 2024

Talking about Lunar Lake while wearing an Apple shirt, love the irony.
Can't wait for the deepdive from them.

Intel has implemented similar power management to the M1; these are the best chips to come out of Intel in a long time.

TwistedAndy · Jun 5, 2024

TESKATLIPOKA said:
I kinda don't understand why Lunar has 4P+4E cores instead of 2+8 for example.

That was made to achieve better efficiency. In Lunar Lake, E-cores are always active. Having more E cores will increase the idle power consumption. Intel is planning to turn off the whole P-cluster when it's not used.

FlameTail · Jun 5, 2024

8 PCIe lanes for the dGPU, 12 PCIe lanes for the SSDs... fine. But what are the other 8 PCIe lanes for?

DrMrLordX · Jun 5, 2024

TwistedAndy said:
In Lunar Lake, E-cores are always active.

Is there some reason for that?

TESKATLIPOKA · Jun 5, 2024

tamz_msc said:
Because customers don't run Cinebench on an ultrabook.

Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8, for example, which would be comparable in size if not a bit smaller?

Lion Cove is for max ST performance and responsiveness, so it's understandable to use them, but why 4, when this is intended for ultrabooks with a limited TDP?
A Skymont cluster offers better perf/W than a Lion Cove cluster and is also a lot smaller; 2 of them would provide significantly higher performance than a single Lion Cove cluster.

TwistedAndy said:
That was made to achieve better efficiency. In Lunar Lake, E-cores are always active. Having more E cores will increase the idle power consumption. Intel is planning to turn off the whole P-cluster when it's not used.

Why can't there be 3 clusters? One with 2 P-cores and 2 clusters with 4 E-cores each?
And Intel could keep active only a single E-core cluster.

Joe NYC · Jun 5, 2024

TESKATLIPOKA said:
Intel Lunar Lake Technical Deep Dive - So many Revolutions in One Chip

E-core looks great on paper.

Even a CPU with only Skymont cores would be strong.

P.S. I am kinda more excited about Lunar Lake than Strix.

You are excited about lower clock speeds and lower IPC?

Seems like you are excited about Zen 3 in era of Zen 5...

Joe NYC · Jun 5, 2024

eek2121 said:
I am expecting a good 15-20% total single core uplift (ipc + clocks) over Raptor Lake. Multicore is going to come down to what process gets used due to power limits. The more power efficient the process is, the better the performance. We could see Intel lead AMD by a substantial amount here, but they are also (allegedly) pulling back power limits to be similar to AMD’s limits, so who knows?

Raptor Lake goes up to 6.2 GHz (or 6.0 GHz). Do you expect +1% to +6% clock speed increase?

If 5.7 GHz is the clock speed of Arrow Lake, then it is a -5% to -8% clock speed regression.
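
For reference, the regression figure is a plain ratio of the two boost clocks. A small sketch, taking the 5.7 GHz Arrow Lake clock as the hypothetical the post treats it as; `clock_change_pct` is an illustrative name:

```python
def clock_change_pct(new_ghz, old_ghz):
    """Percentage change in clock speed going from old_ghz to new_ghz."""
    return (new_ghz / old_ghz - 1) * 100


if __name__ == "__main__":
    # Raptor Lake peak clocks quoted in the post; 5.7 GHz is the post's
    # hypothetical Arrow Lake clock, not a confirmed specification.
    for raptor_ghz in (6.0, 6.2):
        change = clock_change_pct(5.7, raptor_ghz)
        print(f"5.7 GHz vs {raptor_ghz} GHz: {change:+.1f}%")
```

Against 6.0 GHz that is a 5% drop, and against 6.2 GHz it is a little over 8%.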

mikk · Jun 5, 2024

Here is a graphics/power comparison between LNL and MTL capped at 60 fps. Reminder that on LNL the on package RAM is included in the package power (roughly 2W), it's not even a fair comparison. It's MTL-H there (Arc graphics)

TESKATLIPOKA · Jun 5, 2024

Joe NYC said:
You are excited about lower clock speeds and lower IPC?

Seems like you are excited about Zen 3 in era of Zen 5...

Both types of cores have higher IPC than the predecessors and LNL is a low TDP SoC, so boost clockspeed doesn't necessarily need to be much lower than MTL-U(5GHz/3.8GHz) or RPL-U(5.2GHz/3.9GHz) for either core.
Not sure about sustained clocks during full load, we will have to wait for reviews.

It will be very interesting to limit MTL-U, LNL, PHX(2) and Strix to 15W-30W and see how they perform in CB.

DavidC1 · Jun 5, 2024

invisible_city said:
Also, Intel stated that at iso-power Lion Cove in Lunar Lake is up to an 18% performance uplift, not 14. It just depends on where you sit on the power curve.

Something most are missing is that they're describing a 14% uplift in the Lunar Lake iteration, not in all implementations.

That has nothing to do with perf/clock.

The curve has shifted likely due to design/process change which benefits lower power.

Ghostsonplanets said:
Another tidbit from Chips n' Cheese:

So we're looking at possibilities of, on Arrow Lake DT:
- Bigger cache
- Return of HT
- L1 to L2 bandwidth to 110B per cycle

Intel's new modern sea-of-cells design really allows for finer-grained changes that fit different markets. Quite interesting.

Granite Rapids according to Pat: ten-plus % changes in the core

TwistedAndy · Jun 5, 2024

TESKATLIPOKA said:
Why can't there be 3 clusters? One with 2 P-cores and 2 clusters with 4 E-cores each?
And Intel could keep active only a single E-core cluster.

Intel probably decided that there was no sense in having three independent clusters because of the power and memory latency issues. Intel had to introduce a separate 8MB side cache to make the current approach with two independent clusters work.

DavidC1 · Jun 5, 2024

TESKATLIPOKA said:
Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8, for example, which would be comparable in size if not a bit smaller?

4+4 might be better for Lunar Lake, which is focused on low power (I mean battery life, not TDP).

Skymont, even at lower clocks, has high enough performance to cover most needs, and two cores is a little small nowadays, so they bumped it up to 4.

2 P cores, again, is under the core-count requirement, so for lightly threaded applications that need higher responsiveness, 4 is a good number.

This is just a guess; there might be technical reasons to do so, but Apple also does something similar.

FlameTail · Jun 5, 2024

mikk said:
Reminder that on LNL the on package RAM is included in the package power (roughly 2W), it's not even a fair comparison.

I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W

mikk · Jun 5, 2024

FlameTail said:
I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W

No idea how it scales but Intel thinks it's 2W and that's why LNL uses 17W and 30W TDP instead of the usual 15W and 28W.
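
The TDP shift described here is a fixed offset rather than something that scales with capacity. A tiny sketch of that bookkeeping, using the roughly 2 W figure from the post as a flat constant (the real memory power will vary with load); the names are illustrative:

```python
# Rough on-package memory power quoted in the post, treated as a flat
# constant here. Assumption: it does not vary with capacity or load.
ON_PACKAGE_MEMORY_W = 2


def tdp_with_memory(compute_tdp_w, memory_w=ON_PACKAGE_MEMORY_W):
    """Package TDP once on-package RAM is counted in the budget."""
    return compute_tdp_w + memory_w


def tdp_without_memory(package_tdp_w, memory_w=ON_PACKAGE_MEMORY_W):
    """Back out the memory share to compare with parts using off-package RAM."""
    return package_tdp_w - memory_w


if __name__ == "__main__":
    for usual in (15, 28):
        print(f"{usual} W class -> {tdp_with_memory(usual)} W on LNL")
```

Subtracting the same 2 W from an LNL package power reading gives a number that is closer to comparable with MTL.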

DavidC1 · Jun 5, 2024

FlameTail said:
I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W

No. Capacity has little to do with power, because only the parts being actively accessed need to be active, since it's not compute.

mikk · Jun 5, 2024

Something about PTL and the future royal core:

https://www.reddit.com/r/intel/comments/1d88psj/some_funny_stories_regarding_the_current_pe_core

https://twitter.com/x/status/1798089713623970079

DavidC1 · Jun 5, 2024

He's basically saying what @Exist50 has said.

P core design is in shambles, in addition to the E core team being excellent.

Third: @adroc_thurston doesn't really have sources. I was waiting and waiting to see whether what he says is true.

adroc_thurston said:
CWF is 18A so that's even better, possibly.
Either way, the thing is basically Z4c with worse SIMD.

Cope. Again, and again. Can you at least admit you are wrong once in a while? Or at least don't be like AI and pretend everything you say is written in stone?

FlameTail · Jun 5, 2024

Intel's P-cores are clearly excessively bloated.

Where's the Lunar Lake die shot? I want to compare Lion Cove and Apple M3-P core die area.

@poke01 you said you would make a Lunar Lake vs M3 thread sometime?

coercitiv · Jun 5, 2024

TESKATLIPOKA said:
Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8, for example, which would be comparable in size if not a bit smaller?

Lion Cove is for max ST performance and responsiveness, so it's understandable to use them, but why 4, when this is intended for ultrabooks with a limited TDP?
A Skymont cluster offers better perf/W than a Lion Cove cluster and is also a lot smaller; 2 of them would provide significantly higher performance than a single Lion Cove cluster.

His reply may have seemed cryptic because you're less focused on the needs of the users who will be buying this product. Workloads will be relatively lightly threaded and rather latency sensitive, 4P cores will make the device look snappy, more cores overall will only help in isolated cases. (in fact most of them see a "real" MT workload when they boot or when they make OS updates)

Browsing and apps built on chromium will probably make up quite a good chunk of the user scenarios. Modern browsers can scale to 6+ cores, but what is more important for browser speed is ST performance of the cores being used. This is in stark contrast with Cinebench, where available throughput is all that matters, because software scaling is... well... embarrassing

For the upper range of TDP covered by LNL it would be nice if it came with something like 4+8 (my favorite would still be 6+4, with a better P core), but the NPU stole the rest of the pizza, sorry.

DavidC1 · Jun 5, 2024

coercitiv said:
His reply may have seemed cryptic because you're less focused on the needs of the users who will be buying this product. Workloads will be relatively lightly threaded and rather latency sensitive, 4P cores will make the device look snappy, more cores overall will only help in isolated cases. (in fact most of them see a "real" MT workload when they boot or when they make OS updates)

Browsing and apps built on chromium will probably make up quite a good chunk of the user scenarios. Modern browsers can scale to 6+ cores, but what is more important for browser speed is ST performance of the cores being used. This is in stark contrast with Cinebench, where available throughput is all that matters, because software scaling is... well... embarrassing

For the upper range of TDP covered by LNL it would be nice if it came with something like 4+8 (my favorite would still be 6+4, with a better P core), but the NPU stole the rest of the pizza, sorry.

View attachment 100557

Let's think of that die shot.
-Take out the P cores
-Take out the NPUs

There's probably enough room left to put a 20 Xe core monster in there. So much for "AI revolution". 20 Xe cores 320 EUs in old Intel terminology. Skymont is more than fast enough to feed such a GPU.
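
The Xe-core to EU conversion used above works out to 16 EUs per Xe core (320 / 20). A short sketch of that conversion, with the ratio inferred from the post's own numbers and the Xe-core counts taken from the U-series table in the first post; `xe_cores_to_eus` is a made-up helper name:

```python
# Ratio implied by the post: 20 Xe cores = 320 EUs in the old terminology.
EUS_PER_XE_CORE = 320 // 20  # 16


def xe_cores_to_eus(xe_cores, eus_per_core=EUS_PER_XE_CORE):
    """Convert an Xe-core count to the older EU count."""
    return xe_cores * eus_per_core


if __name__ == "__main__":
    # Xe-core counts from the U-series table, plus the hypothetical
    # 20 Xe-core part from this post.
    configs = {
        "Core Ultra 100U": 4,
        "Lunar Lake": 8,
        "Panther Lake": 12,
        "hypothetical 20 Xe-core part": 20,
    }
    for name, cores in configs.items():
        print(f"{name}: {cores} Xe cores = {xe_cores_to_eus(cores)} EUs")
```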
