Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	12 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop & Mobile H&HX	Mobile U Only	Mobile H
Process Node	Intel 4	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	8P + 16E	4P + 4E	4P + 8E
LLC	24 MB	36 MB ?	12 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15
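
As a quick sanity check on the Meteor Lake column above, the per-tile figures can be summed and turned into area shares. This is only a sketch: the table does not state a unit (the values are presumably mm²), and the name `mtl_tiles` is just an illustrative label.

```python
# Per-tile die sizes for Meteor Lake, copied from the table above.
# The unit is not stated in the table; mm^2 is an assumption.
mtl_tiles = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}

def total_area(tiles):
    """Sum of all tile sizes, rounded to two decimals like the table."""
    return round(sum(tiles.values()), 2)

def area_shares(tiles):
    """Each tile's share of the total, in percent (one decimal)."""
    total = sum(tiles.values())
    return {name: round(size / total * 100, 1) for name, size in tiles.items()}

print("Total:", total_area(mtl_tiles))
for name, share in area_shares(mtl_tiles).items():
    print(f"{name}: {share}%")
```

The sum matches the 252.15 listed in the Total row, and the SoC tile comes out as the largest single piece, ahead of the Intel 4 compute tile.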

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

AcrosTinus · Sep 29, 2024

Hitman928 said:
You could very well be right, but with the latest Windows patches and bios update, Zen 5 very nearly caught up to RPL. For ARL, gaming loads are very branchy with larger data sets that spill out of typical CPU cache sizes regularly which puts a lot of pressure on the memory to provide the data with as low of latency as possible. From LNL testing, the BPU isn’t really improved over previous generations and early ARL leaks show a regression in memory latency. That’s why I’m not really sure ARL will be an improvement over RPL and be able to have a convincing lead over Zen 5. Again, this is just all speculation based upon what I’ve seen so far, but I could very well be wrong.

If Intel launches a bigger and more advanced core on a leading node only to lose against their brute-force Raptor, which is unleashed so far that it even bites itself, then crazy people are leading the company.

I don't think that will be the case. No matter how many people hated the power-hungry Rocketship with its Cypress Cove cores, it was faster than the 10-core Skylake part in gaming. So I'm optimistic that the new cache hierarchy in the cores, as well as the new architecture, will compensate for the chiplet approach. In 3 weeks we will see.

CouncilorIrissa · Sep 29, 2024

AcrosTinus said:
That is solidly behind RaptorLake (in gaming) and Intel has never released a CPU generation that regressed in gaming

Rocket Lake.

Besides, you can't apply past trends to this case because Intel has never released a non-monolithic DT chip. So the past isn't necessarily indicative of what will happen in the future.

I'm not saying that ARL will be slower than RPL in gaming, but it's not out of the question.

MarkPost · Sep 29, 2024

cebri1 said:
Latest chart from Videocardz

View attachment 108409

Some comparison with that chart:

The first 285K entry (stock?) is with DDR5 @6400 (3450 / 22997): https://browser.geekbench.com/v6/cpu/7425530
The second 285K entry (stock?) is with DDR5 @6400 too (3449 / 23024): https://browser.geekbench.com/v6/cpu/7425534

This is my 9950X (stock) with DDR5 @6400 (3512 / 23571): https://browser.geekbench.com/v6/cpu/8043533

The third 285K entry (stock?) is with DDR5 @7200 (3420 / 23376): https://browser.geekbench.com/v6/cpu/7609759

This is my 9950X (stock) with DDR5 @7200 (3523 / 23373): https://browser.geekbench.com/v6/cpu/8043324

The fourth 285K entry (stock?) is with DDR5 @5600 (3329 / 21786): https://browser.geekbench.com/v6/cpu/8003145

The only 9950X entry (stock) is with DDR5 @5600 (3411 / 21686).

This is my 9950X (stock) with DDR5 @5600 (3476 / 21716): https://browser.geekbench.com/v6/cpu/8042219
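
To make the side-by-side numbers above easier to read, the gaps can be expressed as percentages. A small sketch, using only the single-core / multi-core pairs quoted in this post; where two 285K entries exist at DDR5-6400, the first one is used, and the helper names are illustrative only.

```python
# Geekbench 6 (single-core, multi-core) scores quoted in the post above,
# keyed by DDR5 speed. For DDR5-6400 the first of the two 285K entries
# is used; the 9950X values are the poster's own runs.
scores = {
    6400: {"285K": (3450, 22997), "9950X": (3512, 23571)},
    7200: {"285K": (3420, 23376), "9950X": (3523, 23373)},
    5600: {"285K": (3329, 21786), "9950X": (3476, 21716)},
}

def pct_diff(a, b):
    """Percentage by which a exceeds b (negative if a is lower)."""
    return (a - b) / b * 100.0

def compare(table):
    """Return (speed, ST gap %, MT gap %) rows, 9950X relative to 285K."""
    rows = []
    for speed, entry in sorted(table.items()):
        st = pct_diff(entry["9950X"][0], entry["285K"][0])
        mt = pct_diff(entry["9950X"][1], entry["285K"][1])
        rows.append((speed, round(st, 2), round(mt, 2)))
    return rows

for speed, st, mt in compare(scores):
    print(f"DDR5-{speed}: 9950X vs 285K  ST {st:+.2f}%  MT {mt:+.2f}%")
```

On these leaked entries the single-core gap is a few percent in the 9950X's favour at every memory speed, while the multi-core results are close enough that run-to-run variation could plausibly decide the order.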

Hulk · Sep 29, 2024

I started making a somewhat in-depth investigation of Geekbench 6. I'm not going to post anything right now as I want to verify what I'm thinking but here is something I am confident in writing.

Unlike CB and Blender, which as we know are very well ("ridiculously") multithreaded, Geekbench works more like most actual applications in that it scales less and less as you throw more cores at it. This isn't a good or bad thing, it's just different. It won't provide a max compute available when you throw a million cores at it, but it will also provide a more realistic result of what you can expect from most software as you increase core count. Both types of benches are important to get an accurate representation of a CPU.

So, this explains the good (if true) MT ARL leak scores in Geekbench 6. Since GB6 doesn't scale linearly with core count, it gives somewhat of an advantage to lower-core-count, higher-clocked, higher-IPC CPUs.

In other words, those 8 Lion Cove cores are most likely producing much of that GB6 score.

The score for my 5.5/4.4 GHz 14900K is ~18,000 with or without HT on, which supports my thoughts above. By the time you get from 24 to 32 threads, those extra 8 logical cores are basically useless for the test in GB6.

14900K, 16 E cores at 4.4 GHz and 1 P core at 800 MHz. Score is 11,700.

ST GB6 scores do seem to scale with frequency as expected, and at this early testing stage I'm seeing the throughput (IPC specific to this application) of the E cores at about 65% of the P cores. As with CB, when you clock the P's really low and the E's really high, the benchmark uses the faster/better-performing core for the ST result. This is better than the 52% for the same comparison of E's to P's in ST CB R23. I forget: do the E's have better FP or Int vs. the P's in Raptor Lake?

Getting more data and will try to get up some graphs and figure what is going on in there.
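
The per-clock comparison described above boils down to dividing each single-thread score by the clock it was measured at and taking the ratio. A minimal sketch, assuming scores scale linearly with frequency (which is reported here only as an early observation); the function name and the sample inputs are hypothetical placeholders, not measurements from this thread.

```python
def per_clock_ratio(score_e, ghz_e, score_p, ghz_p):
    """Ratio of E-core to P-core throughput per GHz.

    Assumes the single-thread score scales linearly with clock speed,
    so score / GHz is a fair stand-in for per-clock throughput in this
    one benchmark.
    """
    if min(score_e, ghz_e, score_p, ghz_p) <= 0:
        raise ValueError("scores and clocks must be positive")
    return (score_e / ghz_e) / (score_p / ghz_p)

# Placeholder inputs for illustration only (NOT real GB6 results):
# an E-core run at 4.0 GHz and a P-core run at 5.0 GHz.
ratio = per_clock_ratio(score_e=1000, ghz_e=4.0, score_p=2000, ghz_p=5.0)
print(f"E-core per-clock throughput is {ratio:.1%} of the P-core's")
```

Normalizing by clock this way only says something about one benchmark; the same pair of cores can land at a different ratio in another workload, which is consistent with the 65% (GB6) versus 52% (CB R23) figures mentioned above.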

MarkPost · Sep 29, 2024

Hitman928 said:
Technically there are three: photo library, background blur, and object detection. However, someone on this forum already tested with AVX512 enabled/disabled on Zen4 and found no difference in score between them, so it seems that the AVX2 code path is just as fast for Zen as the AVX512 path. We just had this discussion for like the third time a couple of weeks ago.

I think people are conflating the AVX512 and Apple’s AMX score differences as AMX does provide like a 5% score boost for Apple versus nothing with AVX512 for Zen.

yeah no difference with and without AVX512:

9950X DDR5 6400 with AVX512 enabled in BIOS (3512 / 23571): https://browser.geekbench.com/v6/cpu/8043533

9950X DDR5 6400 with AVX512 disabled in BIOS (3510 / 23613): https://browser.geekbench.com/v6/cpu/8044153

Hitman928 · Sep 29, 2024

MarkPost said:
yeah no difference with and without AVX512:

9950X DDR5 6400 with AVX512 enabled in BIOS (3512 / 23571): https://browser.geekbench.com/v6/cpu/8043533

9950X DDR5 6400 with AVX512 disabled in BIOS (3510 / 23613): https://browser.geekbench.com/v6/cpu/8044153

Thanks for the confirmation.

Hitman928 · Sep 29, 2024

Hulk said:
I started making a somewhat in-depth investigation of Geekbench 6. I'm not going to post anything right now as I want to verify what I'm thinking but here is something I am confident in writing.

Unlike CB and Blender, which as we know are very well ("ridiculously") multithreaded, Geekbench works more like most actual applications in that it scales less and less as you throw more cores at it. This isn't a good or bad thing, it's just different. It won't provide a max compute available when you throw a million cores at it, but it will also provide a more realistic result of what you can expect from most software as you increase core count. Both types of benches are important to get an accurate representation of a CPU.

So, this explains the good (if true) MT ARL leak scores in Geekbench 6. Since GB6 doesn't scale linearly with core count, it gives somewhat of an advantage to lower-core-count, higher-clocked, higher-IPC CPUs.

In other words, those 8 Lion Cove cores are most likely producing much of that GB6 score.

The score for my 5.5/4.4 GHz 14900K is ~18,000 with or without HT on, which supports my thoughts above. By the time you get from 24 to 32 threads, those extra 8 logical cores are basically useless for the test in GB6.

14900K, 16 E cores at 4.4 GHz and 1 P core at 800 MHz. Score is 11,700.

ST GB6 scores do seem to scale with frequency as expected, and at this early testing stage I'm seeing the throughput (IPC specific to this application) of the E cores at about 65% of the P cores. As with CB, when you clock the P's really low and the E's really high, the benchmark uses the faster/better-performing core for the ST result. This is better than the 52% for the same comparison of E's to P's in ST CB R23. I forget: do the E's have better FP or Int vs. the P's in Raptor Lake?

Getting more data and will try to get up some graphs and figure what is going on in there.

Yes, you can look at the GB6 thread, we discussed this back when it was first released. There’s really only one sub test that scales to many cores and overall, the MT score scaling plateaus well below modern top end CPUs. This was a major change from GB5 where the MT score would scale well to large core count processors.

naukkis · Sep 29, 2024

Hitman928 said:
Yes, you can look at the GB6 thread, we discussed this back when it was first released. There’s really only one sub test that scales to many cores and overall, the MT score scaling plateaus well below modern top end CPUs. This was a major change from GB5 where the MT score would scale well to large core count processors.

Modern desktop CPUs have more cores than most programs can use. It's always beneficial to have faster cores; adding cores, or trading them for slower ones to have even more of them, isn't beneficial for most use cases. People buy those CPUs with more cores because they think they offer a better user experience. GB6 is pretty much the only benchmark that gives MT results realistic enough to use for finding the best CPU for normal desktop usage.

Wolverine2349 · Sep 29, 2024

CouncilorIrissa said:
Rocket Lake.

Besides, you can't apply past trends to this case because Intel has never released a non-monolithic DT chip. So the past isn't necessarily indicative of what will happen in the future.

I'm not saying that ARL will be slower than RPL in gaming, but it's not out of the question.

Isn't Intel still releasing a monolithic 8+16 die on a ring bus, just on a much improved TSMC 3nm node? Or is it truly non-monolithic?

511 · Sep 29, 2024

Wolverine2349 said:
Isn't Intel still releasing a monolithic 8+16 die on a ring bus, just on a much improved TSMC 3nm node? Or is it truly non-monolithic?

Same as meteor lake

Wolverine2349 · Sep 29, 2024

511 said:
Same as meteor lake

Isn't Meteor Lake a monolithic die, just on an advanced TSMC node, except for the LPE island cores, which are on a separate die?

DrMrLordX · Sep 29, 2024

Hulk said:
Unlike CB and Blender, which as we know are very well ("ridiculously") multithreaded, Geekbench works more like most actual applications in that it scales less and less as you throw more cores at it. This isn't a good or bad thing, it's just different. It won't provide a max compute available when you throw a million cores at it, but it will also provide a more realistic result of what you can expect from most software as you increase core count. Both types of benches are important to get an accurate representation of a CPU.

People should doubt whether GB6 (or really any version of GeekBench) represents "real world" performance.

Hitman928 said:
Yes, you can look at the GB6 thread, we discussed this back when it was first released. There’s really only one sub test that scales to many cores and overall, the MT score scaling plateaus well below modern top end CPUs. This was a major change from GB5 where the MT score would scale well to large core count processors.

Even GB5 and earlier struggled to utilize all of a CPU's resources well. Even when people were seeing high levels of CPU utilization across all cores, the power consumption during GB5 MT was always well below many other MT benchmarks. GB6 is just worse about it.

naukkis · Sep 29, 2024

Wolverine2349 said:
Isn't Meteor Lake a monolithic die, just on an advanced TSMC node, except for the LPE island cores, which are on a separate die?

No. It's tile-based: the CPU tile is produced on the Intel 4 process, and the rest of the tiles are on mature TSMC processes. The memory controller is on its own tile, which means it will have a bit more latency than monolithic designs where the memory controller sits on the same silicon as the CPU cores.

Wolverine2349 · Sep 29, 2024

naukkis said:
No. It's tile-based: the CPU tile is produced on the Intel 4 process, and the rest of the tiles are on mature TSMC processes. The memory controller is on its own tile, which means it will have a bit more latency than monolithic designs where the memory controller sits on the same silicon as the CPU cores.

Though how much more latency and won't Intel use a far superior interconnect than AMD does to make it near as good as RPL?

SiliconFly · Sep 29, 2024

Wolverine2349 said:
Though how much more latency and won't Intel use a far superior interconnect than AMD does to make it near as good as RPL?

The fact that there is an interconnect itself introduces latency. Although, the newer fabric minimizes the latency over the previous one (how much exactly it does idk).

Josh128 · Sep 29, 2024

I wouldn't get too excited about gaming prospects just yet; that recent tweet isn't very inspiring.

"ARL suck in games"

Nothingness · Sep 29, 2024

DrMrLordX said:
People should doubt whether GB6 (or really any version of GeekBench) represents "real world" performance.

Even GB5 and earlier struggled to utilize all of a CPU's resources well. Even when people were seeing high levels of CPU utilization across all cores, the power consumption during GB5 MT was always well below many other MT benchmarks. GB6 is just worse about it.

It’s well known most ‘real world’ applications people use scale well with the number of cores

SiliconFly · Sep 29, 2024

Josh128 said:
I wouldn't get too excited about gaming prospects just yet; that recent tweet isn't very inspiring.

"ARL suck in games"

Totally. Only Zen 5% is good at gaming.

511 · Sep 29, 2024

GB6 is the worst MT benchmark.
Programs are either single-threaded,
use only 4-8 cores (games being the prime example),
or are heavily multithreaded and scale nicely with core count.
Geekbench multi doesn't fit into the last two.

naukkis · Sep 29, 2024

511 said:
GB6 is the worst MT benchmark.
Programs are either single-threaded,
use only 4-8 cores (games being the prime example),
or are heavily multithreaded and scale nicely with core count.
Geekbench multi doesn't fit into the last two.

Almost every program today is multithreaded. Every program scales to as many threads as the programmer's ability allows, limited by Amdahl's law. So a benchmark that tries to simulate that ends up looking something like GB6. Single-thread speed isn't the most important thing today; the most commonly used workloads scale to multiple threads, but not to infinite threads. The GB6 MT result also reflects gaming performance much better than something with unlimited scaling like CB or SPEC nT.

What GB6 fails to do is give a badly performing CPU with many cores a good result. That seems to be a great problem for many.
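
Amdahl's law, invoked above, makes the plateau easy to see. A minimal sketch: the parallel fraction of 0.9 is an arbitrary assumption for illustration, not a measured property of GB6 or of any game, and the model treats all threads as equally fast, which is not true of a hybrid P/E-core CPU.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Amdahl's law: speedup over one thread when only part of the
    work can be spread across `threads` equally fast threads."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)

# Assumed parallel fraction (illustrative only, not measured).
P = 0.9
for n in (1, 8, 16, 24, 32):
    print(f"{n:>2} threads: {amdahl_speedup(P, n):.2f}x")
```

With this assumed fraction, going from 24 to 32 threads adds only about half a unit of speedup, and no thread count can exceed 1 / (1 - 0.9) = 10x, which is the kind of flattening described earlier in the thread for the GB6 MT score.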

poke01 · Sep 29, 2024

Thanks to @Hitman928 and @MarkPost for clarifying about AVX-512 in GB6.

DavidC1 · Sep 29, 2024

Wolverine2349 said:
Though how much more latency and won't Intel use a far superior interconnect than AMD does to make it near as good as RPL?

We won't know for sure until Arrow Lake arrives and gets thoroughly tested. These things are subject to the execution abilities of the team. Why was Nehalem able to beat the Athlon X2's memory performance despite both having an integrated memory controller? Simply because it had a better controller.

Intel not only had a process lead, but had absolute leadership in cache density and performance as well. They also adopted the latest memory standards the quickest.

Now they don't have that anymore. That's what brain drain results in. You don't see the effect right away, but over longer periods of time you do.

DavidC1 · Sep 29, 2024

AMDK11 said:
I'm sorry, but what you wrote in response is gibberish. I asked what new ideas/solutions Mx and Zen5 bring and you told me about energy efficiency and a better predictor. These are completely new solutions and ideas . Massacre.

Ironic that you accuse me of being biased and writing nonsense when you are missing the obvious, which is that their P-core design is, step by step, losing fundamental advantages over what was a puny Atom team. Such a big core should not be regressing in branch MPKI metrics relative to a predecessor from 3 years prior! Even David Huang says that such big differences (where the E core beats the P core in MPKI) should not be happening.

Apple Mx comparisons are irrelevant, because Apple is far, far ahead of what Intel can do. IF they catch up to Apple, then we'll talk, but this argument has no merit.

Zen 5 brings the new branch predictor and clustered decode, which are a big departure and bring new ideas compared to Zen 4. Lion Cove does nothing new compared to Golden Cove, and Golden Cove does nothing new compared to Sunny Cove, and on and on all the way until Skylake.

Zen 5's new ideas allow them to have new knobs to adjust for optimization.

You yourself are biased towards Lion Cove being somehow an excellent core, when it can barely reach Zen 5 levels while being a process generation ahead and substantially larger, and not clocking higher either. This is despite Intel claiming that taking out Hyperthreading would increase PPA greatly. This is an embarrassment.

-Loss of HT: AMD win
-Larger die: AMD win
-New process: AMD win
-Similar clocks: Tie

While a future "P core" might do better, I wouldn't be surprised if that's because like the rumors, the E core team takes over being the top dog.

OneEng2 · Sep 29, 2024

NTMBK said:
AMD is making huge inroads in the high margin datacenter market, even though they're not really getting anywhere in laptops. (Since they make much higher profits for the same number of wafers, it makes sense that they prioritise that market.)

Agree, but they are also making inroads into laptops I believe. You are correct that they would be wise to focus on the high margin datacenter market as the first priority.

poke01 said:
Keep in mind 9950X Geekbench score is boosted by AVX-512.

Arrow lake should have better ST performance

Moving forward, it seems like it would be a good assumption that more AVX512 will be supported (once Intel has it again). What is interesting to me is how power hungry it is, and how Intel abandoned it on the desktop after introducing it themselves.

AcrosTinus said:
Vanilla Zen5 is essentially Zen4 with a 5% gain. That is solidly behind RaptorLake (in gaming) and Intel has never released a CPU generation that regressed in gaming, so it is pretty clear that ArrowLake will beat Zen5 without 3D cache and my crystal ball says lose by 5% to the 3D Cache versions and equal with CUDIMM.

For games that thrive on cache (most of them), Zen 5 X3D will likely provide a very comfortable lead for AMD (not 5%. You are dreaming). I would bet quadruple that. It is a pretty interesting way of gaining a performance edge. I am sure we will see Intel adopt this idea just as Intel adopted chiplets.

My thoughts on the LNL and ARL launches:

Pretty good showing for Intel in general. In the desktop and laptop market, the cost of goods counts, though. Intel is indeed besting AMD with its latest release; however, it is doing so by paying more for the chips. Another thing that is interesting is that Intel has a more dense process node (N3B) than does AMD (N4P). Intel also gains by having a significant power and transistor budget (which they pay for by using the more advanced process node). Of course, AMD will now throw a spoiler into the mix with its early release of X3D. IMO, this shows that AMD understands they lost the hand this time around as they are now bringing forward plans vs waiting and maximizing profits. FYI, this is a good thing. Competition is good for the consumer. I think that AMD may well be able to price Intel into a hurting position on their LNL and ARL chips. We will have to see though.

In ARL, Intel will again use their "big/little" approach I understand. This will likely keep them ahead in multi-threaded apps. Keep in mind that in AMD's SMT, that 2nd thread is really only like 25% of a core vs an actual full core. I don't suspect AMD is going to fix this issue until Zen 6 when the CCD will be increased from 8 to 16. A 32 full core Zen6 with SMT will indeed be a generational jump compared to Zen5 and ARL/LNL. Of course the big question is if the Intel 18A parts will be out in time or not .... and if Intel can produce them at a good yield and at a low enough price to compete with AMD on a TSMC process node.

Now, for the datacenter chips, the extra cost for the best process makes huge sense. Pay for the best because you are easily going to make it up in margin. I still hold the belief

AMDK11 · Sep 29, 2024

DavidC1 said:
Lion Cove does nothing new compared to Golden Cove, and Golden Cove does nothing new compared to Sunny Cove, and on and on all the way until Skylake.

"Out-of-Order Execution Engine

One of those sweeping changes applies to the schedulers, which have been reorganized with a view towards scalability. Since the Pentium Pro from 1995, Intel has served both integer and FP/vector operations with a unified scheduler. Scaling a large unified scheduler can be difficult, so Intel split the scheduler over time. Skylake put memory address generation ops on a separate scheduler. Sunny Cove split the memory scheduler, and Golden Cove revised the memory scheduler split.

Lion Cove finally splits the unified math scheduler into separate ones for integer and floating point/vector ops. Intel also split register renaming for floating point and integer operations. That’s not visible from software, but it does suggest Intel’s core is now laid out a lot like AMD’s Zen. Both do register renaming separately for integer and vector operations, and use separate schedulers for those operations."
