Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should ship it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

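| Model | Code name | Date | TDP | Node | Tiles | Main tile | CPU | LP E-cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |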



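Comparison of die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H&HX | Mobile U only | Mobile H |
| Process node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full die | 6P + 8E | 6P + 8E ? | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | | |
| tGPU (mm²) | 44.45 | | | | |
| SoC (mm²) | 96.77 | | | | |
| IOE (mm²) | 44.45 | | | | |
| Total (mm²) | 252.15 | | | | |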



Intel Core Ultra 100 - Meteor Lake



As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles, which means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


Geddagod

Golden Member
Dec 28, 2021
I've heard that you start seeing diminishing returns as you keep increasing core width. Massively widening the front end is great, but I wonder when they will start working on the back end again, considering that even Golden Cove did not change that aspect of the design much. To be fair, though, I've also heard that the back end is not often a huge bottleneck in the core.
 

A///

Diamond Member
Feb 24, 2017
Yeah same. I don't think XTU has a one-click feature yet to rein in power use.
You would think Intel could implement such a feature and make XTU as pretty as Ryzen Master. XTU doesn't roll off the tongue as nicely as Ryzen Master either; Intel is stuck in the 2000s with naming conventions.
 

A///

Diamond Member
Feb 24, 2017
Geddagod said: "I've heard that you start seeing diminishing returns as you keep increasing core width. Massively widening the front end is great, but I wonder when they will start working on the back end again, considering that even Golden Cove did not change that aspect of the design much. To be fair, though, I've also heard that the back end is not often a huge bottleneck in the core."
Interestingly, AMD is due to increase width in Zen 5. You raise a very good question. AFAIK Lunar Lake uses the same big and small cores, but I don't remember whether Lunar Lake is a mobile platform. I forget whether you've posted about it, but is there any info on Panther and Darkmont?
 

IntelUser2000

Elite Member
Oct 14, 2003
Going from 6 to 8 decoders is substantial. Golden Cove went from 1 complex + 4 simple decoders to 6 simple decoders. I guess it makes sense, since Meteor Lake should have been the big gain, but it isn't.

I think they might be targeting 30% gains here.

@Geddagod Yeah, they know this, so the wider decoder comes with increases elsewhere in the core.

Pentium MMX - 1 complex + 1 simple
Pentium Pro/II - 1 complex + 2 simple
Core 2 - 1 complex + 3 simple
Sunny Cove (Ice Lake/Rocket Lake) - 1 complex + 4 simple
Golden Cove - 6 simple

Looked at that way, performance has grown far more than decoder width. Way more. Also, Sunny Cove has 25% more decoders for 18% more performance, and Golden Cove has 20% more decoders for 19% more performance.
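The per-generation decoder percentages follow directly from the counts listed above. A small Python sketch that reproduces them; the 18% and 19% performance figures are copied from the post rather than measured, and treating Sunny Cove's predecessor as 4-wide (the Core 2 row's count) is an assumption:

```python
# Decoder counts per generation, as listed in the post
# (complex + simple; Golden Cove is listed as 6 simple).
DECODERS = {
    "Pentium MMX": 1 + 1,
    "Pentium Pro/II": 1 + 2,
    "Core 2": 1 + 3,
    "Sunny Cove": 1 + 4,
    "Golden Cove": 6,
}

# Per-generation performance gains quoted in the post (percent).
QUOTED_PERF_GAIN_PCT = {"Sunny Cove": 18.0, "Golden Cove": 19.0}


def pct_increase(old: float, new: float) -> float:
    """Percentage increase when going from `old` to `new`."""
    return (new / old - 1.0) * 100.0


# Assumption: Sunny Cove's predecessor is 4-wide, like the Core 2 row.
decoder_gain_pct = {
    "Sunny Cove": pct_increase(DECODERS["Core 2"], DECODERS["Sunny Cove"]),
    "Golden Cove": pct_increase(DECODERS["Sunny Cove"], DECODERS["Golden Cove"]),
}

if __name__ == "__main__":
    for core, dec in decoder_gain_pct.items():
        perf = QUOTED_PERF_GAIN_PCT[core]
        print(f"{core}: +{dec:.0f}% decoders, +{perf:.0f}% performance")
    width_ratio = DECODERS["Golden Cove"] / DECODERS["Pentium MMX"]
    print(f"Pentium MMX -> Golden Cove decoder width: {width_ratio:.0f}x")
```

It prints +25% and +20% for the two decoder steps next to the quoted gains, plus a 3x decoder-width increase from Pentium MMX to Golden Cove, which is the long-run comparison the post is making.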

Also it proves that our @Exist50 is a reliable source. He was one of the first to say Lion Cove is not Royal Core.
 

Henry swagger

Senior member
Feb 9, 2022
Replying to IntelUser2000:
How many decoders does Zen 4 have?
 

mikk

Diamond Member
May 15, 2012
Meteor Lake seems to be a typical tick release from a performance standpoint; this isn't new. The big deal will be Arrow Lake. In mobile we might see bigger gains from MTL because power efficiency matters more there, if they can add a separate voltage rail, and Intel 4 should help too. The jump from 6 to 8 decoders on the Lion Cove (LNC) core is a major change and a first sign that we will see bigger changes.
 

BorisTheBlade82

Senior member
May 1, 2020
Just to reiterate: building such a wide and deep core would not be feasible without big.LITTLE.
Producing a 16-core SKU out of these monsters would simply not make economic sense, and being limited to 8 or fewer cores would not make sense in the grand scheme.
So in the long term, even lightly threaded workloads (aka gaming) will benefit from it.
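The economic argument can be made concrete by counting core area in P-core units. A minimal Python sketch, assuming roughly four E-cores fit in the area of one P-core (the ratio Intel cited for Alder Lake's Gracemont and Golden Cove; treat it as approximate, since it will shift between generations):

```python
from fractions import Fraction

# Assumption: about four E-cores occupy the area of one P-core
# (the ratio Intel cited for Alder Lake; approximate).
E_CORES_PER_P_CORE_AREA = 4


def area_in_p_cores(p_cores: int, e_cores: int,
                    e_per_p: int = E_CORES_PER_P_CORE_AREA) -> Fraction:
    """Die area spent on cores, in units of one P-core's area."""
    return p_cores + Fraction(e_cores, e_per_p)


hybrid = area_in_p_cores(8, 16)   # 8P + 16E, as in the table above
all_big = area_in_p_cores(16, 0)  # hypothetical 16P part

if __name__ == "__main__":
    print(f"8P+16E: {hybrid} P-core areas")
    print(f"16P:    {all_big} P-core areas ({all_big / hybrid} x the hybrid)")
```

Under that assumption an 8P + 16E die spends about 12 P-cores' worth of area on cores, so a hypothetical all-P 16-core part would need about a third more core area while offering 16 cores instead of 24.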
 

Henry swagger

Senior member
Feb 9, 2022
Replying to BorisTheBlade82:
That's the brilliance of the Arm, Apple-like design.
 

lightisgood

Senior member
May 27, 2022
Suppose Redwood Cove is almost Golden Cove.
Then what is the selling point of MTL/Redwood Cove?

If Redwood Cove only has a 6-wide decoder, it can still increase its IPC through back-end enhancements.
I remember Haswell doing this.

If the SoC tile has a large LLC, then of course this system-architecture improvement boosts the performance of "almost Golden Cove" (= Redwood Cove?).
// We know that the SRAM cell area on TSMC N5 is smaller than on Intel 4...
// So Intel has a motive for moving the LLC off the compute tile.
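To put the SRAM point in numbers, here is a Python sketch of the raw bitcell area a 24 MB LLC (the Meteor Lake figure from the table at the top) would need on each node. The bitcell sizes, 0.021 µm² for TSMC N5 and 0.024 µm² for Intel 4 high-density SRAM, are publicly reported figures used here as assumptions; the sketch ignores tags, periphery and redundancy, so real arrays are much larger than these numbers.

```python
def bitcell_area_mm2(capacity_mb: float, bitcell_um2: float) -> float:
    """Raw SRAM bitcell area for a cache of `capacity_mb` MB.

    Cache "MB" here means 2**20 bytes. Tags, ECC, sense amps, decoders
    and other periphery are ignored, so real arrays are considerably larger.
    """
    bits = capacity_mb * 2**20 * 8
    return bits * bitcell_um2 / 1e6  # um^2 -> mm^2


# Assumed high-density bitcell sizes in um^2 (reported figures; verify before relying on them).
BITCELL_UM2 = {"TSMC N5": 0.021, "Intel 4": 0.024}

LLC_MB = 24  # Meteor Lake full-die LLC from the table at the top of the thread

if __name__ == "__main__":
    for node, cell in BITCELL_UM2.items():
        print(f"{node}: {bitcell_area_mm2(LLC_MB, cell):.2f} mm^2 of bitcells")
```

That gives roughly 4.23 mm² vs 4.83 mm² of bitcells, about 14% more area on Intel 4 for the same capacity under these assumptions.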
 

yuri69

Senior member
Jul 16, 2013
Replying to BorisTheBlade82:
Trying to go wide and deep at 6 GHz leads to bad thermals. However, 60-core wide-and-deep processors do exist; Sapphire Rapids is one of them.
 

Exist50

Platinum Member
Aug 18, 2016
Replying to BorisTheBlade82:
I don't actually expect Lion Cove to be a significant area increase, even putting aside the node shrinks. They should improve a lot from Redwood Cove.
 

Geddagod

Golden Member
Dec 28, 2021
Replying to lightisgood:
We have already seen a die shot of the Meteor Lake compute tile, so we know it still has L3 slices on the compute tile itself.
Interestingly, L2 is apparently going to grow again, and with larger lower-level caches, a higher-latency L3 can, in a sense, be "hidden" behind them.
But that shouldn't even be a big issue, because all the cores are on the same compute tile, along with the L3 cache.
If there is some cache on the SoC tile, maybe it can be used as a level-4 cache, like the extra eDRAM cache on Haswell. I think it would have better latency than Haswell's L4, though.
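The "hiding" effect of a bigger L2 can be illustrated with average memory access time (AMAT). A Python sketch with made-up latencies and miss rates (none of them are Meteor Lake figures); it also simplifies by keeping the L3's local miss rate fixed when the L2 grows:

```python
from typing import List, Tuple

# (hit latency in cycles, local miss rate) for each cache level, L1 outward.
Level = Tuple[float, float]


def amat(levels: List[Level], memory_latency: float) -> float:
    """Average memory access time, folding from DRAM back toward L1."""
    total = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


MEMORY = 300.0  # hypothetical DRAM latency in cycles

# All latencies and miss rates below are hypothetical, chosen only to show the trend.
baseline = [(5, 0.10), (15, 0.40), (60, 0.30)]
slower_l3 = [(5, 0.10), (15, 0.40), (75, 0.30)]
bigger_l2_slower_l3 = [(5, 0.10), (16, 0.30), (75, 0.30)]

if __name__ == "__main__":
    for name, levels in [("baseline", baseline), ("slower L3", slower_l3),
                         ("bigger L2 + slower L3", bigger_l2_slower_l3)]:
        print(f"{name}: {amat(levels, MEMORY):.2f} cycles")
```

With these hypothetical numbers, slowing the L3 from 60 to 75 cycles raises AMAT from 12.5 to 13.1 cycles, while a slightly slower but larger L2 that cuts its miss rate from 40% to 30% brings it down to about 11.55 cycles despite the slower L3.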
 

Geddagod

Golden Member
Dec 28, 2021
Henry swagger said: "That's the brilliance of the Arm, Apple-like design."
They are already doing something similar with Redwood Cove. The size of the big cores relative to the small cores keeps growing, meaning the big cores seem to be getting less and less area-efficient compared to the small cores.
The thing is, Intel still has to keep some area efficiency in mind for the big cores, because it still uses them in servers.
Exacerbating the issue, Intel's server chiplets are very different from the (IMO smarter) way AMD uses chiplets: Intel has only a few massive tiles, each with a large number of cores, while AMD has many smaller chiplets. More numerous, smaller tiles would suit Intel's strategy of sacrificing area for higher performance better, because bigger cores mean larger tiles, and larger tiles mean lower yields.
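The "larger tiles, lower yields" point is usually modeled with a Poisson defect model, $Y = e^{-A D_0}$. A minimal Python sketch; the 0.1 defects/cm² density and the 70 mm² and 400 mm² die sizes are hypothetical and not figures for any specific Intel or AMD product:

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson defect model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # hypothetical defect density, defects per cm^2

if __name__ == "__main__":
    for area in (70.0, 400.0):  # hypothetical small chiplet vs large tile
        print(f"{area:.0f} mm^2: {poisson_yield(area, D0):.1%} defect-free")
```

This prints about 93.2% vs 67.0% defect-free dies. Real server parts soften the penalty by disabling defective cores and selling salvaged dies, which this simple model does not capture.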
 

Geddagod

Golden Member
Dec 28, 2021
yuri69 said: "Trying to go wide and deep at 6 GHz leads to bad thermals. However, 60-core wide-and-deep processors do exist; Sapphire Rapids is one of them."
I know it's harder to clock wider architectures high because that costs more power, but aren't deeper architectures easier to clock higher, since more pipeline stages generally means higher clocks (as a rule of thumb)?
 

Doug S

Platinum Member
Feb 8, 2020
Geddagod said: "I know it's harder to clock wider architectures high because that costs more power, but aren't deeper architectures easier to clock higher, since more pipeline stages generally means higher clocks (as a rule of thumb)?"

Yes, but a larger clock network burns more power, so the higher frequency isn't "free" even if you're doing the same amount of work per unit of time.
 

Geddagod

Golden Member
Dec 28, 2021
Doug S said: "Yes, but a larger clock network burns more power, so the higher frequency isn't "free" even if you're doing the same amount of work per unit of time."
So do longer pipelines allow higher clocks at iso-power, or do they just allow better clock scaling at higher power levels?
 

IntelUser2000

Elite Member
Oct 14, 2003
Geddagod said: "So do longer pipelines allow higher clocks at iso-power, or do they just allow better clock scaling at higher power levels?"

Clock frequency itself is part of the power formula. The simplified equation is $P = FCV^2$, so power scales linearly with capacitance and frequency, and with the square of voltage.
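Plugging numbers into the simplified formula $P = FCV^2$ shows why voltage dominates. A Python sketch using ratios relative to a baseline; like the formula in the post, it ignores the activity factor and leakage, and the scaling cases are hypothetical:

```python
def dynamic_power(freq_hz: float, capacitance_f: float, voltage_v: float) -> float:
    """Simplified dynamic power P = F * C * V^2 (no activity factor, no leakage)."""
    return freq_hz * capacitance_f * voltage_v ** 2


def relative_power(freq_ratio: float, voltage_ratio: float,
                   capacitance_ratio: float = 1.0) -> float:
    """Power relative to a baseline when F, V and C are scaled by the given ratios."""
    return freq_ratio * capacitance_ratio * voltage_ratio ** 2


if __name__ == "__main__":
    # Hypothetical: +10% clock that also needs +10% voltage.
    print(f"{relative_power(1.10, 1.10):.3f}x power")
    # Hypothetical: same clock and voltage, 20% more switched capacitance
    # (e.g. extra pipeline flops and a larger clock network).
    print(f"{relative_power(1.0, 1.0, 1.2):.2f}x power")
```

A 10% clock bump that also needs 10% more voltage costs about 1.33x the power, and 20% more switched capacitance costs 1.2x even at the same clock and voltage.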

In reality it'll use more than that, because longer pipelines need more transistors, and you need better branch prediction to make up for the per-clock performance lost by adding pipeline stages.

You also run into diminishing returns: the more pipeline stages you have, the smaller the clock speed gain from each additional one.
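The diminishing returns can be seen in a toy model where a fixed amount of logic is split across N stages and every stage adds a fixed flop/clocking overhead, both measured in FO4 delays. A Python sketch; the 160 FO4 of total logic and 3 FO4 of per-stage overhead are hypothetical values:

```python
def cycle_time_fo4(stages: int, logic_fo4: float = 160.0,
                   overhead_fo4: float = 3.0) -> float:
    """Cycle time when `logic_fo4` of logic is split evenly over `stages`,
    plus a fixed per-stage flop/clocking overhead (both in FO4 delays)."""
    return logic_fo4 / stages + overhead_fo4


def relative_frequency(stages: int, base_stages: int = 10, **kwargs: float) -> float:
    """Clock speed relative to a `base_stages`-deep pipeline."""
    return cycle_time_fo4(base_stages, **kwargs) / cycle_time_fo4(stages, **kwargs)


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(f"{n:>2} stages: {relative_frequency(n):.2f}x clock "
              f"(zero overhead: {relative_frequency(n, overhead_fo4=0.0):.0f}x)")
```

Doubling from 10 to 20 stages gives about 1.73x the clock instead of 2x, and each further doubling gains less (about 2.71x at 40 stages and 3.80x at 80, versus 4x and 8x with zero overhead).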
 

Geddagod

Golden Member
Dec 28, 2021
IntelUser2000 said: "Clock frequency itself is part of the power formula. The simplified equation is $P = FCV^2$, so power scales linearly with capacitance and frequency, and with the square of voltage."
Yes, but core architecture also plays a part in clock frequency. Different architectures have different frequency curves and maximum frequencies. So would a longer-pipelined architecture allow a higher maximum frequency, or just better frequency at iso-power?
 

moinmoin

Diamond Member
Jun 1, 2017
Geddagod said: "The thing is, Intel still has to keep some area efficiency in mind for the big cores, because it still uses them in servers."
Somewhere Intel messed this up. The consumer P-cores keep including the unused AVX-512 portion. The server cores, as seen in SPR, have become so huge due to all the additional accelerators that it has pretty much become cost-prohibitive to compete directly with AMD's core counts (and Bergamo is going to exacerbate that even more).
 

Geddagod

Golden Member
Dec 28, 2021
moinmoin said: "Somewhere Intel messed this up. The consumer P-cores keep including the unused AVX-512 portion. The server cores, as seen in SPR, have become so huge due to all the additional accelerators that it has pretty much become cost-prohibitive to compete directly with AMD's core counts (and Bergamo is going to exacerbate that even more)."
Consumer P-cores carrying AVX-512 hardware without using it is a giant waste, I agree. Do you know exactly what percentage of the die area it takes up, though? I haven't been able to find data on that. But I also think designing two cores, one with AVX-512 and one without, just to differentiate the two might be a waste of resources. It might simply be easier for Intel to take the loss on the consumer side.
As for the accelerators, I think AMX is the only one that increases core size, since every core gets it. The other accelerators (DSA, QAT, and DLB) all take up the same amount of die space as one CPU core. I think Intel's P-cores are just hilariously big compared to Zen 3 and Zen 4 cores, and paired with Intel's worse chiplet strategy, they just can't scale as well.
 

Exist50

Platinum Member
Aug 18, 2016
Geddagod said: "So do longer pipelines allow higher clocks at iso-power, or do they just allow better clock scaling at higher power levels?"
It's a question of overhead. With zero overhead, more pipestages shorten your critical path, giving you proportionally more speed, OR you can lower the voltage for the same speed (saving power), or any combination of the two. But as others have pointed out, the flops between stages add power, timing, and performance overhead, so there's a balance. IIRC, a cycle time of roughly 16 FO4 delays has been something of a floor, but I don't recall where or when I heard that, so take it with a grain of salt.
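The balance described here shows up once depth also costs IPC. A toy Python model that multiplies a cycle time (logic split over N stages plus a fixed per-stage overhead, in FO4) by a CPI that grows with depth, assuming every pipeline refill (mispredicts and similar hazards) costs as many cycles as there are stages. All parameter values are hypothetical, and the model ignores the power of the extra flops, which would presumably push the optimum shallower:

```python
import math


def time_per_instruction_fo4(stages: int, logic_fo4: float = 160.0,
                             overhead_fo4: float = 3.0, base_cpi: float = 1.0,
                             refills_per_instr: float = 0.1) -> float:
    """Average time per instruction, in FO4 delays, for a toy pipeline model.

    Cycle time = logic / stages + per-stage overhead.
    CPI = base CPI + refills per instruction * refill penalty, where the
    refill penalty is assumed to equal the pipeline depth in cycles.
    """
    cycle = logic_fo4 / stages + overhead_fo4
    cpi = base_cpi + refills_per_instr * stages
    return cpi * cycle


# Brute-force search over integer depths using the default (hypothetical) parameters.
best_depth = min(range(1, 101), key=time_per_instruction_fo4)

# Closed-form optimum of the same model, treating depth as continuous:
# N* = sqrt(base_cpi * logic / (refills_per_instr * overhead)).
analytic_depth = math.sqrt(1.0 * 160.0 / (0.1 * 3.0))

if __name__ == "__main__":
    cycle = 160.0 / best_depth + 3.0
    print(f"best depth: {best_depth} stages, cycle time {cycle:.2f} FO4")
    print(f"closed-form optimum: {analytic_depth:.1f} stages")
```

With these made-up numbers the best depth is 23 stages (about 9.96 FO4 per cycle), matching the closed-form optimum of about 23.1. The point is the shape of the curve, an interior optimum where overhead and refill costs balance the shorter cycle, not the specific depth.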
 