Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
695
601
106






With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code-Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

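Spec | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U Only | Desktop Only | Desktop & Mobile H&HX | Mobile U Only | Mobile H
Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Q1 2025 ? | Desktop Q4 2024, H&HX Q1 2025 | Q4 2024 | Q1 2026 ?
Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 24 MB ? | 36 MB ? | 12 MB | ?
tCPU (mm²) | 66.48 | - | - | - | -
tGPU (mm²) | 44.45 | - | - | - | -
SoC (mm²) | 96.77 | - | - | - | -
IOE (mm²) | 44.45 | - | - | - | -
Total (mm²) | 252.15 | - | - | - | -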



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB · Views: 24,000
  • LNL.png
    881.8 KB · Views: 25,481
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
Ye but core architecture also plays a part in clock frequency. Different architectures have different frequency curves and max frequencies. So would a longer pipelined architecture allow for higher max frequency or just better frequency at iso power?

The reason for my post is that higher clocks always result in higher power consumption, because you are raising the clock of the entire CPU core. You need some radical differences (like Pentium 4 vs Pentium M) before one is more "efficient" per MHz. Actually, Pentium M vs Pentium 4 is solid evidence that hyper-pipelined CPUs use way more power per MHz.

Voltage scaling is pretty much dead. At the load clocks you aren't reducing voltage to any significant degree, if at all. So the whole thing about using deeper pipelines to clock higher so you can save power is thrown out the window.

Besides, extreme pipelined CPUs basically did not meet a single goal of the designers. Higher clocks? Barely. Efficient? Think opposite. Streamlined? Nope, it's more complex.

Realistically, when you increase pipeline stages a lot, all you get is lower performance per clock while noticeably increasing transistor count, die size, and power use. Look at POWER6, in-order Atoms (pre-22nm), Bulldozer, and NetBurst uarch CPUs. The successors performed better, used less power, and were simpler!
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
30% IPC gain over meteorlake or raptorlake?

Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
This reason is another one why Intel in particular isn't keen on fixed-function hardware blocks for video encode/decode. They already leverage the iGPU.
The media block is an independent IP on most SoCs I'm aware of. You can use the GPU for hybrid decode, but I think that's relatively rare.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
and this raichu person has never been wrong? Why should I place more faith in this person over that person from august?
¯\_(ツ)_/¯ then don't. For myself, I'm stating "Lion Cove is not Royal" in the same way I'd say "The sky is blue". It'll all bear out in due time.
I know what a moonshot is. I was asking if it was a new core intel was designing. Zen possibly, core not really.
Then yes, Royal is a new core.
 

BorisTheBlade82

Senior member
May 1, 2020
680
1,069
136
I'm guessing the problem with server is that Intel tiles have large sections of the tile stuffed with EMIB connectors, but also stuff like IO, which for AMD is moved off to its own chiplet.
Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.
 

mikk

Diamond Member
May 15, 2012
4,233
2,290
136
Meteorlake.

@mikk We're getting 2 straight years of small gains and we have evidence of BIG changes with the Lion Cove core. It would actually be a disappointment, but I guess it won't be a huge surprise since Intel usually disappoints.

It'll also be weird to see why they would add TWO extra decoders if they could just get away with 1.


I would like to add that the chief architect of Intel's performance core recently said we will see bigger and bigger jumps after Raptor Lake and Meteor Lake. Coupled with the stronger competition and a fixed count of 8 big cores for now (thanks to big.LITTLE), I can believe we might see bigger improvements compared to the past. Intel was stuck on 14nm and 10nm for many years. We will see Intel 4/TSMC 3nm/20A/18A in a relatively short timeframe, which allows investing in more transistors and bigger architectures in the next few years.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
Yes, Intel uses about 20% IIRC for the EMIB interconnect alone. This is a massive bandwidth considering how dense in Gbps/mm2 the connection is. They need this in order to make all the resources of one tile available to another tile. (https://www.anandtech.com/show/1692...nextgen-xeon-scalable-gets-a-tiling-upgrade/2)
Interestingly, AMD with its topology does not need these bandwidths and still there does not seem to be a big impact on performance.
When I first heard of SPR my initial impression was that Intel might turn around DC with it. And in 2020 that might have worked. But Milan and Genoa show the superiority of AMD's approach and made Intel rethink theirs as well.
I think the topology differences are more about cost than anything else. AMD pays some power/area overhead with the SERDES links and extra L3, but gains better yields, a relatively cheap IO die, and avoids some of the cost of advanced packaging. But if SPR had launched around when Milan did, performance would likely not have been a major issue for Intel. It's the delays, independent of chiplet strategy, that have sunk their performance competitiveness.

Though that said, it's difficult to assess the pros and cons of each from the fairly limited testing most outlets perform. The greatest weakness of AMD's chiplet strategy would be things like bin-packing VMs with only 8c granularities per CCX. You're not going to see that kind of stuff from Cinebench, Geekbench, or SPEC. But clearly those are fairly minor issues in the big picture.

I think GNR vs Turin will make for some very interesting comparisons. Should be roughly iso-process, and I expect AMD to have a core uarch advantage if the RWC+ rumor is true, but topologically, seems like Intel's still using large tiles.
 
Reactions: Tlh97 and Saylick

Hulk

Diamond Member
Oct 9, 1999
4,457
2,375
136
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake? I've read that there will probably be a bit of a clock speed regression in moving to Intel 4 so some ground may be lost there.

It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.
And even if Hyperthreading were enabled on the E's (it provided about a 26% MT uplift on Skylake), that would still mean 12.7 E's would be required for parity.
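For anyone who wants to check those two figures, here is a rough sketch of the arithmetic. It assumes perfect multi-thread scaling and equal clocks, and it simply reuses the 26% Skylake HT number from above for the E cores, which is a guess rather than a measured value. The function names are made up for illustration.

```python
def required_ipc_gain(old_cores: int, new_cores: int) -> float:
    """Per-core uplift needed for new_cores to match old_cores at equal clocks."""
    return old_cores / new_cores - 1.0


def physical_cores_for_parity(target_cores: int, smt_uplift: float) -> float:
    """SMT-enabled cores needed to match target_cores that run without SMT."""
    return target_cores / (1.0 + smt_uplift)


# 6 P cores trying to match 8 P cores
p_gain = required_ipc_gain(8, 6)
# E cores with a hypothetical 26% HT uplift trying to match 16 E cores without it
e_needed = physical_cores_for_parity(16, 0.26)

print(f"P-core uplift for 6 to match 8: {p_gain:.1%}")
print(f"E cores with HT to match 16 without: {e_needed:.1f}")
```

Under those assumptions the numbers come out at roughly 33% and 12.7, matching the figures in the post.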

Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no? The only thing I'm thinking is that logical cores are already weaker than physical ones and in a hypothetical 8+16 arrangement you'd have 8 "weaker" logical P cores and 16 "weaker yet" logical E cores. So, in order to make good use of all of those threads you'd need a really well optimized MT application, and many of them don't exist outside of benchmarks so that is why Intel has not gone down this path?

Intel has set itself a pretty high performance bar with the 13900K. Or more correctly AMD forced their hand in setting this bar. Now they have to figure a way to jump it on their next pass of the track.

This feels similar to the situation with 10900K to 11900K if 6P core rumors are true for ML-S.
 
Reactions: Tlh97

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
I don't see how a 6+8 Meteor Lake could compete with 8+16 Raptor Lake?
The latest rumor is 6+16, fwiw.
It would require a 33% IPC increase in the P cores for 6 to equal 8. As we all know, that's enormous.
Now, I'm not going to claim to know how MTL will compare to RPL in everything, but they wouldn't need such an IPC increase. Even with no IPC gains, you can use the performance gains from the new node for better clocks at iso-power. If Intel 4 were worse than Intel 7 across the VF curve, it would be DOA.
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?
I think the reality tends to be a bit more complicated.
 
Jul 27, 2020
19,613
13,480
146
Also I remember reading (can't find it though) that HT is a good way to increase MT performance from both an area and power point-of-view since it is basically exploiting unused CPU resources. Seems like a good fit for the E's, no?
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.

Further, the extra threads would increase pressure on their shared cache. They would also need more bandwidth from RAM coz the extra threads need to be fed with data. All of this activity will produce extra heat in the already crammed area taken up by the closely packed E-cores. It's possible that Intel has tried this already and the cons outweighed the pros. Maybe in future when they are able to refine E-cores further.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster and 80% overall for all the clusters combined for 16 cores. Seems wasteful if it's there but disabled.
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.
 

Doug S

Platinum Member
Feb 8, 2020
2,711
4,602
136
It's a question of overhead. With zero overhead, more pipestages reduced your critical path, giving you proportionally more speed OR you can lower the voltage for the same speed (saving power), or any combination of the two. But as others have pointed out, the flops between each stage add power, timing, and performance overhead, so there's a balance. IIRC, roughly 16 FO4 delay has been something of a floor, but I don't recall where/when I heard that, so take it with a grain of salt.


To expand on this a little, there is some engineering margin or "slop factor" in every pipe stage, because the work in a stage MUST complete during the clock cycle. Some stages may have tighter timing margins and others looser, depending on how much work there is a particular stage for a particular function.

So e.g. splitting up a 15 stage pipeline into 30 stages won't let you double your frequency, because that engineering margin "slop factor" is paid 30x instead of 15x.
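A toy model of that point, as a sketch only: assume the total logic delay splits evenly across stages (real stages are uneven, as noted above) and every stage pays the same fixed margin. The 3.0 ns and 0.05 ns values are invented purely to make the numbers readable.

```python
def max_frequency_ghz(logic_delay_ns: float, stages: int, overhead_ns: float) -> float:
    """Clock limit when logic is split evenly and each stage pays a fixed overhead."""
    cycle_ns = logic_delay_ns / stages + overhead_ns
    return 1.0 / cycle_ns


LOGIC_NS = 3.0      # hypothetical total logic delay through the whole pipeline
OVERHEAD_NS = 0.05  # hypothetical per-stage margin (flops, skew, timing slack)

f15 = max_frequency_ghz(LOGIC_NS, 15, OVERHEAD_NS)
f30 = max_frequency_ghz(LOGIC_NS, 30, OVERHEAD_NS)

print(f"15 stages: {f15:.2f} GHz")
print(f"30 stages: {f30:.2f} GHz ({f30 / f15:.2f}x, not 2x)")
```

With these made-up inputs, doubling the stage count gives about 1.67x the clock rather than 2x, and the gap widens as the per-stage margin grows relative to the logic in each stage.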

If asynchronous CPUs ever became a thing then this wouldn't be a problem because you wouldn't have that wasted time in each cycle, and without a clock network you'd save that power too (though that's probably largely paid back or even more than paid back by the latching network that would replace it).
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I recall that HT needs 5% of die area for implementation. Do the E-cores have it implemented in their silicon? Coz that would increase the size of the E-core cluster by 20% per cluster
Uh, that's not how the math works. Taking those numbers at face value, a 5% area increase per core would increase the area of a cluster by 20% of a single core's area, not 20% flat.
To be honest a 5% per core would translate to 5% per cluster.

But the issue is the e core's design. They are simply not designed for either HT or AVX-512, and as you have seen from the Meteor Lake and Arrow Lake diagrams, they follow the same design philosophy.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
To be honest a 5% per core would translate to 5% per cluster.
I wanted to avoid any dependency on module overhead. And also show where that "20%" could mistakenly come from.
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?
Say you have a 100mm2 core, just to make the math prettier. 5% HT overhead on top would be 0.05 * 100mm2 = 5mm2. If you have four cores, you have 4 x 5mm2 = 20mm2 for HT, but that's on top of 4 x 100mm2 = 400mm2 baseline. 20mm2/400mm2 = 5mm2/100mm2 = 5%.

Or perhaps more intuitively, if you increase the area of part of the die by 5%, you'll always get ≤5% for the die as a whole.
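The same arithmetic as a quick sketch, using the made-up 100mm2 core from above (not a real core size). It shows the 20mm2 of extra area measured against two different baselines, which is where the mistaken "20%" and the correct 5% both come from.

```python
CORE_MM2 = 100.0    # round number from the example above, not a real core size
HT_OVERHEAD = 0.05  # the 5% per-core figure being discussed
CORES = 4

extra_mm2 = CORES * CORE_MM2 * HT_OVERHEAD   # total added area across four cores
vs_one_core = extra_mm2 / CORE_MM2           # wrong baseline: a single core
vs_cluster = extra_mm2 / (CORES * CORE_MM2)  # right baseline: all four cores

print(f"{extra_mm2:.0f} mm2 extra = {vs_one_core:.0%} of one core, {vs_cluster:.0%} of four cores")
```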
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Guys, I ain't no math genius so Nicalandia and Exist50, show me your homework. Why are both of you not in agreement?

Let's use MTL Crestmont e core as an example.

1 e core size is 1.046 mm^2; let's round that to 1 mm^2
1 quad cluster size is 5.907 mm^2, but for illustration purposes we will say 4 mm^2 to keep the numbers even

An e core's die area is about 1 mm^2, so a 5% increase makes it 1.05 mm^2, right? So 1.05 x 4 = 4.2 mm^2, and 4.2/4 is 1.05, which is a 5% increase...

So at worst a 5% increase in die area per core would translate to 5% die area per cluster.
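Running it with the unrounded figures quoted above gives a feel for the "at worst" part. This is only a sketch and assumes the rest of the cluster (shared L2 and other non-core logic) does not grow at all when HT is added, which is an assumption, not something known about the design.

```python
E_CORE_MM2 = 1.046   # per-core area quoted in the post
CLUSTER_MM2 = 5.907  # quad-cluster area quoted in the post
HT_OVERHEAD = 0.05   # the 5% per-core figure under discussion

# Only the four cores grow; the shared part of the cluster is assumed unchanged.
added_mm2 = 4 * E_CORE_MM2 * HT_OVERHEAD
cluster_growth = added_mm2 / CLUSTER_MM2

print(f"extra area: {added_mm2:.3f} mm^2, cluster growth: {cluster_growth:.1%}")
```

Because the cluster is bigger than four bare cores, the growth lands around 3.5% rather than the full 5%.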
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Ah! I understand it now. Thanks!

5% of single core IS NOT 5% of cluster. DOH!
Well.... 5% die area is still 5% die area regardless of how many clusters they put in. As far as I am aware, MTL-S will have 4 quad clusters at the very top of the SKU stack (14900K). Still, 5% is really nothing when compared to the theoretical maximum MT performance boost, which is 15%-30%...

Except those e cores were not designed with HT and AVX-512 in mind.. Intel pulled a 6% IPC boost from Gracemont to Raptormont just by doubling the L2(I don't have the exact die area size so it could be more than 5%) I suspect that the MTL-S will have 6MiB per cluster, faster internal ring bus and we could see double digit IPC boosts on those e cores.
 
Last edited:

Hulk

Diamond Member
Oct 9, 1999
4,457
2,375
136
The latest rumor is 6+16, fwiw.

6+16 along with the rumors that while the P's are lightly upgraded the E's are significantly enhanced makes sense.
First, Intel is probably noticing that many apps that only really rely on 8 cores can do as well with 6, and if those 6 cores have better IPC than Raptor then it's all the better.
Second, as we move into the future, applications are getting better at MT, so moving some more compute to the E's makes sense. 16 greatly enhanced E's would be very beneficial to having ML surpass RPL. Currently 1 P is worth about 2 E's based on IPC alone, more when you figure in the increased clocks on the P's. If they could reduce that disparity by 15 or 20%... well, there you go. At this point in the hybrid Big.Little development I would think that there is lower-hanging fruit on the E trees compared to the P trees.
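A back-of-envelope sketch of what that would mean for MT throughput, counted in "P-core equivalents". It takes the 1 P = 2 E ratio above at face value, reads "reduce that disparity by 15 or 20%" as the E cores getting 15 or 20% faster relative to the P cores (my interpretation), and ignores clocks, P-core IPC gains, and scaling losses entirely.

```python
def p_equivalents(p_cores: int, e_cores: int, e_per_p: float) -> float:
    """MT throughput in units of one P core, assuming perfect scaling."""
    return p_cores + e_cores * e_per_p


BASE_E_PER_P = 0.5  # "1 P is worth about 2 E's", taken at face value

rpl = p_equivalents(8, 16, BASE_E_PER_P)  # 8+16 Raptor Lake baseline

for e_gain in (0.0, 0.15, 0.20):
    # hypothetical 6+16 part whose E cores are e_gain faster relative to a P core
    mtl = p_equivalents(6, 16, BASE_E_PER_P * (1.0 + e_gain))
    print(f"E gain {e_gain:.0%}: 6+16 = {mtl:.1f} P-equivalents, {mtl / rpl:.1%} of 8+16")
```

On this crude count the stronger E's close most of the gap but not all of it, so some P-core or clock improvement would still be needed to get past 8+16.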
 

poke01

Platinum Member
Mar 8, 2022
2,004
2,542
106
Arrow Lake will be Intel's M1, that is, it will be a wide arch like Firestorm. Guess what Firestorm clocks at: 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks, but still very powerful. I never liked high-clock archs or, as some people say, speed demons. Archs of that sort, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers with an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
Arrow Lake will be Intel's M1, that is, it will be a wide arch like Firestorm. Guess what Firestorm clocks at: 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks, but still very powerful. I never liked high-clock archs or, as some people say, speed demons. Archs of that sort, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers with an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
Well slow down there a second. I really don't know what IPC gains we'll see with Lion Cove. Certainly bigger than RWC, but beyond that, not sure. I do expect it to look a whole lot better from a PPA standpoint, but how all that shakes out is very tbd. I would be surprised if it's <6GHz on 20A, however.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Meteor Lake on Top of Raptor Lake




Alder Lake On Top of Raptor Lake



Meteor Lake and Alder Lake appear to be very similar in design layout (shown at the same size for comparison).
 

Geddagod

Golden Member
Dec 28, 2021
1,295
1,368
106
Arrow Lake will be Intel's M1, that is, it will be a wide arch like Firestorm. Guess what Firestorm clocks at: 3.2 GHz. Arrow Lake will be perfect for ultrabooks. (If Lunar Lake is based on Lion Cove, no wonder Intel is aiming for perf/W leadership.)

High IPC, low clocks, but still very powerful. I never liked high-clock archs or, as some people say, speed demons. Archs of that sort, like Golden Cove, are horrible in laptops.

So we will have 2 chip designers with an 8-wide chip in 2024. Like Exist50 says, Arrow Lake will see Intel gain good ground.
I think Arrow Lake will still have high clocks. When Intel went wider with Golden Cove, they still managed to keep 5 GHz, same as Willow Cove. They also used a better process, and Arrow Lake will also use a better process compared to Meteor Lake.
Apple and Intel being 8-wide is nice, but you gotta wonder about Zen 5. Decode width isn't everything, but I think Zen 5 is going to end up being less wide than Lion Cove. I really don't think AMD is going to double their width in one generation with Zen 5.
 