Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Page 673

Tigerick

Senior member
Apr 1, 2022
702
632
106

As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15-57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17-30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8
? | Panther Lake | Q1 2026? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

 | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake
Platform | Mobile H/U only | Desktop & Mobile H/HX | Mobile U only | Mobile H
Process Node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A
Date | Q4 2023 | Desktop: Q4 2024; H/HX: Q1 2025 | Q4 2024 | Q1 2026?
Full Die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E
LLC | 24 MB | 36 MB? | 12 MB | ?
tCPU (mm²) | 66.48 | ? | ? | ?
tGPU (mm²) | 44.45 | ? | ? | ?
SoC (mm²) | 96.77 | ? | ? | ?
IOE (mm²) | 44.45 | ? | ? | ?
Total (mm²) | 252.15 | ? | ? | ?



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU tile and the Foveros base tile. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments

  • PantherLake.png
    283.5 KB · Views: 24,014
  • LNL.png
    881.8 KB · Views: 25,501
Last edited:

LightningZ71

Golden Member
Mar 10, 2017
1,910
2,260
136
Also keep in mind the 288 core Clearwater Forest is 12 dies of 24 cores each, with 24 cores being composed of 6x quad core clusters.
I missed that detail. So, in a certain way, it's similar to a hypothetical 12CCD Epyc processor where each CCD is a 6 core CCX, but the cores are 'mont quads. Assuming that they get their I/O setup right, it should be broadly competitive.
 

Saylick

Diamond Member
Sep 10, 2012
3,645
8,223
136
I missed that detail. So, in a certain way, it's similar to a hypothetical 12CCD Epyc processor where each CCD is a 6 core CCX, but the cores are 'mont quads. Assuming that they get their I/O setup right, it should be broadly competitive.
Not quite like EPYC, I think. CCD-to-CCD communication has to go through the IOD in Zen, while Intel uses a mesh interconnect, so clusters talk to each other via the on-die network, where latency is based on the number of hops between nodes in the mesh. It was this way for Sapphire Rapids, Emerald Rapids, and Granite Rapids. I don’t see why it would change for what comes next, even if there are more compute tiles.

For P-core server products, there’s one network node per core. I’ll have to double check, but it would not surprise me if for E-core server products, there’s one network node per E-core cluster.
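To make the hop-count point concrete, here is a quick toy model (my own illustration; the base and per-hop latencies are made-up placeholders, not Intel's numbers) of how average core-to-core latency differs between a one-node-per-core mesh and a one-node-per-cluster mesh for a 288-core part:

```python
# Toy mesh-latency model: latency = fixed base cost + per-hop cost,
# hops measured as Manhattan distance on a rows x cols grid.
from itertools import product

def avg_hops(rows: int, cols: int) -> float:
    """Average Manhattan distance between two distinct mesh nodes."""
    nodes = list(product(range(rows), range(cols)))
    dists = [abs(r1 - r2) + abs(c1 - c2)
             for (r1, c1), (r2, c2) in product(nodes, repeat=2)
             if (r1, c1) != (r2, c2)]
    return sum(dists) / len(dists)

BASE_NS, PER_HOP_NS = 10.0, 1.5  # hypothetical costs, for illustration only
# 288 cores: one mesh stop per core (16x18) vs. one per 4-core cluster (8x9)
for rows, cols in [(16, 18), (8, 9)]:
    h = avg_hops(rows, cols)
    print(f"{rows}x{cols} mesh: avg {h:.2f} hops -> ~{BASE_NS + PER_HOP_NS * h:.1f} ns")
```

Whatever the real costs are, the per-cluster grid roughly halves the average hop count, which is why the node-per-cluster question matters.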
 
Reactions: Elfear

OneEng2

Senior member
Sep 19, 2022
259
359
106
How are you calculating per-core performance? If AVX-512 is included, then yes; if not, no. For pure integer workloads, which many are, both will have similar integer performance per thread.

I agree on the VM part, but there are customers who disable SMT, so more physical cores give them more to work with.
On a side note, not a single site benchmarks the accelerators in these chips. They are niche but have decent use cases.
Zen 5c can effectively operate on 1.4 threads at a time per core. Skymont can operate on one thread at a time. If the single-core IPC of Zen 5c were exactly equal to Skymont's, and they were clocked at the same speed, Zen 5c would still deliver 1.4 times the throughput. This would be the worst case in an MT application.

Additionally, if there are any AVX-512 executions in the workload, Zen 5c gets another big boost.

That is where I got the 40-60% "guesstimate", or SWAG.
Yeah, idk about that... Unless 18A is complete garbage (which is a possibility), 40-60% higher perf seems too optimistic outside niche HPC/AI workloads, and Clearwater has more cores. EPYC Zen 4c had 50% higher perf per core, against an 8-channel Sierra Forest that had Crestmont and 100 MB of L3. CLF has Darkmont, equipped with more L3 and likely faster 12-channel DDR5. Crestmont was slower than Zen 4 in int perf, let alone FP, but Skymont has already narrowed that gap.
Clearwater does have more cores, but each core can only operate on one thread, while Zen 5c can operate on 1.4 threads at a time in an MT workload. Add in any AVX-512 or FP tasks and it isn't hard to see each Zen 5c core doing 1.5 times the work of each Skymont core.

Someone show me where my math is off here. It seems like lots of people think I am off base (and I might be).
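Putting the post's own numbers into a back-of-envelope script (the 1.4x SMT yield and the AVX-512 factor are the assumptions stated above, not measurements; 192-core Turin Dense is used as the Zen 5c reference part):

```python
def mt_throughput(cores: int, threads_per_core: float,
                  per_thread_perf: float = 1.0) -> float:
    """Aggregate MT throughput under equal IPC and clocks (the post's worst case)."""
    return cores * threads_per_core * per_thread_perf

# Per-core comparison, per the argument above:
print(f"per-core: Zen 5c / Skymont = {mt_throughput(1, 1.4):.2f}x")  # 1.40x

# Socket level: 192c Turin Dense vs. 288c Clearwater Forest
ratio = mt_throughput(192, 1.4) / mt_throughput(288, 1.0)
print(f"per-socket: {ratio:.2f}x")  # ~0.93x, i.e. the extra cores nearly cancel SMT
# An AVX-512-heavy mix would scale Zen 5c's per_thread_perf term up,
# which is where the ~1.5x-per-core figure in the post comes from.
```

Note the socket-level result: the per-core 1.4x advantage is almost exactly offset by Clearwater Forest's higher core count, which is the counterpoint raised in the replies.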
My biggest concern for Intel's very high core count 'mont server processors has little to do with the cores themselves, beyond whether they can have a competitive AVX-512 implementation while remaining compact and efficient enough; it is far more about Intel's mesh fabric connecting them all. 288 cores in clusters of 4 is still 72 stops on the mesh. How will the mesh affect performance for them?
That is a valid concern as well. Feeding 288 cores is a marvel all on its own. I think we are definitely looking at more bandwidth (and socket power) for future DC processors.
 
Reactions: Tlh97
Jul 27, 2020
20,917
14,493
146
No need to be an Intel beta tester.
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
 

jdubs03

Golden Member
Oct 1, 2013
1,079
746
136
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
Heh, might work except for maybe that last sentence there.
Put the pressure on Mr. Banner!
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.

Funny thing is, around 20 years ago I used to do a lot of beta testing and actually wrote a few "how to" books on some of the software I was testing. Even got to go to NAB in Vegas twice and speak as an "expert" on video editing and get paid for the trip. Then YouTube came and wiped out that market.
 
Jul 27, 2020
20,917
14,493
146
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.
Hey, maybe the CPU tries to compensate for the lack of power by overperforming in ST with higher than normal boosts since it isn't allowed to flex its muscles MT-wise? Definitely seems like some sort of boosting algo designed to win benchmarks in the short term.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Not quite like EPYC, I think. CCD-to-CCD communication has to go through the IOD in Zen, while Intel uses a mesh interconnect, so clusters talk to each other via the on-die network, where latency is based on the number of hops between nodes in the mesh. It was this way for Sapphire Rapids, Emerald Rapids, and Granite Rapids. I don’t see why it would change for what comes next, even if there are more compute tiles.
While it uses the mesh, Clearwater Forest potentially has one big advantage over current server chips: it can use Foveros Direct to communicate, whereas current parts use EMIB.

Foveros Direct is the most advanced version of the Foveros family.

So the connections between the clusters can potentially be much faster, with smaller power penalties.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,793
5,489
96
While it uses the mesh, Clearwater Forest potentially has one big advantage over current server chips: it can use Foveros Direct to communicate, whereas current parts use EMIB.
Nope. The base dies and the I/O caps are still chained together over EMIB.
Hybrid bonding just lets them put cores on top of cache, same as PVC/MI300/Granite Ridge-X/you-name-it.
Plus you're still dealing with a rather xboxhueg mesh in any case, which is slow.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Skymont is a beast. Lion Cove is a total letdown. I am really curious how much larger Skymont would be if they scaled it up to hit the same clocks as Lion Cove and gave it a similar instruction set, e.g. AVX-512.
A better way is to make it as wide as possible and keep it at 5 GHz or below. Like 50% faster per clock.

Lion Cove and Skymont should also be able to put out a few % more if the SoC itself did not suck.

@adroc_thurston That's a bit of a disappointment. I guess it's V-cache with cache as a base tile then.
 
Reactions: Tlh97

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means each quad-core cluster is a little under 15 mm².

The total for the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB, though? If they fill the space underneath, they can get 1 GB of SRAM under there.
 
Reactions: Tlh97

cannedlake240

Senior member
Jul 4, 2024
207
111
76
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means each quad-core cluster is a little under 15 mm².

The total for the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB, though? If they fill the space underneath, they can get 1 GB of SRAM under there.
Apparently the 4x3 CLF layout Intel has been showing isn't how the actual package looks. Bionic on Twitter said a while ago that Clearwater has more than 2x the L3 of SRF, so only twice the 216 MB of the 288C Sierra. The base tiles house the EMIB PHYs, memory controllers, and the mesh fabric.

One could also read the "doubling of L3" as 2x over 144C SRF, which would just be baffling lol... Imagine a 288C CPU with so much SRAM real estate having only a little over 200 MB of L3. If that's the case, they should at least double the cluster L2 as well; an 8 MB cluster L2 has been talked about since Tremont days.
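For reference, the two cache figures in this thread do line up; a trivial check using only the numbers quoted above:

```python
# "More than 2x L3 over SRF" applied to the 216 MB of the 288C Sierra Forest:
srf_288c_l3_mb = 216
clf_l3_mb = 2 * srf_288c_l3_mb
print(f"CLF L3 >= {clf_l3_mb} MB (~{clf_l3_mb / 1024:.2f} GB)")  # ~0.42 GB,
# which matches the "around 1/2 GB" figure in the post above.
```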
 
Last edited:
Reactions: Tlh97

Saylick

Diamond Member
Sep 10, 2012
3,645
8,223
136
A better way is to make it as wide as possible and keep it at 5 GHz or below. Like 50% faster per clock.
Would be interesting to see how well the clustered decode approach scales. Why not just add another cluster of 3-wide decode at this point and then widen everything else downstream?
 

511

Golden Member
Jul 12, 2024
1,038
897
106
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.

Funny thing is, around 20 years ago I used to do a lot of beta testing and actually wrote a few "how to" books on some of the software I was testing. Even got to go to NAB in Vegas twice and speak as an "expert" on video editing and get paid for the trip. Then YouTube came and wiped out that market.
If you have had multiple defective CPUs, either you are constantly getting bad chips or something on the motherboard is causing issues. My advice would be to cap the IA voltage to around 1.45 V, or maybe 1.5 V depending on the desired frequency; on a new CPU that will prevent degradation altogether.
How is zen 5 so fast in FP?
They significantly improved FP performance: full-fat AVX-512, with more units to feed.
 
Last edited:

511

Golden Member
Jul 12, 2024
1,038
897
106
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means each quad-core cluster is a little under 15 mm².

The total for the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB, though? If they fill the space underneath, they can get 1 GB of SRAM under there.
Do you have a die shot available?
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Would be interesting to see how well the clustered decode approach scales. Why not just add another cluster of 3-wide decode at this point and then widen everything else downstream?
That is exactly what the optimization manual says about Gracemont.
This overall approach to x86 instruction decoding provides a clear path forward to very wide designs without needing to cache post-decoded instructions.
Can't be more optimistic than that.
-It saves on complexity, meaning less design time
-It saves on transistors, meaning less power and area
-It scales easily, while going above 8-wide with a traditional decoder is questionable
-Each cluster is only 3-wide, so it's easier to keep fed
-It works on both branches and loops
-There are further opportunities for improvement, not just in the decode section but coupled with changes elsewhere

There was an X post about Keller having worked on Intel's next architecture with 12-wide decode. This is likely Arctic Wolf.

And I doubt they're widening it by 33% to get 5-10% gains. That's not what they've been doing. The branch predictor on Skymont is 27% over Gracemont's. FP got 20-30% more area for 30% extra performance.

The AnandTech article about Atom said the design goal within that team was 1% power for 2% performance, i.e. preserving the compactness of the core. You need to be very balanced to achieve that; you can't spend too much on one area and skimp on another. I would not be surprised if they bring some more new ideas to deliver on it.
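As a rough illustration of the clustered-decode idea discussed above (a toy model of the publicly described concept, not Intel's implementation): the fetch stream is split at taken branches, each chunk goes to an independent 3-wide cluster, and total width grows by adding clusters rather than widening any single decoder:

```python
# Toy model of clustered x86 decode: fetch chunks (runs of instructions
# ending at a taken branch) are handed round-robin to independent
# fixed-width decode clusters that work in parallel.
from dataclasses import dataclass, field

@dataclass
class DecodeCluster:
    width: int = 3                              # each cluster decodes up to 3 insns/cycle
    chunks: list = field(default_factory=list)  # fetch chunks assigned to this cluster

def dispatch(fetch_chunks: list[list[str]], clusters: list[DecodeCluster]) -> None:
    """Assign each fetch chunk to the next cluster in round-robin order."""
    for i, chunk in enumerate(fetch_chunks):
        clusters[i % len(clusters)].chunks.append(chunk)

clusters = [DecodeCluster() for _ in range(3)]   # Skymont-style 3x 3-wide
dispatch([["add", "cmp", "jnz"],                 # basic block 1
          ["mov", "sub", "jmp"],                 # basic block 2
          ["ld", "st", "ret"]], clusters)        # basic block 3
peak = sum(c.width for c in clusters)
print(f"peak decode width: {peak} insns/cycle across {len(clusters)} clusters")  # 9
```

Adding a fourth cluster raises the peak to 12-wide without making any one decoder wider, which is the scaling property the manual quote points at.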
 
Last edited:

DavidC1

Golden Member
Dec 29, 2023
1,211
1,932
96
Do you have a die shot available?
You don't need a die shot. You only need a shot of the package. You find the size of the LGA7529 package. The actual shot shows 3 narrow dies; they look narrow because the dies sit right next to each other, as on Meteor Lake. Each narrow die is actually 4 tiles. You find the size of the narrow rectangle and divide by 4.

It's possible that 4 of the dies are connected by Foveros, and the connection to the other quad-die groups is done using EMIB. Then again, SPR uses EMIB only and its dies are pretty close together.

Same with how I got Turin Dense Zen 5c's size: you get the package size and measure the die. There's a clear separation between core and L2 on AMD, so it's even easier to find the core size alone.
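The measurement method above, sketched in code. The package width and all pixel values are placeholders you would read off an actual photo; only the divide-by-4 (tiles per strip) and divide-by-6 (clusters per 24-core tile) steps come from the posts above:

```python
# Estimate die area from a top-down package photo, using the known
# package width as the scale reference.
def rect_area_mm2(pkg_width_mm: float, pkg_width_px: float,
                  w_px: float, h_px: float) -> float:
    """Convert a rectangle measured in pixels to mm^2."""
    mm_per_px = pkg_width_mm / pkg_width_px
    return (w_px * mm_per_px) * (h_px * mm_per_px)

# Hypothetical measurements from an LGA7529 package shot:
strip = rect_area_mm2(pkg_width_mm=61.0,    # placeholder package width
                      pkg_width_px=2440.0,  # package width as measured in the photo
                      w_px=560.0, h_px=1030.0)
per_tile = strip / 4         # each narrow strip is 4 abutted compute tiles
per_cluster = per_tile / 6   # 24 cores = six 4-core clusters
print(f"strip ~{strip:.0f} mm^2, tile ~{per_tile:.0f} mm^2, "
      f"cluster ~{per_cluster:.1f} mm^2")  # ~360, ~90, ~15
```

With these placeholder inputs the output lands on the ~90 mm² per tile and ~15 mm² per cluster figures quoted earlier in the thread.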
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
22,184
11,890
136
Is it likely that Intel put more resources into the development of Skymont because it does double duty, client and server? Whereas Lion Cove is pretty much just client.

Technically, Skymont won't see use in any server products. Darkmont will, though.

It is unfortunate that Intel doesn't have a Lion Cove-based server product and instead chose to use Redwood Cove in Granite Rapids.

Oh yeah! X3D didn’t. Turin didn’t. Same as GNR didn’t. And Lunar Lake didn’t.

Granite Rapids (assuming you're talking about that, and not Granite Ridge) isn't even using the same core as the 285k. It's Redwood Cove.

Lunar Lake doesn't share the same compute chiplet or even have the same package layout and is a niche product for 15W and below. It's good for what it is, but . . . not exactly the same thing!

Meanwhile, Turin, Granite Rapids, and Granite Rapids-X use the same CCDs. It's all the same product.

When Zen 5 desktop parts were launched it was a disaster

Why, because some reviewers didn't like the game performance on the 9950X? Please. It's the most lucrative disaster AMD ever had. And a few weeks later, X3D parts hit the streets and all was forgiven. Meanwhile, take a look at client market share for Q3 2024 and see what's really happening.
 
Reactions: Tlh97 and misuspita