Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022
694
600
106






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of die sizes (mm²) of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H&HX | Mobile U only | Mobile H |
| Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop: Q4 2024; H&HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | | |
| tGPU (mm²) | 44.45 | | | | |
| SoC (mm²) | 96.77 | | | | |
| IOE (mm²) | 44.45 | | | | |
| Total (mm²) | 252.15 | | | | |



Intel Core Ultra 100 - Meteor Lake



As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


Nothingness

Diamond Member
Jul 3, 2013
3,031
1,971
136
So, AVX512 is more of a niche.
On this I partially agree. But when it's useful, it can bring more speedups than two or three generations of Intel CPU microarchitectures.

I remember people used to say the same about AVX2 (especially during the 10 years when braindead Intel segment marketing made sure it wasn't available on all of their CPUs). Of course, as vectors get wider, their use becomes more difficult and looks more and more niche at first sight. But then you see good programmers making use of these extensions to benefit more and more applications.

I'm curious to see what AVX10.2 will bring to the table with its improved ISA and likely 256-bit first implementation.
 

DrMrLordX

Lifer
Apr 27, 2000
22,000
11,561
136
So, AVX512 is more of a niche.
It's more like this (and I'm grossly oversimplifying things here):

Let's say you're dealing with fp32 data. How many operations can you squeeze out of your code where you can carry out addition and/or multiplication operations on sixteen of those data points at once without any dependencies? Can you do this repeatedly without necessarily having to do something else in the same thread, other than something simple like iterating a loop counter? If the answer to those questions is "a lot" and "yes", then if you code for it, you can potentially make use of AVX512! Of course AVX512 isn't just AVX256/AVX2 on steroids. It does have other functions.

Regardless, you need to know what you're doing to actually leverage SIMD like AVX512. Even if you take the easy route and rely on autovectorization, your code has to be set up so the compiler or runtime engine can figure out how to align all the code properly and produce what you want. And that is (or can be) niche.

There's probably a fair amount of code out there that could be rewritten to make better use of AVX2 and even AVX512. The question is, who's going to go to all that effort?
 

MS_AT

Senior member
Jul 15, 2024
209
497
96
Looks like AVX512 is not useful for general consumers after all!
I am afraid it's more useful than Cinebench itself is.

You're confusing the usefulness of AVX512 with its historical availability. Software is usually written for the lowest common denominator, and since AVX512 hardware wasn't generally available, nobody was writing consumer software capable of using it. What's more, the Skylake-X implementation suffered from instruction licence levels (frequency throttling), so sprinkling bits of AVX512 here and there carried a performance penalty for code that used it only sparingly. Intel fixed that with Ice Lake, and AMD never had the problem to begin with. But this initial situation gave AVX512 the reputation of being an HPC niche.

The reason Intel dropped AVX512 with Alder Lake was the shift to E-cores, plus an initial strategic mistake. The mistake: even though AVX512 instructions apply to 128b, 256b and 512b register widths, the 512b width is mandated while the others are optional. Adding 512b support while keeping a 128b execution width on the E-cores (to save die area) would have led to terrible performance, for reasons beyond just quadruple pumping.

And Intel doesn't consider AVX512 niche or useless, as evidenced by their push for AVX10, which will be available on consumer hardware and will make all the fancy AVX512 instructions available on targets supporting at least a 256b execution width. That is much easier to implement on E-cores (as double-pumped 128b units) than full 512b AVX512 would be. If anything, you could think of the 512b execution width as the niche.

And why is AVX512 useful? Because it makes it easier to work with vectors, thanks to the new predicate instructions and masking in general; I would argue it's easier to add AVX512 support to your program than AVX2 support. Why aren't people doing that? Once again, there was not enough hardware on the market capable of using it to make it worthwhile. Even with runtime dispatch based on detected CPU features, it adds validation costs etc., so you need to make sure you can get a return on this investment.

An additional benefit of SIMD is that it lowers the burden on the decoders, as you are able to do more work with fewer instructions. Zen 4 showed that nicely, and this is more important for x64 than for ARM, where decode is relatively easier to implement.

And the compatibility moat of x64 would be that much harder for ARM to overcome if Intel had adopted the Nvidia approach [the same features are supported on consumer and enterprise products, but some are slower on consumer hardware; you can still use consumer hardware for development. That was not possible with AVX512, which was Xeon-only at the beginning].
 

Hulk

Diamond Member
Oct 9, 1999
4,455
2,373
136
The reason Intel dropped AVX512 with Alder Lake was the shift to E-cores, plus an initial strategic mistake. The mistake: even though AVX512 instructions apply to 128b, 256b and 512b register widths, the 512b width is mandated while the others are optional. Adding 512b support while keeping a 128b execution width on the E-cores (to save die area) would have led to terrible performance, for reasons beyond just quadruple pumping.
Great post. Can you clarify this paragraph a bit more? Was the mistake that E cores didn't have AVX512 or that AVX512 was not correctly developed upon inception?
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,911
3,523
136
Great post. Can you clarify this paragraph a bit more? Was the mistake that E cores didn't have AVX512 or that AVX512 was not correctly developed upon inception?
It's that AVX512 deliberately made design choices that make splitting 512b vectors into smaller vectors hard: operations that cross "lanes", i.e. the major binary boundaries (128b/256b). Those operations are useful, but they are a detriment to overall flexibility.
 

Aeonsim

Junior Member
May 10, 2020
10
27
91
AVX512 isn't useless for the standard user. Plenty of basic functions, such as parsing JSON (lots of websites), text processing, sorting, and base64 encoding/decoding, can benefit substantially (30-60% faster) when rewritten to use AVX512 operations. The problem is simply the lack of penetration in the CPU market due to Intel's confused approach to AVX512. This has meant there has been little incentive for the massive amount of legacy code and libraries to switch to using these instructions.
 

alcoholbob

Diamond Member
May 24, 2005
6,303
348
126
Great post. Can you clarify this paragraph a bit more? Was the mistake that E cores didn't have AVX512 or that AVX512 was not correctly developed upon inception?

Probably a feature that requires going into the BIOS to turn off the E-cores just ended up being something Intel didn't want to deal with and have to validate. Or they were going to have to do something stupid like AMD having users open the Xbox Game Bar...
 

Wolverine2349

Senior member
Oct 9, 2022
371
112
76
No idea if that's a bug or not.


Well, interesting whether it's true or not.

I have heard that DDR5 cannot do Gear 1 (1:1) on either AMD or Intel?

Though on Zen 4, isn't DDR5 actually Gear 1 with the UCLK (memory controller clock)? I mean, I have a 7800X3D and my UCLK is at 3000 MHz while my RAM is at 3000 MHz (6000 MT/s for DDR5). Or is that not Gear 1 because the Infinity Fabric is not 1:1?

I mean, Intel with Raptor Lake is still similar to AMD in latency with tuned 6000 RAM, despite needing to run in Gear 2?

Maybe, since the IMC in Arrow Lake is no longer part of the ring bus, Gear 1 is possible? Or is it true that even AMD is not really Gear 1, or can you not say that because it works so differently than Intel?

Is the Infinity Fabric on Zen 3, 4 and 5 like the ring bus on Intel 12th to 14th Gen? Or is the IF more like the IMC?

Because if AMD is truly Gear 1 on Zen 4 and 5 with DDR5-6000 (3000 MHz), Intel must just have much better latency, given they can equal AMD's latency in ns with half the memory controller clock.
 

alcoholbob

Diamond Member
May 24, 2005
6,303
348
126
Intel's ring bus system has way lower latency than AMD's, whose chiplet design adds latency. That is more than enough to offset the fact that AMD can run RAM in Gear 1 while Intel is forced to run in Gear 2 with DDR5.

This may also explain why Intel seems to benefit less (in gaming, anyway) from lower RAM latency than from just brute-forcing more bandwidth, whereas AMD seems to benefit more from lower first-word latency in their RAM.
 
Reactions: dr1337

Wolverine2349

Senior member
Oct 9, 2022
371
112
76
Intel's ring bus system has way lower latency than AMD's, whose chiplet design adds latency. That is more than enough to offset the fact that AMD can run RAM in Gear 1 while Intel is forced to run in Gear 2 with DDR5.

This may also explain why Intel seems to benefit less (in gaming, anyway) from lower RAM latency than from just brute-forcing more bandwidth, whereas AMD seems to benefit more from lower first-word latency in their RAM.

Is Arrow Lake still going to use Intel's ring bus system, or something else?

Is AMD's FCLK/Infinity Fabric like Intel's ring bus, or not?
 

SiliconFly

Golden Member
Mar 10, 2023
1,467
827
96
GPU acceleration either looks like crap or has a much larger file size. It's fine for streaming. For archiving, nothing beats CPU transcoding.
On Intel platforms, most encoders/decoders typically use the built-in Quick Sync accelerators for transcoding. It's CPU based (not GPU based).

Encoders/decoders that use Quick Sync don't need AVX512. Only those that don't use Quick Sync (very few) may benefit from AVX512 acceleration if it's coded properly with AVX512 supported.

Imho, this is one of the primary reasons AVX512 is dead in the water.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,955
4,484
136
On Intel platforms, most encoders/decoders typically use the built-in Quick Sync accelerators for transcoding. It's CPU based (not GPU based).

Encoders/decoders that use Quick Sync don't need AVX512. Only those that don't use Quick Sync may benefit from AVX512 acceleration if it's coded properly with AVX512 supported (which is very few).

Imho, this is one of the primary reasons AVX512 is dead in the water I believe.

Quick Sync is great; I loved it on my Ivy Bridge. It still resulted in a larger file size, though. Whether that matters, well, that's up to the user to decide.
 

hemedans

Senior member
Jan 31, 2015
223
113
116
On Intel platforms, most encoders/decoders typically use the built-in Quick Sync accelerators for transcoding. It's CPU based (not GPU based).

Encoders/decoders that use Quick Sync don't need AVX512. Only those that don't use Quick Sync (very few) may benefit from AVX512 acceleration if it's coded properly with AVX512 supported.

Imho, this is one of the primary reasons AVX512 is dead in the water I believe.
Quick Sync is GPU based.
 
Reactions: Thunder 57