Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.

Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8E	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15
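
As a quick sanity check on the Meteor Lake column, the four tile areas can be summed and compared with the listed total. This is only a sketch: the values are copied from the table, the unit is assumed to be mm² (the table does not state it), and the variable names are made up for illustration.

```python
# Tile areas for Meteor Lake as listed in the table above.
# Assumption: the unit is mm^2; the table itself does not state one.
tile_area = {"tCPU": 66.48, "tGPU": 44.45, "SoC": 96.77, "IOE": 44.45}
listed_total = 252.15

# Sum the tiles and work out each tile's share of the package total.
total = sum(tile_area.values())
share = {name: area / total for name, area in tile_area.items()}

print(f"sum of tiles: {total:.2f} (listed: {listed_total})")
for name, frac in share.items():
    print(f"{name}: {frac:.1%} of total")
```

The sum matches the listed total to two decimal places, so the Total row is simply the four tiles added together and does not include the Foveros base tile.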

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

TESKATLIPOKA · Dec 29, 2023

SiliconFly said:
Apple Silicon has only one thread per core. Hyper threading isn't a necessity for IPC uplift in LNC I guess. New architecture, new paradigm.

HT increases nT performance by 25-30%. It's certainly not a small amount, but it also increases power consumption by a similar amount, as far as I remember from Skylake reviews.
So let's say 4 P-cores consume 20W at 2.5GHz; enabling HT would then increase it to 25W.
In this case, by getting rid of HT you save 5W, which can be used for higher clock speed.
The problem is that this 5W, or 25% more power, won't allow you to clock these 4 P-cores 25% higher, so you lose either performance or efficiency.
Intel has E-cores, but you would need 2 E-cores at 2.2-2.5GHz, which can't consume more than 5W in total, to compensate for the missing HT.
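
The back-of-the-envelope numbers above can be put into a short sketch. Everything here is illustrative: the 20W baseline and the 25% figures are the ones from the post, while the cubic power-versus-clock relation (power scaling roughly with f·V², with voltage tracking frequency) is an assumption added for the example, not something measured on these cores.

```python
# Hypothetical figures taken from the post; these are not measurements.
BASE_POWER_W = 20.0    # 4 P-cores at 2.5 GHz with HT off
HT_PERF_GAIN = 0.25    # low end of the quoted 25-30% nT uplift
HT_POWER_GAIN = 0.25   # "a similar amount" of extra power


def ht_comparison(base_perf=1.0):
    """Compare perf and perf/W with and without HT under the post's numbers."""
    ht_perf = base_perf * (1 + HT_PERF_GAIN)
    ht_power = BASE_POWER_W * (1 + HT_POWER_GAIN)
    return {
        "ht_power_w": ht_power,
        "power_saved_w": ht_power - BASE_POWER_W,
        "eff_no_ht": base_perf / BASE_POWER_W,
        "eff_ht": ht_perf / ht_power,
    }


def clock_gain_from_power(power_ratio, exponent=3.0):
    """Clock uplift bought by extra power.

    Assumed model (not from the post): power ~ f ** exponent.
    """
    return power_ratio ** (1.0 / exponent) - 1.0


result = ht_comparison()
gain = clock_gain_from_power(1 + HT_POWER_GAIN)
print(result)
print(f"clock gain from spending the saved power: {gain:.1%}")
```

Under these assumptions HT leaves perf/W unchanged while adding 25% throughput, whereas spending the same 5W on clocks buys only a single-digit percentage of frequency, which is the gap the post is pointing at.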

FlameTail said:
I think by removing HT, they can dedicate more resources in the core to uplift ST performance.

HT uses very little core space <10%, as far as I remember.

mikk · Dec 29, 2023

From Whiskey Lake To Meteor Lake: The Intel CPU Linux Performance Evolution: https://www.phoronix.com/review/intel-whiskeylake-meteorlake/12

Remember MLID told us Meteor Lake is like Whiskey Lake

SiliconFly · Dec 29, 2023

Hulk said:
I have to admit I do listen to MLID, it's a guilty pleasure. Every now and then while I'm working I put him on in the background. You know what I noticed? He's not really a tech guy. Seems like his interest/knowledge in the tech is superficial. Where he is encyclopedic is in his marketing knowledge. He seems to know when every CPU and GPU product was released, how much it cost, and its general performance.

He was explaining to the other guy in the latest video how the E cores are for efficiency but they're really not all that efficient. So many people still don't understand that Intel told us they were for area efficiency, not power efficiency. MLID was saying the P cores are actually more efficient than the E cores! Yeah, that's right, that's because they should be given the die space they consume for the compute they produce. Argh!

I'm sure he's directly sponsored by the same company he promotes all the time. He may lack technical knowledge, but he has one of the best sources (inside amd I presume).

adroc_thurston · Dec 29, 2023

TESKATLIPOKA said:
HT increases nT performance by 25-30%

welll ughhh.
Very workload-dependent and there are other factors.

CouncilorIrissa · Dec 29, 2023

SiliconFly said:
I'm sure he's directly sponsored by the same company he promotes all the time. He may lack technical knowledge, but he has one of the best sources (inside amd I presume).

I don't think he promotes AMD, he just throws shit at the wall and then cleans up that which didn't stick.
Didn't he suggest that RWC would bring double-digit IPC gains?

adroc_thurston · Dec 29, 2023

CouncilorIrissa said:
I don't think he promotes AMD, he just throws shit at the wall and then cleans up that which didn't stick.
Didn't he suggest that RWC would bring double-digit IPC gains?

Bingo, all e-beggars are like that.

mikk · Dec 29, 2023

Seems like Acer lowered the sustained power quite a bit in retail devices on the Acer Swift Go, it's 27W in this retail review :

From the pre production test from Notebookcheck:

As mentioned at the beginning, the Acer Swift Go 14 we are dealing with is a device that corresponds to the standard spec hardware. Our review machine's software and firmware aren't yet quite perfected. For example, in our Swift Go 14, the values for the boost performance were initially configured somewhat too high. In the course of the test, we also had problems with the preinstalled AlterView which creates visually enticing 3D backgrounds. After a lively exchange with Acer, we decided to remove the software. We were also able to lower the PL2 to 55 watts while leaving the PL1 at 45 watts, all with the aid of TechPowerUp's Throttle Stop. This helped the laptop run considerably better and more stable. Acer will undertake some significantly more detailed fine-tuning when it comes to the final performance management. This should result in the laptop enjoying better performance than it currently does.

DavidC1 · Dec 29, 2023

H433x0n said:
I disagree that the P cores are all that matters. It’s entirely possible that Skymont nearly has a 12-14% IPC increase over Gracemont. This gets Skymont pretty close to Zen 3 IPC. So ARL will basically have 8 pcores with 16 ecores that are basically equivalent to a 5950X without SMT.

Never mind Skymont. Crestmont* in Meteor Lake improves IPC by 4-6% according to Intel, but one Chinese review measured an improvement of nearly 7.5%.

*Sierra Glen gets zero pretty much. Opposite on the server, where the Granite Rapids core gets decent improvements but Redwood Cove in MTL gets almost nothing.

JoeRambo said:
So they are preparing atom core for closing the gap with big cores -> introducing the 3rd cluster, having 3 fetch queues and chewing 36-48 bytes per clock from L1I

Chipsandcheese has got one thing wrong about Gracemont.

Gracemont is fed by the L1i cache at a 2x32B rate, which is double Golden Cove's and also double the rate fed by its own OD-ILD (2x16B).

Regards,
formerly IU2K

SiliconFly · Dec 29, 2023

TESKATLIPOKA said:
HT increases nT performance by 25-30%. It's certainly not a small amount, but it also increases power consumption by a similar amount, as far as I remember from Skylake reviews.
So let's say 4 P-cores consume 20W at 2.5GHz; enabling HT would then increase it to 25W.
In this case, by getting rid of HT you save 5W, which can be used for higher clock speed.
The problem is that this 5W, or 25% more power, won't allow you to clock these 4 P-cores 25% higher, so you lose either performance or efficiency.
Intel has E-cores, but you would need 2 E-cores at 2.2-2.5GHz, which can't consume more than 5W in total, to compensate for the missing HT.

HT uses very little core space <10%, as far as I remember.

One of the key reasons Intel might have ditched Hyper-threading in LNC is cos it's power hungry. One of the stated drawbacks of HT:

"HT was criticized for energy inefficiency. ARM stated SMT can use up to 46% more power than ordinary dual-core designs (and also increases cache thrashing)."

And with LNC being a ground-up power-efficient design, HT may not be a good fit. And the most important reason, I think, is that Hyper-threading does not contribute to ST performance in any way. And having HT in this age, with each CPU having multiple cores, doesn't make much sense (cos HT kicks in only when all the physical cores are running full steam already).

Note: HT is actually very good for servers though.

adroc_thurston · Dec 29, 2023

DavidC1 said:
where the Granite Rapids core gets decent improvements

no? no.

DavidC1 · Dec 29, 2023

SiliconFly said:
One of the key reasons Intel might have ditched Hyper-threading in LNC is cos it's power hungry. One of the stated drawbacks of HT:

"HT was criticized for energy inefficiency. ARM stated SMT can use up to 46% more power than ordinary dual-core designs (and also increases cache thrashing)."

And with LNC being a ground-up power-efficient design, HT may not be a good fit. And the most important reason, I think, is that Hyper-threading does not contribute to ST performance in any way. And having HT in this age, with each CPU having multiple cores, doesn't make much sense (cos HT kicks in only when all the physical cores are running full steam already).

SMT uses less than 5% die area of a core, probably 2-3%.

The problem with SMT is that, even before the potential security flaws, it increases validation time for the design. Back in the low-core-count days it made a lot of sense, but it seems we're in an era where the tradeoff isn't as worthwhile.

Better execution over many generations may end up being better than having HT.

Remember, their own Atom team, which has a consistent track record of execution, also does not use HT; it was abandoned when they moved to OoOE back in the second Atom.

@adroc_thurston GNR gets improved FP, OoOE units, and an improved branch predictor on top of the L1i doubling present in client Redwood Cove. Decent meaning a few %, not zero.

SiliconFly · Dec 29, 2023

DavidC1 said:
SMT uses less than 5% die area of a core, probably 2-3%.

The problem with SMT is that, even before the potential security flaws, it increases validation time for the design. Back in the low-core-count days it made a lot of sense, but it seems we're in an era where the tradeoff isn't as worthwhile.

Better execution over many generations may end up being better than having HT.

Remember, their own Atom team, which has a consistent track record of execution, also does not use HT; it was abandoned when they moved to OoOE back in the second Atom.

@adroc_thurston GNR gets improved FP, OoOE units, and an improved branch predictor on top of the L1i doubling present in client Redwood Cove. Decent meaning a few %, not zero.

Agree. Actually, recent implementations may take more than 5% due to further optimizations and security mitigations.

And the added complexity is just not worth it as it gets in the way of ST performance design/optimizations.

DavidC1 · Dec 29, 2023

SiliconFly said:
Agree. Actually, recent implementations may take more than 5% due to further optimizations and security mitigations.

It don't matter. Extra space taken up by SMT is still 2-3%.

SMT is actually pretty power efficient too, when the tasks are well threaded. Hence why some call it "poor man's SMP". But now even pocket computers have 4+ cores.

But making it difficult to validate matters, because people always forget that it's the guys working on the product who make it work, and any theoretical gains are nullified by increased risks. Every generation that gets delayed feeds into its successors. Every generation with SMT increases the potential for delay.

adroc_thurston · Dec 29, 2023

DavidC1 said:
GNR gets Improved FP, OoOE units, and improved branch predictor over just the L1i doubling present in client Redwood Cove. Decent meaning few % not zero.

No, it's the exact same thing but on i3. lmao.

DavidC1 · Dec 29, 2023

adroc_thurston said:
No, it's the exact same thing but on i3. lmao.

I'll wait for better sources as they had two distinct presentations and we know Redwood Cove on MTL is a 0% gain.

SiliconFly · Dec 29, 2023

DavidC1 said:
I'll wait for better sources as they had two distinct presentations and we know Redwood Cove on MTL is a 0% gain.

RWC is exactly the same as the previous gen, clock frequencies are similar to the previous gen too, and there are no known significant performance optimizations either. They played it too safe. So we can't expect much performance gain from RWC at this point, I guess.

But Intel 7 to Intel 4 combined with DLVR should have provided at least 15% to 20% efficiency gains for RWC alone. But the power efficiency results with pre-production laptops are all over the place and it's a bit confusing at the moment. Hopefully, newer tests with updated pcode should give clearer results.

ondma · Dec 29, 2023

SiliconFly said:
One of the key reasons Intel might have ditched Hyper-threading in LNC is cos it's power hungry. One of the stated drawbacks of HT:

"HT was criticized for energy inefficiency. ARM stated SMT can use up to 46% more power than ordinary dual-core designs (and also increases cache thrashing)."

And with LNC being a ground-up power-efficient design, HT may not be a good fit. And the most important reason, I think, is that Hyper-threading does not contribute to ST performance in any way. And having HT in this age, with each CPU having multiple cores, doesn't make much sense (cos HT kicks in only when all the physical cores are running full steam already).

Note: HT is actually very good for servers though.

I thought it was because they were supposed to go to "rentable units" and they could not get them working.

SiliconFly · Dec 29, 2023

ondma said:
I thought it was because they were supposed to go to "rentable units" and they could not get them working.

Rentable Units is a relatively new concept. And I believe, at this point, it's just a concept and may not make it into end products any time soon. It's extremely complex. If it were doable, other companies like AMD, Apple, and Qualcomm would have picked up on it already. So it's safe not to give it too much thought until (or if) they announce it.

Another reason is that it sounds too good to be true. The golden rule is, on a given (existing) system, a thread's performance cannot exceed the performance of a single core. Whereas in a Rentable Unit system, the thread performance can casually exceed that hard limit without breaking a sweat. Sounds like a dream, cos it probably is.

Another way to look at it: on an 8-core system with a full implementation of Rentable Units, a single thread can run 8X faster than on an 8-core computer without Rentable Units. A single thread's speed is not limited by a single core's ST performance anymore!

In short, under very ideal circumstances, ST & MT performance will become the same with Rentable Units. Sounds way too good to be true. So, meh!

controlflow · Dec 29, 2023

mikk said:
Seems like Acer lowered the sustained power quite a bit in retail devices on the Acer Swift Go, it's 27W in this retail review :

From the pre production test from Notebookcheck:

They show 2 different Cinebench R23 MT scores. 15,047 and 13,446. Is this an error or is the 15k score at 45W+ and the lower score at 28W? He mentions "28W" for the 13.4k score but it is not clear if he is saying the test was truly capped at 28W or if it was boosting above it for a while.

These numbers seem much higher than the Zenbook on R23.

FlameTail · Dec 29, 2023

SiliconFly said:
In short, under very ideal circumstances, ST & MT performance will become the same with Rentable Units. Sounds way too good to be true. So, meh!

that's nuts.

ondma · Dec 29, 2023

FlameTail said:
that's nuts.

IDK, I am not a computer engineer, but I read an article comparing rentable units and hyperthreading, and I didn't see them come to that conclusion.
The surprising thing about ARL is that, if it truly does not have hyperthreading, the leaks I saw didn't say anything about increasing E cores either.

Seems like the worst of both worlds. Loss of single thread performance due to clock regression, and loss of ultimate MT performance due to no HT.
My feeling is that Zen 5 will dominate in both.

Saylick · Dec 29, 2023

SiliconFly said:
Another way to look at it: on an 8-core system with a full implementation of Rentable Units, a single thread can run 8X faster than on an 8-core computer without Rentable Units. A single thread's speed is not limited by a single core's ST performance anymore!

Pretty sure you cannot scale ST performance by X amount just by scaling up a theoretical core's resources by the same amount. A single thread will never fully saturate the width of a core at all times because of instruction dependencies, which is exactly why going wider doesn't give you a proportional IPC uplift. It's also why SMT was created: you get more throughput out of a given core, but not more ST performance. Rentable Units, if actually possible, likely means better utilization of silicon area, since you don't need separate big cores, which are not efficient from a perf/mm2 point of view because ST performance has diminishing returns with core area.
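
A toy scheduler makes the dependency argument concrete. It is a deliberately simplified model (unit latency, perfect prediction, no memory effects), and the function names and the 4-chain example program are hypothetical, chosen only to show that IPC stops scaling once issue width exceeds the parallelism available in the instruction stream.

```python
def cycles_to_run(deps, width):
    """Cycles needed to issue every instruction on a core that can issue
    `width` instructions per cycle.

    deps[i] lists the earlier instructions whose results instruction i
    needs. Every instruction is assumed to have a 1-cycle latency.
    """
    done = set()
    pending = list(range(len(deps)))
    cycles = 0
    while pending:
        cycles += 1
        # An instruction is ready once all its producers finished earlier.
        ready = [i for i in pending if all(d in done for d in deps[i])]
        issued = ready[:width]  # oldest first, capped by issue width
        done.update(issued)
        pending = [i for i in pending if i not in done]
    return cycles


def chained_program(chains, length):
    """Build `chains` independent dependency chains of `length` instructions,
    interleaved so that instruction i depends on instruction i - chains."""
    return [[i - chains] if i >= chains else [] for i in range(chains * length)]


program = chained_program(chains=4, length=8)  # 32 instructions
for width in (1, 2, 4, 8, 16):
    c = cycles_to_run(program, width)
    print(f"width {width:2d}: {c:2d} cycles, IPC {len(program) / c:.1f}")
```

With four independent chains, IPC rises to 4 at width 4 and then stays there at widths 8 and 16: the extra issue slots have nothing to do, which is the slack SMT tries to fill with a second thread.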

Khato · Dec 29, 2023

Back on MTL, another video has come along comparing the Asus new and old BIOS:

The benchmarks themselves have the usual problem of not being run at a static power level, so I wouldn't say they're of much interest. What is nice is starting around 3 minutes in are graphs of average temperature, frequency, and power versus time on a Prime95 run with both new and old BIOS. While the power graph clearly indicates that the new BIOS does allow the CPU to consume more power for a time, by the end of the graph the power consumption is equivalent between new and old BIOS... but the clock speed with new BIOS at that steady state is about 10% higher than the old BIOS. Also the new BIOS shows a markedly more consistent clock speed in general.

cebri1 · Dec 29, 2023

Khato said:
Back on MTL, another video has come along comparing the Asus new and old BIOS:

The benchmarks themselves have the usual problem of not being run at a static power level, so I wouldn't say they're of much interest. What is nice is starting around 3 minutes in are graphs of average temperature, frequency, and power versus time on a Prime95 run with both new and old BIOS. While the power graph clearly indicates that the new BIOS does allow the CPU to consume more power for a time, by the end of the graph the power consumption is equivalent between new and old BIOS... but the clock speed with new BIOS at that steady state is about 10% higher than the old BIOS. Also the new BIOS shows a markedly more consistent clock speed in general.

That is pretty much in line with other results that showed a 10-12% increase in performance at different power levels.

Khato · Dec 29, 2023

Regarding rentable units... Sadly the reality is quite boring, especially compared to the fanciful fiction. I bet that the term was included without context in some presentation that a non-technical 'leaker' received. So clearly some explanation for the term needed to be created in order to be able to 'leak' it.
