Question Intel Mont thread

igor_kavinski · Feb 20, 2023

https://store.acer.com/en-us/aspire-3-laptop-a315-510p-3905

First Gracemont laptop available for sale.

Anybody got disposable $500 to buy and test this laptop?

DavidC1 · Dec 22, 2024

igor_kavinski said:
By the way, game fps improving with one Lion Cove and all Skymonts enabled? I think it's coz of reduced number of ring stops.

That wasn't the case previously though. Now it's close enough.

LightningZ71 said:
They didn't do that. Either they know something we don't, or they just made another poor market decision.

Or, they screwed up and/or did not meet their expectations. Which I think is extremely reasonable when you are dealing with a device with tens of billions of transistors. Literally there are tens of thousands of paths that could have gone wrong. You can't distill the problems down to a simple explanation on a project as complex as a CPU. Why was Pentium 4 bad? Why was Bulldozer bad?

After all, Arrowlake was supposed to come before Lunarlake, not the other way around.

I think the original plan was Arrowlake instead of 14900K. That would have changed the competitive landscape quite a bit no?

DavidC1 · Dec 22, 2024

MS_AT said:
It is not so simple. Skymont has 4x128b symmetrical SIMD units, Zen2 has asymmetrical units, so for inner loops of things like matrix multiply they are equally matched.

Zen 2 has 2x FPAdd and 2x FPMul, while Skymont has 4x 128-bit FMA. Zen 2 also has 2x 256-bit Load and 1x 256-bit Store while Skymont has 3x128-bit Loads and 4x128-bit Stores.

Intel said that every time they doubled FP capability, doubling L/S was necessary. Zen 2 has 2x the max Flop capability and 33% more Load capability. In certain scenarios, yes, it'll be that much faster.
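For anyone who wants to check the ratios, here is a rough sketch of the per-cycle arithmetic using the unit counts quoted above. It assumes each of Zen 2's four FP pipes is 256 bits wide, which this post does not state, and it compares aggregate pipe width rather than any measured FLOP rate. The helper name `total_bits` is made up for illustration.

```python
# Per-core, per-cycle widths built from the unit counts quoted in this post.
# Assumption (not stated in the post): each of Zen 2's four FP pipes is 256-bit.

def total_bits(units: int, width_bits: int) -> int:
    """Aggregate width in bits per cycle for `units` identical pipes."""
    return units * width_bits

zen2 = {
    "fp": total_bits(4, 256),     # 2x FPAdd + 2x FPMul, assumed 256-bit each
    "load": total_bits(2, 256),   # 2x 256-bit loads
    "store": total_bits(1, 256),  # 1x 256-bit store
}
skymont = {
    "fp": total_bits(4, 128),     # 4x 128-bit FMA
    "load": total_bits(3, 128),   # 3x 128-bit loads
    "store": total_bits(4, 128),  # 4x 128-bit stores
}

# Ratio of Zen 2 to Skymont for each resource.
ratios = {name: zen2[name] / skymont[name] for name in zen2}

for name, ratio in ratios.items():
    print(f"{name}: Zen 2 {zen2[name]} b/cycle vs Skymont {skymont[name]} b/cycle -> {ratio:.2f}x")
```

Under those assumptions the FP and load ratios come out at 2x and about 1.33x, while the store ratio goes the other way in Skymont's favor.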

Boosting niche scenarios is much easier than having a solid architecture, which Skymont has. FPUs have roots in accelerators, because before the 486, CPUs had FPUs as add-on coprocessors. FP performance can nearly double by simply doubling the number of units, as with Skymont when combined with uarch improvements. In contrast, Integer has no straight path for doubling performance.

cannedlake240 · Dec 23, 2024

igor_kavinski said:
I think the worst result was getting beat by Zen 2 with HT, of all things.

ZEN TWO!

SMH.

View attachment 113582

One skm 4c cluster gets less L3 bandwidth than a single Zen 2 core. This might be a disadvantage for Skymont. Plus, an entire Zen 2 CCX is larger than one skm cluster. Any attempt at a unified Atom core likely won't be arranged in 4c clusters in "E core" fashion, since area efficiency will be less of a concern.

igor_kavinski · Dec 23, 2024

cannedlake240 said:
Any attempt at a unified atom core likely won't be arranged in clusters in this fashion

Fair enough.

LightningZ71 · Dec 23, 2024

The only reliable-ish info we have is that on Arrow Lake, the Skymont cores take up about 17.8 billion transistors, giving about 1 billion transistors per core and its share of the L2. No idea if this includes L3 slices, but from the wording, probably not.

The entire Zen2 CCD, 16MB L3, 2 CCXs of 4 cores each, is 3.8 Billion transistors. Zen 2 is less than half the size of Skymont on a per-core basis.

It is conceivable that AMD could produce a processor with 32 Zen 2 cores and 8 Zen 5 cores that are instruction restricted to Zen 2's capabilities in the same space as Arrow Lake and run rings around it in MT tasks with 48 active threads. This assumes that there are no memory bottlenecks.
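A quick sketch of the per-core arithmetic behind that comparison, taking the transistor figures quoted above at face value (they are unverified). The 16-core count for Skymont assumes the 8P+16E Arrow Lake part, and the variable names are only for illustration.

```python
# Figures as quoted in this post; treat them as unverified.
SKYMONT_TRANSISTORS = 17.8e9   # claimed total for Arrow Lake's Skymont cores
SKYMONT_CORES = 16             # assumption: the 8P+16E part
ZEN2_CCD_TRANSISTORS = 3.8e9   # whole CCD, including 16MB L3
ZEN2_CORES = 8                 # 2 CCXs of 4 cores each

def per_core(total_transistors: float, cores: int) -> float:
    """Naive per-core share: total transistors divided by core count."""
    return total_transistors / cores

skymont_per_core = per_core(SKYMONT_TRANSISTORS, SKYMONT_CORES)
zen2_per_core = per_core(ZEN2_CCD_TRANSISTORS, ZEN2_CORES)
size_ratio = zen2_per_core / skymont_per_core

print(f"Skymont: {skymont_per_core / 1e9:.2f}B per core")
print(f"Zen 2:   {zen2_per_core / 1e9:.3f}B per core (CCD share, L3 included)")
print(f"Zen 2 / Skymont: {size_ratio:.2f}")
```

The result is only as good as the inputs: if the 17.8 billion figure covers more than the E-cores, the ratio changes completely.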

GTracing · Dec 23, 2024

LightningZ71 said:
The only reliable-ish info we have is that on Arrow Lake, the Skymont cores take up about 17.8 billion transistors, giving about 1 billion transistors per core and its share of the L2. No idea if this includes L3 slices, but from the wording, probably not.

The entire Zen2 CCD, 16MB L3, 2 CCXs of 4 cores each, is 3.8 Billion transistors. Zen 2 is less than half the size of Skymont on a per-core basis.

It is conceivable that AMD could produce a processor with 32 Zen 2 cores and 8 Zen 5 cores that are instruction restricted to Zen 2's capabilities in the same space as Arrow Lake and run rings around it in MT tasks with 48 active threads. This assumes that there are no memory bottlenecks.

I don't know where you heard Arrow Lake's e-cores use 17.8 billion transistors. Intel via TechPowerUp makes it sound like all the dies together add up to 17.8 billion. Comparing that number and the die size to Apple's M3, I believe that the compute die is 17.8 billion transistors.

Intel says that the die-area of Arrow Lake-S (8P+16E) is 243 mm², and its total transistor count is 17.8 billion.

Intel Core Ultra 9 285K Review

Finally! Intel's new Arrow Lake architecture is launched. The new CPUs are full of design changes, like removal of Hyper-Threading, new Lion Cove P-Cores, an improved Thread Director and more. In our review we got surprising results that were both impressive and disappointing.

www.techpowerup.com

Just dividing transistors by core area like you did gives 95 million transistors for Skymont and 144 million for Zen2. But in my opinion estimates like that are worthless. A few reasons:

Transistor counts are estimates and the number can vary depending on how you're counting.
Some structures on the die are more transistor dense than others. Cache in particular is denser than logic. To further complicate it, new nodes improve logic density more than cache density.
A core can have a more dense or a less dense layout, like Zen5 vs Zen5c.

LightningZ71 · Dec 23, 2024

That's what I get for trusting the AI summary of several articles.

ajsdkflsdjfio · Dec 23, 2024

MS_AT said:
Now if you think Intel thinks that AVX512 is not useful in the consumer market, then why are they rolling out AVX10? Its only purpose is to bring AVX512 features to E cores, as AVX10 doesn't require 512b execution units. With AVX10/AVX512 the compiler will spill less often, due to the bigger architectural register pool, even for scalar floating point operations. Handling corner cases is easier thanks to masking etc.

Why do you think AVX10 is meant for consumers? I doubt Intel is so worried about bringing AVX features to consumers; more likely they are bringing AVX10 so that their E-core datacenter products have even more of an edge, since they are going to be their only competitive datacenter products going forward. Intel being able to enable AVX-512 on P-cores and E-cores for consumers sounds more like a trickle-down effect than their main intention.

It might be true that wider and more parallel computing is going to become more and more popular in CPUs in the future. But as you say, AVX-512 has been completely ignored in the consumer market and a side effect of this is that 99% of consumer applications do not take advantage of AVX-512 performance benefits, or even rely heavily on SIMD computing at all. Going forward AVX512/10 might be more widely adopted for the consumer but I just don't see that being the case currently, meaning E-cores are perfectly fine as they are for consumers.

Either way, I still stand by the idea that AVX-512 is basically useless for 99% of consumers. Zen5 didn't bring anything revolutionary with its full AVX-512 implementation compared to Zen4's double 256-bit; even with generational architectural changes it performs more like Zen4+. So at the VERY MOST, even if Zen5 improved 0% in all other aspects, the AVX-512 implementation would only have made a 5-10% difference on average. With this extremely generous figure, it might be said that AVX-512 is in fact useful for the average consumer, but it still wouldn't be enough to completely invalidate processors without it.

Meteor Late said:
AVX512 is niche for consumers, but so are 16-core CPUs, it's just a Cinebench benchmarking contest about "which CPU has more raw power".

I wouldn't think so. Most modern games utilize 8 cores AT LEAST, and it's extremely beneficial to have more than that so that you can have multiple other programs running in the background without significantly affecting your game. Even when just doing general computer work without advanced productivity applications, many people have multiple programs open and multiple Chrome windows with dozens of tabs per window open at the same time. I've seen cases where they press alt-tab and there are like 10-20 windows open to switch between. Although technically you can just be more organized, I think having 12-16 cores so you don't slow down when heavily multi-tasking is very nice to have.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
they are going to be their only competitive datacenter products going forward.

is this why CWF-AP and future Atom xeons are dead?

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
is this why CWF-AP and future Atom xeons are dead?

The only scenario in which future Atom xeons are dead is one where Intel themselves are dead.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
The only scenario in which future Atom xeons are dead is one where Intel themselves are dead.

But they are dead (no, seriously.). At least until Unified Core, lol.
It's extra funny given how DCAI realigned itself politically.

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
But they are dead (no, seriously.). At least until Unified Core, lol.
It's extra funny given how DCAI realigned itself politically.

If Clearwater Forest wasn't Intel's best bet, why are they heralding it as such and devoting so many resources to launching it in 2025 with Diamond Rapids nowhere to be seen? Also why do you think Unified Core is their current plan? If atom is so shit why create an entire architecture surrounding it?

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
If Clearwater Forest wasn't Intel's best bet, why are they heralding it as such and devoting so many resources to launching it in 2025

2025?

ajsdkflsdjfio said:
with diamond rapids nowhere to be seen

That's a Diamond Rapids issue.

ajsdkflsdjfio said:
Also why do you think Unified Core is their current plan?

Because it's a good idea. It worked for AMD. Why won't it work for them?

ajsdkflsdjfio said:
If atom is so shit why create an entire architecture surrounding it?

It's not shit, just that Atom server products are not competitive enough to get hyperscaler traction. Ergo they're dead.
Like SRF-SP was made for Meta and Meta just bought Bergamo instead. gg no re

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
Because it's a good idea. It worked for AMD. Why won't it work for them?

No I'm not asking why Intel is shifting to a singular core architecture, I'm asking why they are shifting to a singular core architecture based on Atom and not P-core or even Royal Core?

adroc_thurston said:
It's not shit, just that Atom server products are not competitive enough to get hyperscaler traction. Ergo they're dead.
Like SRF-SP was made for Meta and Meta just bought Bergamo instead. gg no re

SRF uses Crestmont, which is not at all comparable to Skymont/Darkmont. Also in general it was a super late product. Crestmont is only an incremental improvement over Gracemont, which was released in what... 2021? 2021 e-cores in 2024 are likely not going to be a massive success, especially when you consider that the performance of Crestmont/Gracemont was a lot weaker compared to their P-core counterparts. Skymont/Darkmont is a different story. You have a design based on 2024 e-cores which are much more competitive both compared to their contemporary P-cores and compared to competing Zen cores of the same release cadence.

adroc_thurston said:
That's a Diamond Rapids issue.

I won't argue that Diamond Rapids doesn't have issues, but for the past couple years Intel has been talking about clearwater forest and not diamond rapids, before any major engineering issues might've taken place. I think they simply realized their e-core architecture was going to be their main attraction going forward.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
I'm asking why they are shifting to a singular core architecture based on Atom and not P-core or even Royal Core?

Because Atom guys aren't washed and output competitive PPA, and Royal Core was an absolute mess and that team is poof anyway.

ajsdkflsdjfio said:
SRF uses Crestmont which is not at all comparable to Skymont/Darkmont.

ughhh. Well it's a throughput part first and foremost.

ajsdkflsdjfio said:
Also in general it was a super late product.

Not really, went from H2'23 to H1'24. Fine.

ajsdkflsdjfio said:
2021 e-cores in 2024 is likely not going to be a massive success, especially when you consider that the performance of crestmont/gracemont was a lot weaker compared to their P-core counterparts

It. Is. a. throughput product. Made for favelas.
144 good enough things on a single die were supposed to sell something but they did, in fact, not.

ajsdkflsdjfio said:
You have a design based on 2024 e-cores which are much more competitive both compared to their contemporary P-cores and compared to competing Zen cores of the same release cadence.

If you say so.

ajsdkflsdjfio said:
I think they simply realized their e-core architecture was going to be their main attraction going forward.

Which is why Atom xeons are dead?

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
Because Atom guys aren't washed and output competitive PPA, and Royal Core was an absolute mess and that team is poof anyway.

Which is exactly what I'm trying to say?

adroc_thurston said:
ughhh. Well it's a throughput part first and foremost.
It. Is. a. throughput product. Made for favelas.
144 good enough things on a single die were supposed to sell something but they did, in fact, not.

So why are you making the comparison between SRF and CWF? Skymont is more than "good enough things" and, in my opinion, useful for more than just throughput: it has good IPC in many workloads, the aforementioned AVX-10 is being pushed out for improved SIMD performance, and it is power efficient enough that its performance isn't as kneecapped in server scenarios, making it even more competitive versus other cores.

Even in a situation where CWF is only useful for throughput, it would be a vastly superior product to SRF and therefore much more attractive, even if just in a smaller market.

adroc_thurston said:
If you say so.

Gracemont was Skylake (2015) IPC in 2021, Skymont is Zen4 (2022) IPC in 2024.

adroc_thurston said:
Which is why Atom xeons are dead?

My statement that "their e-core architecture was going to be their main attraction going forward" includes e-cores in Xeon, not just their unified architecture.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
So why are you making the comparison between SRF and CWF?

same swimlane. same target customer.

ajsdkflsdjfio said:
the aforementioned AVX-10 being pushed out for improved SIMD performance

years and years and years away.

ajsdkflsdjfio said:
Even in a situation where CWF is only useful for throughput, it would be a vastly superior product to SRF and therefore much more attractive, even if just in a smaller market.

You forgot the part where CWF is much, much more expensive, aka it loses half the reason Atom Xeons even exist (they're cheapo).

ajsdkflsdjfio said:
Gracemont was Skylake (2015) IPC in 2021, Skymont is Zen4 (2022) IPC in 2024.

These parts are not defined by 1t PPC, they're defined by socket-level throughput ISO power.

ajsdkflsdjfio said:
My statement that "their e-core architecture was going to be their main attraction going forward" includes e-cores in Xeon, not just their unified architecture.

Well the point is that there are no e-cores in Xeon anymore. Dead.

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
same swimlane. same target customer.

Wrong for the many reasons I pointed out.

adroc_thurston said:
years and years and years away.

You are right about this, but also darkmont and consequently its server versions most definitely improve on vector performance making it more viable than skymont.

adroc_thurston said:
You forgot the part where CWF is much, much more expensive, aka it loses half the reason Atom Xeons even exist (they're cheapo).

Maybe because the "Atom" cores are going to perform more like P-cores instead? Why devote all this packaging cost to a core that's "just an e-core" like all previous e-cores?

adroc_thurston said:
These parts are not defined by 1t PPC, they're defined by socket-level throughput ISO power.

Not solely defined by 1t PPC, but if all other things are equal, a 1t PPC improvement would translate 1:1 with socket-level throughput ISO-power.

All other things aren't in fact equal, but that's a more complicated discussion, which ultimately results in the same conclusion. Skymont is ALSO much more performant for its area/power.
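To spell out what I mean by 1:1, here is a toy model, not a real power model. It assumes core count, clock and per-core power all stay fixed while PPC goes up, which is exactly the "all other things equal" premise. The function name and the 2.5 GHz clock are made up for illustration; 144 is the single-die core count mentioned earlier in the thread.

```python
def socket_throughput(cores: int, ppc: float, freq_ghz: float) -> float:
    """Toy model: aggregate work per second = cores x per-core PPC x clock."""
    return cores * ppc * freq_ghz

# Hypothetical numbers, chosen only to illustrate the scaling.
base = socket_throughput(cores=144, ppc=1.00, freq_ghz=2.5)
improved = socket_throughput(cores=144, ppc=1.30, freq_ghz=2.5)

# With cores and clock unchanged, the throughput gain equals the PPC gain.
gain = improved / base - 1
print(f"throughput gain: {gain:.0%}")
```

If the extra PPC costs power, the clock or core count has to drop to stay ISO-power, and the gain shrinks accordingly.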

adroc_thurston said:
Well the point is that there are no e-cores in Xeon anymore. Dead.

If you define e-cores as cores similar in function/perf to Atom cores before Skymont, sure. I define e-cores as cores designed by the Atom team to be significantly more area/power efficient than P-cores. With your definition, sure, E-cores are dead in Xeons, since these new e-cores are so much better that they play a role larger than traditional e-cores.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
Wrong for the many reasons I pointed out.

It's literally the same product lane. tf are you on?

ajsdkflsdjfio said:
You are right about this, but also darkmont and consequently its server versions most definitely improve on vector performance making it more viable than skymont.

It's still a joke.

ajsdkflsdjfio said:
Maybe because the "Atom" cores are going to perform more like P-cores instead?

No?

ajsdkflsdjfio said:
Why devote all this packaging cost to a core that's "just an e-core" like all previous e-cores?

18A is kinda expensive and SRAM scaling there is a joke. It's not "all that packaging cost", it's an attempt to ship an immature node early at a reasonable (reasonable enough) price.

ajsdkflsdjfio said:
a 1t PPC improvement would translate 1:1 with socket-level throughput ISO-power.

I have some major news for you, but IPC isn't free.

ajsdkflsdjfio said:
Skymont is ALSO much more performant for its area/power.

you haven't seen a single SKT implementation on the same node as GRT.

ajsdkflsdjfio said:
With your definition sure E-cores are dead in Xeons, since these new e-cores are so much better that they play a role larger than traditional e-cores.

They're dead because the products are dead.
CWF-AP is dodo and RRF also seems not there.

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
It's literally the same product lane. tf are you on?

Same product family =/= same customers. Same product lane =/= same competitiveness either.

adroc_thurston said:
It's still a joke.

Okay then it's a joke and CWF fails, Intel DC/AI takes further losses putting them that much closer to Intel failing.

adroc_thurston said:
No?

Yes? I just compared gracemont and skymont to equivalent level P-cores ipc. Skymont is clearly closer to contemporary P-cores than gracemont meaning it performs more of the functions of a P-core.

adroc_thurston said:
18A is kinda expensive and SRAM scaling there is a joke. It's not "all that packaging cost", it's an attempt to ship an immature node early at a reasonable (reasonable enough) price.

Right, the ONLY purpose of CWF is to ship 18a out. Not like it's going to be their most competitive offering in the past decade.

adroc_thurston said:
I have some major news for you, but IPC isn't free.

I addressed the issue of IPC not being free by saying that even considering area/power increases, Skymont IPC is still massively improved. IPC is indeed not free, but its 30% IPC gain is not reflected in an equivalent power/area increase.

Relative to their matching P-cores, they are about the same size. OFC the P-cores and subsequently E-cores increased size from ADL->ARL but even so Lion cove had a 9% (if even) IPC gain while skymont had 30%+ gain with a similar increase in die size.

adroc_thurston said:
you haven't seen a single SKT implementation on the same node as GRT.

We've seen GLC vs LNC, and GRT vs SKT. So we can extrapolate enough data there to show that SKT is indeed much more performant, all things considered.

adroc_thurston said:
They're dead because the products are dead.
CWF-AP is dodo and RRF also seems not there.

Right... because you said so.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
Same product family =/= same customers.

Yeah it is. A favela part is a favela part and boy does it have favela customers.

ajsdkflsdjfio said:
Okay then it's a joke and CWF fails, Intel DC/AI takes further losses putting them that much closer to Intel failing.

Not getting it cap'n.

ajsdkflsdjfio said:
Skymont is clearly closer to contemporary P-cores than gracemont meaning it performs more of the functions of a P-core.

I'm sorry to disappoint you but there's more to a big core than SIR2017 1t rate.

ajsdkflsdjfio said:
Right, the ONLY purpose of CWF is to ship 18a out.

YES. Pat spent eons talking about i18a and how CWF is their first product on it!

ajsdkflsdjfio said:
I addressed the issue of IPC not being free by saying that even considering area/power increases, Skymont IPC is still massively improved. IPC is indeed not free, but its 30% IPC gain is not reflected in an equivalent power/area increase.

IT'S A SHRINK. THERE IS NO ISO NODE COMPARISON BETWEEN GRACEMONT AND SKYMONT.

ajsdkflsdjfio said:
We've seen GLC vs LNC, and SKT vs GRT. So we can extrapolate enough data there to show that SKT is indeed much more performant, all things considered.

what does that mean

ajsdkflsdjfio said:
Right... because you said so.

yeah.

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
what does that mean

We can extrapolate node differences by showing GLC->LNC improvements vs GRT->SKT improvements. Both comparisons are Intel 7 vs N3B and have similar size increases (ISO node); SKT gains much more performance over GRT than LNC does over GLC, to the effect of triple the amount.
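The "triple" figure is just the ratio of the two IPC gains I quoted earlier in the thread. A minimal sketch, using those approximate claims rather than any measurement, with variable names picked for illustration:

```python
# IPC gains as claimed earlier in this thread (approximate, not measured here).
lnc_over_glc = 0.09   # Lion Cove vs Golden Cove, "9% (if even)"
skt_over_grt = 0.30   # Skymont vs Gracemont, "30%+"

# How many times larger the E-core generational gain is than the P-core one.
gain_ratio = skt_over_grt / lnc_over_glc
print(f"Skymont's gain is about {gain_ratio:.1f}x Lion Cove's")
```

This says nothing about how much of either gain came from the node change, only that the same node change is present in both comparisons.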

Whatever, if you genuinely think Skymont is as uncompetitive as GRT/Crestmont after all this, then I can't help you.

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
We can extrapolate node differences by showing GLC->LNC improvements vs GRT->SKT improvements.

you can, in fact, NOT.
They're not even the same foundry ffs.

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
you can, in fact, NOT.
They're not even the same foundry ffs.

That's not what I'm talking about. You say that there is no ISO node comparison between gracemont and skymont which is true.

I counter by saying we can estimate ISO node comparisons by comparing Golden cove/Lion cove to gracemont/skymont. Both comparisons have the same node shrink from Intel 7 to N3B, both comparisons have similar size increases(ISO node) with e-cores having around 30% area of their respective P-cores. With these things being relatively equal, Lion cove gains much less performance than Skymont does with the same node shrink and core size increase (ISO node).

adroc_thurston · Dec 23, 2024

ajsdkflsdjfio said:
I counter by saying we can estimate ISO node comparisons by comparing Golden cove/Lion cove to gracemont/skymont

you, in fact, can not. No one knows any actual perf/power/area/Cac/whatever difference between i7 and N3b. Just a futile effort.
