Question Intel Mont thread


DavidC1

Golden Member
Dec 29, 2023
1,435
2,333
96
By the way, game fps improving with one Lion Cove and all Skymonts enabled? I think it's coz of reduced number of ring stops.
That wasn't the case previously though. Now it's close enough.
They didn't do that. Either they know something we don't, or they just made another poor market decision.
Or they screwed up and/or did not meet their own expectations, which I think is extremely reasonable when you are dealing with a device with tens of billions of transistors. Literally there are tens of thousands of paths that could have gone wrong. You can't distill the problems down to a simple explanation on a project as complex as a CPU. Why was Pentium 4 bad? Why was Bulldozer bad?

After all, Arrowlake was supposed to come before Lunarlake, not the other way around.

I think the original plan was Arrowlake instead of 14900K. That would have changed the competitive landscape quite a bit no?
 

DavidC1

Golden Member
Dec 29, 2023
1,435
2,333
96
It is not so simple. Skymont has 4x128b symmetrical SIMD units, Zen2 has asymmetrical units, so for inner loops of things like matrix multiply they are equally matched.
Zen 2 has 2x FPAdd and 2x FPMul, while Skymont has 4x 128-bit FMA. Zen 2 also has 2x 256-bit Load and 1x 256-bit Store while Skymont has 3x128-bit Loads and 4x128-bit Stores.

Intel said that every time they doubled FP capability, doubling L/S was necessary. Zen 2 has 2x the max Flop capability and 33% more Load capability. In certain scenarios yes it'll be that much faster.
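As a rough check of the load/store comparison, here is a minimal Python sketch that multiplies out only the port counts and widths quoted in the post above. The table layout and the helper names (`bits_per_cycle`, `zen2_vs_skymont`) are made up for illustration, and peak width per cycle is of course not the same thing as sustained bandwidth.

```python
# Peak load/store width per cycle, using only the figures quoted in the post.
# (count, width_in_bits) per port type; names here are illustrative only.
PORTS = {
    "Zen 2":   {"load": (2, 256), "store": (1, 256)},
    "Skymont": {"load": (3, 128), "store": (4, 128)},
}

def bits_per_cycle(core, kind):
    """Peak bits moved per cycle for one port type on one core."""
    count, width = PORTS[core][kind]
    return count * width

def zen2_vs_skymont(kind):
    """Ratio of Zen 2 peak width to Skymont peak width for one port type."""
    return bits_per_cycle("Zen 2", kind) / bits_per_cycle("Skymont", kind)

for kind in ("load", "store"):
    print(kind,
          bits_per_cycle("Zen 2", kind),
          bits_per_cycle("Skymont", kind),
          round(zen2_vs_skymont(kind), 2))
```

On these figures the load side works out to 512 vs 384 bits per cycle, which is where the "33% more" comes from, while on the store side Skymont's 4x128-bit is twice Zen 2's 1x256-bit.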

Boosting niche scenarios is much easier than having a solid architecture, which Skymont has. FPUs have roots in accelerators, because before the 486, x86 CPUs had the FPU as a separate add-on coprocessor. FP performance can nearly double by simply doubling the number of units, as with Skymont when combined with uarch improvements. In contrast, integer has no straight path to doubling performance.
 

cannedlake240

Senior member
Jul 4, 2024
247
138
76
I think the worst result was getting beat by Zen 2 with HT, of all things.

ZEN TWO!

SMH.

One skm 4c cluster gets less L3 bandwidth than a single Zen 2 core. This might be a disadvantage for Skymont. Plus, an entire Zen 2 CCX is larger than one skm cluster. Any attempt at a unified Atom core likely won't be arranged in 4c clusters in "E core" fashion, since area efficiency will be less of a concern.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,077
2,525
136
The only reliable-ish info we have is that on Arrow Lake, the Skymont cores take up about 17.8 billion transistors, giving about 1 billion transistors per core and its share of the L2. No idea if this includes L3 slices, but from the wording, probably not.

The entire Zen2 CCD, 16MB L3, 2 CCXs of 4 cores each, is 3.8 billion transistors. Zen 2 is less than half the size of Skymont on a per-core basis.
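The per-core arithmetic behind that comparison can be sketched in a few lines of Python. It takes the post's premise at face value, namely that all 17.8 billion transistors belong to the 16 Skymont cores and their L2, which is a premise disputed elsewhere in this thread; the helper name `transistors_per_core` is illustrative only.

```python
def transistors_per_core(total_transistors, cores):
    """Naive per-core figure: total count divided evenly across cores."""
    return total_transistors / cores

# Premise from the post (disputed in the thread): 17.8B covers 16 Skymont cores + L2.
skymont_per_core = transistors_per_core(17.8e9, 16)

# Zen 2 CCD figure from the post: 3.8B for 8 cores, including the 16MB L3.
zen2_per_core = transistors_per_core(3.8e9, 8)

zen2_to_skymont = zen2_per_core / skymont_per_core

print(f"Skymont: {skymont_per_core / 1e9:.2f}B per core")
print(f"Zen 2:   {zen2_per_core / 1e9:.3f}B per core")
print(f"Zen 2 / Skymont: {zen2_to_skymont:.2f}")
```

Under that premise Zen 2 comes out below half of Skymont per core, even with its L3 included. If the 17.8 billion figure actually covers more than the E-cores, the Skymont number shrinks accordingly.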

It is conceivable that AMD could produce a processor with 32 Zen 2 cores and 8 Zen 5 cores that are instruction restricted to Zen 2's capabilities in the same space as Arrow Lake and run rings around it in MT tasks with 48 active threads. This assumes that there are no memory bottlenecks.
 

GTracing

Senior member
Aug 6, 2021
440
1,033
106
The only reliable-ish info we have is that on Arrow Lake, the Skymont cores take up about 17.8 billion transistors, giving about 1 billion transistors per core and its share of the L2. No idea if this includes L3 slices, but from the wording, probably not.

The entire Zen2 CCD, 16MB L3, 2 CCXs of 4 cores each, is 3.8 billion transistors. Zen 2 is less than half the size of Skymont on a per-core basis.

It is conceivable that AMD could produce a processor with 32 Zen 2 cores and 8 Zen 5 cores that are instruction restricted to Zen 2's capabilities in the same space as Arrow Lake and run rings around it in MT tasks with 48 active threads. This assumes that there are no memory bottlenecks.
I don't know where you heard Arrow Lake's e-cores use 17.8 billion transistors. Intel via TechPowerUp makes it sound like all the dies together add up to 17.8 billion. Comparing that number and the die size to Apple's M3, I believe that the compute die is 17.8 billion transistors.

Intel says that the die-area of Arrow Lake-S (8P+16E) is 243 mm², and its total transistor count is 17.8 billion.

Just dividing transistors by core area like you did gives 95 million transistors for Skymont and 144 million for Zen2. But in my opinion estimates like that are worthless. A few reasons:
  • Transistor counts are estimates and the number can vary depending on how you're counting.
  • Some structures on the die are more transistor dense than others. Cache in particular is denser than logic. To further complicate it, new nodes improve logic density more than cache density.
  • A core can have a more dense or a less dense layout, like Zen5 vs Zen5c.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
Now if you think Intel thinks that AVX512 is not useful in the consumer market, then why are they rolling out AVX10? Its only purpose is to bring AVX512 features to E cores, as AVX10 doesn't require 512b execution units. With AVX10/AVX512 the compiler will spill less often, due to the bigger architectural register pool, even for scalar floating point operations. Handling corner cases is easier thanks to masking etc.
Why do you think AVX10 is meant for consumers? I doubt Intel is so worried about bringing AVX features to consumers; more likely they are bringing AVX10 so that their E-core datacenter products have even more of an edge, since they are going to be their only competitive datacenter products going forward. Intel being able to enable AVX-512 on P-cores and E-cores for consumers sounds more like a trickle-down effect than their main intention.

It might be true that wider and more parallel computing is going to become more and more popular in CPUs in the future. But as you say, AVX-512 has been completely ignored in the consumer market and a side effect of this is that 99% of consumer applications do not take advantage of AVX-512 performance benefits, or even rely heavily on SIMD computing at all. Going forward AVX512/10 might be more widely adopted for the consumer but I just don't see that being the case currently, meaning E-cores are perfectly fine as they are for consumers.



Either way, I still stand by the idea that AVX-512 is basically useless for 99% of consumers. Zen5 didn't bring anything revolutionary with their full AVX-512 version compared to Zen4 double 256-bit, even with generational architectural changes it performs more like Zen4+. So at the VERY MOST, even if Zen5 improved 0% in all other aspects, AVX-512 implementation would only have made a 5-10% difference on average. With this extremely generous figure, it might be said that AVX-512 is in fact useful for the average consumer, but it still wouldn't be enough to completely invalidate processors without it.
AVX512 is niche for consumers, but so are 16-core CPUs, it's just a Cinebench benchmarking contest about "which CPU has more raw power".
I wouldn't think so. Most modern games utilize AT LEAST 8 cores, and it's extremely beneficial to have more than that so that you can have multiple other programs running in the background without significantly affecting your game. Even when just doing general computer work without advanced productivity applications, many people have multiple programs open and multiple Chrome windows with dozens of tabs per window open at the same time. I've seen cases where, when they press alt-tab, there are like 10-20 windows open to switch between. Although technically you can just be more organized, I think having 12-16 cores so you don't slow down when heavily multitasking is very nice to have.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
But they are dead (no, seriously.). At least until Unified Core, lol.
It's extra funny given how DCAI realigned itself politically.
If Clearwater Forest wasn't Intel's best bet, why are they heralding it as such and devoting so many resources to launching it in 2025 with Diamond Rapids nowhere to be seen? Also why do you think Unified Core is their current plan? If Atom is so shit, why create an entire architecture surrounding it?
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,360
7,534
96
If clearwater forest wasn't Intel's best bet why are they heralding it as such and devoting so much resources to launching it in 2025
2025?
with diamond rapids nowhere to be seen
That's a Diamond Rapids issue.
Also why do you think Unified Core is their current plan?
Because it's a good idea. It worked for AMD. Why won't it work for them?
If atom is so shit why create an entire architecture surrounding it?
It's not shit, just that Atom server products are not competitive enough to get hyperscaler traction. Ergo they're dead.
Like SRF-SP was made for Meta and Meta just bought Bergamo instead. gg no re
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
Because it's a good idea. It worked for AMD. Why won't it work for them?
No I'm not asking why Intel is shifting to a singular core architecture, I'm asking why they are shifting to a singular core architecture based on Atom and not P-core or even Royal Core?
It's not shit, just that Atom server products are not competitive enough to get hyperscaler traction. Ergo they're dead.
Like SRF-SP was made for Meta and Meta just bought Bergamo instead. gg no re
SRF uses Crestmont, which is not at all comparable to Skymont/Darkmont. Also in general it was a super late product. Crestmont is only an incremental improvement over Gracemont, which was released in what... 2021? 2021 e-cores in 2024 are likely not going to be a massive success, especially when you consider that the performance of Crestmont/Gracemont was a lot weaker compared to their P-core counterparts. Skymont/Darkmont is a different story. You have a design based on 2024 e-cores which are much more competitive both compared to their contemporary P-cores and compared to competing Zen cores of the same release cadence.
That's a Diamond Rapids issue.
I won't argue that Diamond Rapids doesn't have issues, but for the past couple years Intel has been talking about clearwater forest and not diamond rapids, before any major engineering issues might've taken place. I think they simply realized their e-core architecture was going to be their main attraction going forward.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,360
7,534
96
I'm asking why they are shifting to a singular core architecture based on Atom and not P-core or even Royal Core?
Because Atom guys aren't washed and output competitive PPA, and Royal Core was an absolute mess and that team is poof anyway.
SRF uses Crestmont which is not at all comparable to Skymont/Darkmont.
ughhh. Well it's a throughput part first and foremost.
Also in general it was a super late product.
Not really, went from H2'23 to H1'24. Fine.
2021 e-cores in 2024 is likely not going to be a massive success, especially when you consider that the performance of crestmont/gracemont was a lot weaker compared to their P-core counterparts
It. Is. a. throughput product. Made for favelas.
144 good enough things on a single die were supposed to sell something but they did, in fact, not.
You have a design based on 2024 e-cores which are much more competitive both compared to their contemporary P-cores and compared to competing Zen cores of the same release cadence.
If you say so.
I think they simply realized their e-core architecture was going to be their main attraction going forward.
Which is why Atom xeons are dead?
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
Because Atom guys aren't washed and output competitive PPA, and Royal Core was an absolute mess and that team is poof anyway.
Which is exactly what I'm trying to say?
ughhh. Well it's a throughput part first and foremost.
It. Is. a. throughput product. Made for favelas.
144 good enough things on a single die were supposed to sell something but they did, in fact, not.
So why are you making the comparison between SRF and CWF? Skymont is more than "good enough things" and, in my opinion, useful for more than just throughput: it has good IPC in many workloads, the aforementioned AVX-10 is being pushed out for improved SIMD performance, and it is power efficient enough that its performance isn't as kneecapped in server scenarios, making it even more competitive versus other cores.

Even in a situation where CWF is only useful for throughput, it would be a vastly superior product to SRF and therefore much more attractive, even if just in a smaller market.
If you say so
Gracemont was Skylake (2015) IPC in 2021, Skymont is Zen4 (2022) IPC in 2024.

Which is why Atom xeons are dead?
My statement that "their e-core architecture was going to be their main attraction going forward" includes e-cores in Xeon, not just their unified architecture.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,360
7,534
96
So why are you making the comparison between SRF and CWF?
same swimlane. same target customer.
the aforementioned AVX-10 being pushed out for improved SIMD performance
years and years and years away.
Even in a situation where CWF is only useful for throughput, it would be a vastly superior product to SRF and therefore much more attractive even if just in smaller market.
You forgot the part where CWF is much, much more expensive, aka it loses half the reason Atom Xeons even exist (they're cheapo).
Gracemont was Skylake (2015) IPC in 2021, Skymont is Zen4 (2022) IPC in 2024.
These parts are not defined by 1t PPC, they're defined by socket-level throughput ISO power.
My statement that "their e-core architecture was going to be their main attraction going forward" includes e-cores in Xeon, not just their unified architecture.
Well the point is that there are no e-cores in Xeon anymore. Dead.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
same swimlane. same target customer.
Wrong for the many reasons I pointed out.
years and years and years away.
You are right about this, but also darkmont and consequently its server versions most definitely improve on vector performance making it more viable than skymont.
You forgot the part where CWF is much, much more expensive, aka it loses half the reason Atom Xeons even exist (they're cheapo).
Maybe because the "Atom" cores are going to perform more like P-cores instead? Why devote all this packaging cost for a core that's "just an e-core" like all previous e-cores.
These parts are not defined by 1t PPC, they're defined by socket-level throughput ISO power.
Not solely defined by 1t PPC, but if all other things are equal, a 1t PPC improvement would translate 1:1 into socket-level throughput ISO-power.

All other things aren't in fact equal, but that's a more complicated discussion, which ultimately results in the same conclusion. Skymont is ALSO much more performant for its area/power.
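To make the "all other things equal" claim concrete, here is a toy Python model in which socket throughput is simply cores x PPC x frequency. Everything in it is an assumption for illustration: the 144-core count is borrowed from the SRF discussion in this thread, the 2.0 GHz clock and the 1.3x PPC/power factors are placeholders, and the cubic power-vs-frequency rule is a crude textbook approximation, not a measured property of any of these cores.

```python
def socket_throughput(cores, ppc, freq_ghz):
    """Toy model: throughput scales linearly with cores, PPC and frequency."""
    return cores * ppc * freq_ghz

def iso_power_freq(freq_ghz, power_scale, exponent=3.0):
    """Frequency that keeps socket power constant if each core now burns
    power_scale times more at the same clock.
    ASSUMPTION: power ~ freq**exponent (crude approximation)."""
    return freq_ghz * (1.0 / power_scale) ** (1.0 / exponent)

CORES, FREQ = 144, 2.0          # illustrative values only
base = socket_throughput(CORES, 1.0, FREQ)

# Case 1: +30% PPC with nothing else changing -> throughput follows 1:1.
gain_free = socket_throughput(CORES, 1.3, FREQ) / base

# Case 2: the same +30% PPC costs 30% more per-core power (hypothetical),
# so frequency must drop to stay inside the same socket power budget.
gain_paid = socket_throughput(CORES, 1.3, iso_power_freq(FREQ, 1.3)) / base

print(f"PPC uplift for free:     {gain_free:.2f}x")
print(f"PPC uplift paid in power: {gain_paid:.2f}x")
```

The first case is the 1:1 translation described above; the second shows how the gain shrinks once the uplift has a power cost, which is why the size of that cost is the part actually in dispute.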
Well the point is that there are no e-cores in Xeon anymore. Dead.
If you define e-cores as cores similar in function/perf to Atom cores before Skymont, sure. I define e-cores as cores designed by the Atom team to be significantly more area/power efficient than P-cores. With your definition, sure, E-cores are dead in Xeons, since these new e-cores are so much better that they play a role larger than traditional e-cores.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,360
7,534
96
Wrong for the many reasons I pointed out.
It's literally the same product lane. tf are you on?
You are right about this, but also darkmont and consequently its server versions most definitely improve on vector performance making it more viable than skymont.
It's still a joke.
Maybe because the "Atom" cores are going to perform more like P-cores instead?
No?
Why devote all this packaging cost for a core that's "just an e-core" like all previous e-cores.
18A is kinda expensive and SRAM scaling there is a joke. It's not "all that packaging cost", it's an attempt to ship an immature node early at a reasonable (reasonable enough) price.
a 1t PPC improvement would translate 1:1 with socket-level throughput ISO-power.
I have some major news for you, but IPC isn't free.
Skymont is ALSO much more performant for its area/power.
you haven't seen a single SKT implementation on the same node as GRT.
With your definition sure E-cores are dead in Xeons, since these new e-cores are so much better that they play a role larger than traditional e-cores.
They're dead because the products are dead.
CWF-AF is dodo and RRF also seems not there.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
It's literally the same product lane. tf are you on?
Same product family =/= same customers. Same product lane =/= same competitiveness either.
It's still a joke.
Okay then it's a joke and CWF fails, Intel DC/AI takes further losses putting them that much closer to Intel failing.
Yes? I just compared gracemont and skymont to equivalent level P-cores ipc. Skymont is clearly closer to contemporary P-cores than gracemont meaning it performs more of the functions of a P-core.
18A is kinda expensive and SRAM scaling there is a joke. It's not "all that packaging cost", it's an attempt to ship an immature node early at a reasonable (reasonable enough) price.
Right, the ONLY purpose of CWF is to ship 18a out. Not like it's going to be their most competitive offering in the past decade.
I have some major news for you, but IPC isn't free.
I addressed the issue of IPC not being free by saying that even considering area/power increases, Skymont IPC is still massively improved. IPC is indeed not free, but its 30% IPC gain is not reflected in an equivalent power/area increase.

Relative to their matching P-cores, they are about the same size. OFC the P-cores and subsequently E-cores increased size from ADL->ARL but even so Lion cove had a 9% (if even) IPC gain while skymont had 30%+ gain with a similar increase in die size.
you haven't seen a single SKT implementation on the same node as GRT.
We've seen GLC vs LNC, and GRT vs SKT. So we can extrapolate enough data there to show that SKT is indeed much more performant, all things considered.
They're dead because the products are dead.
CWF-AF is dodo and RRF also seems not there.
Right... because you said so.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,360
7,534
96
Same product family =/= same customers.
Yeah it is. A favela part is a favela part and boy does it have favela customers.
Okay then it's a joke and CWF fails, Intel DC/AI takes further losses putting them that much closer to Intel failing.
Not getting it cap'n.
Skymont is clearly closer to contemporary P-cores than gracemont meaning it performs more of the functions of a P-core.
I'm sorry to disappoint you but there's more to a big core than SIR2017 1t rate.
Right, the ONLY purpose of CWF is to ship 18a out.
YES. Pat spent eons talking about i18a and how CWF is their first product on it!
I addressed the issue of IPC not being free by saying that even considering area/power increases, Skymont IPC is still massively improved. IPC is indeed not free, but its 30% IPC gain is not reflected in an equivalent power/area increase.
IT'S A SHRINK. THERE IS NO ISO NODE COMPARISON BETWEEN GRACEMONT AND SKYMONT.
We've seen GLC vs LNC, and SKT vs GRT. So we can extrapolate enough data there to show that SKT is indeed much more performant, all things considered.
what does that mean
Right... because you said so.
yeah.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
what does that mean
We can extrapolate node differences by comparing the GLC->LNC improvement with the GRT->SKT improvement. Both comparisons are Intel 7 vs N3B and have similar size increases (ISO node), yet SKT gains much more performance over GRT than LNC does over GLC, to the effect of triple the amount.

Whatever, if you genuinely think Skymont is as uncompetitive as GRT/Crestmont after all this, then I can't help you.
 

ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
you can, in fact, NOT.
They're not even the same foundry ffs.
That's not what I'm talking about. You say that there is no ISO node comparison between gracemont and skymont which is true.

I counter by saying we can estimate ISO node comparisons by comparing Golden cove/Lion cove to gracemont/skymont. Both comparisons have the same node shrink from Intel 7 to N3B, both comparisons have similar size increases(ISO node) with e-cores having around 30% area of their respective P-cores. With these things being relatively equal, Lion cove gains much less performance than Skymont does with the same node shrink and core size increase (ISO node).
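The "triple" figure mentioned earlier follows from the IPC numbers already quoted in this thread (about 9% for GLC->LNC, about 30% for GRT->SKT). A minimal Python sketch, treating both as rough forum figures rather than measurements, with variable names chosen only for illustration:

```python
# IPC gains quoted earlier in the thread (approximate forum figures).
LNC_OVER_GLC = 0.09   # Lion Cove vs Golden Cove, "9% (if even)"
SKT_OVER_GRT = 0.30   # Skymont vs Gracemont, "30%+"

def gain_ratio(e_core_gain, p_core_gain):
    """How many times larger the E-core generational gain is than the P-core one."""
    return e_core_gain / p_core_gain

ratio = gain_ratio(SKT_OVER_GRT, LNC_OVER_GLC)
print(f"SKT gain is about {ratio:.1f}x the LNC gain")
```

This only compares the two generational uplifts. It says nothing by itself about area or power, which the argument handles by assuming both core pairs grew by a similar amount across the same Intel 7 to N3B move.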
 