Question Intel Mont thread

Page 11

gdansk

Diamond Member
Feb 8, 2011
I think it is 8+16 (HX), 6+8 (H) and 2+8 (U).
They could cut an HX to 4+16 but... why?

It's sad that, because Arrow Lake is on TSMC, there will be no ADL-N successor. A simple 8-core Skymont chip would have been quite nice, and they could have cut it down to 6 cores instead of 4 like the N100.
 

DavidC1

Golden Member
Dec 29, 2023
LightningZ71 said:
What I would be interested in is a 2 P-core + 24 E-core design with two major changes: double the L2 to 8MB for the Skymont clusters and increase the L2 on the P-cores to 4MB. That would hide L3 latency even more and make the E-cores more load-agnostic.
If they were going to do that, a better option would be changing Skymont to dual-core clusters, which would reduce contention on the L2 cache.
gdansk said:
It's sad that, because Arrow Lake is on TSMC, there will be no ADL-N successor. A simple 8-core Skymont chip would have been quite nice, and they could have cut it down to 6 cores instead of 4 like the N100.
Since Skymont uses quad-core clusters, 6 cores means they would have to disable two cores in one cluster or one core in each cluster.
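The harvesting options can be enumerated by brute force. This is a minimal sketch, assuming a hypothetical 8-core die built from two 4-core Skymont clusters (the die gdansk wished for, not an announced part); `harvest_options` is an illustrative helper, not anything from Intel.

```python
from itertools import product


def harvest_options(clusters: int, cores_per_cluster: int, target: int) -> list[tuple[int, ...]]:
    """Distinct ways to leave exactly `target` cores enabled across identical
    clusters, listed as active cores per cluster (cluster order ignored)."""
    splits = set()
    # Try every per-cluster active count from 0 up to a full cluster.
    for counts in product(range(cores_per_cluster + 1), repeat=clusters):
        if sum(counts) == target:
            # Sort so that (4, 2) and (2, 4) count as the same configuration.
            splits.add(tuple(sorted(counts, reverse=True)))
    return sorted(splits, reverse=True)


# Hypothetical 8-core die made of two 4-core clusters, cut down to 6 cores.
print(harvest_options(clusters=2, cores_per_cluster=4, target=6))
```

This prints `[(4, 2), (3, 3)]`: one full cluster plus one with two cores disabled, or two clusters with one core disabled each.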
 

DrMrLordX

Lifer
Apr 27, 2000
Those are indeed the specs for ARL-H: 6+8.
That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX, which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.
 

DavidC1

Golden Member
Dec 29, 2023
DrMrLordX said:
That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX, which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.
They aren't exactly identical. The -H silicon is more optimized for mobile, with lower leakage and lower power at low frequencies, but at the cost of clocks-per-watt efficiency at the high end.

So HX is a desktop chip with minimal changes to put it into mobile. They could have boosted -H to 8+12 or something, I guess.
 

LightningZ71

Golden Member
Mar 10, 2017
DavidC1 said:
If they were going to do that, a better option would be changing Skymont to dual-core clusters, which would reduce contention on the L2 cache.

Since Skymont uses quad-core clusters, 6 cores means they would have to disable two cores in one cluster or one core in each cluster.
My idea was also to get the ring bus down to 8 compute node stops so it could possibly clock higher...
 

reaperrr3

Junior Member
May 31, 2024
If anything, I believe Intel went a bit overboard on the caches, specifically the "L1.5" cache and the massive L2.

LNC (Lion Cove) effectively has nearly as much SRAM per core as Zen cores with V-Cache...

I wonder if it would've been smarter to go back to Alder Lake's cache config and just scale up from there a little: 1.5MB of L2 per P-core (ideally with slightly lower latency than on ADL/Raptor Lake) and 4MB of L3 per core/cluster. At least in terms of PPA, that might've worked better.
Normalized to the same process, LNC's PPA is atrocious vs. both Zen 4 and Zen 5, and the bonkers cache splurging on the P-cores contributes a lot to that.
 

LightningZ71

Golden Member
Mar 10, 2017
With Arrow Lake in its current state, anything they can do to hide L3 and ring bus latency is important, and the larger L2 is all about that. Since Fmax has dropped compared to Raptor Lake, they can afford a larger L2 while keeping its access latency, in cycles, where it is.
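One way to read this argument: cache latency is budgeted in core cycles, so a lower clock leaves more wall-clock time per cycle for a bigger, slower array. A minimal sketch of that arithmetic; the 16-cycle L2 and the 6.0 GHz vs 5.7 GHz clocks are illustrative assumptions, not measured Raptor Lake or Arrow Lake figures.

```python
def access_time_ns(cycles: int, clock_ghz: float) -> float:
    """Wall-clock time of a cache access that takes `cycles` core clocks."""
    return cycles / clock_ghz


# Illustrative assumptions only: a 16-cycle L2 at two different peak clocks.
L2_CYCLES = 16
at_higher_clock = access_time_ns(L2_CYCLES, 6.0)
at_lower_clock = access_time_ns(L2_CYCLES, 5.7)

# Extra time per access available to a bigger (slower) array while the
# latency stays at the same cycle count on the lower-clocked part.
slack_ps = (at_lower_clock - at_higher_clock) * 1000
print(f"{at_higher_clock:.3f} ns vs {at_lower_clock:.3f} ns, slack {slack_ps:.0f} ps")
```

With these made-up inputs the slack comes to roughly 140 ps per access.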
 

LightningZ71

Golden Member
Mar 10, 2017
Do we think Intel has the tribal knowledge to do a dedicated high-performance Skymont quad instead of a P-core? They could do a 32-E-core chip with 7 of the 8 clusters built as they currently are, and one built with relaxed circuit density for higher clocks and an 8MB L2. I don't expect it to clock insanely high, but a boost to 5.4 GHz seems achievable (we're currently seeing a few examples hitting 5.2 GHz without exotic cooling).

That's a ring bus with 8 compute stops. They could increase the L3 slice size to 4MB to keep total L3 about the same. You might trade away a bit of ST performance, but the MT gain should be considerable.
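A quick check of the slice arithmetic, as a sketch. It assumes (this is not stated in the thread) that today's 36MB L3 is built from 12 slices of 3MB, one per compute ring stop (8 P-cores plus 4 E-core clusters), and that each of the 8 proposed stops would carry one slice; `total_l3_mb` is a hypothetical helper.

```python
def total_l3_mb(ring_stops: int, slice_mb: float) -> float:
    """Total L3 when every compute ring stop carries one L3 slice."""
    return ring_stops * slice_mb


# Assumed current layout: 8 P-core stops + 4 E-core cluster stops, 3MB slices.
current = total_l3_mb(ring_stops=12, slice_mb=3.0)
# Proposed layout: 8 Skymont cluster stops with 4MB slices.
proposed = total_l3_mb(ring_stops=8, slice_mb=4.0)
# Slice size the 8-stop layout would need to match the current total exactly.
matching_slice_mb = current / 8

print(current, proposed, matching_slice_mb)
```

Under those assumptions, 4MB slices land a little below the current total (32MB vs 36MB), and 4.5MB slices would match it exactly.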
 

DavidC1

Golden Member
Dec 29, 2023
LightningZ71 said:
My idea was also to get the ring bus down to 8 compute node stops so it could possibly clock higher...
I think unknown low-level details are also contributing to the problem, not just high-level ones. There's a reason shared caches are usually the LLC, and that's why I suggested a dual-core cluster. At the point of contention, the bandwidth drops to zero.
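A toy model of why fewer cores per shared L2 means less contention: if each core independently sends an L2 request in a given cycle with probability `p`, the chance that two or more requests collide grows quickly with cluster size. This is an illustrative sketch only; it assumes a single-ported L2 and ignores banking, queuing, and real Skymont access patterns, and `p = 0.3` is an arbitrary value.

```python
from math import comb


def collision_probability(cores: int, p: float) -> float:
    """Chance that two or more cores send a request to a shared L2 in the
    same cycle, assuming each core does so independently with probability p."""
    no_requests = (1 - p) ** cores
    one_request = comb(cores, 1) * p * (1 - p) ** (cores - 1)
    return 1 - no_requests - one_request


# p = 0.3 is an arbitrary illustrative request rate, not a Skymont figure.
for cluster_size in (2, 4):
    print(cluster_size, round(collision_probability(cluster_size, p=0.3), 4))
```

With `p = 0.3` the collision probability is 0.09 for a dual-core cluster versus about 0.35 for a quad-core cluster.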

The company's internals are chaotic, so the engineers aren't performing as they should. Apparently Gelsinger also isn't as good as some were expecting; he made a lot of big mistakes.
 

LightningZ71

Golden Member
Mar 10, 2017
DavidC1 said:
I think unknown low-level details are also contributing to the problem, not just high-level ones. There's a reason shared caches are usually the LLC, and that's why I suggested a dual-core cluster. At the point of contention, the bandwidth drops to zero.

The company's internals are chaotic, so the engineers aren't performing as they should. Apparently Gelsinger also isn't as good as some were expecting; he made a lot of big mistakes.
I understand what you're getting at and agree that it's not an unrealistic ask, but I don't particularly like Intel's ring bus with more than 8 compute stops. I know they had 12 in Raptor Lake, but that's proving to be problematic.

Maybe, instead of 8 clusters of what they already have, do 6 clusters like that for 24 E-cores, and then two dual-core Skymont clusters with 4MB of L2 each and relaxed density to reach about 5.4 GHz? That's still 8 stops and still 4MB of L2 per cluster, just with two clusters that are faster but have fewer cores.
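The proposals in this exchange all aim at 8 compute ring stops. A rough tally, as a sketch: it assumes each P-core and each E-core cluster occupies one ring stop, and it uses the per-block L2 sizes given in the posts (4MB per P-core and 8MB per Skymont cluster for the 2P + 24E idea, 8MB for the one fast quad in the 32-core idea, 4MB for every cluster in the 24E + 2 duals idea). `RingBlock` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingBlock:
    """One compute ring stop: a single P-core or an E-core cluster with a shared L2."""
    cores: int
    l2_mb: float


def summarize(blocks: list[RingBlock]) -> tuple[int, int, float]:
    """Return (ring stops, total cores, total L2 in MB), one stop per block."""
    return len(blocks), sum(b.cores for b in blocks), sum(b.l2_mb for b in blocks)


proposals = {
    # 2 P-cores with 4MB L2 each + 6 Skymont quads with 8MB L2 each
    "2P + 24E": [RingBlock(1, 4.0)] * 2 + [RingBlock(4, 8.0)] * 6,
    # 7 stock quads (4MB L2) + 1 relaxed-density quad with 8MB L2
    "32E, one fast quad": [RingBlock(4, 4.0)] * 7 + [RingBlock(4, 8.0)],
    # 6 stock quads + 2 fast dual-core clusters, all with 4MB L2
    "24E + 2 fast duals": [RingBlock(4, 4.0)] * 6 + [RingBlock(2, 4.0)] * 2,
}

for name, blocks in proposals.items():
    stops, cores, l2 = summarize(blocks)
    print(f"{name}: {stops} stops, {cores} cores, {l2:g}MB L2")
```

Under these assumptions all three layouts come to 8 stops, with 26, 32, and 28 cores respectively.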
 

511

Senior member
Jul 12, 2024
And we will have a 288C/288T Darkmont part coming in Q3. It is going to have 5-6% higher performance per clock than Skymont, better power and performance characteristics, and 6-8% higher IPC than Skymont itself, with hybrid bonding and stacked L3 and IMC.
 