Question Intel Mont thread

igor_kavinski · Feb 20, 2023

https://store.acer.com/en-us/aspire-3-laptop-a315-510p-3905

First Gracemont laptop available for sale.

Anybody got disposable $500 to buy and test this laptop?

igor_kavinski · Saturday at 6:02 PM

DrMrLordX said:
They would need to tape out an entirely new compute chiplet for that.

They probably already have a 4P+16E for mobile H chiplet. But it will be limited to lower power and lower all core clocks.

gdansk · Saturday at 6:05 PM

I think it is 8+16 (HX), 6+8 (H) and 2+8 (U).
They could cut an HX to 4+16 but... why?

It's sad that because Arrow Lake is on TSMC there will be no ADL-N successor. A simple 8 core Skymont would have been quite nice and they could cut it to 6 cores instead of 4 like N100.

igor_kavinski · Saturday at 6:07 PM

gdansk said:
They could cut an HX to 4+16 but... why?

Lower power, especially if they wanted to put it in a 14 inch gaming laptop.

DavidC1 · Saturday at 10:32 PM

LightningZ71 said:
What I would be interested in is a 2 P core + 24 e core design with two major changes: double the L2 to 8MB for the Skymont clusters and increase the L2 on the P cores to 4MB. That would hide the L3 cache even more and make the E cores more load agnostic.

If they were going to do that, a better option would be changing Skymont to a dual core cluster, which would reduce contention on the L2 cache.

gdansk said:
It's sad that because Arrow Lake is on TSMC there will be no ADL-N successor. A simple 8 core Skymont would have been quite nice and they could cut it to 6 cores instead of 4 like N100.

Since it's a quad core cluster 6 cores means they have to disable two cores from one cluster or 1 from each cluster.
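To spell out the options, here is a rough sketch that enumerates how an 8-core part built from two quad-core clusters could be cut to 6 cores. The function name and defaults are made up for illustration, and it assumes any core in a cluster can be fused off independently.

```python
from itertools import product

def cluster_cuts(clusters=2, cores_per_cluster=4, target=6):
    """List the distinct per-cluster active-core counts that sum to `target`.

    Hypothetical helper: assumes cores can be disabled individually and
    that clusters are interchangeable, so (4, 2) and (2, 4) count once.
    """
    layouts = set()
    for active in product(range(cores_per_cluster + 1), repeat=clusters):
        if sum(active) == target:
            layouts.add(tuple(sorted(active, reverse=True)))
    return sorted(layouts, reverse=True)

print(cluster_cuts())
```

The two layouts it finds are 4+2 (two cores disabled in one cluster) and 3+3 (one disabled in each), which are the two cases described above.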

adroc_thurston · Saturday at 10:37 PM

LightningZ71 said:
double the L2 to 8MB for the Skymont clusters and increase the L2 on the P cores to 4MB. That would hide the L3 cache even more and make the E cores more load agnostic.

Well you see, SRAM isn't free.

DrMrLordX · Saturday at 11:24 PM

gdansk said:
I think it is 8+16 (HX), 6+8 (H) and 2+8 (U).

Arrow Lake-U is probably going to be Meteor Lake on Intel 3. Do you think they'll really restrict -H to 6+8 again?

gdansk said:
They could cut an HX to 4+16 but... why?

Or -H, assuming they don't go 6+8 again. But I'm not sure 4+16 would make a significantly better mobile chip than 6+8...

DavidC1 · Saturday at 11:27 PM

DrMrLordX said:
Arrow Lake-U is probably going to be Meteor Lake on Intel 3. Do you think they'll really restrict -H to 6+8 again?

Those are indeed the specs for ARL-H: 6+8.

DrMrLordX · Saturday at 11:31 PM

DavidC1 said:
Those are indeed the specs for ARL-H: 6+8.

That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.

DavidC1 · Saturday at 11:42 PM

DrMrLordX said:
That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.

They aren't exactly identical. The -H silicon is more optimized for mobile with lower leakage and lower power at lower frequencies, but coming at a sacrifice of clock/w efficiency at the high end.

So the HX is a desktop chip with minimum changes to put it into mobile. They could have boosted -H to 8+12 or something I guess.

LightningZ71 · Sunday at 2:43 AM

adroc_thurston said:
Well you see, SRAM isn't free.

It's cheaper than the 6 p-cores that I axed... Eyeballing it, 2+24 with the proposed caches would be ROUGHLY a similar size to existing...

adroc_thurston · Sunday at 2:43 AM

LightningZ71 said:
It's cheaper than the 6 p-cores that I axed...

Oh hell no. If there's anything N3b is good for, it's logic-dense cores.

LightningZ71 · Sunday at 2:46 AM

DavidC1 said:
If they were going to do that, a better option would be changing Skymont to a dual core cluster, which would reduce contention on the L2 cache.

Since it's a quad core cluster 6 cores means they have to disable two cores from one cluster or 1 from each cluster.

My idea was to also get the ring bus down to 8 compute node stops to allow it to possibly clock higher...

LightningZ71 · Sunday at 2:48 AM

adroc_thurston said:
Oh hell no. If there's anything N3b is good for, it's logic-dense cores.

You'd think a homogeneous, repeating structure would be easier to imprint than complex, irregular HP logic...

adroc_thurston · Sunday at 2:56 AM

LightningZ71 said:
You'd think a homogeneous, repeating structure would be easier to imprint than complex, irregular HP logic...

The actual bitcell just isn't getting any smaller anymore, so complex HP logic wins the day. See M4, just in case. 3-2 FF, N3e, all that jazz.

LightningZ71 · Sunday at 3:00 AM

We're in agreement there. SRAM scaling for N3 is minimal. I didn't believe that the SRAM would be free.

reaperrr3 · Sunday at 9:24 AM

If anything, I believe Intel went a bit overboard on the caches, specifically that "L1.5" and massive L2.

LNC effectively has nearly as much SRAM per core as Zen cores with V-Cache...

I wonder if it would've been smarter to go back to Alder's cache config and just scale up from there a little, 1.5MB L2 per P-Core (ideally with slightly lower latency than on ADL/Raptor) and 4MB L3 per core/cluster. At least in terms of PPA, that might've worked better.
Normalized to the same process, LNC's PPA is atrocious vs. both Zen4 and Zen5, and the bonkers cache splurging on the P-Cores contributes a lot to that.

LightningZ71 · Sunday at 10:19 AM

With Arrow Lake in its current state, anything that they can do to hide L3 and ring bus latency is important. The larger L2 is all about that. With them dropping Fmax compared to Raptor, they can afford to use a larger L2 and retain their current access latencies to it.

LightningZ71 · Sunday at 4:19 PM

Do we think that Intel has the tribal knowledge to do a specific high performance Skymont quad instead of a P core? They could do a 32 E-core chip with 7 of 8 clusters done like they currently are, and one that's done with relaxed circuit density for higher clocks and an 8MB L2. I don't expect it to clock insanely high, but a boost to 5.4 GHz might be achievable (we're currently seeing a few examples hitting 5.2 GHz without exotic cooling).

That's a ringbus with 8 compute stops. They could increase L3 slice size to 4MB to keep total L3 the same. You might trade a bit of ST, but the MT should be considerable.

DavidC1 · Sunday at 6:24 PM

LightningZ71 said:
My idea was to also get the ring bus down to 8 compute node stops to allow it to possibly clock higher...

I think it's also the unknown low level details that are contributing to the problem, not just high level ones. There's a reason shared caches are usually LLC, that's why I suggested a dual core cluster. At the point of contention, the bandwidth drops to zero.

The internals of the company are chaotic, so the engineers aren't performing as they should. Apparently Gelsinger also isn't as good as some were expecting. He made a lot of big mistakes.

LightningZ71 · Sunday at 6:47 PM

DavidC1 said:
I think it's also the unknown low level details that are contributing to the problem, not just high level ones. There's a reason shared caches are usually LLC, that's why I suggested a dual core cluster. At the point of contention, the bandwidth drops to zero.

The internals of the company are chaotic, so the engineers aren't performing as they should. Apparently Gelsinger also isn't as good as some were expecting. He made a lot of big mistakes.

While I understand what you're getting at, and agree that it's not an unrealistic ask, I don't particularly like Intel's ring bus with more than 8 compute stops. I know that they had 12 in Raptor, but that's proving to be problematic.

Maybe, instead of 8 clusters of what they already have, do 6 clusters like that for 24 E-cores and then do two dual-core Skymont clusters with 4MB of L2 and relaxed density to reach about 5.4 GHz? Still 8 stops, still 4MB of L2 per cluster, just two are faster but have fewer cores.
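For anyone wanting to check the stop counting, a minimal sketch follows. It assumes each P-core is one compute stop and each E-core cluster is one compute stop regardless of how many cores it holds, which is how the numbers in this thread work out (8+16 giving 12 stops). The helper names are hypothetical and nothing here reflects actual Intel floorplans.

```python
def ring_stops(p_cores, e_clusters):
    """Count compute stops: one per P-core plus one per E-core cluster.

    Assumption for illustration only: a cluster is a single stop no
    matter how many cores are in it.
    """
    return p_cores + len(e_clusters)

def summarize(p_cores, e_clusters, l2_per_cluster_mb=4):
    """Summarize a layout; `e_clusters` lists the core count of each cluster."""
    return {
        "stops": ring_stops(p_cores, e_clusters),
        "e_cores": sum(e_clusters),
        "cluster_l2_mb": len(e_clusters) * l2_per_cluster_mb,
    }

# 8+16 as discussed above: eight P-cores plus four quad clusters.
print(summarize(8, [4] * 4))

# The proposal in this post: six quad clusters plus two dual clusters.
print(summarize(0, [4] * 6 + [2] * 2))
```

Under those assumptions the second layout comes out at 8 stops, 28 E-cores and 32MB of cluster L2 in total, while the earlier 2+24 idea (`summarize(2, [4] * 6)`) also lands on 8 stops.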

511 · Wednesday at 1:02 PM

How the hell did I only find this thread now 😂😂 Thanks to whoever linked this thread.

511 · Wednesday at 1:33 PM

And we will be having a 288C/288T Darkmont coming in Q3. It is going to have 5-6% higher performance per clock than Skymont, better power and performance characteristics, and 6-8% higher IPC than Skymont itself, with hybrid bonding, stacked L3 and IMC.
