First Gracemont laptop available for sale.
Anybody got disposable $500 to buy and test this laptop?
> They would need to tape out an entirely new compute chiplet for that.
They probably already have a 4P+16E chiplet for mobile H, but it would be limited to lower power and lower all-core clocks.
> They could cut an HX to 4+16 but... why?
Lower power, especially if they wanted to put it in a 14-inch gaming laptop.
> What I would be interested in is a 2 P core + 24 e core design with two major changes: double the L2 to 8MB for the Skymont clusters and increase the L2 on the P cores to 4MB. That would hide the L3 cache even more and make the E cores more load agnostic.
If they were going to do that, a better option would be changing Skymont to a dual core cluster, which would reduce contention on the L2 cache.
> It's sad that because Arrow Lake is on TSMC there will be no ADL-N successor. A simple 8 core Skymont would have been quite nice and they could cut it to 6 cores instead of 4 like N100.
Since it's a quad-core cluster, 6 cores means they have to disable two cores from one cluster or one from each cluster.
> double the L2 to 8MB for the Skymont clusters and increase the L2 on the P cores to 4MB. That would hide the L3 cache even more and make the E cores more load agnostic.
Well you see, SRAM isn't free.
I think it is 8+16 (HX), 6+8 (H) and 2+8 (U).
> They could cut an HX to 4+16 but... why?
Or -H, assuming they don't go 6+8 again. But I'm not sure 4+16 would make a significantly better mobile chip than 6+8...
> Arrow Lake-U is probably going to be Meteor Lake on Intel 3. Do you think they'll really restrict -H to 6+8 again?
That indeed is the specs for ARL-H. 6+8.
> That indeed is the specs for ARL-H. 6+8.
That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.
> That's kind of weird. You'd think the 8+16 compute tile would work well enough if they just scaled down the power (unlike Alder Lake and Raptor Lake-HX which were power hogs no matter what you did). Unless taping out the smaller tile just saves them more money in the long run by using less silicon area.
They aren't exactly identical. The -H silicon is more optimized for mobile, with lower leakage and lower power at lower frequencies, at the cost of clock/W efficiency at the high end.
> Well you see, SRAM isn't free.
It's cheaper than the 6 p-cores that I axed... Eyeballing it, 2+24 with the proposed caches would be ROUGHLY a similar size to existing...
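For a rough sense of the SRAM being argued about, here is a minimal sketch that only adds up L2 capacity for the two layouts. The 4MB-per-cluster baseline is inferred from the "double the L2 to 8MB" wording in the thread; the 3MB-per-P-core baseline is an assumption not stated here. The function name `total_l2_mb` is made up for illustration, and capacity says nothing about die area, so this does not settle who is right.

```python
def total_l2_mb(p_cores, e_cores, p_l2_mb, cluster_l2_mb, cluster_size=4):
    """Sum L2 capacity in MB: private L2 per P core plus shared L2 per E-core cluster."""
    if e_cores % cluster_size != 0:
        raise ValueError("E cores must fill whole clusters")
    clusters = e_cores // cluster_size
    return p_cores * p_l2_mb + clusters * cluster_l2_mb

# Assumed baseline: 8+16 with 3MB per P core (assumption) and 4MB per cluster (inferred).
existing = total_l2_mb(8, 16, p_l2_mb=3, cluster_l2_mb=4)

# Proposed in the thread: 2+24 with 4MB per P core and 8MB per cluster.
proposed = total_l2_mb(2, 24, p_l2_mb=4, cluster_l2_mb=8)

print(f"existing 8+16: {existing} MB of L2")
print(f"proposed 2+24: {proposed} MB of L2")
```

Under those assumptions the proposal carries more L2 in total, so whether it ends up a similar size depends on how much area the six removed P cores free up.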
> It's cheaper than the 6 p-cores that I axed...
Oh hell no. If there's anything N3b is good for, it's logic-dense cores.
> If they were going to do that, a better option would be changing Skymont to a dual core cluster, which would reduce contention on the L2 cache.
My idea was to also get the ring bus down to 8 compute node stops to allow it to possibly clock higher...
> Oh hell no. If there's anything N3b is good for, it's logic-dense cores.
You'd think a homogeneous, repeating structure would be easier to imprint than complex, irregular HP logic...
> You'd think a homogeneous, repeating structure would be easier to imprint than complex, irregular HP logic...
The actual bitcell just isn't getting any smaller anymore, so complex HP logic wins the day. See M4, just in case. 3-2 FF, N3e, all that jazz.
> My idea was to also get the ring bus down to 8 compute node stops to allow it to possibly clock higher...
I think it's also the unknown low-level details that are contributing to the problem, not just high-level ones. There's a reason shared caches are usually LLC; that's why I suggested a dual core cluster. At the point of contention, the bandwidth drops to zero.
> I think it's also the unknown low level details that are contributing to the problem, not just high level ones. There's a reason shared caches are usually LLC, that's why I suggested a dual core cluster. At the point of contention, the bandwidth drops to zero.
While I understand what you're getting at, and agree that it's not an unrealistic ask, I don't particularly like Intel's ring bus with more than 8 compute stops. I know that they had 12 in Raptor, but that's proving to be problematic.
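The ring-stop numbers in this exchange can be sanity-checked with a small sketch. It assumes each P core and each E-core cluster takes exactly one compute stop on the ring, which is consistent with the "12 in Raptor" figure for 8+16 but is still a simplification; `ring_compute_stops` is a made-up helper name.

```python
def ring_compute_stops(p_cores: int, e_cores: int, e_cluster_size: int = 4) -> int:
    """Count compute stops, assuming one stop per P core and one per E-core cluster."""
    if e_cores % e_cluster_size != 0:
        raise ValueError("E cores must fill whole clusters")
    return p_cores + e_cores // e_cluster_size

# Configurations mentioned in the thread, as (P cores, E cores).
configs = {
    "8+16": (8, 16),
    "6+8": (6, 8),
    "4+16": (4, 16),
    "2+24": (2, 24),
}

for label, (p, e) in configs.items():
    print(f"{label}: {ring_compute_stops(p, e)} compute stops with quad-core clusters")

# The dual-core cluster idea, applied to the same 2+24 layout.
print(f"2+24: {ring_compute_stops(2, 24, e_cluster_size=2)} compute stops with dual-core clusters")
```

Under that assumption, 2+24 with quad-core clusters lands on 8 stops, while splitting the same 24 E cores into dual-core clusters would raise the count to 14, which is the tension between the two proposals.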
The internals of the company are chaotic, so the engineers aren't performing as they should. Apparently Gelsinger also isn't as good as some were expecting. He made a lot of big mistakes.