Discussion Intel current and future Lakes & Rapids thread

IntelUser2000 · May 22, 2022

Carfax83 said:
But Golden Cove's cache bandwidth is far greater than Skylake's, right? Usually higher bandwidth also means higher latency, though correct me if I'm wrong.

There isn't a direct relation.

However, you could optimize in a certain direction. Sacrificing latency to get higher bandwidth is a switch in mindset, from single thread to multi-thread optimization. So when they moved from L2 being the LLC to L2 being private and L3 being the LLC with Nehalem, it was a push towards better multi-threaded performance. L3 on Nehalem would be slower than L2 on Core 2 in single threaded applications for example.

With multi-threaded applications, you need the bandwidth of the caches to scale with cores to keep it scaling.
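
A rough way to see why latency and bandwidth aren't tied together is Little's law: sustained bandwidth is the number of bytes in flight divided by latency. The sketch below is only illustrative; the function name and all the numbers are made up and are not measurements of any real core.

```python
def sustained_bandwidth_gbps(in_flight_lines, latency_ns, line_bytes=64):
    """Little's law: bytes in flight / latency.

    Bytes per nanosecond is numerically the same as GB/s.
    All inputs here are hypothetical, not measured values.
    """
    return in_flight_lines * line_bytes / latency_ns


# A lower-latency cache with fewer outstanding requests...
low_latency = sustained_bandwidth_gbps(in_flight_lines=10, latency_ns=4.0)
# ...versus a higher-latency cache that tracks more requests in parallel.
high_latency = sustained_bandwidth_gbps(in_flight_lines=16, latency_ns=5.0)

print(low_latency, high_latency)
```

In this toy case the slower cache still delivers more bandwidth because it keeps more requests in flight, so a design can move either number without the other following automatically.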

They are in a transitional phase, moving away from the endless Skylake stagnation, so there will be a lot of improvements, but in theory what I said would apply.

moinmoin said:
That's the crux. Intel had some catching up to do there (all the Skylake clones had excellent cache latency) so it's important it does just that.

Another thing is that Skylake likely wasn't meant to scale above certain core counts. The greatest benefit of the ring bus was its simplicity. So for relatively low core counts it beat other implementations, even though in theory it didn't sound so fast.

The largest single ring they've done had 12 stops, I think? Not so simple anymore. In Raptor Lake they'll reach that number again. Also, the push for ridiculous frequency doesn't help.

Carfax83 · May 22, 2022

IntelUser2000 said:
There isn't a direct relation.

However, you could optimize in a certain direction. Sacrificing latency to get higher bandwidth is a switch in mindset, from single thread to multi-thread optimization. So when they moved from L2 being the LLC to L2 being private and L3 being the LLC with Nehalem, it was a push towards better multi-threaded performance. L3 on Nehalem would be slower than L2 on Core 2 in single threaded applications for example.

With multi-threaded applications, you need the bandwidth of the caches to scale with cores to keep it scaling.

They are in a transitional phase, moving away from the endless Skylake stagnation, so there will be a lot of improvements, but in theory what I said would apply.

The reason why I thought that was because in the Chips and Cheese deep dive article, the author stated:

High bandwidth at high clocks doesn’t come for free. At all cache levels, Golden Cove has to cope with more latency than Zen 3. In exchange, Golden Cove’s L1 and L2 caches are larger than AMD’s, and deliver more bandwidth.

This to me kind of implies Intel made a trade off between having big, high bandwidth caches at the expense of some additional latency, which they mitigated with the much bigger ROB. But perhaps the extra latency is more due to the bigger size of the cache rather than the enormous bandwidth.

JoeRambo · May 22, 2022

Intel was already running into a wall with rings during the Haswell-Broadwell generation. The largest server CPUs had two rings of 12 cores each, and each ring had up to 5 additional stops (2 for inter-ring, 1 for I/O, 1 for QPI I/O, 1 for IMC/HA).

It's a long story, but two fundamental things about the Nehalem->SKL (client) architectures worked to wreck this setup:

1) L3 cache was inclusive, meaning everything in L2 had duplicate cache lines in L3, so any reads into L2 from memory and various "writes" were hitting L3.
2) L3 was organized in slices on the ring next to the cores, and the slice that houses a given cache line is selected by address bits. Basically, if a core does a read or write, the data request goes to a specific slice on the ring.

You can already see how that goes very wrong when two rings are involved: the CPU has cache lines all over two rings with varying levels of latency, and the IMCs are spread over two rings, so you touch both rings with read/write requests.
What is more, several cores with intensive memory access patterns can kill L3 caching (by evicting L3 cache lines belonging to other cores) and saturate the ring with requests for the whole chip. And these chips all had 256KB of L2, so they were super dependent on the LLC to carry the day.
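
To make the "both rings get touched" point concrete, here is a toy model of address-based slice selection. The hash is invented for illustration (the hash Intel actually uses is not described in this thread), and the 24-slice, 12-per-ring layout just mirrors the two rings of 12 mentioned above.

```python
from collections import Counter

LINE_BYTES = 64  # cache line size


def slice_for_address(addr, num_slices):
    """Pick an L3 slice from address bits.

    Hypothetical hash for illustration only, not Intel's real one.
    """
    line = addr // LINE_BYTES
    mixed = line ^ (line >> 7) ^ (line >> 13)
    return mixed % num_slices


def ring_traffic(base_addr, num_lines, num_slices=24, slices_per_ring=12):
    """Count how many line requests from one streaming core land on each ring."""
    counts = Counter()
    for i in range(num_lines):
        s = slice_for_address(base_addr + i * LINE_BYTES, num_slices)
        counts[s // slices_per_ring] += 1
    return counts


print(ring_traffic(base_addr=0, num_lines=24))
```

With this toy hash, 24 consecutive lines read by a single core land 12 on each ring, so even one thread streaming through memory generates traffic on both rings.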

Intel had the "Cluster on Die" mitigation: basically splitting the two rings into two NUMA nodes and going even further to split the LLC cache domains, to contain LLC slices in the "home" ring only. And they had a truckload of various monitoring tools, LLC allocation "schemes" for VMs, etc. Band-aids at best, and actively counterproductive in some cases.
And during the HSW/BDW era cloud was taking off, and no one wanted a system where a "rogue" client with a 1-2 thread allocation, running say a JVM type of load, would kill performance for the whole 48T chip.

So Intel set out to solve this problem in two ways:
1) Skylake server chips did away with 256KB of L2 and went to 1MB of L2 cache
2) L3 cache was no longer inclusive and die area was eaten by 4x larger L2 and massive AVX512 FMA units and supporting 512bit register arrays.

So the dependency on L3 and the inter-core bandwidth requirement during normal operation were cut big time by these two factors, and cloud guys (including a server chimp like me) were happy. Of course, Intel had to equip that chip with the most anemic mesh and LLC scheme the world has ever seen (it might have been beaten since by those 80-128 core ARM monsters that have kilobytes of LLC per core). Performance scaling with more L3 was non-existent, total L3 bandwidth was horrible, and the resulting memory latency was okayish, but the chip was basically relying on the larger L2 to carry it.

I don't have any problem with the ring on Alder Lake, nor do I think 2 more stops on Raptor Lake will "ruin" it. The ring has become bidirectional and wider, and cores have 1.25-2MB of private L2, mitigating the Achilles' heel of the Skylake stuff.
What is a problem is the horrible L3 latency, and that is where Intel can make improvements. And it seems they are making them in Raptor Lake.
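
For a rough feel of what 2 more stops cost, here is a back-of-the-envelope hop count for a ring. It assumes uniformly random traffic between distinct stops and treats every stop as equal, which a real chip does not; the 10 and 12 stop counts are illustrative, not confirmed die layouts.

```python
def average_hops(num_stops, bidirectional=True):
    """Average hop count between two distinct stops on a ring.

    Assumes uniform random source/destination pairs (a simplification).
    """
    total = 0
    for d in range(1, num_stops):
        # On a bidirectional ring a request takes the shorter direction.
        total += min(d, num_stops - d) if bidirectional else d
    return total / (num_stops - 1)


for stops in (10, 12):
    print(stops,
          round(average_hops(stops), 2),
          round(average_hops(stops, bidirectional=False), 2))
```

Under these assumptions, going from 10 to 12 stops on a bidirectional ring adds about half a hop on average, and the bidirectional ring stays well below the unidirectional case at either size.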

DrMrLordX · May 22, 2022

@JoeRambo

Is the mesh on IceLake-SP substantially better than Skylake/Cascade Lake? It's too early to speculate on Sapphire Rapids.

moinmoin · May 22, 2022

Carfax83 said:
But Golden Cove's cache bandwidth is far greater than Skylake's, right? Usually higher bandwidth also means higher latency, though correct me if I'm wrong.

The hardware design choice usually is between latency and cache size: The bigger the cache, the more cache lines, the more checks for hits and misses, the higher the latency. That's why cache sizes are so much smaller than memory and storage sizes.
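
The size versus latency trade-off can be sketched with the textbook average memory access time formula (hit time plus miss rate times miss penalty). The cycle counts and miss rates below are hypothetical and chosen only to show the shape of the trade-off.

```python
def average_access_time(hit_cycles, miss_rate, miss_penalty_cycles):
    """Textbook AMAT: hit time + miss rate * miss penalty (all in cycles)."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical small, fast cache: quick hits but more misses.
small_fast = average_access_time(hit_cycles=4, miss_rate=0.10, miss_penalty_cycles=40)
# Hypothetical larger cache: one extra cycle per hit but fewer misses.
large_slow = average_access_time(hit_cycles=5, miss_rate=0.06, miss_penalty_cycles=40)

print(round(small_fast, 2), round(large_slow, 2))
```

With these made-up numbers the larger cache wins on average despite the slower hit, but a workload that already fits in the small cache would only see the extra hit latency.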

The choice between high bandwidth and low latency, on the other hand, is one that can also be made at runtime. As a coarse example, CPUs traditionally need low latency, whereas GPUs need bandwidth. Starting with Renoir (Ryzen 4000 series), AMD APUs contain an IMC that changes priority depending on which unit requested data. So far, though, that has been more of an energy efficiency optimization, to avoid unnecessarily having to deliver both high bandwidth and low latency at once in all circumstances, something that matters less on desktop so far.

JoeRambo · May 22, 2022

DrMrLordX said:
@JoeRambo

Is the mesh on IceLake-SP substantially better than Skylake/Cascade Lake? It's too early to speculate on Sapphire Rapids.

From reviews it looked a bit tighter, but given the disastrous 10nm process and low core clocks in general, I doubt Intel was clocking it much higher than the 2.4 GHz of the Skylake-SP generation. Without a proper workstation ICL chip (like the 9980XE we have in house) it's hard to investigate.
But we are skipping ICL-SP anyway, I doubt anyone will get fired for it when AMD's stuff is so good and SR is around the corner (ten blocks away)?

jpiniero · May 22, 2022

Intel is going to talk about Meteor Lake and Arrow Lake at Hot Chips.

Advance Program

A Symposium on High Performance Chips

hotchips.org

yuri69 · May 22, 2022

jpiniero said:
Intel is going to talk about Meteor Lake and Arrow Lake at Hot Chips.

It's great that Intel keeps talking about unreleased chips in detail. AMD switched to a silent mode with Zen and more or less keeps reiterating the old launch slides (a few previously omitted details pop up).

jpiniero · May 22, 2022

yuri69 said:
It's great that Intel keeps talking about unreleased chips in detail. AMD switched to a silent mode with Zen and more or less keeps reiterating the old launch slides (a few previously omitted details pop up).

Now that you mentioned it, that is kind of weird. Last year they talked at HC about Alder Lake. Year before that, Tiger Lake. I guess this implies that Meteor Lake will be released before next August. You would think that if they would talk about anything it would be Raptor Lake but too boring presumably.

yuri69 · May 22, 2022

jpiniero said:
Now that you mentioned it, that is kind of weird. Last year they talked at HC about Alder Lake. Year before that, Tiger Lake. I guess this implies that Meteor Lake will be released before next August. You would think that if they would talk about anything it would be Raptor Lake but too boring presumably.

Raptor is just a refresh. Meteor is the next big thing. H2 2023 sounds fine

CHADBOGA · May 22, 2022

jpiniero said:
Intel is going to talk about Meteor Lake and Arrow Lake at Hot Chips.

Advance Program

A Symposium on High Performance Chips

hotchips.org

But will MLID be there to have hot wings with Jim Keller and Raja Koduri?

ashFTW · May 22, 2022

Intel is also presenting the Intel 4 process shortly:

Intel to present Intel 4 process at the VLSI Technology Symposium - Semiwiki

The VLSI Symposium on Technology & Circuits will be held in Hawaii from June 12th to June 17th. You can register for the conference here. The tip sheet for the conference has been released and one thing that caught my eye is some data from the Intel 4 paper that Intel will be presenting at…

semiwiki.com

From the VLSI schedule:

“A new advanced CMOS FinFET technology, Intel 4, is introduced that extends Moore’s law by offering 2X area scaling of the high performance logic library and greater than 20% performance gain at iso-power over Intel 7.”

Henry swagger · May 24, 2022

Can't wait for Meteor Lake, it looks impressive packaging-wise 😁

dullard · May 24, 2022

yuri69 said:
Raptor is just a refresh. Meteor is the next big thing. H2 2023 sounds fine

There are insignificant refreshes (Comet Lake Refresh was at best a 100 MHz boost on some chips). And there are significant refreshes (Coffee Lake Refresh added cores and a whole new i9 line). The Raptor Lake rumors seem to put it right in between--possibly leaning a bit towards significant since it could fix much of the background performance problems of Alder Lake if it adds more E cores on the i7 and i5 lines. I do agree with @jpiniero that Intel has been oddly silent on Raptor Lake so far.

igor_kavinski · May 24, 2022

dullard said:
I do agree with @jpiniero that Intel has been oddly silent on Raptor Lake so far.

What more could they say than what they have already revealed? Up to double digit performance increase, compatibility with ADL mobos, new voltage circuitry, increased caches and more E-cores. That's what I can think of from memory. A great and pleasant surprise would be hitting 6 GHz. Another surprise would be bringing back AVX-512.

gdansk · May 24, 2022

I'm pretty optimistic on Raptor Lake being the best buy again.
With 8+16 they have a lot of possible configurations to compete with 16/12/8/6 Zen 4. Unlike AMD they aren't using a more expensive manufacturing process.
And LGA1700 motherboards will probably be cheaper than AM5 (at least for some time).

dullard · May 24, 2022

igor_kavinski said:
What more could they say than what they have already revealed? Up to double digit performance increase, compatibility with ADL mobos, new voltage circuitry, increased caches and more E-cores. That's what I can think of from memory. A great and pleasant surprise would be hitting 6 GHz. Another surprise would be bringing back AVX-512.

You are correct that Intel has done the minimum needed. I'll add to your list that they say enhanced overclocking and an AI M.2 module. https://www.techpowerup.com/292128/intel-raptor-lake-with-24-cores-and-32-threads-demoed They have done one demo as well (see the link above).

But compared to the all-out press with Meteor Lake it just feels like Raptor Lake is being forgotten by Intel. I understand that they want to drum up business for Intel 4 and excitement for their packaging. But all the smiling faces holding Meteor Lake wafers and chips, CNET doing a photo tour about Meteor Lake, etc just is so much more than the little pieces here and there about Raptor Lake.

6 GHz might be included in their enhanced overclocking, but I am not counting on it. AVX-512 would be great if they could get more software buy-in (encoding, encrypting, photo/video processing). To me though, it all comes down to segmentation. If the 24 core part is i9 only and they don't bring more E cores to the i7 and i5 lines then Raptor Lake is not interesting to me. But, if they bring the E cores out blazing, then Raptor Lake could be a very compelling chip. And for competition reasons, I understand them keeping that information secret.

igor_kavinski · May 24, 2022

Best thing about RPL would be i5-13400 for budget conscious gamers. Doubt AMD would have something similar in the Zen 4 line-up. Now I know what AMD fans are gonna say. "Oh, but there are so many cheap Zen 3 CPUs for that!". Well, none of them work with DDR5. What if someone wants to buy a value CPU with DDR5 so they can do a socket upgrade to a halo model tomorrow? Only option for such users is ADL/RPL.

nicalandia · May 24, 2022

dullard said:
But compared to the all-out press with Meteor Lake it just feels like Raptor Lake is being forgotten by Intel.

Also the Monolithic Die HEDT Sapphire Rapids-X/UHEDT Processors seem to be forgotten by Intel

IntelUser2000 · May 24, 2022

dullard said:
If the 24 core part is i9 only and they don't bring more E cores to the i7 and i5 lines then Raptor Lake is not interesting to me. But, if they bring the E cores out blazing, then Raptor Lake could be a very compelling chip. And for competition reasons, I understand them keeping that information secret.

No, the core count information is public. Should be quite easy for you to decide whether you want i7 and i5 chips. The top i7 is 8 E cores, which is a boost from current 4 for example.

igor_kavinski said:
new voltage circuitry, increased caches and more E-cores.

Regarding the new voltage regulator, it's absolutely useless for desktop. Even in the research paper they said the gains are practically zero after 50A. Pretty much a U/P thing, not even H.

dullard · May 24, 2022

IntelUser2000 said:
No, the core count information is public. Should be quite easy for you to decide whether you want i7 and i5 chips. The top i7 is 8 E cores, which is a boost from current 4 for example.

Regarding the new voltage regulator, it's absolutely useless for desktop. Even in the research paper they said the gains are practically zero after 50A. Pretty much a U/P thing, not even H.

Which of the public core counts is the correct one? The AdoredTV leak?

While the new voltage regulator would not be that important for desktop, it still has uses. The 35 W desktop T chips could certainly get a lot out of it. Plus, lower idle (or low level work) power means turbo is possible for longer if you have a typical OEM computer (i.e. crappy heat sink/fan with a dusty case, inside a computer desk not the brand new, clean review computers sitting out in the open). You know, like most computer users. Doesn't mean Intel will put it into the desktop chips, but it could certainly help them.

According to the patent that was being discussed, the power savings are there up to about 70 A (not 50 A). See figure 5B here: https://www.freepatentsonline.com/20210208656.pdf . That was just one example, there might be other possible examples with higher currents that have power savings. That would cover even the 65 W non-K desktop chips.

The biggest gains would come if they separate the voltages from the P and E cores. That would allow more efficiency at the low end and higher possible overclocking at the high end.

eek2121 · May 24, 2022

Carfax83 said:
@IntelUser2000 and @JoeRambo already stated why this is incorrect in regards to the cache. Also, Raptor Lake should get a decent IPC uplift, higher clock speeds and a better IMC which is capable of using higher DDR5 frequencies off the bat compared with Zen 4.

I think 10% is on the low end, but as with anything it will depend on the workload. It's conceivable that in some workloads, the performance gain could be less or greater than 10%.

Honestly either way, the consumer wins. I'm sure Raptor Lake won't just walk over Zen 4, and Zen 4 won't stomp Raptor Lake into the ground. The more competitive they are with each other, the better it will be.

Intel said "up to" double digit performance gains. What that translates to is single digit ST gains in most workloads, with maybe 1-2 outliers that see 10% or more.

Further, doubling the E-cores and bumping clocks slightly isn't going to significantly add to multicore performance. All those cores have to compete for power and thermals. We are likely looking at around a 20-25% multicore improvement and a 6-8% single core improvement. Their 12900KS chip can't even reliably hit 5.5 GHz single core, so I really don't see much in the way of higher clocks (except on the E-cores).

Hopefully we'll get some teasers about Meteor Lake and Arrow Lake.

IntelUser2000 · May 24, 2022

dullard said:
According to the patent that was being discussed, the power savings are there up to about 70 A (not 50 A). See figure 5B here: https://www.freepatentsonline.com/20210208656.pdf . That was just one example, there might be other possible examples with higher currents that have power savings. That would cover even the 65 W non-K desktop chips.

We also know from the earlier leaked Intel roadmaps that DLVR is listed as a notebook feature, not desktop.

You are right about 70A, but the gains are pretty minimal at that point. The short duration Turbo of 28W chips are going to get pretty close to it.

I am mostly counting on the P cores not having to clock as high in MT workloads due to the presence of more E cores, and on refreshes generally coming in with incremental improvements. Based on how Alder Lake scales, something like a 5% drop in performance for the P cores will free up a lot more power for the extra 8 E cores. Right now the spread is something like 200W for 8 P cores and 50W for 8 E cores.
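
As a rough sanity check on that power argument, here is a sketch that assumes P-core performance scales linearly with frequency and that power scales as frequency raised to some exponent. Both are assumptions: the cubic exponent is only the textbook approximation, and the effective exponent near the top of the voltage/frequency curve tends to be higher. The 200W figure is the ballpark quoted above.

```python
def power_freed_watts(p_core_power_w, perf_drop, exponent=3.0):
    """Watts freed when P-core frequency drops by `perf_drop` (0.05 = 5%).

    Assumes performance ~ frequency and power ~ frequency ** exponent.
    Both are simplifications; the exponent is a tunable guess.
    """
    remaining_fraction = (1.0 - perf_drop) ** exponent
    return p_core_power_w * (1.0 - remaining_fraction)


for exp in (3.0, 5.0):
    print(exp, round(power_freed_watts(200.0, 0.05, exponent=exp), 1))
```

With a plain cubic model, a 5% drop frees roughly 28.5W out of 200W, and a steeper exponent frees more, so how much budget the extra E cores actually get depends heavily on where on the curve the P cores are running.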

eek2121 said:
Intel said "up to" double digit performance gains. What that translates to is single digit ST gains in most workloads, with maybe 1-2 outliers that see 10% or more.

https://www.windows10xnews.com/wp-content/uploads/2021/03/intel_alder_lake_architecture.jpg

"Up to 20% Single Thread Performance"

It really can go either way. Same arguments are going on in the Zen 4 thread.

I wonder if both Zen 4 and Raptor Lake will miss expectations, so competitiveness-wise it's the same, but with smaller gains over their predecessors than expected.

eek2121 · May 24, 2022

IntelUser2000 said:
We also know from the earlier leaked Intel roadmaps that DLVR is listed as a notebook feature, not desktop.

You are right about 70A, but the gains are pretty minimal at that point. The short duration Turbo of 28W chips are going to get pretty close to it.

I am mostly counting on the P cores not having to clock as high in MT workloads due to the presence of more E cores, and on refreshes generally coming in with incremental improvements. Based on how Alder Lake scales, something like a 5% drop in performance for the P cores will free up a lot more power for the extra 8 E cores. Right now the spread is something like 200W for 8 P cores and 50W for 8 E cores.

https://www.windows10xnews.com/wp-content/uploads/2021/03/intel_alder_lake_architecture.jpg

"Up to 20% Single Thread Performance"

It really can go either way. Same arguments are going on in the Zen 4 thread.

I wonder if both Zen 4 and Raptor Lake will miss expectations, so competitiveness-wise it's the same, but with smaller gains over their predecessors than expected.

That slide you linked was for Alder Lake.

Raptor Lake is 'up to double digit performance'. Go back a few pages and look at the slide.

Hitman928 · May 24, 2022

eek2121 said:
That slide you linked was for Alder Lake.

Raptor Lake is 'up to double digit performance'. Go back a few pages and look at the slide.

His point is that they said, "Up to 20% ST performance" improvement for Alderlake and the average ended up around 19%. So the up to statement was really the upper range for the average single thread improvement. If the same holds true for RPL, then the average single thread improvement will be 10+%. We'll have to wait and see how it goes.
