Discussion Intel current and future Lakes & Rapids thread


lightisgood

Senior member
May 27, 2022
211
97
71
So either Emerald Rapids is ramping up sooner than expected, or... Sapphire Rapids is ramping up later, as in late 2023?

Rumor has it that EMRs might be scheduled in early 2023.
Now, I'm putting a simple interpretation on SPRs/EMRs.

SPRs: low-end (600~700 sqmm die area?).
EMRs: mid-range (~800 sqmm die area).
SPRs with EMIB: high-end (~1600 sqmm die area).
EMRs with EMIB: extreme (>=1600 sqmm die area).
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
In any event, Genoa's NDA ends in just a few days. STH will be releasing benchmarks that day, and poor Ice Lake will have to pay the price of Intel's delays and be the victim of Genoa's might.
 

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
I'm interested to see if AMD will go for higher V-Cache stacks on Genoa, compared to Milan-X, to better compete against HBM Emerald Rapids. We saw that there was UEFI support for 4-Hi stacks early on in Milan development. If AMD wanted, it's possible that we could see 4-Hi Genoa 96-core products with 288MB of L3 per CCD. That would be a very interesting comparison to a 2P EMR-HBM system.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Rumor has it that EMRs might be scheduled in early 2023.
Now, I'm putting a simple interpretation on SPRs/EMRs.

SPRs: low-end (600~700 sqmm die area?).
EMRs: mid-range (~800 sqmm die area).
SPRs with EMIB: high-end (~1600 sqmm die area).
EMRs with EMIB: extreme (>=1600 sqmm die area).
Is EMR confirmed to be even bigger than SPR? What would make its development go so much more smoothly for the two to converge like that? A lack of SDSi? And what would be each one's unique selling point if EMR is not replacing SPR outright?
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I'm interested to see if AMD will go for higher V-Cache stacks on Genoa, compared to Milan-X, to better compete against HBM Emerald Rapids. We saw that there was UEFI support for 4-Hi stacks early on in Milan development. If AMD wanted, it's possible that we could see 4-Hi Genoa 96-core products with 288MB of L3 per CCD. That would be a very interesting comparison to a 2P EMR-HBM system.
3D stacked cache and HBM2e are different approaches that achieve similar results. AMD does not need to go beyond 96MiB per CCD to stack up well against Intel's 64GiB of on-package HBM2e.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
STH did a review of Sapphire Rapids-SP, though not with general-purpose benchmarks — specifically with accelerators. They also said that it will be "a few months" before the official launch.

From the article:

"Today, we get to show the first hands-on with the 4th Generation Intel Xeon Scalable, codenamed “Sapphire Rapids.” We are not going to get to show you everything. Intel has specifically only allowed us to show some of the acceleration performance of the new chips. Since it is going to be a few months until these officially launch"

 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
SPR for 4s/8s ?
EMR for 2s ?
SPR for HEDT?
Sapphire Rapids-SP will support up to 8 Sockets
Emerald Rapids-SP will support up to 8 Sockets
Intel's next HEDT is a mystery, since Intel abandoned the HEDT market long ago and shifted instead to the professional workstation market. Those are single-socket only, and we have seen a few SKUs based on Sapphire Rapids with 4 compute tiles....

Strangely enough, Intel has kept silent on a possible return to the HEDT segment with monolithic CPUs. We have seen the silicon wafer showing a 34-core monolithic CPU, but no info from Intel — so perhaps they have an ace up their sleeve?
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
3D Stack Cache and HBM2e are different approaches that achieve similar results
While their effects may overlap some, the mechanism is distinctly different. HBM2e is, as its name points out, all about bandwidth and throughput. L3$ extended with V-Cache, on the other hand, is all about low latency. HBM is known to have similar to slightly worse latency than standard DRAM memory. So in latency-sensitive workloads, V-Cache expands the data able to reside in L3$, with its significantly lower latency, before having to access much-higher-latency memory. HBM's advantage, on the other hand, is that once data is being accessed, it pushes it at a much higher speed than standard memory will ever be able to.

Typically CPU workloads are considered latency sensitive whereas GPU workloads are considered bandwidth sensitive. This is why HBM has been mostly used on products of the latter kind.

I'd expect workloads that use heavy vector computation on big data to profit the most from SPR-HBM/EMR-HBM, but others may know better and be able to offer more insight into its potential.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
While their effects may overlap some, the mechanism is distinctly different. HBM2e is, as its name points out, all about bandwidth and throughput. L3$ extended with V-Cache, on the other hand, is all about low latency. HBM is known to have similar to slightly worse latency than standard DRAM memory. So in latency-sensitive workloads, V-Cache expands the data able to reside in L3$, with its significantly lower latency, before having to access much-higher-latency memory. HBM's advantage, on the other hand, is that once data is being accessed, it pushes it at a much higher speed than standard memory will ever be able to.

Typically CPU workloads are considered latency sensitive whereas GPU workloads are considered bandwidth sensitive. This is why HBM has been mostly used on products of the latter kind.

I'd expect workloads that use heavy vector computation on big data to profit the most from SPR-HBM/EMR-HBM, but others may know better and be able to offer more insight into its potential.
It clearly exists to feed AMX for AI workloads. Maybe HPC as well, but yeah, HBM only makes sense with heavy vector/matrix compute.
 
Reactions: Tlh97 and moinmoin

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
Given how tightly coupled the HBM2E chips are to the CPU chiplets, I suspect that the average access latency for the HBM2E memory will not be worse than the regular DDR5 Registered ECC modules that will be in most servers. Similar or better latency with gobs more bandwidth will certainly help some workloads a great deal.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Imagine the lead architect of that CPU facepalming on reading that
Yes, it's not for gaming, and yes, it will cost about $10K per CPU (non-HBM will be like $5K a pop), but I also would like to get my hands on Milan-X with 1GiB of L3$ and run an OS from cache (probably a tiny Linux)......


Just to be clear: you can run SPR-HBM without DDR RAM, and no, you can't run Linux on stacked L3$.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Indeed. All Zen chips are affected by this, btw. It has to do with the chip itself (or rather its PSP) booting the secure firmware from cache first, essentially as POST.

Cache as RAM (CAR) is the name used by Coreboot. It would not be that useful in the end, considering most hardware uses DMA, which again accesses actual RAM and doesn't work if that doesn't exist.

Anyway Coreboot documents the behavior of Zen processors:

"Unlike any other x86 device in coreboot, a Picasso system has DRAM online prior to the first instruction fetch.

Cache-as-RAM (CAR) is no longer a supportable feature in AMD hardware.
"
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,065
15,204
136
Indeed. All Zen chips are affected by this, btw. It has to do with the chip itself (or rather its PSP) booting the secure firmware from cache first, essentially as POST.

Cache as RAM (CAR) is the name used by Coreboot. It would not be that useful in the end, considering most hardware uses DMA, which again accesses actual RAM and doesn't work if that doesn't exist.

Anyway Coreboot documents the behavior of Zen processors:

"Unlike any other x86 device in coreboot, a Picasso system has DRAM online prior to the first instruction fetch.

Cache-as-RAM (CAR) is no longer a supportable feature in AMD hardware.
"
To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? I want to know why all my EPYC and most of my Zen 2, Zen 3, and Zen 4 boxes run Linux just fine (and are faster than Windows for what I do).
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? I want to know why all my EPYC and most of my Zen 2, Zen 3, and Zen 4 boxes run Linux just fine (and are faster than Windows for what I do).
This all has zero bearing on normal usage, and isn't specific to Linux either. All this is about is: With caches getting bigger and bigger an OS could theoretically reside completely in that cache and boot without the system having any DRAM installed. With Zen this is not possible due to the aforementioned changes in boot behavior. Intel chips on the other hand afaik allow this to this day. That's all there is to it.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Is there any small DOS-type shell that can boot without RAM and reside entirely in cache on an Intel CPU?
Outside of Coreboot itself not that I'm aware of currently.

It also appears the existence of DRAM is so ingrained on many levels that a lot of software would need to be rewritten or recompiled. This is how Coreboot managed it (as of 2011):

"Running C code is a bit more difficult. The problem is that the code generated by any standard C compiler for the x86 CPU will heavily depend on RAM. System RAM is not enabled yet and the code to enable it is so complex that you really want to do it in C. Two solutions have been devised and both of these are in use (one or the other depending on the hardware).

1. Use a special C compiler (romcc) that does not make use of RAM, but keeps all data in registers. As the register set of the x86 is quite small (only 8 general purpose 32-bit registers), this severely limits the things that your C program can do. As the CALL and RET instructions cannot be used (they always use the stack in RAM), all C functions have to be inlined.

2. Use the CPU cache as a data RAM. This requires some special tricks to pretend that all cache lines contain valid data and to prevent them from being evicted. Which tricks are exactly required depends on the exact model of CPU, but it can be done. The Cache As RAM trick (CAR) yields at least 16kB of usable RAM, sufficient as stack space for a simple C program. All recent hardware ports use this trick.
"

More insightful reading material:
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The Xeon W9-3495 has been confirmed by many sources to be a 56C/112T part.


 