Discussion Intel current and future Lakes & Rapids thread


lightisgood

Senior member
May 27, 2022
211
97
71
So either Emerald Rapids is ramping up sooner than expected, or... Sapphire Rapids is ramping up later, as in late 2023?

Rumor has it that EMRs might be scheduled in early 2023.
Now, I'm putting a simple interpretation on SPRs/EMRs.

SPRs: low-end (600~700 sqmm die area?).
EMRs: mid-range (~800 sqmm die area).
SPRs with EMIB: high-end (~1600 sqmm die area).
EMRs with EMIB: extreme (>=1600 sqmm die area).
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
In any event, Genoa's NDA ends in just a few days. STH will be releasing benchmarks that day, and poor Ice Lake will have to pay the price of Intel's delays and be the victim of Genoa's might.
 

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
I'm interested to see if AMD will go for higher V-Cache stacks on Genoa, compared to Milan-X, to better compete against HBM Emerald Rapids. We saw that there was UEFI support for 4-Hi stacks early on in Milan development. If AMD wanted, it's possible that we could see 4-Hi Genoa 96-core products with 288MB of L3 per CCD. That would be a very interesting comparison to a 2P EMR-HBM system.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Rumor has it that EMRs might be scheduled in early 2023.
Now, I'm putting a simple interpretation on SPRs/EMRs.

SPRs: low-end (600~700 sqmm die area?).
EMRs: mid-range (~800 sqmm die area).
SPRs with EMIB: high-end (~1600 sqmm die area).
EMRs with EMIB: extreme (>=1600 sqmm die area).
Is EMR confirmed to be even bigger than SPR? What would make its development go so much more smoothly for the two to converge like that? A lack of SDSi? And what would be each one's unique selling point if EMR is not replacing SPR outright?
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I'm interested to see if AMD will go for higher V-Cache stacks on Genoa, compared to Milan-X, to better compete against HBM Emerald Rapids. We saw that there was UEFI support for 4-Hi stacks early on in Milan development. If AMD wanted, it's possible that we could see 4-Hi Genoa 96-core products with 288MB of L3 per CCD. That would be a very interesting comparison to a 2P EMR-HBM system.
3D stacked cache and HBM2e are different approaches that achieve similar results. AMD does not need to go beyond 96MiB per CCD to stack up well against Intel's 64GiB of on-package HBM2e.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
STH did a review of Sapphire Rapids-SP, though not with general-purpose benchmarks — specifically with accelerators. They also said that it will be "a few months" before the official launch.

From the article:

"Today, we get to show the first hands-on with the 4th Generation Intel Xeon Scalable, codenamed “Sapphire Rapids.” We are not going to get to show you everything. Intel has specifically only allowed us to show some of the acceleration performance of the new chips. Since it is going to be a few months until these officially launch"

 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
SPR for 4s/8s ?
EMR for 2s ?
SPR for HEDT?
Sapphire Rapids-SP will support up to 8 Sockets
Emerald Rapids-SP will support up to 8 Sockets
Intel's next HEDT is a mystery, since Intel abandoned the HEDT market long ago and shifted instead to the professional workstation market. Those are single-socket only, and we have seen a few SKUs based on Sapphire Rapids with 4 compute tiles....

Strangely enough, Intel has kept silent on a possible return to the HEDT segment with monolithic CPUs. We have seen the silicon wafer showing a 34-core monolithic CPU, but no info from Intel — so perhaps they have an ace up their sleeve?
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
3D Stack Cache and HBM2e are different approaches that achieve similar results
While their effects may overlap some, the mechanism is distinctly different. HBM2e is, as its name points out, all about bandwidth and throughput. L3$ extended with V-Cache, on the other hand, is all about low latency. HBM is known to have similar to slightly worse latency than standard DRAM memory. So in latency-sensitive workloads, V-Cache expands the data able to reside in L3$, with its significantly lower latency, before having to access much-higher-latency memory. HBM's advantage, on the other hand, is that once data is being accessed, it pushes it at a much higher speed than standard memory will ever be able to.

Typically CPU workloads are considered latency sensitive whereas GPU workloads are considered bandwidth sensitive. This is why HBM has been mostly used on products of the latter kind.

I'd expect workloads that use heavy vector computation on big data to profit the most from SPR-HBM/EMR-HBM, but others may know better and be able to offer more insight into its potential.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
While their effects may overlap some, the mechanism is distinctly different. HBM2e is, as its name points out, all about bandwidth and throughput. L3$ extended with V-Cache, on the other hand, is all about low latency. HBM is known to have similar to slightly worse latency than standard DRAM memory. So in latency-sensitive workloads, V-Cache expands the data able to reside in L3$, with its significantly lower latency, before having to access much-higher-latency memory. HBM's advantage, on the other hand, is that once data is being accessed, it pushes it at a much higher speed than standard memory will ever be able to.

Typically CPU workloads are considered latency sensitive whereas GPU workloads are considered bandwidth sensitive. This is why HBM has been mostly used on products of the latter kind.

I'd expect workloads that use heavy vector computation on big data to profit the most from SPR-HBM/EMR-HBM, but others may know better and be able to offer more insight into its potential.
It clearly exists to feed AMX for AI workloads. Maybe HPC as well, but yeah, HBM only makes sense with heavy vector/matrix compute.
 
Reactions: Tlh97 and moinmoin

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,139
136
Given how tightly coupled the HBM2E chips are to the CPU chiplets, I suspect that the average access latency for the HBM2E memory will not be worse than the regular DDR5 Registered ECC modules that will be in most servers. Similar or better latency with gobs more bandwidth will certainly help some workloads a great deal.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Imagine the lead architect of that CPU facepalming on reading that
Yes, it's not for gaming, and yes, it will cost about $10K per CPU (non-HBM will be like $5K a pop), but I also would like to get my hands on Milan-X with 1GiB of L3$ and run an OS from cache (probably a tiny Linux)......


Just to be clear: you can run SPR-HBM without DDR RAM, and no, you can't run Linux on stacked L3$.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Indeed. All Zen chips are affected by this, btw. It has to do with the chip itself (or rather its PSP) booting the secure firmware from cache first, essentially as POST.

Cache as RAM (CAR) is the name used by Coreboot. It would not be that useful in the end, considering most hardware uses DMA, which again accesses actual RAM and doesn't work if that doesn't exist.

Anyway Coreboot documents the behavior of Zen processors:

"Unlike any other x86 device in coreboot, a Picasso system has DRAM online prior to the first instruction fetch.

Cache-as-RAM (CAR) is no longer a supportable feature in AMD hardware.
"
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,065
15,204
136
Indeed. All Zen chips are affected by this, btw. It has to do with the chip itself (or rather its PSP) booting the secure firmware from cache first, essentially as POST.

Cache as RAM (CAR) is the name used by Coreboot. It would not be that useful in the end, considering most hardware uses DMA, which again accesses actual RAM and doesn't work if that doesn't exist.

Anyway Coreboot documents the behavior of Zen processors:

"Unlike any other x86 device in coreboot, a Picasso system has DRAM online prior to the first instruction fetch.

Cache-as-RAM (CAR) is no longer a supportable feature in AMD hardware.
"
To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? I want to know why all my EPYC and most of my Zen 2, Zen 3, and Zen 4 boxes run Linux just fine (and are faster than Windows for what I do).
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? I want to know why all my EPYC and most of my Zen 2, Zen 3, and Zen 4 boxes run Linux just fine (and are faster than Windows for what I do).
This all has zero bearing on normal usage, and isn't specific to Linux either. All this is about is: With caches getting bigger and bigger an OS could theoretically reside completely in that cache and boot without the system having any DRAM installed. With Zen this is not possible due to the aforementioned changes in boot behavior. Intel chips on the other hand afaik allow this to this day. That's all there is to it.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
Is there any small DOS-type shell that can boot without RAM and reside entirely in cache on an Intel CPU?
Outside of Coreboot itself not that I'm aware of currently.

It also appears the existence of DRAM is so ingrained on many levels that a lot of software would need to be rewritten or recompiled. This is how Coreboot managed it (as of 2011):

"Running C code is a bit more difficult. The problem is that the code generated by any standard C compiler for the x86 CPU will heavily depend on RAM. System RAM is not enabled yet and the code to enable it is so complex that you really want to do it in C. Two solutions have been devised and both of these are in use (one or the other depending on the hardware).

1. Use a special C compiler (romcc) that does not make use of RAM, but keeps all data in registers. As the register set of the x86 is quite small (only 8 general purpose 32-bit registers), this severely limits the things that your C program can do. As the CALL and RET instructions cannot be used (they always use the stack in RAM), all C functions have to be inlined.

2. Use the CPU cache as a data RAM. This requires some special tricks to pretend that all cache lines contain valid data and to prevent them from being evicted. Which tricks are exactly required depends on the exact model of CPU, but it can be done. The Cache As RAM trick (CAR) yields at least 16kB of usable RAM, sufficient as stack space for a simple C program. All recent hardware ports use this trick.
"

More insightful reading material:
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The Xeon W9-3495 has been confirmed by many sources to be a 56C/112T part.


 