Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Reactions: richardllewis_01

Panino Manino

Senior member
Jan 28, 2017
846
1,061
136
It's a shame that I may not be alive to see it, but I really want to see what Xilinx will add to AMD's portfolio. Zen 4 was too late, but maybe the Zen 5 generation will already come with some silicon by Xilinx? I wonder how they will change AMD's chips.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
It's a shame that I may not be alive to see it, but I really want to see what Xilinx will add to AMD's portfolio. Zen 4 was too late, but maybe the Zen 5 generation will already come with some silicon by Xilinx? I wonder how they will change AMD's chips.
How long do you think you have left?
 

jamescox

Senior member
Nov 11, 2009
640
1,104
136
I don't think it is the higher operating frequencies it is the smaller capacitors. It isn't needed for DDR5 today but DDR5's roadmap extends all the way to 64 Gb chips - 4x more dense than today's 16 Gb DDR5 chips.

Seems like it would make sense to pursue multilayer designs like NAND did when the cells got too small, which allowed them to use much bigger cells and avoid the issues. I don't know enough about how DRAM is produced to know how feasible that is, obviously if it was easy they would already be doing it...
Smaller capacitors are more sensitive to cosmic rays and to thermal effects, so occasional bit flips are becoming more likely.
 
Reactions: Tlh97 and Vattila

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
One item I noticed from the leaked manual is support for TSX (HLE)



Interesting, considering the fact that Intel disabled them due to security bugs and Power10 removed the support. (I think SPR is going to add them again, so I guess they might be working in GC)

Reading it again, I think I made a mistake: the bit is "Fixed 0", so there is no support.
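For anyone who wants to check this on actual hardware later, here is a minimal sketch of how the two TSX bits can be decoded. It assumes the standard x86 layout, where CPUID leaf 7 (subleaf 0) reports HLE in EBX bit 4 and RTM in EBX bit 11, and that Linux exposes them as the `hle` and `rtm` tokens in `/proc/cpuinfo`. The function names are made up for illustration.

```python
# CPUID.(EAX=07H, ECX=0):EBX feature bits for TSX (standard x86 layout).
HLE_BIT = 4    # Hardware Lock Elision
RTM_BIT = 11   # Restricted Transactional Memory


def tsx_support(leaf7_ebx: int) -> dict:
    """Decode the two TSX feature bits from a raw leaf-7 EBX value."""
    return {
        "hle": bool((leaf7_ebx >> HLE_BIT) & 1),
        "rtm": bool((leaf7_ebx >> RTM_BIT) & 1),
    }


def tsx_flags_from_cpuinfo(text: str) -> dict:
    """Look for the 'hle' and 'rtm' tokens in /proc/cpuinfo-style text."""
    flags = set()
    for line in text.splitlines():
        # Only the "flags : ..." lines carry the feature tokens.
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {"hle": "hle" in flags, "rtm": "rtm" in flags}
```

A bit that is documented as "Fixed 0" would simply decode as `False` here, which matches the "no support" reading.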

 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
One item I noticed from the leaked manual is support for TSX (HLE)

Interesting, considering the fact that Intel disabled them due to security bugs and Power10 removed the support. (I think SPR is going to add them again, so I guess they might be working in GC)

Reading it again, I think it was a mistake on my part, it is Fixed 0, no support

Hard to find info on this, but I wonder if it will be more like Intel's TSXLDTRK instructions in SPR. I assume it is TSX redesigned to prevent side channel attacks that plagued Intel's first two implementations.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
First Major patch for Zen4

A raft of RAS features for GMI. The GMI interface seems to be a serial one like IFIS, though I'm not sure how many lanes are being used or what the corresponding PHY is. The RAS is similar to XGMI's. Hopefully it is a low-energy, wider serial PHY with repeaters, unlike the current-gen dual 32-bit unidirectional PHY.
Based on patents, I suppose there will be compression of the data transfers as well.

This mysterious MPDMA is a huge IP block with connection to many things
+ "Main SRAM [31:0] bank ECC or parity error",
+ "Main SRAM [63:32] bank ECC or parity error",
+ "Main SRAM [95:64] bank ECC or parity error",
+ "Main SRAM [127:96] bank ECC or parity error",
+ "Data Cache Bank A ECC or parity error",
+ "Data Cache Bank B ECC or parity error",
+ "Data Tag Cache Bank A ECC or parity error",
+ "Data Tag Cache Bank B ECC or parity error",
+ "Instruction Cache Bank A ECC or parity error",
+ "Instruction Cache Bank B ECC or parity error",
+ "Instruction Tag Cache Bank A ECC or parity error",
+ "Instruction Tag Cache Bank B ECC or parity error",
+ "Data Cache Bank A ECC or parity error",
+ "Data Cache Bank B ECC or parity error",
+ "Data Tag Cache Bank A ECC or parity error",
+ "Data Tag Cache Bank B ECC or parity error",
+ "Instruction Cache Bank A ECC or parity error",
+ "Instruction Cache Bank B ECC or parity error",
+ "Instruction Tag Cache Bank A ECC or parity error",
+ "Instruction Tag Cache Bank B ECC or parity error",
+ "Data Cache Bank A ECC or parity error",
+ "Data Cache Bank B ECC or parity error",
+ "Data Tag Cache Bank A ECC or parity error",
+ "Data Tag Cache Bank B ECC or parity error",
+ "Instruction Cache Bank A ECC or parity error",
+ "Instruction Cache Bank B ECC or parity error",
+ "Instruction Tag Cache Bank A ECC or parity error",
+ "Instruction Tag Cache Bank B ECC or parity error",
+ "System Hub Read Buffer ECC or parity error",
+ "MPDMA TVF DVSEC Memory ECC or parity error",
+ "MPDMA TVF MMIO Mailbox0 ECC or parity error",
+ "MPDMA TVF MMIO Mailbox1 ECC or parity error",
+ "MPDMA TVF Doorbell Memory ECC or parity error",
+ "MPDMA TVF SDP Slave Memory 0 ECC or parity error",
+ "MPDMA TVF SDP Slave Memory 1 ECC or parity error",
+ "MPDMA TVF SDP Slave Memory 2 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 0 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 1 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 2 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 3 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 4 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 5 ECC or parity error",
+ "MPDMA TVF SDP Master Memory 6 ECC or parity error",
+ "MPDMA PTE Command FIFO ECC or parity error",
+ "MPDMA PTE Hub Data FIFO ECC or parity error",
+ "MPDMA PTE Internal Data FIFO ECC or parity error",
+ "MPDMA PTE Command Memory DMA ECC or parity error",
+ "MPDMA PTE Command Memory Internal ECC or parity error",
+ "MPDMA PTE DMA Completion FIFO ECC or parity error",
+ "MPDMA PTE Tablewalk Completion FIFO ECC or parity error",
+ "MPDMA PTE Descriptor Completion FIFO ECC or parity error",
+ "MPDMA PTE ReadOnly Completion FIFO ECC or parity error",
+ "MPDMA PTE DirectWrite Completion FIFO ECC or parity error",
+ "SDP Watchdog Timer expired",
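To make the list above a bit more concrete, here is a rough sketch of how such a description table is typically consumed. The assumption (mine, not stated in the excerpt) is that the strings are indexed by the extended error code, which the kernel's SMCA decoder takes from bits 21:16 of MCA_STATUS. Only the first four strings are copied from the list; `describe` and `extended_error_code` are hypothetical helper names.

```python
# First four MPDMA descriptions, copied from the patch excerpt above.
MPDMA_DESCS = [
    "Main SRAM [31:0] bank ECC or parity error",
    "Main SRAM [63:32] bank ECC or parity error",
    "Main SRAM [95:64] bank ECC or parity error",
    "Main SRAM [127:96] bank ECC or parity error",
]


def extended_error_code(mca_status: int) -> int:
    # Assumption: the extended error code sits in MCA_STATUS bits 21:16.
    return (mca_status >> 16) & 0x3F


def describe(mca_status: int, descs=MPDMA_DESCS) -> str:
    """Map a raw status value to a human-readable description."""
    xec = extended_error_code(mca_status)
    if xec < len(descs):
        return descs[xec]
    # Codes past the end of the table have no description.
    return f"Unknown error (xec={xec})"
```

With the full table plugged in, a status value carrying extended error code 2 would decode to the "Main SRAM [95:64]" entry.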
HW mitigations for a whole bunch of vulnerabilities like STIBP, IBRS, SSB, Upper Address, Secure TSC (SNP) and VMSA protection (SNP)... found in Volume 2 of PPR version 3.33
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
As for the acronym, it's probably MPsoc DMA, and we're getting Xilinx IP integration early!
It could be some form of it, but the RAS messages indicate a far more sophisticated block.
As IAmChester of Chips and Cheese fame is saying, this block could be migrating pages between SCM and DRAM.
It is doing page table walking and migrating pages across memory, whereas the Xilinx PSoC does not seem to do anything similar beyond performing DMA without CPU intervention.
 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
Great find!

As for the acronym, it's probably MPsoc DMA, and we're getting Xilinx IP integration early!

edit: Okay but seriously, anyone have guesses?

From: Yazen Ghannam <>
Subject: [PATCH 0/3] AMD SMCA Updates
Date: Fri, 3 Dec 2021 02:00:14 +0000
Hi all,

This set adds supports for SMCA changes in future AMD systems.

Patch 1 adds an "unknown" bank type so that sysfs initialization issues
can be avoided on systems with new bank types.

Patch 2 adds new bank types and error descriptions used in future AMD
systems.

Patch 3 adjusts how SMCA bank information is cached. Future AMD systems
will have different bank type layouts between logical CPUs. So having a
single system-wide cache of the layout won't be correct.

Thanks,
Yazen

Yazen Ghannam (3):
x86/MCE/AMD: Provide an "Unknown" MCA bank type
x86/MCE/AMD, EDAC/mce_amd: Add new SMCA Bank Types
x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type
enumeration

arch/x86/include/asm/mce.h | 26 ++---
arch/x86/kernel/cpu/mce/amd.c | 114 +++++++++++++-----
drivers/edac/mce_amd.c | 148 +++++++++++++++++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
4 files changed, 228 insertions(+), 62 deletions(-)

--
2.25.1

From: https://lkml.org/lkml/2021/12/2/1098

Above my paygrade ATM
This appears to be the work, in fact, of an AMD engineer with a RAS & Linux background:
Have fun!
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
Patch 3 adjusts how SMCA bank information is cached. Future AMD systems
will have different bank type layouts between logical CPUs. So having a
single system-wide cache of the layout won't be correct.
I could hazard a guess that it indicates that not all the cores are the same (though they could still be fully feature compatible, for example). Otherwise I see no other explanation.
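As a rough illustration of what "different bank type layouts between logical CPUs" means for the caching, here is a minimal sketch of a per-CPU lookup instead of one system-wide table. This is not the kernel's data structure; the class name and the `TYPE_A`/`TYPE_B` labels are placeholders, and the "unknown" fallback only mirrors the idea of Patch 1.

```python
from collections import defaultdict

UNKNOWN = "unknown"  # fallback, in the spirit of Patch 1's "Unknown" bank type


class PerCpuBankTypes:
    """Bank-number -> bank-type mapping kept separately for each logical CPU."""

    def __init__(self):
        self._by_cpu = defaultdict(dict)

    def record(self, cpu: int, bank: int, bank_type: str) -> None:
        self._by_cpu[cpu][bank] = bank_type

    def lookup(self, cpu: int, bank: int) -> str:
        # A single system-wide table would ignore `cpu` and could return
        # the wrong type when two CPUs disagree about the same bank number.
        return self._by_cpu.get(cpu, {}).get(bank, UNKNOWN)


cache = PerCpuBankTypes()
cache.record(cpu=0, bank=5, bank_type="TYPE_A")  # placeholder type names
cache.record(cpu=8, bank=5, bank_type="TYPE_B")
```

Here bank 5 means something different on CPU 0 and CPU 8, which is exactly the case a single shared layout cannot represent.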
If AMD's big.BIGGER hybrid cores are real, I imagine they can just use CPPC2 to let the OS manage them normally, as they do now for preferred cores, because all cores handle the same instruction set and are feature compatible; peak perf would simply be decided by the CPPC2 preferred cores for that power plan. All cores/L3s can snoop and maintain coherency like regular non-hybrid CPUs.
Windows can handle this well, and if you check your System event logs you can already see this in action. Surprisingly, there is a new patch to introduce CPPC2 scheduling in Linux for AMD processors, called amd-pstate.
The only strange thing about this guess is that it is a bit unexpected for AMD to do this at this stage. Or could they be planning multi-socket configs with different CPUs?
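If you want to see the preferred-core ordering on Linux, a minimal sketch could look like the following. It assumes the kernel exposes the ACPI CPPC values under `/sys/devices/system/cpu/cpuN/acpi_cppc/highest_perf`; CPUs without that file are skipped. The function name is made up for illustration.

```python
from pathlib import Path


def rank_cores_by_highest_perf(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return (cpu, highest_perf) pairs, best core first."""
    ranking = []
    for cpu_dir in Path(sysfs_cpu_root).glob("cpu[0-9]*"):
        perf_file = cpu_dir / "acpi_cppc" / "highest_perf"
        try:
            value = int(perf_file.read_text().strip())
        except (OSError, ValueError):
            # No CPPC data for this CPU (or unreadable): skip it.
            continue
        ranking.append((int(cpu_dir.name[3:]), value))
    # Highest perf first; ties are broken by CPU number.
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for cpu, perf in rank_cores_by_highest_perf():
        print(f"cpu{cpu}: highest_perf={perf}")
```

On a part where all cores report the same value, the list just comes back in CPU order, so any spread in the numbers is the interesting bit.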
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
HW mitigations for a whole bunch of vulnerabilities like STIBP, IBRS, SSB, Upper Address, Secure TSC (SNP) and VMSA protection (SNP)... found in Volume 2 of PPR version 3.33
Reading the security bulletin related to CVE-2020-12966, I found it strange that the VMSA protection was declared fixed with a microcode update which introduces a new feature flag. That is nuts; I was never aware you could add new CPUID flags using microcode.

Hard to find info on this, but I wonder if it will be more like Intel's TSXLDTRK instructions in SPR. I assume it is TSX redesigned to prevent side channel attacks that plagued Intel's first two implementations.
Yes it is hard. But there is this patent if you wanna read something related to AMD HLE feature.

Processor with accelerated lock instruction operation
 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
I could hazard a guess that it indicates that all the cores are not the same (but could be fully feature compatible for example). Otherwise no other explanation.
I would suppose if AMD's big.BIGGER hybrid cores are real I imagine they can just use the CPPC2 to let the OS manage it normally like how they do now for preferred cores because all cores can handle same instruction set and are feature compatible, just that peak perf will be decided by the CPPC2 preferred cores for that power plan. All cores/L3s can snoop and maintain coherency like regular non hybrid CPUs.
Windows can handle this well and if you check your System event logs you can already see this in action, Surprisingly there is a new patch to introduce CPPC2 scheduling in Linux for AMD processors called amd-pstate.
Only thing strange with this guess though is that it is a bit unexpected for AMD to do this at this stage. Or they could be having Multi socket configs with different CPUs?
I'm a bit lost, atm, since I can't find the meaning of SMCA banks. Obviously related to machine checks. Everything I look up points me back to Linux kernel code.
Then we have 'different bank type layouts', physical layouts or logical layouts???


Yes it is hard. But there is this patent if you wanna read something related to AMD HLE feature.

Processor with accelerated lock instruction operation
Thanks, starting to go blind bouncing around Linux Kernel code (with some useful info from phoronix - god bless the guy who runs that site!).
Time to watch Formula1 race practice or play a video game.
 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
Reactions: lightmanek

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
Surprisingly there is a new patch to introduce CPPC2 scheduling in Linux for AMD processors called amd-pstate.
[PATCH v5 22/22] Documentation: amd-pstate: add amd-pstate driver introduction - Huang Rui
Only thing strange with this guess though is that it is a bit unexpected for AMD to do this at this stage. Or they could be having Multi socket configs with different CPUs?
Thanks for the reference. The initial mail for that patch set even lists some performance-per-watt benchmarks showing that amd-pstate fares worse than the current acpi-cpufreq (only 'performance' is superior, but still below the current 'ondemand'), so that may have been a reason AMD felt no urge to port CPPC2 support over (it did its job under Windows and wasn't necessary under Linux). I guess they are porting it now since support for it becomes more important in the coming CPU gens.
 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
Thanks for the reference. The initial mail for that patch set even lists some performance per watts benchmarks showing that amd-pstates fares worse than current acpi-cpufreq (only 'performance' is superior, but still below current 'ondemand'), so that may have been a reason AMD had no urge to port CPPC2 support over (it did its job under Windows and wasn't necessary under Linux). I guess they port it now since support for it becomes more important in the coming CPU gens.
Wow, I wish we had the relevant emails for many of these new features - very helpful (source code headers are less often useful**). Nice to see code names like Raphael being used, we are past the anonymous 'next gen cpu' and the like.

** As a former developer, I should know
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
Seems like N3 potentially might be worth it if you're willing to put in the effort with DTCO?

I don't like how TSMC didn't include N7 and N5 DTCO charts here though, we've already seen how much of an effect it can have (RDNA1 -> RDNA2).

Is DTCO even something that can be compared at node level? From what I'm aware of, DTCO (Design and Technology Co-optimization) is a feedback process the customer has to be willing to apply during silicon design to make the most of the node. With today's costly nodes I rather have to wonder who still doesn't do that at least to some degree.

Moving this to here to avoid boring the non x86 folks. It seems the problem with DTCO is longer lead time from physical design to bring up.

Zen2 took longer to get to market because AMD spent a lot of time optimizing their devices, metal layers, etc. Radeon VII, on the other hand, was fairly quick.
Even then, full optimization did not happen until Zen3, when AMD was able to extract almost 5 GHz from a process which was not intended to run beyond 4.2 GHz (from the N7 shmoo plot).
This is fairly obvious when feeding more power to Zen3 does not yield any meaningful perf gain. The cost, or rather the tradeoff, of this optimization is density and power.

Going forward, I don't know if AMD (or any other CPU designer with super long pipelines/high-clocking designs) will continue to do this; they had better stick to tweaking a few knobs here and there, "optimized for HPC", but nothing more. (If you have not seen the TSMC data, the real N5 HPC flavor has 2x the leakage of standard N5.)
N5 should really help with clocks without needing super deep optimizations. The question is how high standard N5 will clock.
If AMD can land 5+ GHz without deep optimizations and tradeoffs, it would greatly help with density and power efficiency.
Going way beyond 5 GHz is not going to work, with heat density/thermal hotspots being a problem and parasitics degrading efficiency. (GAA is advertised to solve the parasitics problem eventually; topic for another time.)

AMD is moving to a new SAPR concept to reduce time to market after the high level design is done. It is faster to do architectural iterations with RTL simulation than to optimize during physical design
Also, I believe that if Zen5 on N3 arrives as early as rumored (2H2023), it is probably because of fewer process/device optimizations and extensive use of highly automated SAPR.
This should help with aligning launches to yearly OEM updates.

Zen4 therefore is very interesting in this regard, it is going to give an idea how high clocking designs will look like in terms of efficiency/density with upcoming nodes.
This slide is therefore very interesting, N7-->N5 (efficiency with perf gain) while 14LPP-->N7 (efficiency at same perf)
 

LightningZ71

Golden Member
Mar 10, 2017
1,658
1,939
136
The above is why I am of the opinion that AMD has an opportunity to compete better here by having separate mobile and desktop/server products. AMD can focus on getting the logic right and in production on a new product quickly by pushing out a first iteration on the desktop and server, where power and efficiency aren't quite as crucial as they are in mobile. Then, follow up with a mobile design that has been iterated at the process level enough to have better power/efficiency characteristics than the desktop. Finally, in moving to the next generation of products, the desktop part can be mildly tweaked and iterated on the existing or slightly improved node to offer additional, more desirable SKUs for older platforms while the next generation product is pushed out on a new process node. We've seen elements of this in the recent past, but I wonder if that's their targeted cycle?
 

Ajay

Lifer
Jan 8, 2001
16,078
8,104
136
Moving this to here to avoid boring the non x86 folks. It seems the problem with DTCO is longer lead time from physical design to bring up.

Okay, apparently I don't understand DTCO yet.

AMD is moving to a new SAPR concept to reduce time to market after the high level design is done. It is faster to do architectural iterations with RTL simulation than to optimize during physical design

This has always been the case. The change is that larger scale HPC systems can iterate faster and more accurately than in years past. Don't know what SAPR stands for.
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
AMD is moving to a new SAPR concept to reduce time to market after the high level design is done. It is faster to do architectural iterations with RTL simulation than to optimize during physical design
Regarding the latter sentence, isn't AMD doing architectural iterations with RTL simulation already? I'm sure there's always room to automatize and optimize even more processes, but that's one step I thought AMD already did.

The above is why I am of the opinion that AMD has an opportunity to compete better here by having separate mobile and desktop/server products.
But that's already the case (currently APUs vs CPUs)?

Don't know what SAPR stands for.
Synthesis Auto Place & Route
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,221
136
Regarding the latter sentence, isn't AMD doing architectural iterations with RTL simulation already? I'm sure there's always room to automatize and optimize even more processes, but that's one step I thought AMD already did.
Yes, anybody designing some circuit will do simulation.

What I meant is that you can improve perf from the architecture by doing quick design iterations using RTL simulations (provided, of course, that you are running your device within the best range of the shmoo plot), rather than sitting and optimizing the physical design for a few extra 100 MHz while greatly trading off efficiency and density.
 