Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Reactions: richardllewis_01

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,210
136
PCIe 5 is pretty much irrelevant for consumers right now and will remain that way for many years. All it does is increase motherboard costs. The only thing it possibly could affect is storage, because GPUs aren't going to come close to saturating a PCI4x16 link any time soon.
PCIe 5.0 is used as the physical layer for CXL and is supposed to bring in a new era of cache coherent accelerators.

I hope this is not the case.
Otherwise it means buying some Zen4 based TR Pro to try out some CXL based accelerators.
The only option on AM5 would be to go A+A, which is kind of against AMD's open and inclusive philosophy. The issue with this is that AMD only has GPUs at the moment, so if you want to try out FPGA-based CXL-capable accelerators like the ones from Xilinx, you would be out of luck on AM5.
It makes no sense to not have it if they already have the PCIe 5.0 IP on Genoa. They could just have the support in the IOD and the chipset and let the board OEMs take the cost on the high-end boards.
Either that, or this generation has no CXL support, which is going to be an issue for developers wanting to try out CXL-based accelerators.

Not happy with this.
 
Reactions: Tlh97 and Kepler_L2

Gideon

Golden Member
Nov 27, 2007
1,706
3,914
136
I don't know, and to be frank, I don't entirely care. He's said enough bollocks for me to know that he's more than happy to either make stuff up or trust things from absolutely anyone.

100% this. Usually just pure informed speculation leads to much more accurate facts than what these leakers claim.

So many of these leaks go blatantly against common industry facts that it hurts (and this goes against all of MLID, Adored and Coreteks). Things like:
  • Claiming not having working silicon in the labs less than a year before release
  • Claiming things that would require changes to silicon (other than respins) less than a year before release.
  • Claiming something has been designed but might not be released - This happens, but very rarely, as the R&D money has already been spent, it would literally have to be unsellable to get canned. (Things such as designing a 24 core Genoa and not announcing it while releasing a 16 core one makes 0 sense)
  • And the big one: Knowing SKUs and pricing 6+ months pre-release (when these are the last things that get decided. Especially the pricing as it's the only thing that can be changed easily, even hours before release)

But what really grinds my gears is if they get something wrong, they almost never admit that it was (someone's) poor speculation. Near always there is the excuse of "oh it must have been canned/postponed/changed last minute".

I still remember Adored being hell-bent that Navi would release in January (Q1 2019), up to late December 2018. And when it didn't happen it was just casually "postponed due to yields". It ended up "being postponed" for 7 months. I'm sure AMD had no idea of the state of their yields a month before release.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,210
136
AMD HSA is here


AMD is building a system architecture for the Frontier supercomputer with
a coherent interconnect between CPUs and GPUs. This hardware architecture
allows the CPUs to coherently access GPU device memory. We have hardware
in our labs and we are working with our partner HPE on the BIOS, firmware
and software for delivery to the DOE.

The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver
looks it up with lookup_resource and registers it with devmap as
MEMORY_DEVICE_GENERIC using devm_memremap_pages.

Now we're trying to migrate data to and from that memory using the
migrate_vma_* helpers so we can support page-based migration in our
unified memory allocations, while also supporting CPU access to those
pages.
 

Gideon

Golden Member
Nov 27, 2007
1,706
3,914
136
Having something like 250 GB/s of IO bandwidth with 128 pci-express 4.0 links seems like it would have been the deciding factor.
That wasn't really true for this case.

Aurora was supposed to be ready earlier, was won by Intel and is being built with Sapphire Rapids chiplets that have PCIe 5.0, "Rambo cache" chiplets, HBM2 on package if needed (and it looks like similar unified-memory-space software). The problem is it's using micro-bumps for stacking (well, it's also very late, but that wasn't certain when Frontier was announced). So if anything Intel had the I/O advantage.
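
For a rough sense of scale, the ~250 GB/s figure quoted above can be reproduced from the per-lane signalling rates. This is only a back-of-the-envelope sketch: it assumes 16 GT/s per lane for PCIe 4.0 and 32 GT/s for PCIe 5.0 with 128b/130b encoding, counts one direction only, and ignores packet/protocol overhead. The function name is made up for illustration.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s (128b/130b encoded links)."""
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 bits on the wire
    bytes_per_transfer = 1 / 8       # each transfer carries one bit per lane
    return gt_per_s * encoding_efficiency * bytes_per_transfer * lanes

gen4_128 = pcie_gb_per_s(16, 128)  # 128 lanes of PCIe 4.0
gen5_128 = pcie_gb_per_s(32, 128)  # same lane count at PCIe 5.0 rates
gen4_x16 = pcie_gb_per_s(16, 16)   # a single PCIe 4.0 x16 slot

print(f"PCIe 4.0 x128: {gen4_128:.1f} GB/s")
print(f"PCIe 5.0 x128: {gen5_128:.1f} GB/s")
print(f"PCIe 4.0 x16:  {gen4_x16:.1f} GB/s")
```

Under these assumptions 128 lanes of PCIe 4.0 land at roughly 252 GB/s per direction, and the same lane count at PCIe 5.0 rates doubles that.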

There had to be some secret sauce in AMD's offerings to win Frontier like they did. This is certainly one key differentiator. Bear in mind the V-cache solution actually most likely has two layers (as it sits on top of 32MB L3 and is exactly as big on the same process). There is nothing stopping AMD from adding more layers for some server CPUs and I'm convinced now CDNA2 has this stacking as well.

And while all of this is only possible because of AMD's engineering prowess, keep in mind that this is also TSMC's win as much as it's AMD's. They're the only foundry that has anything like that ready in this time-frame. The hoops TSMC had to go through to make this work (and be producible at scale) are also enormous.

All in all, ever since Zen 2 it looks like it's the trifecta of execution (Synopsys + AMD + TSMC) that is to be congratulated. AMD couldn't just do it alone.
 

Doug S

Platinum Member
Feb 8, 2020
2,470
4,028
136
Chatting on an internet forum doesn't need most of the instruction sets modern day CPUs provide. Why power all that silicon? Playing a game requires a number of instruction sets that aren't normally used. During that time, the small cores can be put to sleep, giving the big cores more headroom (by way of TDP) to run.


That's completely wrong. You think posting to Anandtech doesn't use SIMD instructions? Check out whatever is responsible in your OS kernel for zeroing pages when a new page is needed, it probably uses AVX2 in some circumstances - and that's the tip of the iceberg. You think floating point isn't needed? Sorry, all math in Javascript is done in floating point, there's no way to avoid it if you are running a browser.

I doubt there's anything you can do with a modern PC or smartphone that would allow any worthwhile reduction of instruction set coverage. Not even running an "idle loop" (which is a halt instruction these days) because there are always background/housekeeping processes running at times so the scheduler, I/O dispatch, filesystem, and other parts of the kernel will remain active.

I don't think you can usefully cut out any instructions from a small core other than 1) AVX512 (and that's only true on x86 because Intel didn't provide for variable SIMD width capability like SVE2) and 2) virtualization. Anything else you cut out will mean almost every thread will be forced onto big cores before long.
 

Doug S

Platinum Member
Feb 8, 2020
2,470
4,028
136
…and if I am running with javascript disabled? what if i am writing code in vim? What if the machine is a simple file sharing machine? There are plenty of opportunities to use a small core over a big one. Even something as basic as tracking a mouse pointer doesn’t need to use a big core.

OK sure if you are one of the niche cases of people who disable Javascript or run CLI stuff in console mode, fine I'll grant you that. The overwhelming majority of PC/smartphone users don't do stuff like that.

Tracking a mouse pointer doesn't need the performance of a big core, but it will almost certainly exercise your whole instruction set. Do you have any idea of the size of the hot code footprint tracking a mouse pointer on an otherwise idle system these days? A modern GUI is multiple layers of libraries.

A typical person who will leave Javascript enabled will exercise floating point if that mouse cursor moves in any browser window. When the pointer moves between windows, window expose events will exercise stuff like bcopy/memset that uses AVX2, and so on.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,684
6,210
136
If we go a step back, Ryzen 5000/Zen3 is a tradeoff across so many things

Zen 3 MTr/mm2 is around 51, MI100 is around 66, 30% higher. Zen 3 had to trade off 30% density to achieve the high clocks. (RDNA2 had a similar MTr/mm2 to Zen3 as well, because the GPU team learnt from the CPU team, according to Suzanne Plummer.)
Why the high clocks, in my opinion:
Because the Core +L2 (around ~204MTr) is not that wide and much smaller than Intel's for example (Sunny Cove is ~283MTr) and Firestorm (~502MTr).
(But they needed to improve the efficiency by making it small to run at such high clocks, so it is kind of a vicious cycle)
Because original design of Zen (1,2,3 at least) is to make the die small for cost, defects, yield etc because AMD cannot charge whatever they want.
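
The density and transistor-count figures above are easy to sanity-check. A small sketch using only the approximate numbers quoted in this post (all of them estimates, not official figures); the variable names are just for illustration:

```python
# Approximate logic densities quoted above, in MTr/mm2
zen3_density = 51
mi100_density = 66

# The same gap, expressed against each baseline
mi100_higher = mi100_density / zen3_density - 1  # MI100 relative to Zen 3
zen3_lower = 1 - zen3_density / mi100_density    # Zen 3 relative to MI100

# Approximate Core + L2 transistor counts quoted above, in MTr
core_mtr = {"Zen 3": 204, "Sunny Cove": 283, "Firestorm": 502}
relative_size = {name: mtr / core_mtr["Zen 3"] for name, mtr in core_mtr.items()}

print(f"MI100 is {mi100_higher:.1%} denser than Zen 3")
print(f"Zen 3 is {zen3_lower:.1%} less dense than MI100")
for name, ratio in relative_size.items():
    print(f"{name}: {ratio:.2f}x the Zen 3 Core+L2 transistor count")
```

Depending on which chip is taken as the baseline, the same gap reads as MI100 being about 29% denser or Zen 3 being about 23% less dense.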

When Zen2 was introduced, they needed to add the GAMECACHE, because they were getting hammered by Intel in a key workload in the Windows world, gaming, but in my opinion it was an improvisation and not what was envisioned during the architectural work 4+ years earlier.
What is good though, is that there is not going to be an increase of L3 in Genoa
Increasing L3 size can cause regression in IPC if the increase comes with more cycles and of course there is power involved. V Cache comes with "minimal cost of latency" as per AMD, this means it will cause a minor regression in some workloads. But hitrate is massively increased for workloads like gaming. Thankfully the V Cache can be power gated.
In the end Zen3 Core + L2 + L3 turned out to be big, to address the gaming load. There are other benefits as well in the HPC space, but the effect is profoundly highlighted in the Windows world

Operating range at the very extreme of the Shmoo plot is not exactly going to make the chip efficient

For Zen4
I did mention before that I would prefer AMD don't scale up the frequencies again, otherwise again this same cycle would take effect, but in the PR some days ago Hallock alluded to increasing clocks again soooo

On N5P, there is a lot more room to maneuver if they don't go for the absolute frequency.
The process inherently offers a lot more speed (20% over N7 at same power); with HD cells they could make small adjustments to hit clock targets, assuming their frequency targets are not so high.
This can allow them to minimize the tradeoff of density for speed, which means they can pack more transistors per mm2. This means more logic.
It also means they are not operating at the very extreme of the Shmoo plot and can greatly control the efficiency.

If AMD only takes minor speed improvements, say 5%, they can put all the gain into efficiency plus cram in more transistors because there is no need to go for the absolute tradeoff for frequency.
Putting more logic, in the end, can increase "IPC" because you can have more logic blocks, register file, ROB, etc., improving the perf/watt


As per TSMC ~4.1GHz is the best range to run the CPU, and probably around 4.3GHz for N5P which AMD will use
So there are a lot of opportunities made available by the process, but it is very interesting indeed what choices AMD will make this time again.

What is known at this point is the die size, 72mm2. At this size, keeping L3 the same, the Core+L2 for Zen4 is going to be quite small, slightly higher MTr than Sunny Cove at best.

When you think about this, Sony in PS5 SoC still want to remove blocks from the Core, smh.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,004
1,594
136
That's a lot more xtors than needed just for the AVX-512 registers, pipelines, etc. Now I'm really curious what's going on. I think those who said this will be like the Zen1 to Zen2 improvement may be correct. There's the usual suspects like op cache, retire buffer, TLBs, etc. But, how about a larger L2? I wish the Zen3 Wikichip page had as much detail as he had for Zen1.

I think the Gigabyte leaks on AM5 mainboards already revealed that Zen4 will have 1Mbyte of L2 cache.

 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
If one looks at the changes in the cores from Zen to Zen 2 and the ones from Zen 2 to 3 one can notice that the latter makes mostly architectural changes while the former does mostly size changes (wider, larger, more, needing more die area). I'm expecting Zen 4 to follow the pattern of the former.

The rhythm seems to be:
- Ground up re-design, same node optimization. (~Zen, Zen 3)
- Same design optimized and extended to make good use of the additional area afforded by new smaller node. (Zen 2, Zen 4?)

That'd make Mike Clark's excitement about Zen 5 understandable as well considering that's the next ground up re-design in the queue, the first with AMD being the healthy company it is nowadays.

Btw.
Mike Clark said:
So every three years, we're pretty much redesigning it all.
New Zen gen only every 18 months confirmed. @DrMrLordX vindicated
(The interview is actually a little fuzzy on that since later on they talk about another three years later being Zen 8, not 7. But that's by Ian and Clark just seems to play along without really confirming or denying it.)
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
You seem to have taken my point as a lack of faith in AMD's drive to succeed.

All I did was make a purely logical deduction about the necessary time to recoup R&D costs on Zen4 and likely availability of fab capacity on N3 nodes.

Zen4 is unlikely to land before late Q3 2022 - more likely Q4 to prevent it cannibalising Zen3D sales.

Therefore the likelihood of any Zen5 chip launching in 2023 seems low to me.

I could be wrong of course - especially if Zen5 is in fact using some advanced variant of TSMC's 5nm based processes.
Plus Alder Lake and Raptor Lake are not the Core2Duo comeback that some were expecting. Zen3D will even things out until Zen 4 shows up to dominate. Zen4 will dominate gaming and general tasks.
 

SteinFG

Senior member
Dec 29, 2021
517
608
106
View attachment 55740

Very interesting pic there. I hope Dylan is right on the packaging tech. Either that or I just saved myself from another subscription. (But if he is right I will sub him and pay)
Because, I cannot see any hint of any fancy packaging tech in use there, granted the grey structure obscured everything else. I cannot even see the LGA pattern.
My current theory is that AMD is using a fan-out package just on the IO die in order to decrease the cost of its manufacturing.
Looking at die shots of a 12nm server IO die, most of it is taken up by connectors, about 1/3 is logic, and there is a little bit of dead space. The dead space is probably a giveaway that the IO die is at the limit.
Moving the IO to 7nm decreases the bump pitch from 150 to 130 micron, which gives about a 33% increase in IO density. So, connectors can be about 25% smaller. And we actually see this when looking at die shots of Raven Ridge (12nm) vs Renoir (7nm).
But this is not enough: while IO area decreases by about 25%, logic is decreasing by over 55%. This will introduce even more dead space.
Moving to fan-out will decrease the bump pitch to 40 micron (I'm using TSMC info), which will shrink the connectors by up to 93%.
Optimistically, the IO die will shrink by 70-80%, and what we see at the center of this xray is a rectangular fan out package with a small die at the center of it. But because of bad quality it's impossible to see the die itself.
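
The pitch arithmetic in this post can be sketched as follows, assuming bumps sit on a regular 2D grid so that the area per bump scales with the square of the pitch (an idealisation; real bump maps also have keep-out zones and power/ground bumps). The function name is hypothetical.

```python
def area_scale(old_pitch_um: float, new_pitch_um: float) -> float:
    """Area needed per bump relative to the old pitch (grid assumption)."""
    return (new_pitch_um / old_pitch_um) ** 2

results = {}
for new_pitch in (130, 40):
    scale = area_scale(150, new_pitch)
    density_gain = 1 / scale - 1  # extra bumps per unit area
    area_saving = 1 - scale       # area freed for the same bump count
    results[new_pitch] = (density_gain, area_saving)
    print(f"150 -> {new_pitch} micron: density +{density_gain:.0%}, "
          f"connector area -{area_saving:.0%}")
```

Under that assumption, 150 to 130 micron gives about 33% more bumps per unit area, which is the same thing as about 25% less area for a fixed number of bumps, while 150 to 40 micron frees roughly 93% of the connector area.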

edit: I haven't thought about how those RDL wires will carry the signal through the narrow fan out plane, but assuming they are 2/2 micron thick, it's solvable probably
 

DrMrLordX

Lifer
Apr 27, 2000
21,791
11,131
136
It looks a bit jarring and kiddish with those bright colors.

Wellllll


Also this subject is only peripherally related to Zen4, and I'll ding myself for not being able to veer back onto the subject matter within the context of this subconversation! We're all awful people! Or something. Um, hmm. Yeah just waiting for September now folks.
 

desrever

Member
Nov 6, 2021
122
302
106
Finally seeing the "leakers" getting their [redacted] exposed is great.

Also Zen 4 will be great still. A 15% single thread upgrade is quite good for a gen no matter if it comes from IPC or clocks. Alder Lake did about 20% with a much wider core. Does mean AMD will likely need to widen their cores in Zen 5. Multithreaded gains are also looking really good.

I think Zen4D might be even better than Zen3D by combining >5ghz clocks with the power of 3D cache. Might be what I wait for for my upgrade.





esquared
Anandtech Forum Director
 
Last edited by a moderator:

FangBLade

Member
Apr 13, 2022
199
395
106
Why, the TDP of raptor and zen4 is almost identical, why would you not expect similar power use for top models?
How do you think they managed to push 5.5 GHz under multicore load, when my 5950X drops well under 4 GHz at ~140 W? Surely they must match Intel's speed and will push power use to the brink, same as Intel.

You dream up a lot of lies and FUD here; ask yourself why no leak shows power use for Zen 4.
For someone who is new in this topic, you are very offensive. Can I suggest you calm down a little bit? Life is much more than PC hardware, try it.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,414
2,906
136
We got a leak from WCCftech regarding pricing (bad news). According to the leak, AMD will be asking 450USD for 16T part (very bad decision on their part, if true).

Another leak regarding the launch of CPUs and motherboards: https://wccftech.com/amd-ryzen-7000...otherboards-27th-september-b650-10th-october/
I am not surprised.
Zen 4 won't be cheap. You will need a new MB, DDR5 and a new CPU; at least this platform will be with us for a few CPU generations.

Rumors put 13900K at 250W PPT/PL2.

Are you calling me a troll? If so, I guess I should report you, to you. Control yourself.
This thread was closed for a good reason by esquared, so why do you have a need to continue? Please, just stop. BTW this is not meant only for you. Just a few hours left until the announcement, so let's wait patiently. Thank you.
 

Abwx

Lifer
Apr 2, 2011
11,161
3,858
136
We literally got confirmation from AMD's own chief of marketing about that number. Denying it at this point is just silly.

I'm sure there are SKUs/TDPs at which the numbers AMD gave hold true. But not even the most optimistic reading of those numbers gives you 2x vs Raptor Lake. Nor do both numbers have to be simultaneously true for the same chip at the same TDP.

What AMD's representative says in a forum is of no consequence if he makes a blunder, but the pic below would have very costly legal consequences if the stated numbers were made up; I remind you that this was disclosed at their financial day.

Btw, how do you understand those numbers?
Are you only paying attention to their meaning?






But not even the most optimistic reading of those numbers gives you 2x vs Raptor Lake. Nor do both numbers have to be simultaneously true for the same chip at the same TDP.

The 5950X stays within 142 W in Cinebench and scores 26196 pts;
the 12900K scores 4% better but at 240 W.

Hence the 5950X has 1.625x better perf/Watt than the 12900K and the 7950X will have 2.03x the perf/Watt of a 12900K; the 13900K won't be enough to just get back to the 5950X/12900K ratio.
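
The ratio above can be reproduced from the numbers given in this post. A minimal sketch; the generational factor at the end is not stated in the text here, it is only what the 2.03x claim implies when divided by the 5950X/12900K ratio:

```python
# Figures as quoted in the post (Cinebench score and package power)
score_5950x = 26196
power_5950x_w = 142
score_12900k = score_5950x * 1.04  # "4% better"
power_12900k_w = 240

ppw_5950x = score_5950x / power_5950x_w
ppw_12900k = score_12900k / power_12900k_w

ratio = ppw_5950x / ppw_12900k  # 5950X perf/W relative to the 12900K
implied_gen_gain = 2.03 / ratio  # 7950X over 5950X perf/W implied by 2.03x

print(f"5950X vs 12900K perf/W: {ratio:.3f}x")
print(f"Implied 7950X vs 5950X perf/W: {implied_gen_gain:.2f}x")
```

The score itself cancels out, so the ratio reduces to 240 / (142 x 1.04), about 1.625. The 2.03x figure then corresponds to roughly a 1.25x perf/W uplift from the 5950X to the 7950X.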
 

poke01

Golden Member
Mar 8, 2022
1,347
1,519
106
We can finally have a proper node comparison with the TSMC 5nm family of nodes.

Ok the Ryzen 5 7600X scores around 2175 in Geekbench 5. M2 is around 1930 single core.

The base freq of the 7600X is 4.7 GHz with a turbo of 5.3 GHz, and the M2 is 3.5 GHz. I would say Apple is still in the lead in PPW.
Man, Apple has really wide designs.

The 7600X is 13% faster in ST(Geekbench only) vs M2.
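
The 13% figure and the "wide design" remark can be put into numbers. A rough sketch using only the scores and clocks quoted above; it assumes both chips actually ran the single-thread test at the stated peak clocks, and note that points per GHz is only a per-clock proxy, not perf per watt, since no power figures are given here:

```python
# Geekbench 5 single-core scores and peak clocks as quoted in the post
gb5_7600x, clock_7600x_ghz = 2175, 5.3
gb5_m2, clock_m2_ghz = 1930, 3.5

st_lead = gb5_7600x / gb5_m2 - 1  # 7600X single-thread lead over M2

pts_per_ghz_7600x = gb5_7600x / clock_7600x_ghz
pts_per_ghz_m2 = gb5_m2 / clock_m2_ghz
per_clock_ratio = pts_per_ghz_m2 / pts_per_ghz_7600x  # M2 relative to 7600X

print(f"7600X ST lead: {st_lead:.1%}")
print(f"7600X: {pts_per_ghz_7600x:.0f} pts/GHz, M2: {pts_per_ghz_m2:.0f} pts/GHz")
print(f"M2 per-clock advantage: {per_clock_ratio:.2f}x")
```

On these numbers the 7600X leads by a little under 13%, while the M2 does roughly a third more work per clock.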

Intel won't even catch up to AMD in PPW until Lunar Lake.
 
Reactions: Tlh97