Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 
Reactions: richardllewis_01

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
X3D doesn't stomp the KS in lots of scenarios. It's slower than the regular 5800X in non-gaming scenarios, and only dominates in gaming when Alder Lake is paired with DDR4 or slow DDR5 memory. When Alder Lake has fast DDR5, it's practically on equal footing with the X3D.

Also, when you consider that Alder Lake only has 8 big cores, the fact that it can even compete with the 5950x in multithreaded apps is astonishing to me.

5800X3D is still ahead on many games even with DDR5 RAM sticks that are as expensive as the CPU itself.

8 big cores and 8 little cores, because 8 alone is not enough (see 12700K), and pushed beyond sense. The 7950X with its 16 big cores will be a match for Raptor Lake 8 + 16, which will also be pushed beyond sense.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I own both CPUs, and in compiling tasks (I tested it compiling LibreOffice, Firefox, Blender, LuxCoreRender, Chromium and Unreal Engine) the 5950X (@4.5GHz) is on average >15% faster than the 12900K (P @5.0GHz / E @4.0GHz). Both with DDR4 (don't know if DDR5 would change this).

The compiling benchmarks I've seen on Phoronix show that Alder Lake is very strong there, so that's what I am going off.

In encoding tasks, the 5950X is faster than the 12900K more often than it is slower.

It depends on the encoder used. The more SIMD heavy the encoder is, the faster it will likely be on Alder Lake as Alder Lake can do 3x 256 bit loads per cycle while Zen 3 can only do 2.

HEVC and AV1 typically do very well on Alder Lake, especially with the Intel SVT version which is highly optimized.
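
To put the load-width point above in numbers, here is a minimal sketch of peak load bandwidth per clock cycle. It uses only the figures from the post (3 vs 2 loads of 256 bits each); the function name is made up for illustration, and this is a theoretical peak that real encoders will rarely sustain.

```python
LOAD_WIDTH_BITS = 256  # vector load width quoted in the post


def load_bytes_per_cycle(loads_per_cycle: int, width_bits: int = LOAD_WIDTH_BITS) -> int:
    """Peak load bandwidth in bytes per clock cycle (illustrative helper)."""
    return loads_per_cycle * width_bits // 8


alder_lake = load_bytes_per_cycle(3)  # 3 x 256-bit loads per cycle, per the post
zen3 = load_bytes_per_cycle(2)        # 2 x 256-bit loads per cycle, per the post

# prints: 96 64 1.5
print(alder_lake, zen3, alder_lake / zen3)
```

So the claimed advantage is at most 1.5x per cycle on loads alone, before clocks, caches, or the rest of the pipeline come into play.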

In summary: in MT tasks, the 5950X is the way to go: it's clearly faster while consuming much less power than the 12900K.

All in all, I would agree that the 5950x is going to have higher performance per watt for heavy multithreaded applications.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
What is it exactly about SPEC that makes it a poor determinant in the performance of a CPU core?

As a layman, I'm just going by what I've seen from how Anandtech does its benchmarks and commentary from more educated posters.

For example, I've stated multiple times that Alder Lake is exceptionally strong for encoding and compilation workloads and the real world data reflects this I think, especially when you consider that it only has 8 big cores. To me this was never reflected in the SPEC benchmark which has sub benches that actually measure both encoding and compilation. Also, SPEC places the performance of both cores much closer to each other than what I see in real world applications.

On top of that, what is it about CB that makes it such a good determinant in the performance of a CPU core? I mentioned earlier how high branch prediction rates are in CB, which if anything causes a bit of an overrepresentation of the larger ROB Golden Cove brings relative to most other CPU loads.

The M1 has a larger ROB than Golden Cove, but still manages to fall behind both Zen 3 and Tiger Lake:



Thing is that Cinebench isn't even a good representation of how rendering workloads perform. Compare the 12900KS in Cinebench (20% lead over 5950X) to Blender, V-Ray or Corona and you'll find the two are matched in most reviews, or the 5950X pulls ahead.

I don't disagree here. Personally, I only use CB for the single thread performance because I think it isolates the core microarchitecture performance attributes very well. However, for determining the full CPU performance, I don't think it's a good workload because it typically mostly ignores cache and memory performance, both of which are critical to a CPU's overall performance.

The advantage you're seeing in workloads like Cinebench is down to what I mentioned before - the extremely high branch prediction rates in R20/R23. The >99% hit rate for Zen 3 comes from CnC FYI, not from an article but from discussions with them. All you're seeing is Golden Cove's 512 ROB really flex its legs, and you're attributing that to "the core itself being faster", without realising all you're doing is measuring part of the capabilities of each of the cores.

Then how do you explain the M1 and Tiger Lake being slower than the 5950x, which has a smaller ROB than the both of them?
 

Asterox

Golden Member
May 15, 2012
1,028
1,786
136
Thing is that Cinebench isn't even a good representation of how rendering workloads perform. Compare the 12900KS in Cinebench (20% lead over 5950X) to Blender, V-Ray or Corona and you'll find the two are matched in most reviews, or the 5950X pulls ahead.

The advantage you're seeing in workloads like Cinebench is down to what I mentioned before - the extremely high branch prediction rates in R20/R23. The >99% hit rate for Zen 3 comes from CnC FYI, not from an article but from discussions with them. All you're seeing is Golden Cove's 512 ROB really flex its legs, and you're attributing that to "the core itself being faster", without realising all you're doing is measuring part of the capabilities of each of the cores.

Cinebench is just a quick test, so you don't need to download the real apps you use every day for work or fun. Cinebench does not show realistic CPU performance; that can be written in stone.

Here is one example: an R5 4500 comparison in various everyday applications.


Cinebench R23 MT: i5 12400 is 16% faster than R5 4500

Autodesk Maya/Rendering Project: i5 12400 is only 4% faster than R5 4500
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
5800X3D is still ahead on many games even with DDR5 RAM sticks that are as expensive as the CPU itself.

The 5800x3D was found to be 1% faster than the 12900K paired with DDR5-6400 at 1080p in Techspot's 40 game round up.



8 big cores and 8 little cores, because 8 alone is not enough (see 12700K), and pushed beyond sense. The 7950X with its 16 big cores will be a match for Raptor Lake 8 + 16, which will also be pushed beyond sense.

8 big cores and 8 efficiency cores is more than enough for consumer-oriented workloads. Only in workstation-type workloads does it fall behind. I guess we'll see how much help the 16 efficiency cores in Raptor Lake are.

Zen 4 has 16 big cores with SMT, while Raptor Lake will have 8 big cores with SMT and 16 efficiency cores without SMT. It will be a great fight!
 

coercitiv

Diamond Member
Jan 24, 2014
6,384
12,803
136
but I still think CB has its uses; mostly for single core performance.
The utility of CB was not in question here, but rather its role as the better and more comprehensive benchmark over SPEC.

Remember all this started with your story about informed forum members behaving more like a cult and coming up with the seemingly baseless idea that ADL has a single digit IPC lead over Zen3. When presented with a proper basis for this claim under the form of SPEC results, you proceeded to describe SPEC as unreliable (which is ok, arguments can be made here), and then presented Cinebench as the more reliable and accurate benchmark. That's where the logic falls apart.

Again, IPC as we use it on the forums is an average of relative gains or losses. If somebody only focuses on the strengths of an architecture we're always going to have a clash between the 30-40% gains attainable with a narrow scope and a "much lower" 19% geomean.
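
A rough illustration of why the best case and the geomean diverge. The per-workload ratios below are made up for the example (they are NOT measured results for any CPU), and the helper name is hypothetical; only the idea of averaging relative gains with a geometric mean comes from the discussion.

```python
from statistics import geometric_mean

# HYPOTHETICAL performance ratios (new core / old core) per workload.
# Placeholder values for illustration only, not benchmark data.
ratios = {
    "render": 1.35,
    "encode": 1.30,
    "compile": 1.20,
    "compress": 1.10,
    "game_logic": 1.05,
}


def geomean_uplift(per_workload_ratios: dict) -> float:
    """Average relative gain across workloads, as a fraction (0.19 == 19%)."""
    return geometric_mean(per_workload_ratios.values()) - 1.0


best_case = max(ratios.values()) - 1.0
print(f"best single workload: {best_case:.0%}")
print(f"geomean across all:   {geomean_uplift(ratios):.1%}")
```

Anyone quoting only the top row will report a much bigger number than anyone quoting the geomean, even though both are reading the same table.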

 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,404
146
Or maybe 512 ROB + 5 ALUs each with LEA + uOP cache + wide chip overall being able to chew through instructions and branches, even if some are mispredicted? I would not rush to the conclusion that "512 ROB" is the sole enabler; maybe the whole core is wider and stronger?
Having a benchmark that is not completely broken by more L2, or by changing several secondary timings that add 10% to Linpack GFLOPS, has virtues of its own.

I have long and consistently argued against this CB bs (both in "our Threadripper designed by morons with only half of the chiplets connected to memory is 50% faster in CB" and "we ran CB23 and nothing else and found the 12900K 20% faster"), but writing it off because it does not scale with X3D, or because Zen4 might or might not catch GC, is just as stupid.
It's awesome as a well-characterized workload that is not easily beaten and is heavy enough to run into TDP.
When did I say it should be ruled out entirely lol? I have no qualms with it being one of a suite of benchmarks, but I don't believe it is good as the sole representation of the performance of a CPU core. SPEC and Geekbench can take that role as they both include a full suite of benchmarks for varying different workloads. Cinebench isn't even representative of other 3D rendering applications, let alone entirely different CPU benchmarks on top.

To suggest that Cinebench is more representative of the performance of a CPU than SPEC is beyond ridiculous. Simple as.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
To suggest that Cinebench is more representative of the performance of a CPU than SPEC is beyond ridiculous. Simple as.

Even more so when you think Zen 3 saw a 13% increase over Zen 2 in CB R23 ST.

The CB uplift + the Blender uplift being couched as '31% less time than' rather than '45% faster than' just tells me AMD are still keeping their cards close to their chest.
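
For anyone wondering how '31% less time' and '45% faster' can be the same result: throughput is the reciprocal of time. A minimal sketch using only the 31% figure above (the function name is just for illustration):

```python
def speedup_from_time_reduction(time_reduction: float) -> float:
    """Convert 'takes X less time' (0.31 for 31%) into 'is Y faster' as a fraction."""
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    # New time is (1 - X) of the old time, so throughput rises by 1 / (1 - X).
    return 1.0 / (1.0 - time_reduction) - 1.0


# prints: 44.9%
print(f"{speedup_from_time_reduction(0.31):.1%}")
```

1 / 0.69 is about 1.449, which rounds to the '45% faster' headline.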
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The CB uplift + the Blender uplift being couched as '31% less time than' rather than '45% faster than' just tells me AMD are still keeping their cards close to their chest.

That is a given. Even if the IPC gains are single digits, it's obvious that Ryzen 7000 will beat the 5800X3D in gaming just by speed alone (4.5 GHz vs 5.5+ GHz while gaming).
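
As a back-of-the-envelope check on the clock argument, here is a first-order sketch that treats performance as IPC times clock. That model is an assumption for illustration: games often depend on cache and memory latency and will not scale this cleanly. The clocks are the ones quoted above; the 8% IPC figure is a hypothetical stand-in for "single digits".

```python
def combined_uplift(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """First-order model: performance ~ IPC x clock. Ignores cache and memory effects."""
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz) - 1.0


# Clock speeds quoted in the post: 4.5 GHz vs 5.5 GHz while gaming.
clock_only = combined_uplift(0.0, 4.5, 5.5)

# HYPOTHETICAL single-digit IPC gain of 8%, chosen only for illustration.
with_ipc = combined_uplift(0.08, 4.5, 5.5)

print(f"clock alone: {clock_only:.1%}, with assumed 8% IPC: {with_ipc:.1%}")
```

Under this simple model the clock bump alone is worth roughly 22%, before any IPC gain is counted.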

They did not talk about gaming because of the release date of these 7000 products: doing so now would lower the sales of the 5800X3D gaming king.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,260
5,257
136
If Zen 4 is really just 15%, that's too low. They need to leave efficiency behind and unlock it to 200 watts like Intel to top the Cinebench charts lol

You can really only do the unlimited power thing once, and then you are stuck. Because the next chip you release won't look impressive unless you also peg its power to the limit.

Better to leave a healthy reserve in there.

Also 15% is fine. Really it would be very impressive if they can keep giving 15% each generation.
 

deasd

Senior member
Dec 31, 2013
551
865
136
They didn't say anything about the Raphael IGP, right? Weird that there was this much secrecy around it.

There's so much that AMD didn't talk about, like IPC, DDR5 (IMC), IGP, an official performance summary (like Zen 3 vs Zen 2)... Since there are still months until launch, there's no reason to release any more details to the public.
I can feel there's an 'info war' between AMD and Intel; coincidentally, Intel didn't release any Raptor Lake info at Computex either (they would rather talk about MTL/ARL at Hot Chips...).
 
Reactions: lightmanek
Jul 27, 2020
17,808
11,606
106
AMD also likely doesn't want their current CPU sales to come to a grinding halt in anticipation of Zen 4. Wish we knew how many more Zen 3 sales were prompted by this keynote, where people went, "Only 15% IPC increase? Bah! Not worth waiting for. Imma gonna get me a nice Zen 3/ADL system NOW".
 

MarkPost

Senior member
Mar 1, 2017
239
345
136
The compiling benchmarks I've seen on Phoronix show that Alder Lake is very strong there, so that's what I am going off.

Don't know about Linux, never used it, just Windows. These are my compiling benchmarks (Windows 11):

Unreal Engine (VS2019-MSBuild), 5950X 28% faster


Chromium (Ninja), 5950X 14% faster


LibreOffice (Make), 5950X 35% faster


LuxCoreRender (CMake), 5950X 6% faster


Firefox (Clang++), 5950X 6% faster


Blender (CMake), 5950X 14% faster


It depends on the encoder used. The more SIMD heavy the encoder is, the faster it will likely be on Alder Lake as Alder Lake can do 3x 256 bit loads per cycle while Zen 3 can only do 2.

HEVC and AV1 typically do very well on Alder Lake, especially with the Intel SVT version which is highly optimized.

Well, AV1 has been a really poor MT codec until now. And in my experience the 5950X is faster encoding to x265. On the other hand, encoding to x264 is a mixed bag: the 12900K is probably faster, but for some reason when that happens I see very low core % use on the 5950X. I have to see whether that's due to Win11 (I haven't tested in Win10 yet) or not.
 
Jul 27, 2020
17,808
11,606
106
Unreal Engine (VS2019-MSBuild), 5950X 28% faster

Chromium (Ninja), 5950X 14% faster

LibreOffice (Make), 5950X 35% faster

LuxCoreRender (CMake), 5950X 6% faster

Firefox (Clang++), 5950X 6% faster

Blender (CMake), 5950X 14% faster

Wow. Those workloads seem to be killing the 12900K.

Curious questions:

1) Are both systems using the same memory brand/model with identical timings?

2) Did you try forcing 1T command rate on the 12900K?

3) What about disabling the E-cores? How much faster does 5950X come out then?
 
Reactions: nicalandia

Hitman928

Diamond Member
Apr 15, 2012
5,593
8,767
136
Well, AV1 has been a really poor MT codec until now. And in my experience the 5950X is faster encoding to x265. On the other hand, encoding to x264 is a mixed bag: the 12900K is probably faster, but for some reason when that happens I see very low core % use on the 5950X. I have to see whether that's due to Win11 (I haven't tested in Win10 yet) or not.

x264 can't utilize multiple cores as effectively as x265 so it becomes a lighter threaded test which allows the 12900K's high boost speed to stay active longer as well as keeping the workload on only the P cores. When you move to x265, it uses more cores and it has to then face potentially lower frequencies due to power limits as well as bringing in some of the E cores which have far less performance in comparison to both GC and Zen3 cores.
 
Reactions: lightmanek

MarkPost

Senior member
Mar 1, 2017
239
345
136
Wow. Those workloads seem to be killing the 12900K.

Curious questions:

1) Are both systems using the same memory brand/model with identical timings?

2) Did you try forcing 1T command rate on the 12900K?

3) What about disabling the E-cores? How much faster does 5950X come out then?

Yes, same memory, and all timings are set to Auto. I've not tested disabling E-cores yet.
 
Reactions: Tlh97 and Ranulf

MarkPost

Senior member
Mar 1, 2017
239
345
136
x264 can't utilize multiple cores as effectively as x265 so it becomes a lighter threaded test which allows the 12900K's high boost speed to stay active longer as well as keeping the workload on only the P cores. When you move to x265, it uses more cores and it has to then face potentially lower frequencies due to power limits as well as bringing in some of the E cores which have far less performance in comparison to both GC and Zen3 cores.

Yeah, I set x264 threads manually (150% of CPU threads) with little effect in general, especially if it's at 1080p or lower res.
 
Reactions: lightmanek

Tuna-Fish

Golden Member
Mar 4, 2011
1,419
1,749
136
Don't know about Linux, never used it, just Windows. These are my compiling benchmarks (Windows 11):

5950X wins on Linux too, but with a much slimmer margin. I think the reason is that, because of its page coalescing, Zen 3 has a substantially larger L2 data TLB than any other consumer CPU on the market. This was likely added because of the vcache, but on loads that just churn through a lot of data (and especially ones that do random access to massive buffers or trees), the core gets a nice performance boost even without the cache. On Linux, the effect is much smaller because Linux does transparent huge pages (that is, instead of always using 4kB pages unless larger ones are requested by the program, the kernel merges any 512 adjacent 4kB pages it sees into a single 2MB page). This greatly reduces the pressure on the TLBs, and so gives Alder Lake a much greater relative boost.
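
To make the TLB-pressure argument concrete: the memory a TLB can cover ("reach") is its entry count times the page size. A minimal sketch follows. The page sizes and the 512-page merge factor come from the post; the entry count is a placeholder chosen for illustration, not the spec of any particular CPU.

```python
KIB = 1024
MIB = 1024 * KIB

SMALL_PAGE = 4 * KIB                          # 4 kB base page, as in the post
PAGES_PER_HUGE_PAGE = 512                     # 512 adjacent 4 kB pages merge into one
HUGE_PAGE = SMALL_PAGE * PAGES_PER_HUGE_PAGE  # works out to 2 MiB


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by a TLB when every entry maps one page of the given size."""
    return entries * page_size


EXAMPLE_ENTRIES = 2048  # HYPOTHETICAL entry count, for illustration only

small = tlb_reach_bytes(EXAMPLE_ENTRIES, SMALL_PAGE)
huge = tlb_reach_bytes(EXAMPLE_ENTRIES, HUGE_PAGE)
print(f"4 kB pages: {small // MIB} MiB reach, 2 MB pages: {huge // MIB} MiB reach")
```

Whatever the real entry counts are, moving from 4 kB to 2 MB pages multiplies reach by 512, which is why a CPU without coalescing benefits so much more from transparent huge pages.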
 