Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

yuri69 · Apr 5, 2023

biostud said:
Wouldn't we expect generational improvements to be in the range 20-30% single core?

Not really. Zen 3 is also a new gen based on a new architecture. The quoted IPC gain figure is 19%. Note, this <20% increase was made compared to Zen 2 which is still a 1st gen product in terms of the CCX & L3 topology.

biostud · Apr 5, 2023

yuri69 said:
Not really. Zen 3 is also a new gen based on a new architecture. The quoted IPC gain figure is 19%. Note, this <20% increase was made compared to Zen 2 which is still a 1st gen product in terms of the CCX & L3 topology.

I'm not talking about IPC, but rather IPC+clock increase.

yuri69 · Apr 5, 2023

Oh, my bad.

But still, even 6.5GHz would be only ~11% above the current 5.85GHz. Is such a figure viable on TSMC 4nm?

Beating the 6GHz mark means ~3%.

DisEnchantment · Apr 5, 2023

1800X --> 4GHz
1800X --> 3950X (15% IPC * 15% Clocks @ 4.6 GHz) --> 32% perf
3950X --> 5950X (19% IPC * 6% Clocks @ 4.9 GHz) --> 27% perf
5950X --> 7950X (13% IPC * 16% Clocks @ 5.7 GHz) --> 31% perf
Zen 3 just a bit behind in generational gain.

From the above trends, +30% perf seems possible, but it is just a projection, not a guarantee.
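The compounding above can be checked with a short Python sketch. It assumes the IPC gains and peak clocks quoted in this post, and it derives each clock gain from the listed GHz figures rather than from the rounded percentages. The names `GENERATIONS` and `compound_gain` are made up for illustration.

```python
# Rough check of the generational estimates quoted above.
# Each tuple: (label, IPC gain as a fraction, old peak clock GHz, new peak clock GHz).
# All figures are the poster's estimates, not official AMD numbers.
GENERATIONS = [
    ("1800X -> 3950X", 0.15, 4.0, 4.6),
    ("3950X -> 5950X", 0.19, 4.6, 4.9),
    ("5950X -> 7950X", 0.13, 4.9, 5.7),
]


def compound_gain(ipc_gain, old_ghz, new_ghz):
    """Single-thread gain when the IPC gain and the clock gain multiply."""
    return (1.0 + ipc_gain) * (new_ghz / old_ghz) - 1.0


for label, ipc, old, new in GENERATIONS:
    print(f"{label}: {compound_gain(ipc, old, new):.0%}")
```

With these inputs the three steps come out at roughly 32%, 27% and 31%, matching the figures listed.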

yuri69 said:
But still, even 6.5GHz would be only ~11% above the current 5.85GHz. Is such a figure viable on TSMC 4nm?

Beating the 6GHz mark means ~3%.

I believe 4~5% at best.

Exist50 · Apr 5, 2023

cortexa99 said:
BREAKING: RISC-V conference held by Tenstorrent accidentally leaks Zen5 performance; it also includes NVIDIA Grace, which is still only projected

View attachment 79037

Note they also have a typo. "Xenon" instead of (which?) "Xeon". I think the far more likely explanation is that the presentation is just a bit shoddily put together, and the Zen 5 number is a projection, not insider knowledge.

inf64 · Apr 5, 2023

Geddagod said:
This 'projected' Zen 5 shows a ~25% IPC gain, and a bump in frequency. Hmm

AMD was targeting a 40%+ IPC bump from the Zen 1 core. Excavator => Zen 1 was ~50%; Zen 1 => Zen 3 was ~41%; Zen 3 => Zen 5 could be ~40%, which puts it at around 23-25% higher IPC vs vanilla Zen 4.
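As a rough check on the 23-25% figure, here is a minimal Python sketch. It assumes the ~13% Zen 3 => Zen 4 IPC gain quoted earlier in the thread and treats the ~40% Zen 3 => Zen 5 target as pure speculation. The helper name `remaining_gain` is hypothetical.

```python
def remaining_gain(total_gain, already_delivered):
    """Gain still needed to reach total_gain after already_delivered has compounded in."""
    return (1.0 + total_gain) / (1.0 + already_delivered) - 1.0


# Assumptions: ~40% Zen 3 => Zen 5 target, ~13% Zen 3 => Zen 4 IPC gain.
print(f"{remaining_gain(0.40, 0.13):.1%}")  # about 23.9%
print(f"{remaining_gain(0.41, 0.13):.1%}")  # about 24.8%
```

So a two-generation target of 40-41% lands in the quoted 23-25% band over Zen 4.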

A/// · Apr 6, 2023

inf64 said:
AMD was targeting a 40%+ IPC bump from the Zen 1 core. Excavator => Zen 1 was ~50%; Zen 1 => Zen 3 was ~41%; Zen 3 => Zen 5 could be ~40%, which puts it at around 23-25% higher IPC vs vanilla Zen 4.

This reads right. The AMD fan in me hopes they can hit a higher number. It's why I'm hesitant to buy a few 7950Xs, but I can't bear another summer with hot Intel systems, even with two sources of AC to cool things down.

Joe NYC · Apr 6, 2023

Exist50 said:
Note they also have a typo. "Xenon" instead of (which?) "Xeon". I think the far more likely explanation is that the presentation is just a bit shoddily put together, and the Zen 5 number is a projection, not insider knowledge.

Yes.

And I would also be curious about the numbers for Grace, which I also don't believe.

NostaSeronx · Apr 6, 2023

Zen5 might be insider knowledge...

Personal information removed

They might have yoinked the Zen5 team from India.

The Zen5 numbers might be actual. Beating 6 ALU designs from ARM's/NVIDIA's V2/Grace and Tenstorrent's Ascalon in "Scalar Competition Landscape."

So, I assume it has a better front-end/load-store and at least 6 ALUs to get better "Scalar" scores than any architecture listed.

DisEnchantment · Apr 6, 2023

NostaSeronx said:
Zen5 might be insider knowledge...

They might have yoinked the Zen5 team from India.

The Zen5 numbers might be actual. Beating 6 ALU designs from ARM's/NVIDIA's V2/Grace and Tenstorrent's Ascalon in "Scalar Competition Landscape."
View attachment 79101
So, I assume it has a better front-end/load-store and at least 6 ALUs to get better "Scalar" scores than any architecture listed.

Just an anecdote to share: there is a high-level manager from AMD (I don't want to name names) working out of Bangalore, India, who went to Tenstorrent and took a few guys from his team with him.

This is the state of the progressions of AMD Zen Cores (I have CPUs from all the Zen generations, 1700X, 3400G, 3900X, 5950X, 7950X and I know how they perform relative to each other.)
1800X --> 3950X (15% IPC * 15% Clocks @ 4.6 GHz) --> 32% perf
3950X --> 5950X (19% IPC * 6% Clocks @ 4.9 GHz) --> 27% perf
5950X --> 7950X (13% IPC * 16% Clocks @ 5.7 GHz) --> 31% perf

For Server I have the 7571 (Planning to go to Genoa) and below is what I calculate for EPYC
7601 (3.2G) --> 7742 (3.4G)(15% IPC * 6% Clocks) --> 21% perf
7742 (3.4G) --> 7763 (3.5G)(19% IPC * 3% Clocks) --> 22% perf
7763 (3.5G) --> 9554 (3.75G)(13% IPC * 7% Clocks) --> 23% perf
These are standard SKUs, then there are the F SKUs which are clocked much higher.

However, from Tenstorrent slides
Zen 1/Naples (4.30) --> Zen 2/Rome (4.56) --> 6% Spec2017 Int perf?
Zen 2/Rome (4.56) --> Zen 3/Milan (5.91) --> 29 % Spec2017 Int perf
Zen 3/Milan (5.91) --> Zen 4/Genoa (6.8) --> 15% Spec2017 Int perf?

You can see numbers are all over the place.
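The generation-over-generation uplifts implied by the slide can be recomputed from the four scores quoted above. This is a minimal sketch: the scores are taken as listed in this post, and `SLIDE_SCORES` and `uplifts` are illustrative names.

```python
# Scores as quoted from the Tenstorrent slide in this post (SPEC2017 int).
SLIDE_SCORES = [
    ("Zen 1/Naples", 4.30),
    ("Zen 2/Rome", 4.56),
    ("Zen 3/Milan", 5.91),
    ("Zen 4/Genoa", 6.8),
]


def uplifts(scores):
    """Return (label, fractional uplift) for each consecutive pair of entries."""
    return [
        (f"{old_name} -> {new_name}", new_score / old_score - 1.0)
        for (old_name, old_score), (new_name, new_score) in zip(scores, scores[1:])
    ]


for label, gain in uplifts(SLIDE_SCORES):
    print(f"{label}: {gain:.1%}")
```

This gives roughly 6.0%, 29.6% and 15.1%, an uneven pattern compared with the steadier per-generation estimates earlier in the post.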

While I am excited for RISC-V this slide looks like a big marketing nothing.
And like I mentioned before V2 does not come close to Milan nor is Genoa getting beaten by SPR.
And how come Zen 5 is not 'projected' performance but Grace performance is projected,
when NV themselves have already provided the SPECrate2017_int_base estimate for it?

EPYC Milan 7763 with 64 Cores is much higher (>400) than NV Single Grace CPU with 72 Cores. (This result is with AOCC compiler, but many older AOCC optimizations made it to GCC/LLVM now)

CPU2017 Integer Rate Result: ASUSTeK Computer Inc. ASUS RS520A-E11(KMPA-U16) Server System 2.45 GHz, AMD EPYC 7763

CINT2017 result for ASUS RS520A-E11(KMPA-U16) Server System 2.45 GHz, AMD EPYC 7763; SPECrate2017_int_base: 436; SPECrate2017_int_peak: 465

www.spec.org

I would not look too much into this disclosure.

NostaSeronx · Apr 6, 2023

DisEnchantment said:
I would not look too much into this disclosure.

The reason the results might differ is that this is a pure mid-core/"Scalar" benchmark rather than SIMD/Vector.

So the benchmark given covers only the front-end/mid-core/back-end, not the FP/Vec units that can do packed integer, which ARM and Intel optimized for in GCC 12, etc. The benchmark uses pure base instruction sets. RVV isn't at 2.0 and thus isn't fully frozen, so they would probably be trounced in SIMD benchmarks.
2x Vec256 (Ascalon)
vs
2x Vec512(3x Vec256) (GoldencoveX)
vs
6x Vec256(dual-pump Vec512) (Zen4)

Contextual clues, with "Scalar" in the title, clearly imply that they aren't using SIMD in this specint_rate bench.

DisEnchantment said:
And how come Zen 5 is not 'projected' performance but Grace performance is projected?

Tenstorrent has architects from Zen 5 on staff, but they didn't bother testing Graviton3 (Neoverse V1)/Grace (Neoverse V2), both of which are projected.

DisEnchantment · Apr 6, 2023

NostaSeronx said:
2x Vec256 (Ascalon)
vs
2x Vec512(3x Vec256) (GoldencoveX)
vs
6x Vec256(dual-pump Vec512) (Zen4)

Are you sure SPEC CPU®2017 Integer otherwise (outside of this slide) includes wide vector tests? If that were the case, AVX CPUs like SPR/Genoa would decimate everything else.
The other inconsistency is that Genoa is stronger in int than in float relative to SPR, but apparently not in this slide.

NostaSeronx · Apr 6, 2023

DisEnchantment said:
Are you sure SPEC CPU®2017 Integer otherwise (outside of this slide) includes wide vector tests? If that were the case, AVX CPUs like SPR/Genoa would decimate everything else.
The other inconsistency is that Genoa is stronger in int than in float relative to SPR, but apparently not in this slide.

https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/gcc-12 (Rate=1)
https://www.intel.com/content/www/u...ng-innovation-and-performance-with-gcc12.html (not Rate=1)

Biggest improvement of better SIMD/vectorization support shows up in:

520.omnetpp_r

525.x264_r

548.exchange2_r

For Tenstorrent to get the SIMD Auto-vec stuff they would need this patch and/or GCC 14(too late for GCC 13) => https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613260.html

The only way to get 1:1 benchmarks is to turn off SIMD.

Which is why this doesn't include Vector(SIMD), hence focusing on Scalar(Superscalar) performance instead.

x86-64 has auto-vec, ARM has auto-vec, RISC-V does not have auto-vec. Most likely can't run packed SIMDs within RVV since the ISA for it isn't frozen. Doesn't seem best to project SIMD performance till some basis to get SIMD perf is there. Which is not here or there or anywhere yet. Given "Scalar" it makes the most sense that it is only using the Integer units in the mid-core and nothing in FPU's for packed int for each architecture. The highest scores are the ones with more ALUs, not more SIMD units.

Zen2(17h v2) -> Zen3(19h v1) ~~ 1.296x <==
Zen1(17h v1) -> Zen3(19h v1) ~~ 1.374x <--
Zen4(19h v2) -> Zen5(1Ah v1) ~~ 1.3x <==
Zen3(19h v1) -> Zen5(1Ah v1) ~~ 1.496x <--
It is well within the margin of the prior improved-core step [v2 => v1]. It does, however, exceed the prior gen-1-to-gen-1 step [v1 -> v1].
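The two-generation ratios above are just chained single-generation ratios. A minimal Python sketch, assuming the slide scores quoted earlier in the thread (4.30, 4.56, 5.91, 6.8) and taking the ~1.3x Zen4 -> Zen5 step as this post's reading of the slide; `chain` is an illustrative helper.

```python
# Slide scores as quoted earlier in the thread (assumed, not re-measured).
NAPLES, ROME, MILAN, GENOA = 4.30, 4.56, 5.91, 6.8
ZEN4_TO_ZEN5 = 1.3  # assumption: the ~1.3x step read off the slide in this post


def chain(*ratios):
    """Multiply per-generation ratios into one multi-generation ratio."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


zen2_to_zen3 = MILAN / ROME
zen1_to_zen3 = chain(ROME / NAPLES, MILAN / ROME)
zen3_to_zen5 = chain(GENOA / MILAN, ZEN4_TO_ZEN5)

print(round(zen2_to_zen3, 3), round(zen1_to_zen3, 3), round(zen3_to_zen5, 3))
```

Under those assumptions the ratios come out at about 1.296, 1.374 and 1.496, the same as the figures listed.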

Edit: Maybe, it does use SIMDs... https://www.anandtech.com/show/16778/amd-epyc-milan-review-part-2/6
Xeon score is the same as 8380
Milan score is the same as 7763
Rome score is the same as 7742
Graviton2 doesn't seem to be based on Anandtech.

¯\_(ツ)_/¯

DisEnchantment · Apr 6, 2023

Is this Strix's LLC/IFC?
Quite intriguing that they designed the LLC with a CCS interface. It can plug right into an SDP instead of a UMC, or can plug into a GMI interface.
Really interesting as well to see if console APUs go for this.
Could be an adaptation of MCD?

20230105709 : CACHE ALLOCATION POLICY

CACHE ALLOCATION POLICY - Advanced Micro Devices, Inc.

A cache includes an upstream port, a downstream port, a cache memory, and a control circuit. The control circuit temporarily stores memory access requests received from the upstream

www.freepatentsonline.com

Doug S · Apr 6, 2023

DisEnchantment said:
However, from Tenstorrent slides
Zen 1/Naples (4.30) --> Zen 2/Rome (4.56) --> 6% Spec2017 Int perf?
Zen 2/Rome (4.56) --> Zen 3/Milan (5.91) --> 29 % Spec2017 Int perf
Zen 3/Milan (5.91) --> Zen 4/Genoa (6.8) --> 15% Spec2017 Int perf?

You can see numbers are all over the place.

Typically companies use the latest and greatest compiler (their own or whatever their standard is) to produce SPEC benchmarks. However, they typically don't re-run the benchmarks on old hardware with newer compilers. So the benchmarks tend to include compiler advances along with hardware improvements. Most of the time this will affect FP and SIMD more than generic integer code.

The amount of effort they put forth to find optimal settings matters too, at least for peak results (I always ignore those and look at base), so the results might not be as good as they could be if they went all out. Back in the RISC days vendors used to consider SPEC the gold standard benchmark and put a lot of work into making the results look as good as possible. Nowadays I don't think Intel & AMD care all that much about SPEC; they put most of their effort into other benchmarks.

Some will say that benchmarks using the same code for everything, like Geekbench (at least within versions like 5.0, 6.0, etc.), are superior to SPEC for this reason. But when there are major changes in an architecture (like P4 to Core, Bulldozer to Zen, x86-32 to x86-64) you often need to update the compiler to fully exploit them, and you just won't see the full effect with that approach. Though to be fair, you also won't see the full effect until the applications you care about have been updated.

A/// · Apr 6, 2023

Things should get interesting now that Raja is on the board @ Tenstorrent.

BorisTheBlade82 · Apr 6, 2023

DisEnchantment said:
View attachment 79110
View attachment 79111

Is this Strix's LLC/IFC?
Quite intriguing that they designed the LLC with a CCS interface. It can plug right into an SDP instead of a UMC, or can plug into a GMI interface.
Really interesting as well to see if console APUs go for this.
Could be an adaptation of MCD?

20230105709 : CACHE ALLOCATION POLICY

CACHE ALLOCATION POLICY - Advanced Micro Devices, Inc.

A cache includes an upstream port, a downstream port, a cache memory, and a control circuit. The control circuit temporarily stores memory access requests received from the upstream

www.freepatentsonline.com

That looks much different to what I would have expected. So am I seeing this right that only one of the MCs leads to/from the LLC? What implications does this have? What is this about?

BorisTheBlade82 · Apr 6, 2023

A/// said:
Things should get interesting now that Raja is on the board @ Tenstorrent.

Really? Hasn't it been officially stated that he wanted to found his own company?

Exist50 · Apr 6, 2023

BorisTheBlade82 said:
Really? Hasn't it been officially stated that he wanted to found his own company?

They're not mutually exclusive. Really, quite common for execs at one company to sit on the board of another. In practice, doesn't mean much for Tenstorrent.

DisEnchantment · Apr 6, 2023

BorisTheBlade82 said:
That looks much different to what I would have expected. So am I seeing this right that only one of the MCs leads to/from the LLC? What implications does this have? What is this about?

Just a typical patent thing, trying to cover as many scenarios as possible, but as stated somewhere within, the other memory interfaces may or may not be present.

BorisTheBlade82 · Apr 6, 2023

DisEnchantment said:
Just a typical patent thing, trying to cover as many scenarios as possible, but as stated somewhere within, the other memory interfaces may or may not be present.

Yep, thanks for clarifying. I sometimes forget that patents try to achieve very different goals than actual implementations.

Thunder 57 · Apr 6, 2023

A/// said:
Things should get interesting now that Raja is on the board @ Tenstorrent.

As I have said elsewhere, that guy is the epitome of failing upwards.

A/// · Apr 6, 2023

Thunder 57 said:
As I have said elsewhere, that guy is the epitome of failing upwards.

I was hesitant posting that news this morning. Raja, for a long time, has been a figure that either garners hatred or praise, and sometimes indifference. He's done his own good and bad. I would argue that RDNA was his brainchild due to how long it takes for a GPU or CPU to come to fruition. Without having access to internal documents it's difficult to say what he had his fingers in. Jim Keller to me seems like a no BS type of guy and he wouldn't have had Raja on his board if he was a tool or useless individual like so many claim he is. The infamy surrounding Raja travels with him heavy like the scent of a moonflower in the summer evening intoxicating all around it with its simplicity through uniqueness.

A/// · Apr 7, 2023

Hasan's trash heap has an interesting article pointing to Jim Keller as giving some insight into Zen 5 performance estimates. It's in line with the estimates here of 25% IPC over vanilla Zen 4, hopefully a little more. It would be like Zen 2 to Zen 3, but far greater, placing yet another boot on Intel's throat and not allowing them breathing room. Going by what I remember from the Intel rumors, Zen 5 should leave Intel's server, workstation, mainstream and mobile lines dead in the water unless they can pull off one big surprise.

nicalandia · Apr 7, 2023

Zen4 already crushes Sapphire Rapids in SPEC 2017,
