Discussion Zen 5 Architecture & Technical discussion


ryanjagtap

Member
Sep 25, 2021
134
158
96
I think that both the Zen 5 and Zen 5c cores in STX are designed to have double pumped AVX-512 (2x256). We might see a bigger delta in performance between Granite Ridge and Strix Point.
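For anyone unsure what "double pumped (2x256)" means in practice, here is a toy sketch, not a model of the real hardware. It assumes a 512-bit vector of sixteen 32-bit lanes and a hypothetical 256-bit datapath that handles eight lanes per pass; the function names are made up for illustration. The point is only that the result is identical to a native 512-bit operation, while the operation takes two passes instead of one.

```python
MASK32 = 0xFFFFFFFF  # wrap each lane to 32 bits, like packed integer adds

def add_256(a, b):
    """One pass through a hypothetical 256-bit datapath: 8 x 32-bit lanes."""
    if len(a) != 8 or len(b) != 8:
        raise ValueError("a 256-bit pass handles exactly 8 lanes")
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def add_512_double_pumped(a, b):
    """512-bit add issued as two 256-bit halves; returns (result, passes)."""
    low = add_256(a[:8], b[:8])    # first pass: lanes 0-7
    high = add_256(a[8:], b[8:])   # second pass: lanes 8-15
    return low + high, 2

def add_512_native(a, b):
    """Same add on a full-width datapath; returns (result, passes)."""
    return [(x + y) & MASK32 for x, y in zip(a, b)], 1

a = list(range(16))
b = [MASK32] * 16  # adding 0xFFFFFFFF wraps around, i.e. subtracts 1
pumped, pumped_passes = add_512_double_pumped(a, b)
native, native_passes = add_512_native(a, b)
print(pumped == native, pumped_passes, native_passes)
```

Real timing depends on scheduling, port counts and clocks, none of which this sketch captures.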
 

Nothingness

Diamond Member
Jul 3, 2013
3,137
2,153
136
They can get more use out of the same capacity. The raw capacity has been lowered from 6.75K to 6.0K, but given this, it may actually not be a regression.
Yes, my understanding is that macro-ops are very large.
The change also means that either the conversion to macro-ops is done after the op cache, or macro-ops as they exist in Zen 4 have simply disappeared. I’m not familiar enough with the Zen uarch to deduce anything from that change.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,933
96
They can get more use out of the same capacity. The raw capacity has been lowered from 6.75K to 6.0K, but given this, it may actually not be a regression.
So,

-4 to 8-wide decode/fetch and 33% increased dispatch
-Zero bubble branch + larger L1 BTB(11x!)
-Wider OpCache associativity and 33% more bandwidth
-Much larger ROB, PRF, and scheduler entries
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth

16% gains. That's why people are disappointed.
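A quick sanity check on the L1 line in the list above, as a sketch: assuming 64-byte cache lines (standard on x86, but not stated in this thread), the number of sets is capacity / (ways x line size). Going from 32KB 8-way to 48KB 12-way then keeps the set count the same and adds capacity purely through extra ways.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines

def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets = capacity / (associativity * line size)."""
    per_set = ways * line_bytes
    if size_bytes % per_set != 0:
        raise ValueError("capacity must be a multiple of ways * line size")
    return size_bytes // per_set

old_sets = num_sets(32 * 1024, 8)    # 32KB, 8-way
new_sets = num_sets(48 * 1024, 12)   # 48KB, 12-way
bytes_per_way = new_sets * LINE_BYTES

print(old_sets, new_sets, bytes_per_way)  # 64 64 4096
```

With those numbers each way spans 4KB, so the index bits do not change between the two configurations.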
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
So,

-4 to 8-wide decode/fetch and 33% increased dispatch
-Zero bubble branch + larger L1 BTB(11x!)
-Wider OpCache associativity and 33% more bandwidth
-Much larger ROB, PRF, and scheduler entries
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth

16% gains. That's why people are disappointed.
It all seems fine to me with the caveat that N4P isn't a major improvement over N5 and the die size remained the same.
The technical disappointment is that it's the first Zen that didn't come with a clock rate improvement.

But the real reason people are disappointed is mainly not technical (some people extrapolated from Turin or made stuff up).
 

Nothingness

Diamond Member
Jul 3, 2013
3,137
2,153
136
So,

-4 to 8-wide decode/fetch and 33% increased dispatch
-Zero bubble branch + larger L1 BTB(11x!)
-Wider OpCache associativity and 33% more bandwidth
-Much larger ROB, PRF, and scheduler entries
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth

16% gains. That's why people are disappointed.
That seems to be the first iteration of a new design (not clean sheet, but still lots of changes). The next iteration(s) will pick the low-hanging fruit from that point, along with, hopefully, improved processes.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
So,

-4 to 8-wide decode/fetch and 33% increased dispatch
-Zero bubble branch + larger L1 BTB(11x!)
-Wider OpCache associativity and 33% more bandwidth
-Much larger ROB, PRF, and scheduler entries
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth

16% gains. That's why people are disappointed.

That's how it works. Golden Cove had massive changes that bought <20% too.

Big-ticket items do not necessarily translate to huge gains on their own.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,933
96
That's how it works. Golden Cove had massive changes that bought <20% too.

Big-ticket items do not necessarily translate to huge gains on their own.
See, this is where the few % matters. At 19%, there would be a lot fewer complaints, and the 16% includes the Geekbench SHA result, which is boosted by the AVX-512 enhancements.

The disappointment of course goes to Intel with 14% as well.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,933
96
Reminder: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/zen-5-architecture-technical-discussion.2619688/post-41253043

2.05%
-Zero bubble branch + larger L1 BTB(11x!), doubled Fetch

4.29%
-4 to 8-wide decode
-Wider OpCache associativity and 33% more bandwidth

5.38%
-Much larger ROB(448 entry), PRF, and scheduler entries
-More ALUs
-33% increased dispatch, rename, retire

4.29%
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
See, this is where the few % matters. At 19%, there would be a lot fewer complaints, and the 16% includes the Geekbench SHA result, which is boosted by the AVX-512 enhancements.

The disappointment of course goes to Intel with 14% as well.

If there's one thing I have learned in the semi industry, it's that there is no such thing as a free lunch.

AMD did a 15%-ish perf bump with no major shrink and seemingly no major increase in core power. That's a good thing.
 

DavidC1

Golden Member
Dec 29, 2023
1,211
1,933
96
If there's one thing I have learned in the semi industry, it's that there is no such thing as a free lunch.

AMD did a 15%-ish perf bump with no major shrink and seemingly no major increase in core power. That's a good thing.
Yea, I was skeptical of large predictions in the beginning. It is a good advancement in itself.

Another thing to know is: Learn to get used to disappointments.
 

StefanR5R

Elite Member
Dec 10, 2016
6,058
9,107
136
Reminder: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/zen-5-architecture-technical-discussion.2619688/post-41253043

2.05%
-Zero bubble branch + larger L1 BTB(11x!), doubled Fetch

4.29%
-4 to 8-wide decode
-Wider OpCache associativity and 33% more bandwidth

5.38%
-Much larger ROB(448 entry), PRF, and scheduler entries
-More ALUs
-33% increased dispatch, rename, retire

4.29%
-33% increased Load and 2x increased Store
-32KB 8-way to 48KB 12-way L1 cache
-Doubled L2 cache associativity and bandwidth
@Saylick's and your interpretation of AMD's "uplift breakdown" pie chart is over-simplified. It's not as if independent gains from here and there simply add up. Rather, the various changes of different µarch components are interacting. And the end effect on performance depends on the particular workload.
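To put a number on "doesn't simply add up", here is a rough sketch using the four slice values quoted above. It compares two naive models: adding the slices, and compounding them as if each were an independent multiplier. Neither is claimed to be how AMD derived the chart, and neither captures the interactions between components or the workload dependence.

```python
from math import prod

# Slice values (percent) as quoted from the uplift breakdown above
slices = [2.05, 4.29, 5.38, 4.29]

# Naive model 1: gains simply add
additive = sum(slices)

# Naive model 2: each gain is an independent multiplier
compounded = (prod(1 + s / 100 for s in slices) - 1) * 100

print(f"additive:   {additive:.2f}%")    # 16.01%
print(f"compounded: {compounded:.2f}%")  # 16.97%
```

Even these two toy models disagree by about a percentage point, so reading per-component contributions straight off the chart is shaky.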
 

yuri69

Senior member
Jul 16, 2013
574
1,017
136
Zen 5 is weird.

It is a core that was being worked on for over 6 years. The original Zen was done in about 5 years on a shoestring budget. The gains are not impressive given the 20+ month cadence.

Sure, its "foundation" role of a >4-wide machine is evident from cases such as:
* went back by not implementing the nop fusion due to increased complexity (but it *might* come back in the future, yeah)
* wording used for the unified int scheduler as "symmetry and simplifying pick"

But still...

I'm wondering whether the growing lineup doesn't contribute to the slowdown. AMD now have 4nm/3nm CCDs, narrow/wide FPU, large/small cache, etc. The same goes for the SoCs - 12ch server, 6ch server, 2ch desktop, APU, chiplet APU, MI300/MI400 APUs, etc.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,689
136
Zen 5 is weird.

It is a core that was being worked on for over 6 years. The original Zen was done in about 5 years on a shoestring budget. The gains are not impressive given the 20+ month cadence.

Sure, its "foundation" role of a >4-wide machine is evident from cases such as:
* went back by not implementing the nop fusion due to increased complexity (but it *might* come back in the future, yeah)
* wording used for the unified int scheduler as "symmetry and simplifying pick"

But still...

I'm wondering whether the growing lineup doesn't contribute to the slowdown. AMD now have 4nm/3nm CCDs, narrow/wide FPU, large/small cache, etc. The same goes for the SoCs - 12ch server, 6ch server, 2ch desktop, APU, chiplet APU, MI300/MI400 APUs, etc.
Mike Clark said in the C&C interview that Z5 core performance will only get better once software is improved to the point where it can properly utilize all the expanded features. So what we have today is the baseline legacy IPC of this core in current software. I expect that once some time passes (probably after Zen 6 hits the shelves), we will have much better optimized software that makes better use of what is, on paper, a much better architecture.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Mike Clark said in the C&C interview that Z5 core performance will only get better once software is improved to the point where it can properly utilize all the expanded features. So what we have today is the baseline legacy IPC of this core in current software. I expect that once some time passes (probably after Zen 6 hits the shelves), we will have much better optimized software that makes better use of what is, on paper, a much better architecture.

I think such claims are best viewed with a healthy skepticism. The history of claims of "just you wait! This will be so much faster when software is optimized!" is pretty grim.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,689
136
I think such claims are best viewed with a healthy skepticism. The history of claims of "just you wait! This will be so much faster when software is optimized!" is pretty grim.
Well, one thing is (almost) for sure: the baseline performance cannot get worse with time.
 