Discussion Zen 5 Architecture & Technical discussion


DavidC1

Golden Member
Dec 29, 2023
1,442
2,343
96
I think such claims are best viewed with a healthy skepticism. The history of claims of "just you wait! This will be so much faster when software is optimized!" is pretty grim.
Just a few examples:
-Pentium 4! With SSE optimizations it'll fly and be like magic
-Prescott: Reviewers claimed it scales better with clocks, so at 8GHz...
-Bulldozer: All we need is a perfectly multi-threaded world. Who cares about ST?
-AVX-512 makes integer code amazingly parallelizable and 8x faster.

By the time it matters, we're onto the next-next-next generation.
@Saylick's and your interpretation of AMD's "uplift breakdown" pie chart is over-simplified. It's not as if independent gains from here and there simply add up. Rather, the various changes of different µarch components are interacting. And the end effect on performance depends on the particular workload.
You must still be quite young. CPU engineers have used that kind of graph all this time.

On a general-purpose CPU, certain gains can't be aimed for, because aiming specifically defeats the point.

Also I put it up as an illustration that BIG changes = small gains.
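
The quoted point that gains from different µarch changes interact rather than add up can be illustrated with a toy bottleneck model. This is only a sketch: the stage names and widths are made-up numbers, not Zen 5 figures, and "IPC is capped by the narrowest stage" is a deliberate oversimplification.

```python
# Toy model (an assumption for illustration, not AMD's methodology):
# sustained IPC is capped by the narrowest pipeline stage.
def sustained_ipc(stage_widths):
    """Return the throughput cap implied by the narrowest stage."""
    return min(stage_widths.values())

# Hypothetical baseline widths, in instructions per cycle.
baseline = {"fetch_decode": 4, "dispatch": 6, "execute": 4, "retire": 8}

def gain_from(changes):
    """Relative IPC gain over baseline after widening the given stages."""
    widened = {**baseline, **changes}
    return sustained_ipc(widened) / sustained_ipc(baseline) - 1.0

decode_only = gain_from({"fetch_decode": 8})
execute_only = gain_from({"execute": 6})
both = gain_from({"fetch_decode": 8, "execute": 6})

print(f"decode only:  {decode_only:+.0%}")
print(f"execute only: {execute_only:+.0%}")
print(f"both:         {both:+.0%}")
```

In this toy case each change alone gains nothing because the other stage remains the bottleneck, while both together gain 50%, so a per-component breakdown cannot be read as a simple sum.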
 

Saylick

Diamond Member
Sep 10, 2012
3,866
8,970
136
Mike Clark said in the C&C interview that the Zen 5 core's performance will only get better once software is improved to the point where it can utilize all the expanded features properly. So what we have today is the baseline legacy IPC of this core in current software. I expect that once some time passes (probably after Zen 6 hits the shelves), we will have much better optimized software that makes better use of what is, on paper, a much better architecture.
Dat AMD FineWine Technology. Perfected with their GPUs, and now coming to an AMD CPU near you!
 

yuri69

Senior member
Jul 16, 2013
632
1,096
136
Well one thing is (almost) for sure: the baseline performance cannot get worse with time
That list of mitigations for processor vulnerabilities is getting long...

Anyway, Mr. Clark talked about software taking advantage of that much wider dispatch and/or ALUs. It almost sounded like he was describing a hand-tuned niche and not your standard -O2 binaries.

Just a few examples:
-Pentium 4! With SSE optimizations it'll fly and be like magic
-Prescott: Reviewers claimed it scales better with clocks, so at 8GHz...
-Bulldozer: All we need is a perfectly multi-threaded world. Who cares about ST?
-AVX-512 makes integer code amazingly parallelizable and 8x faster.

By the time it matters, we're onto the next-next-next generation.
NetBurst itself was an example of "we got small caches so recompile pls". Bulldozer was a similar case... but with FMA4 on top of it.

Over time, compilers change their defaults. In theory, they converge on whatever the mainstream settings are at that time.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Just a few examples:
-Pentium 4! With SSE optimizations it'll fly and be like magic
-Prescott: Reviewers claimed it scales better with clocks, so at 8GHz...
-Bulldozer: All we need is a perfectly multi-threaded world. Who cares about ST?
-AVX-512 makes integer code amazingly parallelizable and 8x faster.

There's another one too, but it's too painful to say out loud.
 

soresu

Diamond Member
Dec 19, 2014
3,692
3,031
136
Anyone care to speculate exactly how far AMD have things in the works on their future roadmap?

I seem to remember an interview briefly mentioning Zen5 months before the release of Zen2 in 2019, so in that context it should be safe to assume that Zen8 is already in the conceptual stages at the moment.
 

inquiss

Senior member
Oct 13, 2010
352
527
136
Anyone care to speculate exactly how far AMD have things in the works on their future roadmap?

I seem to remember an interview briefly mentioning Zen5 months before the release of Zen2 in 2019, so in that context it should be safe to assume that Zen8 is already in the conceptual stages at the moment.
Optical interconnects and glass substrates in their plans perhaps
 

soresu

Diamond Member
Dec 19, 2014
3,692
3,031
136
Optical interconnects and glass substrates in their plans perhaps
I would so love for that to be the case.

Optical IO is definitely more dependent on TSMC's long term optical plans and mainboard PCB manufacturers adding fiber waveguides to copper traces.

TSMC are definitely leaning in that direction, but I don't think it's for AMD specifically, they are just gearing up for when PCIe finally stops dickering and finally embraces optical as standard rather than an "also runs on" addendum to a copper based standard.
 

inquiss

Senior member
Oct 13, 2010
352
527
136
I would so love for that to be the case.

Optical IO is definitely more dependent on TSMC's long term optical plans and mainboard PCB manufacturers adding fiber waveguides to copper traces.

TSMC are definitely leaning in that direction, but I don't think it's for AMD specifically, they are just gearing up for when PCIe finally stops dickering and finally embraces optical as standard rather than an "also runs on" addendum to a copper based standard.
Oh, completely. I appreciate that it's dependent on TSMC as the manufacturer, but considering AMD's prowess in taking advantage of these packaging technologies, and generally being at the forefront of their usage, I'd expect them to be an early adopter, at least in higher-cost areas.
 

MS_AT

Senior member
Jul 15, 2024
556
1,170
96
Based on the slides, where they mention that in SMT mode each thread gets a decode pipe, it seems that with SMT enabled in the BIOS we would effectively end up with 4-wide decode for ST workloads. But even assuming the decoders are not statically partitioned but competitively shared, how often, in modern OSes with hundreds of processes in flight, can a single core really execute only a single instruction stream? Hopefully Hot Chips will bring more explanations, as I find this part of the design confusing.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,616
2,375
136
The earliest leaks we had on what to expect from Zen5 said that it uses the same basic L3 design and IOD as Zen4, and then Zen6 is mostly the same core but completely overhauls everything past L2.

I'm hoping for some kind of advanced packaging that reduces the latency between CCD and other CCD/memory.
 

StefanR5R

Elite Member
Dec 10, 2016
6,341
9,760
136
Based on the slides, where they mention that in SMT mode each thread gets a decode pipe, it seems that with SMT enabled in the BIOS we would effectively end up with 4-wide decode for ST workloads. But even assuming the decoders are not statically partitioned but competitively shared, how often, in modern OSes with hundreds of processes in flight, can a single core really execute only a single instruction stream? Hopefully Hot Chips will bring more explanations, as I find this part of the design confusing.
On the one hand, a large number of these processes are just sleeping most of the time, waiting for some sort of event to occur. And when they are woken up, they often don't spend much CPU time processing the event. Ditto for the kernel's interrupt handlers. At least on single-user systems (in the widest sense, including HPC servers), it is nowadays rare for more runnable tasks to be present than there are logical CPUs.
On the other hand, while the period after which the operating system's process scheduler preempts a running task in order to switch to another runnable task is less than the blink of an eye to a human observer, it is more like eons when expressed in CPU cycles. (I wish I had a quick reference to what popular OSes' policies are in this regard...)
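
To put the "eons in CPU cycles" remark into numbers, here is a back-of-the-envelope sketch. The 4 ms timeslice and 5 GHz clock are assumed round values for illustration only, not a statement of any particular OS's scheduling policy.

```python
def cycles_per_timeslice(timeslice_ms: float, clock_ghz: float) -> int:
    """Core clock cycles that elapse during one scheduler timeslice.

    1 ms * 1 GHz = 1e-3 s * 1e9 cycles/s = 1e6 cycles.
    """
    return round(timeslice_ms * clock_ghz * 1_000_000)

# Hypothetical values: a 4 ms timeslice on a core clocked at 5 GHz.
cycles = cycles_per_timeslice(4.0, 5.0)
print(f"{cycles:,} cycles")  # 20,000,000 cycles
```

Even with a much shorter assumed timeslice, a task would still own the core for millions of cycles between preemptions.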
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Based on the slides, where they mention that in SMT mode each thread gets a decode pipe, it seems that with SMT enabled in the BIOS we would effectively end up with 4-wide decode for ST workloads.

I suspect there's some "it depends" here. I would guess that there are counters in place to allow a single thread to use both frontends even if SMT is enabled if there's not significant load from a second thread on the same core.

But even assuming the decoders are not statically partitioned but competitively shared, how often, in modern OSes with hundreds of processes in flight, can a single core really execute only a single instruction stream? Hopefully Hot Chips will bring more explanations, as I find this part of the design confusing.

I'm not really sure what you mean by this - can you elaborate? Most of those hundreds of processes are idle most of the time. (There's a reason that in HPC systems you sometimes see all of a node's "housekeeping" functionality bound to 1-2 cores, to reduce jitter on the rest.)
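
The "mostly idle" claim is easy to check on a Linux box. A minimal sketch, assuming Linux: the fourth field of /proc/loadavg has the form runnable/total scheduling entities. The helper name and the sample line are made up for illustration, and the sample is used as a fallback on other systems.

```python
import os

def runnable_entities(loadavg_text: str) -> int:
    """Parse the 'runnable/total' field (4th field) of /proc/loadavg."""
    running, _total = loadavg_text.split()[3].split("/")
    return int(running)

# Illustrative sample line, used when /proc/loadavg is unavailable.
sample = "0.20 0.18 0.12 1/80 11206"

try:
    with open("/proc/loadavg") as f:
        text = f.read()
except OSError:
    text = sample

# The count includes this script itself while it is reading the file.
print(runnable_entities(text), "runnable vs", os.cpu_count(), "logical CPUs")
```

On a lightly loaded desktop the runnable count is typically far below the number of logical CPUs, although a single snapshot says nothing about short bursts.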
 

MS_AT

Senior member
Jul 15, 2024
556
1,170
96
I'm not really sure what you mean by this - can you elaborate? Most of those hundreds of processes are idle most of the time. (There's a reason that in HPC systems you sometimes see all of a node's "housekeeping" functionality bound to 1-2 cores, to reduce jitter on the rest.)
On the one hand, a large number of these processes are just sleeping most of the time, waiting for some sort of event to occur. And when they are woken up, they often don't spend much CPU time processing the event. Ditto for the kernel's interrupt handlers. At least on single-user systems (in the widest sense, including HPC servers), it is nowadays rare for more runnable tasks to be present than there are logical CPUs.
I am simply concerned that even if there is an algorithmic method to let a single thread use both decoders, in practice it will rarely kick in because of the noise the OS generates in normal day-to-day operation. That is at least one possible explanation, and I'm not saying it's a valid one, for the results David Huang got on an ES Strix Point sample, where he could not get both decoders to work in his single-thread test. Of course, the usual caveats apply: this was an ES without final firmware, which might have somehow skewed the decode arbitration algorithm, etc.
 

Saylick

Diamond Member
Sep 10, 2012
3,866
8,970
136
C&C posted an article on Zen 5's dual decoders. Enjoy!
 

MS_AT

Senior member
Jul 15, 2024
556
1,170
96
I’m not even going to pretend I know what this is talking about, but could perhaps someone with a little more know how enlighten us who are microarchitecturally challenged?
While I have yet to read the whole thing, this quote from the introduction (machine translated) is interesting
It can be seen that AMD intentionally limited the performance of the processor front-end and some instruction combinations in previous microcodes. This is why we usually cannot easily trust the test results of ES or even QS processors.
 

BorisTheBlade82

Senior member
May 1, 2020
688
1,085
136
From the above tests, we can see that Zen 5's microarchitecture has indeed undergone considerable changes, especially the most important front-end part. There are some quite amazing new designs, which can be said to lay the foundation for the future development of AMD64. At the same time, Zen 5 has achieved an improvement in peak performance without major upgrades in process and cache/memory structure and almost unchanged processor CCD area.
I guess this is what Mike Clark was already so excited about several years ago. Zen5 was not meant as one single generation to rule them all, but might be seen as a foundation for the coming years that will reap the benefits when process technology allows them to make better use of that new front-end. IMHO it is quite possible that they had to scale Zen5 back a bit because N3 was not financially feasible.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,756
106
I guess this is what Mike Clark was already so excited about several years ago. Zen5 was not meant as one single generation to rule them all, but might be seen as a foundation for the coming years that will reap the benefits when process technology allows them to make better use of that new front-end. IMHO it is quite possible that they had to scale Zen5 back a bit because N3 was not financially feasible.
Exactly. Zen5 is the biggest shift in microarchitecture since the original Zen. It's a revolution, not an evolution. Zen 5 lays the foundation for the next 3-4 generations of Zen.

Also, note that Mike Clark is an engineer, a CPU architect. Unlike us enthusiasts, engineers don't just care about what end product they achieved (IPC gains, performance improvement), but also about what it took to get there (microarchitectural changes, design choices). So you can see why Mike Clark was excited about Zen 5.
 

Nothingness

Diamond Member
Jul 3, 2013
3,279
2,329
136
I’m not even going to pretend I know what this is talking about, but could perhaps someone with a little more know how enlighten us who are microarchitecturally challenged?
Some surprises for me:
- increased latency of integer add for 128/512 bits; they might have homogenized the integer SIMD ALU pipe stages; I wonder what the impact will be on carefully hand-tuned code (but some of the 512-bit IMUL still have a latency of 1; odd)
- three 64-bit integer multipliers, up from one; that’s strange, as these instructions are rarely a bottleneck (except in things like GMP and RSA); if this is confirmed, Zen5 might prove great for computational number theory.
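
To show why the multiplier count matters for GMP/RSA-style code, here is a rough sketch of schoolbook multi-limb multiplication that counts the 64x64 multiplies it needs. It is illustrative Python with made-up helper names, not how GMP is actually implemented (GMP uses hand-written assembly and faster algorithms for large operands).

```python
MASK64 = (1 << 64) - 1

def to_limbs(n):
    """Split a non-negative integer into 64-bit limbs, least significant first."""
    limbs = []
    while True:
        limbs.append(n & MASK64)
        n >>= 64
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs."""
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))

def schoolbook_mul(a, b):
    """Multiply two limb arrays; also return the number of 64x64 multiplies."""
    out = [0] * (len(a) + len(b))
    mul_count = 0
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # One 64x64 -> 128-bit multiply per inner step, plus two adds.
            t = ai * bj + out[i + j] + carry
            mul_count += 1
            out[i + j] = t & MASK64
            carry = t >> 64
        out[i + len(b)] = carry
    return out, mul_count

# Two 2048-bit operands (32 limbs each), as in RSA-sized arithmetic.
x = (1 << 2047) + 12345
y = (1 << 2048) - 1
product, muls = schoolbook_mul(to_limbs(x), to_limbs(y))
assert from_limbs(product) == x * y
print(muls, "64-bit multiplies")  # 1024 64-bit multiplies
```

A single 2048-bit product already takes about a thousand dependent-on-throughput 64-bit multiplies in this scheme, which is the kind of inner loop where extra multiplier ports could plausibly help.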

Regarding branch prediction, nice improvements on some tests such as gcc. The improvement is lower on the usual suspects that have more random branch resolution (xz compression, leela/deepsjeng game engines). This part looks definitely better.
 