Question Zen 6 Speculation Thread


MS_AT

Senior member
Jul 15, 2024
448
968
96
Unfortunately I can't find a chart right now which enumerates which resources are partitioned statically vs. dynamically in AMD's current SMT implementation.
A table can be found in the optimization manual; unfortunately I don't have the means to access it right now.
I don't know. The clever people left and formed their own company?
Since we are in an AMD thread, I meant AMD.
 
Jul 27, 2020
22,298
15,554
146
How many believe that Zen 6 will have SMT4? I mean, since their SMT2 works so well (up to 40%), it's only right that they should try increasing the core resource utilization even more by going up to 60% perf uplift through SMT4.
 
Reactions: Tlh97

OneEng2

Senior member
Sep 19, 2022
385
590
106
When it comes to the Zen 6 wishlist, it is surprising nobody wants them to alleviate actual bottlenecks in Zen 5 besides the too-small int reg file.
I actually believe that, aside from adding more cores, this is ALL AMD will do on Zen 6. They will tweak the bottlenecks that can be tweaked with little increase in transistor count, have a faster memory controller with lower latency, and put all this together to make Zen 6 IMO.
How many believe that Zen 6 will have SMT4? I mean, since their SMT2 works so well (up to 40%), it's only right that they should try increasing the core resource utilization even more by going up to 60% perf uplift through SMT4.
Naw. Going from NO SMT to SMT2 pays pretty good dividends (in this case 15% more transistors for 40% more performance).

Going from SMT2 to SMT4 would require a very significant increase in transistor count and might only yield an additional 10% OVER SMT2 (as a wild guess).
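To put rough numbers on that trade-off, here is a minimal sketch of the perf-per-transistor arithmetic. The SMT2 figures (+40% performance for +15% transistors) and the "+10% over SMT2" guess are the ones quoted in this post; the SMT4 transistor cost is not given anywhere in the thread, so `HYPOTHETICAL_SMT4_COST` is purely an assumed placeholder, and the function names are made up for illustration.

```python
def perf_per_transistor(perf_gain, transistor_cost):
    """Relative performance divided by relative transistor budget.

    Both arguments are fractions over a no-SMT baseline core,
    which by definition scores 1.0 / 1.0 = 1.0.
    """
    return (1.0 + perf_gain) / (1.0 + transistor_cost)


def breakeven_cost(perf_gain, target_ratio):
    """Largest transistor cost at which perf_gain still reaches target_ratio."""
    return (1.0 + perf_gain) / target_ratio - 1.0


# Figures quoted in the post: SMT2 = +40% perf for +15% transistors.
smt2_ratio = perf_per_transistor(0.40, 0.15)

# Post's wild guess: SMT4 adds 10% on top of SMT2, i.e. 1.40 * 1.10 = 1.54x.
smt4_perf_gain = 1.40 * 1.10 - 1.0

# ASSUMPTION: the thread gives no SMT4 transistor cost; 30% is a placeholder.
HYPOTHETICAL_SMT4_COST = 0.30
smt4_ratio = perf_per_transistor(smt4_perf_gain, HYPOTHETICAL_SMT4_COST)

# Transistor cost below which SMT4 would match SMT2's perf per transistor.
smt4_breakeven = breakeven_cost(smt4_perf_gain, smt2_ratio)

print(f"SMT2 perf/transistor: {smt2_ratio:.3f}")
print(f"SMT4 perf/transistor: {smt4_ratio:.3f} (assumed cost)")
print(f"SMT4 break-even transistor cost: {smt4_breakeven:.1%}")
```

Under these assumptions SMT4 only matches SMT2's efficiency if its total transistor overhead over a no-SMT core stays below roughly 26.5%, which is the crux of the "very significant increase in transistor count" objection.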

IBM Power 10 has 8-way SMT. It scales really well, but has a die size of >600mm2.... and how many of those are sold? From what I have read, it is less power efficient and less performant than Turin across the board.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,018
2,455
136
Going wider than SMT2 puts a lot of strain on the processor's caches. To get any sort of notable gains from it, you need a very wide core with excessively large caches that is notably heavy in memory utilization.
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Is there a chance that AMD will do full 8-wide design instead of 4+4 or even 10 wide?
No chance... AMD said Zen 5 is their new foundation and they will build upon it, plus it seems that Intel's very wide design is not yielding that much more. I reckon 5+5 with working dual branch prediction without SMT might be very very nice.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,018
2,455
136
There's always a possibility that anything can happen. However, I don't see AMD going 8 wide any time soon, unless it's some fancy bonding technique between both 4+4 decoders. What could happen is going to a pair of 6 + 6 wide decoders. While I wouldn't want to be any part of the team designing it, it is technically possible that both of them could be handling 2 threads each. Things rapidly get more complicated: you suddenly double all of your statically assigned resources and also double the demand on your shared resources, leading to a likely choice to duplicate more of them, which then starts to balloon your chip size further.

All of that for MAYBE a 10% gain in ideal situations and higher power utilization on your front end and a massively more difficult validation process. You rapidly get to a point where four simpler cores would typically be more performant AND cost effective. IBM Power is a very heavily targeted special case.
 

MS_AT

Senior member
Jul 15, 2024
448
968
96
Wider than Skymont?

What about Zen 6's front end? Is there a chance that AMD will do full 8-wide design instead of 4+4 or even 10 wide?
Zen5 is 8-wide design, as it is 8 wide at rename and if you are running from uop cache you are able to sustain 8 instr in flight

I think they should borrow some fetch tricks from Apple before they widen the decode width, to minimise the time wasted decoding instructions on the not-taken path.
 
Reactions: lightmanek

Win2012R2

Senior member
Dec 5, 2024
647
609
96
Zen5 is 8-wide design
No, it isn't: 4+4 (especially with the second decoder not working in a non-SMT scenario, according to Chips and Cheese) is not the same as 8 wide fully available for a single thread. It was clearly designed to push SMT with some hope it can be made to work without it as well, but that does not seem to be the case at the moment. I hope they will upgrade it to 5+5 at least, or maybe they should just go to 6 wide single and be done with it, I hate SMT with a passion.
 

Hulk

Diamond Member
Oct 9, 1999
4,934
3,367
136
With Intel failing so badly why should AMD try hard with Zen 6? They might as well take it easy and keep a few tricks up their sleeve should they need them if Intel does wake up.

Unfortunately the reality of Intel falling so far behind will undoubtedly mean a more relaxed AMD.
 
Reactions: Tlh97 and OneEng2

Win2012R2

Senior member
Dec 5, 2024
647
609
96
With Intel failing so badly why should AMD try hard with Zen 6?
Because only the paranoid survive? Seems like a good enough reason to me

They'll just charge a premium for an even better product, also a good enough reason, plus a lot of the key Zen 6 work must have been more or less finished by now
 

Joe NYC

Platinum Member
Jun 26, 2021
2,789
4,094
106
With Intel failing so badly why should AMD try hard with Zen 6? They might as well take it easy and keep a few tricks up their sleeve should they need them if Intel does wake up.

Unfortunately the reality of Intel falling so far behind will undoubtedly mean a more relaxed AMD.

AMD has less than 20% of client market and only about 34% of server market.

There is certainly no time to relax until those percentages are flipped.
 

JustViewing

Senior member
Aug 17, 2022
265
465
106
No, it isn't: 4+4 (especially with the second decoder not working in a non-SMT scenario, according to Chips and Cheese) is not the same as 8 wide fully available for a single thread. It was clearly designed to push SMT with some hope it can be made to work without it as well, but that does not seem to be the case at the moment. I hope they will upgrade it to 5+5 at least, or maybe they should just go to 6 wide single and be done with it, I hate SMT with a passion.
It is 8 MacroOp wide. That means at peek it can execute 16 instructions.

For Zen 3 I was able to execute nearly 10 instructions per clock

http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ical-discussion.2619688/page-14#post-41282969

 
Reactions: lightmanek

Win2012R2

Senior member
Dec 5, 2024
647
609
96
It is 8 MacroOp wide. That means at peek it can execute 16 instructions.
Not when per thread decoder can only decode 4, your chart says that very clearly: 1T: 4 instructions per cycle, 8 will only work in SMT, which is why Zen 5 got better than usual uplift in it.

There is certainly no time to relax until those percentages are flipped.

Intel won't survive (as it is now) if or when AMD hits 50%, which it might do in servers soon if 18A is a bust
 

MS_AT

Senior member
Jul 15, 2024
448
968
96
No, it isn't: 4+4 (especially with the second decoder not working in a non-SMT scenario, according to Chips and Cheese) is not the same as 8 wide fully available for a single thread. It was clearly designed to push SMT with some hope it can be made to work without it as well, but that does not seem to be the case at the moment. I hope they will upgrade it to 5+5 at least, or maybe they should just go to 6 wide single and be done with it, I hate SMT with a passion.
You are wrong. Reread the articles you mention. Since most of the time code is being executed from uop cache that can dispatch 12 uop to rename, the decode throughput is less important. If you want further confirmation read Zen5 optimization manual to learn how decode and uop cache interact with each other.

And Alex Yee tested that when mixing SIMD and integer code you can in fact reach 8 inst per cycle without SMT.
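A back-of-the-envelope way to see why decode width matters less when most code runs from the uop cache. This is only a toy model, not anything from the optimization manual: it takes the numbers quoted in this thread (4 per cycle from a single thread's decoder, 12 uops per cycle from the uop cache, 8 wide at rename), treats uops and instructions as the same unit for simplicity, and blends the two paths by an assumed hit rate. The function name and the 80% hit rate are hypothetical.

```python
def frontend_throughput(opcache_hit_rate, opcache_width, decode_width, rename_width):
    """Average ops per cycle delivered to rename in a toy two-path model.

    A fraction `opcache_hit_rate` of cycles is fed from the uop cache and
    the rest from the decoder; either path is capped by the rename width.
    """
    from_opcache = min(opcache_width, rename_width)
    from_decode = min(decode_width, rename_width)
    return opcache_hit_rate * from_opcache + (1.0 - opcache_hit_rate) * from_decode


# Widths as quoted in the thread (single thread, no SMT).
OPCACHE_WIDTH = 12
DECODE_WIDTH = 4
RENAME_WIDTH = 8

decode_only = frontend_throughput(0.0, OPCACHE_WIDTH, DECODE_WIDTH, RENAME_WIDTH)
opcache_only = frontend_throughput(1.0, OPCACHE_WIDTH, DECODE_WIDTH, RENAME_WIDTH)

# ASSUMPTION: 80% is an illustrative hit rate, not a measured one.
mostly_opcache = frontend_throughput(0.8, OPCACHE_WIDTH, DECODE_WIDTH, RENAME_WIDTH)

print(decode_only, opcache_only, round(mostly_opcache, 2))
```

In this model the 4-wide decoder only sets the ceiling when nothing hits in the uop cache; as the hit rate rises, the limit moves to the 8-wide rename stage.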
 

JustViewing

Senior member
Aug 17, 2022
265
465
106
Not when per thread decoder can only decode 4, your chart says that very clearly: 1T: 4 instructions per cycle, 8 will only work in SMT, which is why Zen 5 got better than usual uplift in it.
No, 1 thread Zen 5 can work at peek throughput when instructions are coming from Op Cache which is majority of time. What you are mentioning is only valid for the dual decoder and not for the Op Cache. As I linked above, Zen 3 can execute ~10 uOps/Cycle with an IPC of 7.7. For Zen 5 it should be higher.
 

Win2012R2

Senior member
Dec 5, 2024
647
609
96
No, 1 thread Zen 5 can work at peek throughput when instructions are coming from Op Cache which is majority of time.
Ok, it's getting above my pay grade. But I will correct you on the peAk bit though!

when mixing SIMD and integer code you can in fact reach 8 inst per cycle without SMT

How often does that happen in existing code? It does seem exotic to me

Either way I hope decoder goes up at least +1 for Zen 6
 

DrMrLordX

Lifer
Apr 27, 2000
22,368
12,175
136
With Intel failing so badly why should AMD try hard with Zen 6? They might as well take it easy and keep a few tricks up their sleeve should they need them if Intel does wake up.

Unfortunately the reality of Intel falling so far behind will undoubtedly mean a more relaxed AMD.
Intel let down their guard and look what happened to them. You never know who is out to get you.
 

MS_AT

Senior member
Jul 15, 2024
448
968
96
How often does that happen in existing code? It does seem exotic to me
Every time you are using mixed integer and floating point computations. Scalar FP is handled by the SIMD unit anyway. In SIMD heavy code you still need to use the scalar int part of the core to generate addresses and handle loop control.

So probably the only time it does not happen is when running scalar int only code.

I mean in terms of opportunities to make full use of the machine width. Of course, waiting for data etc. results in lower IPC.
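The point about int-only code can be illustrated with a tiny sketch. The machine width of 8 is the rename width mentioned earlier in the thread; the per-cycle slot counts for the integer and SIMD sides are hypothetical placeholders chosen only to show the shape of the argument, not Zen 5's actual issue limits, and `peak_ipc` is a made-up helper.

```python
def peak_ipc(machine_width, int_slots, simd_slots, uses_int, uses_simd):
    """Upper bound on instructions per cycle for a given instruction mix.

    Only the execution sides the code actually uses contribute slots,
    and the total can never exceed the machine width.
    """
    available = (int_slots if uses_int else 0) + (simd_slots if uses_simd else 0)
    return min(machine_width, available)


MACHINE_WIDTH = 8   # rename width quoted in the thread
INT_SLOTS = 6       # ASSUMPTION: placeholder, not a documented figure
SIMD_SLOTS = 4      # ASSUMPTION: placeholder, not a documented figure

int_only = peak_ipc(MACHINE_WIDTH, INT_SLOTS, SIMD_SLOTS, uses_int=True, uses_simd=False)
mixed = peak_ipc(MACHINE_WIDTH, INT_SLOTS, SIMD_SLOTS, uses_int=True, uses_simd=True)

print(int_only, mixed)
```

With these placeholder numbers, scalar int-only code is bounded by the integer side alone, while a mix of int and SIMD work has enough slots to reach the full machine width. Real code will land lower once data stalls are counted.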
 