Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Thibsie

Senior member
Apr 25, 2017
856
964
136
I hope someone at AMD sees this and redoes their abysmal naming again:


Apparently adding "AI" to your product description makes it *less* desirable to people.

They should just remove that part from the naming scheme along with "HX". Both add nothing of value anyway
Does anyone remember Athlon XP and the Detonator XP drivers, named that way just because of... Windows XP?

It's just the same idiocy.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
I hope someone at AMD sees this and redoes their abysmal naming again:


Apparently adding "AI" to your product description makes it *less* desirable to people.

They should just remove that part from the naming scheme along with "HX". Both add nothing of value anyway
Microsoft and Google are done, then.

 

CakeMonster

Golden Member
Nov 22, 2012
1,492
653
136
Really? As a kid I thought it was cool... I didn't even use Windows XP at the time. But it sounded Xtreme.
Yeah, I also thought Windows XP was ridiculous, being used to numbers that at least referred to a feature series and progression (DOS x.x, Windows x.x, 9x, NTx). But I was maybe a bit older, and not used to the aggressive branding on hardware/software that, at least to me, seemed to start at that point.
 

Thibsie

Senior member
Apr 25, 2017
856
964
136
Yeah, I also thought Windows XP was ridiculous, being used to numbers that at least referred to a feature series and progression (DOS x.x, Windows x.x, 9x, NTx). But I was maybe a bit older, and not used to the aggressive branding on hardware/software that, at least to me, seemed to start at that point.
Yeah, XP was stupid but Vista really nailed it 🤣
Numbers are just better.
 

Jan Olšan

Senior member
Jan 12, 2017
399
683
136

I don't think you quite "support" the dual-decode feature; this is probably about tuning, or the compiler having a built-in model for it (that it can use when trying to optimize code for Zen 5), which may not be super important...

What I wondered about the dual-decode issue: historically, AFAIK, processors had worse throughput and performance when doing lots of taken branches, while non-taken branches were easier for them.
I'm not a software dev, so I have only a very rough idea of this area, but aren't there profiling optimizations that try to improve performance by structuring code in such a way that, when running, the non-taken case is picked more often?

If that is so, this approach may ironically be suboptimal for Zen 5, since it can in theory benefit from a higher amount of taken branches, given how they likely enable it to use its dual decoders. That's a theory, and I don't know if it is ever viable to recompile and restructure code to make taken branches more common. It may still harm performance overall way more than it would help Zen 5...

In any case, I expect the core to be able to work with current binaries normally; this likely is no secret way to get +32%, and real-world gains would probably be minor anyway.
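For reference, the "profiling optimizations" mentioned above usually mean profile-guided optimization (PGO). A minimal sketch of the GCC workflow (the flags are standard GCC options; the file name and workload are made up for illustration):

/* pgo_demo.c -- trivial workload whose hot/cold paths PGO can learn.
 *
 * Typical two-step GCC PGO build (flags are standard; names illustrative):
 *   gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo   # instrumented build
 *   ./pgo_demo                                          # run once to write profile data (*.gcda)
 *   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo        # rebuild using the collected profile
 *
 * With the profile, the compiler can keep the common case on the
 * fall-through (not-taken) path and move the rare case out of line.
 */
#include <stdio.h>

static long process(long x)
{
    if (x % 1000 == 0)          /* rare path; the profile tells the compiler so */
        return x * 3;
    return x + 1;               /* common path, laid out as the fall-through */
}

int main(void)
{
    long sum = 0;
    for (long i = 0; i < 10 * 1000 * 1000; i++)
        sum += process(i);
    printf("%ld\n", sum);
    return 0;
}

Whether a Zen 5-aware cost model would ever flip such layout decisions the other way (favoring taken branches) is exactly the open question raised above.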
 

MS_AT

Senior member
Jul 15, 2024
207
497
96
I don't think you quite "support" the dual-decode feature; this is probably about tuning, or the compiler having a built-in model for it (that it can use when trying to optimize code for Zen 5), which may not be super important...

What I wondered about the dual-decode issue: historically, AFAIK, processors had worse throughput and performance when doing lots of taken branches, while non-taken branches were easier for them.
I'm not a software dev, so I have only a very rough idea of this area, but aren't there profiling optimizations that try to improve performance by structuring code in such a way that, when running, the non-taken case is picked more often?

If that is so, this approach may ironically be suboptimal for Zen 5, since it can in theory benefit from a higher amount of taken branches, given how they likely enable it to use its dual decoders. That's a theory, and I don't know if it is ever viable to recompile and restructure code to make taken branches more common. It may still harm performance overall way more than it would help Zen 5...

In any case, I expect the core to be able to work with current binaries normally; this likely is no secret way to get +32%, and real-world gains would probably be minor anyway.
That's because, historically, forward jumps [relative to the current value of the program counter, so going "ahead"] were by default assumed to be non-taken, and those correctly predicted branches do not allocate into the branch target buffer [BTB] [IIRC allocation happens on the first misprediction]. At the same time, backwards branches were assumed to be taken, as these are usually loops.

So ideally, if you structured your branches so that the most likely outcome fit with "not taken", you freed up resources [BTB] for other parts of the code that might need them more, and you avoided the initial misprediction penalty.

Now, those are the basics and I might have gotten something wrong, so hopefully somebody can correct me. Oh, and TAGE and modern state-of-the-art solutions are most likely more sophisticated than this.
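To make that concrete, a tiny sketch (not from the post above, just an illustration) of how code is commonly written to keep the likely outcome on the not-taken fall-through path, using GCC/Clang's __builtin_expect hint:

#include <stdio.h>
#include <stdlib.h>

/* Common convenience macros wrapping GCC/Clang's __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int parse_byte(int c)
{
    if (unlikely(c < 0)) {               /* rare error path: the compiler moves it out of line, */
        fprintf(stderr, "bad input\n");  /* so the hot path stays a straight, not-taken branch  */
        exit(1);
    }
    return c * 2;                        /* common case remains on the fall-through path */
}

int main(void)
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += parse_byte(i);
    printf("%d\n", sum);
    return 0;
}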
 

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,971
136
Re: GCC decoder intelligence...
Essentially, this means that the compiler isn't presenting the actual machine language instructions to the CPU in a manner that is (near) optimal for its instruction layout. This can (theoretically) hurt its ability to efficiently, quickly and intelligently convert the machine language instructions it is receiving into actual work in the core itself. According to certain absent posters here who will not be named, modern out-of-order processors are completely immune to anything that a compiler or programmer can throw at them, can adjust on the fly to these things, and the above has absolutely zero effect on processor performance. In reality, how the machine language instructions are presented to the decoder can have a measurable effect on processor performance and efficiency.

Time and updates to GCC will show the truth. I suspect that it won't be a big change in MOST, but not all, cases.
That’s not my experience. Do you have any example where specifying a specific micro-architecture (as opposed to enabling the use of new instructions) significantly changed the run time of a program?

And, again, if that kind of specificity has a significant impact on performance, then your uarch is a failure. CPU design teams use thousands of workloads and traces generated by old compilers, and try to ensure the choices they made bring improvements on most, if not all, of these.
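For anyone who wants to check this for themselves, a minimal sketch of such a test (the GCC options are real; the workload and file name are invented for illustration): build the same source once with generic tuning and once with a Zen-specific tuning model, then time the two binaries.

/* tune_bench.c -- identical source, built twice with different tuning:
 *   gcc -O2 -mtune=generic tune_bench.c -o bench_generic
 *   gcc -O2 -mtune=znver4  tune_bench.c -o bench_znver4    # or znver5 where the compiler knows it
 * -mtune only changes scheduling/layout heuristics, not the instruction set,
 * so both binaries run on any x86-64 CPU and can be timed back to back.
 */
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    volatile double acc = 0.0;           /* volatile keeps the loop from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 1; i < 200 * 1000 * 1000L; i++)
        acc += 1.0 / (double)i;          /* simple FP-heavy stand-in workload */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("sum=%f time=%.3fs\n", (double)acc,
           (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}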
 

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,971
136
No one said it did. It doesn't, and that is why AMD didn't contribute anything to GCC for it. But in theory you can be a bit clever if you know that you can now chain likely-taken branches.
Compilers will have a hard time laying out code to please the 2x4 decoders of Zen 5. If you do hand tuning, you might extract something, but that might be slower on all other uarchs.
 

LightningZ71

Golden Member
Mar 10, 2017
1,783
2,137
136
I can't give you specifics that are current, as my IC career ended well over a decade ago, but back when I was programming regularly, there were modern processors that had support for what we referred to as legacy instructions, for compatibility purposes, and more modern instructions that could be used to complete the same task with either lower latency or higher throughput, etc. Often, these were instructions that were implemented in microcode instead of hard-coded in the transistors and would take multiple cycles to complete, or would stall the pipe, so to speak. Compiling for maximum portability across past architectures would mean using the bad instruction instead of the newer one you would get when compiling with flags for a modern architecture. There were several processors through the years that had cases where you would avoid certain instructions, even if it meant issuing multiple newer ones.

As for optimizing for dual decoders, aside from purely academic stuff, I wouldn't know how that gets implemented at the compiler level. Compiler coding is, from my point of view, a dark art, filled with witchcraft and deals with the dark one. It seems superficially obvious that the compiler would make more of an effort to find non-dependent instructions to group together so that when the CPU core fetches instructions (it rarely fetches a single instruction at a time these days), it's fetching two or more instructions that don't have to wait on each other and can be fed to separate variable length instruction decoders to be sent down to the various downstream functional units. In my programming day, I did very little assembly coding and essentially lived on function calls and making a few of my own libraries for what I needed to do, so I rarely bothered with this level of instruction ordering.

As for it affecting program performance, as I said before, it will be HIGHLY dependent on what the program does and how it's written. You MIGHT be able to take old programs that you have the source and libraries for and recompile them with newer flags and get SOME improvement, where some is nearly none to maybe 10%. If you went back and actually opened up the functions, took a close look at instruction ordering, and hinted to the compiler where it could improve things, you might get somewhere.
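As a toy illustration of the "non-dependent instructions grouped together" idea above (a sketch, not anyone's actual tuning work): splitting one long dependency chain into several independent accumulators gives the out-of-order core independent work it can issue in parallel, whatever the decoder arrangement looks like.

#include <stdio.h>

#define N (1 << 20)

/* One accumulator: every add depends on the previous one (a serial chain). */
static double sum_chained(const double *a)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the adds within an iteration don't depend
 * on each other, so the core can keep several FP units busy at once. */
static double sum_split(const double *a)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return s0 + s1 + s2 + s3;
}

int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0;
    printf("%f %f\n", sum_chained(a), sum_split(a));
    return 0;
}

Compilers won't always make this transformation for floating point on their own (reassociation changes rounding), which is part of why this kind of hand tuning still exists.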
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,480
2,957
136
Strix Halo is really big. Anyone hoping it will be relatively cheap should forget about that; just look at what they ask for a Strix Point laptop.
My prediction is >2000 euro, and for that, only 4070 (80W) level of performance is not very good.
In my opinion Strix Halo is not aimed at gamers; that's just secondary. The main selling point is the 16C/32T CPU paired with 64-128GB of RAM.
 

Abwx

Lifer
Apr 2, 2011
11,516
4,303
136
Strix Halo is really big. Anyone hoping it will be relatively cheap should forget about that; just look at what they ask for a Strix Point laptop.
My prediction is >2000 euro, and for that, only 4070 (80W) level of performance is not very good.
In my opinion Strix Halo is not aimed at gamers; that's just secondary. The main selling point is the 16C/32T CPU paired with 64-128GB of RAM.

Strix Point laptops are expensive because there's no competition in this segment, but since the chip is about the same size as Hawk Point, it will gradually get cheaper as time goes by.

Strix Halo costs more to manufacture, but surely less than, say, an 8840 APU + an RX 7600 chip, and AMD will cash in on both the CPU and the GPU, so in the mid term it will also be substantially cheaper once some of the R&D cost is amortized.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,954
4,481
136
Hopefully Strix Halo will do well, but that seems to depend on two things: how they price it, and how much attention people will pay to it without a certain green sticker.
 

poke01

Golden Member
Mar 8, 2022
1,995
2,534
106
Strix Halo points to the future of Windows laptops. It will likely be expensive and it will be a niche product, but it's the most innovative product out of all the client Zen 5 products.

The few issues I have with it are that the CCD is not N3E and that it's using RDNA 3.5 rather than 4. But it builds a base, and I hope it's successful, because if it is, we won't have to deal with crap like VRAM limitations on laptops anymore thanks to the unified memory. And who doesn't like big APUs?
 
Jun 1, 2024
122
169
76
Strix Halo is really big. Anyone hoping it will be relatively cheap should forget about that; just look at what they ask for a Strix Point laptop.
My prediction is >2000 euro, and for that, only 4070 (80W) level of performance is not very good.
In my opinion Strix Halo is not aimed at gamers; that's just secondary. The main selling point is the 16C/32T CPU paired with 64-128GB of RAM.

Yeah, but what's the point when there's Fire Range (mobile 9950X and soon 9950X3D)?
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,492
5,054
96
Is it like >$2500 expensive? Hopefully it's around the $2000 mark, but the 128GB version will be very, very expensive; yeah, I see now that it will be a niche SKU.
Less, the whole point is selling it for less money than a comparable CPU + dGFX combo.
Halo is a prototype for the PS6, I'm sure.
It's a proto-Medusa.
 