Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


SarahKerrigan

Senior member
Oct 12, 2014
735
2,034
136
You are asking a code monkey who dropped out of the 42 main program coz I couldn't figure out in one month how to write my own secure malloc() function that passed strict Valgrind checks

I have no clue TBH. But my brain says it's possible.

It's really not, for most stuff. The finer points of instruction scheduling are critical when you're on small or highly static cores. On massive OoO cores, the kinds of optimizations that matter tend to be the ones that compilers try to do anyway, and that affect most implementations of the instruction set rather than a specific one, i.e., good instruction selection, minimizing unnecessary fills/spills, autovectorization, initiating loads early, etc.

There's no goldmine of potential performance upside from optimization for a specific core. Most "hand optimization" wins in Free software tend to be manual vectorization, which isn't specific to any given microarchitecture.
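
To illustrate the manual vectorization point, here is a minimal sketch, assuming an x86 target with SSE2 and a hypothetical helper named `sum_i32` (not from any project mentioned in this thread). Nothing in it is tied to a particular microarchitecture: the same code runs on any SSE2-capable core, and a scalar loop handles the tail and non-x86 builds.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical example: sum an array of 32-bit integers.
 * Assumes the total does not overflow int32_t. */
int32_t sum_i32(const int32_t *v, size_t n)
{
    size_t i = 0;
    int32_t total = 0;

#if defined(__SSE2__)
    /* Accumulate four lanes at a time; unaligned loads keep it general. */
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
        acc = _mm_add_epi32(acc, x);
    }
    /* Horizontal reduction of the four lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif

    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < n; ++i)
        total += v[i];

    return total;
}
```

The win here comes from using the vector ISA at all, which is why this kind of hand optimization carries over across cores that implement the same instruction set.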
 
Jul 27, 2020
19,613
13,477
146
What sorts of optimizations are you proposing?
Here's one idea:



Suppose a developer has done extensive profiling for Zen 4 and made changes to his application so that when Zen 4 is detected, he uses specific hot functions that matter a lot to his application's core performance.

Now suppose he uses the same tool to profile Zen 5 and sees some big differences. Some of his assumptions about Zen 4 no longer hold true with Zen 5. So he creates specific functions for Zen 5 again to make sure that his application can get the most out of the new architecture.
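
The mechanism described above, picking a hot function per detected core, could be sketched roughly like this. Everything named here is hypothetical: the `uarch_t` enum, the `select_dot` dispatcher, and the three kernels are placeholders, and the unroll factors are not tuning recommendations for any real core. How the microarchitecture gets detected is deliberately left out; CPUID or a compiler builtin such as GCC's `__builtin_cpu_is` (support depends on compiler version) would be possible sources.

```c
#include <stddef.h>

/* Hypothetical tags; detection itself is assumed to happen elsewhere. */
typedef enum { UARCH_GENERIC, UARCH_ZEN4, UARCH_ZEN5 } uarch_t;

typedef long (*dot_fn)(const int *a, const int *b, size_t n);

/* Baseline kernel used when no tuned variant applies. */
static long dot_generic(const int *a, const int *b, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += (long)a[i] * b[i];
    return s;
}

/* Placeholder "Zen 4" variant: two independent accumulators. */
static long dot_zen4(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += (long)a[i] * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
    }
    for (; i < n; ++i)
        s0 += (long)a[i] * b[i];
    return s0 + s1;
}

/* Placeholder "Zen 5" variant: four independent accumulators. */
static long dot_zen5(const int *a, const int *b, size_t n)
{
    long s[4] = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            s[k] += (long)a[i + k] * b[i + k];
    for (; i < n; ++i)
        s[0] += (long)a[i] * b[i];
    return s[0] + s[1] + s[2] + s[3];
}

/* Pick a kernel once at startup; unknown cores get the generic path. */
dot_fn select_dot(uarch_t detected)
{
    switch (detected) {
    case UARCH_ZEN4: return dot_zen4;
    case UARCH_ZEN5: return dot_zen5;
    default:         return dot_generic;
    }
}
```

All three kernels return the same result; only the instruction mix differs. Whether a per-core variant actually beats the generic one is exactly what profiling would have to show.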

This isn't something unheard of in the software world. Yes, most monkey programmers won't go to all this trouble. But Linux gurus, benchmark writers, game engine developers and authors of widely used software like 7-zip and WinRAR may do that. Can't say anything about the latter since it's closed source but maybe someone can look at 7-zip source and see if there are architecture specific optimizations in that.

Hackers love to hack. If someone like that thinks there is more performance to be squeezed out of a new architecture, you bet they will love to tackle that challenge. Because that's the real fun of hacking. The feeling of satisfaction when you crack a hard problem.
 

gdansk

Platinum Member
Feb 8, 2011
2,837
4,220
136
Zen 5 is expected to offer around a 16% IPC uplift in non-tuned code.
But with its weird front end and big FPU I do expect hand-tuned assembly could be much better. But no one outside of Oak Ridge and Lawrence Livermore will even consider that.
 
Reactions: lightmanek

SarahKerrigan

Senior member
Oct 12, 2014
735
2,034
136
Here's one idea:


View attachment 103997
Suppose a developer has done extensive profiling for Zen 4 and made changes to his application so that when Zen 4 is detected, he uses specific hot functions that matter a lot to his application's core performance.

Now suppose he uses the same tool to profile Zen 5 and sees some big differences. Some of his assumptions about Zen 4 no longer hold true with Zen 5. So he creates specific functions for Zen 5 again to make sure that his application can get the most out of the new architecture.

This isn't something unheard of in the software world. Yes, most monkey programmers won't go to all this trouble. But Linux gurus, benchmark writers, game engine developers and authors of widely used software like 7-zip and WinRAR may do that. Can't say anything about the latter since it's closed source but maybe someone can look at 7-zip source and see if there are architecture specific optimizations in that.

Hackers love to hack. If someone like that thinks there is more performance to be squeezed out of a new architecture, you bet they will love to tackle that challenge. Because that's the real fun of hacking. The feeling of satisfaction when you crack a hard problem.

Oh. Well, "runtime performance bottlenecks" are totally a specific answer.

7-Zip has x86 optimizations. It doesn't have optimizations for any specific microarchitecture AFAIK. See for yourself: https://github.com/mcmilk/7-Zip/tree/master/Asm/x86

You continue to show yourself totally incapable of naming a specific optimization Zen 5 would benefit from, and you keep leaning on magical thinking.
 
Jul 27, 2020
19,613
13,477
146
But with its weird front end and big FPU I do expect hand-tuned assembly could be much better. But no one outside of Oak Ridge and Lawrence Livermore will even consider that.
Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.
 

gdansk

Platinum Member
Feb 8, 2011
2,837
4,220
136
Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.
I don't know, but it is possible. Epic/RAD did write extremely optimized Zen 2 specific code for Unreal. They even had an article about it but I can't find it now.
 

soresu

Diamond Member
Dec 19, 2014
3,190
2,463
136
Don't forget John Carmack (his personal pet projects that he doesn't release to the world), Tim Sweeney, all the engine developers of AAA studios, Adobe and other workstation software developers.
Don't forget about Frostbite engine's former lead dev Johan Andersson who spearheaded the Mantle work with AMD all those years ago.

He and some of the former DICE people left and formed a new company called Embark Studios, which is now a subsidiary of Nexon.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,883
1,096
126
Got my laptop today, so far very happy with it. It's the 365/24G model from Best Buy. Looks great, great display on it. Putting it through its paces a bit and there is only very light fan noise, so that's good. Definitely some warmth on the upper part of the chassis however, where the extraction cutout is.

RAM config is 4x6 GB, so it maintains full dual channel rather than flex mode, which is good.
 