Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 841

RTX2080

Senior member
Jul 2, 2018
334
533
136

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,463
5,520
136
Big if true. A base frequency at or around the 7800X3D's max effective gaming clock could mean substantial gains compared to the previous generation from clockspeed alone... even comparing max assumed 5.5GHz boost vs 5.0GHz boost that would be 10% just in clocks. And probably more in worst-case scenarios (read: improvement in min FPS).

I'm still not sure I would bother with a 10-15% uplift to upgrade, but it's certainly much more reasonable if needing the performance versus 5% gains. Likely with a bigger productivity boost than with gaming.
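For anyone who wants to check the clock arithmetic above, here is a minimal sketch. The 5.0 GHz and 5.5 GHz boost values are the assumed figures from the post, not confirmed specs, and the function name is made up for illustration.

```python
def clock_uplift(old_ghz: float, new_ghz: float) -> float:
    """Relative frequency gain of new_ghz over old_ghz (0.10 means 10%)."""
    return (new_ghz - old_ghz) / old_ghz


# Assumed boost clocks from the post: 5.0 GHz (previous gen) vs 5.5 GHz (rumored).
uplift = clock_uplift(5.0, 5.5)
print(f"Uplift from clocks alone: {uplift:.1%}")
```

This only covers the frequency part; any IPC change would multiply on top of it.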
 

StefanR5R

Elite Member
Dec 10, 2016
6,056
9,106
136
Yeah, Intel looks bad there.
Not really, just a different approach.

AMD is 26% more performance at 27% more power.

Intel is 18% more performance at 3% more power.

Overall it's just different. I would even argue that Intel's approach is better, because it's basically free performance at the same power.
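To put the quoted SMT numbers on a common footing, here is a rough sketch that turns "X% more performance at Y% more power" into a performance-per-watt ratio. The percentages are the ones quoted above; the function name is hypothetical, and this says nothing about how the original measurements were taken.

```python
def perf_per_watt_ratio(perf_gain_pct: float, power_gain_pct: float) -> float:
    """Perf/W with SMT on, relative to SMT off (1.0 means unchanged)."""
    return (1 + perf_gain_pct / 100) / (1 + power_gain_pct / 100)


# Figures as quoted in the thread (single core, fixed clock, Vcore VRM power).
amd = perf_per_watt_ratio(26, 27)    # a hair under 1.0, so roughly break-even
intel = perf_per_watt_ratio(18, 3)   # roughly 1.15

print(f"AMD   SMT perf/W ratio: {amd:.3f}")
print(f"Intel SMT perf/W ratio: {intel:.3f}")
```

So by this one metric AMD's SMT is about efficiency-neutral while Intel's improves perf/W, but AMD gets the larger absolute throughput gain.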
It's weird because other tests (i.e. Phoronix Epyc tests) show that enabling SMT has little to no impact on power draw but generally increases performance.

So I wonder what can be extrapolated from either set of tests.
The figures from David Huang's tweet are single-core results (single-thread vs. dual-thread runs on same core) at fixed clock speed,¹ relative to Vcore VRM power.

Phoronix' EPYC figures on the other hand are from highly threaded tests at variable clock speeds,² of which many tests utilized much or all of the socket power limit (that is, similar-power tests or nearly fixed-power tests), and they related socket performance with socket power use.
________
¹) and a wastefully high clock speed at that
²) speeds which were more or less close to the power efficiency sweet spot, likely closer in the "SMT on" case

[....] it's basically free performance at same power. Which makes it even more unbelievable that they removed it for new consumer CPUs.
Of the CPU makers who put two different core microarchitectures into one and the same CPU, Intel was the only one who additionally put HyperThreading into one type of these cores. As others already pointed out, how is an operating system's thread scheduler supposed to handle this? I don't know though if, or how much, this played a role in Intel's decision to remove HT in upcoming heterogeneous client CPUs (in which wide computing parallelism is not the top concern, Cinebench marketing figures aside). BTW, in another segment, small server CPUs = Xeon E, Intel did something else and disabled heterogeneity instead.

[...] Intel says "Optimized for PPA" on their slide for the Arrow Lake P-cores:
View attachment 109751
This. Doesn't. Make. Sense.
Well, at least the headline on the slide is "Lion Cove", not "Arrow Lake".
 

Timmah!

Golden Member
Jul 24, 2010
1,526
859
136
9800X3D looking to be the first X3D that will be good at everything, if not stellar at everything
That would be a hypothetical 9950X3D with two V-Cache dies

BTW, we always talk here about how using cores on the other die is detrimental for gaming because of the higher latency, and that's why there are all the core-parking shenanigans and whatnot. This certainly holds for games that use 8 cores max, in which case you don't want them randomly using one of the second-die cores, fair enough. But what about games (or apps potentially benefiting from V-Cache) that can use more than 8 cores? Is that latency so big that those cores are pretty much useless and actually make the experience worse, or do the additional cores on the second die provide a tangible benefit? There seems to be little info on this topic, I guess because most games/apps don't need more than 8 cores. Anyway, I would be interested.
 
Reactions: Tlh97 and Joe NYC

CakeMonster

Golden Member
Nov 22, 2012
1,522
690
136
I'm conflicted about any 2-CCD CPU after the 7950X because of the shenanigans with scheduling. I leave a lot of stuff on and even run jobs in the background when I game, so I suspect the priorities or core parking would not properly shut things off (nor do I really want them to).
 

fastandfurious6

Senior member
Jun 1, 2024
214
311
96
these SMT differences are very important in the public cloud server business

all providers sell VMs with "vCPUs" = 50/50 real/SMT cores

but vast majority of customers actually don't really know this

AMD having superior SMT performance is a huge win for preference by cloud providers
 
Reactions: Tlh97
Jul 27, 2020
20,899
14,488
146
Hah! My performance is Extremely High!! Eat that console peasants 😅
9950X 102FCLK CO1 -20 CO2 -30 MEM6120MHz PBO 220W GPU Stock
Thanks!

Well, all I can say is, you are quite behind compared to my screenshot

I'm not that well-versed in this benchmark's scores. Would the score increase with a faster GPU? How much would your score decrease if you power limit your GPU, by say, 100W?

Rest of you rich brats, where are the scores???
 

OneEng2

Senior member
Sep 19, 2022
259
356
106
The aspect of performance per power is only one of the reasons for implementing SMT. It's nice to get it without increasing power usage much, but even at linear power increase it would be desirable because that's better than the less than linear performance-per-power increase at high frequencies. The other argument for SMT is more performance per area because the additional transistors require much less area (around 5%) than they add in performance (15-30%). The source is this document, and the money quotes are:



So...

Why do I say this? Because Intel says "Optimized for PPA" on their slide for the Arrow Lake P-cores:

View attachment 109751

This. Doesn't. Make. Sense.

And sorry for this now basically being a post for the Intel thread, but we are also discussing AMD's more effective implementation of the same technology.
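The performance-per-area argument in the quote above can be sanity-checked with a quick sketch. It uses only the rough figures from that quote (about 5% extra area for 15-30% extra performance); the function name is made up for illustration, and real die-area accounting is more involved than this.

```python
def perf_per_area_ratio(perf_gain_pct: float, area_gain_pct: float) -> float:
    """Performance per area with SMT, relative to the same core without it."""
    return (1 + perf_gain_pct / 100) / (1 + area_gain_pct / 100)


# Rough figures from the quoted post: ~5% area for 15-30% more performance.
low = perf_per_area_ratio(15, 5)
high = perf_per_area_ratio(30, 5)

print(f"Perf/area with SMT: {low:.3f}x to {high:.3f}x")
```

Under those assumptions SMT improves performance per area by somewhere around 10% to 24%, which is why dropping it while claiming "Optimized for PPA" looks odd.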
It doesn't make sense to me either. Essentially, Intel is going with SMP vs SMT like AMD did with Bulldozer. Furthermore, having SMT in both P and E cores provides the OS with a very simple scheduling task compared to having P cores with SMT and E cores without.
Maybe the simplest answer is correct: removing SMT simplifies validation of the design, and makes it easier to schedule threads between the P cores and the E cores. That’s it. Anything beyond that is just marketing so that the consumer doesn’t feel like they got a downgrade.
What makes validation easier is having only ONE CPU architecture stack to debug. This is truly a crazy argument IMO.
I would disagree here. I think the main reasons are "lots of design decisions made around improving SMT performance", making AMD CPUs great scalable multicore processors and not wanting to spam their CPUs with smaller cores (reasons for which I'm not entirely sure of but I think they have Mont type cores undergoing development and they are not yet ready for prime time).

We have seen many examples where AMD has shown Intel how to do things right. SMT is just one such thing, where they took a security-first approach to implementing secure boundaries between the physical and virtual threads. Another is AMD's mitigations against Spectre/Meltdown: Phoronix showed that turning off these mitigations actually makes AMD CPUs run a bit slower, whereas doing the same on Intel CPUs makes them run faster. So Intel's mitigations are doing additional checks or preventing some optimized pathways from working, whereas AMD's mitigations were built into the core design itself. AMD figured out optimizations to win back the performance lost to these mitigations, and turning the mitigations off also turns off those mitigation-"mitigating" optimizations.

I think if AMD gives up SMT without ever exploring SMT4, it will be when they can have a whole frickin' sea of cores in an area smaller than Intel Monts.
I think the "whole frickin sea of cores" concept is where Skymont is going. This flies in the face of the PPA that SMT is proven to provide. I think Intel will need to rethink this approach, as they will certainly pay for it in DC.
Whoa! Talk about a 1-2 punch! So we are going to have a Zen 5 CPU on N4P massively outpacing a newer Intel Arrow Lake processor on N3B (the densest process TSMC currently offers, using the most EUV layers... and the most expensive process TSMC offers). That is going to sting for this cycle. Still, the more important battles will be in the laptop and DC markets, so it may not be that bad for Intel.
That would be a hypothetical 9950X3D with two V-Cache dies

BTW, we always talk here about how using cores on the other die is detrimental for gaming because of the higher latency, and that's why there are all the core-parking shenanigans and whatnot. This certainly holds for games that use 8 cores max, in which case you don't want them randomly using one of the second-die cores, fair enough. But what about games (or apps potentially benefiting from V-Cache) that can use more than 8 cores? Is that latency so big that those cores are pretty much useless and actually make the experience worse, or do the additional cores on the second die provide a tangible benefit? There seems to be little info on this topic, I guess because most games/apps don't need more than 8 cores. Anyway, I would be interested.
I keep wondering when games will be developed that are designed to run on processors with more than 8 cores. Within a year or two, I think the bottom-end CPUs will likely have 16 cores minimum.
 

lightmanek

Senior member
Feb 19, 2017
476
1,092
136
Thanks!

Well, all I can say is, you are quite behind compared to my screenshot

I'm not that well-versed in this benchmark's scores. Would the score increase with a faster GPU? How much would your score decrease if you power limit your GPU, by say, 100W?

Rest of you rich brats, where are the scores???


All I had time to do before F1 was test at Maximum settings, and that dropped the score to 43500
 

Rheingold

Member
Aug 17, 2022
69
203
76
Well, at least the headline on the slide is "Lion Cove", not "Arrow Lake".
Yeah, "Lion Cove" is the name of the Arrow Lake P-cores. You can read the whole slide deck here.

The statement is not from the slides for Lunar Lake, which also uses Lion Cove for its P-cores:



Here the statement is basically the opposite: The performance per area with HT could have been 17.6% higher but they threw that away. They optimized the Lunar Lake P-cores for performance-per-power, not for performance-per-area. This makes sense for such a power-constrained design as Lunar Lake is.

The only thing that makes sense for Arrow Lake is that they were strapped for budget, just copy-pasted the design, didn't care to re-insert and validate HT and then pulled the nonsensical PPA statement out of their ass. This at least partially fits in with @Saylick's statements.
 