Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 711 - AnandTech Forums

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
Off topic,
Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.
No, spending on this will at best stagnate. They will all have more pressing issues to solve, even if we consider only their supply chain issues and ignore everything else that's mounting.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
I am afraid it's a bad marketing message and miscommunication. The materials mentioned that the decoders are statically partitioned in SMT mode. Traditionally, when you wanted to turn off SMT, you went into the BIOS and disabled it. The question now is whether the SMT mode is static when enabled [i.e., if SMT is on in the BIOS, is the core always in SMT mode?] or dynamic, as the interviews are leading us to believe.

From a technical perspective, for the Strix mobile parts, the front end could be curtailed in the interest of efficiency if the memory subsystem cannot sustain the required bandwidth (e.g. smaller L3, slower memory).
If true, that really supports the argument that the fabric and the memory/cache hierarchy from L2 onwards need a big improvement.

As David already mentioned, there are changes in front-end behavior across the BIOSes.
It can be seen that AMD intentionally limited the performance of the processor front end and some instruction combinations in the previous microcode.

We will see if this is the case with the DT parts.
 

CouncilorIrissa

Senior member
Jul 28, 2023
522
2,003
96
From a technical perspective, for the Strix mobile parts, the front end could be curtailed in the interest of efficiency if the memory subsystem cannot sustain the required bandwidth (e.g. smaller L3, slower memory).
If true, that really supports the argument that the fabric and the memory/cache hierarchy from L2 onwards need a big improvement.
The classic cores in Strix have the same amount of L3 per core as desktop tho, 4MB.

Seriously though, is Zen "5.5" one of the targets they are working on until the launch of Turin? Which, at this time, is still communicated as "in 2H 2024" AFAIK. (3,638 hours left.)
You think they'd release the client lineup without enabling all of the core's features because they aren't ready? Seems highly unlikely, they'd rather postpone the launch I think. They don't need early reviews to be less positive than they could be.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,474
1,966
136
I am a straight-up bubble huffer. I say that AI functions will be the main reason for sales of computing devices by 2040. I still find it bizarre that so many people on various text-site forums don’t see that AI is THE future of computing in society.

I am also broadly optimistic on AI. We are definitely currently in a bubble and it will eventually burst, but AI will not die then and there will be new growth after that.

I am not optimistic on edge AI on client devices. The current offerings are clearly limited, and the only way we know to improve is to grow the models. So compute demand will grow with supply, and offerings stuck at a lower performance level will likely lose in the market because they will simply be worse.

And when you want to scale performance up, client has the unfixable problem of being less efficient than centralized. Because utilization will be lower (usage by a single human is bursty), and because the only way to get reasonable cache locality is batching. Meaning that it will simply cost a lot less to provide a service with equivalent quality in a centralized system with fairly thin clients, than it costs to put the AI on the client. And this will only ever end when AI is well more than "good enough", which will probably take decades.

Yes, there is less privacy on centralized systems. But most people are still using facebook and tiktok, so they clearly don't care.
 

MS_AT

Senior member
Jul 15, 2024
210
507
96
Hmm why does Zen5C have higher FP IPC than vanilla Zen5 ? (performance/GHz)

[Attachment 104342: Zen 5 vs. Zen 5c performance/GHz chart]

Something's fishy with the clock speeds, I guess.
To do a per-core test, you need to pin the test to the core. The thing is, they are running under WSL2, so they are running Linux inside a Hyper-V virtual machine [that's what WSL2 is], and they can pin all they want; the hypervisor won't care... https://github.com/Microsoft/WSL/issues/3827 unless M$ fixed that, but the issue is still open on GitHub. In other words, they may think they are running the test on the Z5c core, but in reality it might be running on Z5, on Z5c, or on both of them.

At least to the best of my knowledge. If there exists a way to reliably pin a workload to a core in that setup, I would be grateful if somebody could share it.
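For what it's worth, on native Linux (outside WSL2) pinning itself is straightforward; the sketch below shows it with the standard library, with the caveat from the linked issue that inside a WSL2 guest the mask is only honored by the guest kernel, while Hyper-V may still migrate the vCPU. The choice of logical CPU 0 is an assumption; check `lscpu` for the real topology.

```python
import os

# Pin the current process to logical CPU 0 (the CPU number is an
# assumption -- map it to the intended core via lscpu or /proc/cpuinfo).
os.sched_setaffinity(0, {0})

# Verify the kernel accepted the mask. On native Linux this is
# authoritative; under WSL2 the guest kernel reports the mask, but the
# Hyper-V hypervisor may still schedule the vCPU on any physical core.
mask = os.sched_getaffinity(0)
print("pinned to logical CPUs:", mask)
```

Equivalently, `taskset -c 0 ./benchmark` from the shell applies the same mask, with the same WSL2 caveat.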
 

yottabit

Golden Member
Jun 5, 2008
1,485
514
146
I really thought Strix Halo would have on-package memory for some reason. Hmm

The idea of having to actually disable SMT to get a 1t uplift on say a 9950x doesn’t seem too bad, I mean that would work well for my use cases at least

However, I’m highly skeptical of something like that existing; if the tech for dynamic allocation of those decode units isn’t there, I doubt disabling SMT would magically enable them to be shared on a core. We haven’t seen any evidence of that yet, right?
 

LightningZ71

Golden Member
Mar 10, 2017
1,785
2,139
136
With Zen5c-256 being denser and targeted at lower clock speeds, are they achieving lower internal cache access latencies as compared to full Zen5-256? Those latencies can be important for Spec as most of it largely runs in cache. If they fine tuned it further, they might also be achieving slightly better latencies on certain complex instructions as well.

Has anyone done a full instruction latency profile on Zen5 vs Zen5c?
 

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
Hmm why does Zen5C have higher FP IPC than vanilla Zen5 ? (performance/GHz)
Remember what "IPC" is.
– Actual meaning of the TLA: instructions per clock
– Terrible misuse of the TLA: iso-clock performance
– Plan 9 From Outer Space level abuse of the TLA: clock-normalized performance
The diagram labels indicate which one of these three is shown.
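To make the distinction concrete, here is a toy calculation with made-up numbers (none of these are measured Zen 5 / Zen 5c values) showing how the three readings of "IPC" can disagree:

```python
# 1) Actual IPC: retired instructions divided by core clock cycles,
#    straight from performance counters.
instructions = 8.0e9
cycles = 2.0e9
ipc = instructions / cycles            # 4.0 instructions per clock

# 2) Iso-clock performance: lock both cores to the same frequency
#    and compare benchmark scores directly.
score_a_at_3ghz = 120.0
score_b_at_3ghz = 126.0
iso_clock_ratio = score_b_at_3ghz / score_a_at_3ghz   # 1.05

# 3) Clock-normalized performance: divide each score by whatever average
#    clock the run happened to sustain -- this is what such charts usually
#    plot, and boost behavior can skew it either way.
score_a, ghz_a = 150.0, 5.0
score_b, ghz_b = 126.0, 3.3
norm_a = score_a / ghz_a               # 30.0 "points per GHz"
norm_b = score_b / ghz_b               # ~38.2 -- the lower-clocked core
                                       # "wins" without retiring more
                                       # instructions per cycle

print(ipc, iso_clock_ratio, norm_a, round(norm_b, 1))
```

So a chart labeled "performance/GHz" can show the dense core ahead even if its true instructions-per-clock figure is identical.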
 

LightningZ71

Golden Member
Mar 10, 2017
1,785
2,139
136
Latency isn't just a function of cache density. Latencies are dialed in to account for worst case transistor response and transmission times. With lower target clocks, you need to wait fewer clock cycles to cover the same actual time delay, enabling the reduction in clock cycle latencies designed into the hardware.

IIRC, one of the things we see in Apple's lower-clocked cores is large caches with lower cycle-count latencies. This is one way they extract higher average IPC from their designs.

I've no idea if AMD tried any of this, or simply left Zen5c as a denser Zen5 with no other notable changes. Adjusting these latencies is absolutely not as trivial as it might seem.
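The clocks-vs-cycles point above can be sketched numerically. The delay and frequencies below are invented for illustration, not actual Zen 5 / Zen 5c figures:

```python
import math

def cycles_for_delay(delay_ns: float, freq_ghz: float) -> int:
    """Smallest whole number of clock cycles covering delay_ns at freq_ghz."""
    return math.ceil(delay_ns * freq_ghz)  # GHz * ns = cycles

# A fixed physical delay through the same silicon...
wire_delay_ns = 0.9   # assumed worst-case cache access path delay

# ...costs fewer cycles at a lower target clock, so a lower-clocked core
# can legitimately advertise a lower cycle-count latency.
print(cycles_for_delay(wire_delay_ns, 5.7))  # 6 cycles at 5.7 GHz
print(cycles_for_delay(wire_delay_ns, 3.3))  # 3 cycles at 3.3 GHz
```

The same 0.9 ns path needs six cycles at 5.7 GHz but only three at 3.3 GHz, which is the headroom a lower-clocked design can bank as reduced cycle latency.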
 

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,763
136
I've no idea if AMD tried any of this, or simply left Zen5c as a denser Zen5 with no other notable changes.
There are at least AnandTech's core-to-core latency measurements, in which latency (in nanoseconds, not necessarily in cycle counts) between dense cluster members is larger than between classic cluster members. Granted, the classic cluster has got only half the number of cores attached to its internal bus as the dense cluster has, IOW the classic cluster's bus is topologically smaller.

However, these measurement results are weird (to me) as they show thread siblings having the same CPU-to-CPU latency as L3$ siblings for HX 370 (in contrast to 7940HS, where thread siblings reach each other considerably quicker).
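For anyone curious what such a core-to-core measurement looks like in principle, here is a heavily simplified ping-pong sketch: two threads hand a token back and forth and the round trip is averaged. Real tools pin each thread to a specific core and bounce a cache line with atomic operations; Python's Event objects add scheduler and GIL overhead, so the absolute number is illustrative only, not comparable to AnandTech's figures.

```python
import threading
import time

ROUNDS = 10_000
ping, pong = threading.Event(), threading.Event()

def responder() -> None:
    # Wait for the "ping" token, hand back a "pong", repeat.
    for _ in range(ROUNDS):
        ping.wait()
        ping.clear()
        pong.set()

t = threading.Thread(target=responder)
t.start()

start = time.perf_counter()
for _ in range(ROUNDS):
    ping.set()        # send token to the other thread
    pong.wait()       # wait for it to come back
    pong.clear()
t.join()
elapsed = time.perf_counter() - start

print(f"avg round trip: {elapsed / ROUNDS * 1e9:.0f} ns")
```

Pairing this loop with per-thread affinity (as in the pinning example discussed earlier in the thread) for every (core A, core B) combination is roughly how a core-to-core latency matrix is built.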
 