Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 892

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
I decided to do a quick run at static 5GHz 1.1V setting, all else equal to my last run. This is by no means the lowest Vcore I can use, just one I picked up looking at auto PBO voltages.
View attachment 111666

Nice score increase in multicore and much lower load power at the same time, I like. ASUS allows me to set profiles with boost still active, might play with that later
Wow! 158 Watts, 5GHz, and a score of nearly 44,000.
Impressive
 

poke01

Platinum Member
Mar 8, 2022
2,581
3,409
106
So I know this is regarding M4 Pro but Strix Halo will have similar bandwidth.


For multi-core it's a combination of IPC, clock speed, cache and memory bandwidth improvements, so basically improved everywhere. M1 Max can only do 100GB/s DRAM read per P-cluster, while >4 cores on M4 Pro from a single cluster have full access to its ~270 GB/s bandwidth.


So if Zen5 in Strix Halo can do the same then things will get interesting and fun in memory bound tests. I really hope that's the case.
 

MS_AT

Senior member
Jul 15, 2024
365
798
96
So I know this is regarding M4 Pro but Strix Halo will have similar bandwidth.


For multi-core it's a combination of IPC, clock speed, cache and memory bandwidth improvements, so basically improved everywhere. M1 Max can only do 100GB/s DRAM read per P-cluster, while >4 cores on M4 Pro from a single cluster have full access to its ~270 GB/s bandwidth.

So if Zen5 in Strix Halo can do the same then things will get interesting and fun in memory bound tests. I really hope that's the case.
We know(?) they will use 64B links at an unknown frequency for the CCD-to-IOD connection on Halo. Assuming 2000MHz as on desktop, that gives us 128 GB/s from a single CCD to the IOD, which is the theoretical (guessed) maximum read bandwidth one CCD will be able to achieve. For mixed read/write the max should be 1.5x higher. Either way, 2 CCDs will be needed to use the full ~250GB/s BW if we consider only the CPU part of Strix Halo.
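The arithmetic in this post can be sketched as follows. This is a rough estimate, not a measurement: the 64 B/clk read width and the 2000 MHz clock are the post's own guesses, the 32 B/clk write width is inferred from its 1.5x mixed figure, and `link_bandwidth_gbs` is a made-up helper name.

```python
import math

def link_bandwidth_gbs(bytes_per_clock: int, clock_mhz: int) -> float:
    """Peak bandwidth in GB/s for a link moving bytes_per_clock every cycle."""
    return bytes_per_clock * clock_mhz / 1000

# Assumptions taken from (or inferred from) the post, not confirmed specs.
READ_BYTES_PER_CLK = 64    # guessed CCD-to-IOD read width
WRITE_BYTES_PER_CLK = 32   # inferred from the 1.5x mixed read/write figure
CLOCK_MHZ = 2000           # desktop-like fabric clock
SOC_BANDWIDTH_GBS = 250    # approximate total memory bandwidth

read_gbs = link_bandwidth_gbs(READ_BYTES_PER_CLK, CLOCK_MHZ)
mixed_gbs = link_bandwidth_gbs(READ_BYTES_PER_CLK + WRITE_BYTES_PER_CLK, CLOCK_MHZ)
ccds_needed = math.ceil(SOC_BANDWIDTH_GBS / read_gbs)

print(f"read:  {read_gbs:.0f} GB/s per CCD")
print(f"mixed: {mixed_gbs:.0f} GB/s per CCD")
print(f"CCDs needed to saturate ~{SOC_BANDWIDTH_GBS} GB/s with reads: {ccds_needed}")
```

A lower fabric clock on mobile would scale both per-CCD numbers down linearly.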
 

BorisTheBlade82

Senior member
May 1, 2020
687
1,084
136
We know(?) they will use 64B links at an unknown frequency for the CCD-to-IOD connection on Halo. Assuming 2000MHz as on desktop, that gives us 128 GB/s from a single CCD to the IOD, which is the theoretical (guessed) maximum read bandwidth one CCD will be able to achieve. For mixed read/write the max should be 1.5x higher. Either way, 2 CCDs will be needed to use the full ~250GB/s BW if we consider only the CPU part of Strix Halo.
As they will completely change the underlying technology, almost nothing is known. From a pJ/bit POV we are talking about <2 pJ/bit vs. 0.15 pJ/bit. So even quadrupling bandwidth to 256 GB/s per CCD would still bring peak energy savings of more than 50% for the interconnect. The die area savings are also huge in comparison. Although it is not apples to apples, the comparison of the MI300A interconnect on Zen4 vs. IFoP gives a good indication.
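As a rough check of the "more than 50%" claim: interconnect power is energy per bit times bits per second. The sketch below assumes the current link runs at 64 GB/s (a quarter of the 256 GB/s mentioned) and takes 2 pJ/bit as its upper bound. Both are assumptions read into the post, and `link_power_watts` is a hypothetical helper.

```python
def link_power_watts(pj_per_bit: float, bandwidth_gbs: float) -> float:
    """Peak link power in watts: energy per bit times bits per second."""
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return pj_per_bit * 1e-12 * bits_per_second

# Assumed figures; the post only gives "<2 pJ/bit" and "0.15 pJ/bit".
OLD_PJ_PER_BIT = 2.0     # upper bound for the current link
NEW_PJ_PER_BIT = 0.15    # figure quoted for the new interconnect
OLD_BW_GBS = 64          # assumption: 256 GB/s is a 4x increase
NEW_BW_GBS = 256

old_watts = link_power_watts(OLD_PJ_PER_BIT, OLD_BW_GBS)
new_watts = link_power_watts(NEW_PJ_PER_BIT, NEW_BW_GBS)
saving = 1 - new_watts / old_watts

# Energy per bit the old link must exceed for the saving to stay above 50%.
break_even_pj = NEW_PJ_PER_BIT * (NEW_BW_GBS / OLD_BW_GBS) / 0.5

print(f"old: {old_watts:.3f} W, new: {new_watts:.3f} W, saving: {saving:.0%}")
print(f"saving stays above 50% if the old link costs more than {break_even_pj:.1f} pJ/bit")
```

Because the old figure is only an upper bound, the saving computed here is a best case.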
 

gdansk

Diamond Member
Feb 8, 2011
3,276
5,186
136
Sorry, it was greater commentary, since I have been bombarded with R23 everywhere for every CPU release recently; it's like other benchmarks almost don't exist.
R23 is a very predictable benchmark and usually easy to replicate. It has a use: to show how much throughput a chip can achieve in an embarrassingly parallel workload.

I'm not arguing it should influence buying behavior. Just that it has at least one use. Perhaps other benchmarks do it better but it's so easy to install and run that I use it for testing cooling.
 

MS_AT

Senior member
Jul 15, 2024
365
798
96
As they will completely change the underlying technology, almost nothing is known. From a pJ/bit POV we are talking about <2 pJ/bit vs. 0.15 pJ/bit. So even quadrupling bandwidth to 256 GB/s per CCD would still bring peak energy savings of more than 50% for the interconnect. The die area savings are also huge in comparison. Although it is not apples to apples, the comparison of the MI300A interconnect on Zen4 vs. IFoP gives a good indication.
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will probably want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.
 

Hulk

Diamond Member
Oct 9, 1999
4,701
2,863
136
R23 is a very predictable benchmark and usually easy to replicate. It has a use: to show how much throughput a chip can achieve in an embarrassingly parallel workload.

I'm not arguing it should influence buying behavior. Just that it has at least one use. Perhaps other benchmarks do it better but it's so easy to install and run that I use it for testing cooling.
It's also nice because no installation is necessary. R23 is so much more fun to watch than R24 too.
 

Josh128

Senior member
Oct 14, 2022
511
865
106
I decided to do a quick run at static 5GHz 1.1V setting, all else equal to my last run. This is by no means the lowest Vcore I can use, just one I picked up looking at auto PBO voltages.
View attachment 111666

Nice score increase in multicore and much lower load power at the same time, I like. ASUS allows me to set profiles with boost still active, might play with that later

It's also nice because no installation is necessary. R23 is so much more fun to watch than R24 too.
Agree, R24 is incredibly boring and long.
 
Reactions: lightmanek

inquiss

Senior member
Oct 13, 2010
250
354
136
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will probably want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.
What if they really did increase the width of the link to the IOD though? What if they did?
 

LightningZ71

Golden Member
Mar 10, 2017
1,910
2,260
136
What if the width that was actually used was dynamic? For example, low power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.
 

DrMrLordX

Lifer
Apr 27, 2000
22,184
11,890
136
What if the width that was actually used was dynamic? For example, low power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.

Weren't the (now fixed) intercore latency problems in Granite Ridge related to attempts to throttle I/O performance as a power saving measure during idle/low loads?
 

inquiss

Senior member
Oct 13, 2010
250
354
136
It would probably increase idle power consumption, as well as tilt the maximum power budget more towards I/O and less towards other functions on the package.
So maybe this increases idle power usage, but maybe also it doesn't because it's gated off and the low power cores are the only thing active during idle...?
 

MS_AT

Senior member
Jul 15, 2024
365
798
96
What if the width that was actually used was dynamic? For example, low power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.
You are confusing bits with bytes. They already have 32B/16B R/W per link and 2 links per CCD. Making it 4 links would, I guess, require more than trivial changes in the CCD itself, which would mean a 5th Zen5 CCD design, as you also need to push that BW further into the chip. And the width should already be somewhat dynamic, as the Turin slides advertise GMI folding.
 

gaav87

Member
Apr 27, 2024
180
380
96
I got a 9700x while waiting for the 9800x3d to ship (already paid for the 9800x3d 2 min after launch; still sold out across the whole EU).

I think I got some kind of platinum sample of the 9700x (all SP values for core quality are 120 or 119). It ran 8200MT/s 2200fclk out of the box; I just clicked XMP, tweaked in the BIOS, and done.
8400MT/s also works, but at 1.62V with a RAM fan.

Zen5 is bottlenecked by fclk HARD. 8400MT/s 2100fclk 2:1:1 has the same bandwidth as 8000/2200fclk 1.45V; the only benefit is latency reduced by 3ns.

The stock 9700x is ~2-5% slower than my 5800x3d, which I have max-tuned for years (2000fclk, 4000 CL16, 102.5 BCLK).
The lightly tuned 9700x at 8000/2200 is between 5% and 40% faster depending on the game. I only tested Cyberpunk, Wukong, Final Fantasy and Riftbreaker.
I'm currently testing
4600MHz 2300fclk 2300uclk 1:1:1 - results are interesting xD
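One way to see why fclk could be the limiter is to compare peak DRAM bandwidth with the peak read bandwidth of a single CCD-to-IOD link. The sketch below assumes dual-channel DDR5 with a 128-bit (16-byte) total bus and a 32 B/clk read link, the per-link width quoted earlier in the thread. These are theoretical peaks, not measured values, and the function names are made up for illustration.

```python
DRAM_BUS_BYTES = 16        # assumption: dual-channel DDR5, 128-bit total
LINK_READ_BYTES = 32       # assumption: 32 B/clk read per GMI link

def dram_peak_gbs(mt_per_s: int) -> float:
    """Theoretical DRAM bandwidth in GB/s at a given transfer rate."""
    return mt_per_s * DRAM_BUS_BYTES / 1000

def fabric_read_gbs(fclk_mhz: int) -> float:
    """Theoretical read bandwidth in GB/s of one CCD-to-IOD link."""
    return fclk_mhz * LINK_READ_BYTES / 1000

def effective_read_gbs(mt_per_s: int, fclk_mhz: int) -> float:
    """The slower of the two stages caps what a single CCD can read."""
    return min(dram_peak_gbs(mt_per_s), fabric_read_gbs(fclk_mhz))

configs = {"8400/2100": (8400, 2100), "8000/2200": (8000, 2200)}
results = {name: effective_read_gbs(*cfg) for name, cfg in configs.items()}

for name, (mt_per_s, fclk_mhz) in configs.items():
    print(f"{name}: DRAM {dram_peak_gbs(mt_per_s):.1f} GB/s, "
          f"link {fabric_read_gbs(fclk_mhz):.1f} GB/s, "
          f"effective {results[name]:.1f} GB/s")
```

Under these assumptions both configurations are capped by the link rather than by the memory, which fits the observation that faster memory adds little bandwidth once fclk stops scaling.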
 

LightningZ71

Golden Member
Mar 10, 2017
1,910
2,260
136
You are confusing bits with bytes. They already have 32B/16B R/W per link and 2 links per CCD. Making it 4 links would, I guess, require more than trivial changes in the CCD itself, which would mean a 5th Zen5 CCD design, as you also need to push that BW further into the chip. And the width should already be somewhat dynamic, as the Turin slides advertise GMI folding.
It was more a hypothetical example of what they could do. Break the link into sub links that could be dynamically and individually turned off and on as well as throttled as needed.
 
Reactions: MS_AT

BorisTheBlade82

Senior member
May 1, 2020
687
1,084
136
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will probably want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.
Just as with MI300A, I am expecting AMD to use fully separate function blocks on the CCD for the interconnect on Halo.
History repeating itself: Zen3 already had TSVs for 3D$ that we only found out about later on.
Zen4 already had the SoIC blocks for MI300A that we only found out about later on.
And Zen5 pretty surely also already has these new blocks as well. The GMI links will be inactive on Halo - at least, that is my educated guess.
 