Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Hulk · Nov 16, 2024

lightmanek said:
I decided to do a quick run at static 5GHz 1.1V setting, all else equal to my last run. This is by no means the lowest Vcore I can use, just one I picked up looking at auto PBO voltages.
View attachment 111666

Nice score increase in multicore and much lower load power at the same time, I like. ASUS allows me to set profiles with boost still active, might play with that later

Wow! 158 Watts, 5GHz, and a score of nearly 44,000.
Impressive

lightmanek · Nov 16, 2024

Hulk said:
Wow! 158 Watts, 5GHz, and a score of nearly 44,000.
Impressive

Scaling is good as at 180W I've hit over 45000

poke01 · Nov 16, 2024

So I know this is regarding M4 Pro but Strix Halo will have similar bandwidth.

https://twitter.com/x/status/1857823012600558061

For multi-core it's a combination of IPC, clock speed, cache and memory bandwidth improvements, so basically improved everywhere. M1 Max can only do 100GB/s DRAM read per P-cluster, while >4 cores on M4 Pro from a single cluster have full access to its ~270 GB/s bandwidth

So if Zen5 in Strix Halo can do the same, then things will get interesting and fun in memory-bound tests. I really hope that's the case.

MS_AT · Nov 16, 2024

poke01 said:
So I know this is regarding M4 Pro but Strix Halo will have similar bandwidth.

https://twitter.com/x/status/1857823012600558061

For multi-core it's a combination of IPC, clock speed, cache and memory bandwidth improvements, so basically improved everywhere. M1 Max can only do 100GB/s DRAM read per P-cluster, while >4 cores on M4 Pro from a single cluster have full access to its ~270 GB/s bandwidth

So if Zen5 in Strix Halo can do the same, then things will get interesting and fun in memory-bound tests. I really hope that's the case.

We know(?) they will use 64B links at an unknown frequency for the CCD-to-IOD connection on Halo. Assuming 2000MHz as on desktop, that gives us 128 GB/s from a single CCD to the IOD, so that is the guessed theoretical maximum one CCD will be able to achieve (this is read bandwidth). For mixed read/write the max should be 1.5x higher. But either way, 2 CCDs will be needed to use the full ~250GB/s BW if we consider only the CPU part of Strix Halo.
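
The arithmetic behind that estimate can be sketched in a few lines of Python. The 64 B width, the 2000 MHz clock, the 1.5x mixed read/write factor and the ~250 GB/s target are the guesses from this post, not confirmed Strix Halo specifications, and the function names are made up for illustration.

```python
import math

def link_bandwidth_gbs(bytes_per_clock: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s for a link moving bytes_per_clock every cycle."""
    return bytes_per_clock * clock_mhz * 1e6 / 1e9

def ccds_needed(target_gbs: float, per_ccd_gbs: float) -> int:
    """Smallest number of CCDs whose combined link bandwidth reaches the target."""
    return math.ceil(target_gbs / per_ccd_gbs)

# Assumed values from the post: 64 B per clock, 2000 MHz, ~250 GB/s of DRAM bandwidth.
read_gbs = link_bandwidth_gbs(64, 2000)
mixed_gbs = read_gbs * 1.5  # assumed mixed read/write ceiling
print(f"read: {read_gbs:.0f} GB/s, mixed: {mixed_gbs:.0f} GB/s")
print(f"CCDs needed for 250 GB/s (read only): {ccds_needed(250, read_gbs)}")
print(f"CCDs needed for 250 GB/s (mixed): {ccds_needed(250, mixed_gbs)}")
```

Under these assumptions a single CCD falls short of ~250 GB/s in both the read-only and the mixed case, which is why two CCDs come out as the minimum.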

branch_suggestion · Nov 16, 2024

MS_AT said:
Assuming 2000MHz as on desktop

Could be higher, could be even wider.

BorisTheBlade82 · Nov 17, 2024

MS_AT said:
We know(?) they will use 64B links at an unknown frequency for the CCD-to-IOD connection on Halo. Assuming 2000MHz as on desktop, that gives us 128 GB/s from a single CCD to the IOD, so that is the guessed theoretical maximum one CCD will be able to achieve (this is read bandwidth). For mixed read/write the max should be 1.5x higher. But either way, 2 CCDs will be needed to use the full ~250GB/s BW if we consider only the CPU part of Strix Halo.

As they will completely change the underlying technology, almost nothing is known. From a pJ/bit POV we are talking about <2 pJ/bit vs. 0.15 pJ/bit. So even quadrupling bandwidth to 256 GB/s per CCD will still bring peak energy savings of more than 50% for the interconnect. Also, the die area savings are huge in comparison. Although it is not apples to apples, the comparison of the MI300A interconnect on Zen4 vs. IFoP gives a good indication.
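
A rough way to check the "more than 50%" figure is to turn energy per bit into link power. This is only a sketch: the 64 GB/s baseline is what "quadrupling to 256 GB/s" implies, 2 pJ/bit is taken as the upper bound of "<2 pJ/bit", and the function name is hypothetical.

```python
def link_power_watts(bandwidth_gbs: float, pj_per_bit: float) -> float:
    """Power drawn by a link moving bandwidth_gbs at a given energy per bit."""
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

# Assumed figures: old link at 64 GB/s and 2 pJ/bit, new link at 256 GB/s and 0.15 pJ/bit.
old_w = link_power_watts(64, 2.0)
new_w = link_power_watts(256, 0.15)
saving = 1 - new_w / old_w
print(f"{old_w:.2f} W -> {new_w:.2f} W, saving {saving:.0%}")
```

With these inputs the saving comes out at roughly 70%. It stays above 50% as long as the old link costs more than 1.2 pJ/bit, since 4 x 0.15 = 0.6 pJ/bit is the break-even numerator.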

Hulk · Nov 17, 2024

Not completely sure about ARL but close enough.

Hail The Brain Slug · Nov 17, 2024

Hulk said:
Not completely sure about ARL but close enough.

View attachment 111738

Why do we keep idolizing single benchmarks as some sort of truth about the relative power of a design

Hulk · Nov 17, 2024

Not idolizing, just putting up some data?

Hail The Brain Slug · Nov 17, 2024

Hulk said:
Not idolizing, just putting up some data?

Sorry, it was a broader commentary. I have been bombarded with R23 everywhere for every CPU release recently; it's like other benchmarks almost don't exist.

gdansk · Nov 17, 2024

Hail The Brain Slug said:
Sorry, it was a broader commentary. I have been bombarded with R23 everywhere for every CPU release recently; it's like other benchmarks almost don't exist.

R23 is a very predictable benchmark and usually easy to replicate. It has a use: to show how much throughput a chip can achieve in an embarrassingly parallel workload.

I'm not arguing it should influence buying behavior. Just that it has at least one use. Perhaps other benchmarks do it better but it's so easy to install and run that I use it for testing cooling.

MS_AT · Nov 17, 2024

BorisTheBlade82 said:
As they will completely change the underlying technology, almost nothing is known. From a pJ/bit POV we are talking about <2 pJ/bit vs. 0.15 pJ/bit. So even quadrupling bandwidth to 256 GB/s per CCD will still bring peak energy savings of more than 50% for the interconnect. Also, the die area savings are huge in comparison. Although it is not apples to apples, the comparison of the MI300A interconnect on Zen4 vs. IFoP gives a good indication.

Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.

Hulk · Nov 17, 2024

gdansk said:
R23 is a very predictable benchmark and usually easy to replicate. It has a use: to show how much throughput a chip can achieve in an embarrassingly parallel workload.

I'm not arguing it should influence buying behavior. Just that it has at least one use. Perhaps other benchmarks do it better but it's so easy to install and run that I use it for testing cooling.

It's also nice because no installation is necessary. R23 is so much more fun to watch than R24 too.

Josh128 · Nov 17, 2024

lightmanek said:
I decided to do a quick run at static 5GHz 1.1V setting, all else equal to my last run. This is by no means the lowest Vcore I can use, just one I picked up looking at auto PBO voltages.
View attachment 111666

Nice score increase in multicore and much lower load power at the same time, I like. ASUS allows me to set profiles with boost still active, might play with that later

Hulk said:
It's also nice because no installation is necessary. R23 is so much more fun to watch than R24 too.

Agree, R24 is incredibly boring and long.

poke01 · Nov 17, 2024

Josh128 said:
Agree, R24 is incredibly boring and long.

One good thing about R24 is the included Redshift GPU test. It’s a quick way of measuring RT performance.

inquiss · Nov 17, 2024

MS_AT said:
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.

What if they really did increase the width of the link to the IOD though? What if they did?

DrMrLordX · Nov 17, 2024

inquiss said:
What if they really did increase the width of the link to the IOD though? What if they did?

It would probably increase idle power consumption, as well as tilt the maximum power budget more towards I/O and less towards other functions on the package.

LightningZ71 · Nov 17, 2024

What if the width that was actually used was dynamic? For example, low-power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.

DrMrLordX · Nov 17, 2024

LightningZ71 said:
What if the width that was actually used was dynamic? For example, low-power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.

Weren't the (now fixed) intercore latency problems in Granite Ridge related to attempts to throttle I/O performance as a power saving measure during idle/low loads?

LightningZ71 · Nov 18, 2024

Dunno, but if it was, maybe next time they get it right.

inquiss · Nov 18, 2024

DrMrLordX said:
It would probably increase idle power consumption, as well as tilt the maximum power budget more towards I/O and less towards other functions on the package.

So maybe this increases idle power usage, but maybe also it doesn't because it's gated off and the low power cores are the only thing active during idle...?

MS_AT · Nov 18, 2024

LightningZ71 said:
What if the width that was actually used was dynamic? For example, low-power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low MHz frequency. Maximum usage could light up all 128 bits of it at the full data rate. Like a PCIe connection with 4 roots at 4 bits each. One root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.

You are confusing bits with Bytes. They already have 32B/16B R/W per link and 2 links on the CCD. Making it 4 links would, I guess, require more than trivial changes in the CCD itself, which would mean a 5th Zen5 CCD design, as you also need to push that BW further into the chip. And the width should already be somewhat dynamic, as on the Turin slides they advertise GMI folding.

gaav87 · Nov 18, 2024

I got a 9700X while waiting for the 9800X3D to ship (already paid for the 9800X3D 2 min after launch; it's still gone in the whole EU).

I think I got some kind of platinum sample of the 9700X (all SP values for core quality are 120 or 119). It ran 8200 MT/s with 2200 FCLK out of the box: I just clicked XMP, tweaked in the BIOS, and done.
8400 MT/s also works, but at 1.62 V with a RAM fan.

Zen5 is bottlenecked by FCLK HARD. 8400 MT/s at 2100 FCLK (2:1:1) has the same bandwidth as 8000 MT/s at 2200 FCLK (1.45 V); the only benefit is a 3 ns reduction in latency.
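
A simplified model shows why faster memory does not help once the fabric link is the ceiling. It is a sketch under assumptions that are not measurements: a single CCD reading over one 32 B per clock link (the per-link read width mentioned earlier in the thread), dual-channel DDR5 moving 16 bytes per transfer, and no efficiency losses. The function names are hypothetical.

```python
def dram_peak_gbs(mt_per_s: float, bus_bytes: int = 16) -> float:
    """Theoretical DRAM peak in GB/s; 16 bytes assumes a 128-bit dual-channel bus."""
    return mt_per_s * 1e6 * bus_bytes / 1e9

def fabric_read_gbs(fclk_mhz: float, bytes_per_clock: int = 32) -> float:
    """Assumed read ceiling of one CCD-to-IOD link at the given FCLK."""
    return fclk_mhz * 1e6 * bytes_per_clock / 1e9

def read_ceiling_gbs(mt_per_s: float, fclk_mhz: float) -> float:
    """The slower of the two stages bounds what a single CCD can read."""
    return min(dram_peak_gbs(mt_per_s), fabric_read_gbs(fclk_mhz))

# The two configurations from the post: (memory speed in MT/s, FCLK in MHz).
for mts, fclk in [(8400, 2100), (8000, 2200)]:
    print(f"{mts} MT/s @ {fclk} FCLK: DRAM {dram_peak_gbs(mts):.1f} GB/s, "
          f"fabric {fabric_read_gbs(fclk):.1f} GB/s, "
          f"ceiling {read_ceiling_gbs(mts, fclk):.1f} GB/s")
```

In this model both configurations are limited by the fabric link rather than by DRAM, so extra memory speed has nowhere to go. The model does not reproduce measured numbers; it only illustrates which stage sets the bound.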

The stock 9700X is ~2-5% slower than my 5800X3D, which I have max-tuned for years (2000 FCLK, 4000 CL16, 102.5 BCLK).
The lightly tuned 9700X at 8000/2200 is between 5% and 40% faster depending on the game. I only tested Cyberpunk, Wukong, Final Fantasy and Riftbreaker.
I'm currently testing 4600 MHz, 2300 FCLK, 2300 UCLK (1:1:1). The results are interesting xD

LightningZ71 · Nov 18, 2024

MS_AT said:
You are confusing bits with Bytes. They already have 32B/16B R/W per link and 2 links on the CCD. Making it 4 links would, I guess, require more than trivial changes in the CCD itself, which would mean a 5th Zen5 CCD design, as you also need to push that BW further into the chip. And the width should already be somewhat dynamic, as on the Turin slides they advertise GMI folding.

It was more a hypothetical example of what they could do. Break the link into sub links that could be dynamically and individually turned off and on as well as throttled as needed.

BorisTheBlade82 · Nov 18, 2024

MS_AT said:
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the more unknown unknown, so to speak, at least in my opinion. They will want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.

Just as with MI300A, I am expecting AMD to use fully separate function blocks on the CCD for the interconnect on Halo.
History repeating itself: Zen3 already had TSVs for 3D$ that we only found out about later on.
Zen4 already had the SoIC blocks for MI300A that we only found out about later on.
And Zen5 pretty surely also already has these new blocks as well. The GMI links will be inactive on Halo - at least, that is my educated guess.
