I decided to do a quick run at static 5GHz 1.1V setting, all else equal to my last run. This is by no means the lowest Vcore I can use, just one I picked up looking at auto PBO voltages.
View attachment 111666
Nice score increase in multicore and much lower load power at the same time, I like. ASUS allows me to set profiles with boost still active, might play with that later.
Wow! 158 Watts, 5GHz, and a score of nearly 44,000.
So I know this is regarding M4 Pro, but Strix Halo will have similar bandwidth.
For multi-core it's a combination of IPC, clock speed, cache and memory bandwidth improvements, so basically improved everywhere. M1 Max can only do 100GB/s DRAM read per P-cluster, while >4 cores on M4 Pro from a single cluster have full access to its ~270 GB/s bandwidth.
So if Zen5 in Strix Halo can do the same, then things will get interesting and fun in memory-bound tests. I really hope that's the case.
We know(?) they will use 64B links at an unknown frequency for the CCD-to-IOD connection on Halo. Assuming 2000MHz as on desktop, that gives us 128 GB/s from a single CCD to the IOD, which is my guess at the theoretical maximum one CCD will be able to achieve (this is read bandwidth). For mixed read/write, the max should be 1.5x higher. Either way, 2 CCDs will be needed to use the full ~250GB/s of bandwidth if we consider only the CPU part of Strix Halo.
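The 64B-link estimate above is simple arithmetic, sketched here in Python. The 64 B per clock width and the 2000 MHz clock are the guesses from the post, not confirmed specs, and `link_bandwidth_gbs` is just a helper name made up for this example.

```python
def link_bandwidth_gbs(bytes_per_cycle: float, freq_mhz: float) -> float:
    """Peak one-direction bandwidth in GB/s for a link moving bytes_per_cycle each clock."""
    return bytes_per_cycle * freq_mhz * 1e6 / 1e9


# Assumptions taken from the post: 64 B read width, 2000 MHz link clock.
read_per_ccd = link_bandwidth_gbs(64, 2000)

# Mixed read/write is guessed at 1.5x the read-only figure.
mixed_per_ccd = read_per_ccd * 1.5

# Two CCDs reading at once, to compare against the ~250 GB/s memory bandwidth.
read_two_ccds = 2 * read_per_ccd

print(f"read per CCD:  {read_per_ccd:.0f} GB/s")
print(f"mixed per CCD: {mixed_per_ccd:.0f} GB/s")
print(f"read, 2 CCDs:  {read_two_ccds:.0f} GB/s")
```

Under those assumptions one CCD tops out at 128 GB/s of reads, so it takes both CCDs to reach roughly 256 GB/s.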
"Assuming 2000MHz as on desktop"
Could be higher, could be even wider.
As they will completely change the underlying technology, almost nothing is known. From a pJ/bit POV we are talking about <2 pJ/bit vs. 0.15 pJ/bit. So even quadrupling bandwidth to 256 GB/s per CCD would still bring peak energy savings of more than 50% for the interconnect. The die area savings are also huge in comparison. Although it is not apples to apples, comparing the MI300A interconnect on Zen4 with IFoP gives a good indication.
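A rough check of the "more than 50%" claim, as a Python sketch. It assumes the old link sits at the 2 pJ/bit upper bound and that the baseline is 64 GB/s per CCD (one quarter of 256 GB/s); both are assumptions for illustration, and `interconnect_power_w` is a made-up helper name.

```python
def interconnect_power_w(bandwidth_gbs: float, pj_per_bit: float) -> float:
    """Peak link power in watts: bits moved per second times energy per bit."""
    bits_per_second = bandwidth_gbs * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12


# Assumed baseline: 64 GB/s per CCD at the 2 pJ/bit upper bound quoted above.
old_power = interconnect_power_w(64, 2.0)

# Quadrupled bandwidth at the 0.15 pJ/bit figure quoted above.
new_power = interconnect_power_w(256, 0.15)

savings = 1 - new_power / old_power

print(f"old link: {old_power:.2f} W, new link: {new_power:.2f} W")
print(f"peak energy savings: {savings:.0%}")
```

The savings fraction does not depend on the baseline bandwidth, only on the 4x increase and the two pJ/bit figures. Because <2 pJ/bit is an upper bound, the real savings could be smaller; the claim holds as long as the old link costs at least 1.2 pJ/bit.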
Why do we keep idolizing single benchmarks as some sort of truth about the relative power of a design?
"Not idolizing, just putting up some data?"
Sorry, that was broader commentary: I have been bombarded with R23 everywhere for every CPU release recently, and it's like other benchmarks almost don't exist.
R23 is a very predictable benchmark and usually easy to replicate. It has a use: to show how much throughput a chip can achieve in an embarrassingly parallel workload.
Nope, nothing is known, but right now every CCD has 2 GMI links, of which only one per die is connected to the IOD on desktop. I doubt they will widen that for Halo-specific CCDs, so frequency is the bigger unknown, so to speak, at least in my opinion. They will probably want to keep the frequency lower rather than higher to save power on a mobile platform. This would take the least effort from them and would provide adequate BW, seeing that the primary consumer of the SoC is supposed to be the GPU, not the CPU. Of course, I am guessing and might be wrong.
It's also nice because no installation is necessary. R23 is so much more fun to watch than R24 too.
I'm not arguing it should influence buying behavior. Just that it has at least one use. Perhaps other benchmarks do it better but it's so easy to install and run that I use it for testing cooling.
Agreed, R24 is incredibly boring and long.
One good thing about R24 is the included Redshift GPU test. It's a quick way of measuring RT performance.
What if they really did increase the width of the link to the IOD, though? What if they did?
It would probably increase idle power consumption, as well as tilt the maximum power budget more towards I/O and less towards other functions on the package.
What if the width that was actually used was dynamic? For example, low-power idle, where performance is the lowest priority, could light up only 32 bits of the channel at a low clock frequency. Maximum usage could light up all 128 bits at the full data rate. Like a PCIe connection with 4 roots at 4 bits each: one root for low power, 2 roots for light but active usage, and all 4 roots for maximum effort.
So maybe this increases idle power usage, but maybe it also doesn't, because it's gated off and the low-power cores are the only thing active during idle...?
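To make the dynamic-width idea concrete, here is a toy Python model of a 128-bit link split into four 32-bit sub-links that are switched on per power state. Everything in it is hypothetical: the sub-link split follows the example above, the 400 MHz and 2000 MHz clocks are placeholder values, and none of it describes how AMD's links actually work.

```python
SUBLINK_BITS = 32  # hypothetical: 128-bit link split into 4 sub-links of 32 bits

# Hypothetical power states: (active sub-links, link clock in MHz).
LINK_STATES = {
    "idle": (1, 400),    # one sub-link lit at a low clock (assumed value)
    "light": (2, 2000),  # two sub-links for light but active usage
    "max": (4, 2000),    # all four sub-links at the full clock
}


def link_state_bandwidth_gbs(state: str) -> float:
    """Peak bandwidth in GB/s for the given power state of the toy link."""
    active_sublinks, freq_mhz = LINK_STATES[state]
    bytes_per_cycle = active_sublinks * SUBLINK_BITS / 8
    return bytes_per_cycle * freq_mhz * 1e6 / 1e9


for state in LINK_STATES:
    print(f"{state:>5}: {link_state_bandwidth_gbs(state):.1f} GB/s")
```

The point of the sketch is only that gating sub-links scales peak bandwidth (and presumably link power) in steps, so a wider link would not have to cost more at idle.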
You are confusing bits with bytes. They already have 32B/16B R/W per link and 2 links per CCD. Making it 4 links would, I guess, require more than trivial changes in the CCD itself, which would mean a 5th Zen5 CCD design, as you also need to push that BW further into the chip. And the width should already be somewhat dynamic, as the Turin slides advertise GMI folding.
It was more a hypothetical example of what they could do: break the link into sub-links that could be dynamically and individually turned off and on, as well as throttled as needed.
Just as with MI300A, I am expecting AMD to use fully separate function blocks on the CCD for the interconnect on Halo.