Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 271

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Why couldn't it be latency?

Could also be driver overhead. Remember, the driver has to share system RAM with the CPU's other running processes. By the time it receives its quantum slice, the data it has may already be stale, needing to pull more which incurs another latency penalty waiting on the higher latency of LPDDR5 chips. The iGPU really needs its own slice of dedicated cache (256MB if possible).
Neither latency nor driver overhead explains why the 760M's drop in performance is so small relative to its drop in CU count.
 

StefanR5R

Elite Member
Dec 10, 2016
5,687
8,258
136
This guy writes "Well it’s not really a straight forward answer because there’s an abnormally large L2 adding a ton of effective bandwidth and the “standard” bandwidth number of just giving max bandwidth of the memory chips over a 128 bit bus doesn’t really give much info."
And where is the analysis of workload footprints to back this claim?
"Still just the memory bandwidth is roughly 95-100GB/S which is similar to a 1050"
Except that the GTX 1050 doesn't share this with a CPU which itself needs to access the memory all over the place in games and various other (e.g. interactive) workloads.
Edit, PS:
GTX 1050 --> 1.9 TFLOPS peak*
Radeon 780M --> 8.3 TFLOPS peak*

EDIT half a day later, *) that's of course theoretical peak FP32 throughput [ = if the units could be fed continuously, and maybe the operations mix plays a role too]
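For reference, the "standard" bandwidth figure quoted above is just bus width times transfer rate. A minimal sketch of that arithmetic follows; the memory speeds listed are illustrative assumptions rather than any specific laptop's configuration, and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000


# Memory speeds below are assumed for illustration, not measured configurations.
configs = {
    "APU, 128-bit DDR5-5600": (128, 5600),
    "APU, 128-bit LPDDR5-6400": (128, 6400),
    "APU, 128-bit LPDDR5X-7500": (128, 7500),
    "GTX 1050, 128-bit GDDR5 at 7 Gbps": (128, 7000),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

These are theoretical peaks only. They say nothing about how much of that bandwidth the CPU cores consume on an APU, which is the point being argued here.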
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
GTX 1050 --> 1.9 TFLOPS peak
Radeon 780M --> 8.3 TFLOPS peak
AMD doubled their FLOPS numbers for RDNA3 to denote the 'dual issue' CU change, but the reality of the µArch's throughput for most gaming workloads isn't anywhere near that.

I guess we'll see soon enough if RDNA3.5 or RDNA4 make any significant changes in that area.
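To make the doubling concrete, here is a rough sketch of how those peak numbers fall out. The unit counts and boost clocks are the commonly published specs as far as I recall them, so treat them as assumptions; `peak_fp32_tflops` is a hypothetical helper.

```python
def peak_fp32_tflops(units: int, lanes_per_unit: int, clock_ghz: float,
                     issue_width: int = 1) -> float:
    """Theoretical FP32 peak: lanes x 2 FLOPs per FMA x issue width x clock."""
    return units * lanes_per_unit * 2 * issue_width * clock_ghz / 1000


# Assumed specs: GTX 1050 with 5 SMs x 128 lanes at 1.455 GHz boost,
# Radeon 780M with 12 CUs x 64 lanes at 2.7 GHz.
gtx_1050 = peak_fp32_tflops(units=5, lanes_per_unit=128, clock_ghz=1.455)
r780m_dual = peak_fp32_tflops(units=12, lanes_per_unit=64, clock_ghz=2.7, issue_width=2)
r780m_single = peak_fp32_tflops(units=12, lanes_per_unit=64, clock_ghz=2.7, issue_width=1)

print(f"GTX 1050:                 {gtx_1050:.2f} TFLOPS")
print(f"780M, dual issue counted: {r780m_dual:.2f} TFLOPS")
print(f"780M, single issue:       {r780m_single:.2f} TFLOPS")
```

Under these assumptions the 8.3 figure only appears when dual issue is counted; without it the 780M's peak is about half that.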
 
Reactions: TESKATLIPOKA

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Because AMD's APUs have been bandwidth starved for 2 years now?! It started with Rembrandt, and now Phoenix is worse. Phoenix's iGPU is clocked over 15% higher than Rembrandt's and uses RDNA3, yet it's not even 10% faster with the same RAM and barely 15% faster with faster RAM. Also, OEMs will always cheap out on faster RAM, so don't expect every device to come with LPDDR5X-8533. New tests of the desktop versions also show that 7200 makes nearly no difference compared to 5200 (only about 7%). All this leads to the conclusion that Strix will be bandwidth starved to the moon.
Is it? Originally I had no horse in this race because I only have one use case for an APU. However, are there tools out there to measure GPU memory utilization? You actually have me tempted to run some tests.
Also, the 760M is just 15% slower than the 780M while having 33% fewer CUs.

If this is not memory starvation - I do not know what is.
That could be any number of factors. A big one is thermals and power consumption/limits, though a look at clocks should make that pretty obvious. There are other items as well. I don’t think we have enough data to reach the conclusion that these chips are bandwidth starved, especially since APUs today have far more bandwidth available vs. the past.

Even if AMD is hitting bandwidth limits, there are a few ways to get around that limit.
 
Reactions: Tlh97

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
This guy writes "Well it’s not really a straight forward answer because there’s an abnormally large L2 adding a ton of effective bandwidth and the “standard” bandwidth number of just giving max bandwidth of the memory chips over a 128 bit bus doesn’t really give much info."
And where is the analysis of workload footprints to back this claim?
"Still just the memory bandwidth is roughly 95-100GB/S which is similar to a 1050"
Except that the GTX 1050 doesn't share this with a CPU which itself needs to access the memory all over the place in games and various other (e.g. interactive) workloads.
Edit, PS:
GTX 1050 --> 1.9 TFLOPS peak
Radeon 780M --> 8.3 TFLOPS peak
Possibly, but I disagree with the 95-100GB/s number. DDR5 has been tested on an AMD APU by an overclocker at a speed of DDR5-10600, and the number will likely rise up from there. The only limit that one has to worry about is the limits of AMD silicon. Memory chip makers are actually working on much faster DDR5, and LPDDR5. LPDDR5 12667 and beyond should be out within the next 1-2 years or so. DDR6 will accelerate this.

It is also possible to use GDDR6 for system memory, though neither Intel nor AMD has shown any desire to do this outside of consoles. GDDR6 has higher latency, but consoles use it and games run fine.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
That could be any number of factors. A big one is thermals and power consumption/limits, though a look at clocks should make that pretty obvious. There are other items as well. I don’t think we have enough data to reach the conclusion that these chips are bandwidth starved, especially since APUs today have far more bandwidth available vs. the past.

Even if AMD is hitting bandwidth limits, there are a few ways to get around that limit.
The biggest factor is sharing memory bandwidth with the CPU.

That's the biggest one.

Currently, it is impossible to fully feed 8 CPU cores and 12 CUs over a 128-bit bus.
The 760M only appears in the 8600G, a CPU with 6 cores. The bandwidth requirements are cut by 25% for the CPU cores and by 33% for the CUs.

Hence why the 760M is only 15% slower than the 780M, despite having its GPU core count cut by 33%.

Now, imagine Strix Point's Zen 5 cores. There are 12 of them. What will the bandwidth requirements be for them, and for the GPU, which will have 33% more CUs than the 780M?

It will be an even bigger challenge.
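The scaling argument above can be put in numbers. This is a back-of-the-envelope sketch using only the figures quoted in this thread (8 vs. 12 CUs, "15% slower"); it ignores clock differences between the two parts, and `relative_per_cu_throughput` is a hypothetical helper.

```python
def relative_per_cu_throughput(perf_ratio: float, cu_ratio: float) -> float:
    """How much work each CU delivers on the smaller part, relative to the larger part."""
    return perf_ratio / cu_ratio


cu_ratio = 8 / 12    # 760M CUs / 780M CUs, i.e. 33% fewer
perf_ratio = 0.85    # "15% slower", as quoted in the thread

per_cu = relative_per_cu_throughput(perf_ratio, cu_ratio)
print(f"Performance if it scaled with CU count: {cu_ratio:.0%} of the 780M")
print(f"Performance as quoted:                  {perf_ratio:.0%} of the 780M")
print(f"Per-CU throughput, 760M vs 780M:        {per_cu:.3f}x")
```

Each CU on the 760M doing more work than each CU on the 780M is what you would expect if something shared, such as memory bandwidth, is the limit. The same numbers alone cannot rule out power or clock limits.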
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
That's actually how much my boss makes and I starve in comparison, with crappy health insurance and not enough to get my own apartment with a proper lease agreement.

And oh, you should watch him type a formula in Excel. You will want to kill yourself.
I feel you.

That was me for 6 years watching the deputy manager swan about in my last long term job.

Utterly useless waste of space that made about 50% more than my direct boss who worked 60+ hours.

The guy took over 2 months off for a golf injury...... it just makes my blood boil.

 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
I wonder if there is any possibility that Strix Halo will have a GDDR6 eDRAM stack on the package.

If they already have a die with some infinity cache on it then it doesn't seem a huge stretch that they might add DRAM into that stack too.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Why couldn't it be latency?

Could also be driver overhead. Remember, the driver has to share system RAM with the CPU's other running processes. By the time it receives its quantum slice, the data it has may already be stale, needing to pull more which incurs another latency penalty waiting on the higher latency of LPDDR5 chips. The iGPU really needs its own slice of dedicated cache (256MB if possible).
GPUs are designed to be very latency tolerant. They have large thread counts they swap out to avoid stalling. They intentionally focus on throughput over latency.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
I really hope we see 256-bit APUs in the next generation. With LPCAMM it's not a ridiculous idea, and we'd finally be able to see proper replacement of most laptop GPUs. The 128-bit bus has held back AMD ever since the original Llano APU, and it's basically dictated by how many SODIMMs you can fit on a laptop board. Time to blow the doors off already!
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I really hope we see 256-bit APUs in the next generation. With LPCAMM it's not a ridiculous idea, and we'd finally be able to see proper replacement of most laptop GPUs. The 128-bit bus has held back AMD ever since the original Llano APU, and it's basically dictated by how many SODIMMs you can fit on a laptop board. Time to blow the doors off already!
It's way past due that we get 256-bit on mainstream platforms.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,322
4,790
96
I wonder if there is any possibility that Strix Halo will have a GDDR6 eDRAM stack on the package.

If they already have a die with some infinity cache on it then it doesn't seem a huge stretch that they might add DRAM into that stack too.
no.
more tiers == more tears.
DRAM doesn't work as a cache anyway.
I mean Apple and soon Lunar Lake are doing so...
That's a total mobo area play.
tablet chips duh.
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
DRAM doesn't work as a cache anyway.
I didn't say anything about using GDDR as a cache.

I meant solely as VRAM that is just on package, and separate from the main system LPDDR/DDR memory for the CPU cores.

Or I guess they could just build the IO pins into it instead, and leave the GDDR for OEMs to put on the mobo.
 

DrMrLordX

Lifer
Apr 27, 2000
21,803
11,157
136
Yes, let's make our hot thing hotter by putting hot things right next to it/on top of it.

Depends, it can make things simpler depending on form factor. As it currently stands, cooling DRAM on a desktop motherboard is simply not easy to do. Yes most DRAM doesn't get that hot, but if it does, you have big goofy heatspreaders and then you're relying on internal case airflow to do the rest, which does not yield great results. The only reason why it's not a huge problem for desktop is that most pedestrian setups simply do not push much power or produce much heat with their system RAM. For OEMs it's a bit of a headache since most OEMs are not big on internal airflow (or fan noise), but having DIMMs or even SIMMs in your setup makes it difficult to integrate RAM cooling with the rest of your cooling setup.

Moving a few extra watts on-package is not going to be a big problem if you know the heat is going to be there and have the ability to cool it. On-package RAM would have relatively large area/power ratio compared to compute chiplets running AVX512 or SVE2 code (for example). If SoCs using on-package RAM aren't forced to deal with klunky/bad integrated heat spreaders, the larger package can get direct contact with a cooling plate, making it just as easy to cool on-package RAM as it is to cool RAM on dGPUs.
 
Reactions: moinmoin

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
On package RAM isn't a totally new idea. Intel already shipped it in the Kabylake+Vega wacky product they made back in 2018:


So yes, putting dedicated VRAM on package is certainly doable. But I still think that configurable, user replaceable LPCAMM is the way forward- 256 bits of LPDDR5X should be plenty of bandwidth for any APU that fits in a laptop thermal envelope.
 

S'renne

Member
Oct 30, 2022
136
99
61
On package RAM isn't a totally new idea. Intel already shipped it in the Kabylake+Vega wacky product they made back in 2018:


So yes, putting dedicated VRAM on package is certainly doable. But I still think that configurable, user replaceable LPCAMM is the way forward- 256 bits of LPDDR5X should be plenty of bandwidth for any APU that fits in a laptop thermal envelope.
Can 256 bit LPDDR5X feed the potential 40 CU RDNA 3.5 iGPU though? The 6700XT itself has 384 GB/s bandwidth..
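A quick sketch of that comparison, assuming LPDDR5X-8533 on the 256-bit bus (the speed grade is my assumption, nothing is confirmed) and the 6700 XT's 192-bit GDDR6 at 16 Gbps with 40 CUs:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s from bus width and transfer rate."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000


CUS = 40  # the rumoured iGPU and the 6700 XT are both taken as 40 CUs here

apu_bw = peak_bandwidth_gbs(256, 8533)     # assumed LPDDR5X-8533, shared with the CPU
dgpu_bw = peak_bandwidth_gbs(192, 16000)   # 6700 XT: 192-bit GDDR6 at 16 Gbps

print(f"256-bit LPDDR5X-8533: {apu_bw:.1f} GB/s, {apu_bw / CUS:.1f} GB/s per CU")
print(f"RX 6700 XT:           {dgpu_bw:.1f} GB/s, {dgpu_bw / CUS:.1f} GB/s per CU")
print(f"APU / dGPU ratio:     {apu_bw / dgpu_bw:.2f}")
```

So on paper it's roughly 70% of the 6700 XT's raw bandwidth, before the CPU takes its share. The 6700 XT also has 96MB of Infinity Cache on top of that, so raw GB/s doesn't tell the whole story either way.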
 