Neither latency nor driver overhead explains the small drop in performance for the 760M relative to the drop in CU count.
Why couldn't it be latency?
Could also be driver overhead. Remember, the driver has to share system RAM with the CPU's other running processes. By the time it receives its quantum slice, the data it has may already be stale, needing to pull more which incurs another latency penalty waiting on the higher latency of LPDDR5 chips. The iGPU really needs its own slice of dedicated cache (256MB if possible).
Confirming what I wrote.
This guy writes "Well it’s not really a straight forward answer because there’s an abnormally large L2 adding a ton of effective bandwidth and the “standard” bandwidth number of just giving max bandwidth of the memory chips over a 128 bit bus doesn’t really give much info."
AMD doubled their FLOPS numbers for RDNA3 to denote the 'dual issue' CU change, but the reality of the µArch's throughput for most gaming workloads isn't anywhere near that.
GTX 1050 --> 1.9 TFLOPS peak
Radeon 780M --> 8.3 TFLOPS peak
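For anyone wondering where the two peak numbers come from, here is a rough sketch of the usual arithmetic. The shader counts and clocks are my assumptions from public spec sheets, not from this thread (780M: 12 CUs x 64 lanes at about 2.7 GHz; GTX 1050: 640 CUDA cores at about 1.455 GHz boost), and `peak_tflops` is a hypothetical helper name.

```python
def peak_tflops(lanes: int, clock_ghz: float, dual_issue: bool = False) -> float:
    """Theoretical peak FP32 TFLOPS = lanes x FLOPs per lane per clock x clock."""
    flops_per_lane = 2  # one fused multiply-add counts as two FLOPs
    if dual_issue:
        flops_per_lane *= 2  # headline RDNA3 figure assumes dual issue on every cycle
    return lanes * flops_per_lane * clock_ghz / 1000.0


# Assumed specs (not stated in the thread).
RADEON_780M_LANES = 12 * 64   # 12 CUs, 64 FP32 lanes each
RADEON_780M_CLOCK_GHZ = 2.7
GTX_1050_LANES = 640          # CUDA cores
GTX_1050_CLOCK_GHZ = 1.455    # boost clock

tflops_780m_dual = peak_tflops(RADEON_780M_LANES, RADEON_780M_CLOCK_GHZ, dual_issue=True)
tflops_780m_single = peak_tflops(RADEON_780M_LANES, RADEON_780M_CLOCK_GHZ)
tflops_gtx_1050 = peak_tflops(GTX_1050_LANES, GTX_1050_CLOCK_GHZ)

print(f"780M, dual issue counted: {tflops_780m_dual:.1f} TFLOPS")
print(f"780M, single issue only:  {tflops_780m_single:.1f} TFLOPS")
print(f"GTX 1050:                 {tflops_gtx_1050:.1f} TFLOPS")
```

Under those assumptions the 8.3 figure only appears when dual issue is counted on every cycle; without it the same formula gives roughly half that.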
Is it? Originally I had no horse in this race because I only have one use case for an APU. However, are there tools out there to measure GPU memory utilization? You actually have me tempted to run some tests.
Because AMD's APUs have been bandwidth starved for 2 years now?! It started with Rembrandt and now Phoenix is worse. Phoenix's iGPU is clocked over 15% higher than Rembrandt's and uses RDNA3, yet it's not even 10% faster with the same RAM and barely 15% faster with faster RAM. Also, OEMs will always cheap out on faster RAM, so don't expect every device to come with LPDDR5x-8533. New tests of desktop versions also show that 7200 makes nearly no difference compared to 5200 (only about 7%). All this leads to the conclusion that Strix will be bandwidth starved to the moon.
That could be any number of factors. A big one is thermals and power consumption/limits, though a look at clocks should make that pretty obvious. There are other items as well. I don’t think we have enough data to reach the conclusion that these chips are bandwidth starved, especially since APUs today have far more bandwidth available vs. the past.
Also, the 760M is just 15% slower than the 780M while having 33% fewer CUs.
If this is not memory starvation - I do not know what is.
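To put a number on that scaling argument, a small sketch using only the two percentages quoted above plus the CU counts (8 for the 760M and 12 for the 780M, which is my assumption from spec sheets). It shows how far the result is from linear CU scaling; it does not by itself say whether bandwidth, clocks, or power limits are the cause.

```python
# CU counts are assumed from public specs; the 15% figure is the one quoted above.
CUS_760M = 8
CUS_780M = 12
RELATIVE_PERF_760M = 0.85  # "just 15% slower than 780M"

cu_fraction = CUS_760M / CUS_780M                  # share of the 780M's CUs
linear_expectation = cu_fraction                   # perf if it scaled 1:1 with CUs
per_cu_throughput = RELATIVE_PERF_760M / cu_fraction

print(f"CU fraction:            {cu_fraction:.3f}")
print(f"Linear-scaling perf:    {linear_expectation:.3f}")
print(f"Observed perf:          {RELATIVE_PERF_760M:.3f}")
print(f"Per-CU throughput vs 780M: {per_cu_throughput:.3f}x")
```

In other words, each CU on the 760M is delivering noticeably more than each CU on the 780M, which is what you would expect if something other than CU count is the limit on the bigger part.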
Possibly, but I disagree with the 95-100GB/s number. DDR5 has been tested on an AMD APU by an overclocker at a speed of DDR5-10600, and the number will likely rise from there. The only limit one has to worry about is the AMD silicon itself. Memory chip makers are actually working on much faster DDR5 and LPDDR5. LPDDR5-12667 and beyond should be out within the next 1-2 years or so. DDR6 will accelerate this.
This guy writes "Well it’s not really a straight forward answer because there’s an abnormally large L2 adding a ton of effective bandwidth and the “standard” bandwidth number of just giving max bandwidth of the memory chips over a 128 bit bus doesn’t really give much info."
And where is the analysis of workload footprints to back this claim?
"Still just the memory bandwidth is roughly 95-100GB/S which is similar to a 1050"
Except that the GTX 1050 doesn't share this with a CPU which itself needs to access the memory all over the place in games and various other (e.g. interactive) workloads.
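For reference, the "standard" bandwidth number being argued about is just bus width times transfer rate. A quick sketch, with `peak_bandwidth_gbs` as a hypothetical helper name; the 5200, 7200, 8533 and 10600 speeds are the ones mentioned in this thread, while 6400 is my assumption for what lands near the quoted 95-100GB/s figure.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x mega-transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


BUS_WIDTH_BITS = 128  # the 128-bit bus discussed in the thread

for rate_mts in (5200, 6400, 7200, 8533, 10600):
    print(f"{rate_mts:>6} MT/s -> {peak_bandwidth_gbs(BUS_WIDTH_BITS, rate_mts):6.1f} GB/s")

# How much more peak bandwidth 7200 has over 5200 on the same bus.
ratio_7200_vs_5200 = peak_bandwidth_gbs(BUS_WIDTH_BITS, 7200) / peak_bandwidth_gbs(BUS_WIDTH_BITS, 5200)
print(f"7200 vs 5200: {ratio_7200_vs_5200:.2f}x peak bandwidth")
```

These are peak numbers for the whole memory bus. On an APU the CPU takes its share out of the same figure, and cache hits add effective bandwidth on top, so neither side of the argument is settled by this arithmetic alone.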
Edit, PS:
GTX 1050 --> 1.9 TFLOPS peak
Radeon 780M --> 8.3 TFLOPS peak
The biggest factor is sharing memory bandwidth with the CPU.
That could be any number of factors. A big one is thermals and power consumption/limits, though a look at clocks should make that pretty obvious. There are other items as well. I don’t think we have enough data to reach the conclusion that these chips are bandwidth starved, especially since APUs today have far more bandwidth available vs. the past.
Even if AMD is hitting bandwidth limits, there are a few ways to get around that limit.
I feel you.
That's actually how much my boss makes and I starve in comparison, with crappy health insurance and not enough to get my own apartment with a proper lease agreement.
And oh, you should watch him type a formula in Excel. You will want to kill yourself.
GPUs are designed to be very latency tolerant. They have large thread counts they swap out to avoid stalling. They intentionally focus on throughput over latency.
Why couldn't it be latency?
Could also be driver overhead. Remember, the driver has to share system RAM with the CPU's other running processes. By the time it receives its quantum slice, the data it has may already be stale, needing to pull more which incurs another latency penalty waiting on the higher latency of LPDDR5 chips. The iGPU really needs its own slice of dedicated cache (256MB if possible).
Yeah, but not anything like a Macbook Air. Strix Halo is not going to be a 15-25W chip.
So I guess Strix Halo could fit into the chassis of a Macbook Pro equivalent Windows Laptop.
It's way past due that we get 256-bit on mainstream platforms.
I really hope we see 256-bit APUs in the next generation. With LPCAMM it's not a ridiculous idea, and we'd finally be able to see proper replacement of most laptop GPUs. The 128-bit bus has held back AMD ever since the original Llano APU, and it's basically dictated by how many SODIMMs you can fit on a laptop board. Time to blow the doors off already!
Yuck, if it was so attractive, consoles would do it.
On-package memory is the way to go.
It will enable very wide RAM buses (above 256 bit), without blowing up the motherboard cost.
I mean Apple and soon Lunar Lake are doing so...
Yuck, if it was so attractive, consoles would do it.
Yes, let's make our hot thing hotter by putting hot things right next to it / on top of it.
I mean Apple and soon Lunar Lake are doing so...
No.
I wonder if there is any possibility that Strix Halo will have a GDDR6 eDRAM stack on the package.
If they already have a die with some infinity cache on it then it doesn't seem a huge stretch that they might add DRAM into that stack too.
That's a total mobo area play.
I mean Apple and soon Lunar Lake are doing so...
I didn't say anything about using GDDR as a cache.
DRAM doesn't work as a cache anyway.
Yes, let's make our hot thing hotter by putting hot things right next to it / on top of it.
Can 256-bit LPDDR5X feed the potential 40 CU RDNA 3.5 iGPU though? The 6700XT itself has 384 GB/s of bandwidth.
On-package RAM isn't a totally new idea. Intel already shipped it in the wacky Kaby Lake + Vega product they made back in 2018:
So yes, putting dedicated VRAM on package is certainly doable. But I still think that configurable, user-replaceable LPCAMM is the way forward; 256 bits of LPDDR5X should be plenty of bandwidth for any APU that fits in a laptop thermal envelope.
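As a rough check on the "can 256-bit feed 40 CUs" question, here is the peak-bandwidth arithmetic side by side. The 8533 MT/s speed and the 384 GB/s figure are the ones mentioned in this thread; treating the 6700 XT as a 40 CU part is my assumption from its spec sheet, and the comparison ignores caches and the CPU's share of an APU's memory bus.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


CU_COUNT = 40                      # rumored iGPU size; assumed equal to the 6700 XT's CU count
DGPU_BANDWIDTH_GBS = 384.0         # 6700 XT figure quoted in the thread

apu_bandwidth_gbs = peak_bandwidth_gbs(256, 8533)   # 256-bit LPDDR5X-8533
bandwidth_ratio = apu_bandwidth_gbs / DGPU_BANDWIDTH_GBS

print(f"256-bit LPDDR5X-8533: {apu_bandwidth_gbs:.1f} GB/s")
print(f"Share of 6700 XT bandwidth: {bandwidth_ratio:.0%}")
print(f"Per CU: {apu_bandwidth_gbs / CU_COUNT:.1f} GB/s vs {DGPU_BANDWIDTH_GBS / CU_COUNT:.1f} GB/s")
```

So on paper the 256-bit APU would have a bit over two thirds of the discrete card's raw bandwidth before the CPU takes any of it. Whether that is "plenty" depends on clocks, cache size, and the power budget, none of which this sketch models.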