Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 131


adamge

Member
Aug 15, 2022
The big difference is that GDDR memory is added by the card manufacturers, so it's a cost of manufacturing the card. HBM, on the other hand, adds to the cost of the GPU package AMD creates and has to sell to the manufacturers. As such, the latter needs more investment and risk-taking by AMD and as a result looks significantly more expensive than comparable GPU packages using standard GDDR memory.

(It's also the reason why only Apple adds all the memory on the same package as CPU/GPU on its A and M series chips: They don't need to sell the resulting package to anybody but themselves.)

I recall that in MLID's videos he said Nvidia buys all the RAM for every GPU chip they make, and sells the GPU+RAM to the downstream partner (AIB, board maker, or whatever the term is).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
The big difference is that GDDR memory is added by the card manufacturers, so it's a cost of manufacturing the card. HBM, on the other hand, adds to the cost of the GPU package AMD creates and has to sell to the manufacturers. As such, the latter needs more investment and risk-taking by AMD and as a result looks significantly more expensive than comparable GPU packages using standard GDDR memory.

(It's also the reason why only Apple adds all the memory on the same package as CPU/GPU on its A and M series chips: They don't need to sell the resulting package to anybody but themselves.)
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
Let's say a GPU with HBM costs manufacturers $150 more, but they won't need to buy VRAM costing $180; they will also have easier logistics, because instead of GPU + memory they will only need to source the GPU, and the BOM would be lower.
Of course this only holds if HBM is cheaper than GDDR6 + MCDs, but it is possible.
 

maddie

Diamond Member
Jul 18, 2010
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
Let's say a GPU with HBM costs manufacturers $150 more, but they won't need to buy VRAM costing $180; they will also have easier logistics, because instead of GPU + memory they will only need to source the GPU, and the BOM would be lower.
Of course this only holds if HBM is cheaper than GDDR6 + MCDs, but it is possible.
I agree. If the BOM can be viewed as a closed box, why does it matter where the internals are sourced or assembled? Isn't the total cost the important factor?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I agree. If the BOM can be viewed as a closed box, why does it matter where the internals are sourced or assembled? Isn't the total cost the important factor?
One thing I forgot about is the cut-down version. The 7900 XT misses 1 MCD and two 2GB modules, which can be directly subtracted from the BOM.
The RX 7900 XT has 20GB of VRAM; that means $26.22 × 10 for VRAM + 5 × $6.20 for MCDs, so we are already at $293 just for the chips. If the discount is 40-50%, then $131-157 for VRAM + $31 for MCDs, for a total of $162-188.
We can't do the same thing for an HBM3 version, because you have only 2 stacks and by removing one you would have too little bandwidth, but it looks like it would be cheaper even with 2 stacks.
N32 with HBM would also be tricky. One stack of HBM3 wouldn't be enough; you would need ~2 stacks of HBM2E instead, and we don't know the price difference between HBM2E and HBM3.
If either vendor could gain an advantage with HBM why don't they?
A very good question. I also wonder about that. Either HBM costs more than what I linked, there is not enough of it (which is unlikely), or GDDR6 is not as expensive as I linked and calculated.
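For what it's worth, the BOM arithmetic above can be sketched in a few lines of Python. The $26.22 per 2GB GDDR6 module, the $6.20 per MCD, and the 40-50% volume discount are all the thread's speculative figures, not confirmed prices:

```python
# Rough memory-subsystem BOM sketch for the RX 7900 XT, using the
# speculative per-chip prices from this thread (not confirmed numbers).
GDDR6_MODULE_PRICE = 26.22   # $ per 2GB GDDR6 module (assumed list price)
MCD_PRICE = 6.20             # $ per MCD (assumed)

def memory_bom(modules: int, mcds: int, gddr6_discount: float = 0.0) -> float:
    """Cost of GDDR6 modules plus MCDs, with an optional volume
    discount applied to the GDDR6 price only."""
    vram = modules * GDDR6_MODULE_PRICE * (1.0 - gddr6_discount)
    return round(vram + mcds * MCD_PRICE, 2)

# 7900 XT: 20GB -> 10 modules, 5 MCDs
print(memory_bom(10, 5))        # list price: 293.2
print(memory_bom(10, 5, 0.50))  # 50% VRAM discount: 162.1
print(memory_bom(10, 5, 0.40))  # 40% VRAM discount: 188.32
```

This reproduces the $293 list-price figure and the $162-188 discounted range quoted above.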

edit: Thanks Saylick for pointing out my mistake. Fixed some other things too.
 
Last edited:

Kaluan

Senior member
Jan 4, 2022
Not sure I understand, are people talking about HBM as a replacement for GDDR6+MALL cache/IC? Or just GDDR6?

Anyway, on the software/driver side I don't think Nvidia had to diverge too much from Ampere, while AMD probably needs all hands on deck to get as much as possible out of the RDNA3 uArch. Whether this will result in great drivers at launch or a steady drip of FineWine(tm), IDK; both outcomes have pros and cons.

But the fact that they're also launching/announcing a good amount of new features on the software side makes me hopeful they're on top of things.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Not sure I understand, are people talking about HBM as a replacement for GDDR6+MALL cache/IC? Or just GDDR6?
I was talking about replacing not just GDDR6 + MALL cache but GDDR6 + MCDs, because having separate MCDs would be pointless with HBM.
The memory controller and PHY would once more be part of the GCD. I don't think the GCD would increase in size, considering how much space the interconnects take up in N31.
 

Saylick

Diamond Member
Sep 10, 2012
One thing I forgot about is the cut-down version. The 7900 XT misses 2 MCDs and two 2GB modules, which can be directly subtracted from the BOM.
The RX 7900 XT has 20GB of VRAM; that means $26.22 × 10 for VRAM + 6 × $6.20 for MCDs, so we are already at $299 just for the chips. If the discount is 40-50%, then $131-157 for VRAM + $37.20 for MCDs, for a total of $168-194.
We can't do the same thing for an HBM3 version, because you have only 2 stacks and by removing one you would have too little bandwidth, but it looks like it would be cheaper even with 2 stacks.
N32 with HBM would also be tricky. One stack of HBM3 wouldn't be enough; you would need ~2 stacks of HBM2E instead, and we don't know the price difference between HBM2E and HBM3.

A very good question. I also wonder about that. Either HBM costs more than what I linked, there is not enough of it (which is unlikely), or GDDR6 is not as expensive as I linked and calculated.
I believe the 7900XT misses just 1 MCD, not 2.

Secondly, while there might be a cost advantage to using HBM, that doesn't mean there's a performance advantage. If I'm not mistaken, the latency of accessing HBM is similar to that of traditional VRAM, correct? If so, while the total effective bandwidth to VRAM might be comparable between HBM and the current GDDR6 + MCD approach, the lack of the huge on-die cache likely means missing that rung in the latency ladder, so the effective latency goes up and performance goes down. If one were to mitigate that by adding some cache back in, you're adding cost again. IDK if this thinking is correct, so someone can hopefully correct me if I'm wrong, but it's my 2c.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I believe the 7900XT misses just 1 MCD, not 2.

Secondly, while there might be a cost advantage to using HBM, that doesn't mean there's a performance advantage. If I'm not mistaken, the latency of accessing HBM is similar to that of traditional VRAM, correct? If so, while the total effective bandwidth to VRAM might be comparable between HBM and the current GDDR6 + MCD approach, the lack of the huge on-die cache likely means missing that rung in the latency ladder, so the effective latency goes up and performance goes down. If one were to mitigate that by adding some cache back in, you're adding cost again. IDK if this thinking is correct, so someone can hopefully correct me if I'm wrong, but it's my 2c.
Thanks, fixed.

You make a very good point about HBM having worse latency than IC + GDDR6, but isn't a GPU very good at hiding latency? Ampere also didn't have any extra cache, and that didn't stop it from winning against AMD.
BTW, I didn't want HBM because it has a performance advantage over the other approach, which it probably doesn't have. I wanted it because of the power savings.
Just think about it: the memory subsystem could use ~80-90W in the RX 7900 XTX, which is 23-25% of the card's total TBP.
Now imagine AMD released N31 for mobile with a 165W TDP (GPU + memory); how much power draw could you realistically shave off the memory subsystem?
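A quick sanity check on that percentage, taking the announced 355W TBP for the reference RX 7900 XTX and the ~80-90W memory-subsystem estimate from above (the latter is this thread's guess, not a measured figure):

```python
# Memory subsystem's share of board power for the RX 7900 XTX.
# 355W is the announced reference TBP; 80-90W is the thread's estimate.
TBP = 355.0                     # W, total board power
for mem_power in (80.0, 90.0):  # W, assumed memory-subsystem draw
    share = mem_power / TBP * 100
    print(f"{mem_power:.0f}W -> {share:.0f}% of TBP")
# 80W works out to ~23%, 90W to ~25%, matching the 23-25% range above.
```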

BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.

ASRock Radeon RX 6950 XT OC Formula
 
Last edited:

Saylick

Diamond Member
Sep 10, 2012
BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.

ASRock Radeon RX 6950 XT OC Formula
That's a shame that the memory doesn't downclock for less demanding workloads... you'd think AMD would have that resolved by now. AMD mentions that the fan-out links are aggressively clock-gated to reduce power usage when they are not needed, so hopefully N31 does not have this problem.

 
Mar 11, 2004
If either vendor could gain an advantage with HBM why don't they?

Don't both of them use HBM on their pro/enterprise/HPC stuff? It's clear that HBM does have real advantages. My guess for why it's not on consumer stuff is that there's a production capacity bottleneck that can't be resolved for some reason, so they simply could not produce enough for the consumer market.
 

moinmoin

Diamond Member
Jun 1, 2017
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
I don't think it's a coincidence that the use of HBM is much more widespread in the workstation and server market, where 3rd-party manufacturers are less common.
 

MrTeal

Diamond Member
Dec 7, 2003
Thanks, fixed.

You make a very good point about HBM having worse latency than IC + GDDR6, but isn't a GPU very good at hiding latency? Ampere also didn't have any extra cache, and that didn't stop it from winning against AMD.
BTW, I didn't want HBM because it has a performance advantage over the other approach, which it probably doesn't have. I wanted it because of the power savings.
Just think about it: the memory subsystem could use ~80-90W in the RX 7900 XTX, which is 23-25% of the card's total TBP.
Now imagine AMD released N31 for mobile with a 165W TDP (GPU + memory); how much power draw could you realistically shave off the memory subsystem?

BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.
I proposed something similar earlier in the thread, though it was closer to AMD's current solution. Currently N31 has 6 MCDs with Infinity Cache and the GDDR6 memory controllers, and it fans out to the GDDR6 memory. Rather than doing that, they could run a hybrid design with the cache and an HBM memory controller on the bottom layer, and a couple of layers of HBM memory on top, for 4GB of memory per MCD/stack. You'd still get the latency-hiding benefit of the cache, without needing the fan-out to memory on the card.
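The capacity arithmetic behind that hybrid idea checks out; here's a sketch, where the 2GB-per-layer die capacity is my assumption, inferred from the 4GB-per-stack and "couple layers" figures above:

```python
# Capacity check for the hybrid MCD-plus-HBM idea: keep N31's six
# memory-chiplet sites, stack HBM layers on each cache/controller die.
SITES = 6          # N31 has 6 MCDs today
GB_PER_LAYER = 2   # assumed per-die capacity (16Gb DRAM layer)
LAYERS = 2         # "a couple layers of HBM" per stack

per_stack = GB_PER_LAYER * LAYERS
total = SITES * per_stack
print(per_stack, total)  # 4GB per stack, 24GB total
```

24GB total matches what the 7900 XTX ships with today, so the hybrid layout wouldn't give up any capacity.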
 

Kaluan

Senior member
Jan 4, 2022
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.

So a 7800X3D + 7800XT system should end up drawing around 400W at full load?
You mean gaming or a stress test? Gaming (especially with frame caps/FreeSync) should easily stay below that.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.
I am a bit skeptical that the top N32 SKU would be pushed to 300W when the 7900 XT also has a 300W TBP, but the 6750 XT also had the same 250W TBP as the RX 6800, so it's not impossible.
In the case of a 300W TBP, the efficiency would be pretty bad, and even then the 7900 XT should be faster.
I am more interested in whether N32 really has only 3 shader engines. 4 shader engines with 64 CUs, or 8 WGPs per shader engine, is what I would like to see. If they really end up with only 3 shader engines and 60 CUs, I would like to see the reasoning for this move.
 

jpiniero

Lifer
Oct 1, 2010
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.

I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.
 

Timorous

Golden Member
Oct 27, 2008
I am a bit skeptical that the top N32 SKU would be pushed to 300W when the 7900 XT also has a 300W TBP, but the 6750 XT also had the same 250W TBP as the RX 6800, so it's not impossible.
In the case of a 300W TBP, the efficiency would be pretty bad, and even then the 7900 XT should be faster.
I am more interested in whether N32 really has only 3 shader engines. 4 shader engines with 64 CUs, or 8 WGPs per shader engine, is what I would like to see. If they really end up with only 3 shader engines and 60 CUs, I would like to see the reasoning for this move.

I know the Angstronomics leak said 3, and they were spot-on with N31, but 3 SEs on N32 just does not make sense to me. It would mean just 96 ROPs in the 7800 XT and 64 in the 7700 XT, which without a large clock-speed boost is a regression from the 6800 XT and 6700 XT. At least with 4 SEs the ROP counts would be 128 and 96 respectively, and then a clock-speed boost would provide an improvement over the prior gen.
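The ROP counts above follow from assuming 32 ROPs per shader engine, as on N21 and N31 (my assumption for this sketch, since N32's layout is exactly what's in question):

```python
# ROP counts per shader-engine configuration, assuming 32 ROPs per SE
# as on N21/N31. A 4-SE N32 cut to 3 gives 128/96 ROPs; a 3-SE N32
# cut to 2 gives only 96/64.
ROPS_PER_SE = 32  # assumed, matching N21 and N31

for shader_engines in (4, 3, 2):
    print(shader_engines, "SEs ->", shader_engines * ROPS_PER_SE, "ROPs")
```

So under that assumption, the 6800 XT's and 6700 XT's ROP counts are only matched or beaten if N32 has 4 SEs.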
 

Timorous

Golden Member
Oct 27, 2008
I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.

I expect the cut-down N32 to come later, so that people who can be convinced to spend extra on the full N32 SKU do so first. It is inevitable though: from AMD's perspective they can make absolute tons of N32 dies and hand them out based on demand. Since it will cover the ~$700 segment, the ~$500 segment, and the high-end laptop segment, it is likely pretty cost-effective and allows AMD to be insanely flexible with supply.
 

biostud

Lifer
Feb 27, 2003
I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.

Using N32 with a cut-down GCD and only 3 MCDs for the RX 7700 would save silicon and seems like an obvious solution.
 