Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 131


adamge

Member
Aug 15, 2022
The big difference is that GDDR memory is added by the card manufacturers, so it's a cost of manufacturing the card. HBM, on the other hand, adds to the cost of the GPU package AMD creates and has to sell to the manufacturers. As such, the latter needs more investment and risk-taking by AMD and as a result looks significantly more expensive than comparable GPU packages using standard GDDR memory.

(It's also the reason why only Apple adds all the memory on the same package as CPU/GPU on its A and M series chips: They don't need to sell the resulting package to anybody but themselves.)

I recall that in MLID's videos he said Nvidia buys all the RAM for every GPU chip they make, and sells the GPU+RAM to the downstream partner (AIB, board maker, or whatever the term is).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
The big difference is that GDDR memory is added by the card manufacturers, so it's a cost of manufacturing the card. HBM, on the other hand, adds to the cost of the GPU package AMD creates and has to sell to the manufacturers. As such, the latter needs more investment and risk-taking by AMD and as a result looks significantly more expensive than comparable GPU packages using standard GDDR memory.

(It's also the reason why only Apple adds all the memory on the same package as CPU/GPU on its A and M series chips: They don't need to sell the resulting package to anybody but themselves.)
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
Let's say a GPU with HBM costs manufacturers $150 more, but they won't need to buy VRAM costing $180; they will also have easier logistics, because instead of GPU + memory they will only need to source the GPU, and the BOM would be lower.
Of course this only holds if HBM is cheaper than GDDR6 + MCDs, but it is possible.
 

maddie

Diamond Member
Jul 18, 2010
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
Let's say a GPU with HBM costs manufacturers $150 more, but they won't need to buy VRAM costing $180; they will also have easier logistics, because instead of GPU + memory they will only need to source the GPU, and the BOM would be lower.
Of course this only holds if HBM is cheaper than GDDR6 + MCDs, but it is possible.
I agree. If the BOM can be viewed as a closed box, why does it matter where the internals are sourced or assembled? Isn't the total cost the important factor?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I agree. If the BOM can be viewed as a closed box, why does it matter where the internals are sourced or assembled? Isn't the total cost the important factor?
One thing I forgot about is the cut-down version. The 7900 XT misses 1 MCD and two 2GB modules, which can be directly subtracted from the BOM.
The RX 7900 XT has 20GB of VRAM; that means $26.22 × 10 for VRAM + 5 × $6.20 for MCDs, so we are already at $293 just for the chips. If the discount is 40-50%, then $131-157 for VRAM + $31 for MCDs, for a total of $162-188.
We can't do the same thing for an HBM3 version, because you have only 2 stacks and by removing one you would have too little bandwidth, but it looks like it would be cheaper even with 2 stacks.
N32 with HBM would also be tricky. One stack of HBM3 wouldn't be enough; you would need ~2 stacks of HBM2E instead, and we don't know the price difference between HBM2E and HBM3.
If either vendor could gain an advantage with HBM why don't they?
A very good question. I also wonder about that. Either HBM costs more than what I linked, there is not enough of it (which is unlikely), or GDDR6 is not as expensive as I linked and calculated.
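For what it's worth, the BOM arithmetic above can be sketched in a few lines of Python. The $26.22 per 2GB GDDR6 module, the $6.20 per MCD, and the 40-50% volume discount are all the thread's speculative figures, not confirmed prices:

```python
# Rough memory-subsystem BOM sketch for the RX 7900 XT, using the
# speculative per-chip prices from this thread (not confirmed numbers).
GDDR6_MODULE_PRICE = 26.22   # $ per 2GB GDDR6 module (assumed list price)
MCD_PRICE = 6.20             # $ per MCD (assumed)

def memory_bom(modules: int, mcds: int, gddr6_discount: float = 0.0) -> float:
    """Cost of GDDR6 modules plus MCDs, with an optional volume
    discount applied to the GDDR6 price only."""
    vram = modules * GDDR6_MODULE_PRICE * (1.0 - gddr6_discount)
    return round(vram + mcds * MCD_PRICE, 2)

# 7900 XT: 20GB -> 10 modules, 5 MCDs
print(memory_bom(10, 5))        # list price: 293.2
print(memory_bom(10, 5, 0.50))  # 50% VRAM discount: 162.1
print(memory_bom(10, 5, 0.40))  # 40% VRAM discount: 188.32
```

This reproduces the $293 list-price figure and the $162-188 discounted range quoted above.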

edit: Thanks Saylick for pointing out my mistake. Fixed some other things too.
 
Last edited:

Kaluan

Senior member
Jan 4, 2022
Not sure I understand, are people talking about HBM as a replacement for GDDR6+MALL cache/IC? Or just GDDR6?

Anyway, on the software/driver side I don't think Nvidia had to diverge too much from Ampere, while AMD probably needs all hands on deck to get as much as possible out of the RDNA3 uArch. Whether this will result in great drivers at launch or a steady drip of FineWine(tm), IDK; both outcomes have pros and cons.

But the fact that they're also launching/announcing a good amount of new features on the software side makes me hopeful they're on top of things.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Not sure I understand, are people talking about HBM as a replacement for GDDR6+MALL cache/IC? Or just GDDR6?
I was talking about replacing not just GDDR6 + MALL cache but GDDR6 + MCDs, because having separate MCDs would be pointless with HBM.
The memory controller and PHY would once more be part of the GCD. I don't think the GCD would increase in size, considering how much space the interconnects take up in N31.
 

Saylick

Diamond Member
Sep 10, 2012
One thing I forgot about is the cut-down version. The 7900 XT misses 2 MCDs and two 2GB modules, which can be directly subtracted from the BOM.
The RX 7900 XT has 20GB of VRAM; that means $26.22 × 10 for VRAM + 6 × $6.20 for MCDs, so we are already at $299 just for the chips. If the discount is 40-50%, then $131-157 for VRAM + $37.20 for MCDs, for a total of $168-194.
We can't do the same thing for an HBM3 version, because you have only 2 stacks and by removing one you would have too little bandwidth, but it looks like it would be cheaper even with 2 stacks.
N32 with HBM would also be tricky. One stack of HBM3 wouldn't be enough; you would need ~2 stacks of HBM2E instead, and we don't know the price difference between HBM2E and HBM3.

A very good question. I also wonder about that. Either HBM costs more than what I linked, there is not enough of it (which is unlikely), or GDDR6 is not as expensive as I linked and calculated.
I believe the 7900XT misses just 1 MCD, not 2.

Secondly, while there might be a cost advantage to using HBM, that doesn't mean there's a performance advantage. If I'm not mistaken, the latency of accessing HBM is similar to that of traditional VRAM, correct? If so, while the total effective bandwidth to VRAM might be comparable between HBM and the current GDDR6 + MCD approach, the lack of the huge on-die cache likely means missing that rung in the latency ladder, so the effective latency goes up and performance goes down. If one were to mitigate that by adding some cache back in, you're adding cost again. IDK if this thinking is correct, so someone can hopefully correct me if I'm wrong, but it's my 2c.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I believe the 7900XT misses just 1 MCD, not 2.

Secondly, while there might be a cost advantage to using HBM, that doesn't mean there's a performance advantage. If I'm not mistaken, the latency of accessing HBM is similar to that of traditional VRAM, correct? If so, while the total effective bandwidth to VRAM might be comparable between HBM and the current GDDR6 + MCD approach, the lack of the huge on-die cache likely means missing that rung in the latency ladder, so the effective latency goes up and performance goes down. If one were to mitigate that by adding some cache back in, you're adding cost again. IDK if this thinking is correct, so someone can hopefully correct me if I'm wrong, but it's my 2c.
Thanks, fixed.

You make a very good point about HBM having worse latency than IC + GDDR6, but isn't a GPU very good at hiding latency? Ampere also didn't have any extra cache, and that didn't stop it from winning against AMD.
BTW, I didn't want HBM because it has a performance advantage over the other approach, which it probably doesn't have. I wanted it because of the power savings.
Just think about it: the memory subsystem could use ~80-90W in the RX 7900 XTX, which is 23-25% of the card's total TBP.
Now imagine AMD released N31 for mobile with a 165W TDP (GPU + memory); how much power draw could you realistically shave off the memory subsystem?
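A quick sanity check on that percentage, taking the announced 355W TBP for the reference RX 7900 XTX and the ~80-90W memory-subsystem estimate from above (the latter is this thread's guess, not a measured figure):

```python
# Memory subsystem's share of board power for the RX 7900 XTX.
# 355W is the announced reference TBP; 80-90W is the thread's estimate.
TBP = 355.0                     # W, total board power
for mem_power in (80.0, 90.0):  # W, assumed memory-subsystem draw
    share = mem_power / TBP * 100
    print(f"{mem_power:.0f}W -> {share:.0f}% of TBP")
# 80W works out to ~23%, 90W to ~25%, matching the 23-25% range above.
```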

BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.

ASRock Radeon RX 6950 XT OC Formula
 
Last edited:

Saylick

Diamond Member
Sep 10, 2012
BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.

ASRock Radeon RX 6950 XT OC Formula
That's a shame that the memory doesn't downclock for less demanding workloads... you'd think AMD would have that resolved by now. AMD mentions that the fan-out links are aggressively clock-gated to reduce power usage when they are not needed, so hopefully N31 does not have this problem.

 
Mar 11, 2004
If either vendor could gain an advantage with HBM why don't they?

Don't both of them use HBM on their pro/enterprise/HPC stuff? It's clear that HBM does have real advantages. My guess for why it's not on consumer stuff is that there's a production capacity bottleneck that can't be resolved for some reason, so they simply could not produce enough for the consumer market.
 

moinmoin

Diamond Member
Jun 1, 2017
I don't see how this is bad for manufacturers.
I doubt they are incapable of understanding why a GPU paired with HBM would cost more than one without it.
I don't think it's a coincidence that the use of HBM is much more widespread in the workstation and server market, where 3rd-party manufacturers are less common.
 

MrTeal

Diamond Member
Dec 7, 2003
Thanks, fixed.

You make a very good point about HBM having worse latency than IC + GDDR6, but isn't a GPU very good at hiding latency? Ampere also didn't have any extra cache, and that didn't stop it from winning against AMD.
BTW, I didn't want HBM because it has a performance advantage over the other approach, which it probably doesn't have. I wanted it because of the power savings.
Just think about it: the memory subsystem could use ~80-90W in the RX 7900 XTX, which is 23-25% of the card's total TBP.
Now imagine AMD released N31 for mobile with a 165W TDP (GPU + memory); how much power draw could you realistically shave off the memory subsystem?

BTW, TPU also measures power draw in Cyberpunk 2077 with V-sync enabled, and in that case every AMD card flops compared to Nvidia, because they don't even underclock their VRAM but keep it at full speed, which of course increases power consumption; this is an old problem with AMD cards, if I remember correctly. Power consumption during video playback and multi-monitor use is also negatively affected because of this.
I proposed something similar earlier in the thread, though it was closer to AMD's current solution. Currently N31 has 6 MCDs with Infinity Cache and the GDDR6 memory controllers, and it fans out to the GDDR6 memory. Rather than doing that, they could run a hybrid design with the cache and an HBM memory controller on the bottom layer, and a couple of layers of HBM memory on top, for 4GB of memory per MCD/stack. You'd still get the latency-hiding benefit of the cache, without needing the fan-out to memory on the card.
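The capacity arithmetic behind that hybrid idea checks out; here's a sketch, where the 2GB-per-layer die capacity is my assumption, inferred from the 4GB-per-stack and "couple layers" figures above:

```python
# Capacity check for the hybrid MCD-plus-HBM idea: keep N31's six
# memory-chiplet sites, stack HBM layers on each cache/controller die.
SITES = 6          # N31 has 6 MCDs today
GB_PER_LAYER = 2   # assumed per-die capacity (16Gb DRAM layer)
LAYERS = 2         # "a couple layers of HBM" per stack

per_stack = GB_PER_LAYER * LAYERS
total = SITES * per_stack
print(per_stack, total)  # 4GB per stack, 24GB total
```

24GB total matches what the 7900 XTX ships with today, so the hybrid layout wouldn't give up any capacity.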
 

Kaluan

Senior member
Jan 4, 2022
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.

So a 7800X3D + 7800XT system should end up drawing around 400W at full load?
You mean gaming or a stress test? Gaming (especially with frame caps/FreeSync) should easily stay below that.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.
I am a bit skeptical that the top N32 SKU would be pushed to 300W when the 7900 XT also has a 300W TBP, but the 6750 XT also had the same 250W TBP as the RX 6800, so it's not impossible.
In the case of a 300W TBP, the efficiency would be pretty bad, and even then the 7900 XT should be faster.
I am more interested in whether N32 really has only 3 shader engines. 4 shader engines with 64 CUs, or 8 WGPs per shader engine, is what I would like to see. If they really end up with only 3 shader engines and 60 CUs, I would like to see the reasoning for this move.
 

jpiniero

Lifer
Oct 1, 2010
I think they can, should, and will push the top N32 SKU (7800XT?) upwards of 300W TBP for roughly 44 TFLOPs.

Edit: The cut-down version(s) will probably stay well below 250W.

I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.
 

Timorous

Golden Member
Oct 27, 2008
I am a bit skeptical that the top N32 SKU would be pushed to 300W when the 7900 XT also has a 300W TBP, but the 6750 XT also had the same 250W TBP as the RX 6800, so it's not impossible.
In the case of a 300W TBP, the efficiency would be pretty bad, and even then the 7900 XT should be faster.
I am more interested in whether N32 really has only 3 shader engines. 4 shader engines with 64 CUs, or 8 WGPs per shader engine, is what I would like to see. If they really end up with only 3 shader engines and 60 CUs, I would like to see the reasoning for this move.

I know the Angstronomics leak said 3, and they were spot-on with N31, but 3 SEs on N32 just does not make sense to me. It would mean just 96 ROPs in the 7800 XT and 64 in the 7700 XT, which without a large clock-speed boost is a regression from the 6800 XT and 6700 XT. At least with 4 SEs the ROP counts would be 128 and 96 respectively, and then a clock-speed boost would provide an improvement over the prior gen.
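The ROP counts above follow from assuming 32 ROPs per shader engine, as on N21 and N31 (my assumption for this sketch, since N32's layout is exactly what's in question):

```python
# ROP counts per shader-engine configuration, assuming 32 ROPs per SE
# as on N21/N31. A 4-SE N32 cut to 3 gives 128/96 ROPs; a 3-SE N32
# cut to 2 gives only 96/64.
ROPS_PER_SE = 32  # assumed, matching N21 and N31

for shader_engines in (4, 3, 2):
    print(shader_engines, "SEs ->", shader_engines * ROPS_PER_SE, "ROPs")
```

So under that assumption, the 6800 XT's and 6700 XT's ROP counts are only matched or beaten if N32 has 4 SEs.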
 

Timorous

Golden Member
Oct 27, 2008
I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.

I expect the cut-down N32 to come later, so that people who can be convinced to spend extra on the full N32 SKU do so first. It is inevitable though: from AMD's perspective they can make absolute tons of N32 dies and hand them out based on demand. Since it will cover the ~$700 segment, the ~$500 segment, and the high-end laptop segment, it is likely pretty cost-effective and allows AMD to be insanely flexible with supply.
 

biostud

Lifer
Feb 27, 2003
I don't think there is going to be a cut-down desktop N32. They didn't do it for N22 until way, way later. There's even less reason to do it now because the N32 GCD is much smaller.

It might only come sooner rather than later if N32 mobile doesn't sell well.

Using N32 with a cut-down GCD and only 3 MCDs for the RX 7700 would save silicon and seems like an obvious solution.
 