Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
Hence why the SKU simply did not exist during the shortage. Even the 6800xt barely did as every chip seems to have been used for a 6900xt.

They didn't need to upsell people anyway with good spacing between products, because the shortage caused upselling automatically.

For upselling you ideally want a moderate price increase and a good sales argument for every step up. Intel are masters at it, with a ton of SKUs where you get an iGPU, more cores, more MHz and/or an unlocked multiplier for a relatively modest price increase, so you can start considering an i3 or i5 and go into a loop of 'only X dollars more for Y', to end up with an i7 for way more money than you intended to spend.
 
Reactions: Kaluan

HurleyBird

Platinum Member
Apr 22, 2003
2,726
1,342
136
Unless AMD can cut in a lopsided way, though, the cut N31 has to be 84CUs, because that cuts 1 WGP per SE and that is how the maths works.

It probably needs to be a multiple of 4, so no 86 -- but 88, 92 are doable.

With MCDs and a mix/match strategy between knows it brings in a new paradigm for the best way to manage your product stack which means the old 'you only need 1 cut because yields are so good' mantra may not be true any longer.

It means that yields get better overall, but there are also fewer structures to spread defects around. The GCD is mostly CUs now. That said, getting enough defects to necessitate 80 and 84 CU cuts (16.7% of CUs, and 12.5% of CUs respectively) seems a bit of a stretch given a 350mm2 die on a somewhat mature process. Why bother with an 84 CU cut when it's so close to the 80 CU cut anyway? A die has a single defect and now you need to disable a whole 12 CUs? Sure you could save up those dies for a new SKU, but it looks awkward to fit something in there with the alleged branding scheme.
 
Reactions: Kaluan

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
It probably needs to be a multiple of 4, so no 86 -- but 88, 92 are doable.

Multiple of SE count is more likely. Multiple of 4 was the case with N21 because it had 4 shader engines. N31 is supposed to have 6, so you cut 6 WGPs, which is 12 CUs, ergo an 84CU 42WGP design, which is 7x6.

It means that yields get better overall, but there are also fewer structures to spread defects around. The GCD is mostly CUs now. That said, getting enough defects to necessitate 80 and 84 CU cuts (16.7% of CUs, and 12.5% of CUs respectively) seems a bit of a stretch given a 350mm2 die on a somewhat mature process. Why bother with an 84 CU cut when it's so close to the 80 CU cut anyway? A die has a single defect and now you need to disable a whole 12 CUs? Sure you could save up those dies for a new SKU, but it looks awkward to fit something in there with the alleged branding scheme.

It won't be defect driven, it will be market driven, and it gives AMD more flexibility, because having individual 6SE, 5SE, 4SE and 3SE designs on top of the monolithic 2SE N33 design just means their market forecasting needs to be far more accurate.

The other thing to think about is that, even so, the amount of N5 die area used is small, especially when you compare vs NV:

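Tier     | AMD Die | AMD Size (mm²) | NV Die | NV Size (mm²) | AMD Additional Dies | Size (mm²) | Total AMD Area (mm²)
90 / 900 | N31     | ~350 N5        | AD102  | ~610 N4       | 6x MCD              | 6x ~35     | 545
80 / 800 | N32     | ~230 N5        | AD103  | ~380 N4       | 4x MCD              | 4x ~35     | 370
70 / 700 | N32     | ~230 N5        | AD104  | ~300 N4       | 3x MCD              | 3x ~35     | 335
60 / 600 | N33     | ~300 N6        | AD106  | ~200 N4       | 0                   | 0          | 300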
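
For anyone who wants to play with the totals, here is a rough sketch of the arithmetic behind the last column, assuming total area is simply GCD area plus the MCD count times ~35mm². The dictionary layout and the `total_area` helper are made up for illustration. By this sum the 90 / 900 row comes out at 560 rather than the 545 listed, so treat every figure as a ballpark.

```python
# Approximate die sizes in mm^2, copied from the table above.
MCD_AREA = 35  # approx size of one MCD

# tier -> (AMD die, main die area, number of MCDs, NV die, NV die area)
TIERS = {
    "90 / 900": ("N31", 350, 6, "AD102", 610),
    "80 / 800": ("N32", 230, 4, "AD103", 380),
    "70 / 700": ("N32", 230, 3, "AD104", 300),
    "60 / 600": ("N33", 300, 0, "AD106", 200),
}


def total_area(main_die: int, mcd_count: int, mcd_area: int = MCD_AREA) -> int:
    """Total AMD silicon: main die plus all MCDs (simple sum, no packaging overhead)."""
    return main_die + mcd_count * mcd_area


totals = {tier: total_area(row[1], row[2]) for tier, row in TIERS.items()}

for tier, (amd_die, main_area, mcds, nv_die, nv_area) in TIERS.items():
    print(f"{tier}: {amd_die} total ~{totals[tier]} mm^2 vs {nv_die} ~{nv_area} mm^2")
```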

If NV over-provisions AD103 production, for example, they will have a lot more lag than AMD will, because AMD can just decide 'we need more 7700XTs and we still have 7800 tiers on shelves', so those nice N32 dies that are hot off the production line can go into 7700 tier products rather than adding to the stack of 7800 tier products sitting around on shelves.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,726
1,342
136
Multiple of SE count is more likely. Multiple of 4 was the case with N21 because it had 4 shader engines. N31 is supposed to have 6, so you cut 6 WGPs, which is 12 CUs, ergo an 84CU 42WGP design, which is 7x6.

If that's the case, then the alleged 80 CU cut doesn't work either, everything else in the leak is suspect as a result, and there's no reason to debate the merits of an 84 CU cut.

It won't be defect driven, it will be market driven, and it gives AMD more flexibility, because having individual 6SE, 5SE, 4SE and 3SE designs on top of the monolithic 2SE N33 design just means their market forecasting needs to be far more accurate.

I agree of course it's ultimately market driven, and of course yield is a primary factor in that analysis. Cutting more than you need to in your #2 SKU, which is very close to another more aggressive cut, while leaving a big gap where you don't cut, while using product names that are entirely unintuitive based on your cuts... does not seem like a realistic market driven strategy. That's why I'm skeptical.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
If that's the case, then the alleged 80 CU cut doesn't work either, everything else in the leak is suspect as a result, and there's no reason to debate the merits of an 84 CU cut.



I agree of course it's ultimately market driven, and of course yield is a primary factor in that analysis. Cutting more than you need to in your #2 SKU, which is very close to another more aggressive cut, while leaving a big gap where you don't cut, while using product names that are entirely unintuitive based on your cuts... does not seem like a realistic market driven strategy. That's why I'm skeptical.
The 80CU is cutting an entire SE together with its cache/MCD = 80CU & 320b. Probably for a failed bond between chiplets.

Cut options are,
Multiples of 6WGP for a full N31. One per SE
Multiples of 4WGP for a full N32. One per SE
One SE (8WGP) for both. This also cuts the bus & IF cache.
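
A quick sketch of those cut options, assuming a full N31 is 6 SEs of 8 WGPs with one 64-bit MCD per SE (the figures discussed in this thread, not confirmed specs). The `Gpu` class and both helper functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    shader_engines: int
    wgp_per_se: int
    bus_bits_per_se: int = 64  # assumption: one 64-bit MCD per shader engine

    @property
    def cus(self) -> int:
        return self.shader_engines * self.wgp_per_se * 2  # 2 CUs per WGP

    @property
    def bus_bits(self) -> int:
        return self.shader_engines * self.bus_bits_per_se


def cut_wgp_per_se(gpu: Gpu, n: int = 1) -> Gpu:
    """Symmetric cut: disable n WGPs in every shader engine."""
    return Gpu(f"{gpu.name} (-{n} WGP/SE)", gpu.shader_engines, gpu.wgp_per_se - n)


def cut_shader_engine(gpu: Gpu) -> Gpu:
    """Drop one whole shader engine along with its MCD (bus and cache slice)."""
    return Gpu(f"{gpu.name} (-1 SE)", gpu.shader_engines - 1, gpu.wgp_per_se)


n31 = Gpu("N31", shader_engines=6, wgp_per_se=8)
wgp_cut = cut_wgp_per_se(n31)   # 6 WGPs removed in total
se_cut = cut_shader_engine(n31)

for part in (n31, wgp_cut, se_cut):
    disabled = 1 - part.cus / n31.cus
    print(f"{part.name}: {part.cus} CUs, {part.bus_bits}-bit, {disabled:.1%} of CUs disabled")
```

Under these assumptions the per-SE cut gives 84 CUs on the full 384-bit bus, and the whole-SE cut gives 80 CUs on 320-bit.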

edit: Thanks for the correction, @Timorous.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
The 80CU is cutting an entire SE together with its cache/MCD = 80CU & 320b. Probably for a failed bond between chiplets.

Cut options are,
Multiples of 6WGP for a full N31. One per SE
Multiples of 4WGP for a full N32. One per SE
One SE (8WGP) for both. This also cuts the bus & IF cache.

This and yea, failed bonds could also be another reason to cut even if the die itself is fine.
 

Leeea

Diamond Member
Apr 3, 2020
3,698
5,432
136
This and yea, failed bonds could also be another reason to cut even if the die itself is fine.
The whole multi-chip packaging seems like a big push from TSMC. Like TSMC wants to show off what they can do using AMD as a proving partner.

I suspect a portion of the associated risk may be borne by TSMC.
 
Reactions: Kaluan

HurleyBird

Platinum Member
Apr 22, 2003
2,726
1,342
136

GodisanAtheist

Diamond Member
Nov 16, 2006
7,063
7,489
136

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
I was kinda hoping that N33 would come first, since it would suit me fine. N31 is just too much for Fortnite shenanigans.
Same... or at least launch N32 as well so that we can play with a cheaper MCM/chiplet-based GPU sooner.

I was thinking... since I don't want to wait for mid and low end RDNA3 to come out but also don't want to deal with super high TDPs, would it make sense to buy the cheapest N31 (read: most bang-for-buck) and then slap on a -40% power limiter on it so that the final TDP is closer to 250W? I figure performance will still be a solid 30% faster than the RTX 3090 Ti but power will be much lower. That would be really sweet in my opinion, and would just be overall fun to see what kind of perf/W gains you could achieve by downclocking to the sweet spot of the freq/power curve.

Edit: Credit to @uzzi38 for creating this plot for the 6700XT, but I'm sure something like this applies to RDNA 3 where you only take a 20% reduction to performance for >40% reduction in power.

7900XT = ~1.8x 6900XT @ 350W?

After power limiter:
~1.45x 6900XT @ 225W? If so, that's a solid 25% faster than the 3090 Ti at like half the power.
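
Rough numbers for the power-limit idea, assuming the 6700XT-style tradeoff (about 20% performance lost for a 40% power cut) carries over to RDNA 3, which is purely a guess. The `apply_power_limit` helper is just for illustration. Note that a straight 40% cut from 350W lands at 210W, a bit under the 225W used above.

```python
def apply_power_limit(perf: float, power_w: float, power_cut: float, perf_cut: float):
    """Scale relative performance and board power by the given fractional cuts."""
    return perf * (1 - perf_cut), power_w * (1 - power_cut)


# Speculated stock figures: ~1.8x a 6900XT at 350W.
stock_perf, stock_power = 1.8, 350.0

limited_perf, limited_power = apply_power_limit(
    stock_perf, stock_power, power_cut=0.40, perf_cut=0.20
)

# Perf/W improvement of the limited card relative to stock.
perf_per_watt_gain = (limited_perf / limited_power) / (stock_perf / stock_power)

print(f"limited: ~{limited_perf:.2f}x 6900XT at ~{limited_power:.0f}W")
print(f"perf/W vs stock: ~{perf_per_watt_gain:.2f}x")
```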

 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Angstronomics has an article on RDNA3:
Not sure I agree with their die size and Infinity Cache estimates...

Navi 31
  • gfx1100 (Plum Bonito)
  • Chiplet - 1x GCD + 6x MCD (0-hi or 1-hi)
  • 48 WGP (96 legacy CUs, 12288 ALUs)
  • 6 Shader Engines / 12 Shader Arrays
  • Infinity Cache 96MB (0-hi), 192MB (1-hi)
  • 384-bit GDDR6
  • GCD on TSMC N5, ~308 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi32
  • gfx1101 (Wheat Nas)
  • Chiplet - 1x GCD + 4x MCD (0-hi)
  • 30 WGP (60 legacy CUs, 7680 ALUs)
  • 3 Shader Engines / 6 Shader Arrays
  • Infinity Cache 64MB (0-hi)
  • 256-bit GDDR6
  • GCD on TSMC N5, ~200 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi33
  • gfx1102 (Hotpink Bonefish)
  • Monolithic
  • 16 WGP (32 legacy CUs, 4096 ALUs)
  • 2 Shader Engines / 4 Shader Arrays
  • Infinity Cache 32MB
  • 128-bit GDDR6
  • TSMC N6, ~203 mm²
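
A small sanity check on the listed figures, assuming 2 legacy CUs and 256 ALUs per WGP (the ALU ratio is just what 12288 / 48 works out to). The `CHIPS` layout and `derive` helper are made up for illustration.

```python
CUS_PER_WGP = 2     # each WGP counts as two "legacy" CUs
ALUS_PER_WGP = 256  # implied by the list: 12288 ALUs / 48 WGPs

# Figures copied from the list above; "mcd" is the number of memory chiplets.
CHIPS = {
    "Navi31": {"wgp": 48, "se": 6, "mcd": 6, "cache_mb": 96},
    "Navi32": {"wgp": 30, "se": 3, "mcd": 4, "cache_mb": 64},
    "Navi33": {"wgp": 16, "se": 2, "mcd": 0, "cache_mb": 32},
}


def derive(spec: dict) -> dict:
    """Derive CU/ALU counts and per-SE, per-MCD ratios from the raw spec."""
    wgp = spec["wgp"]
    return {
        "cus": wgp * CUS_PER_WGP,
        "alus": wgp * ALUS_PER_WGP,
        "wgp_per_se": wgp / spec["se"],
        # Monolithic parts have no MCDs, so there is no per-MCD figure.
        "cache_per_mcd_mb": spec["cache_mb"] / spec["mcd"] if spec["mcd"] else None,
    }


derived = {name: derive(spec) for name, spec in CHIPS.items()}

for name, d in derived.items():
    print(name, d)
```

The Navi32 entry stands out with 10 WGPs per SE against 8 for the other two, while the cache works out to 16MB per MCD on both chiplet parts.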
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
If those specs were actually legit for Navi 33, I'd have to think it'd be tough for it to be competitive with the 4070 and would get killed by The Flood (if it happens).
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Updated CU/WGP:
What AMD has officially detailed so far about RDNA 3 is yet another significant increase in performance per watt over RDNA 2, with contributions from process node and microarchitectural design choices. However, the design philosophy of gfx11 is all about area, area, area. What is the best way to achieve the performance target with minimal area? The rearchitected Compute Unit and Optimized Graphics Pipeline changes are mostly about trimming the fat in pursuit of the lowest area and cost (example: halving relative FP64 rate to 1/32). As a result of this focus, PPA is significantly increased. In fact, at the same node, an RDNA 3 WGP is slightly smaller in area than an RDNA 2 WGP, despite packing double the ALUs.

Well, now we know what OREO is:
OREO
One of the features in the RDNA 3 graphics pipeline is OREO: Opaque Random Export Order, which is just one of the many area saving techniques. With gfx10, the pixel shaders run out-of-order, where the outputs go into a Re-Order Buffer before moving to the rest of the pipeline in-order. With OREO, the next step (blend) can now receive and execute operations in any order and export to the next stage in-order. Thus, the ROB can be replaced with a much smaller skid buffer, saving area.
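
To make the re-order buffer idea concrete, here is a generic software analogy, not a model of the actual gfx10 or gfx11 hardware: results finish in any order but are only released downstream in their original sequence. The `reorder` function and the `(sequence, result)` tuple format are assumptions for illustration.

```python
import heapq
from typing import Iterable, Iterator, Tuple


def reorder(completions: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Release results in sequence order from (sequence, result) pairs that arrive in any order.

    Sequence numbers are assumed to be unique and to start at 0.
    """
    pending: list = []  # min-heap keyed on sequence number: the buffer
    next_seq = 0
    for seq, result in completions:
        heapq.heappush(pending, (seq, result))
        # Drain everything that is now contiguous with what was already released.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


# Work items finishing out of order: item 2 completes first.
out_of_order = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
in_order = list(reorder(out_of_order))
print(in_order)
```

The buffer has to hold every result that finished early, which is the storage the article says OREO shrinks by letting the blend stage accept work in any order.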

Next gen Infinity Cache is more bandwidth but half the size:
Infinity Cache Updates
The Memory Attached Last Level (MALL) Cache blocks are each halved in size, doubling the number of banks for the same cache amount. There are also changes and additions that increase graphics to MALL bandwidth and reduce the penalty of going out to VRAM.

Regarding how the MCDs are connected to the GCD, and how the Infinity Cache is stacked:
The world’s first chiplet GPU, Navi31 makes use of TSMC’s fanout technology (InFO_oS) to lower costs, surrounding a central 48 WGP Graphics Chiplet Die (GCD) with 6 Memory Chiplet Dies (MCD), each containing 16MB of Infinity Cache and the GDDR6 controllers with 64-bit wide PHYs. The organic fanout layer has a 35-micron bump pitch, the densest available in the industry. There is a 3D stacked MCD also being productized (1-hi) using TSMC’s SoIC. While this doubles the Infinity Cache available, the performance benefit is limited given the cost increase.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
Angstronomics has an article on RDNA3:
Not sure I agree with their die size and Infinity Cache estimates...

Navi 31
  • gfx1100 (Plum Bonito)
  • Chiplet - 1x GCD + 6x MCD (0-hi or 1-hi)
  • 48 WGP (96 legacy CUs, 12288 ALUs)
  • 6 Shader Engines / 12 Shader Arrays
  • Infinity Cache 96MB (0-hi), 192MB (1-hi)
  • 384-bit GDDR6
  • GCD on TSMC N5, ~308 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi32
  • gfx1101 (Wheat Nas)
  • Chiplet - 1x GCD + 4x MCD (0-hi)
  • 30 WGP (60 legacy CUs, 7680 ALUs)
  • 3 Shader Engines / 6 Shader Arrays
  • Infinity Cache 64MB (0-hi)
  • 256-bit GDDR6
  • GCD on TSMC N5, ~200 mm²
  • MCD on TSMC N6, ~37.5 mm²
Navi33
  • gfx1102 (Hotpink Bonefish)
  • Monolithic
  • 16 WGP (32 legacy CUs, 4096 ALUs)
  • 2 Shader Engines / 4 Shader Arrays
  • Infinity Cache 32MB
  • 128-bit GDDR6
  • TSMC N6, ~203 mm²

I think the cache amounts are wrong for the given MCD die size.

A 32bit PHY in N21 is around 6mm². Two of them are 12mm², which leaves about 25.5mm² of the 37.5mm² die. The Zen 3 cache dies are 64MB in 36mm² of die area, so 32MB would fit in 18mm², for a total of 30mm², leaving 7.5mm² for other bits n pieces that may be needed, like links to the GCD.
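
The same back-of-envelope budget as a sketch, assuming the estimates above (about 6mm² per 32bit PHY, cache density scaled linearly from the 64MB / 36mm² Zen 3 cache die). The `mcd_budget` helper is hypothetical, and linear scaling ignores any fixed overhead.

```python
MCD_AREA = 37.5       # mm^2, rumoured MCD size
PHY_32BIT_AREA = 6.0  # mm^2, rough N21 figure for one 32bit PHY
VCACHE_MB = 64        # Zen 3 cache die capacity
VCACHE_AREA = 36.0    # mm^2, Zen 3 cache die size


def mcd_budget(cache_mb: float, phy_count: int = 2):
    """Return (PHY area, cache area, leftover area) in mm^2 for one MCD."""
    phy_area = phy_count * PHY_32BIT_AREA
    cache_area = cache_mb * VCACHE_AREA / VCACHE_MB  # linear scaling assumption
    return phy_area, cache_area, MCD_AREA - phy_area - cache_area


for mb in (16, 32):
    phy, cache, left = mcd_budget(mb)
    print(f"{mb}MB per MCD: PHY {phy} + cache {cache} mm^2, {left} mm^2 left over")
```

With 16MB per MCD, as the leak claims, this estimate leaves 16.5mm² unaccounted for, against 7.5mm² with 32MB.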

I also doubt that N32 spec because it breaks the 64bit PHYs per SE and the 8 WGPs per SE layout we have with N31 and N33. I know N23 has a different WGP amount per SE vs N21 and N22, and it is actually pretty efficient. I still see 4SE + 32WGPs as being more likely, but 3SE and 30WGP is barely any different, so I don't think it will make a performance difference. Also, a 4SE 32WGP design won't actually be that much different die size wise, in that it will be about 2/3 of N31, so in the 200mm² region if N31 is 300mm² give or take.

Given that N33 has no increase in bus width over N23 and has double the shaders I think it will be bandwidth starved with just 32MB of cache. I guess compression could have improved enough when combined with higher clocked VRAM but 64MB cache + lower clocked VRAM would make the card easier to cool and make hitting a ~150W TDP a bit easier which is why AMD went with Infinity cache in the 1st place. It would also make it a better 1440p card which with the amount of shader horsepower it seems to have would be a pretty great fit.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
If those specs were actually legit for Navi 33, I'd have to think it'd be tough for it to be competitive with the 4070 and would get killed by The Flood (if it happens).

N33, as we have said a lot, is the 7600XT, so it won't be competing with the 4070. That said, if the 4070 is only around 3090 / 3090Ti performance and N33 does hit its 6900XT performance target, the performance delta won't be that great. With the above spec, though, I only expect that to be true at 1080p; at 1440p / 4K the 4070 will pull away, provided it does have the 192bit 12GB spec AD104 supports. If the 4070 is just 160bit 10GB, then even this config of N33 will run it a lot closer than you would think IMO.
 
Reactions: Tlh97

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
If those specs were actually legit for Navi 33, I'd have to think it'd be tough for it to be competitive with the 4070 and would get killed by The Flood (if it happens).
That's only because you're stuck on assigning N33 to a higher performance tier. Those specs put it as smaller than N23, on a cheaper node. This is the x600 model.
 

fleshconsumed

Diamond Member
Feb 21, 2002
6,485
2,362
136
Same... or at least launch N32 as well so that we can play with a cheaper MCM/chiplet-based GPU sooner.

I was thinking... since I don't want to wait for mid and low end RDNA3 to come out but also don't want to deal with super high TDPs, would it make sense to buy the cheapest N31 (read: most bang-for-buck) and then slap on a -40% power limiter on it so that the final TDP is closer to 250W? I figure performance will still be a solid 30% faster than the RTX 3090 Ti but power will be much lower. That would be really sweet in my opinion, and would just be overall fun to see what kind of perf/W gains you could achieve by downclocking to the sweet spot of the freq/power curve.

Edit: Credit to @uzzi38 for creating this plot for the 6700XT, but I'm sure something like this applies to RDNA 3 where you only take a 20% reduction to performance for >40% reduction in power.

7900XT = ~1.8x 6900XT @ 350W?

After power limiter:
~1.45x 6900XT @ 225W? If so, that's a solid 25% faster than the 3090 Ti at like half the power.

[Attachment 65722: @uzzi38's 6700XT plot, image not included]
That's what RDNA2 rx6800 was. Roughly 220W board power while solidly beating 3070. The problem was that rx6800 was priced too close to 6800xt and it was/is near impossible to buy it.
 
Reactions: Tlh97 and Saylick

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
Then the desktop version is probably going to be cancelled because of The Flood. Same thing with AD106, I don't expect it on desktop any time soon.

The flood will have no competition. 6600XT power envelope for 6900XT tier 1080p and maybe 1440p performance. Given the cost of electricity now vs then, that 150W saving for that tier of performance will be pretty hard to beat, especially if it comes in at $400 (which with that die size on N6 would give it more margin than N23 had, quite a bit more actually).

Really you would need to see 2nd hand 6900XTs or 3080Ti's for around $400 to even think about it and that would be due to VRAM more than anything.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Then the desktop version is probably going to be cancelled because of The Flood. Same thing with AD106, I don't expect it on desktop any time soon.
Expand your mind for once. Most of the world won't get wet with the flood. USA & EU & China, yes, but that leaves the rest of us not having easy access to 2nd hand cards. A substantial market staying dry. So, what do we buy? Nothing?

If that size is correct, then this card will be cheap. If RX6600 was $329, then RX7600 could be <$300.
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
The flood will have no competition. 6600XT power envelope for 6900XT tier 1080p and maybe 1440p performance.

That's what I mean, I'm not sure it'd be near 6900 XT performance. The cut model definitely wouldn't be.

Keep in mind it would need to be 90% faster than the 6650 XT, and 103% faster at 1440p, to get to 6900 XT levels.
 