Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146

Mopetar

Diamond Member
Jan 31, 2011
8,010
6,454
136
But how bad is the AMD (~6700) flood going to be compared to what Nvidia pumped out with great abandon on their Samsung node?

Does it really matter whose used GPUs AMD is competing with? Even brand loyalists will be tempted to snap up good deals given the general unavailability of cards at reasonable prices for most of this generation.

Or once Intel opens the floodgates of ARC GPUs

I don't know if Intel will contribute to the flood. That's almost looking more like a plague at this point.
 

amenx

Diamond Member
Dec 17, 2004
4,008
2,278
136

Leeea

Diamond Member
Apr 3, 2020
3,698
5,432
136
Navi 31 may be 24GB, 384-bit?

It is all rumors, but the chip speculated there is a monster of a GPU.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
It is all rumors, but the chip speculated there is a monster of a GPU.

Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card    | Bus    | Memory | Cache | Shaders | Die
7950XT  | 384bit | 24GB   | 384MB | 12,288  | N31
7900XT  | 384bit | 24GB   | 192MB | 12,288  | N31
7850XT  | 320bit | 20GB   | 160MB | 10,240  | N31
7800XT  | 256bit | 16GB   | 128MB | 8,192   | N32
7700XT  | 192bit | 12GB   | 96MB  | 6,144   | N32
7600XT  | 128bit | 8GB    | 64MB  | 4,096   | N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB allows a smaller die that will be easier to hit the desired price point with, and it avoids N33 having more cache than the 7700XT would have.
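A quick sanity check of the stack above (a rough sketch: the card specs are the rumoured figures from the table, 18 Gbps GDDR6 is an assumption on my part, and N33 is rumoured to be monolithic so its "MCD" count is only notional):

```python
# Rough sanity check of the speculated RDNA3 stack above.
# All card specs are rumours; 18 Gbps GDDR6 is an assumption.
GDDR6_GBPS_PER_PIN = 18

specs = {
    # card: (bus width in bits, memory in GB, cache in MB, shaders)
    "7950XT": (384, 24, 384, 12288),
    "7900XT": (384, 24, 192, 12288),
    "7850XT": (320, 20, 160, 10240),
    "7800XT": (256, 16, 128, 8192),
    "7700XT": (192, 12, 96, 6144),
    "7600XT": (128, 8, 64, 4096),
}

for card, (bus, mem_gb, cache_mb, shaders) in specs.items():
    bandwidth = bus * GDDR6_GBPS_PER_PIN / 8  # GB/s of raw GDDR6 bandwidth
    mcds = bus // 64                          # one 64-bit MCD per 64 bits of bus
    print(f"{card}: {bandwidth:.0f} GB/s, {mcds} MCDs, "
          f"{shaders // mcds} shaders and {cache_mb // mcds}MB cache per MCD")
```

Every row works out to 2,048 shaders per 64-bit MCD, with either 32MB or 64MB (stacked) of cache per MCD, which is why the stack hangs together.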
 
Reactions: Tlh97 and Leeea

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Having a 384-bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth requirements by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost on the die, only to then also increase the cost and power consumption of the board and memory modules.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card    | Bus    | Memory | Cache | Shaders | Die
7950XT  | 384bit | 24GB   | 384MB | 12,288  | N31
7900XT  | 384bit | 24GB   | 192MB | 12,288  | N31
7850XT  | 320bit | 20GB   | 160MB | 10,240  | N31
7800XT  | 256bit | 16GB   | 128MB | 8,192   | N32
7700XT  | 192bit | 12GB   | 96MB  | 6,144   | N32
7600XT  | 128bit | 8GB    | 64MB  | 4,096   | N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB allows a smaller die that will be easier to hit the desired price point with, and it avoids N33 having more cache than the 7700XT would have.
I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: data locality is maximized. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.

AMD maintained the same memory size for all models using the same die. This is not by accident; with IF cache, they need to do this to optimize energy efficiency.

Having said all that, we do have an RX 6700 GPU with a 160-bit memory bus. Obviously a product used to salvage partially defective dies. The utility of any explanation is in its predictive power, so I predict a change in expected perf/power for this model.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card    | Bus    | Memory | Cache | Shaders | Die
7950XT  | 384bit | 24GB   | 384MB | 12,288  | N31
7900XT  | 384bit | 24GB   | 192MB | 12,288  | N31
7850XT  | 320bit | 20GB   | 160MB | 10,240  | N31
7800XT  | 256bit | 16GB   | 128MB | 8,192   | N32
7700XT  | 192bit | 12GB   | 96MB  | 6,144   | N32
7600XT  | 128bit | 8GB    | 64MB  | 4,096   | N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB allows a smaller die that will be easier to hit the desired price point with, and it avoids N33 having more cache than the 7700XT would have.
1. I think N33 looks realistic.
2. I don't see a reason for 7950XT to have 2x more cache than 7900XT when the rest is the same.
3. Deactivating 25% of 7800xt to get 7700xt is too much.
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
Having a 384-bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth requirements by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost on the die, only to then also increase the cost and power consumption of the board and memory modules.

Given the (2k?) price point, it's probably about having the memory capacity to be competitive with AD102. I don't think you will see 4 GB chips until GDDR7 at the earliest.
 
Reactions: Tlh97 and Stuka87

GodisanAtheist

Diamond Member
Nov 16, 2006
7,064
7,489
136
More shaders = more cache + higher bus width = slower memory required.

It's all a series of trade-offs, but if you're going to feed 12K shaders, you need more cache distributed more broadly to make sure those shaders stay fed. This means you're also going to need more bandwidth to get data to and from the larger caches.

Since AMD seems hellbent on avoiding the use of any sort of exotic memory modules like Nvidia (or maybe there is a licensing premium on them), this is the route they have to take given the absolutely absurd jump in shading power.
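As a loose illustration of that trade-off, here is a toy model (every number in it is an illustrative guess, not an AMD figure): with a fixed LLC hit rate, the miss traffic that still has to go to GDDR6 scales with shader count, so a much larger shader array needs more cache, more bus width, or realistically both.

```python
# Toy model of the cache-vs-bus-width trade-off; all figures are illustrative
# guesses, not AMD specifications.
def dram_traffic(total_demand_gbs: float, llc_hit_rate: float) -> float:
    """GDDR6 traffic left over after the LLC absorbs its share of requests."""
    return total_demand_gbs * (1.0 - llc_hit_rate)

navi21_demand = 1200                 # GB/s of hypothetical demand for ~5k shaders
navi31_demand = navi21_demand * 2.4  # rumoured ~2.4x the shaders

for hit in (0.6, 0.7, 0.8):
    print(f"hit rate {hit:.0%}: "
          f"Navi21-class ~{dram_traffic(navi21_demand, hit):.0f} GB/s to DRAM, "
          f"Navi31-class ~{dram_traffic(navi31_demand, hit):.0f} GB/s to DRAM")

# For scale: 256-bit GDDR6 @ 18 Gbps = 576 GB/s, 384-bit = 864 GB/s, so the
# bigger cache and the wider bus are complementary rather than contradictory.
```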
 
Reactions: Tlh97 and Leeea

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
1. I think N33 looks realistic.
2. I don't see a reason for 7950XT to have 2x more cache than 7900XT when the rest is the same.
3. Deactivating 25% of 7800xt to get 7700xt is too much.

2. Market it as an 8K card and charge a lot of money for it. Very low volume too, and a way to test stacking process improvements.

Edit to add: I also didn't speculate on clocks and TDP, which could be higher for a 7950XT part so it takes the benchmark wins even where the better option is actually the 7900XT.

3. They need a cut version somewhere for a 7700XT, and with only 3 known dies the options are an 8GB part based on N33, which will get slated in the tech press, or a cut N32 part.

Edit to add: a 25% cut is the same as the 6800 vs the 6900XT, except the 7700XT will also have less RAM and fewer MCDs to make the price point easier to hit.
 
Last edited:

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: data locality is maximized. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.

AMD maintained the same memory size for all models using the same die. This is not by accident; with IF cache, they need to do this to optimize energy efficiency.

Having said all that, we do have an RX 6700 GPU with a 160-bit memory bus. Obviously a product used to salvage partially defective dies. The utility of any explanation is in its predictive power, so I predict a change in expected perf/power for this model.

It would be the same as the 6/4 MCD variants with a chip missing, so I don't see your point.

What was true for RDNA2 is not necessarily true for RDNA3, and an advantage of the rumoured layout is that parts with less memory and cut memory buses also use less silicon.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,429
2,914
136
2. Market it as an 8K card and charge a lot of money for it. Very low volume too, and a way to test stacking process improvements.
....
8K is 4x as many pixels as 4K, so 2x more cache won't help you much when the number of shaders (ROPs, TMUs, etc.) stays the same.
Edit: higher clockspeeds could help somewhat.
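Just to put numbers on the resolution point (plain pixel counts, nothing speculative):

```python
# Pixel counts behind the "8K is 4x 4K" point above.
resolutions = {"1080p": (1920, 1080), "4K": (3840, 2160), "8K": (7680, 4320)}
base = resolutions["4K"][0] * resolutions["4K"][1]

for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels / 1e6:.1f} MP ({pixels / base:.2f}x 4K)")
```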
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
It would be the same as the 6/4 MCD variants with a chip missing so I don't see your point.

What was true for RDNA2 is not necessarily the same for RDNA3 and an advantage to the layout rumoured is that parts with less memory and cut memory buses also use less silicon.
The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD-to-shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high-level understanding of IF cache is not that complicated; what is complicated is the detailed engineering of how to physically implement the concept. A 3 or 5 MCD layout will leave some of the shaders with a much longer than normal data path to the stored IF cache data. The lower-level caches will not be affected by this, but any data movement between the L2 and the IF cache will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD-to-shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high-level understanding of IF cache is not that complicated; what is complicated is the detailed engineering of how to physically implement the concept. A 3 or 5 MCD layout will leave some of the shaders with a much longer than normal data path to the stored IF cache data. The lower-level caches will not be affected by this, but any data movement between the L2 and the IF cache will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.

5 MCDs serving 10k shaders and 3 serving 6k shaders would also be the exact same ratio as 6 for 12k and 4 for 8k. It is 2k shaders per MCD...

Edit to add: in fact it makes perfect sense, since it would be 1 WGP per MCD, and a cut-down 5-WGP N31 would be 10k shaders while a cut-down 3-WGP N32 would be 6k shaders.

So AMD just needs to deactivate a WGP for the 320-bit and 192-bit variants, if they exist.
 
Last edited:

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
8K is 4x as many pixels as 4K, so 2x more cache won't help you much when the number of shaders (ROPs, TMUs, etc.) stays the same.
Edit: higher clockspeeds could help somewhat.

Given how the 6900XT/6950XT do at 4K with a 256-bit bus and 128MB of IC, I think a 50% wider bus + 3x more cache will be fine for 8K, especially reconstructed 8K from a 4K+ input resolution.
 

MrTeal

Diamond Member
Dec 7, 2003
3,586
1,746
136
Having a 384-bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth requirements by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost on the die, only to then also increase the cost and power consumption of the board and memory modules.
It doesn't seem that off, does it? Navi 31 would have 1.2x the CUs of Navi 21 and 2.4x the SPs. It's still stuck with the same 18 Gbps GDDR6, so moving to 1.5x the memory bandwidth (and thus 1.5x the cache) seems reasonable.
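Rough numbers behind that (assuming 18 Gbps GDDR6 on both, and the rumoured 12,288 SPs for Navi 31 vs 5,120 for Navi 21):

```python
# Bandwidth and shader scaling, Navi 31 (rumoured) vs Navi 21.
GBPS = 18                       # assumed GDDR6 data rate per pin
navi21_bw = 256 * GBPS / 8      # 576 GB/s on a 256-bit bus
navi31_bw = 384 * GBPS / 8      # 864 GB/s on a 384-bit bus

print(f"bandwidth: {navi31_bw / navi21_bw:.1f}x")  # 1.5x from the wider bus alone
print(f"shaders:   {12288 / 5120:.1f}x")           # 2.4x SPs in the rumours
```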
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
5 MCDs serving 10k shaders and 3 serving 6k shaders would also be the exact same ratio as 6 for 12k and 4 for 8k. It is 2k shaders per MCD...

Edit to add: in fact it makes perfect sense, since it would be 1 WGP per MCD, and a cut-down 5-WGP N31 would be 10k shaders while a cut-down 3-WGP N32 would be 6k shaders.

So AMD just needs to deactivate a WGP for the 320-bit and 192-bit variants, if they exist.
Yeah, you're right.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: data locality is maximized. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.
IF cache is actually close to the memory controllers and quite far from the shaders: ALU --> VGPR --> L0 --> L1 --> L2 --> IFC/LLC --> DRAM.
The L2 slices are actually in the center of the chip, close to each other near the CP (because there are crossbars between them).
Shaders (from different waves) that export to memory feed back to another set of shaders at the L2 without going out to DRAM. To maximize parallelism (and therefore to avoid multiple shaders grinding on specific DRAM channels), each L2 slice is associated with an IFC/LLC chunk.
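A minimal sketch of that channel-interleaving idea, under my own assumptions (the 256-byte granule and XOR hash below are generic illustrations, not AMD's actual address mapping):

```python
# Generic illustration of striping addresses across L2 slices / LLC chunks /
# DRAM channels so no single channel gets hammered by every shader.
# The 256-byte granule and XOR-fold hash are assumptions, not AMD's real scheme.
NUM_CHANNELS = 16   # e.g. 16 x 16-bit GDDR6 channels on a 256-bit bus
GRANULE = 256       # bytes per interleave granule (assumed)

def channel_for(address: int) -> int:
    g = address // GRANULE
    # Fold upper bits in so strided access patterns still spread across channels.
    return (g ^ (g >> 4)) % NUM_CHANNELS

# A linear 4 KiB walk touches every channel instead of grinding on one.
print(sorted({channel_for(a) for a in range(0, 4096, GRANULE)}))
```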

However, I am quite perplexed by so many MCDs. Why make so many tiny MCDs instead of fewer, bigger ones?
One reason I can imagine is broad reuse of these MCDs across product lines; otherwise I am scratching my head, since it is not as if 100mm2+ dies aren't getting great yields.
Wondering about the packaging steps too. I think EFB, not CoWoS.

And why 3D? In AMD's slide they were specifying 3D for MI300, but RDNA3 is just plain "Advanced Chiplet Packaging".
For 3D IFC I am not sure, because that would mean packaging at TSMC for SoIC and at Tongfu for EFB.
But let's see, another 3 months.
 

Leeea

Diamond Member
Apr 3, 2020
3,698
5,432
136
Since AMD seems hellbent on avoiding the use of any sort of exotic memory modules like Nvidia
Probably just trying to avoid having to pay exotic prices.

TSMC makes the cache for AMD; it is AMD's product, AMD's profit. AMD likely did the math and figured cache is cheaper than exotic memory chips.

Memory chips are a cost to AMD, no profit in them. An unpredictable cost in all likelihood, as they are most likely purchased at spot pricing.

Whereas TSMC capacity is already bought and allocated to AMD. If AMD does not use its allotment with TSMC, it pays either way. It is a predictable cost they are going to pay regardless. AMD can shuffle their TSMC allotment around, but one way or another they need to use it all.
 
Last edited:

Frenetic Pony

Senior member
May 1, 2012
218
179
116
More dies == more cost savings, as long as packaging costs less than the amount saved by the smaller die sizes. Also, if there are a lot of them, then it's going to look like Zen: only one compute die ever needs to be designed, which means cost savings, better time to market and more.

Anyway, thinking about 6 MCDs... that could mean 1 MCD is roughly similar in size to a Zen chiplet. As that is presumably around the best cost per die AMD can get, that makes sense. It could also mean the 4 configurations might be something like (lowest to highest config): 1 MCD, 2 MCDs, 4 MCDs, 6 MCDs.

You slot one of these next to an APU chiplet that has all the command/control parts on it, as well as the CPU and whatever else, for the lowest config. Then as you go up you pair them with more and more memory chiplets. The thing I'm slightly fogged by is the mention of 2 UMCs for 6 MCDs. Middle config = 3 or 4 MCDs with 1 UMC. But what does the lowest dedicated config look like then? 1 UMC as well, but just way too much bandwidth? Are there two different kinds of UMC? Does monolithic even make sense with such small chiplets being produced anyway? The questions spiral off into infinity, so I'll just leave it.
 

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
However, I am quite perplexed by so many MCDs. Why make so many tiny MCDs instead of fewer, bigger ones?

They are doing the same thing for the AM5 motherboards. The X670 boards have two PROM21 chips, while the B650 has one PROM21. This makes the X670 more complex than having a single X670-specific chipset, but it is much cheaper, as it doesn't require two designs.

It's also more flexible, since if the demand for X670 is higher or lower than expected, you can just take chips that were intended for B650 or make more B650 boards with those chips.

One reason I can imagine is broad reuse of these MCDs across product lines; otherwise I am scratching my head, since it is not as if 100mm2+ dies aren't getting great yields.

The cheaper the production, the more the cost of design matters. So it makes perfect sense to have separate GCDs, which are the expensive parts to produce, but to reuse MCDs. It also makes it very easy for them to release intermediate products. For example, they can make cards with 1, 2, 4 and 6 MCDs, with matching bus widths. So a 7800 XT with 4 MCDs would actually be cheaper to make than a 7900 XT with 6 MCDs and a wider bus, whereas the 6800 XT is nearly as expensive to make as the 6900 XT.
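A back-of-the-envelope sketch of that reuse argument (every cost figure below is made up purely for illustration, not a real AMD number):

```python
# Hypothetical chiplet cost sketch for the MCD-reuse argument above.
# All dollar figures are invented for illustration only.
GCD_COST = {"N31": 120, "N32": 80}  # $ per good graphics die (hypothetical)
MCD_COST = 10                       # $ per MCD (hypothetical)
PACKAGING_PER_MCD = 3               # $ of advanced packaging per MCD (hypothetical)

def card_cost(gcd: str, n_mcd: int) -> float:
    return GCD_COST[gcd] + n_mcd * (MCD_COST + PACKAGING_PER_MCD)

print("6-MCD N31 card:", card_cost("N31", 6))
print("4-MCD N32 card:", card_cost("N32", 4))
# With chiplets, silicon cost scales with the memory config; on a monolithic die
# the full bus and cache are paid for even on cut-down SKUs.
```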
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
Anyway, thinking about 6 MCDs... that could mean 1 MCD is roughly similar in size to a Zen chiplet.
1 MCD with 2x UMC and about 32MiB of cache would be around 45mm2 on N7 (using N21 as a reference and +10% for interconnect logic). Much smaller than a Zen chiplet.

The thing I'm slightly fogged by is the mention of 2 UMCs for 6 MCDs. Middle config = 3 or 4 MCDs with 1 UMC. But what does the lowest dedicated config look like then? 1 UMC as well, but just way too much bandwidth? Are there two different kinds of UMC? Does monolithic even make sense with such small chiplets being produced anyway? The questions spiral off into infinity, so I'll just leave it.
UMC = Unified Memory Controller, which can handle 1x GDDR6 chip with 32 data lines and 2 channels.
1x MCD has 2 UMCs.
So 6x MCDs = 6 * 2 * 32 = 384 bits wide and 12x G6 chips.
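Spelled out (the 2 GB-per-chip figure assumes the 16 Gbit GDDR6 packages mentioned earlier in the thread):

```python
# The UMC arithmetic above, spelled out. Assumes 2 GB (16 Gbit) GDDR6 chips.
UMCS_PER_MCD = 2      # each UMC drives one 32-bit GDDR6 chip (2 channels)
BITS_PER_CHIP = 32
GB_PER_CHIP = 2

for mcds in (6, 5, 4, 3, 2):
    chips = mcds * UMCS_PER_MCD
    print(f"{mcds} MCDs -> {chips * BITS_PER_CHIP}-bit bus, "
          f"{chips} G6 chips, {chips * GB_PER_CHIP} GB")
```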
 
Last edited:

Kepler_L2

Senior member
Sep 6, 2020
466
1,910
106
The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD-to-shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high-level understanding of IF cache is not that complicated; what is complicated is the detailed engineering of how to physically implement the concept. A 3 or 5 MCD layout will leave some of the shaders with a much longer than normal data path to the stored IF cache data. The lower-level caches will not be affected by this, but any data movement between the L2 and the IF cache will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.
There's 1 MCD for each SE on Navi3x. Cut-down configs will disable 1 SE and also lose 1 MCD. It will not affect efficiency at all.
 
Reactions: Mopetar