Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Mopetar · Jul 11, 2022

tomatosummit said:
But how bad is the amd (~6700) flood going to be compared to what nvidia pumped out with great abandon on their samsung node?

Does it really matter whose used GPUs AMD is competing with? Even brand loyalists will be tempted to snap up good deals given the general unavailability of cards at reasonable prices for most of this generation.

biostud said:
Or once Intel opens the floodgates of ARC GPUs

I don't know if Intel will contribute to the flood. That's almost looking more like a plague at this point.

amenx · Jul 13, 2022

Navi 31 may be 24gb 384 bit?

AMD RDNA3 Navi 31 GPU with six MCDs to feature 384-bit wide memory bus - VideoCardz.com

AMD accidentally confirms Navi 31 MCD count Some new information on AMD next-gen RDNA3 series has been posted and immediately retracted from Linux code. The calm before the storm. It has been relatively quiet in terms of leaks for RDNA3 architecture. However, it appears that AMD themselves made...


Leeea · Jul 13, 2022

amenx said:
Navi 31 may be 24gb 384 bit?

AMD RDNA3 Navi 31 GPU with six MCDs to feature 384-bit wide memory bus - VideoCardz.com

AMD accidentally confirms Navi 31 MCD count Some new information on AMD next-gen RDNA3 series has been posted and immediately retracted from Linux code. The calm before the storm. It has been relatively quiet in terms of leaks for RDNA3 architecture. However, it appears that AMD themselves made...


It is all rumors, but the chip speculated there is a monster of a GPU.

Timorous · Jul 13, 2022

Leeea said:
It is all rumors, but the chip speculated there is a monster of a GPU.

Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card	Bus	Memory	Cache	Shaders	Die
7950XT	384bit	24GB	384MB	12,288	N31
7900XT	384bit	24GB	192MB	12,288	N31
7850XT	320bit	20GB	160MB	10,240	N31
7800XT	256bit	16GB	128MB	8,192	N32
7700XT	192bit	12GB	96MB	6,144	N32
7600XT	128bit	8GB	64MB	4,096	N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB would allow a smaller die that makes the desired price point easier to hit, and it avoids the issue of N33 having more cache than the 7700XT would have.
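
As a rough sanity check of the table above, the memory and cache columns fall out of the bus width if you assume one 2GB GDDR6 chip per 32 bits and one 32MB MCD per 64 bits. This is only a sketch: the 2GB chip size, the `derive_config` helper and the `cache_stack` multiplier (2 for the hypothetical stacked 7950XT) are assumptions for illustration, not confirmed specs, and N33 may not use MCDs at all even though the formula happens to match.

```python
# Sketch: derive memory and cache from bus width (all figures are speculation).
BITS_PER_CHIP = 32      # one GDDR6 chip per 32 bits of bus
GB_PER_CHIP = 2         # assumed 2GB GDDR6 chips
BITS_PER_MCD = 64       # rumoured MCD: 64 bits of bus
MB_CACHE_PER_MCD = 32   # rumoured MCD: 32MB of cache


def derive_config(bus_bits, cache_stack=1):
    """Return (memory_gb, cache_mb); cache_stack=2 models a 3D-stacked part."""
    memory_gb = bus_bits // BITS_PER_CHIP * GB_PER_CHIP
    cache_mb = bus_bits // BITS_PER_MCD * MB_CACHE_PER_MCD * cache_stack
    return memory_gb, cache_mb


for name, bus, stack in [("7950XT", 384, 2), ("7900XT", 384, 1),
                         ("7850XT", 320, 1), ("7800XT", 256, 1),
                         ("7700XT", 192, 1), ("7600XT", 128, 1)]:
    mem, cache = derive_config(bus, stack)
    print(f"{name}: {bus}bit -> {mem}GB, {cache}MB")
```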

Stuka87 · Jul 13, 2022

Having a 384bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost to the die, only to also increase cost and power consumption of the board and memory modules.

maddie · Jul 13, 2022

Timorous said:
Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card	Bus	Memory	Cache	Shaders	Die
7950XT	384bit	24GB	384MB	12,288	N31
7900XT	384bit	24GB	192MB	12,288	N31
7850XT	320bit	20GB	160MB	10,240	N31
7800XT	256bit	16GB	128MB	8,192	N32
7700XT	192bit	12GB	96MB	6,144	N32
7600XT	128bit	8GB	64MB	4,096	N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB would allow a smaller die that makes the desired price point easier to hit, and it avoids the issue of N33 having more cache than the 7700XT would have.

I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: it maximizes data locality. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.

AMD maintained memory size for all models using the same die. This is not by accident: with IF cache, they need to do this to optimize energy efficiency.

Having said all that, we have an RX 6700 GPU with 160b memory, obviously a product meant to use up partially defective dies. The utility of any explanation is in its predictive power. I predict a change in expected perf/power for this model.

TESKATLIPOKA · Jul 13, 2022

Timorous said:
Still not convinced about 32MB + 64bit MCDs with 3D stacking on some parts but maybe I have talked myself into it.

Card	Bus	Memory	Cache	Shaders	Die
7950XT	384bit	24GB	384MB	12,288	N31
7900XT	384bit	24GB	192MB	12,288	N31
7850XT	320bit	20GB	160MB	10,240	N31
7800XT	256bit	16GB	128MB	8,192	N32
7700XT	192bit	12GB	96MB	6,144	N32
7600XT	128bit	8GB	64MB	4,096	N33

I know N33 is rumoured to have 128MB of cache, but I think this stack works better. For 1080p, 32MB is fine, so going with 64MB would allow a smaller die that makes the desired price point easier to hit, and it avoids the issue of N33 having more cache than the 7700XT would have.

1. I think N33 looks realistic.
2. I don't see a reason for 7950XT to have 2x more cache than 7900XT when the rest is the same.
3. Deactivating 25% of 7800xt to get 7700xt is too much.

jpiniero · Jul 13, 2022

Stuka87 said:
Having a 384bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost to the die, only to also increase cost and power consumption of the board and memory modules.

Given the (2k?) price point, probably memory capacity to be competitive with AD102. I don't think you will see 4 GB chips until GDDR7 at the earliest.

GodisanAtheist · Jul 13, 2022

More shaders = more cache + higher bus width = slower memory required.

It's all a series of trade-offs, but if you're going to feed 12K shaders, you need to have more cache distributed more broadly to make sure those shaders stay fed. This means you're going to need more bandwidth to get data to and from the larger caches.

Since AMD seems hellbent on avoiding the use of any sort of exotic memory modules like Nvidia (or maybe there is a licensing premium for them) this is the route they have to take given the absolutely absurd jump in shading power.

Timorous · Jul 13, 2022

TESKATLIPOKA said:
1. I think N33 looks realistic.
2. I don't see a reason for 7950XT to have 2x more cache than 7900XT when the rest is the same.
3. Deactivating 25% of 7800xt to get 7700xt is too much.

2. Market it as an 8K card and charge a lot of money for it. Very low volume too, and a way to test stacking process improvements.

Edit to add. Also I didn't speculate on clocks and TDP which could also be higher for a 7950XT part to take the benchmark wins where the better option is actually the 7900XT.

3. They need a cut version somewhere for a 7700XT and with only 3 known dies the options are an 8GB part based on N33 which will get slated in the tech press or a cut N32 part.

Edit to add. Also 25% cut is the same as the 6800 vs 6900XT except the 7700XT will also have less ram and fewer MCDs to make the price point easier to hit.

Timorous · Jul 13, 2022

maddie said:
I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: it maximizes data locality. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.

AMD maintained memory size for all models using the same die. This is not by accident: with IF cache, they need to do this to optimize energy efficiency.

Having said all that, we have an RX 6700 GPU with 160b memory, obviously a product meant to use up partially defective dies. The utility of any explanation is in its predictive power. I predict a change in expected perf/power for this model.

It would be the same as the 6/4 MCD variants with a chip missing so I don't see your point.

What was true for RDNA2 is not necessarily the same for RDNA3 and an advantage to the layout rumoured is that parts with less memory and cut memory buses also use less silicon.

TESKATLIPOKA · Jul 13, 2022

Timorous said:
2. Market it as an 8K card and charge a lot of money for it. Very low volume too, and a way to test stacking process improvements.
....

8K has 4x the pixels of 4K, so 2x more cache won't help you much when the number of shaders (ROPs, TMUs, etc.) stays the same.
edit: higher clockspeed could help somewhat.
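
For reference, the 4x figure is just the pixel count. A minimal sketch, assuming the standard UHD resolutions (3840x2160 and 7680x4320); the `pixels` helper is only for illustration:

```python
# Sketch: compare pixel counts of 4K UHD and 8K UHD.
RESOLUTIONS = {"4K": (3840, 2160), "8K": (7680, 4320)}


def pixels(name):
    """Total pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


ratio = pixels("8K") / pixels("4K")
print(f"4K: {pixels('4K'):,} px, 8K: {pixels('8K'):,} px, ratio: {ratio}")
```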

maddie · Jul 13, 2022

Timorous said:
It would be the same as the 6/4 MCD variants with a chip missing so I don't see your point.

What was true for RDNA2 is not necessarily the same for RDNA3 and an advantage to the layout rumoured is that parts with less memory and cut memory buses also use less silicon.

The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD to shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high level understanding of IF cache is not that complicated. What is, is the detailed engineering on how to physically implement the concept. A 3 & 5 MCD layout will have some of the shaders much further than normal for a data path to the stored IF cache data. The lower level caches will not be affected by this, but any data movement between the L2 and the IF caches will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.

Timorous · Jul 13, 2022

maddie said:
The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD to shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high level understanding of IF cache is not that complicated. What is, is the detailed engineering on how to physically implement the concept. A 3 & 5 MCD layout will have some of the shaders much further than normal for a data path to the stored IF cache data. The lower level caches will not be affected by this, but any data movement between the L2 and the IF caches will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.

5 serving 10k shaders and 3 serving 6k shaders would also be the exact same ratio as 6 for 12k and 4 for 8k. It is 2k shaders per MCD...

Edit to add. In fact it makes perfect sense since it would be 1 WGP per MCD and a cut down 5WGP N31 would be 10k shaders and a cut down 3 WGP would be 6k shaders.

So AMD just need to deactivate a WGP for the 320bit and 192bit variants if they exist.
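
The ratio argument is easy to check. A quick sketch, using the speculative bus widths and shader counts from the table earlier in the thread and the rumoured 64 bits per MCD; the config names and the `shaders_per_mcd` helper are made up for illustration:

```python
# Sketch: shaders per MCD for the speculated N31/N32 configs (rumoured figures).
BITS_PER_MCD = 64

configs = {  # name: (bus width in bits, shader count)
    "N31 full": (384, 12288),
    "N31 cut": (320, 10240),
    "N32 full": (256, 8192),
    "N32 cut": (192, 6144),
}


def shaders_per_mcd(bus_bits, shaders):
    """Shader count divided by the number of MCDs implied by the bus width."""
    mcds = bus_bits // BITS_PER_MCD
    return shaders // mcds


ratios = {name: shaders_per_mcd(*cfg) for name, cfg in configs.items()}
print(ratios)
```

Every config comes out at the same shaders-per-MCD figure, which is the point: the 5 and 3 MCD parts keep the ratio of the 6 and 4 MCD parts.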

Timorous · Jul 13, 2022

TESKATLIPOKA said:
8K has 4x the pixels of 4K, so 2x more cache won't help you much when the number of shaders (ROPs, TMUs, etc.) stays the same.
edit: higher clockspeed could help somewhat.

Given how the 6900XT/6950XT do at 4k with a 256bit bus and 128MB IC I think a 50% wider bus + 3x more cache will be fine for 8K, especially reconstructed 8K from a 4k + input resolution.

MrTeal · Jul 13, 2022

Stuka87 said:
Having a 384bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work and added cost to the die, only to also increase cost and power consumption of the board and memory modules.

It doesn't seem that off, does it? Navi 31 would have 1.2x the CU of Navi 21 and 2.4x the SP. It's still stuck with the same 18gbps GDDR6, so moving to 1.5x the memory bandwidth (and thus 1.5x the cache) seems reasonable.
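
Spelling the ratios out as a rough sketch: the Navi 21 figures are the shipping 80 CU / 5120 SP / 256-bit part, while the Navi 31 figures (96 CU, 12288 SP, 384-bit) are rumours, with the CU count inferred from the 1.2x ratio. The 18Gbps speed for both and the `bandwidth_gb_s` helper are assumptions for illustration.

```python
# Sketch: Navi 21 vs rumoured Navi 31 ratios. Navi 31 figures are speculation.
navi21 = {"cu": 80, "sp": 5120, "bus_bits": 256}
navi31 = {"cu": 96, "sp": 12288, "bus_bits": 384}  # rumoured, not confirmed
GBPS_PER_PIN = 18  # assumed GDDR6 data rate for both chips


def bandwidth_gb_s(bus_bits, gbps_per_pin=GBPS_PER_PIN):
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


scaling = {key: navi31[key] / navi21[key] for key in navi21}
print(scaling)
print(bandwidth_gb_s(navi21["bus_bits"]), bandwidth_gb_s(navi31["bus_bits"]))
```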

igor_kavinski · Jul 13, 2022

Gobs of bandwidth for more gibs. Yum!

maddie · Jul 13, 2022

Timorous said:
5 serving 10k shaders and 3 serving 6k shaders would also be the exact same ratio as 6 for 12k and 4 for 8k. It is 2k shaders per MCD...

Edit to add. In fact it makes perfect sense since it would be 1 WGP per MCD and a cut down 5WGP N31 would be 10k shaders and a cut down 3 WGP would be 6k shaders.

So AMD just need to deactivate a WGP for the 320bit and 192bit variants if they exist.

Yeah, you're right.

DisEnchantment · Jul 13, 2022

maddie said:
I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: it maximizes data locality. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.

IF cache is actually close to Memory controllers and quite far from shaders. ALU-->VGPR-->L0-->L1-->L2-->IFC/LLC-->DRAM.
L2 slices are actually in the center of the chip close to each other near the CP (because there are crossbars between them)
Shaders (from different waves) which export to memory feed back to another set of shaders without going out to DRAM at L2. To maximize parallelism (and therefore to avoid multiple shaders grinding specific DRAM channels), each L2 slice is associated with IFC/LLC chunk.

However, I am quite perplexed at so many MCDs. Why make so many tiny MCDs instead of bigger and fewer MCDs?
One reason I can imagine is that there is broad reuse of these MCDs across product lines; otherwise I am scratching my head. It is not like 100mm2+ dies are not getting great yields.
Wondering about packaging steps too. I think EFB, not CoWoS.

And why 3D? In AMD's slide they were specifying 3D for MI300, but RDNA3 is just plain "Advanced Chiplet Packaging".
For 3D IFC I'm not sure, because that would mean packaging at TSMC for SoIC and Tongfu for EFB.
But let's see, another 3 months.

Leeea · Jul 13, 2022

GodisanAtheist said:
Since AMD seems hellbent on avoiding the use of any sort of exotic memory modules like Nvidia

Probably just trying to avoid having to pay exotic prices.

TSMC makes the cache for AMD, it is AMD's product, AMD's profit. AMD likely did the math and figured cache is cheaper than exotic memory chips.

Memory chips are a cost to AMD, no profit in them. An unpredictable cost in all likelihood, as they are most likely purchased on spot pricing.

Whereas TSMC is already bought and allocated to AMD. If AMD does not use their allotments with TSMC, they pay either way. It is a predictable cost they are going to pay regardless. AMD can shuffle their allotment with TSMC around, but one way or another they need to use it all.

igor_kavinski · Jul 13, 2022

Leeea said:
Memory chips are a cost to AMD, no profit in them.

They have some really smart people there to figure that out. Wish Intel had a few.

Frenetic Pony · Jul 13, 2022

More dies == more cost savings, as long as packaging costs less than the amount saved by smaller die sizes. Also, if there are a lot of them, it's going to look like Zen: only one compute die ever needs to be designed, which means cost savings, better time to market and more.
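
To put a rough shape on that trade-off, here is a toy sketch using a simple Poisson yield model. The defect density, the 500mm2 area and the 5-way split are made-up illustrative values, not foundry data, and the model ignores interconnect area overhead and wafer edge losses.

```python
import math

# Sketch: relative silicon cost of one big die vs. n equal chiplets.
# Poisson yield model; all numbers below are hypothetical.


def yield_fraction(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_cost(area_mm2, defects_per_mm2, n_chiplets=1):
    """Wafer area (mm2) consumed per working product, counting only good dies."""
    chiplet_area = area_mm2 / n_chiplets
    good_fraction = yield_fraction(chiplet_area, defects_per_mm2)
    return n_chiplets * chiplet_area / good_fraction


mono = silicon_cost(500, 0.001)
split = silicon_cost(500, 0.001, n_chiplets=5)
print(f"monolithic: {mono:.0f} mm2, chiplets: {split:.0f} mm2, "
      f"packaging budget: {mono - split:.0f} mm2-equivalent")
```

The gap between the two numbers is the budget the extra packaging has to fit inside for the split to pay off.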

Anyway, thinking about 6MCD... that could mean 1 MCD is roughly similar in size to a zen chiplet. As that is presumably around the best cost per die AMD can get, that makes sense. It could also mean the 4 configurations might be something like (lowest to highest config): 1 mcd, 2 mcd, 4 mcd, 6 mcd.

You slot 1 of these next to an APU chiplet that has all the command/control parts on it as well as CPU and whatever for the lowest config. Then as you go up you pair them with more and more memory chiplets. The thing I'm slightly fogged by is the mention of 2 UMCs for 6 MCDs. Middle config = 3 or 4 MCDs with 1 UMC. But what does lowest dedicated config look like then? 1 UMC as well but just way too much bandwidth, is there two different kinds of UMC, does monolithic even make sense with such small chiplets you're producing anyways? The questions spiral off into infinity, so I'll just leave it.

Aapje · Jul 13, 2022

DisEnchantment said:
However, I am quite perplexed at so many MCDs. Why make so many tiny MCDs instead of bigger and fewer MCDs?

They are doing the same thing for the AM5 motherboards. The X670 boards have two PROM21 chips, while the B650 has one PROM21. This makes the X670 more complex than when having a single X670-specific chipset, but is much cheaper, as it doesn't require two designs.

It's also more flexible, since if the demand for X670 is higher or lower than expected, you can just take chips that were intended for B650 or make more B650 boards with those chips.

One reason I can imagine is that there is broad reuse of these MCDs across product lines; otherwise I am scratching my head. It is not like 100mm2+ dies are not getting great yields.

The cheaper the production, the more the cost of design matters. So it makes perfect sense to have separate GCDs, which are the expensive-to-produce parts, but reuse MCDs. It makes it very easy for them to release intermediate products. For example, they can make cards with 1, 2, 4 and 6 MCDs, with matching bus widths. So a 7800 XT with 4 MCDs would actually be cheaper to make than a 7900 XT with 6 MCDs and a wider bus, while the 6800 XT is nearly as expensive to make as the 6900 XT.

DisEnchantment · Jul 13, 2022

Frenetic Pony said:
Anyway, thinking about 6MCD... that could mean 1 MCD is roughly similar in size to a zen chiplet.

1 MCD with 2x UMC and about 32MiB cache would be around 45mm2 (originally estimated at 55mm2) on N7 (using N21 as reference and +10% interconnect logic). Much smaller than a Zen chiplet.

Frenetic Pony said:
The thing I'm slightly fogged by is the mention of 2 UMCs for 6 MCDs. Middle config = 3 or 4 MCDs with 1 UMC. But what does lowest dedicated config look like then? 1 UMC as well but just way too much bandwidth, is there two different kinds of UMC, does monolithic even make sense with such small chiplets you're producing anyways? The questions spiral off into infinity, so I'll just leave it.

UMC = Unified Memory Controller which can handle 1x GDDR6 chip with 32 data lines and 2 channels.
1x MCD has 2 UMCs.
So 6x MCDs = 6 * 2 * 32 = 384 bits wide, with 12x G6 chips.
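
The same arithmetic as a small sketch for other MCD counts. It assumes the figures above (2 UMCs per MCD, 32 data lines and one GDDR6 chip per UMC); the `memory_layout` helper is just for illustration.

```python
# Sketch: bus width and GDDR6 chip count from the number of MCDs.
UMCS_PER_MCD = 2
BITS_PER_UMC = 32  # one GDDR6 chip with 32 data lines per UMC


def memory_layout(mcd_count):
    """Return total bus width and GDDR6 chip count for a given MCD count."""
    umcs = mcd_count * UMCS_PER_MCD
    return {"bus_bits": umcs * BITS_PER_UMC, "gddr6_chips": umcs}


for n in (1, 2, 4, 6):
    print(n, "MCD:", memory_layout(n))
```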

Kepler_L2 · Jul 13, 2022

maddie said:
The 6 & 4 MCD variants, AFAIK, apply to different GCD configs. The MCD to shader ratio appears to be constant, so there really is no difference between them. The smaller one can be seen as a proportional shrink of the larger.

The high level understanding of IF cache is not that complicated. What is, is the detailed engineering on how to physically implement the concept. A 3 & 5 MCD layout will have some of the shaders much further than normal for a data path to the stored IF cache data. The lower level caches will not be affected by this, but any data movement between the L2 and the IF caches will be more expensive for the shaders that would have been served by the missing MCD. A net decrease in perf/W.

There's 1 MCD for each SE on Navi3x. Cutdown configs will disable 1 SE and also lose 1 MCD. It will not affect efficiency at all.
