Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,235
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
That said, I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
I don't think we can expect RX 8600 to cost only $199-249 when a cheaper N6 N33 sells for $269.
The question is what they can pack inside a 100mm2 GCD.
If they can put at least 2SE, 40CU, 2560SP, 160TMU and 64ROPs, which is ~25% more than what N33 has, then I wouldn't be surprised If they asked $299 for this.
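The "~25% more than N33" figure can be checked from the CU count alone. A rough sketch, assuming the usual RDNA ratios of 64 SPs and 4 TMUs per CU and the shipping N33 configuration (32 CU, 64 ROPs); the helper name `gpu_specs` is made up for illustration:

```python
# Assumed per-CU ratios for RDNA parts (not stated in the post itself).
SP_PER_CU = 64
TMU_PER_CU = 4

def gpu_specs(cus: int, rops: int) -> dict:
    """Derive shader and texture unit counts from the CU count."""
    return {"CU": cus, "SP": cus * SP_PER_CU, "TMU": cus * TMU_PER_CU, "ROP": rops}

n33 = gpu_specs(cus=32, rops=64)        # shipping N33 (RX 7600), assumed baseline
proposed = gpu_specs(cus=40, rops=64)   # the hypothetical ~100mm2 GCD from the post

for unit in n33:
    ratio = proposed[unit] / n33[unit]
    print(f"{unit}: {n33[unit]} -> {proposed[unit]} ({ratio - 1:+.0%})")
```

Under those assumptions the 25% applies to CUs, SPs and TMUs; the ROP count stays the same as N33 at 64.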

RX 8600, if based on the assumptions from the patent, would be carrying less of its own design cost, no dedicated masks to create, since every piece of silicon would be shared with every other SKU in the family.

AMD can introduce it for $299, but it would still have to be above break even later in its life at $199, which would be more challenging on N3. Also, AMD throwing N3 capacity at product that is barely above break-even would not be the most optimal approach.

BTW, your specs are in the same ballpark I would be expecting from the single GCD at ~100mm2 on N4.

The only real problem would be 8GB vram.
Seriously, will they never release 3GB (24Gbit) GDDR6 modules?

Yeah, that is the biggest problem for the card at the bottom of the range. Hopefully, AMD will consider 3GB (24Gbit) GDDR6 modules to get to 12 GB.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
That's precisely what the patent describes. Multiple GCDs stacked on base dies (which contain the memory controller and cache), and the base dies are connected together with a stacked active silicon bridge.

So this would be the first time a stacked chip spans 2 base chips, which seems quite ambitious. This type of connection would go way beyond any other horizontal links, including Apple's.

BTW, the patent was filed in 2021, published in October 2022, but still an interesting read, even if you just look at the pictures (by downloading the PDF):

I'm going to have to print this out. Going back and forth between the text and images on screen is annoying. Oh duh, as I type this I just realized I can open two copies of the PDF and put them side by side split screen. nvm
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
I'm going to have to print this out. Going back and forth between the text and images on screen is annoying. Oh duh, as I type this I just realized I can open two copies of the PDF and put them side by side split screen. nvm
That is a real PITA. Grouping all the pictures at the beginning, and then the description pages down makes it difficult to read.

BTW, good tip to keep 2 copies open at the same time.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Well, to really benefit from a multi-chip strategy, the GCDs must be modular as well. That will help with scaling across product lines (as I believe @Kepler_L2 was pointing to). Then, AMD will need a very high bandwidth, low latency interconnect like Apple uses in its dual chip M2 Ultra SoC. Apple and TSMC have shown that this can be done with two 'chiplets'. This will have to be extended to three or four interconnects with smaller GCDs (maybe 150mm^2) to go from a lower end GPU up to the top performance rung. Sadly, if rumors are correct, AMD will be using N4P for RDNA4 rather than the higher density N3E; maybe the performance (clocks) just are not going to be there with that widely available node at the start of HVM.
If AMD is using N4P for RDNA4, we may be looking at something very different. RDNA3 appears to be density optimized (according to this, the GCD has a density of > 150M / mm2), which makes it a poor fit for high performance gaming, but great for mobile.

If AMD is able to relax density for RDNA4 and allow for higher clocks, we may actually get a proper flagship out of them.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
If AMD is using N4P for RDNA4, we may be looking at something very different. RDNA3 appears to be density optimized (according to this, the GCD has a density of > 150M / mm2), which makes it a poor fit for high performance gaming, but great for mobile.

If AMD is able to relax density for RDNA4 and allow for higher clocks, we may actually get a proper flagship out of them.

-That is an outrageously high transistor density if true. RDNA1/2 was something like 50Mtr/mm2 on N7.

AMD was clearly going for as small a die as possible, at the expense of absolutely everything else.

I wonder how many of those extra transistors in N31 are just there compensating for the absurdly high density (ironically).
 

Hitman928

Diamond Member
Apr 15, 2012
5,600
8,790
136
-That is an outrageously high transistor density if true. RDNA1/2 was something like 50Mtr/mm2 on N7.

AMD was clearly going for as small a die as possible, at the expense of absolutely everything else.

I wonder how many of those extra transistors in N31 are just there compensating for the absurdly high density (ironically).

I have no idea of the accuracy of the numbers, but are you comparing the overall chip of RDNA 1/2 to just the GCD of RDNA3? You would need to compare the density of RDNA 1/2 without all the memory controllers and such which should be much denser than the overall chip.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,428
2,914
136
Any RDNA4 news? Thread seems all about CDNA3.

It really seems that RDNA4 and CDNA3 are so different that they should have their own threads.
Didn't hear anything new.

On the other hand 32gbps GDDR7 looks very nice.
With this memory, N31 would need only 4*MCDs for example.
This new MCD would be 24MB IC + 64-bit GDDR7. It would provide 96MB IC and a 256-bit bus, but with 6.7% higher BW compared to N31 if they used 32gbps chips.

Compared to N33 the BW would be 78% higher.
It should be enough for a 56CU GPU with similar clocks to N33 without any significant performance hit at least at 1080-1440p.
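The bandwidth percentages above follow from bus width times per-pin data rate. A quick sketch, assuming the shipping memory configs of N31 (384-bit, 20gbps GDDR6) and N33 (128-bit, 18gbps GDDR6). The 128-bit GDDR7 case is my reading of what the 78% figure implies, not something the post states, and `bandwidth_gbs` is just an illustrative helper:

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

n31 = bandwidth_gbs(384, 20)         # assumed N31 config: 960 GB/s
n33 = bandwidth_gbs(128, 18)         # assumed N33 config: 288 GB/s
gddr7_256 = bandwidth_gbs(256, 32)   # 4 MCDs x 64-bit at 32gbps
gddr7_128 = bandwidth_gbs(128, 32)   # assumed 2 MCDs x 64-bit at 32gbps

print(f"256-bit GDDR7 vs N31: {gddr7_256 / n31 - 1:+.1%}")
print(f"128-bit GDDR7 vs N33: {gddr7_128 / n33 - 1:+.1%}")
```

With those inputs, 1024 GB/s against 960 GB/s gives the 6.7%, and 512 GB/s against 288 GB/s gives roughly 78%.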

@Heartbreaker That depends entirely on how much a 32gbps GDDR7 chip would cost over 20gbps GDDR6. The real problem is the size, we need 32Gbit chips or at least 24Gbit.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
Didn't hear anything new.

On the other hand 32gbps GDDR7 looks very nice.
With this memory, N31 would need only 4*MCDs for example.
This new MCD would be 24MB IC + 64-bit GDDR7. It would provide 96MB IC and a 256-bit bus, but with 6.7% higher BW compared to N31 if they used 32gbps chips.

Compared to N33 the BW would be 78% higher.
It should be enough for a 56CU GPU with similar clocks to N33 without any significant performance hit at least at 1080-1440p.

I think AMD leans more toward wider/cheaper rather than narrower/more expensive, and with the separate MCDs it makes even more sense. Wider also makes it easier to have more memory.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,262
5,259
136
@Heartbreaker That depends entirely on how much a 32gbps GDDR7 chip would cost over 20gbps GDDR6. The real problem is the size, we need 32Gbit chips or at least 24Gbit.

I've noticed that AMD tends to let NVidia push the faster/newer memory first and tends to stay back from the bleeding edge.

Wider/more MCDs gives more options on memory capacity as well.
 
Jul 27, 2020
17,916
11,688
116
Why do you think there will be one?
You think they will fix it, and it will suddenly clock 1/3 higher?
Should be similar to the jump from 6900 to 6950 XT. Sadly, they don't aim for the sky. They are happy with being No.2. I don't expect the jump to be more than 10% and maybe they will improve the card's cooling to minimize common thermal complaints about RDNA3.

They could, however, increase the Infinity Cache capacity or reduce its latency without increasing the size. Another thing they could do is switch to GDDR7 for the 7950 XTX and pit it against the 4080 Ti for $50 or $100 less.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,428
2,914
136
NV fixed GF100 in less time than the gap between N31 and N32 so it has happened before.
According to TPU, GTX 580 was 16% faster than GTX 480. It had 10% higher clocks, but max OC was comparable; the rest of the performance came from more shaders (512 vs 480). Power consumption dropped by 30W, or 12%.
In the case of a fixed N31 we are supposedly talking about ~3.5GHz clocks at an unknown TBP.
RX 7900 XTX needs 406W just to work at 2545MHz in Cyberpunk 4K Ultra + RT. TPU
Fixing N31 looks a lot harder to me.
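To put the two "fixes" side by side, here is a back-of-the-envelope sketch using only the numbers quoted above. The ideal scaling assumes performance scales linearly with clocks times shader count, which is a simplification, and the variable names are mine:

```python
# GTX 480 -> GTX 580, figures as quoted from TPU in this post
clock_ratio = 1.10                 # 10% higher clocks
shader_ratio = 512 / 480           # ~6.7% more shaders
ideal_speedup = clock_ratio * shader_ratio   # assumes perfectly linear scaling
measured_speedup = 1.16            # 16% faster in TPU's testing
power_ratio = 1 - 0.12             # power dropped by 12%
perf_per_watt_gain = measured_speedup / power_ratio

# N31: rumored ~3.5GHz target vs 2545MHz observed at 406W
n31_clock_ratio = 3500 / 2545

print(f"GF110 ideal speedup:    {ideal_speedup - 1:+.1%}")
print(f"GF110 perf/W gain:      {perf_per_watt_gain - 1:+.1%}")
print(f"N31 clock gain needed:  {n31_clock_ratio - 1:+.1%}")
```

So the GF100 fix was worth about 10% in clocks, while a fixed N31 would need roughly 37.5% more clock than it manages at 406W today.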
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,428
2,914
136
Should be similar to the jump from 6900 to 6950 XT. Sadly, they don't aim for the sky. They are happy with being No.2. I don't expect the jump to be more than 10% and maybe they will improve the card's cooling to minimize common thermal complaints about RDNA3.

They could, however, increase the Infinity Cache capacity or reduce its latency without increasing the size. Another thing they could do is switch to GDDR7 for the 7950 XTX and pit it against the 4080 Ti for $50 or $100 less.
10% extra performance wouldn't help them at all. Nvidia would just release the full AD103 with a bit higher clocks for similar performance. AD102 would still be unchallenged.

Why would they need to increase Infinity cache or use GDDR7, which is not even available? Doesn't make any sense, considering AMD could simply use 24gbps GDDR6, which is available and would boost BW by 20%.
 
Jul 27, 2020
17,916
11,688
116
Is there anything preventing them from using dual GCDs with slightly lower, greener clocks to bruteforce their way to 4090+ levels of pixel pushing power? It's only 22% slower in raster than a 4090 on average (TPU).
 