Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,773
6,749
136





With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up situation like Frontier, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had problems with not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

branch_suggestion

Senior member
Aug 4, 2023
631
1,344
96
boring and niche, if you're spending this R&D you do a 500 on 500, N2 on N4C and you win.
Oh I'm not advocating for N4c sequel.
Something far far simpler instead.
WoW slab really is the best of both worlds.
No need for the API breaking hacks to scale across multiple frontends but you can scale perf to a similar degree. (you can still implement some of the ideas if they scale well enough)
Also think of how easy it is to market a beyond reticle stack, X3D branding is a legit marketing weapon now and you can tout lots of big, shiny numbers that are > everyone else.
It is such a simple product manager goal, give 1kmm^2, N2 on N3/4 SoIC-WoW and let them cook.
Use some of that Sony/MS console R&D money and build something that will lift all boats.
I really hope AMD sees the situation is very winnable and Mr Huynh has the mandate to do what is best for client compute. Even if they start it a bit late compared to the mono RDNA5 parts, it would still be ready by Computex 2027. Oh, and don't undersize the mono parts this time: 1.5x the N44/N48 CUs and GDDR7 with the new uArch should be good enough for the normies.
 

Kepler_L2

Senior member
Sep 6, 2020
782
3,169
136
WoW slab really is the best of both worlds.
No need for the API breaking hacks to scale across multiple frontends but you can scale perf to a similar degree. (you can still implement some of the ideas if they scale well enough)
Also think of how easy it is to market a beyond reticle stack, X3D branding is a legit marketing weapon now and you can tout lots of big, shiny numbers that are > everyone else.
It is such a simple product manager goal, give 1kmm^2, N2 on N3/4 SoIC-WoW and let them cook.
Use some of that Sony/MS console R&D money and build something that will lift all boats.
I really hope AMD sees the situation is very winnable and Mr Huynh has the mandate to do what is best for client compute. Even if they start it a bit late compared to the mono RDNA5 parts, it would still be ready by Computex 2027. Oh, and don't undersize the mono parts this time: 1.5x the N44/N48 CUs and GDDR7 with the new uArch should be good enough for the normies.
Even the monolithic gfx13 stuff has the new CP/GE distributed logic so that's not a problem. If they go with chiplets for dGPUs I expect them to just re-use one ~100mm² MID across the stack with ~150/250/350mm² GCDs, which should be enough for a 2x perf increase over N48 even on N3P.
 

branch_suggestion

Senior member
Aug 4, 2023
631
1,344
96
Even the monolithic gfx13 stuff has the new CP/GE distributed logic so that's not a problem.
Very cool, wonder how it will scale between implementations.
If they go with chiplets for dGPUs I expect them to just re-use one ~100mm² MID across the stack with ~150/250/350mm² GCDs, which should be enough for a 2x perf increase over N48 even on N3P.
Yeah, how to implement the IO is the sticking point. You can have a WoW stack with a PHY that connects to the MID via fanout, or just leave enough room between all the MALL and G7 PHYs, but this might not leave enough space, especially if they do smaller stacks.
The MID approach only makes sense if it scales across the stack; it could work for both 2D and 3D paradigms.
350mm^2 of just compute, cache and memory PHYs could get a bit over 1.5x on N3P, 2x would require more area or 3D stacking.
 

marees

Senior member
Apr 28, 2024
946
1,261
96
Quick and dirty is that the 9070XT has about 50% more performance than the 7800XT with a very similar config.

Doing the same at 96CUs means 7900XTX + 50% with a very similar config.

So sure the numbers are super rough but even if error bars are +/-10% it gives you a ballpark.
AMD's claim is 40% raster increase per CU & doubling of RT
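
For anyone who wants to poke at the ballpark above, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the function name `scaled_estimate` is made up, the 7900XTX is normalised to 1.0, and the +/-10% error bar is applied to the final figure (the original post does not say how it should be applied). It just compares the observed ~50% uplift with the claimed 40% per-CU raster uplift; it is not a benchmark.

```python
def scaled_estimate(baseline, uplift, error):
    """Return (low, mid, high) for baseline * (1 + uplift).

    `error` is a fractional error bar applied to the scaled result
    (an assumption; the thread only says "+/-10%").
    """
    mid = baseline * (1.0 + uplift)
    return mid * (1.0 - error), mid, mid * (1.0 + error)


# 7900XTX normalised to 1.0 (hypothetical baseline for illustration only).
cases = (
    ("observed ~50% (9070XT vs 7800XT)", 0.50),
    ("claimed 40% raster per CU", 0.40),
)
for label, uplift in cases:
    low, mid, high = scaled_estimate(1.0, uplift, 0.10)
    print(f"{label}: {low:.2f}x to {high:.2f}x of a 7900XTX (mid {mid:.2f}x)")
```

The two ranges overlap, so under these assumptions the "quick and dirty" 50% figure and the 40% per-CU claim are not in conflict once the error bars are taken into account.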
 

uzzi38

Platinum Member
Oct 16, 2019
2,745
6,627
146
RDNA4 clocks vary a lot more depending on how hard you hammer the memory.
Remember, this thing is not like the others.

Have you been shown the funny?

(IYKYK)

But generally speaking, on the topic of overclocking, trust me when I say shader clocks absolutely won't be the limiting factor. Power budget or the other silicon in there absolutely will be.
 

DownTheSky

Senior member
Apr 7, 2013
800
167
116
So we're not there yet. Performance looks similar to a 4080. I saw some videos on YouTube and it's enough for RT medium, aka normal RT. For path tracing, even at 1080p, you need a 5090.
 

soresu

Diamond Member
Dec 19, 2014
3,689
3,026
136
For path tracing, even 1080p you need a 5090

Unlike the fairly stale and nearly tapped out state of raster gfx, the software side of GPU RT/PT is still in its wild west period of discovery, with a lot of potential low hanging fruit remaining to improve performance with the same hardware without using super resolution haxx.

Point of fact, the software side has improved performance many times more than the hardware side since Turing was announced in 2018.
 

inquiss

Senior member
Oct 13, 2010
352
527
136
Have you been shown the funny?

(IYKYK)

But generally speaking, on the topic of overclocking, trust me when I say shader clocks absolutely won't be the limiting factor. Power budget or the other silicon in there absolutely will be.
Will whatever you're hinting at be disclosed tomorrow, do you think?
 

Josh128

Senior member
Oct 14, 2022
706
1,228
106
If the person I talked to doesn't include it in their review, it should be perfectly fine for me to post the screenshot here anyway methinks. It's something anyone could try on N48.
There'll likely be 50+ reviews tomorrow, and I'll only be skimming a few of them (video reviews included), so please come back and post what you are talking about specifically, because there's a 99% chance it will be missed.
 

sandorski

No Lifer
Oct 10, 1999
70,595
6,142
126
This could be the magic moment, AMD winning the GPU Generation by Volume alone. RDNA5 may be the next part of the combo that puts them on par with NVidia unlike we have ever seen.
 

linkgoron

Platinum Member
Mar 9, 2005
2,552
1,215
136
This could be the magic moment, AMD winning the GPU Generation by Volume alone. RDNA5 may be the next part of the combo that puts them on par with NVidia unlike we have ever seen.
AMD missing this huge opportunity by canning the big RDNA4 is nuts. RDNA4 looks like an actually successful, competitive, good arch with great engineering and they just squandered this huge opportunity. Who knows if RDNA5 will have the same chance vs Rubin or whatever the next NV arch is.
 