Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,769
6,709
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code landing in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier had).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the changes get merged to GitHub), but I am not going to link AMD employees.
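If you would rather pull that list locally than chase links, a quick sketch (this assumes a local llvm-project checkout; the grep pattern is just a convenience, not an official tag):

```python
import subprocess

# Case-insensitive search of commit subjects for "gfx940" in a local
# llvm-project checkout (run from inside the repository).
result = subprocess.run(
    ["git", "log", "--oneline", "--regexp-ignore-case", "--grep=gfx940"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```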

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of there being no host CPU capable of PCIe 5.0 in the near term, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

madtronik

Junior Member
Jul 22, 2019
5
10
81
Right?

Honestly, that bundle doesn't even offend me. The RMx 750 is good kit and the right PSU for that GPU, and given how hard it chases the $750 ($999) 5070 Ti, it just is what it is.
Yeah, if I intended to upgrade I would get that combo in a heartbeat. My current PSU couldn't handle a 9070 XT in any form, and that one is the first PSU that would show up on my shortlist when looking for one. Overall a great price.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,725
8,708
136
Still can’t believe AMD didn’t think of making something so people who bought the 7900 XT/XTX could upgrade. Truly amateurish s***

- It makes sense if you kind of dig into it. AMD has 10% market share, so they already don't have a lot of people with their cards, and only a very small % of people will buy an $800+ card. And then most of the people who do that will end up buying Nvidia anyway. So of their 10% market share, maybe 0.1% of people had 7900 XTXs?

They have better sales numbers than we do, but they probably figured "why go all in on a card we can only sell to maybe 100,000 people tops, more realistically 50,000, because a bunch of people are going to go buy Nvidia anyway".
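Back-of-the-envelope, with made-up round numbers purely to illustrate the order of magnitude:

```python
# All three inputs are guesses for illustration, not AMD or market data.
dgpu_buyers_per_gen = 50_000_000   # assumed discrete-GPU buyers over a generation
amd_share = 0.10                   # roughly AMD's add-in-board share
high_end_fraction = 0.02           # assumed fraction spending $800+

addressable = dgpu_buyers_per_gen * amd_share * high_end_fraction
print(f"AMD high-end addressable buyers: ~{addressable:,.0f}")   # ~100,000
```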

AMD needs Nvidia owners to buy their cards, they don't need more AMD buyers. Hence the focus on features and trying to keep costs down with a performant mid-range chip that might appeal to all the xx60 owners out there.

The logic makes sense; it just chafes our balls because we like the narrative of two top-end cards slugging it out, even if it doesn't make any financial sense for one party.
 

gdansk

Diamond Member
Feb 8, 2011
4,001
6,545
136
When was excessive noise last a problem for Radeon?
The last I can recall was when the reference cards were blowers. It shows what decade that lunatic is stuck in.

Edit: I guess the stock 5700 XT also had a blower and that was only 5.5 years ago.
 

BurnItDwn

Lifer
Oct 10, 1999
26,256
1,761
126
I OCd a 9070 to within 3% of a 9070 XT. Undervolting is crazy.

Any way to edit the BIOS/power limit yet?

Don't know about upping the power limit, but I'm damn impressed by the 9070.
UPS dropped mine off today.
Not sure how stable this is yet, but it ran a decent 3DMark Speed Way score of 6299.


+400 MHz max frequency offset
-120 mV voltage offset
2640 MHz max memory frequency w/ fast timings
and a +10% power limit.



Don't think I'll be able to get much higher. Any higher memory clock and I've got to give it more voltage, and then it scores lower. I can set the max clocks higher, but it doesn't really help anything.
Hotspot is nice and cool, but VRAM temp does exceed 80°C.

I can run the memory at stock speed and go down to a -125 or -130 mV voltage offset, but it scores lower. Anyhow, I've dropped it down to a 350 MHz offset / -110 mV voltage offset / 2630 MHz max memory frequency with the 10% power limit. Hopefully it will be stable for general use, but I think I may need to be a bit more conservative when summer comes and it gets warmer in here.
 
Last edited:

Hans Gruber

Platinum Member
Dec 23, 2006
2,480
1,328
136
Don't know about upping the power limit, but I'm damn impressed by the 9070.
UPS dropped mine off today.
Not sure how stable this is yet, but it ran a decent 3DMark Speed Way score of 6299.


+400 MHz max frequency offset
-120 mV voltage offset
2640 MHz max memory frequency w/ fast timings
and a +10% power limit.



Going to try a few more runs with higher VRAM clocks, since all the other higher scores seem to be running higher memory clocks and slightly lower max clocks.

Seems like anything less than a 120 mV offset for my card results in ... pretty quick instability, so whatever I wind up testing, I'll probably scale back a bit.

EDIT: also, I don't know what I'm doing when it comes to modern overclocking.
You want to undervolt a bit to see where the GPU becomes unstable. You want to OC the hell out of your GDDR6 memory; in my opinion that is where your performance increase will come from. The 9070 is where it's at for super power efficiency out of the box.
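For anyone doing this methodically, here is a rough sketch of the sweep being described. apply_tuning() and run_benchmark() are hypothetical stand-ins, not a real vendor API; in practice you change the offsets in Adrenalin and run your benchmark by hand:

```python
# Hypothetical helpers: apply_tuning() sets the offsets however you like,
# run_benchmark() returns a score, or None on a crash/hang/artifacts.
def find_stable_undervolt(apply_tuning, run_benchmark,
                          start_mv=-50, step_mv=-10, floor_mv=-150):
    """Walk the voltage offset down until a run fails; keep the last stable point."""
    best = None
    offset = start_mv
    while offset >= floor_mv:
        apply_tuning(voltage_offset_mv=offset)
        score = run_benchmark()
        if score is None:            # unstable at this offset, stop here
            break
        best = (offset, score)
        offset += step_mv            # step_mv is negative, so this goes lower
    if best:
        apply_tuning(voltage_offset_mv=best[0])   # settle on the last good offset
    return best
```

The same loop works for the memory clock, just sweeping upward; and as noted in the posts above, whatever passes a quick benchmark still deserves a longer soak test (and some summer headroom) before you trust it for daily use.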
 

Joe NYC

Platinum Member
Jun 26, 2021
2,906
4,270
106
Because there is no shortage of N4 capacity, and monolithic gaming GPUs don't even require the fairly simple packaging used by EPYCs. This is THE time to regain market share; it may shape the next 10 years big time, and it will help the MI series too.

This may be related to the Taiwan earthquake, when TSMC lost tens of thousands of wafers, but for the last month or two the N5 and N3 nodes have been running at full utilization.
 

deasd

Senior member
Dec 31, 2013
592
1,002
136
RDNA4 performs very well in the latest build of Topaz Video AI. It's close to the RTX 4090, to say the least.


Topaz Video AI v6.1.2
System Information
OS: Windows v11.24
CPU: AMD Ryzen 9 7950X3D 16-Core Processor 63.615 GB
GPU: AMD Radeon RX 9070 XT 15.374 GB
Processing Settings
device: 0 vram: 0.99 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 35.32 fps 2X: 15.88 fps 4X: 04.40 fps
Iris 1X: 42.69 fps 2X: 19.51 fps 4X: 05.13 fps
Proteus 1X: 37.95 fps 2X: 19.49 fps 4X: 06.04 fps
Gaia 1X: 19.80 fps 2X: 12.42 fps 4X: 05.34 fps
Nyx 1X: 16.10 fps 2X: 12.77 fps
Nyx Fast 1X: 32.24 fps
Rhea 4X: 04.14 fps
RXL 4X: 04.01 fps
Hyperion HDR 1X: 29.17 fps
4X Slowmo Apollo: 35.71 fps APFast: 70.88 fps Chronos: 22.61 fps CHFast: 35.56 fps
16X Slowmo Aion: 53.67 fps


for comparison:

4090
Topaz Video AI v6.1.0
System Information
OS: Windows v11.24
CPU: Intel(R) Core(TM) i9-14900KF 63.725 GB
GPU: NVIDIA GeForce RTX 4090 23.576 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 37.49 fps 2X: 17.40 fps 4X: 04.69 fps
Iris 1X: 37.54 fps 2X: 21.42 fps 4X: 05.46 fps
Proteus 1X: 41.45 fps 2X: 20.63 fps 4X: 05.79 fps
Gaia 1X: 15.52 fps 2X: 10.77 fps 4X: 05.33 fps
Nyx 1X: 17.25 fps 2X: 14.35 fps
Nyx Fast 1X: 30.76 fps
Rhea 4X: 05.31 fps
RXL 4X: 04.78 fps
Hyperion HDR 1X: 27.22 fps
4X Slowmo Apollo: 42.02 fps APFast: 75.30 fps Chronos: 32.57 fps CHFast: 36.62 fps
16X Slowmo Aion: 34.98 fps

5090
Topaz Video AI Alpha v6.1.1.1.a.trt
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor 31.71 GB
GPU: NVIDIA GeForce RTX 5090 31.349 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 50.65 fps 2X: 21.41 fps 4X: 05.78 fps
Iris 1X: 51.28 fps 2X: 21.92 fps 4X: 06.05 fps
Proteus 1X: 52.88 fps 2X: 22.94 fps 4X: 06.12 fps
Gaia 1X: 18.89 fps 2X: 13.77 fps 4X: 05.70 fps
Nyx 1X: 18.29 fps 2X: 12.39 fps
Nyx Fast 1X: 43.99 fps
Rhea 4X: 05.78 fps
RXL 4X: 05.76 fps
Hyperion HDR 1X: 45.23 fps
4X Slowmo Apollo: 48.47 fps APFast: 76.59 fps Chronos: 46.67 fps CHFast: 50.36 fps
16X Slowmo Aion: 43.08 fps
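To put a rough number on "close to the RTX 4090", here are the 1X results from the two runs above reduced to a single ratio (an illustrative slice only; the 2X/4X and slow-motion models swing both ways):

```python
from math import prod

# 1X fps taken from the posted Topaz Video AI results above.
fps_9070xt = {"Artemis": 35.32, "Iris": 42.69, "Proteus": 37.95, "Gaia": 19.80,
              "Nyx": 16.10, "Nyx Fast": 32.24, "Hyperion HDR": 29.17}
fps_4090   = {"Artemis": 37.49, "Iris": 37.54, "Proteus": 41.45, "Gaia": 15.52,
              "Nyx": 17.25, "Nyx Fast": 30.76, "Hyperion HDR": 27.22}

ratios = [fps_9070xt[m] / fps_4090[m] for m in fps_9070xt]
geomean = prod(ratios) ** (1 / len(ratios))
print(f"9070 XT vs 4090, 1X models (geomean): {geomean:.1%}")   # ~104% with these numbers
```

By that slice the two are essentially neck and neck; individual models swing either way.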
 
Last edited:

Vikv1918

Junior Member
Mar 12, 2025
2
2
36
I found there are a lot of insiders and knowledgeable people here, so I made an account to ask your thoughts on the state of "path tracing" in RDNA GPUs. I use "path tracing" in quotes to refer mainly to the four Nvidia-optimized "path traced" AAA games (CP2077, AW2, BM:W and Indiana Jones).

In these games the Radeon GPUs usually get only 30-50% of the performance of Nvidia GPUs with equivalent raster performance. In your opinion, what fraction of this performance gap is due to true hardware superiority of Nvidia, and what fraction is due to vendor-specific optimizations?
 
Reactions: scineram

Win2012R2

Senior member
Dec 5, 2024
741
740
96
In your opinion, what fraction of this performance gap is due to true hardware superiority of Nvidia, and what fraction is due to vendor-specific optimizations?
Not an insider, just IMHO: any game marketing itself as "path tracing" is very, very heavily NVIDIA-sponsored and NVIDIA-optimised, most likely using NV driver extensions rather than standard DXR; on top of that, the biggest Nvidia cards obviously have more cores.

P.S. I am a lifelong Nvidia user.
 
Last edited:
Mar 11, 2004
23,432
5,832
146
AMD should just make a single GPU that's a full wafer. Every gen, a single one. Auction it off. Claim the performance crown, and then they'll sell infinite GPUs without having to actually produce and sell it.

This.

If AMD had bothered to aim for the top they would have won five gens on the trot with ~500mm²-ish designs.

RV770 was ~250mm² in the 4870; a 500mm² version of the same design would have been a 1600-shader monster, given us close to 5870 performance way earlier, and crushed GT200 entirely.

Cypress was ~330mm² in the 5870; a 500mm² design would have been a 2400-shader monster that would have handily bested the GTX 480 and probably gone toe to toe with the GTX 580.

Cayman was also ~330mm², so a 500mm² design would likewise have been a 2400-or-whatever-shader monster that would have been faster than the GTX 580.

Tahiti in the 7970 was also around the 330mm² mark, so again a 500mm² design would have been faster than Hawaii was two years earlier. That would have been faster than the GTX 680, and faster than the GTX 780 Ti and the GTX Titan. It probably would have forced NV to release a GK100-based 680 rather than allowing them to go with GK104, and then the refreshed GK110 in the 780 would have been a very minor bump.

Hawaii was closer to 450mm², so there was room for a 56 CU design rather than the 44 they went with. Give it 7 Gbps RAM rather than the 5 Gbps they did give it and you also have enough bandwidth without needing HBM. That gets you pretty close to cut-down Fury performance years earlier. It would have been faster than the 980 but behind the 980 Ti, although it probably would have released around the time of the 780 Ti, so it would have been a clear step ahead.

Maybe the extra income from having top-tier parts and the extra experience with larger designs would have meant Fiji turned out better than it ended up being.

I know we complain about NV's shrinkflation of putting smaller GPUs further up the stack, but ATI / AMD actually started that trend and allowed NV to use ~330mm² dies in the x80 range instead of the ~500mm² designs they used to put in that segment.

EDIT: I credit these decisions with where we are today in the GPU market. It allowed NV to become the de facto performance champion and grow the mindshare that comes with it, and then they started to do what they did with CUDA and create a walled garden. G-Sync, ray tracing, DLSS, NVENC etc. are all features that give people non-hardware reasons to buy their product; it gives them a degree of protection should they misfire, which they clearly are with Blackwell. But give it 6 months, when the supply issues are behind us, and I don't think the successful 9000 series launch will make that much difference, primarily because there is no way AMD is going to shift their product allocations, so they won't try to fill the gap in the market with extra product.
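For reference, the scaling in that quote is just linear proportionality applied to the shipped shader counts; the die areas are the quote's approximations, and real designs obviously do not scale perfectly linearly:

```python
# Shipped shader counts are public specs; die areas are the rough figures
# used in the quote above. Linear area scaling is an upper bound, not a design.
parts = {
    "RV770 (HD 4870)":   (800,  250),
    "Cypress (HD 5870)": (1600, 330),
    "Cayman (HD 6970)":  (1536, 330),
    "Tahiti (HD 7970)":  (2048, 330),
}
TARGET_MM2 = 500

for name, (shaders, mm2) in parts.items():
    scaled = shaders * TARGET_MM2 / mm2
    print(f"{name}: {shaders} shaders @ ~{mm2} mm² -> ~{scaled:.0f} shaders @ {TARGET_MM2} mm²")
```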

AMD could've kept the small GPU and battled Nvidia on the high end if they'd kept pushing Crossfire. Most of the typical arguments against it would've been nonsense given how Nvidia has gone, with huge cards and high power use (so, two 2-slot cards vs one 4-slot card, two 225W cards vs one 450W card, etc.). Plus, if they'd designed the cards for tandem use (a fan perpendicular to the board to push air out of the case, I/O ports moved to free up room for exhaust), it could've helped them make the argument for a home server combining Threadripper or EPYC with a dedicated card for each family member; that also could've helped sell gaming handhelds if you were rendering there but streaming to the handheld. Keep focusing on areas where that makes sense and is tangible for gamers: multi-monitor sim setups, VR where they could do per-eye rendering, ultrawide.

I think many of the things they've done building towards chiplets could've been worked on that way as well, meaning they could've gone chiplets but fallen back to Crossfire support until they got it working, and then FineWine all over again as older cards would see a boost in both performance and compatibility once they are able to make mGPU appear as one GPU. With that, they could've kept their streamlined stack but also gone higher. I think this likely would've helped them through mining booms and busts as well: when mining is booming, they would have planned for lots of chips; when it busts, gamers who maybe had to settle for one card could add another for cheap and gain performance. It also would help with the longer time between generations.

Not an insider, just IMHO: any game marketing itself as "path tracing" is very, very heavily NVIDIA-sponsored and NVIDIA-optimised, most likely using NV driver extensions rather than standard DXR; on top of that, the biggest Nvidia cards obviously have more cores.

I have a strong hunch this is not a good-faith actor, and is most likely a bot, troll mult, or RBM, maybe all three in one even.
 
Reactions: lightmanek

Gideon

Golden Member
Nov 27, 2007
1,964
4,802
136
I found there are a lot of insiders and knowledgeable people here, so I made an account to ask your thoughts on the state of "path tracing" in RDNA GPUs. I use "path tracing" in quotes to refer mainly to the four Nvidia-optimized "path traced" AAA games (CP2077, AW2, BM:W and Indiana Jones).

Not an insider, but there is public info out there. Just look at the credits at the end of the games. For instance, in CP2077 there are hundreds of Nvidia employees listed, but only one from AMD. This is also partially AMD's fault, for not lending developers to game studios for free to bolt on Radeon features the way Nvidia does at scale.

All of these games have had massive developer input from Nvidia (Remedy and CD Projekt Red have been particularly close to them).

Now, I'm pretty sure no one is actively sabotaging AMD's GPUs (e.g. making them do more work); rather, all the development has been done optimizing for Nvidia. Which also makes sense, as theirs were the only GPUs in town capable of running these modes.

The closest thing to sabotage is probably just setting the default (and only) settings brutal enough that no competitor can handle them. Case in point is CP2077, where path tracing traces 2 rays per pixel. There are mods out there that allow setting it to 1, which offers massive performance uplifts on older GPUs with minimal quality drops:


But obviously they do not offer a "Path Tracing: Low" option, as that would make it actually runnable (with FSR) on RX 7xxx series cards (though barely).

Now, would it mean the RX 9070 would be just as fast when optimized to the same degree? Most probably not quite. But I'm pretty sure it would be much closer.
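As a rough sanity check of why halving the ray count helps so much, assume the traced rays dominate the path-tracing pass while the rest of the frame stays fixed (purely illustrative numbers, not measurements):

```python
# Toy model: frame_time = fixed_ms + rays_per_pixel * per_ray_ms (made-up costs).
fixed_ms   = 8.0    # raster, denoise, upscale: work that does not scale with rays
per_ray_ms = 12.0   # cost of one ray/sample per pixel on some hypothetical GPU

for rpp in (2, 1, 0.5):
    frame_ms = fixed_ms + rpp * per_ray_ms
    print(f"{rpp} rays/pixel -> {frame_ms:.0f} ms ({1000 / frame_ms:.0f} fps)")
# 2 rpp: 32 ms (~31 fps), 1 rpp: 20 ms (50 fps), 0.5 rpp: 14 ms (~71 fps)
```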
 

Vikv1918

Junior Member
Mar 12, 2025
2
2
36
Not an insider, but there is public info out there. Just look at the credits at the end of the games. For instance, in CP2077 there are hundreds of Nvidia employees listed, but only one from AMD. This is also partially AMD's fault, for not lending developers to game studios for free to bolt on Radeon features the way Nvidia does at scale.

All of these games have had massive developer input from Nvidia (Remedy and CD Projekt Red have been particularly close to them).

Now, I'm pretty sure no one is actively sabotaging AMD's GPUs (e.g. making them do more work); rather, all the development has been done optimizing for Nvidia. Which also makes sense, as theirs were the only GPUs in town capable of running these modes.

The closest thing to sabotage is probably just setting the default (and only) settings brutal enough that no competitor can handle them. Case in point is CP2077, where path tracing traces 2 rays per pixel. There are mods out there that allow setting it to 1, which offers massive performance uplifts on older GPUs with minimal quality drops:


But obviously they do not offer a "Path Tracing: Low" option, as that would make it actually runnable (with FSR) on RX 7xxx series cards (though barely).

Now, would it mean the RX 9070 would be just as fast when optimized to the same degree? Most probably not quite. But I'm pretty sure it would be much closer.
I fully agree with this, but I think what I'm trying to understand is really the nature of the optimizations. For example, does Black Myth: Wukong send the same instructions to both AMD and Nvidia GPUs when they're doing full ray tracing, or are the instructions different? If they're the same, does the nature of those instructions have any impact on performance? Do AMD GPUs have a harder time executing those specific instructions because they're designed to be more easily executed on Nvidia, and so on?

One other thing I find curious is how not a single developer is stepping forward and implementing path tracing on their own. Path tracing is supposed to be much easier to implement than raster lighting, so why is it only the Nvidia-sponsored games that have done it so far? What do devs have to lose by implementing a simple unbiased path tracer and letting each vendor's hardware handle it on its own?

One final question: is there any kind of demo that tests the true "raw" ray tracing performance to compare AMD and Nvidia? I think 3DMark Speed Way or Port Royal is a bad example, because the 7900 XTX scores almost the same as or even higher than the 9070 XT, which is not indicative of the RT performance of these GPUs in actual games. And Blender is again heavily Nvidia-optimized, so it's not a good example either.

Chips and Cheese has done what I think is the best technical analysis of ray tracing in AMD GPUs, but the final comparison with Nvidia was inconclusive due to the black-box nature of Nvidia's GPU analysis software.
 
Reactions: scineram

basix

Member
Oct 4, 2024
77
148
66
Why does nobody do PT if not sponsored by Nvidia? It is just too expensive to run. Too few people could run the game. So why put in the effort?

If you had a more scalable PT approach (e.g. adjusting between 1/4 and 1 ray/sample per pixel) it might become more viable. But AFAIK ReSTIR is limited to ~1 sample per pixel, at least currently.
 

basix

Member
Oct 4, 2024
77
148
66
...in the future

As long as there are not more GPUs with enough performance, or a more scalable path-tracing solution, adoption will be slow. Game developers prioritize wide usability over maximum graphical fidelity.

But AMD could improve that:
- Provide something like Ray Reconstruction as an open solution on GPUOpen. RR improves quality and provides an additional scalability lever (fewer samples per pixel rendered, but compensated to a good degree by RR).
- Release a scalable PT GI solution akin to ReSTIR, but improved (e.g. Area ReSTIR + ShaRC + scalable samples per pixel), as a successor to their GI-1.0 sample or Brixelizer GI (see GPUOpen).

ReSTIR is scalable today, but only from 1 to N samples per pixel and not less; at least that is how it is in their RTXGI git sample. See the sketch below.
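The lever being asked for is essentially this, sketched against a hypothetical renderer hook (ReSTIR/RTXGI does not expose fractional samples per pixel today, which is exactly the gap being pointed out):

```python
# Hypothetical frame-budget-driven samples-per-pixel control; applying the
# result to the renderer is an assumed hook, not a real ReSTIR or RTXGI API.
def choose_spp(last_frame_ms, target_ms, current_spp,
               spp_min=0.25, spp_max=1.0, step=0.25):
    """Nudge samples per pixel toward the frame-time budget."""
    if last_frame_ms > target_ms * 1.1 and current_spp > spp_min:
        return max(spp_min, current_spp - step)   # over budget: trace fewer samples
    if last_frame_ms < target_ms * 0.9 and current_spp < spp_max:
        return min(spp_max, current_spp + step)   # headroom: spend it on quality
    return current_spp
```

Fractional spp here would mean tracing, say, one sample per 4 pixels at 0.25 and leaning on a denoiser / ray-reconstruction pass to fill the gaps, which is why an open RR-style component is the other half of the suggestion.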
 

gdansk

Diamond Member
Feb 8, 2011
4,001
6,545
136
Personally, I don't understand why one would allow path tracing to influence current purchasing decisions.
It reduces $1600 GPUs (RTX 5080, RTX 4090) to sub-60 FPS at 1080p. Even if you're not using a poverty GPU, you get a poverty-GPU experience.

That leaves you with only one GPU you can buy that allows a playable experience, at an absurd price and power consumption. If you really want to use path tracing, you will need to buy a new GPU in a generation or two anyway.
 

basix

Member
Oct 4, 2024
77
148
66
Personally, I don't understand why one would allow path tracing to influence current purchasing decisions.
It reduces $1600 GPUs (RTX 5080, RTX 4090) to sub-60 FPS at 1080p. Even if you're not using a poverty GPU, you get a poverty-GPU experience.

That leaves you with only one GPU you can buy that allows a playable experience, at an absurd price and power consumption. If you really want to use path tracing, you will need to buy a new GPU in a generation or two anyway.

That's why I am mentioning a more scalable solution, primarily scaling down from the current PT implementation so that more GPUs can actually run it at decent framerates.

I would also not base a GPU buying decision on PT performance. SW and algorithms can improve that performance much more than HW TFLOPS can. Unreal Engine 5 is an example in that direction: not actually path tracing, but coming closer to it while still delivering quite good performance.
 
Reactions: Tlh97 and Saylick