Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,769
6,709
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code landing in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier had).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the changes get merged to GitHub), but I am not going to link AMD employees.
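If you would rather pull that list locally than chase links, a quick sketch (this assumes a local llvm-project checkout; the grep pattern is just a convenience, not an official tag):

```python
import subprocess

# Case-insensitive search of commit subjects for "gfx940" in a local
# llvm-project checkout (run from inside the repository).
result = subprocess.run(
    ["git", "log", "--oneline", "--regexp-ignore-case", "--grep=gfx940"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```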

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of there being no host CPU capable of PCIe 5.0 in the near term, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

madtronik

Junior Member
Jul 22, 2019
5
10
81
Right?

Honestly, that bundle doesn't even offend me. The RMx 750 is good kit and the right PSU for that GPU, and given how hard it chases the $750 ($999) 5070 Ti, it just is what it is.
Yeah, if I intended to upgrade I would get that combo in a heartbeat. My current PSU couldn't handle a 9070 XT in any form, and that one is the first PSU that would show up on my shortlist when looking for one. Overall a great price.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,725
8,708
136
Still can’t believe AMD didn’t think of making something so people who bought the 7900 XT/XTX could upgrade. Truly amateurish s***

- It makes sense if you kind of dig into it. AMD has 10% market share, so they already don't have a lot of people with their cards, and only a very small % of people will buy an $800+ card. And then most of the people who do that will end up buying Nvidia anyway. So of their 10% market share, maybe 0.1% of people had 7900 XTXs?

They have better sales numbers than we do, but they probably figured "why go all in on a card we can only sell to maybe 100,000 people tops, more realistically 50,000, because a bunch of people are going to go buy Nvidia anyway".
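Back-of-the-envelope, with made-up round numbers purely to illustrate the order of magnitude:

```python
# All three inputs are guesses for illustration, not AMD or market data.
dgpu_buyers_per_gen = 50_000_000   # assumed discrete-GPU buyers over a generation
amd_share = 0.10                   # roughly AMD's add-in-board share
high_end_fraction = 0.02           # assumed fraction spending $800+

addressable = dgpu_buyers_per_gen * amd_share * high_end_fraction
print(f"AMD high-end addressable buyers: ~{addressable:,.0f}")   # ~100,000
```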

AMD needs Nvidia owners to buy their cards, they don't need more AMD buyers. Hence the focus on features and trying to keep costs down with a performant mid-range chip that might appeal to all the xx60 owners out there.

The logic makes sense; it just chafes our balls because we like the narrative of two top-end cards slugging it out, even if it doesn't make any financial sense for one party.
 

gdansk

Diamond Member
Feb 8, 2011
4,001
6,545
136
When was excessive noise last a problem for Radeon?
The last I can recall was when the reference cards were blowers. It shows what decade that lunatic is stuck in.

Edit: I guess the stock 5700 XT also had a blower and that was only 5.5 years ago.
 

BurnItDwn

Lifer
Oct 10, 1999
26,256
1,761
126
I OCd a 9070 to within 3% of a 9070 XT. Undervolting is crazy.

Any way to edit the BIOS/power limit yet?

Don't know about upping the power limit, but I'm damn impressed by the 9070.
UPS dropped mine off today.
Not sure how stable this is yet, but it ran a decent 3DMark Speed Way score of 6299.


+400 MHz max frequency offset
-120 mV voltage offset
2640 MHz max memory frequency w/ fast timings
and a +10% power limit.



Don't think I'll be able to get much higher. Any higher memory clock and I've got to give it more voltage, and then it scores lower. I can set the max clocks higher, but it doesn't really help anything.
Hotspot is nice and cool, but VRAM temp does exceed 80°C.

I can run the memory at stock speed and go down to a -125 or -130 mV voltage offset, but it scores lower. Anyhow, I've dropped it down to a 350 MHz offset / -110 mV voltage offset / 2630 MHz max memory frequency with the 10% power limit. Hopefully it will be stable for general use, but I think I may need to be a bit more conservative when summer comes and it gets warmer in here.
 
Last edited:

Hans Gruber

Platinum Member
Dec 23, 2006
2,480
1,328
136
Don't know about upping the power limit, but I'm damn impressed by the 9070.
UPS dropped mine off today.
Not sure how stable this is yet, but it ran a decent 3DMark Speed Way score of 6299.


+400 MHz max frequency offset
-120 mV voltage offset
2640 MHz max memory frequency w/ fast timings
and a +10% power limit.



Going to try a few more runs with higher VRAM clocks, since all the other higher scores seem to be running higher memory clocks and slightly lower max clocks.

Seems like anything less than a 120 mV offset for my card results in ... pretty quick instability, so whatever I wind up testing, I'll probably scale back a bit.

EDIT: also, I don't know what I'm doing when it comes to modern overclocking.
You want to undervolt a bit to see where the GPU becomes unstable. You want to OC the hell out of your GDDR6 memory; in my opinion that is where your performance increase will come from. The 9070 is where it's at for super power efficiency out of the box.
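For anyone doing this methodically, here is a rough sketch of the sweep being described. apply_tuning() and run_benchmark() are hypothetical stand-ins, not a real vendor API; in practice you change the offsets in Adrenalin and run your benchmark by hand:

```python
# Hypothetical helpers: apply_tuning() sets the offsets however you like,
# run_benchmark() returns a score, or None on a crash/hang/artifacts.
def find_stable_undervolt(apply_tuning, run_benchmark,
                          start_mv=-50, step_mv=-10, floor_mv=-150):
    """Walk the voltage offset down until a run fails; keep the last stable point."""
    best = None
    offset = start_mv
    while offset >= floor_mv:
        apply_tuning(voltage_offset_mv=offset)
        score = run_benchmark()
        if score is None:            # unstable at this offset, stop here
            break
        best = (offset, score)
        offset += step_mv            # step_mv is negative, so this goes lower
    if best:
        apply_tuning(voltage_offset_mv=best[0])   # settle on the last good offset
    return best
```

The same loop works for the memory clock, just sweeping upward; and as noted in the posts above, whatever passes a quick benchmark still deserves a longer soak test (and some summer headroom) before you trust it for daily use.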
 

Joe NYC

Platinum Member
Jun 26, 2021
2,906
4,270
106
Because there is no shortage of N4 capacity, and monolithic gaming GPUs don't even require the fairly simple packaging used by EPYCs. This is THE time to regain market share; it may shape the next 10 years big time, and it will help the MI series too.

This may be related to the Taiwan earthquake, when TSMC lost tens of thousands of wafers, but for the last month or two the N5 and N3 nodes have been running at full utilization.
 

deasd

Senior member
Dec 31, 2013
592
1,002
136
RDNA4 performs very well in the latest build of Topaz Video AI. It's close to the RTX 4090, to say the least.


Topaz Video AI v6.1.2
System Information
OS: Windows v11.24
CPU: AMD Ryzen 9 7950X3D 16-Core Processor 63.615 GB
GPU: AMD Radeon RX 9070 XT 15.374 GB
Processing Settings
device: 0 vram: 0.99 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 35.32 fps 2X: 15.88 fps 4X: 04.40 fps
Iris 1X: 42.69 fps 2X: 19.51 fps 4X: 05.13 fps
Proteus 1X: 37.95 fps 2X: 19.49 fps 4X: 06.04 fps
Gaia 1X: 19.80 fps 2X: 12.42 fps 4X: 05.34 fps
Nyx 1X: 16.10 fps 2X: 12.77 fps
Nyx Fast 1X: 32.24 fps
Rhea 4X: 04.14 fps
RXL 4X: 04.01 fps
Hyperion HDR 1X: 29.17 fps
4X Slowmo Apollo: 35.71 fps APFast: 70.88 fps Chronos: 22.61 fps CHFast: 35.56 fps
16X Slowmo Aion: 53.67 fps


for comparison:

4090
Topaz Video AI v6.1.0
System Information
OS: Windows v11.24
CPU: Intel(R) Core(TM) i9-14900KF 63.725 GB
GPU: NVIDIA GeForce RTX 4090 23.576 GB
Processing Settings
device: 0 vram: 1 instances: 1
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 37.49 fps 2X: 17.40 fps 4X: 04.69 fps
Iris 1X: 37.54 fps 2X: 21.42 fps 4X: 05.46 fps
Proteus 1X: 41.45 fps 2X: 20.63 fps 4X: 05.79 fps
Gaia 1X: 15.52 fps 2X: 10.77 fps 4X: 05.33 fps
Nyx 1X: 17.25 fps 2X: 14.35 fps
Nyx Fast 1X: 30.76 fps
Rhea 4X: 05.31 fps
RXL 4X: 04.78 fps
Hyperion HDR 1X: 27.22 fps
4X Slowmo Apollo: 42.02 fps APFast: 75.30 fps Chronos: 32.57 fps CHFast: 36.62 fps
16X Slowmo Aion: 34.98 fps

5090
Topaz Video AI Alpha v6.1.1.1.a.trt
System Information
OS: Windows v10.22
CPU: AMD Ryzen 9 7950X 16-Core Processor 31.71 GB
GPU: NVIDIA GeForce RTX 5090 31.349 GB
Processing Settings
device: 0 vram: 1 instances: 0
Input Resolution: 1920x1080
Benchmark Results
Artemis 1X: 50.65 fps 2X: 21.41 fps 4X: 05.78 fps
Iris 1X: 51.28 fps 2X: 21.92 fps 4X: 06.05 fps
Proteus 1X: 52.88 fps 2X: 22.94 fps 4X: 06.12 fps
Gaia 1X: 18.89 fps 2X: 13.77 fps 4X: 05.70 fps
Nyx 1X: 18.29 fps 2X: 12.39 fps
Nyx Fast 1X: 43.99 fps
Rhea 4X: 05.78 fps
RXL 4X: 05.76 fps
Hyperion HDR 1X: 45.23 fps
4X Slowmo Apollo: 48.47 fps APFast: 76.59 fps Chronos: 46.67 fps CHFast: 50.36 fps
16X Slowmo Aion: 43.08 fps
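To put a rough number on "close to the RTX 4090", here are the 1X results from the two runs above reduced to a single ratio (an illustrative slice only; the 2X/4X and slow-motion models swing both ways):

```python
from math import prod

# 1X fps taken from the posted Topaz Video AI results above.
fps_9070xt = {"Artemis": 35.32, "Iris": 42.69, "Proteus": 37.95, "Gaia": 19.80,
              "Nyx": 16.10, "Nyx Fast": 32.24, "Hyperion HDR": 29.17}
fps_4090   = {"Artemis": 37.49, "Iris": 37.54, "Proteus": 41.45, "Gaia": 15.52,
              "Nyx": 17.25, "Nyx Fast": 30.76, "Hyperion HDR": 27.22}

ratios = [fps_9070xt[m] / fps_4090[m] for m in fps_9070xt]
geomean = prod(ratios) ** (1 / len(ratios))
print(f"9070 XT vs 4090, 1X models (geomean): {geomean:.1%}")   # ~104% with these numbers
```

By that slice the two are essentially neck and neck; individual models swing either way.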
 
Last edited:

Vikv1918

Junior Member
Mar 12, 2025
2
2
36
I found there are a lot of insiders and knowledgeable people here, so I made an account to ask your thoughts on the state of "path tracing" in RDNA GPUs. I use "path tracing" in quotes to refer mainly to the four Nvidia-optimized "path traced" AAA games (CP2077, AW2, BM:W and Indiana Jones).

In these games the Radeon GPUs usually get only 30-50% of the performance of Nvidia GPUs with equivalent raster performance. In your opinion, what fraction of this performance gap is due to true hardware superiority of Nvidia, and what fraction is due to vendor-specific optimizations?
 
Reactions: scineram

Win2012R2

Senior member
Dec 5, 2024
741
740
96
In your opinion, what fraction of this performance gap is due to true hardware superiority of Nvidia, and what fraction is due to vendor-specific optimizations?
Not an insider, just IMHO: any game marketing itself as "path tracing" is very, very heavily NVIDIA-sponsored and NVIDIA-optimised, most likely using NV driver extensions rather than standard DXR; on top of that, the biggest Nvidia cards obviously have more cores.

P.S. I am a lifelong Nvidia user.
 
Last edited:
Mar 11, 2004
23,432
5,832
146
AMD should just make a single GPU that's a full wafer. Every gen, a single one. Auction it off. Claim the performance crown, and then they'll sell infinite GPUs without having to actually produce and sell it.

This.

If AMD had bothered to aim for the top they would have won five gens on the trot with ~500mm²-ish designs.

RV770 was ~250mm² in the 4870; a 500mm² version of the same design would have been a 1600-shader monster, given us close to 5870 performance way earlier, and crushed GT200 entirely.

Cypress was ~330mm² in the 5870; a 500mm² design would have been a 2400-shader monster that would have handily bested the GTX 480 and probably gone toe to toe with the GTX 580.

Cayman was also ~330mm², so a 500mm² design would likewise have been a 2400-or-whatever-shader monster that would have been faster than the GTX 580.

Tahiti in the 7970 was also around the 330mm² mark, so again a 500mm² design would have been faster than Hawaii was two years earlier. That would have been faster than the GTX 680, and faster than the GTX 780 Ti and the GTX Titan. It probably would have forced NV to release a GK100-based 680 rather than allowing them to go with GK104, and then the refreshed GK110 in the 780 would have been a very minor bump.

Hawaii was closer to 450mm², so there was room for a 56 CU design rather than the 44 they went with. Give it 7 Gbps RAM rather than the 5 Gbps they did give it and you also have enough bandwidth without needing HBM. That gets you pretty close to cut-down Fury performance years earlier. It would have been faster than the 980 but behind the 980 Ti, although it probably would have released around the time of the 780 Ti, so it would have been a clear step ahead.

Maybe the extra income from having top-tier parts and the extra experience with larger designs would have meant Fiji turned out better than it ended up being.

I know we complain about NV's shrinkflation of putting smaller GPUs further up the stack, but ATI / AMD actually started that trend and allowed NV to use ~330mm² dies in the x80 range instead of the ~500mm² designs they used to put in that segment.

EDIT: I credit these decisions with where we are today in the GPU market. It allowed NV to become the de facto performance champion and grow the mindshare that comes with it, and then they started to do what they did with CUDA and create a walled garden. G-Sync, ray tracing, DLSS, NVENC etc. are all features that give people non-hardware reasons to buy their product; it gives them a degree of protection should they misfire, which they clearly are with Blackwell. But give it 6 months, when the supply issues are behind us, and I don't think the successful 9000 series launch will make that much difference, primarily because there is no way AMD is going to shift their product allocations, so they won't try to fill the gap in the market with extra product.
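For reference, the scaling in that quote is just linear proportionality applied to the shipped shader counts; the die areas are the quote's approximations, and real designs obviously do not scale perfectly linearly:

```python
# Shipped shader counts are public specs; die areas are the rough figures
# used in the quote above. Linear area scaling is an upper bound, not a design.
parts = {
    "RV770 (HD 4870)":   (800,  250),
    "Cypress (HD 5870)": (1600, 330),
    "Cayman (HD 6970)":  (1536, 330),
    "Tahiti (HD 7970)":  (2048, 330),
}
TARGET_MM2 = 500

for name, (shaders, mm2) in parts.items():
    scaled = shaders * TARGET_MM2 / mm2
    print(f"{name}: {shaders} shaders @ ~{mm2} mm² -> ~{scaled:.0f} shaders @ {TARGET_MM2} mm²")
```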

AMD could've kept the small GPU and battled Nvidia on the high end if they'd kept pushing Crossfire. Most of the typical arguments against it would've been nonsense given how Nvidia has gone, with huge cards and high power use (so, two 2-slot cards vs one 4-slot card, two 225W cards vs one 450W card, etc.). Plus, if they'd designed the cards for tandem use (a fan perpendicular to the board to push air out of the case, I/O ports moved to free up room for exhaust), it could've helped them make the argument for a home server combining Threadripper or EPYC with a dedicated card for each family member; that also could've helped sell gaming handhelds if you were rendering there but streaming to the handheld. Keep focusing on areas where that makes sense and is tangible for gamers: multi-monitor sim setups, VR where they could do per-eye rendering, ultrawide.

I think many of the things they've done building towards chiplets could've been worked on that way as well, meaning they could've gone chiplets but fallen back to Crossfire support until they got it working, and then FineWine all over again as older cards would see a boost in both performance and compatibility once they are able to make mGPU appear as one GPU. With that, they could've kept their streamlined stack but also gone higher. I think this likely would've helped them through mining booms and busts as well: when mining is booming, they would have planned for lots of chips; when it busts, gamers who maybe had to settle for one card could add another for cheap and gain performance. It also would help with the longer time between generations.

Not an insider, just IMHO: any game marketing itself as "path tracing" is very, very heavily NVIDIA-sponsored and NVIDIA-optimised, most likely using NV driver extensions rather than standard DXR; on top of that, the biggest Nvidia cards obviously have more cores.

I have a strong hunch this is not a good-faith actor, and is most likely a bot, troll mult, or RBM, maybe all three in one even.
 
Reactions: lightmanek

Gideon

Golden Member
Nov 27, 2007
1,964
4,802
136
I found there are a lot of insiders and knowledgeable people here, so I made an account to ask your thoughts on the state of "path tracing" in RDNA GPUs. I use "path tracing" in quotes to refer mainly to the four Nvidia-optimized "path traced" AAA games (CP2077, AW2, BM:W and Indiana Jones).

Not an insider, but there is public info out there. Just look at the credits at the end of the games. For instance, in CP2077 there are hundreds of Nvidia employees listed, but only one from AMD. This is also partially AMD's fault, for not lending developers to game studios for free to bolt on Radeon features the way Nvidia does at scale.

All of these games have had massive developer input from Nvidia (Remedy and CD Projekt Red have been particularly close to them).

Now, I'm pretty sure no one is actively sabotaging AMD's GPUs (e.g. making them do more work); rather, all the development has been done optimizing for Nvidia. Which also makes sense, as theirs were the only GPUs in town capable of running these modes.

The closest thing to sabotage is probably just setting the default (and only) settings brutal enough that no competitor can handle them. Case in point is CP2077, where path tracing traces 2 rays per pixel. There are mods out there that allow setting it to 1, which offers massive performance uplifts on older GPUs with minimal quality drops:


But obviously they do not offer a "Path Tracing: Low" option, as that would make it actually runnable (with FSR) on RX 7xxx series cards (though barely).

Now, would it mean the RX 9070 would be just as fast when optimized to the same degree? Most probably not quite. But I'm pretty sure it would be much closer.
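As a rough sanity check of why halving the ray count helps so much, assume the traced rays dominate the path-tracing pass while the rest of the frame stays fixed (purely illustrative numbers, not measurements):

```python
# Toy model: frame_time = fixed_ms + rays_per_pixel * per_ray_ms (made-up costs).
fixed_ms   = 8.0    # raster, denoise, upscale: work that does not scale with rays
per_ray_ms = 12.0   # cost of one ray/sample per pixel on some hypothetical GPU

for rpp in (2, 1, 0.5):
    frame_ms = fixed_ms + rpp * per_ray_ms
    print(f"{rpp} rays/pixel -> {frame_ms:.0f} ms ({1000 / frame_ms:.0f} fps)")
# 2 rpp: 32 ms (~31 fps), 1 rpp: 20 ms (50 fps), 0.5 rpp: 14 ms (~71 fps)
```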
 

Vikv1918

Junior Member
Mar 12, 2025
2
2
36
Not an insider, but there is public info out there. Just look at the credits at the end of the games. For instance, in CP2077 there are hundreds of Nvidia employees listed, but only one from AMD. This is also partially AMD's fault, for not lending developers to game studios for free to bolt on Radeon features the way Nvidia does at scale.

All of these games have had massive developer input from Nvidia (Remedy and CD Projekt Red have been particularly close to them).

Now, I'm pretty sure no one is actively sabotaging AMD's GPUs (e.g. making them do more work); rather, all the development has been done optimizing for Nvidia. Which also makes sense, as theirs were the only GPUs in town capable of running these modes.

The closest thing to sabotage is probably just setting the default (and only) settings brutal enough that no competitor can handle them. Case in point is CP2077, where path tracing traces 2 rays per pixel. There are mods out there that allow setting it to 1, which offers massive performance uplifts on older GPUs with minimal quality drops:


But obviously they do not offer a "Path Tracing: Low" option, as that would make it actually runnable (with FSR) on RX 7xxx series cards (though barely).

Now, would it mean the RX 9070 would be just as fast when optimized to the same degree? Most probably not quite. But I'm pretty sure it would be much closer.
I fully agree with this, but I think what I'm trying to understand is really the nature of the optimizations. For example, does Black Myth: Wukong send the same instructions to both AMD and Nvidia GPUs when they're doing full ray tracing, or are the instructions different? If they're the same, does the nature of those instructions have any impact on performance? Do AMD GPUs have a harder time executing those specific instructions because they're designed to be more easily executed on Nvidia, and so on?

One other thing I find curious is how not a single developer is stepping forward and implementing path tracing on their own. Path tracing is supposed to be much easier to implement than raster lighting, so why is it only the Nvidia-sponsored games that have done it so far? What do devs have to lose by implementing a simple unbiased path tracer and letting each vendor's hardware handle it on its own?

One final question: is there any kind of demo that tests the true "raw" ray tracing performance to compare AMD and Nvidia? I think 3DMark Speed Way or Port Royal is a bad example, because the 7900 XTX scores almost the same as or even higher than the 9070 XT, which is not indicative of the RT performance of these GPUs in actual games. And Blender is again heavily Nvidia-optimized, so it's not a good example either.

Chips and Cheese has done what I think is the best technical analysis of ray tracing in AMD GPUs, but the final comparison with Nvidia was inconclusive due to the black-box nature of Nvidia's GPU analysis software.
 
Reactions: scineram

basix

Member
Oct 4, 2024
77
148
66
Why does nobody do PT if not sponsored by Nvidia? It is just too expensive to run. Too few people could run the game. So why put in the effort?

If you had a more scalable PT approach (e.g. adjusting between 1/4 and 1 ray/sample per pixel) it might become more viable. But AFAIK ReSTIR is limited to ~1 sample per pixel, at least currently.
 

basix

Member
Oct 4, 2024
77
148
66
...in the future

As long as there are not more GPUs with enough performance, or a more scalable path-tracing solution, adoption will be slow. Game developers prioritize wide usability over maximum graphical fidelity.

But AMD could improve that:
- Provide something like Ray Reconstruction as an open solution on GPUOpen. RR improves quality and provides an additional scalability lever (fewer samples per pixel rendered, but compensated to a good degree by RR).
- Release a scalable PT GI solution akin to ReSTIR, but improved (e.g. Area ReSTIR + ShaRC + scalable samples per pixel), as a successor to their GI-1.0 sample or Brixelizer GI (see GPUOpen).

ReSTIR is scalable today, but only from 1 to N samples per pixel and not less; at least that is how it is in their RTXGI git sample. See the sketch below.
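The lever being asked for is essentially this, sketched against a hypothetical renderer hook (ReSTIR/RTXGI does not expose fractional samples per pixel today, which is exactly the gap being pointed out):

```python
# Hypothetical frame-budget-driven samples-per-pixel control; applying the
# result to the renderer is an assumed hook, not a real ReSTIR or RTXGI API.
def choose_spp(last_frame_ms, target_ms, current_spp,
               spp_min=0.25, spp_max=1.0, step=0.25):
    """Nudge samples per pixel toward the frame-time budget."""
    if last_frame_ms > target_ms * 1.1 and current_spp > spp_min:
        return max(spp_min, current_spp - step)   # over budget: trace fewer samples
    if last_frame_ms < target_ms * 0.9 and current_spp < spp_max:
        return min(spp_max, current_spp + step)   # headroom: spend it on quality
    return current_spp
```

Fractional spp here would mean tracing, say, one sample per 4 pixels at 0.25 and leaning on a denoiser / ray-reconstruction pass to fill the gaps, which is why an open RR-style component is the other half of the suggestion.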
 

gdansk

Diamond Member
Feb 8, 2011
4,001
6,545
136
Personally, I don't understand why one would allow path tracing to influence current purchasing decisions.
It reduces $1600 GPUs (RTX 5080, RTX 4090) to sub-60 FPS at 1080p. Even if you're not using a poverty GPU, you get a poverty-GPU experience.

That leaves you with only one GPU you can buy that allows a playable experience, at an absurd price and power consumption. If you really want to use path tracing, you will need to buy a new GPU in a generation or two anyway.
 

basix

Member
Oct 4, 2024
77
148
66
Personally, I don't understand why one would allow path tracing to influence current purchasing decisions.
It reduces $1600 GPUs (RTX 5080, RTX 4090) to sub-60 FPS at 1080p. Even if you're not using a poverty GPU, you get a poverty-GPU experience.

That leaves you with only one GPU you can buy that allows a playable experience, at an absurd price and power consumption. If you really want to use path tracing, you will need to buy a new GPU in a generation or two anyway.

That's why I am mentioning a more scalable solution, primarily scaling down from the current PT implementation so that more GPUs can actually run it at decent framerates.

I would also not base a GPU buying decision on PT performance. SW and algorithms can improve that performance much more than HW TFLOPS can. Unreal Engine 5 is an example in that direction: not actually path tracing, but coming closer to it while still delivering quite good performance.
 
Reactions: Tlh97 and Saylick