Discussion RDNA4 + CDNA3 Architectures Thread

Page 178

DisEnchantment

Golden Member
Mar 3, 2017
1,749
6,614
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of lacking a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

soresu

Diamond Member
Dec 19, 2014
3,323
2,599
136
big RT up and FSR4 seems confirmed
FSR4 was already confirmed by another AMD employee, and now the GPUOpen blog article has given some technical detail on it prior to release.

The "new AI capabilities" probably refers to new instructions or data types targeted at AI/ML compute.
 

Kepler_L2

Senior member
Sep 6, 2020
581
2,401
136
FSR4 was already confirmed by another AMD employee, and now the GPUOpen blog article has given some technical detail on it prior to release.

The "new AI capabilities" probably refers to new instructions or data types targeted at AI/ML compute.
FP8/BF8 support, sparsity support, 2x DP4a/WMMA rates per cycle.
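For anyone unfamiliar with DP4a: it is a packed dot product that multiplies four 8-bit integer pairs and adds the result to a 32-bit accumulator in a single instruction. Below is a minimal sketch of the general semantics only. The helper name `dp4a` and the list-based interface are made up for illustration and say nothing about RDNA4's actual ISA encoding or rates.

```python
def dp4a(a_bytes, b_bytes, acc):
    """Illustrative model of a signed DP4a: four int8 products summed into an accumulator.

    Hypothetical helper; real hardware packs the four int8 values into one 32-bit register.
    """
    if len(a_bytes) != 4 or len(b_bytes) != 4:
        raise ValueError("DP4a operates on exactly four 8-bit lanes per operand")
    for value in (*a_bytes, *b_bytes):
        if not -128 <= value <= 127:
            raise ValueError("each lane must fit in a signed 8-bit integer")
    # One instruction's worth of work: 4 multiplies and 4 adds.
    return acc + sum(x * y for x, y in zip(a_bytes, b_bytes))


result = dp4a([1, -2, 3, 4], [5, 6, -7, 8], acc=10)
print(result)
```

Doubling the DP4a/WMMA rate per cycle would mean twice as many of these packed operations retire per clock, which is what matters for int8 inference throughput.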
 

SolidQ

Senior member
Jul 13, 2023
593
747
96
The only other major feature is something called "WGP take-over mode" which changes the scheduling model to improve data locality and reduce global data sharing.
That would be interesting to compare with the RDNA 3/3.5 WGP.

Outside of RT/AI stuff there haven't been too many changes.
It would be interesting to compare PS5 Pro RDNA4 RT and desktop RDNA4 RT per WGP.
 

ToTTenTranz

Member
Feb 4, 2021
182
312
106

Interesting, Sony is quoting the PS5 Pro's TF at 16.7, which suggests that Sony either isn't going to support the dual issue or they think it's misleading. Power supply has been increased by 40 W compared to the OG PS5 which isn't a lot.

Also there's a teardown if you are into that.

Those sound like base clocks without boost.
Also, the N4 SoC is apparently smaller than the original N7 one. There's 2GB DDR5 compared to 512MB DDR4 on the 2020 console, which is how Sony released a larger part of the main GDDR6 pool for developers (a similar method to what they did for the PS4 Pro).

As for the dual-issue, my guess is Sony doesn't want to claim 33.4 TFLOPs for just a small subset of supported instructions. If for most loads the theoretical max is 16.7 TFLOPs, that's how many they want to declare.
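To make the 16.7 vs 33.4 arithmetic concrete, here is a rough sketch of the usual peak-FP32 formula. The 60 CU count and the roughly 2.17 GHz clock are assumptions taken from commonly reported PS5 Pro figures, not from this thread, and `peak_fp32_tflops` is just an illustrative helper.

```python
def peak_fp32_tflops(compute_units, clock_ghz, lanes_per_cu=64, dual_issue=False):
    """Theoretical peak FP32 throughput in TFLOPs.

    Each lane is assumed to retire one FMA (2 FLOPs) per clock; dual issue doubles
    that, but only for the subset of instructions that can actually be paired.
    """
    flops_per_lane_per_clock = 2 * (2 if dual_issue else 1)
    gflops = compute_units * lanes_per_cu * flops_per_lane_per_clock * clock_ghz
    return gflops / 1000.0


# Assumed figures: 60 CUs at about 2.17 GHz.
single = peak_fp32_tflops(60, 2.17)
dual = peak_fp32_tflops(60, 2.17, dual_issue=True)
print(f"single issue: {single:.1f} TFLOPs, dual issue: {dual:.1f} TFLOPs")
```

Under those assumptions the single-issue figure lands on the number Sony quotes, and the dual-issue figure is simply double it.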
 

Gideon

Golden Member
Nov 27, 2007
1,842
4,379
136
SolidQ posted it a few days ago...

That looks more like an analog of Nvidia's Ray Reconstruction than just DLSS. TBF the stills look AMAZING.

Just take a look at the "Restaurant" sign on this uncompressed 17MB image:


Honestly getting from this input 1080p noisy image:


TO this 4K image:



Seems almost outlandish to me, particularly as they claim this is "real real-time". I understand this is their favored, rather static test scene, and that they use previous-frame info, lots of other buffers, and whatnot. But the stills look unbelievably good, at least compared to what I've seen Ray Reconstruction (1440p DLSS Quality) cook up in Alan Wake 2 and Cyberpunk 2077, upscaling from a similar base resolution.

The latter tends to blur small detail and give it an "artsy", almost painting-like look at similar upscale levels.

I'll try to not get my hopes up, as this is an AMD software solution we're talking about after all ...

But I sure would like to see it in motion, in actual games. This looks highly promising!
 

soresu

Diamond Member
Dec 19, 2014
3,323
2,599
136
That looks more like an analog of Nvidia's Ray Reconstruction than just DLSS. TBF the stills look AMAZING.

Just take a look at the "Restaurant" sign on this uncompressed 17MB image:


Honestly getting from this input 1080p noisy image:


TO this 4K image:



Seems almost outlandish to me, particularly as they claim this is "real real-time". I understand this is their favored, rather static test scene, and that they use previous-frame info, lots of other buffers, and whatnot. But the stills look unbelievably good, at least compared to what I've seen Ray Reconstruction (1440p DLSS Quality) cook up in Alan Wake 2 and Cyberpunk 2077, upscaling from a similar base resolution.

The latter tends to blur small detail and give it an "artsy", almost painting-like look at similar upscale levels.

I'll try to not get my hopes up, as this is an AMD software solution we're talking about after all ...

But I sure would like to see it in motion, in actual games. This looks highly promising!
Denoising is and always will be a flawed solution to sample variance, as much a crutch for RT image quality as super resolution is for performance.

Averaging pixels produces blurred results, so the less variance (noise) you start with, the better. No amount of magic machine learning models will ever fix this.
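A toy illustration of the blur point, assuming a plain box filter as a stand-in for spatial denoising (real denoisers are far more sophisticated and edge-aware, so this is only the worst case of the idea): averaging neighbours smears a sharp edge over several pixels.

```python
def box_blur(signal, radius):
    """Average each sample with its neighbours within `radius` (clamped at the borders)."""
    blurred = []
    for i in range(len(signal)):
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        blurred.append(sum(window) / len(window))
    return blurred


# A perfectly sharp edge: 8 dark pixels followed by 8 bright ones.
edge = [0.0] * 8 + [1.0] * 8
blurred = box_blur(edge, radius=2)

# Count pixels that are neither fully dark nor fully bright after filtering.
transition = [v for v in blurred if 0.0 < v < 1.0]
print(blurred[5:11], len(transition))
```

The wider the filter needed to hide the noise, the wider that transition band gets, which is why starting with less variance helps.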
 
Reactions: Tlh97 and marees

poke01

Platinum Member
Mar 8, 2022
2,581
3,409
106
If RDNA4 is more efficient at idle and power efficiency has improved, I'll be getting one to test. I'll still get one if it hasn't improved, as it looks to be a big upgrade over RDNA3 in terms of RT and AI.
Denoising is and always will be a flawed solution to sample variance, as much a crutch for RT image quality as super resolution is for performance.

Averaging pixels produces blurred results, so the less variance (noise) you start with, the better. No amount of magic machine learning models will ever fix this.
Ehh, AMD can cook something up. Their frame gen is already better than Nvidia's.
 

soresu

Diamond Member
Dec 19, 2014
3,323
2,599
136
Ehh, AMD can cook something up. Their frame gen is already better than Nvidia's.
You can't pull data out of thin air once it is gone. The best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.
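A small sketch of why "more samples" helps, assuming each sample is an independent noisy estimate of the pixel value (uniform random numbers stand in for radiance samples here, which is a simplification): the noise of the averaged pixel falls roughly with the square root of the sample count.

```python
import random
import statistics


def estimate_pixel(samples_per_pixel, rng):
    """Average `samples_per_pixel` noisy samples; the true value here is 0.5."""
    return sum(rng.random() for _ in range(samples_per_pixel)) / samples_per_pixel


def noise_level(samples_per_pixel, trials=200, seed=0):
    """Standard deviation of the pixel estimate across repeated renders."""
    rng = random.Random(seed)
    estimates = [estimate_pixel(samples_per_pixel, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


low_spp = noise_level(16)
high_spp = noise_level(1024)
print(f"16 spp noise: {low_spp:.4f}, 1024 spp noise: {high_spp:.4f}")
```

Going from 16 to 1024 samples is 64x the work for roughly 8x less noise, which is why smarter sampling such as ReSTIR is attractive compared with brute force.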
 

maddie

Diamond Member
Jul 18, 2010
4,932
5,075
136
You can't pull data out of thin air once it is gone. The best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.
Here we are on the verge of the game equivalent of the many-worlds theory. Is mine exactly the same as yours?
 
Reactions: soresu

Gideon

Golden Member
Nov 27, 2007
1,842
4,379
136
You can't pull data out of thin air once it is gone. The best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.
While I agree broadly, there is actually more info to go on, so it doesn't have to be just hallucination.

Upscaling works on accumulation over multiple frames and all the other buffers (depth, etc):



And let's not forget that not all upscaling methods use AI. This is what UE5 TSR (no AI, thus no hallucinations) could do in 2021: upscale surprisingly well from 480p -> 1440p given enough frames for data accumulation (obviously not playable like that, but this still isn't "hallucination"):
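The accumulation idea can be sketched in one dimension. This is a toy model under strong assumptions (a completely static scene, perfect reprojection, one sample per low-res pixel per frame) and is not how TSR or any shipping upscaler is implemented. Function names are illustrative only.

```python
def render_low_res(scene, scale, jitter):
    """Render one low-res frame: one sample per low-res pixel at sub-pixel offset `jitter`."""
    return scene[jitter::scale]


def accumulate(scene, scale):
    """Fill a high-res history buffer from `scale` jittered low-res frames."""
    history = [None] * len(scene)
    for frame in range(scale):
        jitter = frame % scale  # a different sub-pixel offset every frame
        low_res = render_low_res(scene, scale, jitter)
        for i, value in enumerate(low_res):
            history[i * scale + jitter] = value
    return history


# The "ground truth" high-res scene, rendered at half resolution per frame.
scene = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.3]
reconstructed = accumulate(scene, scale=2)
print(reconstructed == scene)
```

Every value in the reconstructed buffer was actually sampled in some frame, so nothing is inferred. The hard part in practice is motion and disocclusion, where the history becomes invalid.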


I posted more info about it here:

Yeah, both TAA and upscaling have a plethora of issues, but it isn't very productive to just get on a high horse and claim "they should all just fix the problems by going 8x MSAA and 1024 samples per pixel" or something.

The AMD blog post had a noisy output even while using 32768 samples per pixel:


I know ReSTIR can do better, but still. The reality is that GPU hardware won't scale anywhere near as fast as it did from 1998 to 2016. We ain't gonna get hardware that's capable of doing, say, 4 rays per pixel and 4x MSAA (thus a 2x downscale instead of TAA and upscale) on a comparable image. Not even in the next 10 years.

I do see plenty of issues with the hype around going to "Path Tracing + Temporal AA and upscaling + Framegen" only, particularly given the sloppy implementations. Imo Threat Interactive (despite his arrogance) gives excellent insight into those:


I don't think this is the direction the majority of games should flock to, but I do see benefits in researching it too.
 

marees

Senior member
Apr 28, 2024
578
639
96
What I heard was very similar to yours, RT = AD104/4070TI?, rasterization/nonRT = 7900XTX, but I don't know the naming scheme. TSMC N4.

Or are you just speculating?
I don't think raster will match the 7900 XTX. These are Polaris-like chips. It should be around the 7900 XT.

RT could be anywhere from 4070 to 4070 Ti Super.
 