Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up situation like Frontier, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

soresu · Oct 30, 2024

SolidQ said:
big RT up and FSR4 seems confirmed

FSR4 was already confirmed by another AMD employee, and now the GPUOpen blog article has given some technical detail on it prior to release.

The "new AI capabilities" is probably referring to new instructions or data types that are targeted for AI/ML compute.

Kepler_L2 · Oct 30, 2024

soresu said:
FSR4 was already confirmed by another AMD employee, and now the GPUOpen blog article has given some technical detail on it prior to release.

The "new AI capabilities" is probably referring to new instructions or data types that are targeted for AI/ML compute.

FP8/BF8 support, sparsity support, 2x DP4a/WMMA rates per cycle.

SolidQ · Oct 30, 2024

Kepler_L2 said:
FP8/BF8 support, sparsity support, 2x DP4a/WMMA rates per cycle.

Is there any info about new WGP?

Kepler_L2 · Oct 30, 2024

SolidQ said:
Is there any info about new WGP?

Outside of RT/AI stuff there haven't been too many changes. The only other major feature is something called "WGP take-over mode" which changes the scheduling model to improve data locality and reduce global data sharing.

Kepler_L2 · Oct 30, 2024

I might as well post all the changes

SolidQ · Oct 30, 2024

Kepler_L2 said:
The only other major feature is something called "WGP take-over mode" which changes the scheduling model to improve data locality and reduce global data sharing.

That would be interesting to compare to the RDNA 3/3.5 WGP.

Outside of RT/AI stuff there haven't been too many changes.

It would be interesting to compare PS5 Pro RDNA4 RT and desktop RDNA4 RT per WGP.

moinmoin · Oct 30, 2024

soresu said:
and now the GPUOpen blog article has given some technical detail on it prior to release.

Can you link it? I don't seem to find it.

soresu · Oct 30, 2024

moinmoin said:
Can you link it? I don't seem to find it.

SolidQ posted it a few days ago...

Neural Supersampling and Denoising for Real-time Path Tracing

Read our research for a neural denoising and supersampling technique, with the aim of achieving real-time path tracing.

gpuopen.com

jpiniero · Nov 3, 2024

https://videocardz.com/newz/playstation-5-pro-specs-and-teardown-leaks-ahead-of-launch-16-7-tflops-rdna-gpu-and-8-zen2-cores

Interesting, Sony is quoting the PS5 Pro's TF at 16.7, which suggests that Sony either isn't going to support the dual issue or they think it's misleading. Power supply has been increased by 40 W compared to the OG PS5 which isn't a lot.

Also there's a teardown if you are into that.

ToTTenTranz · Nov 3, 2024

jpiniero said:
https://videocardz.com/newz/playstation-5-pro-specs-and-teardown-leaks-ahead-of-launch-16-7-tflops-rdna-gpu-and-8-zen2-cores

Interesting, Sony is quoting the PS5 Pro's TF at 16.7, which suggests that Sony either isn't going to support the dual issue or they think it's misleading. Power supply has been increased by 40 W compared to the OG PS5 which isn't a lot.

Also there's a teardown if you are into that.

Those sound like base clocks without boost.
Also, the N4 SoC is apparently smaller than the original N7 one. There's 2GB DDR5 compared to 512MB DDR4 on the 2020 console, which is how Sony released a larger part of the main GDDR6 pool for developers (a similar method to what they did for the PS4 Pro).

As for the dual-issue, my guess is Sony doesn't want to claim 33.4 TFLOPs for just a small subset of supported instructions. If for most loads the theoretical max is 16.7 TFLOPs, that's how many they want to declare.
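
To put rough numbers on the 16.7 vs 33.4 split, here is a minimal sketch of the usual peak-FP32 arithmetic. The 60 CU count comes from public reporting and is an assumption here, not something stated in this thread; the 64 lanes per CU and 2 FLOPs per FMA are likewise assumed, and the function names are made up for illustration.

```python
# Peak FP32 throughput = CUs x lanes per CU x FLOPs per lane per clock x clock.
# ASSUMPTIONS (not from the thread): 60 CUs, 64 FP32 lanes per CU,
# 2 FLOPs per lane per clock (one fused multiply-add).


def peak_tflops(compute_units, clock_ghz, lanes_per_cu=64,
                flops_per_lane=2, dual_issue=False):
    """Theoretical peak in TFLOPs; dual_issue doubles the per-clock rate."""
    per_clock = compute_units * lanes_per_cu * flops_per_lane
    if dual_issue:
        per_clock *= 2
    return per_clock * clock_ghz * 1e9 / 1e12


def implied_clock_ghz(tflops, compute_units, lanes_per_cu=64, flops_per_lane=2):
    """Clock (GHz) that a quoted single-issue TFLOPs figure would imply."""
    per_clock = compute_units * lanes_per_cu * flops_per_lane
    return tflops * 1e12 / per_clock / 1e9


clock = implied_clock_ghz(16.7, compute_units=60)
print(f"implied clock: {clock:.2f} GHz")
print(f"single issue:  {peak_tflops(60, clock):.1f} TFLOPs")
print(f"dual issue:    {peak_tflops(60, clock, dual_issue=True):.1f} TFLOPs")
```

Under those assumptions the quoted 16.7 TFLOPs works out to roughly 2.17 GHz, and the 33.4 figure is the same clock with every instruction counted as dual-issued.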

gdansk · Nov 3, 2024

It's misleading since almost nothing actually dual issues.
Good on Sony for avoiding that.

adroc_thurston · Nov 3, 2024

gdansk said:
It's misleading since almost nothing actually dual issues.

You don't have to. W64 is just 1clk instead.

Gideon · Nov 4, 2024

soresu said:
SolidQ posted it a few days ago...

Neural Supersampling and Denoising for Real-time Path Tracing

Read our research for a neural denoising and supersampling technique, with the aim of achieving real-time path tracing.

gpuopen.com

That looks more like the Nvidia Ray Reconstruction analog than just DLSS. TBF the stills look AMAZING

Just take a look at the "Restaurant" sign on this uncompressed 17MB image:

https://gpuopen.com/docs_images/neural_supersampling_and_denoising_for_real-time_path_tracing/neural_supersampling_and_denoising_for_real-time_path_tracing-html-_images-Picture4.png

Honestly getting from this input 1080p noisy image:

TO this 4K image:

Seems almost outlandish to me, particularly as they claim this is "real real-time". I understand this is their favored rather-static test scene, they use previous frame info, lots of other buffers, whatnot. But the stills look unbelievably good. At least compared to what I've seen Ray Reconstruction (1440p DLSS Quality) cook up in Alan Wake 2 and Cyberpunk 2077, upscaling from a similar base resolution.

The latter tends to blur and create that "artsy", almost painting-like look for small detail at similar upscale levels.

I'll try to not get my hopes up, as this is an AMD software solution we're talking about after all ...

But I sure would like to see it in motion, in actual games. This looks highly promising!

Hail The Brain Slug · Nov 4, 2024

static shock said:
I just read on a forum a rumor that RDNA4 have ADA levels of ray tracing power.

Thought better of pretending like you have insider info?

soresu · Nov 4, 2024

Gideon said:
That looks more like the Nvidia Ray Reconstruction analog than just DLSS. TBF the stills look AMAZING

Just take a look at the "Restaurant" sign on this uncompressed 17MB image:

https://gpuopen.com/docs_images/neural_supersampling_and_denoising_for_real-time_path_tracing/neural_supersampling_and_denoising_for_real-time_path_tracing-html-_images-Picture4.png

Honestly getting from this input 1080p noisy image:

TO this 4K image:

Seems almost outlandish to me, particularly as they claim this is "real real-time". I understand this is their favored rather-static test scene, they use previous frame info, lots of other buffers, whatnot. But the stills look unbelievably good. At least compared to what I've seen Ray Reconstruction (1440p DLSS Quality) cook up in Alan Wake 2 and Cyberpunk 2077, upscaling from a similar base resolution.

The latter tends to blur and create that "artsy", almost painting-like look for small detail at similar upscale levels.

I'll try to not get my hopes up, as this is an AMD software solution we're talking about after all ...

But I sure would like to see it in motion, in actual games. This looks highly promising!

Denoising is and always will be a flawed solution to sample variance, as much a crutch to augment RT image quality as super resolution is to performance.

The averaging of pixels turns out blurred outcomes, so the less variance (noise) you start with the better - no amount of magic machine learning models will ever fix this.

linkgoron · Nov 4, 2024

Hail The Brain Slug said:
Thought better of pretending like you have insider info?

He posted that post and then once he read his own post he could edit the post and truthfully say that he saw it posted on a forum.

Also, IIRC he already deleted previously claimed "insider info" posts of his.

MrTeal · Nov 4, 2024

static shock said:
I just read on a forum a rumor that RDNA4 have ADA levels of ray tracing power.

Cool, so we can expect Navi 48 to have similar FPS in heavy RT games as AD107. 👍

poke01 · Nov 4, 2024

If RDNA4 is more efficient at idle and power efficiency has improved, I’ll be getting one to test. I’ll still get one if it hasn’t improved, as it looks to be a big upgrade over RDNA3 in terms of RT and AI.

soresu said:
Denoising is and always will be a flawed solution to sample variance, as much a crutch to augment RT image quality as super resolution is to performance.

The averaging of pixels turns out blurred outcomes, so the less variance (noise) you start with the better - no amount of magic machine learning models will ever fix this.

Ehh, AMD can cook something up. Their frame gen is already better than Nvidia's.

soresu · Nov 4, 2024

poke01 said:
Ehh, AMD can cook something up. Their frame gen is already better than Nvidia's.

You can't pull data out of thin air once it is gone; the best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.
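
A rough way to see what "more samples" buys: for a plain Monte Carlo average, the noise in a pixel falls roughly as 1/sqrt(N), so halving it costs about four times the samples. The sketch below is a toy, not a renderer. It assumes uniform random values as a stand-in for radiance samples, and `pixel_estimate` and `noise_level` are hypothetical names.

```python
import random
import statistics

# Toy model: a "pixel" is the average of N random radiance samples.
# ASSUMPTION: uniform(0, 1) samples stand in for real path-traced radiance,
# which is far more heavy-tailed in practice.


def pixel_estimate(rng, samples):
    """Average `samples` random values to estimate one pixel."""
    return sum(rng.random() for _ in range(samples)) / samples


def noise_level(samples, trials=2000, seed=0):
    """Standard deviation of the pixel estimate over many independent renders."""
    rng = random.Random(seed)
    estimates = [pixel_estimate(rng, samples) for _ in range(trials)]
    return statistics.pstdev(estimates)


for spp in (1, 4, 16, 64):
    print(f"{spp:3d} spp -> noise {noise_level(spp):.4f}")
```

Each 4x step in samples per pixel should roughly halve the printed noise, which is why brute-force sampling alone gets expensive quickly and why smarter sampling such as ReSTIR variants is attractive.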

maddie · Nov 4, 2024

soresu said:
You can't pull data out of thin air once it is gone; the best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.

Here we are on the verge of the game equivalent of the many-worlds theory. Is mine exactly the same as yours?

Gideon · Nov 5, 2024

soresu said:
You can't pull data out of thin air once it is gone; the best an AI model can do is hallucinate what it infers might have been there based on its training data.

The better solution is to improve the original noisy frames, either by using more efficient rendering algorithms a la variations on ReSTIR, or by spending more time (samples) on each frame.

While I agree broadly, there is actually more info to go on, so it doesn't have to be just hallucination.

Upscaling works on accumulation over multiple frames and all the other buffers (depth, etc):

And let's not forget that not all upscaling methods use AI. This is what UE5 TSR (NO AI, thus no hallucinations) could do in 2021: upscale surprisingly well from 480p -> 1440p given enough frames for data accumulation (obviously not playable like that, but this still isn't "hallucination").
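
As a toy illustration of why accumulation is real data rather than guesswork, here is a 1D sketch: a static "scene" is rendered at one third resolution each frame with a different sub-pixel jitter, and the samples are written into a full-resolution history buffer. This is not UE5 TSR or FSR; real upscalers also need motion vectors, history rejection and filtering. All names here are hypothetical.

```python
# Toy 1D temporal accumulation on a STATIC scene.
# ASSUMPTION: no motion, no disocclusion, exact integer jitter offsets.


def render_low_res(scene, scale, jitter):
    """Sample every `scale`-th texel, starting at sub-pixel offset `jitter`."""
    return scene[jitter::scale]


def reconstruct(scene, scale, frames):
    """Accumulate `frames` jittered low-res renders into a full-res history."""
    history = [None] * len(scene)  # None marks texels never sampled so far
    for frame in range(frames):
        jitter = frame % scale  # cycle through the sub-pixel offsets
        low_res = render_low_res(scene, scale, jitter)
        for i, value in enumerate(low_res):
            history[jitter + i * scale] = value
    return history


scene = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(reconstruct(scene, scale=3, frames=1))  # one third of texels known
print(reconstruct(scene, scale=3, frames=3))  # full scene recovered
```

After as many frames as there are jitter positions, every full-resolution texel has been sampled directly, which is the "given enough frames" condition. Once the scene moves, parts of the history go stale and that is where the artifacts come from.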

I posted more info about it here:

Discussion - AMD Gaming Super Resolution GSR

New Patent came up today for AMD's FSR https://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220210150669%22.PGNR.&OS=DN/20210150669&RS=DN/20210150669 https://www.freepatentsonline.com/y2021/0150669.html 20210150669...

forums.anandtech.com

Yeah, both TAA and upscaling have a plethora of issues, but it isn't very productive to just get on a high horse and claim "they should all just fix the problems by going 8x MSAA and 1024 samples per pixel" or something.

The AMD blog post had a noisy output even while using 32768 samples per pixel:

I know ReSTIR can do better, but still. The reality is GPU hardware won't scale anywhere near as fast as it did from 1998 to 2016. We ain't gonna get hardware that's capable of doing, say, 4 rays per pixel and 4x MSAA (thus 2x downscale instead of TAA and upscale) on a comparable image. Not even in the next 10 years.

I do see plenty of issues with the hype around going to "Path Tracing + Temporal AA and upscaling + Framegen" only, particularly given the sloppy implementations. IMO Threat Interactive (despite his arrogance) gives excellent insight into those:

Threat Interactive

Official YouTube Channel of The New Indie Game Studio: Threat Interactive Website: https://threatinteractive.wordpress.com/ Twitter: https://x.com/ThreatInteract Reddit: https://new.reddit.com/user/ThreatInteractive Official Threat Interactive Discord: https://discord.gg/7ZdvFxFTba Email...

www.youtube.com

I don't think this is the direction the majority of games should flock to, but I do see benefits in researching that too.

jpiniero · Nov 5, 2024

static shock said:
Sorry AMD
Rx 8800 xtx: 7900xtx performance, ray tracing power improved by 2x, AI engine more powerful, 150w TGP. TSMC N3E.

Yeah... I don't see AMD using N3E at this point for gaming GPUs. Or NV for that matter.

vanplayer · Nov 6, 2024

static shock said:
You got it.

What I heard was very similar to yours, RT = AD104/4070TI?, rasterization/nonRT = 7900XTX, but I don't know the naming scheme. TSMC N4.

Or are you just speculating?

marees · Nov 6, 2024

vanplayer said:
What I heard was very similar to yours, RT = AD104/4070TI?, rasterization/nonRT = 7900XTX, but I don't know the naming scheme. TSMC N4.

Or are you just speculating?

I don't think raster will be 7900 XTX. These are Polaris-like chips. It should be around 7900 XT.

RT could be anywhere from 4070 to 4070 ti super

SolidQ · Nov 6, 2024

marees said:
RT could be anywhere from 4070 to 4070 ti super

That's from the new RE4 patch. I don't know what the real resolution is.
