Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe this is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

soresu · Jan 13, 2025

adroc_thurston said:
That's funny. Because they did, with H100.

I mean, ye but no.

IIRC it's not a wide departure like RDNA and CDNA at the µArch level.

It's more to do with chip layout and the datacenter/HPC platforms it was designed to run on.

soresu · Jan 13, 2025

SolidQ said:
What games that was?
What if RTX 5070 is real 4070S perf or +5%

Goddammit I saw a bar graph at the time and now I can't find it.

It showed a raster only title on the left and then a load of RT/DLSS titles to the right.

SolidQ · Jan 13, 2025

soresu said:
It showed a raster only title on the left and then a load of RT/DLSS titles to the right.

how much perf was 15-20%?

soresu · Jan 13, 2025

SolidQ said:
how much perf was 15-20%?

Don't remember, I just remember the graph and people talking about it.

Keller_TT · Jan 13, 2025

soresu said:
Don't remember, I just remember the graph and people talking about it.

I peg it at 4070 Ti for Raster and TiS for RT based on detailed specs. The charts seem to be in that range too.
Jensen couldn't go beyond $549, as it's 12 GB and AMD would be near enough for a $400 16GB 4070S-level undercut even if it's not as efficient.

soresu · Jan 13, 2025

Ah found it....

Oh wait no, that says RT not raster - Mandela effect'd myself.

SolidQ · Jan 13, 2025

soresu said:
Ah found it....

View attachment 114839
Oh wait no, that says RT not raster - Mandela effect'd myself.

Ah, this. But there's no pure raster perf there, and no UE5 game either, except Wukong, and that's an RT result.

soresu · Jan 13, 2025

There is also a 5070 compared to 4070 bar graph.

Given that raster perf is generally improving significantly more slowly per gen than RT, we can infer that how much it has changed isn't going to be any great surprise.

adroc_thurston · Jan 13, 2025

soresu said:
IIRC it's not a wide departure like RDNA and CDNA at the µArch level.

yeah it is.

soresu said:
It's more to do with chip layout and the datacenter/HPC platforms it was designed to run on.

no the SM has been different since A100. H100 broke CUDA forward progress guarantee.

SolidQ · Jan 13, 2025

Clocks for Powercolor models

Keller_TT · Jan 13, 2025

Looks like Sony is the most judicious when it comes to maximizing PPA and PPW for an enthusiast console.
Their custom Navi2 6800 + Navi 48 uGPU paired with Zen2 is like a 3.6 GHz 3700X+3070 Ti with 16 Gigs VRAM on tap. For mid-range price conscious gamers, that's the best bang for the buck even if it isn't as subsidized as PS5.
It will improve optimization for the PC too.

Heartbreaker · Jan 13, 2025

DaaQ said:
Nvidia is all AI focused. EDIT: Gaming is an afterthought.

Despite some political rhetoric, corporations aren't people. They can focus on multiple things.

There are ZERO signs they are spending less on gaming R&D, and I would bet they are probably spending more. The people that work in those divisions are 100% focused on gaming.

They have more money from and for AI, but that hasn't taken anything away from their gaming division, which is most likely better funded and better staffed than it has ever been.

gaav87 · Jan 13, 2025

Does anyone have any clue whether RDNA4 has any cooperative vectors for neural rendering, like Nvidia and Intel (VPU) have?

soresu · Jan 13, 2025

gaav87 said:
Does anyone have any clue whether RDNA4 has any cooperative vectors for neural rendering, like Nvidia and Intel (VPU) have?

Neural rendering in this context is just code for "uses ML ops to augment performance or image quality".

In which case yes, RDNA4 reportedly receives a major uptick for computing many ML data types/ops through improvements to its CUs.

Tensor cores, matrix cores, Intel VPU etc are just domain specific (tuned to/designed for a specific workload type) accelerators for AI/ML ops, whereas CUs and CUDA cores are general compute that work with everything, albeit not at its most optimal performance or efficiency.

poke01 · Jan 13, 2025

gaav87 said:
Does anyone have any clue whether RDNA4 has any cooperative vectors for neural rendering, like Nvidia and Intel (VPU) have?

View attachment 114850
View attachment 114851

soresu said:
Neural rendering in this context is just code for "uses ML ops to augment performance or image quality".

In which case yes, RDNA4 reportedly receives a major uptick for computing many ML data types through improvements to its CUs.

Tensor cores, matrix cores, Intel VPU etc are just domain specific (tuned to/designed for a specific workload type) accelerators for AI/ML ops, whereas CUs and CUDA cores are general compute that work with everything, albeit not at its most optimal performance or efficiency.

AMD can support this with RDNA4

soresu · Jan 13, 2025

poke01 said:
AMD can support this with RDNA4

That would be my assumption yes.

At the end of the day nVidia has a problem in trying to create a walled garden in gaming, because AMD has sewn up the high end in consoles. Most games produced will therefore need to be flexible in hardware implementation if said features are to actually be used by game devs.

The same is true with nVidia doing research into speeding up path tracing with ReSTIR etc in order to further their push towards RT gaming.

At the end of the day it needs to be flexible enough to have wide hardware support so that devs don't have to make entirely redundant code to support multiple platforms.

Win2012R2 · Jan 13, 2025

poke01 said:
AMD can support this with RDNA4

They could "support" ray tracing too in RDNA2, just very slowly... seems inevitable since Nvidia specifically said they've got some kind of "light" tensors very tightly integrated with shaders or something along those lines, so for light neural stuff it will be very efficient.

But in any case it will take Microsoft at least a year to push an update out to DX, and then years before any major game starts supporting it. By that point DLSS 8 will be playing games for you...

gaav87 · Jan 13, 2025

poke01 said:
AMD can support this with RDNA4

soresu said:
Neural rendering in this context is just code for "uses ML ops to augment performance or image quality".

In which case yes, RDNA4 reportedly receives a major uptick for computing many ML data types/ops through improvements to its CUs.

Tensor cores, matrix cores, Intel VPU etc are just domain specific (tuned to/designed for a specific workload type) accelerators for AI/ML ops, whereas CUs and CUDA cores are general compute that work with everything, albeit not at its most optimal performance or efficiency.

Hm, would this mean they have separate hw for RT now?

gaav87 · Jan 13, 2025

ToTTenTranz said:
During the keynote what Jensen said was they're now running tensor ops in the cuda cores and not just the tensor cores, and that allows them to use AI in some game-specific instructions.
I don't think that means they can just add the tensor output from cuda/shader cores to the tensor cores, as they still share the same L1 and L2 IIRC.

Underwhelming or not, they're completely alone from the ~$500 up, which means they can ask how much ever they want for the 5090.

And what if they are running neural shader ops on tensor cores with the use of cooperative vectors (from the screenshot on the Nvidia website)?
Why would they use expensive GDDR7 just for a +10% performance uplift? And what if the +33-45% leatherjacket showed is correct, and that made AMD scared? Fuk knows...

poke01 · Jan 13, 2025

Nope RDNA4 RT is done on CUs.

gaav87 said:
Hm this would mean they have separate hw for RT now ?

Keller_TT · Jan 14, 2025

poke01 said:
Nope RDNA4 RT is done on CUs.

RDNA 4 does add RT-specific hardware, though, for BVH traversal and RT calculations. RDNA 3 introduced WMMA support to speed up matrix multiplication, which on RDNA 2 was also done on the CUs and was a very inefficient resource hog. RT was an afterthought for RDNA2.
AMD didn't add any special "matrix cores"; it just beefed up the CUs with hardware acceleration for RT & ML data structures.

On paper, RDNA4 doubles RTops over RDNA 3, and AMD also had bottlenecks that kept RDNA 3 from unlocking the full potential of WMMA, which should be rectified in the new architecture. So, overall, I would definitely expect a 2-3x boost over RDNA2, depending on the game. Except for heavy path tracing, the gap to 5070 class Blackwell should be significantly less.

adroc_thurston · Jan 14, 2025

Keller_TT said:
RT was an afterthought for RDNA2

No, it's just a baseline compliant implementation.

Keller_TT said:
RDNA4 doubles RTops over RDNA 3

What's RTops

Keller_TT said:
the gap to 5070 class Blackwell should be significantly less.

Well Blackwell doesn't improve RTRT much.

Keller_TT · Jan 14, 2025

adroc_thurston said:
What's RTops

Just shortened for RT Operations per second. Nvidia calls it RTX-OPS.

adroc_thurston · Jan 14, 2025

Keller_TT said:
Just shortened for RT Operations per second. Nvidia calls it RTX-OPS.

That's not a real metric.
Lol. Otherwise every atomic memory operation is an RTX-OP™.

Keller_TT · Jan 14, 2025

adroc_thurston said:
That's not a real metric.
Lol. Otherwise every atomic memory operation is an RTX-OP™.

Yea yea. I didn't want to get too technical with ray-triangle intersects and all. So I just shortened it, as I thought it would be self-evident. lol.
