Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper's problem was the lack of a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

gdansk · Mar 3, 2025

Timorous said:
E core spam, GPU edition.

Isn't core spam business as usual for GPUs?

SolidQ · Mar 3, 2025

adroc_thurston said:
outright the opposite though.

Like oriented bounding boxes?

adroc_thurston · Mar 3, 2025

SolidQ said:
Like oriented bounding boxes?

No, like out-of-order shader cores.

HuesToo4 · Mar 3, 2025

adroc_thurston said:
hindsight is 20/20

Not really, AMD doesn't even try to beat Nvidia's top end dGPU chips. They can, but don't want to because of high R&D costs and too much risk; RDNA 5 being fully monolithic proves this.

soresu · Mar 3, 2025

Timorous said:
E core spam, GPU edition.

AMD isn't even doing that with their CPUs, apart from reducing cache and SIMD width to increase core density.

Timorous · Mar 3, 2025

gdansk said:
Isn't core spam business as usual for GPUs?

Not OoO core spam, which seems to be what RDNA5 may have.

soresu · Mar 3, 2025

adroc_thurston said:
Something far far simpler instead.

OoO is by itself far more complicated to implement performantly and efficiently than in order compute.

Especially given the number of engineers that have worked on OoO GPU designs is far less than those which have worked on OoO CPUs in the history of the industry.

You're just substituting process/packaging complexity for µArch complexity - which kinda runs counter to the whole chiplet era ethos of keeping die sizes down for minimised mask costs and maximised yields, something they had already identified as an issue before Zen 2 was in development 7-9 years ago.

Josh128 · Mar 3, 2025

RDNA5 X3D

soresu · Mar 3, 2025

randomhero said:
Great arch, poor execution

Execution was fine if a bit delayed to see how RTX 50 launch panned out.

They weren't factoring in the possibility of nV screwing the pooch on a product launch to a once in a generation degree.

No one was.

They prioritised the market segments that move the most volume and the GPUs needed to fill it.

That left them with engineering muscle for executing RDNA5/UDNA faster to pursue their long term RTG goals.

Josh128 · Mar 3, 2025

soresu said:
Execution was fine if a bit delayed to see how RTX 50 launch panned out.

They weren't factoring in the possibility of nV screwing the pooch on a product launch to a once in a generation degree.

No one was.

They prioritised the market segments that move the most volume and the GPUs needed to fill it.

That left them with engineering muscle for executing RDNA5/UDNA faster to pursue their long term RTG goals.

RTG has long term goals??

soresu · Mar 3, 2025

Josh128 said:
RDNA5 X3D

If you look back at their exascale ambitions many years back (GCN days) they were looking at stacking HBM directly on top of GPU dies.

Eventually this will happen for datacenter GPUs at least, and stacked cache is definitely on the table for both CPU and GPU.

As vertical interconnect pitch decreases and density increases we'll see the area heavy cache move off the compute dies onto dedicated stacked dies which have their own SRAM optimised process.

gaav87 · Mar 3, 2025

Anyway, this is a max OCed 9070 XT + 9800X3D (OCed), 40% over a 4080S + OCed 14900K.
It also beats the XTX and 4090...
Lower VRAM numbers because of MPO windowed mode or some BS.
4K max settings, 50% FSR, no RT.
I'm not even bothering to post my score because it's 2x-ing it.

adroc_thurston · Mar 3, 2025

HuesToo4 said:
AMD doesn't even try to beat Nvidia's top end dGPU chips. They can, but don't want to because of high R&D costs and too much risk; RDNA 5 being fully monolithic proves this.

They sure did. Pussied out last minute.

adroc_thurston · Mar 3, 2025

soresu said:
OoO is by itself far more complicated to implement performantly and efficiently than in order compute.

Especially given the number of engineers that have worked on OoO GPU designs is far less than those which have worked on OoO CPUs in the history of the industry.

You're just substituting process/packaging complexity for µArch complexity - which kinda runs counter to the whole chiplet era ethos of keeping die sizes down for minimised mask costs and maximised yields, something they had already identified as an issue before Zen 2 was in development 7-9 years ago.

Oh no, I'm not talking about that.
I'm just saying they should borrow the Venice-D CCD floorplan.

soresu said:
If you look back at their exascale ambitions many years back (GCN days) they were looking at stacking HBM directly on top of GPU dies.

Eventually this will happen for datacenter GPUs at least, and stacked cache is definitely on the table for both CPU and GPU.

As vertical interconnect pitch decreases and density increases we'll see the area heavy cache move off the compute dies onto dedicated stacked dies which have their own SRAM optimised process.

DRAM is a really bad fit for that.

SolidQ · Mar 3, 2025

gaav87 said:
4K max settings, 50% FSR, no RT.

And the 4080/4090 also at DLSS 50%? Or both native?

gaav87 · Mar 3, 2025

SolidQ said:
And the 4080/4090 also at DLSS 50%? Or both native?

DLSS 50%.

Mopetar · Mar 3, 2025

How much does OoO realistically get you for the average GPU load? Most stuff on the GPU is already designed to be a massively parallel workload that doesn't have many branches or stalls due to random memory accesses. GPUs were created to deal with this kind of crunch work in the first place.

CPUs need OoO to keep their pipelines full because they have to deal with a lot of branches or other code that could lead to pipeline stalls. GPU code generally shouldn't have those sort of problems. I'm sure there's performance to gain by finding something that lacks dependencies that could be executed sooner rather than later, but I don't think there's nearly as much room for massive performance uplifts.

I guess AMD doesn't need anything nearly as sophisticated as they use in their CPUs. If it's low hanging fruit and there's a good enough return for the transistors used, then I'll trust them on it.
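To put rough numbers on the question above, here is a toy sketch in Python, not a model of any real GPU. Everything in it is an assumption made for illustration: the names `Instr` and `run`, the one-issue-per-cycle rule, the made-up latencies, and the `window` parameter standing in for how far ahead a scheduler may look. It also ignores wavefront switching, which is how real GPUs normally hide latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    latency: int       # cycles until the result is available (made-up numbers)
    deps: tuple = ()   # indices of earlier instructions this one reads


def run(program, window=1):
    """Return the cycle at which the last result is ready.

    One instruction issues per cycle. window=1 is strict in-order issue;
    a larger window lets the scheduler issue the oldest *ready* instruction
    among the next `window` un-issued ones (a very simplified OoO model).
    """
    done_at = {}                          # instr index -> cycle result is ready
    pending = list(range(len(program)))   # un-issued instructions, program order
    cycle = 0
    while pending:
        for idx in pending[:window]:
            instr = program[idx]
            ready = all(d in done_at and done_at[d] <= cycle for d in instr.deps)
            if ready:
                done_at[idx] = cycle + instr.latency
                pending.remove(idx)
                break                     # at most one issue per cycle
        cycle += 1
    return max(done_at.values(), default=0)


# A long-latency load, one dependent use, then four independent ALU ops.
stalling = [Instr(10), Instr(1, (0,)), Instr(1), Instr(1), Instr(1), Instr(1)]
# Eight independent single-cycle ops: nothing for OoO to reorder around.
streaming = [Instr(1)] * 8

print(run(stalling, window=1), run(stalling, window=4))    # 15 11
print(run(streaming, window=1), run(streaming, window=4))  # 8 8
```

In this toy, looking ahead only helps the stream that stalls on a dependent load; the fully independent stream finishes in the same number of cycles either way, which is the intuition behind both sides of the argument here.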

adroc_thurston · Mar 3, 2025

Mopetar said:
Most stuff on the GPU is already designed to be a massively parallel workload that doesn't have many branches or stalls due to random memory accesses

Pretty much everything RTRT is that.
Trickier compute shaders are that too.

SolidQ · Mar 3, 2025

Lower VRAM numbers because of MPO windowed mode or some BS.

Didn't AMD mention they are doing something with VRAM in RDNA4?

Bryo4321 · Mar 3, 2025

gaav87 said:
Anyway, this is a max OCed 9070 XT + 9800X3D (OCed), 40% over a 4080S + OCed 14900K.
It also beats the XTX and 4090...
Lower VRAM numbers because of MPO windowed mode or some BS.
4K max settings, 50% FSR, no RT.
I'm not even bothering to post my score because it's 2x-ing it.

The dlss/fsr comparison makes this a little wonky but I’m still having trouble making sense of how that’s possible lol.

SolidQ · Mar 3, 2025

Bryo4321 said:
The dlss/fsr comparison makes this a little wonky but I’m still having trouble making sense of how that’s possible lol.

Fabio mentioned he can get better results than AMD posted.
Also, a guy on Reddit mentioned a few days ago that an OCed 9070 XT can get up to 5080 perf.

DiogoDX · Mar 3, 2025

Bryo4321 said:
The dlss/fsr comparison makes this a little wonky but I’m still having trouble making sense of how that’s possible lol.

I think there are one or two games where the 7900 XTX matches the 4090. Since the 9070 XT should be close to the XTX, I can see that happening in some cases.

DownTheSky · Mar 3, 2025

Bryo4321 said:
The dlss/fsr comparison makes this a little wonky but I’m still having trouble making sense of how that’s possible lol.

Because it's not.

SolidQ · Mar 3, 2025

DownTheSky said:
Because it's not.

I'd assume it's with FG results, and we know AMD FG is way faster than NV FG.

gaav87 · Mar 3, 2025

DownTheSky said:
Because it's not.

Idk, I think that guy is some LN overclocker, so I would not be surprised if this was chilled or something. xD
