Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,719
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the changes get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of there being no host CPU capable of PCIe 5.0 in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,719
136
Man... if only AMD had a 96CU / 384-bit part... my 7900XTX is 6 months old, and I'll still probably buy a 9090XTX. But right now it's like RDNA4 can do lower precision better, but has less memory (24 GB vs 16 GB matters for local LLMs) and doesn't perform any better, until FSR4 makes XeSS on RDNA3 look like crap.

The distilled DS models would love RDNA4; RDNA3 / the 7900XTX was already at 4090 levels of performance with them.
I am totally with you on this. I am also a bit disappointed with only 64 CUs and 16 GB of VRAM.

I could be in the minority of folks who love AI/ML, but last year I wrote around 120K+ lines of C++ and Rust with coding assistants doing 60% of the work, and I was really impressed.
I found a good way to use coding assistants: basically have them do lots of repetitive jobs and fill in things I already know how they should look, and just let them automate it.
There's no going back; now it feels odd not to have a coding assistant whenever I am coding something.

I can also use it to analyze startup logs, compilation problems, possible security vulnerabilities, and communication patterns, and it has really amplified my productivity at work.

With 96/128 CUs and 32 GB we would have been able to run all these distilled models locally, especially the bigger ones rather than just the absolute bottom ~1B models, and ML/AI could become a lot more pervasive in daily work.
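Just to put rough numbers on why the VRAM ceiling matters, here's the napkin math I use (purely illustrative: the bytes-per-parameter figures are approximations, and real runtimes add KV-cache and framework overhead on top):

[CODE=python]
# Back-of-the-envelope VRAM estimate for running a model locally.
# Assumptions (mine): weights dominate, a flat ~1.5 GB covers runtime plus
# a modest KV cache, and the bytes-per-parameter values are approximate.

def vram_gb(params_billion: float, bytes_per_param: float, overhead_gb: float = 1.5) -> float:
    return params_billion * bytes_per_param + overhead_gb

QUANTS = {"fp16": 2.0, "q8": 1.0, "q4": 0.56}  # approx bytes per parameter

for size_b in (8, 14, 32, 70):  # typical distilled-model sizes, in billions of params
    for name, bpp in QUANTS.items():
        need = vram_gb(size_b, bpp)
        print(f"{size_b:>3}B @ {name:<4}: ~{need:5.1f} GB"
              f"  (16 GB card: {'ok' if need <= 16 else 'no'},"
              f" 32 GB card: {'ok' if need <= 32 else 'no'})")
[/CODE]

By that napkin math a 16 GB card tops out around the 14B class, while 32 GB would comfortably hold the 32B distills at Q4.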

Personally, I got Topaz Photo AI and have been able to restore really old photos of my pop, who is gone, and it brought back so many strong memories. I love this app, especially the denoise, upscale, and face restore.
It works well with AMD cards, and I hope in the future it can leverage FP8 support.

Have they improved dual issue to be at least on par with Nvidia's?
Judging from the LLVM code, they currently cannot feed more than 4 vector register operands into a VOPD pair; the other 2 operands can only be scalars or constants. So not much can change here. VOPD is as limited as usual.
I had hoped for more than 4 register banks, but apparently that isn't the case with either RDNA3 or RDNA4.
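To make that concrete, here's a toy model of how I picture the bank restriction (my own simplification of the 4-bank rule, not the actual LLVM check or the full encoding rules):

[CODE=python]
# Toy model of the VOPD pairing limit as I understand it: the VGPR file is
# split into 4 banks (register number mod 4), and a fused X/Y pair can read
# at most one VGPR from each bank in a cycle -- hence "at most 4 vector
# operands" for the pair. Simplified illustration only, not the real rules.

def can_dual_issue(x_vgpr_srcs, y_vgpr_srcs):
    """True if the combined VGPR sources hit each of the 4 banks at most once."""
    banks = [reg % 4 for reg in x_vgpr_srcs + y_vgpr_srcs]
    return len(banks) == len(set(banks))

# X half reads v0, v1; Y half reads v2, v3 -> four different banks, pairs fine
print(can_dual_issue([0, 1], [2, 3]))   # True
# X half reads v0 and v4 -> both land in bank 0, so the pair can't be formed
print(can_dual_issue([0, 4], [1, 2]))   # False
[/CODE]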
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
If they're really building an OoO shader core, pretty much anything is possible.
OoO is not without its notable downsides though.

Power and area/complexity also increase significantly as a result, and it's not like AMD's CUs haven't been getting progressively chonkier each generation already.

Area especially would be problematic considering how close we are to diminishing returns in area scaling these days.

That's why the ongoing research into alternative architectural approaches like the Forward Slice Core interests me so much (thanks to Nosta for providing some interesting practical information).

It's not all the way to OoO perf, but it's a big increase over in-order at a minimal increase in power and area.

That being said, if they have already been laying the groundwork for a shift to OoO in the RDNA3 and RDNA4 µarchs, then the shift may not be as detrimental as I fear.
 

eek2121

Diamond Member
Aug 2, 2005
3,270
4,795
136
Other than an enforced memory OC on the XT, not much can be done.
For the vanilla 9070, I could certainly see a driver update to a 260W TBP to counter a 5070 Super or whatever, in tandem with a price cut.
The current config is built to win the efficiency graphs, nothing more.
It is possible we may get an XTX variant down the road with higher clocks and more (possibly faster?) memory. There have been a few rumors suggesting that AMD is playing with something in house.
Raytracing as a workload is a bespoke kind of horror for GPUs: it's very divergent and very very latency-sensitive, and GPUs naturally suck at that.

Man, I still remember when it was “impossible” to do hardware RT. We have come a long way.
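To put the divergence point in that quote into numbers, here's a toy model (made-up traversal lengths, nothing measured): in lock-step SIMD every lane waits for the slowest ray, so utilization drops fast when rays take very different paths through a BVH.

[CODE=python]
# Toy illustration of why divergent ray traversal hurts a SIMD machine.
# Each lane (ray) needs a different number of BVH traversal steps; the wave
# runs until the slowest lane finishes while the rest idle. Numbers are
# random and purely illustrative.
import random

random.seed(1)
WAVE_SIZE = 32
steps = [random.randint(4, 64) for _ in range(WAVE_SIZE)]  # per-ray traversal lengths

wave_time = max(steps)                   # lock-step: the wave pays for the worst ray
useful_work = sum(steps)                 # work the lanes actually needed
utilization = useful_work / (wave_time * WAVE_SIZE)

print(f"wave runs {wave_time} steps, SIMD utilization ~{utilization:.0%}")
[/CODE]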

Maybe?
If they're really building an OoO shader core, pretty much anything is possible.
A P6 moment for GPUs would be insane.

Agreed!
 
Reactions: lightmanek

adroc_thurston

Diamond Member
Jul 2, 2023
5,237
7,317
96
Area especially would be problematic considering how close we are to diminishing returns in area scaling these days.
Oh hell no, logic scaling is the only thing we have going on these days.
That's why the ongoing research into alternative architectural approaches like the Forward Slice Core interests me so much (thanks to Nosta for providing some interesting practical information).
Meme academia stuff is meme.
That being said, if they have already been laying the groundwork for a shift to OoO in the RDNA3 and RDNA4 µarchs, then the shift may not be as detrimental as I fear.
Thanks AMD for making GPUs not boring again.
Man, I still remember when it was “impossible” to do hardware RT. We have come a long way.
Well, for most implementations they've sidestepped the problem of doing RT on GPUs by not doing it on the GPU actual.
 
Reactions: Tlh97 and marees

Panino Manino

Senior member
Jan 28, 2017
973
1,226
136
Kingdom Come 2 uses voxel cone tracing, not RT. NV and Intel cards play well with this game.

Oh well, at least AMD has CoD and Spider-Man.
It's a CryEngine thing. It hates AMD GPUs for idk why reasons.

I don't understand.
When Crytek demonstrated their "Software RT", it was running on an AMD Vega, wasn't it? To prove that you didn't need Nvidia's dedicated hardware.
Shouldn't it run better?
 
Reactions: lightmanek

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Oh hell no, logic scaling is the only thing we have going on these days.
It is still scaling, but the end of that scaling is now visible on the horizon.
Meme academia stuff is meme.
Yesterday's meme academia stuff is tomorrow's slightly altered corporate patent meme stuff.

Nearly everything the tech corps do started in academia, and it is often co-created in lockstep with academic collaboration.

Just look at how many of Nvidia's real-time RT papers over the last decade have academic co-authors.
 
Reactions: lightmanek

adroc_thurston

Diamond Member
Jul 2, 2023
5,237
7,317
96
It is still scaling, but the end of that scaling is now visible on the horizon.
uh. nope. NOPE.
Yesterday's meme academia stuff is tomorrow's slightly altered corporate patent meme stuff.
Nope.
Academia never leaves academia.
Just look at how many of Nvidia's real-time RT papers over the last decade have academic co-authors.
That's normal but the ideas are IHV-driven, not from ivory towers at all.
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
uh. nope. NOPE.
I was talking about area scaling, and on that front at least my point is valid.

If we are talking about changing transistor device types, materials, vertical scaling and/or wholesale changes of compute paradigm (a la reversible computing, Blueshift Memory's Cambridge architecture, optical computing, etc.), that's a whole other range of matters entirely - there's definitely plenty of room on that front, no arguments at all there.

But purely in terms of area scaling, the horizon is most definitely visible within the next two decades, simply because of the fundamental pitch limits at which a transistor could work at all, let alone be performant and energy efficient.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,601
3,160
136
And with 300W TBP.
In a way we can say that RTX 5080 is pretty bad compared to 4080 Super. Link
TBP: 360W vs 320W (+12.5%)
Real power consumption: 332W vs 290W (+14.5%)
Bandwidth: 960GB/s vs 736GB/s (+30.4%)
Cuda cores: 10,752 vs 10,240 (+5%)
Clockspeed (median): 2662MHz vs 2730MHz (-2.5%)
Compute performance: 57,244 vs 55,910 GFLOPS (+2.4%)
Average performance: 100 vs 88 (+13.6%)
It looks like ~11% comes from just the faster memory. And I find it surprising that despite the higher power consumption it has lower GPU clocks. Is GDDR7 that power hungry, or what?
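For anyone who wants to sanity-check the percentages and the ~11% attribution, here's the back-of-the-envelope version (TPU's figures from the link above; the "residual" split is my rough accounting, nothing more):

[CODE=python]
# Recomputing the deltas above: RTX 5080 vs RTX 4080 Super (TPU figures).
def delta(new, old):
    return (new / old - 1) * 100

metrics = {
    "TBP (W)":           (360, 320),
    "Power (W)":         (332, 290),
    "Bandwidth (GB/s)":  (960, 736),
    "CUDA cores":        (10752, 10240),
    "Clock (MHz)":       (2662, 2730),
    "Compute (GFLOPS)":  (57244, 55910),
    "Avg performance":   (100, 88),
}
for name, (new, old) in metrics.items():
    print(f"{name:<18} {delta(new, old):+6.1f} %")

# Perf is up ~13.6% while raw compute is only up ~2.4%, so roughly 11 points
# of the gain are left to the memory subsystem (bandwidth is +30.4%).
print(f"residual not explained by compute: ~{delta(100, 88) - delta(57244, 55910):.1f} points")
[/CODE]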
On the other hand, OC headroom is very good on the RTX 5080 -> 13-17% extra performance according to TPU's findings.

I don't think RX 9070(XT) will have as good OC as Blackwell, but maybe we will receive another pleasant surprise.

@SolidQ True for the RX 9070, but that one has low clocks to begin with, and it will need a 300W board power limit to really shine.
 
Last edited:
Reactions: lightmanek

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Just $$$$$$$$$
So. Much. $$$$$$$.

Across all aspects - design, fabbing and litho.

And for diminishing returns at that.

Eventually the world's population and all the assorted industries will not offer enough potential revenue to justify the cost of process node development.

Especially as population growth in developed nations is showing serious signs of faltering.
 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
Didn't expect this.

Sapphire does the unthinkable and puts 16-pin power connector on an RX 9070 XT

https://www.tomshardware.com/pc-com...-power-connector-inside-offers-cableless-look
Not only that, but given the recommended distance from the connector to a cable bend (35 mm), it seems to run extremely close, to the point that I would not trust it.

They could have fixed that simply by leaving a groove in the heatsink fins, which makes their not doing so somewhat stupid.

It's enough of a gamble to chance a 16-pin connector/cable against the cost of a modern gfx card already, without adding that further uncertainty.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,683
2,906
136
Not only that, but given the recommended distance from the connector to a cable bend (35 mm), it seems to run extremely close, to the point that I would not trust it.

They could have fixed that simply by leaving a groove in the heatsink fins, which makes their not doing so somewhat stupid.

It's enough of a gamble to chance a 16-pin connector/cable against the cost of a modern gfx card already, without adding that further uncertainty.
Some cables are physically incapable of bending the way the Nitro+ will require. I checked my FSP 12V-2x6 cable and it's been designed to not allow bends anywhere near the connector end. I quite literally could not use the Nitro+ if I wanted to with my native 12V-2x6.

I really wanted the Nitro+, but I think I'll have to take my second choice in the TUF.
 