Discussion RDNA4 + CDNA3 Architectures Thread

Page 298 - AnandTech community forums

DisEnchantment

Golden Member
Mar 3, 2017
With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced, to prevent leaks.
But the flurry of code landing in LLVM amounts to a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 arriving in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

gaav87

Senior member
Apr 27, 2024
A Chinese user came to my Twitter and said the Chiphell boss said, I quote, "i will chop my dk off if the price is above 5k yuan". Other Chinese users said that's a bad Google translation and it actually reads "no more than 5k yuan".
So the price is at most roughly $690 (5k yuan).
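A quick sanity check of the conversion (the exchange rate below is an assumption; actual rates fluctuate):

```python
# Rough CNY -> USD sanity check; 7.25 CNY/USD is an assumed rate,
# not a quoted market figure.
price_cny = 5_000
cny_per_usd = 7.25
price_usd = price_cny / cny_per_usd
print(round(price_usd))  # 690
```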
 

ToTTenTranz

Senior member
Feb 4, 2021
$749 will be funny. I need it at $749.

$749 would be super funny. I'd get to watch no one buying RDNA4 because AMD suddenly got too high on the smell of their own farts.


Blackwell is indeed a failure on the hardware front, but on software (number of RTX optimized games, transformer-based DLSS4 looking spectacular, DLSS4 tool doing automatic DLL replacements, deals to get RTX Mega Geometry going, etc.) Nvidia has never been as strong.


AMD's leadership is mighty stupid and insane if they believe this is the time they can get away with the "Nvidia -15% price for -10% performance" strategy that's been failing for a decade.
 

adroc_thurston

Diamond Member
Jul 2, 2023
$749 would be super funny. I'd get to watch no one buying RDNA4 because AMD suddenly got too high on the smell of their own farts.
No that would be based.
(number of RTX optimized games, transformer-based DLSS4 looking spectacular, DLSS4 tool doing automatic DLL replacements, deals to get RTX Mega Geometry going, etc.) Nvidia has never been as strong.
meh.
AMD's leadership is mighty stupid and insane if they believe this is the time they can get away with the "Nvidia -15% price for -10% performance" strategy that's been failing for a decade.
It actually worked since they stopped selling boards at a loss (sans turds like Polaris).
 

carrotmania

Member
Oct 3, 2020
Blackwell is indeed a failure on the hardware front, but on software (number of RTX optimized games, transformer-based DLSS4 looking spectacular, DLSS4 tool doing automatic DLL replacements, deals to get RTX Mega Geometry going, etc.) Nvidia has never been as strong.

Transformer brings its own set of problems; it is nowhere near perfect. FSR4 is a bigger leap over FSR3 than Transformer is over CNN. All AMD has to do is come up with a fancy name for their new model. Go-bot Model!
 

Meteor Late

Senior member
Dec 15, 2023
Transformer brings its own set of problems; it is nowhere near perfect. FSR4 is a bigger leap over FSR3 than Transformer is over CNN. All AMD has to do is come up with a fancy name for their new model. Go-bot Model!

Yeah, no shit, the difference between DLSS and FSR was like a galaxy apart; FSR was that bad. So FSR4 being a good improvement should indeed reduce the difference, because it was gigantic.
But there is no doubt AMD should be in a much better position software-wise, or should I say feature-wise. The RT difference looks like it will be fairly small this time.
 

basix

Member
Oct 4, 2024
I assume that FSR4 uses a Vision Transformer as well. The first ViT paper was published in 2020, afaik. And when AMD did their option evaluation for FSR4 around a year ago, ViT was surely one of the options, not only CNN.

But let's wait and see. The first footage from CES looks promising. But SR is not the whole thing; RR is also in research at AMD. And what about an improved AFMF and RSR?
And because Nvidia has brought MFG to the table, AMD is likely to be "forced" to bring a MFG contender as well.

But it is true: FSR4 will likely close the gap to DLSS. Maybe not completely, but closer than before.
 

GTracing

Senior member
Aug 6, 2021
For those who missed it, Hardware Unboxed has a pretty good video of FSR4 from CES. It's off-screen, but you can still tell that it's a huge improvement. I'm not sure if it's as good as DLSS 3.8, but it's close.


Digital Foundry has a video comparing FSR 3.1 to alternatives in the same game. It does a good job of pointing out the flaws of FSR3.

 

gaav87

Senior member
Apr 27, 2024
I assume that FSR4 uses a Vision Transformer as well. The first ViT paper was published in 2020, afaik. And when AMD did their option evaluation for FSR4 around a year ago, ViT was surely one of the options, not only CNN.

But let's wait and see. The first footage from CES looks promising. But SR is not the whole thing; RR is also in research at AMD. And what about an improved AFMF and RSR?
And because Nvidia has brought MFG to the table, AMD is likely to be "forced" to bring a MFG contender as well.

But it is true: FSR4 will likely close the gap to DLSS. Maybe not completely, but closer than before.
AFMF 2.1 is already in the beta drivers and looks better than 2.0, that's for sure; still some ghosting, but way better.
 

Tup3x

Golden Member
Dec 31, 2016
It really wasn't. DLSS was craptastic up to 3.5, not long ago. 3.8 is still awful in many cases. FSR4 looks to be better than 3.8, whereas Transformer (D4) is better in some regards and actually worse in other cases.
No one has done a proper comparison yet. All we know is that FSR4 looks better than the previous version. Also, it sounds like you haven't tested the latest DLSS/DLAA (preset K).
 
Reactions: Tlh97 and Gideon

steen2

Junior Member
Aug 21, 2024
It really wasn't. DLSS was craptastic up to 3.5, not long ago. 3.8 is still awful in many cases. FSR4 looks to be better than 3.8, whereas Transformer (D4) is better in some regards and actually worse in other cases.
DLSS had the added benefit of looking good in static scenes. Motion is a blur/occlusion fest. FSR is naturally worse.
And because Nvidia has brought MFG to the table, AMD is likely to be "forced" to bring a MFG contender as well.

But it is true: FSR4 will likely close the gap to DLSS. Maybe not completely, but closer than before.
FSR 3 has had the option for MFG, but it was never enabled in release drivers.
No one has done a proper comparison yet. All we know is that FSR4 looks better than the previous version. Also, it sounds like you haven't tested the latest DLSS/DLAA (preset K).
There's some regression going from preset J to K. Work in progress with FP8, depending on the E4M3/E5M2 datatypes for AD/GB.
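For context on the two datatypes mentioned: E4M3 and E5M2 split the 8 bits differently between exponent and mantissa, trading range for precision. A small sketch of their largest finite values, following the OCP FP8 conventions (the function name is mine):

```python
def fp8_max(exp_bits, man_bits, ieee_like):
    """Largest finite value of a miniature float format.

    ieee_like=True  -> top exponent reserved for inf/NaN (E5M2 convention)
    ieee_like=False -> only the all-ones mantissa at the top exponent
                       is NaN (E4M3 convention, per OCP FP8)
    """
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        emax = (2 ** exp_bits - 2) - bias      # top exponent unusable
        frac = 2 - 2 ** -man_bits              # all-ones mantissa usable
    else:
        emax = (2 ** exp_bits - 1) - bias      # top exponent usable
        frac = 2 - 2 * 2 ** -man_bits          # all-ones mantissa is NaN
    return frac * 2 ** emax

print(fp8_max(5, 2, True))   # 57344.0 -> E5M2: huge range, coarse steps
print(fp8_max(4, 3, False))  # 448.0   -> E4M3: small range, finer steps
```

The upshot: E5M2 covers a much wider dynamic range, while E4M3 spends the extra mantissa bit on precision, which is why a renderer can prefer one or the other depending on the tensor being quantized.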
 
Reactions: Tlh97 and coercitiv

branch_suggestion

Senior member
Aug 4, 2023

SpudLobby

Golden Member
May 18, 2022
BTW, with the latest OpenAI o3, as I understand it, the advances come mainly from inference: the scales have totally tipped from the biggest advances coming from training (in the past) to the biggest advances coming from inference (in o3).

And then the best inference queries can consume huge amounts of compute resources. Over and over...
Man, you guys really are behind here. Adroc/Spec is the worst and has just straight-up denied AI is useful, thinking the whole thing is a bubble, but lol.

Test-time compute is a more efficient way of doing inference, as opposed to scaling inference laterally with much more parallel compute and memory (more parameters). That is literally the entire point: a bit more time as the model searches and reasons over the problem (which it is both trained and RL'd to do). The other beneficial side effect is that training (and/or applying reinforcement learning to) a base model to turn it into a test-time reasoning model is also less expensive to develop than a vastly larger "equivalent" model (if such a thing can be said).

It trades off a good bit of space and energy for time at inference, which offers massively improved "efficiency" in the sense of performance/$/FLOP at complex tasks. Specifically, in one paper, test-time models beat 14x larger models by using a bit more time (being trained to reason). You can also just go see the DeepSeek R1 30-32B model on Macs blow everything else out for a local LLM. The V3 base model was good, but this is about TTC.
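The "14x larger" claim can be sanity-checked with the common ~2 FLOPs per parameter per generated token rule of thumb for inference cost. The model sizes and token counts below are illustrative assumptions, not real configs:

```python
def inference_flops(params, tokens):
    # Rough rule of thumb: ~2 FLOPs per parameter per generated token
    return 2 * params * tokens

# Illustrative numbers, not real model configs:
big   = inference_flops(70e9, 1_000)  # large model, short answer
small = inference_flops(5e9, 8_000)   # 14x smaller, 8x longer reasoning trace

print(small < big)  # True: the reasoning model is still cheaper per query
```

Even with an 8x longer output, the smaller reasoning model spends fewer total FLOPs per query, which is the economic argument for test-time compute.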

The effective result here isn’t that larger models are going away, but rather that this is a new (efficient adjusting for the performance) scaling avenue across the board which makes even more expensive and humongous inference economically viable.

And o3-mini is over an order of magnitude less expensive than o1, while offering more performance.
 
Reactions: xpea