Discussion RDNA4 + CDNA3 Architectures Thread

Page 123 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

DisEnchantment

Golden Member
Mar 3, 2017
With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But judging by the flurry of code in LLVM, there are a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5.0 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here


Mahboi

Senior member
Apr 4, 2024
This is an interesting statement to me. I don't do any development on GPUs, and software is my second priority, so can you expand on this statement to help me understand a bit more of what is improving and how that will help game development? (Will it?)
Afraid I can't, I'm a hack.

But although more competent people surely will, I'll give you my 2c:
The history of gaming libs, and indirectly GPU libs, started very early in computing; most 3D rendering started in the early 90s (OpenGL 1.0 is a 1992 release, and all the way to OpenGL 4.6 it is fully backward compatible, meaning your 2015 OGL renderer can be running C calls from 1992).

This means that a lot of "old" GPU stuff is really, really outdated. We're talking 10+ years before multithreading went mainstream.
In the late 2000s NV started CUDA, and around 2014 AMD released Mantle, which became the core of Vulkan (and later DX12, a radical departure from DX 1-11, which, just like OpenGL, is fully backward compatible).

Vulkan on the one hand and HIP/CUDA on the other rebuilt the core of how a GPU operates from the software side, particularly dispatching/command buffers, batching calls, compute calls, etc. I can't give you many details (a real GPU programmer will certainly do better; ask Matias Goldberg or maybe Osvaldo Doederlein on Twitter and they'll probably know a lot more than me). But as a broad rule: the old model divided the rendering process into Vertex shaders (position the triangles), Texture shaders (give them textures) and Pixel shaders (colour the individual pixels), issued as lots of simple, single-threaded, lightly optimised calls that were redone every frame. The new APIs replace that with a more versatile/modern architecture that batches compute calls, batches multiple shaders and reorders them, loads up a command buffer and dumps it all on the GPU at once.
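The shift described above can be sketched as a toy simulation: "old-style" immediate submission pays a driver round-trip per draw call, while a Vulkan/DX12-style command buffer records many calls and submits them once. Everything here is illustrative (made-up class and function names), not any real graphics API.

```python
# Toy model of the API shift: per-call submission vs. one recorded
# command buffer. "submit" stands in for an expensive driver/kernel
# transition; "draw" commands are just tuples.

class GPU:
    def __init__(self):
        self.submissions = 0   # expensive transitions into the driver
        self.draws = 0         # total draw commands executed

    def submit(self, commands):
        self.submissions += 1
        self.draws += len(commands)

def draw_immediate(gpu, meshes):
    # Old model: every draw is its own submission, every frame.
    for mesh in meshes:
        gpu.submit([("draw", mesh)])

def draw_batched(gpu, meshes):
    # Modern model: record everything, then submit once.
    cmd_buffer = [("draw", mesh) for mesh in meshes]
    gpu.submit(cmd_buffer)

meshes = [f"mesh{i}" for i in range(1000)]

old = GPU()
draw_immediate(old, meshes)

new = GPU()
draw_batched(new, meshes)

print(old.submissions, new.submissions)  # 1000 vs 1, same work done
```

Same 1000 draws either way; the batched path just crosses the expensive API boundary once instead of a thousand times, which is the core of the "dump it all on the GPU at once" idea.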

NV's mesh shaders (AMD's primitive shaders were very similar, but didn't win the fight for MS/Sony) essentially build 3D worlds differently, in a system that is much more parallel, batched and closer to the "compute model". It's a new paradigm for making graphics that can use GPUs far more freely/efficiently, with batched calls, reordering of call order, and better parallelism (on the CPU through multithreading, and on the GPU by just using it more smartly).


To think I've read all of those and forgot like 60% of it...I spend way too much time learning stuff and moving on to learn more stuff instead of making money.
 

beginner99

Diamond Member
Jun 2, 2009
If that was true, then a 3-chiplet (or even 4) variant should have been relatively simple, and it would have in part satisfied the high end market. So I am quite skeptic about that.

I don't think so. Two dies are much easier than four because of the communication between the dies: it adds up quickly, and you don't want a connection between every single pair of dies, and then maybe latency becomes a problem. Two probably also don't need special packaging technology, and it limits the amount of redundancy. N44 would be a lot larger if it needed to support two or more connections instead of just one.
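The scaling argument above can be put in rough numbers: with a full mesh between chiplets, link count grows as n(n-1)/2 and each die needs n-1 link PHYs on its edge ("beachfront") area. The numbers are back-of-envelope, not anything AMD has published.

```python
# Full-mesh interconnect arithmetic: why 2 dies are much cheaper
# to wire up than 4.

def full_mesh_links(n):
    # Total die-to-die links for n chiplets, all-to-all.
    return n * (n - 1) // 2

def phys_per_die(n):
    # Link PHYs each die must carry in a full mesh.
    return n - 1

for n in (2, 3, 4):
    print(f"{n} dies: {full_mesh_links(n)} links, "
          f"{phys_per_die(n)} PHYs per die")
# 2 dies: 1 link, 1 PHY per die; 4 dies: 6 links, 3 PHYs per die
```

Going from 2 to 4 dies multiplies the total links by six and triples the PHY area every die must reserve, which is the "N44 would be a lot larger" point.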
 

marees

Member
Apr 28, 2024
Getting impatient to know potential release date of RDNA 4. Will it be put on hold until FSR 4/PS5 pro with AI upscaling ?

AMD used to leak like a sieve during Raja Koduri days
I wonder what Lisa Su did/is doing to the leakers 🤔
 

linkgoron

Platinum Member
Mar 9, 2005
Getting impatient to know potential release date of RDNA 4. Will it be put on hold until FSR 4/PS5 pro with AI upscaling ?

AMD used to leak like a sieve during Raja Koduri days
I wonder what Lisa Su did/is doing to the leakers 🤔
They moved to Intel and then left for a startup.
 

jpiniero

Lifer
Oct 1, 2010

Looks like Lisa's got more important products to talk about at Computex... the W7900 AI Edition!
 

branch_suggestion

Senior member
Aug 4, 2023

Looks like Lisa's got more important products to talk about at Computex... the W7900 AI Edition!
You laugh but it sets the foundations for N50.
 

Tuna-Fish

Golden Member
Mar 4, 2011

Looks like Lisa's got more important products to talk about at Computex... the W7900 AI Edition!

I don't really understand what specifically makes this a great AI product? Unless they managed to score 24Gb modules from somewhere, there seems to be nothing here but AI on the shroud and a slimmer cooler?
 

jpiniero

Lifer
Oct 1, 2010
I don't really understand what specifically makes this a great AI product? Unless they managed to score 24Gb modules from somewhere, there seems to be nothing here but AI on the shroud and a slimmer cooler?

I am wondering if 512-bit would be possible with N31... but I think it's just a minor refresh so they could put AI in the name. Because, you know, everything is AI now.
 

Aapje

Golden Member
Mar 21, 2022
I don't really understand what specifically makes this a great AI product? Unless they managed to score 24Gb modules from somewhere, there seems to be nothing here but AI on the shroud and a slimmer cooler?

A dual-slot blower means that companies can put multiple of these cards on a server motherboard and run jobs in parallel. Nvidia sells slim GPUs to companies for a similar reason.
 

dhruvdh

Junior Member
Apr 2, 2024
I don't really understand what specifically makes this a great AI product? Unless they managed to score 24Gb modules from somewhere, there seems to be nothing here but AI on the shroud and a slimmer cooler?
With two of these (and they want you to use two, because dual slot), you can run a Llama 3 70B-class model in FP8 locally.

That means basically running it losslessly. The Llama 3 70B model is much, much "smarter" than the original ChatGPT release.

If you want this model to do something for you in a loop, be always listening, or run generation on individual rows of a CSV file with lots of rows, it quickly becomes extremely expensive to use the cloud.
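The VRAM math behind the two-card claim is easy to check: at FP8 (one byte per weight), a 70B-parameter model needs roughly 70 GB for weights alone, which doesn't fit one 48 GB W7900 but does fit two, with headroom left for the KV cache. These are approximations, not measured figures.

```python
# Back-of-envelope VRAM sizing for a 70B model at FP8 on W7900s.

PARAMS = 70e9            # parameter count, Llama 3 70B class
BYTES_PER_WEIGHT = 1     # FP8 quantization
CARD_VRAM_GB = 48        # W7900 memory capacity
GB = 1e9

weights_gb = PARAMS * BYTES_PER_WEIGHT / GB   # ~70 GB of weights
fits_one = weights_gb <= CARD_VRAM_GB         # False: 70 > 48
fits_two = weights_gb <= 2 * CARD_VRAM_GB     # True: 70 <= 96

print(f"weights: {weights_gb:.0f} GB, "
      f"one card: {fits_one}, two cards: {fits_two}")
```

The ~26 GB left over on the two-card setup is what the KV cache, activations, and framework overhead live in.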

Personally, I would want it for help organizing my Zotero library: adding tags based on paper title and abstract.

I would also want to use it to ingest my bank/credit card statements, categorize transactions, and help turn them into plain-text double-entry ledgers.
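That bank-statement workflow can be sketched in a few lines: take one CSV transaction row, pick an expense account, and emit a ledger-cli/hledger-style posting. The account names and the keyword categoriser here are made up for illustration; in the actual use case an LLM would do the categorising.

```python
# Minimal sketch: one bank CSV row -> plain-text double-entry posting.
import csv
import io

STATEMENT = "2024-05-12,GROCERY MART,42.50\n"

def categorise(payee):
    # Stand-in for the model call: naive keyword rule.
    return ("Expenses:Groceries" if "GROCERY" in payee
            else "Expenses:Uncategorised")

def to_ledger(row):
    date, payee, amount = row
    account = categorise(payee)
    # Debit the expense account, credit the asset account.
    return (f"{date} {payee.title()}\n"
            f"    {account}    ${amount}\n"
            f"    Assets:Checking\n")

for row in csv.reader(io.StringIO(STATEMENT)):
    print(to_ledger(row))
```

Swapping `categorise` for a local-model call is the only step that needs the GPUs; everything around it is plumbing.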

But the current W7900 is $4,000 and hard to buy, and you would need two. Let's see what they do with pricing.
 

SolidQ

Senior member
Jul 13, 2023
Does that mean no RDNA4 soon?
 


blckgrffn

Diamond Member
May 1, 2003
Or they could launch Navi 48 first & Navi 44 later
A staggered launch makes sense because you can keep your product in the news cycle longer, but I don't think AMD wants to drag this out. Their best chance for the longest possible competitive relevance of these SKUs is to roll them out as quickly as strategy and supply chain allow. This could be a rare window where they upstage the Nvidia SKUs arrayed against their specific price points by a margin large enough to capture some mindshare.

I might be very wrong.
 