Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,760
6,675
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

linkgoron

Platinum Member
Mar 9, 2005
2,531
1,208
136
Not for long, by the looks of it.
Let's wait and see. Blackwell is a disappointment, but if you want 5080 or 5090 levels of performance - you only have one place to go.

Also, it's not like Blackwell is Nvidia's last gen. Rubin might fix whatever issues Blackwell has, and RDNA5 still has to deliver.

Over here "in my neck of the woods", prices for the 9070 XT @ my usual online shop range between 700€ and 900€: I'd have no problem with 800€, but 900€ seems a bit too much...
The 9070 and 9070 XT are also very expensive on my side (vs MSRP), but I expect prices to go down in a month or two. People are desperate for graphics cards. I've also been itching to replace my extremely aging system, but due to a large down payment, I'm tightening the leash for the next few months. The 9950X3D is just around the corner as well.
 

GTracing

Senior member
Aug 6, 2021
410
931
106
Some musings about VRAM after the past few days of discussion here.

VRAM usage used to be decided by your monitor resolution and graphics settings. Back during the Polaris/Pascal days, the VRAM amount went up as you went up the product stack:

4GB 1050 ti ($140), 6GB 1060 ($300), 8GB 1070 ($400), 8GB 1080 ($600), 11GB 1080 ti ($700)

Lower VRAM on midrange cards like the 4GB RX 480 was largely seen as fine.

Nowadays, higher resolutions don't matter nearly as much. It still plays a small role in VRAM usage, but no one asks what resolution you're playing at to decide how much VRAM you need. Even at 1080p, 8GB is not enough. And looking at the other end of the product stack, 16GB is fine on the 9070 and 9070 XT. It's created this strange situation where Radeon's whole product lineup needs 16GB VRAM; any GPUs with less VRAM aren't worth buying.
 
Reactions: Ranulf

blckgrffn

Diamond Member
May 1, 2003
9,581
4,066
136
www.teamjuchems.com
In hindsight, I think the lack of AMD 1st party model was a warning sign of incoming fake MSRP from AMD as well. Glad I got my 4070 in saner times.

This feels like Crypto Coin and COVID nonsense all over again.
The vibe in the MC line was very much this. A lot of agitation caused by the blink-and-you-miss-it Blackwell launches, and talk of "adjusted" MSRPs inbound on the AMD side too. A lot of expectation of all MSRPs rising by hundreds of dollars.

I don’t trust prices to go down in the coming months at all, especially here in the states. Tariff yo-yoing is going to ensure that retailers keep prices as high as possible to ensure they are able to cover their own positions, even if there is no active tariff. They will want to cover the replacement cost of the good and that will be an unknown literally until it clears customs.

@DAPUNISHER said it better but this is a weird time. Maybe there are other things in common with the crypto/covid time that weren’t as obvious then either.

I think there is way more belief that it could happen this time. We lived through $1200 3080s and $900 3070s, and here we are again…
 

coercitiv

Diamond Member
Jan 24, 2014
7,066
16,221
136
This feels like Crypto Coin and COVID nonsense all over again.

Ahh, the good old mining days with a rig like the one below. Except, wait a second, why is there a server board underneath?! Why are so many memory channels populated?! No.... not.... NOT AGAIN!



Jokes aside, the image above depicts a "homelab" AI rig, with 16x 3090s allegedly purchased from an old mining business, serviced by an Epyc 7663 w/ 512 GB of RAM. People are building these as "low cost" alternatives for local AI compute. How much this affects the market is unknown, but I would wager it's definitely skewing demand for 4090 / 5090 cards.

I would also not be surprised if at some point we find out that 16GB cards were used to power cheap home labs or worse, some kind of "datacenter" use. We live in times when not even AMD engineers have direct access to their own high-end AI products, they're instead given access to instances in the cloud. Demand of this scale will force all kinds of innovation from the little guys doing research or small scale business.
 

MrTeal

Diamond Member
Dec 7, 2003
3,855
2,494
136
So, it doesn’t look as bad as I thought. Why are HUB’s performance figures lower than others, and why is its power consumption much higher than average (including his 9070 review having worse efficiency than 5070)?
HUB/Techspot's raster numbers look fine. I'd imagine the difference in performance just comes down to game selection. HUB has a 6-game RT average, and it looks like they test both Wukong and Indy, where RDNA4 still struggles. In the other RT titles, their numbers show it as faster than the 5070.

Reason #315 why big bar charts comparing 30 GPUs are not ideal.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,760
6,675
136
Ahh, the good old mining days with a rig like the one below. Except, wait a second, why is there a server board underneath?! Why are so many memory channels populated?! No.... not.... NOT AGAIN!

View attachment 119405

Jokes aside, the image above depicts a "homelab" AI rig, with 16x 3090s allegedly purchased from an old mining business, serviced by an Epyc 7663 w/ 512 GB of RAM. People are building these as "low cost" alternatives for local AI compute. How much this affects the market is unknown, but I would wager it's definitely skewing demand for 4090 / 5090 cards.

I would also not be surprised if at some point we find out that 16GB cards were used to power cheap home labs or worse, some kind of "datacenter" use. We live in times when not even AMD engineers have direct access to their own high-end AI products, they're instead given access to instances in the cloud. Demand of this scale will force all kinds of innovation from the little guys doing research or small scale business.
We would be guilty of that too!!
Not sure where tech is heading, but like most of our competitors we are using AI/ML for almost everything now: PowerPoint, requirements analysis, validation with vision models, coding, etc.

I hope one thing for the Zen 6 platform will come to fruition:
EPYC's CXL.mem support coming to DT.
Turin can already interleave CXL and DDR memory regions, so ideally a CXL GPU device attached to the root could see the host's memory.

Linux is getting patches for address translation for Zen 5. Hopefully Zen 6 goes further in this direction.

With such a setup we could install 1 TB of DDR on the CPU and let the GPU use all of that for some LLMs and other interesting use cases. It could turn your Linux PC into an LLM monster.
It would not be the most performant, but at least it could run something interesting.
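To see why 1 TB of host DDR matters, here's a back-of-envelope sketch of which model weight footprints fit in a 16GB card's VRAM versus host memory exposed over CXL.mem (the parameter counts and precisions below are illustrative assumptions, not specific products):

```python
# Rough check: which model sizes fit in 16 GB of VRAM versus
# 1 TB of host DDR made visible to the GPU via CXL.mem.
# Weights only -- ignores KV cache, activations, and overhead.

def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed just for the weights, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9

VRAM_GB = 16       # e.g. a 9070 XT-class card
CXL_DDR_GB = 1024  # the hypothetical 1 TB of host DDR

for params, prec, bpp in [(8, "fp16", 2), (70, "fp16", 2),
                          (70, "int4", 0.5), (405, "int4", 0.5)]:
    need = weight_footprint_gb(params, bpp)
    fits = "VRAM" if need <= VRAM_GB else ("CXL DDR" if need <= CXL_DDR_GB else "neither")
    print(f"{params}B @ {prec}: ~{need:.0f} GB -> fits in {fits}")
```

Even a 70B model at fp16 (~140 GB) is hopeless on a 16GB card but trivial in 1 TB of host memory, which is the whole appeal, bandwidth penalty notwithstanding.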
I was just commenting on this same thing in the Zen 6 thread: get CXL.mem support on DT and use your UDNA GPU for some AI at home.

AMD's mistake was splitting DC and client GPUs into separate architectures, but if a unified architecture with UDNA is to fix all that, then CXL memory support could go even further, allowing more exotic ML models/algorithms to be developed and run on client GPUs.
I kind of hope it happens, otherwise it will be tough for budding AI graduates to compete with MS/big corpos for GPU shipments.
 
Reactions: coercitiv

coercitiv

Diamond Member
Jan 24, 2014
7,066
16,221
136
We would be guilty of that too!!
Not sure where tech is heading, but like most of our competitors we are using AI/ML for almost everything now. Power point, Requirements analysis, Validation with Vision Model, Coding, etc..
I wasn't pointing fingers, just giving people a heads-up that gamers might already be entering yet another dark period of availability and pricing. We'll have to see if the industry can react and fill the gap.
 

GTracing

Senior member
Aug 6, 2021
410
931
106
HUB/Techspot's raster numbers look fine. I'd imagine the difference in performance just comes down to game selection. HUB has a 6 game RT average, and it looks like they test both Wukong and Indy where RDNA4 still struggles. The other RT titles their numbers show it as faster than the 5070.

Reason #315 why big bar charts comparing 30 GPUs are not ideal.
Their ray tracing results are valid, but their power consumption tests are way off and frankly make no sense. All three of their 9070 power consumption tests have weird anomalies. In the first one, the 5070 Ti draws more power than the 5080. In the second, the 7900 GRE draws 74W(!) less than the 7700 XT. In the last one, the 9070 draws 60W more than the 5070.

I honestly don't know why they published those results. If they can't do power consumption testing right, then they should just leave it out of the review.

 

Saylick

Diamond Member
Sep 10, 2012
3,849
8,872
136
Eurogamer has an interview with Mark Cerny regarding FSR4 and PSSR:
https://www.eurogamer.net/digitalfo...-part-in-the-next-evolution-of-pssr-upscaling

"The neural network (and training recipe) in FSR 4's upscaler are the first results of the Amethyst collaboration," Cerny told us. "And results are excellent, it's a more advanced approach that can exceed the crispness of PSSR. I’m very proud of the work of the joint team!"

Explains why FSR4 is so good for AMD's first AI upscaler attempt. From what I hear, Sony's AI division is really good. Their professional cameras have been using machine learning image recognition to improve autofocus speed and accuracy for years now.
 
Reactions: Elfear

GaiaHunter

Diamond Member
Jul 13, 2008
3,673
286
126
I think that's what's going on. People just can't control themselves with the FOMO. With nVidia not producing much of anything, AMD is the only game in town.

At some point everyone is going to buy.
Their GPU died.
They bought a new higher Res/higher refresh rate monitor.
They want to play a new game and the performance of their system interferes with their fun.
They want to play a new game but it requires features their current GPU doesn't have.

I was a lot happier paying $200 for a GPU to play games.

My last upgrades were
rx480 to 5700xt - that was about double the performance for double the price.
5700xt to 9070xt - 2.5x to 3x performance (@4k since new monitor) plus additional features, for about 65% higher price.

Compared to the CPU side, where going from an i7-6700K to a 7700X got like double the cores, double the performance, and additional features for the same price, it is atrocious.
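Putting the multipliers from those upgrades (the ones quoted above, with the GPU performance gain taken at the midpoint of 2.5-3x) into perf-per-dollar terms makes the gap obvious:

```python
# Perf-per-dollar change for each upgrade, using the post's own multipliers.

def perf_per_dollar_gain(perf_mult: float, price_mult: float) -> float:
    """How much performance-per-dollar improved across an upgrade."""
    return perf_mult / price_mult

gpu_1 = perf_per_dollar_gain(2.0, 2.0)    # RX 480 -> 5700 XT: 2x perf, 2x price
gpu_2 = perf_per_dollar_gain(2.75, 1.65)  # 5700 XT -> 9070 XT: ~2.75x perf, 1.65x price
cpu   = perf_per_dollar_gain(2.0, 1.0)    # i7-6700K -> 7700X: 2x perf, same price

print(f"GPU upgrade 1: {gpu_1:.2f}x perf/$")
print(f"GPU upgrade 2: {gpu_2:.2f}x perf/$")
print(f"CPU upgrade:   {cpu:.2f}x perf/$")
```

The first GPU upgrade delivered zero value improvement (1.0x perf/$); the second managed roughly 1.67x; the CPU upgrade doubled perf/$ outright.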

If one is buying every generation primarily to play games, I hope they're buying the halo products, because these days the only reason to upgrade every generation is bragging rights.
 

Josh128

Senior member
Oct 14, 2022
681
1,170
106
Eurogamer has an interview with Mark Cerny regarding FSR4 and PSSR:
https://www.eurogamer.net/digitalfo...-part-in-the-next-evolution-of-pssr-upscaling



Explains why FSR4 is so good for AMD's first AI upscaler attempt. From what I hear, Sony's AI division is really good. Their professional cameras have been using machine learning image recognition to improve autofocus speed and accuracy for years now.
Sony has been researching and producing non-AI upscalers and upscaling techniques since the dawn of HD CRTs, back in the late '90s/early '00s. Their sets generally have better PQ than their competitors', even when they use a panel made by a competitor.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,599
3,151
136
Because based on what we know the 9060XT die will be tiny so the thing will be dirt cheap. It would offer similar margin to the 9070XT in a higher volume product.

As for the stack it should be simple

9060XT - 16GB 20gbps ram $330
9060 - 12GB 20gbps ram, cut the bus to 96bit. $250
9050 - rebrand N33 and lower clocks to hit a sub $200 price point.

Any 8GB card for more than $200 will get absolutely shredded in reviews, so there's no point in doing it. A 96-bit 12GB card that manages 6700XT-tier performance will be far better received than a 128-bit 8GB card that can sit in the gap between the 6700XT and 7700XT but then suffers from horrid performance/IQ in some of the latest titles due to VRAM limits.
N33 was also dirt cheap to make. Cheaper than N44 to be honest and we all know what the price was.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,599
3,151
136
Ah, you seem to be sure about that...

- 32CU @ 3.0 GHz with 18Gbps and 128bit -> 100% bandwidth / FLOPS
- 28CU @ 2.9 GHz with 20Gbps and 96bit -> 98% bandwidth / FLOPS
The full N44 supposedly uses 20Gbps memory, so it would be only 88.7%, and you need a cut-down version for that 96-bit bus.
But yeah, I can agree that AMD could make such models.
9060 XT: 32CU @ 3.2 GHz with 20Gbps and 128-bit, 16GB VRAM
9060: 28CU @ 2.75 GHz with 20Gbps and 96-bit, 12GB VRAM
This doesn't look half bad, actually. The problem is that for laptops you are still limited to 8GB VRAM.
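The 98% and 88.7% bandwidth-per-FLOP ratios in this exchange can be reproduced with a quick script (the card configs are the hypothetical ones discussed here, not announced SKUs):

```python
# Bandwidth-per-FLOP comparison for the hypothetical configs above.
# Compute scales with CU count x clock; bandwidth with bus width x memory speed.

def bw_per_flop(cus: int, ghz: float, bus_bits: int, gbps: float) -> float:
    bandwidth = bus_bits / 8 * gbps   # GB/s
    compute = cus * ghz               # relative FLOPS
    return bandwidth / compute

base = bw_per_flop(32, 3.0, 128, 18)  # 32 CU @ 3.0 GHz, 128-bit, 18 Gbps
cut  = bw_per_flop(28, 2.9,  96, 20)  # 28 CU @ 2.9 GHz, 96-bit, 20 Gbps
full = bw_per_flop(32, 3.0, 128, 20)  # full N44 with 20 Gbps memory

print(f"cut-down vs 18 Gbps baseline: {cut / base:.1%}")  # ~98.5%
print(f"cut-down vs 20 Gbps full N44: {cut / full:.1%}")  # ~88.7%
```

So the "98% bandwidth/FLOPS" claim holds against an 18 Gbps baseline, while against the full 20 Gbps N44 the cut-down part only carries ~88.7%, matching the reply.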
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,599
3,151
136
You can do the same extrapolation from 3d centre data and you get the same result, a 9060XT that has an uplift over the 7600XT that is the same as the uplift the 9070XT had over the 7800XT is about 7700XT tier in raster and it will be faster still in RT. Further, given that the 9060XT, 7600XT, 7800XT and 9070XT should all have 16GB of VRAM it is not like we are in a situation where the 8GB performance is overstated due to poor testing practices that hides the stutters or hides the IQ degradation.
N48 has more CUs than N32, but N44 didn't increase the count, so N44 would need additional frequency to compensate.
As such a 9060XT 16GB that performs like a 7700XT in raster should probably cost no more than $350 and ideally $330 like the 7600XT costs. Those who think it will cost a lot more I believe to be misguided and those who think it will be cheaper might have a more pessimistic view of where its performance will land.
N48 costs $100 more than N32; that's, in my opinion, more than enough proof that N44 won't cost the same as N33. My bet is $50 more.
 