Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man, I have been dying to make this one for a while now.

First rumours for RDNA3 are here, so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔"

Saylick · Jul 15, 2023

Heartbreaker said:
IMO, AMD still doesn't have dedicated Machine Learning Tensor HW in RDNA3 cards. They are just using general FP compute HW to brute force it, and RDNA 3 boosted FP compute a lot.

For AI, AMD is touting RDNA3's bfloat16 improvements over RDNA 2, but they're only proportional to RDNA3's overall floating-point improvement.

Here is the AMD AI improvement claim for RDNA3:

https://www.amd.com/en/technologies/rdna

This just tracks the general FP compute gain, not some new dedicated HW.

IMO RDNA 4 will get the dedicated AI/ML HW that the Phoenix APU already appears to have.

An FPGA inside a GPU? Seems interesting considering that the FPGA could also be reconfigured to do other things, like encoding or signal processing.

Ajay · Jul 15, 2023

Saylick said:
An FPGA inside a GPU? Seems interesting considering that the FPGA could also be reconfigured to do other things, like encoding or signal processing.

I don’t know where you saw FPGA, but it will be fixed-function units. An FPGA makes no sense on a consumer GPU. Even on a server-grade GPU, it would only make sense as an on-package 'chiplet'.

Saylick · Jul 15, 2023

Ajay said:
I don’t know where you saw FPGA, but it will be fixed-function units. An FPGA makes no sense on a consumer GPU. Even on a server-grade GPU, it would only make sense as an on-package 'chiplet'.

Mmm, you're right. It's a tiled approach with each tile having vector units. I had the impression that since it was Xilinx tech, it was an FPGA by default. I guess I was wrong.

Joe NYC · Jul 15, 2023

adroc_thurston said:
Wasn't really an expectation at all; it was what AMD said during their FAD.

Thanks for the slide.

AMD projected a >50% performance-per-watt increase just months before launch and got almost nothing, which shows that RDNA3 has quite a bit of untapped potential that might one day be harnessed by fixes in the silicon or drivers.

Heartbreaker · Jul 15, 2023

Saylick said:
Mmm, you're right. It's a tiled approach with each tile having vector units. I had the impression that since it was Xilinx tech, it was an FPGA by default. I guess I was wrong.

Xilinx has a lot more than just FPGAs. Along with the dedicated AI cores, they also have great media encoder cores that AMD GPUs could really use.

I'm expecting RDNA 4 will at minimum have the AI cores, but hopefully the Xilinx Media encoder as well.

adroc_thurston · Jul 15, 2023

Heartbreaker said:
I'm expecting RDNA 4 will at minimum have the AI cores

No, and AIE tiles are very very very very very different things that aren't in any way related to GPUs.

Heartbreaker · Jul 15, 2023

adroc_thurston said:
No, and AIE tiles are very very very very very different things that aren't in any way related to GPUs.

If they can add them to an APU, they can add them to a GPU.

ML cores are table stakes now. If AMD can't figure out how to add ML cores to a GPU, they should just give up.

adroc_thurston · Jul 15, 2023

Heartbreaker said:
If they can add them to an APU, they can add them to a GPU.

Again, those are very very different things built for different reasons.

Heartbreaker said:
ML cores are table stakes now. If AMD can't figure out how to add ML cores to a GPU, they should just give up.

Is there a single workload in client GPUs that uses matrix cores?
Like, phones have had piles of matrix math accelerators for 5 years and the workloads still don't exist.

Heartbreaker · Jul 15, 2023

adroc_thurston said:
Again, those are very very different things built for different reasons.

Not really. AMD is obviously recognizing the need for ML cores. If they are incorporating them in their APUs, they will definitely be adding them to either their desktop CPUs or desktop GPUs. I would bet on GPUs.

adroc_thurston said:
Is there a single workload in client GPUs that uses matrix cores?

ML cores are used for high-quality temporal upscaling (DLSS, XeSS), and there are more non-gaming use cases people want to use them for, like image processing or generative image creation with applications like Stable Diffusion.

igor_kavinski · Jul 15, 2023

Heartbreaker said:
If AMD can't figure out how to add ML cores to a GPU, they should just give up.

How to accelerate AI applications on RDNA 3 using WMMA (gpuopen.com)

This blog is a quick how-to guide for using the WMMA feature with our RDNA 3 GPU architecture using a Hello World example.

Stay tuned for future RDNA 3 WMMA support in rocWMMA. This library is portable with nvcuda::wmma and it supports MFMA and (soon) WMMA instructions, thus allowing your application to have hardware-accelerated ML in both RDNA 3 and CDNA 1/2 based systems.

Functionality is there. Software isn't ready yet.
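For context on what that functionality is: a WMMA instruction (like MFMA on CDNA) performs a small matrix multiply-accumulate, D = A × B + C, on a whole tile at once, and libraries such as rocWMMA and nvcuda::wmma expose it as `mma_sync` on per-wave fragments. Below is a minimal CPU sketch of that arithmetic in Python, assuming the 16x16x16 tile shape used by RDNA 3's WMMA instructions; `tile_mma`, `get_tile` and `tiled_matmul` are hypothetical helper names, and this only models the math, not how the GPU schedules it.

```python
TILE = 16  # assumed tile shape: M = N = K = 16

def tile_mma(a, b, c):
    """Return D = A @ B + C for TILE x TILE tiles given as lists of lists."""
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(TILE)) for j in range(TILE)]
        for i in range(TILE)
    ]

def get_tile(m, row, col):
    """Copy the TILE x TILE block whose top-left corner is (row, col)."""
    return [r[col:col + TILE] for r in m[row:row + TILE]]

def tiled_matmul(a, b):
    """Multiply n x n matrices (n a multiple of TILE) using only tile_mma calls."""
    n = len(a)
    assert n % TILE == 0, "matrix size must be a multiple of the tile size"
    out = [[0.0] * n for _ in range(n)]
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            # accumulator tile, playing the role of the C/D fragment
            acc = [[0.0] * TILE for _ in range(TILE)]
            for k in range(0, n, TILE):
                # one tile multiply-accumulate per K-slice
                acc = tile_mma(get_tile(a, i, k), get_tile(b, k, j), acc)
            for r in range(TILE):
                out[i + r][j:j + TILE] = acc[r]
    return out
```

A larger matrix multiply is just a loop over output tiles that keeps an accumulator tile and feeds it through `tile_mma` once per K-slice, which is roughly how a WMMA-based GEMM kernel is structured.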

GodisanAtheist · Jul 15, 2023

AMD's problem is and seemingly always will be in the software, not the hardware.

Adding ML hardware to their dies should be a piece of cake for the HW team. Making sure anything actually uses it on the other hand is where the real rub is.

Saylick · Jul 15, 2023

You know, now that I've read a little bit more on Phoenix's AI Engine, I doubt it will be useful for RDNA4 when the GPU itself is already a wide vector engine. Nvidia specifically uses a tensor unit within each SM that can do low-precision matrix math much faster than executing it as multiple vector instructions. For RDNA4 to be comparable, it needs a method to fuse vector units into a single tensor unit that has much higher throughput. RDNA3 already has instructions that let it do matrix math via vector instructions, but it's not going to have the same throughput as a dedicated tensor path. There's a reason why CDNA has dedicated tensor units.
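To make the vector-versus-tensor point concrete: without a dedicated matrix path, a tile multiply-accumulate has to be issued as ordinary vector multiply-adds. The Python sketch below assumes a 16x16x16 tile and, for simplicity, a 16-lane vector (not RDNA3's actual wave32/wave64 register layout); `vector_fma` and `tile_mma_as_vector_ops` are hypothetical names. It computes the tile as 16 rank-1 updates, one vector FMA per output row per step, and counts them: 16 × 16 = 256 vector FMAs, or 4,096 scalar multiply-adds.

```python
TILE = 16  # assumed tile size (M = N = K = 16) and assumed vector width

def vector_fma(acc, x, scalar):
    """Model one vector instruction: acc[lane] + x[lane] * scalar for every lane."""
    return [a + xi * scalar for a, xi in zip(acc, x)]

def tile_mma_as_vector_ops(a, b, c):
    """Compute D = A @ B + C using only vector FMAs; return (D, number_of_vector_fmas)."""
    d = [row[:] for row in c]  # copy the accumulator so C is not modified
    count = 0
    for k in range(TILE):          # rank-1 update: D += outer(A[:, k], B[k, :])
        for i in range(TILE):      # one vector FMA updates a full output row
            d[i] = vector_fma(d[i], b[k], a[i][k])
            count += 1
    return d, count

if __name__ == "__main__":
    a = [[float((i + k) % 3) for k in range(TILE)] for i in range(TILE)]
    b = [[float((k * j) % 4) for j in range(TILE)] for k in range(TILE)]
    c = [[1.0] * TILE for _ in range(TILE)]
    d, n_fma = tile_mma_as_vector_ops(a, b, c)
    print(n_fma, n_fma * TILE)  # prints: 256 4096
```

A tensor unit is built to accept that whole tile as one operation; how much faster that is on any given GPU depends on the hardware and is not modeled here.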

Heartbreaker · Jul 15, 2023

GodisanAtheist said:
AMD's problem is and seemingly always will be in the software, not the hardware.

Adding ML hardware to their dies should be a piece of cake for the HW team. Making sure anything actually uses it on the other hand is where the real rub is.

They can't exactly ship DL temporal upscaling software until they have HW that can run it fast enough to actually improve the frame rate by a competitive amount.

My bet is RDNA 4 gets a big leap in ML performance, and new DL scaling software to go with it.

adroc_thurston · Jul 15, 2023

Heartbreaker said:
AMD is obviously recognizing the need for ML cores.

Yes they need it for marketing.
This is a mere replay of the 2018 phone AI craze.
Been there, seen that.

Heartbreaker said:
ML cores are used for high-quality temporal upscaling (DLSS, XeSS)

You can do it without ML just as well, see UE5 TSR.

Heartbreaker said:
and there are more non-gaming use cases people want to use them for, like image processing or generative image creation with applications like Stable Diffusion.

Margin of error percentage of relevant user base in client.

GodisanAtheist said:
AMD's problem is and seemingly always will be in the software, not the hardware.

AMD's problem in DC GPUs until MI300 has explicitly been the hardware.

Heartbreaker said:
My bet is RDNA 4 gets a big leap in ML performance, and new DL scaling software to go with it.

Lol no.

Heartbreaker · Jul 15, 2023

adroc_thurston said:
Yes they need it for marketing.

Not that I buy that theory, but even if you do, why don't you think it matters at least equally for GPU marketing?

Testing Stable Diffusion is kind of normal now, and there will only be more ML applications and benchmarks going forward. It looks pretty bad when AMD trails all Nvidia and Intel cards.

Even if you believe it's only for marketing, they still need it.

adroc_thurston said:
You can do it without ML just as well, see UE5 TSR.

As with FSR, TSR produces inferior image quality.

adroc_thurston said:
Margin of error percentage of relevant user base in client.

It's more than that and growing. I'm definitely in the camp that AMD's lack of ML capability has relegated them to "only buy if it's at a STEEP discount to Nvidia".

Timorous · Jul 15, 2023

Heartbreaker said:
My bet is RDNA 4 gets a big leap in ML performance, and new DL scaling software to go with it.

I hope not.

It would be better for the hardware to be fast enough to not need upscaling at all. A better TAA solution is fine but upscaling is just a crutch for hardware that is too slow imo.

adroc_thurston · Jul 15, 2023

Heartbreaker said:
Not that I buy that theory, but even if you do, why don't you think it matters at least equally for GPU marketing?

Because no OEM and no ISV mandates marketing points in client dGPs, unlike in mobile.

Heartbreaker said:
Testing Stable Diffusion is kind of normal now, and there will only be more ML applications and benchmarks going forward. It looks pretty bad when AMD trails all Nvidia and Intel cards.

We're talking active userbase and not benchmarkers being silly.

Heartbreaker said:
As with FSR, TSR produces inferior image quality.

Nah; and again, not worth the area spent.

Heartbreaker said:
It's more than that and growing

No lol, NV only ever does it for cheapo CUDA devkit purposes.

Heartbreaker said:
I'm definitely in the camp that AMD's lack of ML capability has relegated them to "only buy if it's at a STEEP discount to Nvidia".

Good that you've said it outright, AMD will never ever make your green stuff cheaper.
Enjoy!

Heartbreaker · Jul 15, 2023

adroc_thurston said:
Because no OEM and no ISV mandates marketing points in client dGPs, unlike in mobile.

We're talking active userbase and not benchmarkers being silly.

Nah; and again, not worth the area spent.

No lol, NV only ever does it for cheapo CUDA devkit purposes.

Good that you've said it outright, AMD will never ever make your green stuff cheaper.
Enjoy!

If AMD listened to you, they would soon be in third place behind Intel. I think they are a little more interested in competing than that.

adroc_thurston · Jul 15, 2023

Heartbreaker said:
If AMD listened to you

It's the opposite, I quote them most of the time.
Their cost analysis teams are some of the best in the industry so they definitely know it better.

Heartbreaker said:
they would soon be in third place behind Intel

Intel is considered a running joke in both NV and AMD GPU circles.
Just look at Ponte trainwreccio.

Heartbreaker said:
I think they are a little more interested in competing than that.

AMD is a margin-driven company; 'competing' by your definition means making NV stuff cheaper by shedding dGPU gross margin %, which ain't happening.

Tuna-Fish · Jul 15, 2023

adroc_thurston said:
They cleared out the inventory long ago, listen to their earnings calls.

AMD might have cleared it off their balance sheets, but N22 and N21 are still being actively sold in significant volume. AMD can't launch a new product that obsoletes them until they are gone, unless they want to really hurt their board/channel partners. If they launch N32 right now, they have to do it at a price point where it is no more attractive than N22 and N21.

adroc_thurston · Jul 15, 2023

Tuna-Fish said:
AMD can't launch a new product that obsoletes them until they are gone

Yes they can, things take a while to ramp in the channel.

Tuna-Fish said:
If they launch N32 right now, they have to do it at a price point where it is no more attractive than N22 and N21.

I mean they've launched N33 just fine into a ton of attractive N23 deals so...

RnR_au · Jul 15, 2023

GodisanAtheist said:
Adding ML hardware to their dies should be a piece of cake for the HW team. Making sure anything actually uses it on the other hand is where the real rub is.

From my understanding, AMD is reluctant to add 'tensor' silicon to consumer hardware. They would rather have flexible silicon that can be used in multiple ways. Unreal Engine, via Lumen, is showing how you can solve a tough global illumination problem without dedicated hardware. I believe other game engines are working on similar solutions.

It would be interesting to see what silicon could come out if AMD went to Epic and asked what they would most like to have hardware-accelerated. But maybe this is already coming via the FPGA on their CPUs... if that is still coming.

FPGAs on GPUs? Why not.

adroc_thurston · Jul 15, 2023

RnR_au said:
Unreal Engine, via Lumen, is showing how you can solve a tough global illumination problem without dedicated hardware

Lumen can be made a helluva lot faster with programmable RTRT accelerators, which we don't really have yet due to both h/w and DXR immaturity.

jpiniero · Jul 16, 2023

adroc_thurston said:
I mean they've launched N33 just fine into a ton of attractive N23 deals so...

I still believe that the RX 7600 got a release because they overbought N33 wafers, expecting much more laptop demand than they got.

adroc_thurston · Jul 16, 2023

jpiniero said:
I still believe that the RX 7600 got a release because they overbought N33 wafers

Oh no, AMD is extremely careful with wafer allocation.

jpiniero said:
Whether there's still any demand from OEMs for N32 laptop will probably end up deciding if they go forward with N32's release.

It missed the cycle, so there's no demand until next year.
Better luck next time! haha
