Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around 3Qs to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of activity in LLVM, that is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

Keller_TT · Jan 12, 2025

Saylick said:
All_The_Watts (formerly rogane I think) says 9070XT is on par or slightly faster than the 4070 Ti Super but not quite the 4080. It also hits >3 GHz with power above 300W.

9070 non-XT is faster than 4070 Super, but slower than 4070 Ti Super. Clocks in the 2 GHz range with power also in the 200s W.

https://twitter.com/x/status/1878490567014777077

If the 9070 XT is >300W for ~7900 XT performance for MBA card, and less than 4080, then it is massively underwhelming, and so much will have to be under scrutiny. Die size, density, bulk for ML and RT cores, PPA, PPW.
I hope this leak is wrong at least in wattage figures for the reference cards.

Looks like the most AMD can do is barely catch up to Ada in performance and efficiency after 2 years, not beat it. Then there's the software gap for gaming (FSR), design, and engineering. AMD's doctor always orders less hopium and more copium.

GTracing · Jan 12, 2025

Saylick said:
All_The_Watts (formerly rogane I think) says 9070XT is on par or slightly faster than the 4070 Ti Super but not quite the 4080. It also hits >3 GHz with power above 300W.

9070 non-XT is faster than 4070 Super, but slower than 4070 Ti Super. Clocks in the 2 GHz range with power also in the 200s W.

https://twitter.com/x/status/1878490567014777077

How long does he spend with all these emojis?

iLLusiveMan · Jan 12, 2025

Saylick said:
All_The_Watts (formerly rogane I think) says 9070XT is on par or slightly faster than the 4070 Ti Super but not quite the 4080. It also hits >3 GHz with power above 300W.

9070 non-XT is faster than 4070 Super, but slower than 4070 Ti Super. Clocks in the 2 GHz range with power also in the 200s W.

https://twitter.com/x/status/1878490567014777077

Didn't he also say that N48 is 240mm^2 ? That aged like milk

gaav87 · Jan 12, 2025

Keller_TT said:
If the 9070 XT is >300W for ~7900 XT performance for MBA card, and less than 4080, then it is massively underwhelming, and so much will have to be under scrutiny. Die size, density, bulk for ML and RT cores, PPA, PPW.
I hope this leak is wrong at least in wattage figures for the reference cards.

Looks like the most AMD can do is barely catch up to Ada in performance and efficiency after 2 years, not beat it. Then there's the software gap for gaming (FSR), design, and engineering. AMD's doctor always orders less hopium and more copium.

Worse he is saying 9070xt >300W OC (AIB) is >4070tiS not MBA.

adroc_thurston · Jan 12, 2025

Saylick said:
All_The_Watts (formerly rogane I think)

No, that's greymon. He doesn't know stuff.

soresu · Jan 12, 2025

To any of the gfx rumor mongers out there, please move to bluesky already 🙏

Saylick · Jan 12, 2025

gaav87 said:
Worse he is saying 9070xt >300W OC (AIB) is >4070tiS not MBA.

Since you already have drivers, does your TPU chart reflect what you’re seeing from final drivers or is that just based on leaks?

https://twitter.com/x/status/1878506410138361955

soresu · Jan 12, 2025

SolidQ said:
Your 8800GT lived 5 years? Mine like 1 year, then changed to HD 4850

Wow, super similar to my experience a gen later.

I had 9600GT and switched to HD 5770 about a year later.

Last time I had an nVidia GPU.

Heartbreaker · Jan 12, 2025

SolidQ said:
The 8800 series didn't live long; a lot of those GPUs died.

Mine was also extremely durable. It was in my daily PC for 14 years. It was still running when I retired that PC, though when I tried to boot that PC a month later to check something, the PSU gave a squeal and died...

Though after about 5 years, it could only play old games.

gaav87 · Jan 12, 2025

Saylick said:
Since you already have drivers, does your TPU chart reflect what you’re seeing from final drivers or is that just based on leaks?

https://twitter.com/x/status/1878506410138361955

Punisher said no name calling... And NO, I don't have drivers.

Keller_TT · Jan 12, 2025

gaav87 said:
Worse he is saying 9070xt >300W OC (AIB) is >4070tiS not MBA.

If that turns out true, then the MBA card might be rated for 265W, perform like 7900GRE + 5% (~4070 Ti), then a +15% TDP AIB card with good cooler could push TDP up to 310W, and make it like a 7900XT raster for the same PPW.

It's 7800 XT all over again then. So, can only be <$500 because it could well flunk against the 5070 in RT+DLSS. That's why the reports of $479 for the base XT perhaps.
Beating 5070 Ti and 4080 huh?
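
As a rough sanity check on the arithmetic in this post, here is a minimal Python sketch. The 265 W base and the 310 W target are the post's speculative figures, not confirmed specs, and the helper names are made up for illustration.

```python
def aib_power(base_w: float, uplift_pct: float) -> float:
    """Board power after applying a percentage uplift to a base rating."""
    return base_w * (1 + uplift_pct / 100)


def uplift_needed(base_w: float, target_w: float) -> float:
    """Percentage uplift required to go from base_w to target_w."""
    return (target_w / base_w - 1) * 100


# Speculative numbers taken from the post above, not real specs.
print(f"265 W + 15% = {aib_power(265, 15):.1f} W")
print(f"Uplift needed for 310 W: {uplift_needed(265, 310):.1f}%")
```

A flat +15% on 265 W lands a little under 305 W; reaching 310 W would take roughly +17%.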

Saylick · Jan 12, 2025

gaav87 said:
Punisher said no name calling... And NO, I don't have drivers.

Not sure how what I posted was name calling but hey, a hit dog will holler, even if it wasn’t a hit I guess.

gaav87 · Jan 12, 2025

Saylick said:
Not sure how what I posted was name calling but hey, a hit dog will holler, even if it wasn’t a hit I guess.

Guilty of what? That I heard some people have already had drivers since yesterday?

adroc_thurston · Jan 12, 2025

Keller_TT said:
If that turns out true, then the MBA card might be rated for 265W, perform like 7900GRE + 5% (~4070 Ti), then a +15% TDP AIB card with good cooler could push TDP up to 310W, and make it like a 7900XT raster for the same PPW

All of that is wrong. Next.

Keller_TT said:
That's why the reports of $479 for the base XT perhaps.

The price is last minute. Always. It's a GPU war classic.

Keller_TT said:
Beating 5070 Ti and 4080 huh?

Well it depends.

soresu · Jan 12, 2025

Heartbreaker said:
real Tensor core equivalents

RDNA4 has drastically increased the performance of certain ML data types, but it's still not the matrix cores CDNA is using.

Just overhauled CUs for now.

Presumably UDNA will change that, though hopefully not at the expense of CU level integration as I would imagine that would introduce latency to any ML based techniques enhancing perf or image quality like FSR4 or neural radiance cache.

adroc_thurston · Jan 12, 2025

Heartbreaker said:
real Tensor core equivalents

MFMA isn't due until RDNA5 and it's there for M$ reasons anyway.

soresu said:
not at the expense of CU level integration

Where do you even bolt the MFMA besides the SIMD itself?
It's a core per SIMD, but RDNA5 is client so probably only half the SIMDs have MFMA.

soresu · Jan 12, 2025

adroc_thurston said:
where do you even bolt the MFMA besides the SIMD itself

Dunno, but I do seem to remember Turing having some problems with latency due to jumping between CUDA cores and tensor cores.

Keller_TT · Jan 12, 2025

adroc_thurston said:
All of that is wrong. Next.

The price is last minute. Always. It's a GPU war classic.

Well it depends.

Ha ha. At least you give me a chance to hope for something better. But now my base expectations have been revised and I can take a pleasant surprise if it comes along.

Ofc I'm still curious about the actual silicon budget, density and the μarch and what they did to the CUs for ML and RT. So, if that's where the main focus & budget went, then I hope it was worth it and not below potential.

adroc_thurston · Jan 12, 2025

soresu said:
Dunno, but I do seem to remember Turing having some problems with latency due to jumping between CUDA cores and tensor cores.

It's not a problem with latency, just that the GEMM units hoard the VRF.

Keller_TT said:
So, if that's where the main focus & budget went

The main focus went into bumping per-CU and per-bit oomph.
It's very evident too given the perf.

Keller_TT · Jan 12, 2025

soresu said:
Dunno, but I do seem to remember Turing having some problems with latency due to jumping between CUDA cores and tensor cores.

This has been a focus area for Blackwell as it brings tighter integration between the CUDA cores and the Tensor cores. An overview is on their webpage, and the whitepaper is yet to be put up. But it felt like a 1st version of NV UDNA.

adroc_thurston · Jan 12, 2025

Keller_TT said:
This has been a focus area for Blackwell as it brings tighter integration between the CUDA cores and the Tensor cores

You're reading too much into puff marketing for a very very very underwhelming uarch.

gaav87 · Jan 12, 2025

Well, the leaked GPU-Z clocks match the PowerColor OC BIOS.

ToTTenTranz · Jan 12, 2025

Keller_TT said:
This has been a focus area for Blackwell as it brings tighter integration between the CUDA cores and the Tensor cores. An overview is on their webpage, and the whitepaper is yet to be put up. But it felt like a 1st version of NV UDNA.

During the keynote what Jensen said was they're now running tensor ops in the cuda cores and not just the tensor cores, and that allows them to use AI in some game-specific instructions.
I don't think that means they can just add the tensor output from cuda/shader cores to the tensor cores, as they still share the same L1 and L2 IIRC.

adroc_thurston said:
You're reading too much into puff marketing for a very very very underwhelming uarch.

Underwhelming or not, they're completely alone from the ~$500 up, which means they can ask however much they want for the 5090.

adroc_thurston · Jan 12, 2025

ToTTenTranz said:
they're now running tensor ops in the cuda cores and not just the tensor cores

They could always do that? GEMM is GEMM.
A 'tensor core' is just a hardwired unprogrammable GEMM accelerator.

ToTTenTranz said:
which means they can ask however much they want for the 5090

They very much can't which is why 5090 price is right in line with the pretty massive BOM bump.
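
To illustrate the "GEMM is GEMM" point: a general matrix multiply, C = alpha*A*B + beta*C, is just nested multiply-accumulate loops that any programmable ALU can run; a dedicated matrix unit does the same math on fixed-size tiles in hardware. Below is a minimal pure-Python sketch. The function name and the list-of-lists layout are illustrative assumptions, and it says nothing about how any particular GPU schedules the work.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    # Start from the scaled accumulator matrix, as in the BLAS definition.
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate step
            out[i][j] += alpha * acc
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
zeros = [[0, 0], [0, 0]]
print(gemm(1.0, a, b, 0.0, zeros))
```

Whether these loops run on general shader ALUs or on a fixed-function unit changes throughput and register pressure, not the result.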

gaav87 · Jan 12, 2025

Damn you are on point.

ToTTenTranz said:
During the keynote what Jensen said was they're now running tensor ops in the cuda cores and not just the tensor cores, and that allows them to use AI in some game-specific instructions.
I don't think that means they can just add the tensor output from cuda/shader cores to the tensor cores, as they still share the same L1 and L2 IIRC.

Underwhelming or not, they're completely alone from the ~$500 up, which means they can ask however much they want for the 5090.
