Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe it's because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

ToTTenTranz · Aug 28, 2024

Only 64MB Infinity Cache on Navi 48, 66% of what's in Navi 31 and the same as Navi 32:

https://twitter.com/x/status/1828511752939520028

I guess it was expected, if the chip is actually as small as some reports claim (240mm^2?).
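The percentage above is easy to sanity-check. Here is a quick Python sketch. It assumes the 96MB (Navi 31) and 64MB (Navi 32) figures from AMD's public specs and takes the rumoured 64MB for Navi 48 at face value. The dictionary and function names are made up for illustration.

```python
# Infinity Cache sizes in MB. Navi 31 / Navi 32 are public specs;
# the Navi 48 entry is the rumoured figure from the post, not confirmed.
infinity_cache_mb = {
    "Navi 31": 96,
    "Navi 32": 64,
    "Navi 48 (rumoured)": 64,
}

def cache_ratio(chip: str, reference: str) -> float:
    """Return chip's Infinity Cache as a fraction of the reference chip's."""
    return infinity_cache_mb[chip] / infinity_cache_mb[reference]

vs_n31 = cache_ratio("Navi 48 (rumoured)", "Navi 31")
vs_n32 = cache_ratio("Navi 48 (rumoured)", "Navi 32")
print(f"Navi 48 vs Navi 31: {vs_n31:.1%}")
print(f"Navi 48 vs Navi 32: {vs_n32:.1%}")
```

So "66%" is really two thirds, and the rumoured part would match Navi 32 exactly.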

branch_suggestion · Aug 28, 2024

https://twitter.com/x/status/1828795701712875562

Took long enough.
Top 4 is diename/membus/IC/memspeed
Strix Halo is Z5/Z5LP/total core count/membus/IC/memspeed
Bottom is Sonoma Valley, Z5c/Z5LP and/or CU/membus/memspeed

https://twitter.com/x/status/1828818286202773757

Kraken Z5/Z5c/CU/membus/memspeed

moinmoin · Aug 28, 2024

branch_suggestion said:
Kraken Z5/Z5c/CU/membus/memspeed

Wait... really 3x Z5 + 5x Z5c? That'd be quite the odd amounts for a company sticking to even numbers up to now.

igor_kavinski · Aug 28, 2024

moinmoin said:
Wait... really 3x Z5 + 5x Z5c? That'd be quite the odd amounts for a company sticking to even numbers up to now.

If Intel can have pentacore CPUs like 7305 and U300, AMD must've figured they can do better. With 16 threads, should be a decent performer/watt.

Arctic Islands · Aug 28, 2024

ToTTenTranz said:
Only 64MB Infinity Cache on Navi 48, 66% of what's in Navi 31 and the same as Navi 32:

https://twitter.com/x/status/1828511752939520028

I guess it was expected, if the chip is actually as small as some reports claim (240mm^2?).

RDNA3.5 WGPs are about 20-25% bigger than RDNA3, and I assume RDNA4 WGPs are somewhat similar in size or even bigger.

It doesn't seem like a 240mm^2 monolithic GPU could have enough space to put 4 SEs with 32 WGPs and 64MB MALL cache inside.

Tuna-Fish · Aug 28, 2024

Arctic Islands said:
RDNA3.5 WGPs are about 20-25% bigger than RDNA3, and I assume RDNA4 WGPs are somewhat similar in size or even bigger.

It doesn't seem like a 240mm^2 monolithic GPU could have enough space to put 4 SEs with 32 WGPs and 64MB MALL cache inside.

AMD has managed to dramatically shrink the L3 on Zen5 vs Zen4, I expect similar efforts for the MALL in RDNA4. N5 -> N4P is also a small shrink.

Mahboi · Aug 29, 2024

branch_suggestion said:
Kraken Z5/Z5c/CU/membus/memspeed

THREE Zen 5?
What?

Mahboi · Aug 29, 2024

Tuna-Fish said:
AMD has managed to dramatically shrink the L3 on Zen5 vs Zen4, I expect similar efforts for the MALL in RDNA4. N5 -> N4P is also a small shrink.

I thought SRAM was more or less unshrinkable by now?

marees · Aug 29, 2024

Mahboi said:
THREE Zen 5?
What?

Z5 = 3
Z5c = 5

(The odd numbers remind me of the Fibonacci series)

Mahboi · Aug 29, 2024

How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6WGP on a 128 bit bus should be around a 8600g's perf.
It'll be in a million emulation boxes for sure.

marees · Aug 29, 2024

Mahboi said:
How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6WGP on a 128 bit bus should be around a 8600g's perf.
It'll be in a million emulation boxes for sure.

Steam deck 2

Rog ally non-extreme version refresh

DisEnchantment · Aug 29, 2024

RDNA4 has FP8 and BF8 support, interesting.

[MLIR][AMDGPU] Add support for fp8 ops on gfx12 by giuseros · Pull Request #106388 · llvm/llvm-project

This PR is adding support for fp8 and bfp8 on gfx12

github.com

I guess RDNA5 will also have FP4/FP6 like MI350X
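For anyone unsure what the two 8-bit formats look like, here is a rough Python sketch of decoding them. It assumes "FP8" means the E4M3 layout and "BF8" the E5M2 layout, both with OCP-style encoding. Neither the post nor the PR summary above says which exact variant gfx12 uses, so treat that as an assumption. The function names are made up for illustration and are not from LLVM.

```python
import math

def decode_fp8(byte: int, exp_bits: int, man_bits: int, ieee_specials: bool) -> float:
    """Decode one 8-bit float (1 sign bit + exp_bits + man_bits).

    ieee_specials=True  : all-ones exponent encodes inf/NaN (E5M2 style).
    ieee_specials=False : only all-ones exponent AND mantissa is NaN, no inf
                          (OCP E4M3 style, assumed here).
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    max_exp = (1 << exp_bits) - 1

    if ieee_specials and exp == max_exp:
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == max_exp and man == (1 << man_bits) - 1:
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def fp8_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, ieee_specials=False)

def bf8_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, ieee_specials=True)

print(fp8_e4m3(0x7E))  # largest finite E4M3 value
print(bf8_e5m2(0x7B))  # largest finite E5M2 value
```

The trade-off is the usual one: E4M3 spends a bit more on mantissa (precision), E5M2 on exponent (range).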

Mahboi · Aug 29, 2024

I believe Valve will wait for Zen 6.
But it'll be in the next ROG thing for sure.

branch_suggestion · Aug 29, 2024

Mahboi said:
How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6WGP on a 128 bit bus should be around a 8600g's perf.
It'll be in a million emulation boxes for sure.

They would go 3+5 for floorplan reasons, don't want to waste any area.
Should be one CCX.

Mahboi · Aug 29, 2024

ToTTenTranz said:
Only 64MB Infinity Cache on Navi 48, 66% of what's in Navi 31 and the same as Navi 32:

I guess it was expected, if the chip is actually as small as some reports claim (240mm^2?).

RDNA 2 went pretty hard with the cache and didn't seem to get that much out of it.
RDNA 3 already lowered the cache vs bandwidth ratio by a lot (128MB in the 6950 XT vs 96MB in the XTX, despite the XTX going from a 256-bit to a 384-bit bus).
I think they will keep shrinking mildly until they think they can't shrink anymore, or at least leave it as is.
GPUs just query too much data to be held in cache anyway, complex ops can be done a ton of times on the same instructions with CPUs, but with GPUs you go through a lot more simpler but heavy stuff.

marees · Aug 29, 2024

Mahboi said:
I believe Valve will wait for Zen 6.
But it'll be in the next ROG thing for sure.

The cu is not an upgrade on steam deck

But zen 5 is a massive upgrade for handheld. Even vs zen4

Cary Golomb was raving about it

https://twitter.com/x/status/1823478391951311139

https://twitter.com/x/status/1823436518041493607

marees · Aug 29, 2024

marees said:
The cu is not an upgrade on steam deck

But zen 5 is a massive upgrade for handheld. Even vs zen4

Cary Golomb was raving about it

https://twitter.com/x/status/1823478391951311139

https://twitter.com/x/status/1823436518041493607

So what we have is both cpu & gpu (rdna 3.5) is tuned for low power

What you are missing is the RT goodness of RDNA 4. Disappointing if you wanted to enable nvidia crippleware on a handheld, or play games made on the nv ue5 branch such as Black Myth: Wukong

Mahboi · Aug 29, 2024

marees said:
The cu is not an upgrade on steam deck

But zen 5 is a massive upgrade for handheld. Even vs zen4

Cary Golomb was raving about it

Wut?
The Deck runs on only 4 RDNA 2 WGPs. I do think that 6 RDNA 3.5 WGPs would already be a huge leap.

Also, yes Zen 5 is a huge leap over Zen 2 for sure, but I'm not really buying an upgrade this year for a few reasons. Steam upgraded the Deck with a die shrink and tons of QoL improvements a year ago, that doesn't sound like something they did just for a year. Steam has no immediate need for extra perf, the Deck sells well and the new Deck is great.

Steam's main problem is not running games in 720p on a PSP's screen, it's battery life, and in this regard, Zen 5 is mostly running off its node. N4P did make Zen 5 roughly equivalent and somewhat better than Zen 4, but Zen 5 is still far from being solid in terms of idle power, and it probably isn't going to get good until Zen 6. I think the whole breaking their CCD business and focusing on a server design and client design will be the occasion to yield real improvements across the board in power draw and latency. There's little point in doing it with Zen 5 as is IMO. It's a full on server oriented arch with a ton of transistors that don't do well for gaming. They'd almost be better off using cheaper Zen 4 for it, less silicon for the same perf lol. Not that it's possible.

SolidQ · Aug 29, 2024

AMD RDNA4 GPUs to feature 48 to 64MB of Infinity Cache - VideoCardz.com

Kepler_L2 reveals RDNA4 Infinity Cache sizes The new GPUs are said to feature either 48MB or 64MB. It looks like the past few days are finally giving us a glimpse of what to expect from RDNA4, AMD’s next-gen GPU architecture. While it’s well known that this architecture won’t expand its presence...

videocardz.com

They still think N44 has a 192-bit bus lol

marees · Aug 29, 2024

SolidQ said:
AMD RDNA4 GPUs to feature 48 to 64MB of Infinity Cache - VideoCardz.com

Kepler_L2 reveals RDNA4 Infinity Cache sizes The new GPUs are said to feature either 48MB or 64MB. It looks like the past few days are finally giving us a glimpse of what to expect from RDNA4, AMD’s next-gen GPU architecture. While it’s well known that this architecture won’t expand its presence...

videocardz.com

They still think N44 has a 192-bit bus lol

Those guys don't even refer back to their old rumours

I stick to 3DCenter (& google translate)

Rumour mill: First data on the individual models of the Radeon RX 8000 graphics card series | 3DCenter.org

The Twitter bot BenchLeaks points to Geekbench results for "gfx1201", which had previously been attributed to the RDNA4 architecture. As further technical data, these Geekbench results apparently yielded 28 WGP (corresponding to 56

m-3dcenter-org.translate.goog

SolidQ · Aug 29, 2024

The only question is whether the cut-down N48 lands at 7800 XT level or lower

igor_kavinski · Aug 29, 2024

Mahboi said:
I believe Valve will wait for Zen 6.

AMD could change their minds with a killer volume deal on Zen 5 chips

ToTTenTranz · Aug 29, 2024

Mahboi said:
RDNA 2 went pretty hard with the cache and didn't seem to get that much out of it.
RDNA 3 already lowered the cache vs bandwidth ratio by a lot (128MB in the 6950 XT vs 96MB in the XTX, despite the XTX going from a 256-bit to a 384-bit bus).
I think they will keep shrinking mildly until they think they can't shrink anymore, or at least leave it as is.
GPUs just query too much data to be held in cache anyway, complex ops can be done a ton of times on the same instructions with CPUs, but with GPUs you go through a lot more simpler but heavy stuff.

Another reason to keep embedded cache amounts low is that they can then create different tiers with vcache on top, eventually.

ToTTenTranz · Aug 29, 2024

branch_suggestion said:
https://twitter.com/x/status/1828795701712875562
Took long enough.
Top 4 is diename/membus/IC/memspeed
Strix Halo is Z5/Z5LP/total core count/membus/IC/memspeed
Bottom is Sonoma Valley, Z5c/Z5LP and/or CU/membus/memspeed

https://twitter.com/x/status/1828818286202773757
Kraken Z5/Z5c/CU/membus/memspeed

That's not CU count, it's WGP count. Strix Halo has 20 WGP / 40 CU.

And if Kraken has 6 WGPs / 12 CUs then there's even less of a reason to put Strix Point into a handheld. At <30W and with the same memory bandwidth, the difference in gaming performance between the two might be negligible. Especially if the CPU cores demand less power and are less demanding clients of the memory controller.

Kraken is looking to be a much more interesting chip than I originally thought.
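The WGP/CU bookkeeping above can be written as a tiny Python sketch. It assumes the usual RDNA arrangement of 2 CUs per WGP, which is what the 20 WGP / 40 CU figure implies. The Kraken number is the rumoured one, not a confirmed spec, and the names are illustrative only.

```python
CUS_PER_WGP = 2  # RDNA groups two compute units into one workgroup processor

def wgp_to_cu(wgps: int) -> int:
    """Convert a WGP count to the equivalent CU count."""
    return wgps * CUS_PER_WGP

def cu_to_wgp(cus: int) -> int:
    """Convert a CU count to WGPs; rejects odd CU counts."""
    if cus % CUS_PER_WGP:
        raise ValueError(f"{cus} CUs is not a whole number of WGPs")
    return cus // CUS_PER_WGP

# WGP counts as read from the leak (rumoured, unconfirmed)
leaked_wgps = {"Strix Halo": 20, "Kraken": 6}
cu_counts = {chip: wgp_to_cu(n) for chip, n in leaked_wgps.items()}
print(cu_counts)
```

If the leaked "6" were CUs instead, `cu_to_wgp(6)` gives 3 WGPs, so the two readings differ by a factor of two in GPU size.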

branch_suggestion · Aug 29, 2024

ToTTenTranz said:
That's not CU count, it's WGP count. Strix Halo has 20 WGP / 40 CU.

Could be either. The top 4 didn't use CU/WGP count so it would be a typical leaker move to do the switcheroo.

ToTTenTranz said:
And if Kraken has 6 WGPs / 12 CUs then there's even less of a reason to put Strix Point into a handheld. At <30W and with the same memory bandwidth, the difference in gaming performance between the two might be negligible.

I sure hope it is WGP, the NPU bloat scares me though.
