Discussion RDNA4 + CDNA3 Architectures Thread

Page 153

DisEnchantment

Golden Member
Mar 3, 2017
1,749
6,614
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (maybe to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU being capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Arctic Islands

Junior Member
Apr 28, 2024
5
4
41
Only 64MB Infinity Cache on Navi 48, 66% of what's in Navi 31 and the same as Navi 32:



I guess it was expected, if the chip is actually as small as some reports claim (240mm^2?).
RDNA3.5 WGPs are about 20-25% bigger than RDNA3, and I assume RDNA4 WGPs are somewhat similar in size or even bigger.

It doesn't seem like a 240mm^2 monolithic GPU could have enough space to put 4 SEs with 32 WGPs and 64MB MALL cache inside.
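
As a quick check of the cache comparison above, here is a minimal Python sketch. The 64 MB figure for Navi 48 is the rumour under discussion, while 96 MB (Navi 31) and 64 MB (Navi 32) are the Infinity Cache sizes of the shipping full dies. The dictionary and function names are made up for illustration.

```python
# Infinity Cache sizes in MB. Navi 48 is the rumoured figure from this post;
# Navi 31 and Navi 32 are the sizes of the shipping full dies.
INFINITY_CACHE_MB = {
    "Navi 31": 96,
    "Navi 32": 64,
    "Navi 48 (rumoured)": 64,
}


def cache_ratio(chip: str, baseline: str) -> float:
    """Return the chip's Infinity Cache as a fraction of the baseline's."""
    return INFINITY_CACHE_MB[chip] / INFINITY_CACHE_MB[baseline]


for base in ("Navi 31", "Navi 32"):
    ratio = cache_ratio("Navi 48 (rumoured)", base)
    print(f"Navi 48 vs {base}: {ratio:.1%}")
```

So the "66%" is really two thirds of Navi 31, and exactly the same amount as Navi 32.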
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,505
2,080
136
RDNA3.5 WGPs are about 20-25% bigger than RDNA3, and I assume RDNA4 WGPs are somewhat similar in size or even bigger.

It doesn't seem like a 240mm^2 monolithic GPU could have enough space to put 4 SEs with 32 WGPs and 64MB MALL cache inside.

AMD has managed to dramatically shrink the L3 on Zen5 vs Zen4, I expect similar efforts for the MALL in RDNA4. N5 -> N4P is also a small shrink.
 
Reactions: Tlh97 and Kepler_L2

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6 WGPs on a 128-bit bus should be around an 8600G's perf.
It'll be in a million emulation boxes for sure.
 

marees

Senior member
Apr 28, 2024
578
639
96
How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6 WGPs on a 128-bit bus should be around an 8600G's perf.
It'll be in a million emulation boxes for sure.
Steam Deck 2

ROG Ally non-Extreme version refresh
 

branch_suggestion

Senior member
Aug 4, 2023
414
907
96
How do they even design it to end up with 3 + 5?
Ironically, it's perfect. I've been thinking that 4 big cores is a little much for gaming and 2 may be insufficient. With 5 smaller cores, it should be a really solid machine, and 6 WGPs on a 128-bit bus should be around an 8600G's perf.
It'll be in a million emulation boxes for sure.
They would go 3+5 for floorplan reasons, don't want to waste any area.
Should be one CCX.
 

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
Only 64MB Infinity Cache on Navi 48, 66% of what's in Navi 31 and the same as Navi 32:

I guess it was expected, if the chip is actually as small as some reports claim (240mm^2?).
RDNA 2 went pretty hard with the cache and didn't seem to get that much out of it.
RDNA 3 already lowered the cache vs bandwidth ratio by a lot (128 MB in the 6950 XT vs 96 MB in the XTX, despite the XTX going from a 256-bit to a 384-bit bus).
I think they will keep shrinking mildly until they think they can't shrink anymore, or at least leave it as is.
GPUs just query too much data to be held in cache anyway. With CPUs, complex ops can be done a ton of times on the same instructions, but with GPUs you go through a lot of simpler but heavier stuff.
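
To put a number on the cache versus bus width point, here is a rough Python sketch. It assumes AMD's published Infinity Cache sizes, 128 MB on the 6950 XT (Navi 21) and 96 MB on the 7900 XTX (Navi 31), and uses "MB of cache per bit of bus width" as an improvised metric rather than anything official. It ignores memory speed, so it only shows the direction of the trend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    infinity_cache_mb: int  # published Infinity Cache size
    bus_width_bits: int     # memory bus width

    @property
    def cache_mb_per_bus_bit(self) -> float:
        """Improvised metric: cache capacity relative to bus width."""
        return self.infinity_cache_mb / self.bus_width_bits


RX_6950_XT = Gpu("RX 6950 XT", infinity_cache_mb=128, bus_width_bits=256)
RX_7900_XTX = Gpu("RX 7900 XTX", infinity_cache_mb=96, bus_width_bits=384)

for gpu in (RX_6950_XT, RX_7900_XTX):
    print(f"{gpu.name}: {gpu.cache_mb_per_bus_bit:.2f} MB per bus bit")
```

By this crude measure the XTX carries half as much cache per bit of bus as the 6950 XT.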
 

marees

Senior member
Apr 28, 2024
578
639
96

Mahboi

Golden Member
Apr 4, 2024
1,035
1,900
96
The CU count is not an upgrade over the Steam Deck

But Zen 5 is a massive upgrade for handhelds. Even vs Zen 4

Cary Golomb was raving about it
Wut?
The Deck runs on only 4 RDNA 2 WGPs. I do think that 6 RDNA 3.5 WGPs would already be a huge leap.

Also, yes Zen 5 is a huge leap over Zen 2 for sure, but I'm not really buying an upgrade this year for a few reasons. Steam upgraded the Deck with a die shrink and tons of QoL improvements a year ago, that doesn't sound like something they did just for a year. Steam has no immediate need for extra perf, the Deck sells well and the new Deck is great.

Steam's main problem isn't running games at 720p on a PSP-sized screen, it's battery life, and in this regard Zen 5 is mostly riding on its node. N4P did make Zen 5 roughly equivalent to, and somewhat better than, Zen 4, but Zen 5 is still far from solid in terms of idle power, and it probably isn't going to get good until Zen 6. I think the whole business of breaking up their CCD and focusing on separate server and client designs will be the occasion to yield real improvements across the board in power draw and latency. There's little point in doing it with Zen 5 as is, IMO. It's a full-on server-oriented arch with a ton of transistors that don't do much for gaming. They'd almost be better off using cheaper Zen 4 for it, less silicon for the same perf lol. Not that it's possible.
 

SolidQ

Senior member
Jul 13, 2023
593
747
96
They still think N44 has a 192-bit bus lol
 

marees

Senior member
Apr 28, 2024
578
639
96
They still think N44 has a 192-bit bus lol
Those guys don't even refer back to their old rumours

I stick to 3DCenter (& Google Translate)

 

ToTTenTranz

Member
Feb 4, 2021
182
313
106
RDNA 2 went pretty hard with the cache and didn't seem to get that much out of it.
RDNA 3 already lowered the cache vs bandwidth ratio by a lot (128 MB in the 6950 XT vs 96 MB in the XTX, despite the XTX going from a 256-bit to a 384-bit bus).
I think they will keep shrinking mildly until they think they can't shrink anymore, or at least leave it as is.
GPUs just query too much data to be held in cache anyway. With CPUs, complex ops can be done a ton of times on the same instructions, but with GPUs you go through a lot of simpler but heavier stuff.

Another reason to keep embedded cache amounts low is that they can then create different tiers with vcache on top, eventually.
 

ToTTenTranz

Member
Feb 4, 2021
182
313
106
Took long enough.
Top 4 is diename/membus/IC/memspeed
Strix Halo is Z5/Z5LP/total core count/membus/IC/memspeed
Bottom is Sonoma Valley, Z5c/Z5LP and/or CU/membus/memspeed
Kraken Z5/Z5c/CU/membus/memspeed

That's not CU count, it's WGP count. Strix Halo has 20 WGP / 40 CU.


And if Kraken has 6 WGPs / 12 CUs then there's even less of a reason to put Strix Point into a handheld. At <30W and with the same memory bandwidth, the difference in gaming performance between the two might be negligible. Especially if the CPU cores demand less power and are less demanding clients of the memory controller.

Kraken is looking to be a much more interesting chip than I originally thought.
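
The WGP versus CU confusion comes down to a factor of two, since each RDNA work group processor (WGP) contains two compute units. A minimal Python sketch of the conversion, with helper names invented for illustration:

```python
CUS_PER_WGP = 2  # one RDNA work group processor (WGP) holds two compute units


def wgp_to_cu(wgps: int) -> int:
    """Convert a WGP count to the equivalent CU count."""
    return wgps * CUS_PER_WGP


def cu_to_wgp(cus: int) -> int:
    """Convert a CU count to WGPs; an odd CU count is rejected."""
    if cus % CUS_PER_WGP != 0:
        raise ValueError(f"{cus} CUs is not a whole number of WGPs")
    return cus // CUS_PER_WGP


# Figures from this post: Strix Halo at 20 WGPs, Kraken at 6 WGPs (if the leak means WGPs).
for chip, wgps in (("Strix Halo", 20), ("Kraken", 6)):
    print(f"{chip}: {wgps} WGP / {wgp_to_cu(wgps)} CU")
```

If the leaked number were a CU count instead, Kraken would be at `cu_to_wgp(6)`, i.e. only 3 WGPs, which is why the distinction matters so much here.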
 
Reactions: Tlh97 and marees

branch_suggestion

Senior member
Aug 4, 2023
414
907
96
That's not CU count, it's WGP count. Strix Halo has 20 WGP / 40 CU.
Could be either. The top 4 didn't use CU/WGP count so it would be a typical leaker move to do the switcheroo.
And if Kraken has 6 WGPs / 12 CUs then there's even less of a reason to put Strix Point into a handheld. At <30W and with the same memory bandwidth, the difference in gaming performance between the two might be negligible.
I sure hope it is WGP, the NPU bloat scares me though.
 