Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,755
6,635
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

inquiss

Senior member
Oct 13, 2010
287
406
136
It is only a clean win over the 4070 Ti Super if FSR4 upscaling matches DLSS4.

IMO, the right play for AMD is an "msrp" of $550-600, and an initial retail price of $700-750 for most AIB models for as long as the 5070 Ti is unobtanium.
Exactly. The latest feature needs to be the same, right? There is always a new feature. Nvidia is really inferior to AMD on Radeon Chill though. Maybe Nvidia cards should be discounted in laptops at equal performance?
 
Reactions: marees

inquiss

Senior member
Oct 13, 2010
287
406
136
The feature which reduces two different $1600 Nvidia GPUs to ~17ms frametimes at 1920x1080? For some reason I'm not too worried about needing to turn that crap on. Even in the near future.
Exactly. Very sensible take. It's just the next goalpost shift for Nvidia fans to justify their purchase: it may be technically stronger, but you'd never use the feature for a few generations. Just like the first RTX.
 
Reactions: marees

gaav87

Senior member
Apr 27, 2024
550
957
96
Glued on what.

Whatever.
I think they will connect two N48 on the same package and create a 128 CU GDDR6 monster.
Software already supports this, with more than 2 micro engine schedulers for example.
What they need:
1. A high-bandwidth interconnect with a minimum of 1280 GB/s; RDNA3 IF already supports ~883 GB/s per MCD.
2. Memory system: either a unified memory pool mapped across both chiplets in firmware, or memory kept local to each die with a fast interconnect mapped to each pool.
3. Scheduling: already supported, with more than 1 micro engine scheduler in the kernel.
4. Cache coherence: it would be hurt by the increased latency, which could be minimized through how cross-chiplet cache traffic is handled, but I remember reading something about this in the kernel too.

Still, even if it scaled +60% vs the 9070 XT, they would have a winner.
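
As a rough sanity check on the 1280 GB/s figure in point 1, the sketch below computes peak GDDR6 bandwidth from bus width and per-pin data rate. The 256-bit bus and 20 Gbps rate assumed for N48 are not stated in the post, and the function name is made up for illustration; treat it as back-of-envelope arithmetic, not a spec.

```python
def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# Assumption (not from the post): one N48 uses a 256-bit bus at 20 Gbps per pin.
single_die = gddr_bandwidth_gbs(256, 20.0)
dual_die = 2 * single_die  # two dies, each with its own local memory

# Figures quoted in the post.
required_link = 1280.0     # GB/s, minimum interconnect bandwidth
rdna3_if_per_mcd = 883.0   # GB/s, quoted RDNA3 Infinity Fabric rate per MCD

print(f"single die: {single_die:.0f} GB/s, dual die: {dual_die:.0f} GB/s")
print(f"required link vs RDNA3 IF per MCD: {required_link / rdna3_if_per_mcd:.2f}x")
```

Under those assumptions, the two dies together would have as much aggregate memory bandwidth as the interconnect minimum quoted in point 1, which may be where that number comes from.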
 

adroc_thurston

Diamond Member
Jul 2, 2023
4,946
6,829
96
It doesn't exist. It doesn't work.
N48 is a boring product made for boring reasons.
 

MrTeal

Diamond Member
Dec 7, 2003
3,805
2,342
136
That's a lot of ifs for stuff that isn't baked into N48 already.

It's going to be a lot easier for them to just make a 6-700mm² 128 CU die if that's what they want than screw around with dual GCDs. Nvidia's hand was forced with Blackwell because it's impossible to manufacture a 1600mm² die, and even with Nvidia's resources they've had manufacturing issues getting B200 out the door.
 

Josh128

Senior member
Oct 14, 2022
630
1,030
106
So what are the guesses / explanations on AMD achieving 60+% increase over the 7800xt despite being on the same process node 🤔

More cache !!??

They didn't achieve 60%+ over the 7800XT. AMD's own numbers show +51% over the 6900XT, with almost half of the tests run with RT on. TPU's 7800XT review shows the 6900XT 3% faster in 4K raster but the 7800XT 3% faster in RT, which, when combined, effectively makes 7800XT = 6900XT. Therefore, AMD's own numbers indicate that the 9070XT is ~+51% vs the 7800XT, not 60%+.
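
That chain of ratios can be written out as a short calculation. The equal raster/RT weighting and the use of a geometric mean are simplifying assumptions for illustration; the +51% and 3% figures are the ones quoted in the post.

```python
import math

# Figures quoted in the post.
r9070xt_vs_6900xt = 1.51        # AMD's numbers: 9070XT vs 6900XT
raster_6900xt_vs_7800xt = 1.03  # TPU 4K raster: 6900XT is 3% faster
rt_7800xt_vs_6900xt = 1.03      # TPU RT: 7800XT is 3% faster

# Assumption: raster and RT tests are weighted equally
# (the post only says "almost half" of the tests use RT).
raster = raster_6900xt_vs_7800xt
rt = 1 / rt_7800xt_vs_6900xt
blended_6900xt_vs_7800xt = math.sqrt(raster * rt)  # geometric mean of the two ratios

r9070xt_vs_7800xt = r9070xt_vs_6900xt * blended_6900xt_vs_7800xt
print(f"6900XT vs 7800XT (blended): {blended_6900xt_vs_7800xt:.3f}")
print(f"9070XT vs 7800XT: +{(r9070xt_vs_7800xt - 1) * 100:.0f}%")
```

With the two 3% gaps cancelling out, the blended ratio is 1.0 and the 9070XT lands at roughly +51% over the 7800XT.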



 
Reactions: marees

reaperrr3

Member
May 31, 2024
66
215
66
Concur. Hope the B770 comes with 24GB VRAM and sells cheaper than 9070 XT.
If B770 comes out at all, it'll be 16GB, because all rumors are pointing to a 256bit GDDR6 mem interface.
32 GB variants for AI market are possible, but those will be priced up accordingly.

And of course it'll (have to) be cheaper than the 9070 XT, since it'll be much slower.

G31 has only 60% more EUs than B580, only ~35% more bandwidth and probably somewhat lower clocks, too.
B580 is only slightly faster than 7600XT and already CPU-bound in many games (sometimes even at 1440p), so B770 will probably struggle to beat even the 7800XT in raster consistently.

They'd probably have to price a B770-16GB at $399 if they want it to be competitive in perf/$ vs. the 9070 and 5070.
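
To put the G31 vs B580 scaling argument in numbers, here is a minimal sketch using only the ratios quoted above. It ignores clocks and CPU limits, so it says nothing about final performance; it only shows how much bandwidth each EU would get relative to B580.

```python
# Figures quoted in the post (G31 relative to B580).
eu_scale = 1.60         # "only 60% more EUs"
bandwidth_scale = 1.35  # "only ~35% more bandwidth"

# Memory bandwidth available per EU, relative to B580.
bandwidth_per_eu = bandwidth_scale / eu_scale

print(f"bandwidth per EU vs B580: {bandwidth_per_eu:.2f}x")
```

Each EU would have about 16% less bandwidth to work with than on B580, which fits the expectation that performance scales by less than the EU count.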
Why would they flip N44 once for N48 but not twice into another die if area was going to be around 500mm2 for 128CU?
The smallest chiplet-based SKU was probably cancelled long before the rest and replaced by N48.
But by the time they cancelled the bigger chiplet configs as well, it was too late from a time-to-market perspective to design another, bigger mono-N4x above N48 to replace them, because that would probably have come out a year too late and too close to the planned release time-frame of RDNA5.
 
Reactions: marees

PJVol

Senior member
May 25, 2020
812
796
136
We will know on 28th
IIRC a monolithic ASIC has only GMI-type links (or whatever it's called in CDNA), unlike MCMs, which have additional bulky xGMI-like IFs to unify the on-chip DFs.
But anyway, it's not possible to "glue" them with just this interconnect. There are too many things that need to be shared in a GPU at a lower abstraction level than the DF.
 