Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,754
6,631
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Lately (since RDNA2) the window in which they push support for new devices has been much reduced, to prevent leaks.
But looking at the flurry of code in LLVM, this is a lot of commits. Maybe the US Government is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940-specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 in the very near term, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts; the MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 
Last edited:

gaav87

Senior member
Apr 27, 2024
452
794
96
RDNA2 via DP4a?
No, RDNA2 will not be included: it does not have WMMA. It could work via FP16, but very slowly.
From what I heard, FSR4 uses an FP8 path on RDNA4 and, I speculate, an FP16 path on everything else (older AMD GPUs, Intel, Nvidia, consoles, APUs). Maybe there is an additional "block-FP16 = FP8 throughput" path for AMD APUs with an XDNA2 NPU (Strix Point & Halo).

FP8 and FP16 make sense to me. They are more accurate than INT8 (= fewer DNN parameters required, or higher DNN quality), and INT8 is a mess on AMD GPUs (e.g. N10 has a lower INT8 rate than its smaller siblings, PS5 is unclear, RDNA3 has the same INT8 rate as FP16, etc.); it would also get killed by Nvidia's Tensor Core INT8, which simply has much higher throughput. FP8 throughput on RDNA4 should be on a similar level to FP16 Tensor throughput on similarly sized Nvidia and Intel GPUs, whereas with INT8, RDNA2/3/4 would probably be slower than their Nvidia counterparts. Not a good idea.


It seems to me to be more DLSS-like, or like what Arm does on mobile (see their SIGGRAPH presentation from 2024): parameter prediction, with the rest very similar to FSR2/3. XeSS seems to rely on a much heavier DNN.

FSR4 precursor:
https://gpuopen.com/learn/neural_supersampling_and_denoising_for_real-time_path_tracing/
Yes, some features can work via WMMA, but slower than on RDNA4 (which added FP8 and INT4); RDNA4 also has SWMMA (sparsity).
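For intuition on the INT8 vs. FP8 accuracy argument quoted above, here is a minimal pure-Python sketch (my own illustration, nothing from FSR4 itself) comparing per-tensor INT8 quantization against a rough E4M3-style FP8 quantization, ignoring subnormals. FP8's floating exponent keeps relative error roughly constant, while INT8's fixed step swamps small weights:

```python
import math, random

def quant_int8(x, scale):
    """Per-tensor symmetric INT8: one fixed step size everywhere."""
    q = max(-127, min(127, round(x / scale)))
    return q * scale

def quant_fp8_e4m3(x):
    """Rough E4M3 simulation: 3 mantissa bits, max normal ~448.
    Subnormals ignored for brevity."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))
    step = 2.0 ** (e - 3)              # 3 mantissa bits -> 8 steps per binade
    y = round(x / step) * step
    return math.copysign(min(abs(y), 448.0), x)

random.seed(0)
# DNN-ish weights: mostly small values, plus a few large outliers
weights = [random.gauss(0, 0.05) for _ in range(10000)] + [1.5, -2.0]
scale = max(abs(w) for w in weights) / 127   # INT8 scale is set by the outliers

for name, f in [("INT8", lambda w: quant_int8(w, scale)),
                ("FP8 ", quant_fp8_e4m3)]:
    rel = [abs(f(w) - w) / abs(w) for w in weights if abs(w) > 1e-6]
    print(name, "mean relative error: %.3f" % (sum(rel) / len(rel)))
```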
 
Last edited:

Win2012R2

Senior member
Dec 5, 2024
647
609
96
It's what my brain does that lets me see.
Then you must wait for the GeForce 6666 series - it will include a new high-speed cable that connects* right into yer cerebral cortex. I can't disclose where the mass-market 6660 would connect, but it's pretty far from where the brain is for most people. Though that is currently a subject of internal discussion - perhaps that is where the brain actually is, on average...


* actual drilling operation is optional extra
 

basix

Member
Oct 4, 2024
41
75
51
At least all vendors support it (Intel, Nvidia, AMD). Currently, INT4 is rarely used, but with potentially upcoming ternary ML models (weights = [-1, 0, 1]) it might get more popular (https://arxiv.org/abs/2402.17764). Intel even supports INT2, which could be even better for ternary weights as long as you keep the INT4 accumulate output (INT2 is not sufficient for representing 1 + 1 = 2).
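For concreteness, a minimal sketch of the absmean ternary quantization the linked paper proposes (plain Python, heavily simplified; the real scheme runs per weight matrix during training):

```python
# Absmean ternarization per arXiv:2402.17764 (roughly): scale by the mean
# absolute weight, then round and clip every weight to {-1, 0, +1}.

def ternarize(weights, eps=1e-8):
    gamma = sum(abs(w) for w in weights) / len(weights)   # mean |W|
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma   # keep gamma to rescale outputs after the matmul

w = [0.31, -0.02, -0.77, 1.10, 0.05]
q, gamma = ternarize(w)
print(q)       # [1, 0, -1, 1, 0] -- only -1/0/+1 survive
print(gamma)   # per-tensor scale, reapplied to the matmul result
```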

Why is ternary likely to happen: Money
 

adroc_thurston

Diamond Member
Jul 2, 2023
4,714
6,501
96
Currently, INT4 is rarely used but with potentially upcoming Ternary ML-Models (weights = [-1, 0, 1]) it might get more popular
It literally went poof from NV GPUs starting with H100.
Now, FP4/FP6 microscaled? Yeah, maybe.

Please refrain from graphic language in the tech section. -Moderator Shmee
 
Last edited by a moderator:

basix

Member
Oct 4, 2024
41
75
51
Yes. Currently, FP8 is widely used. FP4 with microscaling makes it better.

Ternary as proposed in the linked paper has another big, big benefit: you get away with using solely adders. No multiplier necessary. That makes the HW much simpler, smaller, and more energy efficient. This is not a topic for today's HW, but maybe for future HW generations or specialized accelerators. Think Microsoft in-house silicon, e.g. specialized CDNA5 chiplets with ternary inferencing & training as the sole use case (ternary meaning that you also train in ternary), or maybe an addition to XDNA3.
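A toy sketch of the adders-only point (my own illustration, not any shipping kernel): with weights restricted to {-1, 0, +1}, every multiply in a matrix-vector product degenerates into an add, a subtract, or a skip:

```python
# Ternary matrix-vector product using only adds and subtracts --
# no multiplier is ever needed, and zero weights cost nothing.

def ternary_matvec(W, x):
    """W: rows of ternary weights in {-1, 0, 1}; x: activation vector."""
    out = []
    for row in W:
        acc = 0   # wide accumulator (think INT32 in hardware)
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: skip entirely
        out.append(acc)
    return out

W = [[1, 0, -1],
     [-1, 1, 1]]
x = [10, 20, 30]
print(ternary_matvec(W, x))   # [10 - 30, -10 + 20 + 30] -> [-20, 40]
```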

But I think we are drifting away from the RDNA4 topic
 
Last edited:

gaav87

Senior member
Apr 27, 2024
452
794
96
int4 isn't usable for anything anywhere anyway. Forget about it.
For ML (FSR4).

INT4 WMMA already exists on RDNA3. The only question is its rate: same as INT8/FP8, or double.
https://gpuopen.com/learn/wmma_on_rdna3/
https://www.amd.com/content/dam/amd...r-instruction-set-architecture-feb-2023_0.pdf -> chapter 7.9
WMMA on RDNA4 (INT4):
16x16x32, INT4 -> INT32
vs. RDNA3's 16x16x16, INT4 -> INT32
and with SWMMA (sparsity) we get 16x16x64, INT4 -> INT32
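To make those shapes concrete, here is a plain-Python reference of what one 16x16xK integer WMMA computes, plus the MAC count per instruction at each K (my own emulation, not the actual ISA intrinsics):

```python
import random

# Reference emulation of a 16x16xK integer WMMA tile operation:
#   C[16][16] (INT32) += A[16][K] (INT4) * B[K][16] (INT4)

def wmma_ref(A, B, C):
    K = len(A[0])
    for i in range(16):
        for j in range(16):
            acc = C[i][j]                    # INT32 accumulator
            for k in range(K):
                acc += A[i][k] * B[k][j]     # INT4 x INT4 products
            C[i][j] = acc
    return C

random.seed(1)
K = 32                                       # RDNA4 INT4 WMMA depth
A = [[random.randint(-8, 7) for _ in range(K)] for _ in range(16)]
B = [[random.randint(-8, 7) for _ in range(16)] for _ in range(K)]
C = wmma_ref(A, B, [[0] * 16 for _ in range(16)])

# K doubles from RDNA3 to RDNA4, doubling the INT4 rate per instruction;
# sparse SWMMA doubles K again (on structured-sparse data only).
for name, k in [("RDNA3  WMMA 16x16x16", 16),
                ("RDNA4  WMMA 16x16x32", 32),
                ("RDNA4 SWMMA 16x16x64", 64)]:
    print(name, "->", 16 * 16 * k, "MACs/instr")
```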
 
Reactions: Tlh97 and basix