Discussion RDNA4 + CDNA3 Architectures Thread

Page 372 - AnandTech community

DisEnchantment

Golden Member
Mar 3, 2017





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to land support in LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has shrunk considerably, to prevent leaks.
But looking at the flurry of code in LLVM, there are a lot of commits. Maybe this is because the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before things get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
That said, I believe Hopper has a problem in that no host CPU capable of PCIe 5 is coming in the very near future, so it might get pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

soresu

Diamond Member
Dec 19, 2014
Something far far simpler instead.
OoO is by itself far more complicated to implement performantly and efficiently than in-order compute.

Especially given that far fewer engineers have worked on OoO GPU designs than on OoO CPUs over the history of the industry.

You're just substituting process/packaging complexity for µArch complexity, which kinda runs counter to the whole chiplet-era ethos of keeping die sizes down to minimise mask costs and maximise yields, something they had already identified as an issue 7-9 years ago, before Zen 2 was in development.
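A rough way to see the yield side of the chiplet argument is the textbook Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. This is only an illustrative sketch: the 0.1 defects/cm² density and the 600 mm² vs 150 mm² die sizes are assumptions picked for round numbers, not AMD or TSMC figures, and real fabs use more refined models (e.g. Murphy or negative binomial) and also have to account for packaging yield, which is exactly the cost that stacking adds back.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # ASSUMED defect density in defects per cm^2, illustrative only
    mono = poisson_yield(600, d0)     # hypothetical 600 mm^2 monolithic die
    chiplet = poisson_yield(150, d0)  # hypothetical 150 mm^2 chiplet
    print(f"600 mm^2 die yield:     {mono:.1%}")
    print(f"150 mm^2 chiplet yield: {chiplet:.1%}")
```

With these assumed numbers it prints about 54.9% vs 86.1%: a single defect scraps a whole 600 mm² die but only one 150 mm² chiplet, which is why small dies pay off before packaging costs are counted.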
 

soresu

Diamond Member
Dec 19, 2014
Great arch, poor execution
Execution was fine, if a bit delayed to see how the RTX 50 launch panned out.

They weren't factoring in the possibility of nV screwing the pooch on a product launch to a once-in-a-generation degree.

No one was.

They prioritised the market segments that move the most volume and the GPUs needed to fill them.

That left them with the engineering muscle to execute RDNA5/UDNA faster in pursuit of their long-term RTG goals.
 

Josh128

Senior member
Oct 14, 2022
soresu said:
Execution was fine, if a bit delayed to see how the RTX 50 launch panned out.

They weren't factoring in the possibility of nV screwing the pooch on a product launch to a once-in-a-generation degree.

No one was.

They prioritised the market segments that move the most volume and the GPUs needed to fill them.

That left them with the engineering muscle to execute RDNA5/UDNA faster in pursuit of their long-term RTG goals.
RTG has long term goals??
 

soresu

Diamond Member
Dec 19, 2014
RDNA5 X3D
If you look back at their exascale ambitions from many years ago (GCN days), they were looking at stacking HBM directly on top of GPU dies.

Eventually this will happen for datacenter GPUs at least, and stacked cache is definitely on the table for both CPU and GPU.

As vertical interconnect pitch decreases and density increases, we'll see the area-heavy cache move off the compute dies onto dedicated stacked dies built on their own SRAM-optimised process.
 

gaav87

Senior member
Apr 27, 2024
Anyway, this is a max-OC'd 9070 XT + 9800X3D (OC'd), 40% over a 4080S + OC'd 14900K.
It also beats the XTX and the 4090...
The lower VRAM numbers are because of MPO / windowed mode or some BS.
4K, max settings, 50% FSR, no RT.
I'm not even bothering to post my score because it's 2x-ing it.


 

adroc_thurston

Diamond Member
Jul 2, 2023
soresu said:
OoO is by itself far more complicated to implement performantly and efficiently than in-order compute.

Especially given that far fewer engineers have worked on OoO GPU designs than on OoO CPUs over the history of the industry.

You're just substituting process/packaging complexity for µArch complexity, which kinda runs counter to the whole chiplet-era ethos of keeping die sizes down to minimise mask costs and maximise yields, something they had already identified as an issue 7-9 years ago, before Zen 2 was in development.
Oh no, I'm not talking about that.
I'm just saying they should borrow the Venice-D CCD floorplan.
soresu said:
If you look back at their exascale ambitions from many years ago (GCN days), they were looking at stacking HBM directly on top of GPU dies.

Eventually this will happen for datacenter GPUs at least, and stacked cache is definitely on the table for both CPU and GPU.

As vertical interconnect pitch decreases and density increases, we'll see the area-heavy cache move off the compute dies onto dedicated stacked dies built on their own SRAM-optimised process.
DRAM is a really bad fit for that.
 

Mopetar

Diamond Member
Jan 31, 2011
How much does OoO realistically get you for the average GPU load? Most stuff on the GPU is already designed to be a massively parallel workload that doesn't have many branches or stalls due to random memory accesses. GPUs were created to deal with this kind of crunch work in the first place.

CPUs need OoO to keep their pipelines full because they have to deal with a lot of branches or other code that could lead to pipeline stalls. GPU code generally shouldn't have those sorts of problems. I'm sure there's performance to gain by finding independent instructions that could be executed sooner rather than later, but I don't think there's nearly as much room for massive performance uplifts.

I guess AMD doesn't need anything nearly as sophisticated as what they use in their CPUs. If it's low-hanging fruit and there's a good enough return on the transistors used, then I'll trust them on it.
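The argument that GPUs get by without heavy OoO machinery rests on latency hiding: an in-order SIMD unit keeps many wavefronts resident and simply issues from another one when the current wave stalls on memory. The toy model below sketches that under loudly simplified assumptions: every wave runs a burst of `compute` ALU instructions and then waits `latency` cycles on a load, the scheduler is a hypothetical greedy-then-round-robin picker, and the numbers (4 instructions, 36 cycles) are made up. It is not a model of RDNA's actual scheduler.

```python
def utilization(num_waves: int, compute: int, latency: int, cycles: int = 40_000) -> float:
    """Fraction of cycles in which a single in-order issue port issues an instruction.

    Toy assumptions: each wave issues `compute` back-to-back ALU instructions,
    then stalls for `latency` cycles on a memory load before its next burst.
    """
    ready_at = [0] * num_waves    # first cycle at which each wave may issue again
    left = [compute] * num_waves  # instructions left in each wave's current burst
    cur = 0                       # wave the scheduler is currently favouring
    busy = 0
    for t in range(cycles):
        # Greedy-then-round-robin: stay on `cur` while it is ready,
        # otherwise take the next ready wave in round-robin order.
        for i in range(num_waves):
            w = (cur + i) % num_waves
            if ready_at[w] <= t:
                busy += 1
                left[w] -= 1
                if left[w] == 0:
                    # Burst finished: the wave now waits on memory.
                    ready_at[w] = t + 1 + latency
                    left[w] = compute
                cur = w
                break
        # If no wave was ready, nothing issues and this cycle is a bubble.
    return busy / cycles


def ideal(num_waves: int, compute: int, latency: int) -> float:
    """Back-of-envelope bound: N waves cover at most N * compute / (compute + latency)."""
    return min(1.0, num_waves * compute / (compute + latency))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, utilization(n, compute=4, latency=36), ideal(n, 4, 36))
```

Under these assumptions the simulated utilization matches min(1, N * compute / (compute + latency)): 0.1 with one wave, 0.5 with five, and saturated at 1.0 from ten waves on. Once there is enough parallel work resident, every stall is already covered by another wave, so there is little left for OoO hardware to recover.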
 

Bryo4321

Member
Dec 5, 2024
gaav87 said:
Anyway, this is a max-OC'd 9070 XT + 9800X3D (OC'd), 40% over a 4080S + OC'd 14900K.
It also beats the XTX and the 4090...
The lower VRAM numbers are because of MPO / windowed mode or some BS.
4K, max settings, 50% FSR, no RT.
I'm not even bothering to post my score because it's 2x-ing it.

The DLSS/FSR comparison makes this a little wonky, but I'm still having trouble making sense of how that's possible lol.
 