Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,720
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But judging by the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US government is starting to prepare the software environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem that no host CPU capable of PCIe 5 would be available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

soresu

Diamond Member
Dec 19, 2014
3,613
2,927
136
For higher resolutions the 6800 has a big cache and bandwidth advantage though
This is my thinking on RT: even with the improvements to memory management and compression in RDNA4, N48's RT performance may be handicapped by cache.

Especially as AMD was eerily silent about it during the presentation yesterday.

I don't recall anyone mentioning infinity cache at all.
 

basix

Member
Oct 4, 2024
77
148
66
Does it need to be a different model? Sure, doubling memory requirements (space, bandwidth) and only a quarter of the TFLOPS per CU will hurt. But I did profile DLSS2 back then (still CNN) and the execution window on the tensor cores was very narrow: maybe 0.25ms with ~50% of peak utilization on a 4090 (4K output). A 7900XTX delivers roughly 35% of a 4090's matrix FP16 TFLOPS with roughly the same raw bandwidth. If AMD can get to a little bit better utilization rate, a DLSS2-style CNN can be executed in maybe 0.5ms, which is not very much.
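A rough back-of-the-envelope version of that estimate, as a sketch only. It assumes the network is purely compute-bound and that runtime scales inversely with delivered matrix throughput; neither assumption is measured, and the function and parameter names are made up for illustration.

```python
def scaled_runtime_ms(ref_ms, ref_util, rel_peak, target_util):
    """Scale a measured runtime to another GPU, assuming compute-bound work.

    ref_ms      : runtime measured on the reference GPU (ms)
    ref_util    : fraction of peak throughput reached on the reference GPU
    rel_peak    : target GPU peak throughput as a fraction of the reference peak
    target_util : ASSUMED fraction of peak reached on the target GPU
    """
    # Runtime the reference GPU would need at 100% of its peak.
    work_ms = ref_ms * ref_util
    # Slow down by the target's lower peak and its assumed utilization.
    return work_ms / (rel_peak * target_util)


# Figures from the post: ~0.25 ms at ~50% utilization on a 4090,
# 7900 XTX at roughly 35% of the 4090's matrix FP16 peak.
for util in (0.5, 0.7, 0.9):
    t = scaled_runtime_ms(0.25, 0.5, 0.35, util)
    print(f"assumed 7900 XTX utilization {util:.0%}: ~{t:.2f} ms")
```

Under these assumptions, matching the 4090's ~50% utilization gives about 0.7 ms, and reaching about 0.5 ms needs roughly 70% utilization on the 7900 XTX.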
 
Reactions: scineram and Tlh97

Josh128

Senior member
Oct 14, 2022
685
1,183
106
The 7900 XT has a 315W TDP
TBP, not TDP, according to AMD's site, while the 9070XT has a 304W TBP according to the same site. According to TPU, the 7900XT is 19% faster on average than the 7900GRE Sapphire Pulse in both 4K raster and 4K RT. AMD claims +42% vs the 7900GRE across a mix of RT and non-RT at 4K. Let's assume the typical first-party benchmark BS, and use the obvious knowledge that RT has improved more and thus carries heavier weight in AMD's comparison than raster. Chop off 10 points overall and you have a +32% "no BS" average uplift vs the 7900GRE in mixed 4K usage, vs the 7900XT's +19% from the TPU benches. Compared to each other, that makes the 9070XT ~11% faster than the 7900XT while using around 4% less power (304W vs 315W). Keep in mind this should be a pretty realistic comparison, as I deducted the 10-point "AMD tax" from AMD's perf claims.

How is Charlie seriously claiming worse efficiency? What numbers is he using?
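The arithmetic in this post can be checked with a few lines. This is only a sketch: it uses the rated TBP figures as a stand-in for real power draw, which is an assumption, and the variable names are illustrative.

```python
# Performance relative to the 7900 GRE (= 1.00), as quoted in the post.
perf_9070xt = 1.42 - 0.10   # AMD's +42% claim minus the 10-point haircut
perf_7900xt = 1.19          # TPU: 7900 XT vs 7900 GRE at 4K

# Rated total board power in watts (ASSUMED to track real draw).
tbp_9070xt = 304.0
tbp_7900xt = 315.0

perf_ratio = perf_9070xt / perf_7900xt    # 9070 XT speed vs 7900 XT
power_ratio = tbp_9070xt / tbp_7900xt     # 9070 XT power vs 7900 XT
eff_ratio = perf_ratio / power_ratio      # perf per watt vs 7900 XT

print(f"performance: {perf_ratio - 1:+.1%}")
print(f"power:       {power_ratio - 1:+.1%}")
print(f"perf/W:      {eff_ratio - 1:+.1%}")
```

With these inputs the 9070XT comes out roughly 11% faster at about 3.5% lower rated power, which is about 15% better performance per watt than the 7900XT.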
 

Novacius

Member
Apr 27, 2015
32
47
91
This is my thinking on RT: even with the improvements to memory management and compression in RDNA4, N48's RT performance may be handicapped by cache.

Especially as AMD was eerily silent about it during the presentation yesterday.

I don't recall anyone mentioning infinity cache at all.
AMD clearly showed bigger gains with RT than without, so cache doesn't seem to be a huge hindrance. 7800 XT with 64MB was also a bit faster in RT than 6800 XT with 128MB.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,770
6,720
136
This patent below could help you with the box node compression
View attachment 95606

This patent below contains handling both BVH4 and BVH8.
View attachment 95607

I think for prebuilt BVH trees the performance gain won't be much. For runtime-generated BVH trees it could be a big boost.
Cache and memory subsystem will be key.
For BVH4 it fetches two cache lines; for BVH8 it fetches a single cache line.



https://twitter.com/NIV_Anteru , who also presented the amazing workgraphs demo recently, is the inventor of the patents below for generating a new kind of BVH tree





It looks to me like this will only help runtime-generated trees




There is also an updated way to perform intersection tests with rotated boxes


I doubt PS5 will get all these with the weak CPU (if the BVH generation is not done on GPU)

Here’s a closer look at the RT improvements. Looks like it supports BVH compression and shader execution reordering?
View attachment 118359
View attachment 118360
View attachment 118361
View attachment 118362
View attachment 118363

Wow, a lot of patents/applications made it into the product. I'm pleasantly surprised. The new CU is indeed very potent.

Looking back at the patents...

Oriented Bounding Box was indeed a big part of the new RT arch

BOUNDING VOLUME HIERARCHY HAVING ORIENTED BOUNDING BOXES WITH QUANTIZED ROTATIONS
From <https://www.freepatentsonline.com/y2023/0099806.html>
TECHNIQUES FOR INTRODUCING ORIENTED BOUNDING BOXES INTO BOUNDING VOLUME HIERARCHY
From <https://www.freepatentsonline.com/y2023/0027725.html>
COMMON CIRCUITRY FOR TRIANGLE INTERSECTION AND INSTANCE TRANSFORMATION FOR RAY TRACING
From <https://www.freepatentsonline.com/y2023/0206541.html>
VOLUME INTERSECTION USING ROTATED BOUNDING VOLUMES
From <https://www.freepatentsonline.com/y2023/0410426.html>
TECHNIQUE FOR TESTING RAY FOR INTERSECTION WITH ORIENTED BOUNDING BOXES
From <https://www.freepatentsonline.com/y2024/0221284.html>
EMULATING ORIENTED BOUNDING BOXES IN BOUNDING VOLUME HIERARCHIES
From <https://www.freepatentsonline.com/y2024/0221283.html>
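For readers unfamiliar with oriented bounding boxes, here is a minimal textbook sketch of a ray vs OBB test: rotate the ray into the box's local frame, then run the usual axis-aligned slab test. This is not taken from the patents above and is not AMD's hardware method; it uses an arbitrary rotation matrix and does not model quantized rotations. All function names are illustrative.

```python
import math


def ray_aabb(origin, direction, box_min, box_max):
    """Slab test: True if the ray (t >= 0) hits the axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def ray_obb(origin, direction, center, world_to_box, half_extents):
    """Rotate the ray into the box frame, then reuse the AABB slab test.

    world_to_box is a rotation matrix whose rows are the box axes
    expressed in world space.
    """
    rel = tuple(o - c for o, c in zip(origin, center))
    local_o = mat_vec(world_to_box, rel)
    local_d = mat_vec(world_to_box, direction)
    neg = tuple(-h for h in half_extents)
    return ray_aabb(local_o, local_d, neg, half_extents)


# A long thin box rotated 45 degrees about Z, centred at the origin.
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
rot = [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]
half = (2.0, 0.1, 0.1)
down = (0.0, 0.0, -1.0)

hit_on_axis = ray_obb((1.0, 1.0, 5.0), down, (0.0, 0.0, 0.0), rot, half)
hit_off_axis = ray_obb((1.0, -1.0, 5.0), down, (0.0, 0.0, 0.0), rot, half)
print(hit_on_axis, hit_off_axis)
```

The second ray misses the oriented box, but it would hit an axis-aligned box enclosing the same geometry (which spans about ±1.49 in X and Y). Fewer false-positive box hits of this kind is the usual argument for OBBs around thin, diagonal geometry.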

BVH Compression

BVH8
VARIABLE WIDTH BOUNDING VOLUME HIERARCHY NODES
From <https://www.freepatentsonline.com/y2023/0206542.html>

Instance Transform

COMMON CIRCUITRY FOR TRIANGLE INTERSECTION AND INSTANCE TRANSFORMATION FOR RAY TRACING

From <https://www.freepatentsonline.com/y2023/0206541.html>

Listening to Andy at the arch reveal, it seems more or less in line with what the patents have been discussing.

Dynamic VGPR is not specific to RT, but it appears in the patents, and the patches have been out for a year.


AMD is cooking up some more stuff for upcoming archs.
A radical new shader reordering and sorting mechanism to minimize divergence is described here

Some patents like BVH nodes with delta instances, BVH overlaying etc. did not make it
 

eek2121

Diamond Member
Aug 2, 2005
3,270
4,798
136
Something I just now realized is that AMD is now ahead of NVIDIA in terms of performance/watt and performance/area for raster. Also possibly equal in RT. (NVIDIA has more cores so I suspect that is a big reason for the gap)

This is comparing the 9070XT to the 5070 Ti.

They were in the perfect position to drop a halo product. Of course the halo product got canned.
 

SolidQ

Golden Member
Jul 13, 2023
1,320
2,033
96
The Spider-Man RT results look suspicious, almost double compared to the RTX 4080
AMD says it's maxed RT
Fabio mentioned he was surprised by the SM2 results, even with the RTX 5080


 

Mahboi

Golden Member
Apr 4, 2024
1,057
1,969
96
AMD sucks at CryEngine
I felt they did well this generation, typically it's more CringeEngine
Thats looks bad slower than 4080 xtx 5070ti slightly above 4070tis
What's bad about it? My expectation from forever ago was RT at the level of a 4070 Ti with clearly better raster. If we get 4070 Ti Super RT and raster ~10% above the 7900 XT, it's a literal 1440p bomb. Weak at 4K, but seriously, past all the marketing, nobody should expect a 70 XT card to be a good 4K card...
 

Mahboi

Golden Member
Apr 4, 2024
1,057
1,969
96
I wonder if their improved h264 encoder is from Xilinx. It should be about the right time to switch IPs, a 20% improvement is pretty massive, and AMD never was good at h264. I tried streaming on Twitch with my 7900 XT and the quality is about on par with a Pascal or Turing era GPU. AV1 is excellent though, but that's not possible on Twitch......still......in 2025....
 
Reactions: scineram

Heartbreaker

Diamond Member
Apr 3, 2006
4,742
6,247
136
Something I just now realized is that AMD is now ahead of NVIDIA in terms of performance/watt and performance/area for raster. Also possibly equal in RT. (NVIDIA has more cores so I suspect that is a big reason for the gap)

This is comparing the 9070XT to the 5070 Ti.

They were in the perfect position to drop a halo product. Of course the halo product got canned.

Oddly driven by a massive leap in transistor density. I wonder how they did that.

A change of process might put NVidia back in the lead. GB203 is only 45.6B transistors vs 53.9B for N48.

Plus 5070 Ti is the cut down part on GB203, 5080 is the full part.
 

gdansk

Diamond Member
Feb 8, 2011
4,006
6,574
136
A change of process might put NVidia back in the lead. GB203 is only 45.6B transistors vs 53.9B for N48.
I'm not sure transistor counts are as comparable as people would like.
It's 2x the transistor density of Intel's B580 on what is supposed to be a similar density process. Is it really that many fewer transistors or are they counting differently?
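A quick sketch of the density comparison being discussed. The N48 and GB203 transistor counts are the ones quoted in this thread; the die areas and the B580 transistor count are not in the thread and are approximate, assumed figures, so treat the output as illustrative only.

```python
def density_mtr_per_mm2(transistors_billion, die_area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / die_area_mm2


chips = {
    # name: (transistors in billions, die area in mm^2)
    "N48":   (53.9, 357.0),  # count from the thread; area is an ASSUMPTION
    "GB203": (45.6, 378.0),  # count from the thread; area is an ASSUMPTION
    "B580":  (19.6, 272.0),  # both figures are ASSUMPTIONS (not in thread)
}

density = {name: density_mtr_per_mm2(*spec) for name, spec in chips.items()}
for name, d in density.items():
    print(f"{name}: ~{d:.0f} MTr/mm^2")

print(f"N48 vs B580:  {density['N48'] / density['B580']:.2f}x")
print(f"N48 vs GB203: {density['N48'] / density['GB203']:.2f}x")
```

With these assumed areas, N48 lands at roughly 2.1x the B580's reported density and about 1.25x GB203's, which matches the "2x" figure in the post. Whether the vendors count transistors the same way is exactly the open question, and the calculation cannot answer it.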
 