Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,746
6,586
136





With the GFX940 patches in full swing since the first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up situation like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Ghostsonplanets

Senior member
Mar 1, 2024
666
1,080
96
What is the consensus on the process node for RDNA4?
Would it be N4P or N3E?

Lots of patents around those items listed by Kepler from AMD
N4P

N3E is good, but bloody expensive. No good for a consumer GPU right.
N4P, probably. The companies seem to act like N3 doesn't exist, so it's probably no good. If Nvidia doesn't use it, then AMD certainly won't either.
No dGPU manufacturer is using N3 for this generation. That's for the late '26/'27 dGPU generation.
 

SolidQ

Senior member
Jul 13, 2023
457
504
96
Explanation from Guru3D of the RT leaks
Double Ray Tracing Intersect Engine

The Double Ray Tracing Intersect Engine is expected to enable parallel processing of rays, potentially doubling the ray intersection performance compared to RDNA 3. This enhancement is designed to significantly boost the efficiency and speed of ray tracing computations.

RT Instance Node Transform

The RT Instance Node Transform feature aims to improve the handling of geometries by the GPU, allowing for more efficient transformations such as translation, rotation, and scaling. This should enhance the overall performance of the ray tracing process.

64B RT Node

A 64-byte RT node could enhance processing efficiency and reduce memory usage, allowing for more effective management of ray tracing data.

Ray Tracing Tri Pair Optimization

This feature is designed to optimize the calculation of ray-triangle intersections, reducing the computational load and improving performance.

Change Flags Encoded in Barycentrics

While the exact function of this feature is unclear, it likely simplifies the detection of procedural nodes, contributing to more efficient ray tracing operations.

BVH Footprint Improvement

Bounding Volume Hierarchy (BVH) footprint improvement could lead to faster grouping and processing of scene geometry, enhancing the speed and efficiency of ray tracing.

RT Support for OBB and Instance Node Intersection

Support for Oriented Bounding Box (OBB) and instance node intersection may offer higher precision and efficiency by using smaller bounding volumes, further optimizing ray tracing performance.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
It seems RDNA 3.5 includes the learnings from porting RDNA to low-powered Samsung mobile
A lot of the problem there was porting RDNA from their long-preferred fab provider TSMC to Samsung, not helped by the fact that Samsung was and still is significantly behind TSMC.

Had Samsung elected to fab Exynos + RDNA at TSMC, it wouldn't have been as bad as it initially was.
 
Reactions: Tlh97 and marees

DisEnchantment

Golden Member
Mar 3, 2017
1,746
6,586
136
My layman take on these items


Double Ray Tracing Intersect Engine

The Double Ray Tracing Intersect Engine is expected to enable parallel processing of rays, potentially doubling the ray intersection performance compared to RDNA 3. This enhancement is designed to significantly boost the efficiency and speed of ray tracing computations.

--> Because there are two nodes per cache line, they need 2x fixed function HW to process the 2x nodes

RT Instance Node Transform

The RT Instance Node Transform feature aims to improve the handling of geometries by the GPU, allowing for more efficient transformations such as translation, rotation, and scaling. This should enhance the overall performance of the ray tracing process.

--> The intersection engine either transforms the BVH node or does the intersection test. The BVH node is transformed into an oriented BVH node

64B RT Node

A 64-byte RT node could enhance processing efficiency and reduce memory usage, allowing for more effective management of ray tracing data.

--> Due to tri and box compression, only 64B is needed for each node
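
To put rough numbers on the 64B point, here is a small sketch of how a compressed 4-wide box node could be squeezed into 64 bytes so that two fit in one cache line. The field layout (`NODE_FORMAT`, 8-bit quantized child bounds, the `pack_node` helper) is entirely made up for illustration and is not AMD's actual node format; the 128-byte cache line is likewise an assumption.

```python
import struct

CACHE_LINE_BYTES = 128  # assumed cache line size, not taken from the leak

# Hypothetical layout (little-endian, no implicit padding):
#   3 x float32   box origin                      12 bytes
#   3 x uint8     per-axis scale exponent          3 bytes
#   1 x uint8     child count                      1 byte
#   24 x uint8    4 children x quantized min/max  24 bytes
#   4 x uint32    child offsets                   16 bytes
#   8 pad bytes   reserved                         8 bytes
NODE_FORMAT = "<3f3BB24B4I8x"
NODE_BYTES = struct.calcsize(NODE_FORMAT)


def pack_node(origin, exponents, quantized_bounds, child_offsets):
    """Pack one hypothetical 4-wide box node into NODE_BYTES bytes."""
    flat_bounds = [b for child in quantized_bounds for b in child]
    return struct.pack(
        NODE_FORMAT, *origin, *exponents, len(child_offsets),
        *flat_bounds, *child_offsets,
    )


# Four children, each with 8-bit (min_x, min_y, min_z, max_x, max_y, max_z).
node = pack_node(
    (0.0, 0.0, 0.0), (3, 3, 3),
    [(0, 0, 0, 255, 255, 255)] * 4,
    (64, 128, 192, 256),
)
nodes_per_line = CACHE_LINE_BYTES // NODE_BYTES
print(len(node), nodes_per_line)
```

The point is only the arithmetic: once child boxes are stored quantized relative to a shared origin instead of as full floats, a whole node fits in half of the assumed cache line, which lines up with the "two nodes per cache line" remark above.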

BVH Footprint Improvement

Bounding Volume Hierarchy (BVH) footprint improvement could lead to faster grouping and processing of scene geometry, enhancing the speed and efficiency of ray tracing.
--> Basically using overlay trees, delta instances, and tri and box compression. Reduces trips to memory and eases BW pressure. They can fit more nodes in the cache

RT Support for OBB and Instance Node Intersection

Support for Oriented Bounding Box (OBB) and instance node intersection may offer higher precision and efficiency by using smaller bounding volumes, further optimizing ray tracing performance.

--> Oriented Bounding Boxes improve the chance of getting a hit on the triangles in the box compared to non-oriented boxes, which can trigger lots of box ray hits but no tri ray hits
--> Basically speeds up the BVH traversal; fewer boxes need to be tested to get a tri hit, in general.
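
The OBB point can be shown with a toy example. This is a software sketch only, using the textbook slab test; the function names and the scene are hypothetical and say nothing about how the hardware does it. A thin sliver lying along the XY diagonal gets a fat axis-aligned box, so a ray far from the geometry still reports a box hit, while the oriented box rejects it.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: True if the ray (t >= 0) intersects the axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must already be inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_obb(origin, direction, center, axes, half_extents):
    """Rotate the ray into the box's local frame, then reuse the slab test."""
    rel = [o - c for o, c in zip(origin, center)]
    local_o = [sum(a * r for a, r in zip(axis, rel)) for axis in axes]
    local_d = [sum(a * d for a, d in zip(axis, direction)) for axis in axes]
    neg_half = [-h for h in half_extents]
    return ray_hits_aabb(local_o, local_d, neg_half, half_extents)


# Thin sliver from (0,0,0) to (10,10,0), about 0.2 thick.
S = math.sqrt(0.5)
CENTER = (5.0, 5.0, 0.0)
AXES = ((S, S, 0.0), (-S, S, 0.0), (0.0, 0.0, 1.0))  # orthonormal box axes
HALF = (5.0 * math.sqrt(2.0), 0.1, 0.1)
AABB_MIN, AABB_MAX = (-0.1, -0.1, -0.1), (10.1, 10.1, 0.1)

DOWN = (0.0, 0.0, -1.0)
off_diagonal = (8.0, 2.0, 5.0)  # passes through the empty corner of the AABB
on_diagonal = (5.0, 5.0, 5.0)   # passes through the sliver itself

print(ray_hits_aabb(off_diagonal, DOWN, AABB_MIN, AABB_MAX),
      ray_hits_obb(off_diagonal, DOWN, CENTER, AXES, HALF))
print(ray_hits_aabb(on_diagonal, DOWN, AABB_MIN, AABB_MAX),
      ray_hits_obb(on_diagonal, DOWN, CENTER, AXES, HALF))
```

The off-diagonal ray is the "box ray hit but no tri ray hit" case: the axis-aligned test sends traversal down a subtree that contains nothing the ray can touch, while the oriented test culls it at the cost of one extra rotation per box.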
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
Even if RDNA5 doesn't beat nVidia's contemporary RT implementation, by the time it is available the improvements to real time RT software techniques through contributions from nVidia, AMD, Intel, academia and various game studio research divisions will have made RT gaming much more viable, to the point of (pure) raster back ends for game engines becoming dead weight.

I follow the software research side pretty closely, and IMHO it's making a far greater difference than the hardware side for the last several years, to the point of bringing Eevee viewport visual quality in Blender much closer to parity with Cycles while retaining really good responsiveness.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
Nobody is boarding the RDNA4 train
I will as long as the top SKU isn't too high TDP.

After being unsatisfied with my 6600XT perf upon getting a 4K TV I then bought a 6800XT which I never got around to installing before eventually selling it to my dad, so now the 6600XT is even older and I would be happy just to get something with >6900XT raster perf and a huge uptick in RT for Blender Cycles and Eevee.
 

Hans Gruber

Platinum Member
Dec 23, 2006
2,292
1,211
136
I will as long as the top SKU isn't too high TDP.

After being unsatisfied with my 6600XT perf upon getting a 4K TV I then bought a 6800XT which I never got around to installing before eventually selling it to my dad, so now the 6600XT is even older and I would be happy just to get something with >6900XT raster perf and a huge uptick in RT for Blender Cycles and Eevee.
I think you only need 7800 XT/GRE performance.
 

marees

Senior member
Apr 28, 2024
284
346
96
I will as long as the top SKU isn't too high TDP.

After being unsatisfied with my 6600XT perf upon getting a 4K TV I then bought a 6800XT which I never got around to installing before eventually selling it to my dad, so now the 6600XT is even older and I would be happy just to get something with >6900XT raster perf and a huge uptick in RT for Blender Cycles and Eevee.
I am guessing the TDP of reference cards should be low (as they are rumoured to use GDDR6)

However partner overclocked boards could end up using a lot of power given that the top chips are reportedly scrapped with no replacement in sight
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,024
136
by the time it is available the improvements to real time RT software techniques through contributions from nVidia, AMD, Intel, academia and various game studio research divisions will have made RT gaming much more viable to the point of game engine raster back ends becoming dead weight.
The real (as in actually scalable) improvements to real time RT don't replace raster with RT but combine both in clever ways (like UE5's Lumen) without the high hardware requirements of the current brute force hardware RT approach as pushed by Nvidia.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
without the high hardware requirements of the current brute force hardware RT approach as pushed by Nvidia
I'm no expert in light transport techniques by any means, but even the most basic ReSTIR implementation is already moving away from brute force.

Absolute brute force being unbiased rendering, and ReSTIR introduces a hell of a lot of bias from reusing previous light paths - part of the reason the frames can look so splotchy.
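
For anyone wondering what "reusing" means mechanically, below is a minimal sketch of the weighted reservoir step that ReSTIR-style methods are built on. It is not a ReSTIR implementation: the `Reservoir` class and its `merge` are hypothetical, and `merge` deliberately skips the target-function reweighting a real renderer would need when pulling in a neighbour's or a previous frame's reservoir.

```python
import random


class Reservoir:
    """Single-sample weighted reservoir over a stream of candidates."""

    def __init__(self):
        self.sample = None   # currently selected candidate
        self.w_sum = 0.0     # running sum of candidate weights
        self.count = 0       # number of candidates seen so far

    def update(self, candidate, weight, rng):
        """Stream in one candidate; keep it with probability weight / w_sum."""
        self.w_sum += weight
        self.count += 1
        if weight > 0.0 and rng.random() < weight / self.w_sum:
            self.sample = candidate

    def merge(self, other, rng):
        """Simplified reuse: treat the other reservoir as one heavy candidate."""
        total = self.count + other.count
        if other.sample is not None:
            self.update(other.sample, other.w_sum, rng)
        self.count = total


rng = random.Random(0)
current = Reservoir()
for light, weight in (("a", 1.0), ("b", 3.0), ("c", 0.0)):
    current.update(light, weight, rng)

previous = Reservoir()          # stands in for last frame's reservoir
previous.update("d", 4.0, rng)
current.merge(previous, rng)
print(current.sample, current.w_sum, current.count)
```

Each pixel keeps one sample plus a couple of running numbers instead of tracing every candidate, which is why it is so much cheaper than brute force, and the reuse of old reservoirs in `merge` is the part where the bias mentioned above can creep in.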
 

KompuKare

Golden Member
Jul 28, 2009
1,157
1,413
136
The real (as in actually scalable) improvements to real time RT don't replace raster with RT but combine both in clever ways (like UE5's Lumen) without the high hardware requirements of the current brute force hardware RT approach as pushed by Nvidia.
And at risk of seriously offending the RT purists, this is probably the correct approach. No point in abandoning decades of raster experience to go pure RT especially while the hardware to go full RT isn't really there.


And even if the hardware does eventually catch up, then the next hurdle is full path tracing. At which stage raster "cheats" might help somewhere in there.

Cheating? In the days of the hype for upscalers cheating is the best thing ever!
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
And even if the hardware does eventually catch up, then the next hurdle is full path tracing. At which stage raster "cheats" might help somewhere in there.
IMHO raster is not cheating - it does at least allow the game dev to reliably reproduce on screen what they intended to.

Upscaling is not so reliable as that - that for me is cheating.

Unfortunately the necessary denoising of RT samples to gain a clean image can sanitise or blur surface details, so the greatest focus seems to be simply getting sampling variance down to such a low level at real time frame rates that this detail loss is no longer a problem.

Also it's far more likely given nVidia's overall research direction that they would pivot to ML as a means of "cheats" for hybrid RT perf going forward.
 
Reactions: Tlh97

coercitiv

Diamond Member
Jan 24, 2014
6,587
13,872
136
Would never have guessed Jon Peddie would release an article like this, but it’s got some good info:
Sigh, they put an AI generated pic with the following caption:
AMD’s protype RDNA 4 test board with ray-tracing improvements. (Source: Jon Peddie)


I would not be surprised if they used an LLM to expand the info in the leaks into a full article.
 
Reactions: moinmoin

poke01

Golden Member
Mar 8, 2022
1,952
2,478
106
hot take but graphics peaked. It’s time for devs to make more creative games.

I’ll reserve my comment about the state of Ray tracing after Blackwell launches.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
hot take but graphics peaked
Not a hot take really.

I've been saying that for years.

Everything beyond what we have now is just throwing money down the drain to chase the last few % of photorealism so that GPU manufacturers can keep making money.

It's basically the "perfect is the enemy of good enough" phase.

They would do far better to pursue animation improvements, ironically something ML is well suited for.
 

jpiniero

Lifer
Oct 1, 2010
15,067
5,638
136
Graphics in games has definitely stagnated, but that's because the current gen consoles aren't powerful enough to do a meaningful improvement in fidelity and do 60 fps.

GTA 6 will be interesting to see how much it ends up being downgraded.
 

soresu

Diamond Member
Dec 19, 2014
3,183
2,453
136
Graphics in games has definitely stagnated, but that's because the current gen consoles aren't powerful enough to do a meaningful improvement in fidelity and do 60 fps.

GTA 6 will be interesting to see how much it ends up being downgraded.
Chasing resolution like a lunatic doesn't help.

The 8K thing for PS5 was just a cruel joke.

So many games can barely manage a workable frame rate at 4K.

Even with VR they wouldn't be putting such a priority on resolution if screen door wasn't a thing.

IMHO they could probably just stop now with the current state of the art for HMD displays if they could just find a way to pack the pixels closer together and eliminate that issue.

Interestingly it seems like DMD MEMS projection does not suffer from screen door.

I know that DMD is a completely different technology from LCD and OLED, but it seems rather strange that all transmissive and direct emissive display technologies suffer the same effect, as if there is an intrinsic flaw in a standard design layout that they all use that makes it impossible to have each pixel directly next to its neighbor 🤔
 