Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,101
136





With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 being available in the very near future, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it.

This is nuts, MI100/200/300 cadence is impressive.



Previous thread on CDNA2 and RDNA3 here

 

Mahboi

Senior member
Apr 4, 2024
741
1,313
96

"The traditional performance IPC of RDNA4 is expected to increase by about 12% compared to RDNA3, while the improvement in light pursuit will be huge (hardware BVH traversal), and the IPC is expected to increase by about 25%.
RDNA4 should be a single-chip design, using TSMC N4P process, with a smaller area, so the cost is very low. The video memory should be GDDR6. The graphics card will be very cost-effective."
My personal copium about RDNA 4 then (bad maths incoming):

Assuming a baseline 20% extra clocks for RDNA 3 (so, 3GHz across the board), and 15% general better performance.
Assuming that the general RT performance promised here is correct, so 25% better across the board.
Assuming also that RDNA 4 will have an extra 10% clocks, so it won't be 3GHz base but rather above 3.2GHz. It's on N4P, so that's not an unrealistic expectation. And also the promised 12% extra raster.
And assuming that the lack of BVH walker indeed is the reason that RDNA suffers so damn much in NV-RT games like Cyberpunk.

With a die that, in raster, will provide somewhat above 7900 xt performance:

Raster:

So let's cut the apple in half and say that, counting the extra it's right in the middle between an XT and XTX.
150 FPS base at 1440p for N48.
Remove 12% extra perf iso clocks, and 15% perf for extra clocks.
150 x .12 == 18
132

132 x .15 == 19.8
112.2 FPS
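
For anyone who wants to redo the back-of-envelope raster numbers, here is a minimal Python sketch of the steps above. It assumes the 150 FPS guess and the 12% and 15% uplifts from the post; the function names are made up for illustration. It also shows dividing by (1 + uplift) as an alternative, on the assumption that "removing" an uplift means inverting a percentage gain.

```python
def strip_uplifts_subtractive(fps, uplifts):
    """Remove each uplift by subtracting that fraction of the running figure
    (the method used in the post)."""
    for u in uplifts:
        fps -= fps * u
    return fps


def strip_uplifts_divisive(fps, uplifts):
    """Remove each uplift by dividing by (1 + uplift), the exact inverse of
    adding a percentage gain (an assumed alternative, not from the post)."""
    for u in uplifts:
        fps /= 1 + u
    return fps


n48_guess = 150.0        # FPS at 1440p guessed for N48 in the post
uplifts = [0.12, 0.15]   # iso-clock gain, then clock gain, as in the post

print(round(strip_uplifts_subtractive(n48_guess, uplifts), 1))
print(round(strip_uplifts_divisive(n48_guess, uplifts), 1))
```

Subtracting gives roughly 112 FPS and dividing gives roughly 116 FPS, so either way the result lands a little above the 7800 xt figure.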

We're in the ballpark of a 7800 xt at 109.3 FPS base. Since N48 is a 256 bit bus thing, I don't expect it to shine hard at 4K, my 7900 xt and its 320 bit bus already takes more Ls than it should vs an XTX's 384 bit.

Now taking a 7800 xt as base:

Taking 41 FPS at 1440p base, adding 15 + 12 general perf:
41 * 1.12 = 45.92 (46)
46 * 1.15 = 52.9 (53)

I don't think the RT rep is "25% more on top of 12%" so that's 13%:
53 * 1.13 = 59.89 fps, so we're nearing 60 FPS average, with RT on, at 1440p.

If my copium proves true, then it won't be just 20% more clocks but more like 30%, from ~2.5GHz to ~3.3GHz. So we can add a broad 10% improvement to all of that, up to around 66 FPS average with RT on at 1440p. Don't have a clue how well it'll fare on 4K with that bus, but just upscale it.
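
The RT chain above is just compounded percentages, so it can be sketched in a few lines of Python. This assumes the 41 FPS base and the 12%, 15%, 13% and 10% steps from the post; `compound` is a hypothetical helper name. Unlike the post, it does not round after each step.

```python
def compound(base_fps, uplifts):
    """Apply each fractional uplift in turn, multiplying the running figure."""
    fps = base_fps
    for u in uplifts:
        fps *= 1 + u
    return fps


base_rt_fps = 41.0  # 7800 xt RT figure at 1440p used in the post

# 12% iso-clock, 15% clocks, 13% RT-specific, as assumed in the post
print(round(compound(base_rt_fps, [0.12, 0.15, 0.13]), 1))

# Same chain plus the extra 10% from the more optimistic clock guess
print(round(compound(base_rt_fps, [0.12, 0.15, 0.13, 0.10]), 1))
```

Without the intermediate rounding the totals come out a fraction lower than in the post, but still just under 60 FPS and about 66 FPS.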

Now the real question is how many of these games are crippled from the lack of a BVH walker. It used to be that an XTX did an abysmal 9 FPS at 4K with Path Tracing on in CP2077. Now I see a 7800 xt doing 26.9 in the tomshardware article at 1440p, so maybe something was fixed/accelerated somewhere since, or maybe "RT Ultra" doesn't mean Path Tracing on.
My understanding was "AMD raytracing is poor, but not even close to unusable for 90% of RT, and when you have too many light bounces, it falls off a cliff and becomes unusable". BVH HW walker was meant to get the unusable into playable and to up the general perf.

I can assume that a general 25% improvement + 30% more clocks so around 25% better perf is there. That's good, but it's just reaching a 4070 Ti and its 68 FPS average here. Reaching a 4070 Ti tier of RT with a 240mm² die is damn impressive, but nobody's gonna clap for AMD yet again reaching last generation's performance.
So my only hope at this point, that we'd go above a 4070 Ti and at least to a Super's general perf, is that the BVH walker goes beyond "just 25%" and sometimes takes unusable RT with too many light bounces into usable RT. As in, you have 25% general RT improvement, but in some games, it's 100% or more. Possibly up to 150% more.

If AMD has effectively fixed the last thing that made their RT worse outside of software (which still takes years of work), then they have a strong contender that'll reach circa 4070 Ti Super and possibly get closer to a 4080's RT performance, while raster is already 4080 level anyway. For a card they'll sell for $600 and that could be financially viable for $400, that's a really great product, and a great heir to the 7800 xt which I think is by far the best offering AMD has had this gen.

The Copium is on. Make me dream of functional 1440p path tracing, AMD. 4K I know won't happen without maximum upscaling, but let me dream. And the raster perf, well, it'll make a near XTX level of perf, so I'm not worried that it'll handle everything raster well enough, especially for that price.
 
Reactions: Tlh97 and Tarkin77

Saylick

Diamond Member
Sep 10, 2012
3,361
7,059
136
the # of channels.
And cache hit rate, no?

If your hit rate was always 0%, your effective bandwidth matches the bandwidth of the VRAM. If you always got a cache hit, your effective bandwidth matches the IF cache bandwidth. Of course, we're going to be somewhere in the middle.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,415
1,736
136
Even for the MCM parts though, I can't seem to figure out what actually makes up the "effective bandwidth" number. The closest I can come up with is this, where Cache BW is Effective BW - Memory Bandwidth.
If anyone knows the exact calculation for effective BW in RDNA3 I'd love to see it.

It's more complex and depends on cache hit rate. IIRC something like ram bw + min (1/(cache miss rate) * ram bw, ∞cache bw). It's supposed to model how much traditional ram bw you'd need to match the combination of ∞cache and ram bus.

I think it was in the slides when AMD first released the 6000-series, I can't go looking for it now.
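
As a rough illustration of the formula recalled above (ram bw + min(ram bw / miss rate, ∞cache bw)), here is a minimal Python sketch. The function name and the bandwidth figures are made up for the example and are not the specs of any real card; whether AMD's slides use exactly this calculation is not confirmed here.

```python
def effective_bw_recalled(ram_bw, cache_bw, miss_rate):
    """Effective bandwidth per the formula recalled in the post:
    ram_bw + min(ram_bw / miss_rate, cache_bw).
    A miss rate of 0 is treated as fully cache-limited to avoid dividing by zero."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if miss_rate == 0.0:
        return ram_bw + cache_bw
    return ram_bw + min(ram_bw / miss_rate, cache_bw)


# Hypothetical numbers in GB/s, chosen only to show both branches of min()
print(effective_bw_recalled(600.0, 2000.0, 0.5))   # RAM-limited term: 600 + 1200
print(effective_bw_recalled(600.0, 2000.0, 0.25))  # cache-limited term: 600 + 2000
```

With this form the result can exceed the cache bandwidth, since the RAM term is always added on top.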
 

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
If they didn't redesign anything, we'd have Navi 41, 42 and 43. AMD is not that creative in their naming.

So Navi 48 is definitely a redesign. Whether Navi 44 is so, is unclear, because we don't know whether they intended to make 3 or 4 chips.

And if RDNA4 is only a fix of RDNA3 with 15% improvement, that would be very disappointing.
Wut?
Navi 41: chipletized, top tier, murdered
N42: chipletized, mid tier, murdered
N43: monolithic, entry tier, murdered
N44: monolithic, low power gaming laptop and cheapest node that's still gaming worthy, kept
N45: something different
N46: something different
N47: something different
N48: something different that turned out to be a monolithic die with roughly the same cost and power goals as what N43 would've been, but cheaper and still marketable

Where's your logic now?
Corpos during R&D phase throw everything possible that they can afford. That's why it's such a money hole, because basically any viable idea in the office will be pursued and killed off once a better idea has come forth. It just happens that a small sized, midrange power monolithic die was the 4th extra option past the main goals. Nothing about that proves that it's a "redesign".
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,415
1,736
136
Yeah but that's not a real metric.

It's a thing AMD marketing likes to talk about. I'll bet you that it will be up on the slides again when RDNA4 launches. It might not be real, but it's something that will be mentioned, and it would probably match up well with the number on that leak, which is why people think that's what the number is.
 
Reactions: Tlh97 and Elfear

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
It's a thing AMD marketing likes to talk about. I'll bet you that it will be up on the slides again when RDNA4 launches. It might not be real, but it's something that will be mentioned, and it would probably match up well with the number on that leak, which is why people think that's what the number is.
Marketing likes big numbers.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,157
4,525
96
I care, because it's interesting to know, and bandwidth relates to performance.
Very very very gamey metrics. Don't.
It's a thing AMD marketing likes to talk about.
Yeah but it's not relevant.
Perf, power, area. Three ultimate metrics for any GPU.
Not really sure why you say no, when the rest of your sentence is you admitting that they made a new chip.
It's a replacement.
So where are you getting 5 chips from?
We're talking RDNA4 aren't we?
Navi 21/22/23/24
Navi 31/32/33
You forgot NV36 but I forgive you.
 

MrTeal

Diamond Member
Dec 7, 2003
3,584
1,743
136
It's more complex and depends on cache hit rate. IIRC something like ram bw + min (1/(cache miss rate) * ram bw, ∞cache bw). It's supposed to model how much traditional ram bw you'd need to match the combination of ∞cache and ram bus.

I think it was in the slides when AMD first released the 6000-series, I can't go looking for it now.
There's this PDF that goes over it somewhat for the Pro cards

I can't find published values for the infinity fabric clock to confirm they're running at different speeds on N32 and N31 though.
If it depended on hit rate, it shouldn't scale linearly between the three N31 GPUs as hit rate isn't linear.
 

Saylick

Diamond Member
Sep 10, 2012
3,361
7,059
136
It's more complex and depends on cache hit rate. IIRC something like ram bw + min (1/(cache miss rate) * ram bw, ∞cache bw). It's supposed to model how much traditional ram bw you'd need to match the combination of ∞cache and ram bus.

I think it was in the slides when AMD first released the 6000-series, I can't go looking for it now.
I don't think effective bandwidth is going to be higher than the IF Cache bandwidth, simply because all data has to go through the IF Cache to enter the GPU. There aren't two pipes of bandwidth, only one which flows from VRAM to IF Cache to GPU. What bandwidth the GPU sees depends on how often it is able to tap into the full speed of the IF Cache (i.e. the hit rate). Unfortunately, the hit rate isn't some static number and it changes depending on workload.

I think the math is as follows:

Eff. Bandwidth = VRAM Bandwidth * (1 - Hit Rate) + IF Cache Bandwidth * (Hit Rate)
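
A minimal Python sketch of the weighted-average guess above, with hypothetical bandwidth figures (GB/s) that are not taken from any real card; the function name is made up too. It only illustrates the proposed formula, not a confirmed AMD calculation.

```python
def effective_bw_weighted(vram_bw, cache_bw, hit_rate):
    """Hit-rate-weighted average of VRAM and IF Cache bandwidth,
    following the formula proposed in the post."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return vram_bw * (1.0 - hit_rate) + cache_bw * hit_rate


# Hypothetical figures: 600 GB/s VRAM, 2000 GB/s IF Cache
for hit_rate in (0.0, 0.5, 1.0):
    print(hit_rate, effective_bw_weighted(600.0, 2000.0, hit_rate))
```

At a 0% hit rate this returns the VRAM bandwidth, at 100% it returns the IF Cache bandwidth, and it never goes above the cache figure, which matches the single-pipe argument.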
 

Aapje

Golden Member
Mar 21, 2022
1,464
2,028
106
Wut?
Navi 41: chipletized, top tier, murdered
N42: chipletized, mid tier, murdered
N43: monolithic, entry tier, murdered
N44: monolithic, low power gaming laptop and cheapest node that's still gaming worthy, kept
N45: something different
N46: something different
N47: something different
N48: something different that turned out to be a monolithic die with roughly the same cost and power goals as what N43 would've been, but cheaper and still marketable

Where's your logic now?
Corpos during R&D phase throw everything possible that they can afford. That's why it's such a money hole, because basically any viable idea in the office will be pursued and killed off once a better idea has come forth. It just happens that a small sized, midrange power monolithic die was the 4th extra option past the main goals. Nothing about that proves that it's a "redesign".

Well, my logic is not that I made up some supersecret research program at AMD where they develop 8 chips every gen and count up to Navi x8, but somehow have only ever released up to Navi x4 for each of the previous Navi generations.

It's beyond obvious that at least Navi 48 was a late development, which also explains why the fastest chip has the highest number, even though in every previous Navi generation, the higher the number of the chip, the weaker it got.
 

Aapje

Golden Member
Mar 21, 2022
1,464
2,028
106
It's a replacement.

Replacement = redesign.

We're talking RDNA4 aren't we?

Yes, but where do you get your information that it was going to have 5 chips? I'm looking at earlier gens, because there we know what they released.

You forgot NV36 but I forgive you.

Not sure what an Nvidia chip from 2004 has got to do with any of this, or why you act so arrogant when saying something so silly.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,157
4,525
96
Replacement = redesign.
No. A replacement.
but where do you get your information that it was going to have 5 chips?
It came to me in the dream.
I'm looking at earlier gens, because there we know what they released
How is this relevant to RDNA4? or 5 for that matter.
Not sure what an Nvidia chip from 2004 has got to do with any of this, or why you act so arrogant when saying something so silly.
NV stands for Navi.
You can figure out the rest yourself.
 
Reactions: Tlh97 and Mahboi

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
Well, my logic is not that I made up some supersecret research program at AMD where they develop 8 chips every gen and count up to Navi x8, but somehow have only ever released up to Navi x4 for each of the previous Navi generations.
"Super secret"
Cause internal R&D is just something that every single corpo makes public now.
Yes, they don't document or publish what their research teams are doing. If you think I'm somehow inventing this, try to go on an AMD official channel and politely ask what are their R&D teams doing and what projects are on, and you'll get a very polite "not your business, bye".

Also they were developing as many chips as it made sense. Back in Vega/RDNA 1 days, the amount of chips was incredibly small because the budget limited it. I can guarantee that Nvidia could afford multiple times the R&D targets that AMD could, and I never set foot in either company, it's just how business works. More money means more attempts.

By your non-logic, somehow, someday, an AMD engineer was walking to the coffee machine, stumbled, the XTX die and Zen 5 die he had in his pocket cracked and mixed together, and when he took it back up, he had accidentally made Strix Halo. Corpos try random stuff all the time and see how well it goes or doesn't go, and shuffle teams or reconvert them into other projects. You don't even need to look into the semiconductor industry for this: it happens literally all the time in any company. Games get started and abandoned all the time. Markets get opened by commercials and shut by management all the time. New locations for retailers get started, then shut due to bad business after a few years all the time, etc.
It's beyond obvious that at least Navi 48 was a late development, which also explains why the fastest chip has the highest number, even though in every previous Navi generation, the higher the number of the chip, the weaker it got.
No. You don't just pile on "later developments" in multi year dev cycles. You make broad plans for everything in advance, keep what works, eat what doesn't. Big corpos are ogres.
 

Aapje

Golden Member
Mar 21, 2022
1,464
2,028
106
@Mahboi

Why would AMD spend gen after gen designing chips that they were never going to use? It doesn't make any sense.

You are just using the classic conspiracy theory nonsense: "It's obviously true, because there is zero evidence. Top Secret, yo."
 

Mahboi

Senior member
Apr 4, 2024
741
1,313
96
Why would AMD spend gen after gen designing chips that they were never going to use? It doesn't make any sense.
Because it's Research and Development.
Not Development.

They don't know what's going to work or not. They just fling money and time at the possibilities and see what works best.
This is why R&D is such a costly thing.
You are just using the classic conspiracy theory nonsense: "It's obviously true, because there is zero evidence. Top Secret, yo."
Ok I'm done with this conversation.
 

Aapje

Golden Member
Mar 21, 2022
1,464
2,028
106
It came to me in the dream.

I guess that you are just trolling now?

How is this relevant to RDNA4? or 5 for that matter.

Things that happened during previous generations typically stay roughly the same. If a clear pattern that lasted for 3 gens is broken, that indicates that something special happened.

NV stands for Navi.
You can figure out the rest yourself.

No, I cannot, because there is no Navi 36 chip. If you have evidence that you are not just openly lying here, present it.
 

Aapje

Golden Member
Mar 21, 2022
1,464
2,028
106
They don't know what's going to work or not. They just fling money and time at the possibilities and see what works best.

Except that if you were right, you'd see those extra designs getting chosen now and then. Otherwise it makes no sense.
 