Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146

Saylick

Diamond Member
Sep 10, 2012
3,385
7,149
136
Hey, perfect timing. A few of y'all were wondering if N33 had big differences vs N31. Well, C&C just released a new article to tell you:
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,149
136
@Saylick

They seem to consistently find that compilers produce pretty poor code, which suggests that AI may really help to optimize that.
Yeah, that was the part that stood out to me, too. It looks like the extra FP units aren't being taken advantage of, which many of us here could have deduced just from how little IPC gain N31 showed over N21. It's rather a shame, really. Historically, AMD GPU architectures that rely on compilers to extract ILP to maximize performance tend to fare poorly against their Nvidia counterparts, and this is no exception. It's just a natural consequence of having a weaker software division than Nvidia.
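To make the compiler's problem concrete, here's a toy pairing model (just the gist; the real VOPD rules also have opcode and register-bank restrictions, so treat this as an illustration, not AMD's actual algorithm):

```python
# Toy model of the dual-issue pairing problem a compiler faces on RDNA 3:
# two VALU ops can share an issue slot only if the second one doesn't
# consume the first one's result. Real pairing rules are much stricter;
# this only shows why instruction scheduling decides whether slots fill.

def issue_slots(ops):
    """ops: list of (dest, srcs) in program order; greedy adjacent pairing."""
    slots, i = 0, 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i][0] not in ops[i + 1][1]:
            i += 2  # independent neighbours -> dual-issued together
        else:
            i += 1  # dependent neighbour -> issues alone
        slots += 1
    return slots

# r0 = a*b + c*d and r1 = e*f + g*h, emitted expression-by-expression:
naive = [("m0", ("a", "b")), ("m1", ("c", "d")), ("r0", ("m0", "m1")),
         ("m2", ("e", "f")), ("m3", ("g", "h")), ("r1", ("m2", "m3"))]
# Same six ops, interleaved so independent work sits next to each other:
interleaved = [("m0", ("a", "b")), ("m2", ("e", "f")),
               ("m1", ("c", "d")), ("m3", ("g", "h")),
               ("r0", ("m0", "m1")), ("r1", ("m2", "m3"))]

print(issue_slots(naive), "slots vs", issue_slots(interleaved), "slots")  # 4 vs 3
```

Same six operations either way; the scheduled order fills three issue slots instead of four, and finding that order across thousands of instructions, under register-pressure constraints, is exactly the job RDNA 3 hands to the compiler.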
 
Jul 27, 2020
17,849
11,642
116
It's just a natural consequence of having a weaker software division than Nvidia.
Which begs the question: why do they keep pulling these compiler-optimization-dependent architectures? It's like they have no interest in actually competing. Or the architects who get mental orgasms from these ideas somehow keep getting back into the driver's seat in their GPU division.
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,149
136
Which begs the question: why do they keep pulling these compiler-optimization-dependent architectures? It's like they have no interest in actually competing. Or the architects who get mental orgasms from these ideas somehow keep getting back into the driver's seat in their GPU division.
Why? Perf/area, but only if you can optimize for it.
 

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
Which begs the question: why do they keep pulling these compiler-optimization-dependent architectures?
It suggests that the hardware guys have the most power within the company and that there is a disconnect between the hardware and software divisions, where the former are fairly flippant about the latter. "It's just software, it's not that hard."

Arguably, Intel had the same issue with Arc where management appeared convinced that Arc could release on the schedule set by the hardware guys, until it turned out that it is hard to play games without a working driver.

All these companies are founded and run by hardware people, so it's not surprising that they would have a natural affinity for hardware, but not so much for software.

Perhaps the reason why Nvidia is so successful and probably the biggest strength of Jensen is that he doesn't just see the software as this thing you have to do to make the hardware usable, but as something that can itself add something to the product that makes it more valuable. That's pretty open-minded from an electrical engineer for whom software is rather foreign.
 
Last edited:

tajoh111

Senior member
Mar 28, 2005
304
320
136
I calculated it based on the table Locuza made, and I got 50% more, but I didn't change the interconnects, CP and MM&PCIe. Hopefully he is right with this table.
View attachment 81317
If AD102 with comparable specs is comfortable with a 384-bit bus and 96MB of L2, I don't think this beefed-up N31 would need a couple more MCDs, but who knows.
Naturally, I wouldn't keep the same clocks as N31, but go ~10-15% lower. This will help keep TBP at 450W.
The RTX 4090 is only 23% faster than the RX 7900 XTX, so in raster this should be at worst as fast as an RTX 4090, even with 15% lower clocks.

If I wanted to be sure of winning by a few % against full AD102, then I could go big with N30: a 428mm2 GCD + 8 MCDs + 32GB VRAM, with 10% lower clocks and a 500W TBP.
Inside a 428mm2 GCD I could pack 160 CUs (+66.7%) : 10240 SPs (+66.7%) : 640 TMUs (+66.7%) : 256 ROPs (+33.3%), and 8 MCDs would provide 128MB of IC (+33%) and 1280 GB/s of BW (+33%).

There would be no monstrous die (that's AD102), only a monstrous package.
A 428mm2 5nm GCD + 300mm2 (8*37.5mm2) of 6nm MCDs = 728mm2 in total, but 41% of that is on a much cheaper process than AD102, so making it shouldn't cost more than AD102.

With these specs, it would win against full AD102 in raster, and I think it would be at worst pretty close to the RTX 4090 in RT.
Price it at $1399 (+$400); that would be more than enough to cover the higher production cost and extra VRAM, and they would even have a decent extra profit on it compared to the RX 7900 XTX.
The RTX 4090 at $1599 would need to come down in price, because it wouldn't be worth the extra $200 even with DLSS 3, considering AMD is probably doing something similar to frame generation.

Of course, this N30 GCD is only wishful thinking, as is that beefed-up N31. AMD will at most release N32, but the possibility was there, and they didn't use it.
Hopefully RDNA4 will be much better.

I think it would be quite a bit larger than that die size, and even with those specs there's a good chance it would lose to full AD102.

Just look at Navi 31 vs AD103 at the moment.

12288(6144) vs 9728(4864) cores: 26.3% advantage
384 vs 304 TMUs: 26.3% advantage
192 vs 112 ROPs: 71.4% advantage
960 vs 716 GB/s: 34.1% advantage

You would think Navi 31 would smash AD103 with those advantages, yet there is only a 2-5% advantage for Navi 31 over AD103 in raster. Compare your imaginary N30 die and you will see that N30's advantages over full AD102 are smaller still:

20480(10240) vs 18432(9216) cores: 11% advantage
640 vs 576 TMUs: 11% advantage
256 vs 192 ROPs: 33% advantage
1280 vs 1084 GB/s: 18% advantage

With this in mind, there is nothing to suggest your imaginary N30 would beat full AD102. If your only evidence is the poor scaling of AD103 vs AD102, there is nothing to suggest that Navi 30 would scale well at that size, particularly since one of the biggest bottlenecks (bandwidth) would only increase 33% vs Navi 31. Also, as I mentioned, the RTX 4090 only has 12.5% more L2 cache than the RTX 4080; that, along with the modest increase in bandwidth over last gen, is likely a bottleneck that full AD102, with 33% more cache, won't have. Add in the clock reduction needed to accommodate that huge increase in specs for N30, plus the problems Navi 31 is already having getting worse on an even more complex chip, and I think you would likely get something that combines some of the worst parts of Vega and Fermi, and ultimately a loss.
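For anyone who wants to check the ratio arithmetic, here's the same math in a few lines of Python (numbers as quoted above):

```python
# Spec-ratio arithmetic from the comparison above. Shader counts are the
# dual-issue figures; non-dual-issue counts are in parentheses in the post.
specs = {
    "N31 vs AD103": [("cores", 12288, 9728), ("TMUs", 384, 304),
                     ("ROPs", 192, 112), ("BW GB/s", 960, 716)],
    "imaginary N30 vs full AD102": [("cores", 20480, 18432), ("TMUs", 640, 576),
                                    ("ROPs", 256, 192), ("BW GB/s", 1280, 1084)],
}
for matchup, rows in specs.items():
    print(matchup)
    for name, amd, nv in rows:
        print(f"  {name}: {amd} vs {nv} -> +{(amd / nv - 1) * 100:.1f}%")
```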
 
Reactions: igor_kavinski

menhera

Junior Member
Dec 10, 2020
21
66
61
Reading a dozen modern game profiles, I've formed the opinion that all RDNA GPUs should have been 12 WGPs per SE, like XSX or Ada. Radeon GPU Profiler shows how long the GPU spends in the fixed-function units within a single frame. They're only busy during vertex shaders; after that, they mostly sit idle for the rest of the frame. Moreover, vertex shaders occupy a decreasing proportion of the frame as resolution scales.

My RX 6800 is always stuck in heavy compute shaders and pixel shaders in modern games, not ROPs or rasterizers, and my GPU has 10 WGPs per SE. 8 WGPs per SE in 2023 feels like AMD lives in 2014, thinking of Maxwell nonstop.
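To put a number on the resolution point: geometry cost is roughly resolution-independent while pixel/compute cost scales with pixel count. A toy example with invented per-frame costs:

```python
# Back-of-the-envelope for why vertex work (and the fixed-function front
# end) shrinks as a share of the frame at higher resolutions. All timings
# here are made-up illustrative numbers, not measurements.
vertex_ms = 1.0                 # per frame, roughly resolution-independent
pixel_ms_1080p = 6.0            # hypothetical pixel/compute cost at 1080p
for name, pixels in [("1080p", 1920*1080), ("1440p", 2560*1440), ("4K", 3840*2160)]:
    pixel_ms = pixel_ms_1080p * pixels / (1920 * 1080)
    share = vertex_ms / (vertex_ms + pixel_ms)
    print(f"{name}: vertex work is {share:.0%} of the frame")  # 14%, 9%, 4%
```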
 
Mar 11, 2004
23,173
5,639
146
It suggests that the hardware guys have the most power within the company and that there is a disconnect between the hardware and software divisions, where the former are fairly flippant about the latter. "It's just software, it's not that hard."

Arguably, Intel had the same issue with Arc where management appeared convinced that Arc could release on the schedule set by the hardware guys, until it turned out that it is hard to play games without a working driver.

All these companies are founded and run by hardware people, so it's not surprising that they would have a natural affinity for hardware, but not so much for software.

Perhaps the reason why Nvidia is so successful and probably the biggest strength of Jensen is that he doesn't just see the software as this thing you have to do to make the hardware usable, but as something that can itself add something to the product that makes it more valuable. That's pretty open-minded from an electrical engineer for whom software is rather foreign.

I don't think that was JHH's thinking at all. It was more "how do we lock people into using our hardware and keep them buying year after year?"

They wanted to buy ARM for software lock-in. It wasn't about value add (as they are free to make ARM CPUs, and that'd be a lot cheaper), but about trying to make their graphics hardware/software the de facto standard for ARM designs. G-Sync wasn't about value adding, it was about lock-in. Buying PhysX wasn't about adding value, it was about trying to get access to a budding market (hardware accelerated physics) for lock-in. CUDA wasn't about adding value, it was about lock-in.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I don't think that was JHH's thinking at all. It was more "how do we lock people into using our hardware and keep them buying year after year?"

They wanted to buy ARM for software lock-in. It wasn't about value add (as they are free to make ARM CPUs, and that'd be a lot cheaper), but about trying to make their graphics hardware/software the de facto standard for ARM designs. G-Sync wasn't about value adding, it was about lock-in. Buying PhysX wasn't about adding value, it was about trying to get access to a budding market (hardware accelerated physics) for lock-in. CUDA wasn't about adding value, it was about lock-in.
In a way it's both.

JHH is a big fan of Apple's approach. There, all software is lock-in: you officially just can't use Apple software without Apple hardware, outside of Apple's ecosystem. But many people don't perceive this as lock-in. Instead it's promoted and commonly perceived as a value add, often seen as inseparable from the hardware, even though the connection is artificial. And by doing away with x86 hardware (and Boot Camp), Apple will be able to push this angle even more.

Nvidia would love to be able to get into the same position.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
It suggests that the hardware guys have the most power within the company and that there is a disconnect between the hardware and software divisions, where the former are fairly flippant about the latter. "It's just software, it's not that hard."

Arguably, Intel had the same issue with Arc where management appeared convinced that Arc could release on the schedule set by the hardware guys, until it turned out that it is hard to play games without a working driver.

All these companies are founded and run by hardware people, so it's not surprising that they would have a natural affinity for hardware, but not so much for software.

Perhaps the reason why Nvidia is so successful and probably the biggest strength of Jensen is that he doesn't just see the software as this thing you have to do to make the hardware usable, but as something that can itself add something to the product that makes it more valuable. That's pretty open-minded from an electrical engineer for whom software is rather foreign.
This goes back to the '90s, when Carmack said in an interview that ATI generally had better hardware on paper, but Nvidia performed better in actual games. Seems like a long time to still be screwing this up.

FP workloads in games are about 70% of compute, so an extra FP unit would, optimally, give roughly a 40% boost. Obviously it was never going to be that high in general, since scheduling conflicts and the like lower that number. Still, that's a lot of performance left on the table, based on the Chips and Cheese article. They mention Nvidia's Game Ready driver program, where NV engineers embed with a studio making a AAA title and hand-optimize critical shader paths to get the best performance.
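For what it's worth, the raw Amdahl ceiling on that 70% figure comes out a bit above 40%; the gap is presumably the FP ops that can never be paired. A quick sketch:

```python
# Amdahl-style ceiling for doubling FP throughput when FP is ~70% of the
# instruction mix (the figure quoted above). This is the ideal cap; real
# gains are lower once the compiler can't pair every FP op.
fp_fraction = 0.70
speedup = 1 / ((1 - fp_fraction) + fp_fraction / 2)
print(f"ideal speedup: {speedup:.2f}x (+{(speedup - 1) * 100:.0f}%)")  # ~1.54x
```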

Too bad the newer (larger) Chinese driver team isn't that much better than the old Boston team.
 

Leeea

Diamond Member
Apr 3, 2020
3,695
5,428
136
I don't know if it was China or Taiwan - but yes. I'll see what I can find.
um...
https://www.indeed.com/viewjob?jk=60a15a8aeb35b83c&tk=1h25s7uh428iq000&from=serp&vjs=3






At this point I could keep making links, but I feel your post is without merit.


OK, here is the way to disprove it:

Put up the listings for China. There is nothing GPU-driver-related there. The closest is a machine learning dev, probably for Chinese localization, which for machine learning is going to be a bit of a project.


Took me 20 minutes to put this together.
 
Last edited:
Jul 27, 2020
17,849
11,642
116
If the hardware guys really do feel superior at AMD, it must be a real headache for the driver writers to get advice from them on how best to use the hardware they designed. I bet the standard response from anyone on the hardware team to a query from the software team is,

"RTFM!".

I once completed a task in three days for which I was given a week. I told my manager, expecting to at least receive a "pat on the back" kind of response. What did I get? "That's why you were hired". So he thought that it was just me doing my job. If that kind of attitude is pervasive in the software division at AMD, no wonder that few, if any, on that team are interested in doing more than the absolute minimum.
 
Last edited:

deasd

Senior member
Dec 31, 2013
553
867
136

gdansk

Platinum Member
Feb 8, 2011
2,489
3,379
136
For APUs at least. It also lists Zen 3 with Vega but Zen 3 launched simultaneously with RDNA2 discrete GPUs. So I get the impression it is talking about APUs. It doesn't confirm anything about discrete GPUs but it wouldn't be surprising.

Does this also imply we will be getting Phoenix on AM5 this year? Or is the socket stuff unrelated? Weird slide.
 
Last edited:
Reactions: Tlh97 and Ajay

jpiniero

Lifer
Oct 1, 2010
14,831
5,444
136
For APUs at least. It also lists Zen 3 with Vega but Zen 3 launched simultaneously with RDNA2 discrete GPUs. So I get the impression it is talking about APUs. It doesn't confirm anything about discrete GPUs but it wouldn't be surprising.

Does this also imply we will be getting Phoenix on AM5 this year? Or is the socket stuff unrelated? Weird slide.

Yeah it's about the APUs. It does imply that they might release Phoenix on AM5.
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,149
136
It's happening! Finally Navi 3.5 (RDNA3+) confirmed to exist

Nice.

But to be pedantic, RDNA 3+ was confirmed to exist way before this.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
um...
https://www.indeed.com/viewjob?jk=60a15a8aeb35b83c&tk=1h25s7uh428iq000&from=serp&vjs=3






At this point I could keep making links, but I feel your post is without merit.


OK, here is the way to disprove it:

Put up the listings for China. There is nothing GPU-driver-related there. The closest is a machine learning dev, probably for Chinese localization, which for machine learning is going to be a bit of a project.


Took me 20 minutes to put this together.
Well, I guess I'll have to eat my hat. Google search is coming up with a lot of bizarre results for me that don't make any sense given my search text. There was a fair bit of noise about this a few years ago, but all I can find are a few Reddit posts that simply say blah blah blah Chinese driver team. No details.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,423
2,914
136
I think it would be quite a bit larger than that die size, and even with those specs there's a good chance it would lose to full AD102.

Just look at Navi 31 vs AD103 at the moment.

12288(6144) vs 9728(4864) cores: 26.3% advantage
384 vs 304 TMUs: 26.3% advantage
192 vs 112 ROPs: 71.4% advantage
960 vs 716 GB/s: 34.1% advantage

You would think Navi 31 would smash AD103 with those advantages, yet there is only a 2-5% advantage for Navi 31 over AD103 in raster. Compare your imaginary N30 die and you will see that N30's advantages over full AD102 are smaller still:

20480(10240) vs 18432(9216) cores: 11% advantage
640 vs 576 TMUs: 11% advantage
256 vs 192 ROPs: 33% advantage
1280 vs 1084 GB/s: 18% advantage
Why do you think it would be a lot larger than 428mm2? That is already 42% bigger than the N31 GCD.

The 12288(6144) vs 9728(4864) core advantage of 26.3% is really only on paper.
The shader count increased only 20% (96 CUs vs 80 CUs), and they are now capable of dual-issue, but honestly that does very little for performance.
N31 vs N21:
18% higher median clockspeed from TPU reviews (1, 2), 20% more shaders, and 49% higher performance.
I get about 5% for that dual-issue (1.49/1.18/1.2 = 1.05).
On the other hand, moving from Turing to Ampere provided ~25% more performance (RTX 2080 vs RTX 3070).
So in reality it is 6144*1.05 vs 4864*1.25 = a 6% difference.
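Spelled out in Python, with the same inputs as above:

```python
# Strip clock and CU scaling out of the measured N31-over-N21 gain to
# isolate the dual-issue contribution, then redo the N31 vs AD103 math.
perf_gain  = 1.49   # N31 over N21, TPU medians
clock_gain = 1.18   # median clockspeed delta
cu_gain    = 1.20   # 96 CU vs 80 CU
dual_issue = perf_gain / clock_gain / cu_gain
print(f"dual-issue contribution: ~{(dual_issue - 1) * 100:.0f}%")  # ~5%

n31_eff   = 6144 * dual_issue   # RDNA 3 shaders x measured dual-issue gain
ad103_eff = 4864 * 1.25         # Ampere-style dual-issue gain (~25%)
print(f"effective shader advantage: ~{(n31_eff / ad103_eff - 1) * 100:.0f}%")  # ~6%
```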

If shaders are the bottleneck, then it doesn't really matter how many more TMUs or ROPs, or how much more BW, N31 has over AD103; it won't increase performance, in my opinion.

With this in mind, there is nothing to suggest your imaginary N30 would beat full AD102. If your only evidence is the poor scaling of AD103 vs AD102, there is nothing to suggest that Navi 30 would scale well at that size, particularly since one of the biggest bottlenecks (bandwidth) would only increase 33% vs Navi 31. Also, as I mentioned, the RTX 4090 only has 12.5% more L2 cache than the RTX 4080; that, along with the modest increase in bandwidth over last gen, is likely a bottleneck that full AD102, with 33% more cache, won't have. Add in the clock reduction needed to accommodate that huge increase in specs for N30, plus the problems Navi 31 is already having getting worse on an even more complex chip, and I think you would likely get something that combines some of the worst parts of Vega and Fermi, and ultimately a loss.
This imaginary N30 wouldn't be limited by BW; both IC and BW would increase by 33% compared to N31.
You think that's not enough?
The imaginary 160CU N30 vs the 80CU RX 6900 XT would have a more capable IC of the same size, and BW would be 2.5x higher. I think that's more than enough.

If the RTX 4090 had a memory bottleneck as you think, then it doesn't make much sense for it to use slower GDDR6X chips than the RTX 4080, or to have only 12.5% more L2, in my opinion.
Even if there is a memory bottleneck, I don't think performance would be more than 20% higher than the RTX 4090, and in that case it should still be faster than N30. If the scaling is bad for N30, then I see N30 somewhere in the middle between the RTX 4090 and full AD102.

Only the CUs saw a huge increase in specs (+66.7%) in my imaginary N30.
ROPs, BW and IC were increased by only 33%, and I also reduced clocks by 10%; high clocks are precisely the big problem for N31, so this will actually help.
The RTX 4090 having +68% SMs (CUDA cores, TMUs), +57% ROPs, +12.5% L2, +41% BW and +50% VRAM compared to the RTX 4080 resulted in only 41% higher TBP.
Imaginary N30, with +66.7% SMs (shaders, TMUs), +33.3% ROPs, +33.3% L2, +33.3% BW, +33.3% VRAM and -10% clockspeed, should come in at 500W TBP at worst.
Full AD102 should also have a 500W TBP, in my opinion.
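For reference, here are the RTX 4090 vs RTX 4080 deltas recomputed from the public specs (a quick sketch; TBP figures are the official board powers):

```python
# RTX 4090 vs RTX 4080 spec deltas used as the scaling reference above.
ratios = {
    "SMs": (128, 76),        "ROPs": (176, 112),
    "L2 MB": (72, 64),       "BW GB/s": (1008, 716.8),
    "VRAM GB": (24, 16),     "TBP W": (450, 320),
}
for name, (rtx4090, rtx4080) in ratios.items():
    print(f"{name}: +{(rtx4090 / rtx4080 - 1) * 100:.1f}%")  # +68%, +57%, +12.5%, +41%, +50%, +41%
```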

So why would this imaginary N30 be a flop combining some of the worst parts of Vega and Fermi?
The only real disadvantage I see is performance and perf/W with RT enabled, where the difference would be larger than 15%, but that could be mitigated by the competitive price I mentioned before.
 
Last edited:
Reactions: Tlh97

menhera

Junior Member
Dec 10, 2020
21
66
61


I thought RDNA 3 was a failure at first, but seeing my RDNA 2 GPU suffer from a lack of registers over and over changed my mind. AMD was right to increase the VGPR count in RDNA 3. TLOU loves the register-hungry wave64 mode. 8 WGPs per SE in 2023 is still odd, though.
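A rough sketch of the occupancy math behind that. The register-file sizes here are the commonly cited figures (1024 wave32 VGPRs per SIMD on RDNA 2, 1536 on RDNA 3) and the 16-wave cap is an assumption, so treat the absolute numbers as ballpark:

```python
# Occupancy ceiling from register pressure: how many waves fit per SIMD
# given a shader's VGPR allocation. Bigger file -> more waves in flight
# for the same register-hungry shader -> better latency hiding.
def max_waves(vgprs_per_wave, file_vgprs, cap=16):
    return min(cap, file_vgprs // vgprs_per_wave)

for v in (64, 96, 128, 192):
    print(f"{v:3d} VGPRs/wave: RDNA 2 -> {max_waves(v, 1024):2d} waves, "
          f"RDNA 3 -> {max_waves(v, 1536):2d} waves per SIMD")
```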
 

Leeea

Diamond Member
Apr 3, 2020
3,695
5,428
136
AI job but they prefer GPU performance optimization experience:

View attachment 81381

Next one requires GPU experience so presumably the person could get some driver work if they have nothing else to do?

View attachment 81382

Linux driver developer job

View attachment 81383
Nice find! Nicer than you think!

Android dev is super interesting.

So it isn't just Samsung!

Care to guess who else is dropping AMD graphics onto an Android system? It is in China though, which limits the options. OnePlus? ZTE?
 