Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 5 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

uzzi38

Platinum Member
Oct 16, 2019

Mopetar

Diamond Member
Jan 31, 2011
It's possible that the I/O die is 6/7nm and that's why AMD is saying "advanced node".

We now have Kopite saying RDNA3 is a bigger increase over RDNA2 than RDNA2 was over RDNA1, Kitty saying RDNA3 is aiming for over 2.5x performance, and RedGamingTech also saying it's over 2.5x.

Based on the macOS driver leak, I think we are looking at 240 CUs for Navi31, basically 3x 80 CU chiplets. This just makes more sense than 160 CUs.

To hit above 2.5x performance with 160 CUs you need 1.2x clocks and a 1.3x IPC gain, as you don't get perfect scaling when increasing teraflops. I think this is possible, but 240 CUs is more likely.
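A rough back-of-the-envelope sketch of that scaling math, assuming Navi21's 80 CUs as the baseline and treating "imperfect scaling" as a single hypothetical efficiency factor (not a measured value):

```python
def projected_speedup(cu_ratio: float, clock_gain: float, ipc_gain: float,
                      scaling_efficiency: float = 1.0) -> float:
    """Naive performance multiplier vs. a baseline GPU.

    scaling_efficiency < 1.0 is an assumed fudge factor for the imperfect
    scaling you get when adding CUs; it is not derived from any data.
    """
    return cu_ratio * clock_gain * ipc_gain * scaling_efficiency

# Hypothetical: 80 CUs (Navi21) -> 160 CUs, with 1.2x clocks and 1.3x IPC
theoretical = projected_speedup(160 / 80, 1.2, 1.3)
print(f"theoretical: {theoretical:.2f}x")
# Share of the theoretical gain that can be lost while still landing at 2.5x
print(f"break-even efficiency: {2.5 / theoretical:.0%}")
```

This prints 3.12x theoretical and an 80% break-even efficiency, i.e. the 1.2x/1.3x figures leave room for about a fifth of the ideal gain to be lost to scaling.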

Kopite is a good deal more credible when it comes to rumors, but there's still a matter of what 2.5x performance actually means. The easiest answer is that it's just a matter of raw FLOPs because that's easiest to measure from the theoretical perspective before you even have the finished silicon. There's also the possibility that it's in reference to something more specific and if you were to tell me it was in reference to RT performance, then that's not too difficult to believe since there's far more room for improvement there than in a general sense.

A 240 CU GPU is almost assuredly a professional card where the absolute FLOPs is a better indication of performance than it would be in gaming where cards with fewer shaders at higher clock speeds will have better performance over something designed to be run at much lower clocks in order to keep the power draw under control. Apple doesn't care much about gaming performance, but they would want a high-end GPU for their professional users regardless of how many CUs it ultimately has.
 
Reactions: Tlh97

Trumpstyle

Member
Jul 18, 2015
Kopite is a good deal more credible when it comes to rumors, but there's still a matter of what 2.5x performance actually means. The easiest answer is that it's just a matter of raw FLOPs because that's easiest to measure from the theoretical perspective before you even have the finished silicon. There's also the possibility that it's in reference to something more specific and if you were to tell me it was in reference to RT performance, then that's not too difficult to believe since there's far more room for improvement there than in a general sense.

A 240 CU GPU is almost assuredly a professional card where the absolute FLOPs is a better indication of performance than it would be in gaming where cards with fewer shaders at higher clock speeds will have better performance over something designed to be run at much lower clocks in order to keep the power draw under control. Apple doesn't care much about gaming performance, but they would want a high-end GPU for their professional users regardless of how many CUs it ultimately has.
I think all 3 of them were talking about gaming performance (raster) and not RT or Teraflops. People are mentioning RT or Teraflops because 2.5x does indeed look very high.

I consider Kitty just as reliable as Kopite, which is why I think we're looking at 240 CUs for Navi31 after his latest tweet a few days ago saying RDNA3 is above 2.5x performance.


Most people are assuming 160 CUs for Navi31 at the moment.
 

Trumpstyle

Member
Jul 18, 2015
Zero chance Navi31 has 240 CUs as that is the same config that MI200 uses, and both front-end and back-end are significantly larger in RDNA vs CDNA/Vega.
You're the dude that spread the 160 CU rumor. I don't see what MI200 has to do with RDNA3; I don't follow that stuff.

What do you think the difference is between Navi31 and Navi32?
 

soresu

Diamond Member
Dec 19, 2014
I think all 3 of them were talking about gaming performance (raster) and not RT or Teraflops. People are mentioning RT or Teraflops because 2.5x does indeed look very high.

I consider Kitty just as reliable as Kopite, which is why I think we're looking at 240 CUs for Navi31 after his latest tweet a few days ago saying RDNA3 is above 2.5x performance.


Most people are assuming 160 CUs for Navi31 at the moment.
240 CU's seems extremely unlikely given AMD probably had a good idea of fab capacity constraints coming before Zen3 and RDNA2 were announced last October.

Much more likely is a 10-13% clock and 10-13% IPC (FPS per MHz per CU) boost for each GCD chiplet.

Multiplied by 2x the CUs, that gives more or less a 2.5x performance increase over the RDNA2 flagship card.
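A quick sweep of that estimate, assuming two GCDs that each match the RDNA2 flagship and ignoring multi-die overheads (the function name is just for illustration):

```python
def chiplet_gain(clock_gain: float, ipc_gain: float, gcd_count: int = 2) -> float:
    """Ideal multiplier vs. one RDNA2 flagship die; no multi-die overhead modeled."""
    return gcd_count * clock_gain * ipc_gain

for pct in (10, 11, 12, 13):
    g = 1 + pct / 100
    print(f"{pct}% clock + {pct}% IPC x 2 GCDs -> {chiplet_gain(g, g):.2f}x")
```

The range comes out at 2.42x, 2.46x, 2.51x and 2.55x, so "more or less 2.5x" holds only toward the upper end of the 10-13% band.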

Obviously this is not counting overheads and they may be comparing using a favourable game engine.

I would not at all be surprised to see RT performance gain by more than 25% per CU though. Given how early we are in the RT hardware saga, and how much low-hanging fruit is likely left to be picked with a base µArch to build on, it seems guaranteed that 2.5x would be conservative on that score if they can manage so much with raster graphics.
 

DisEnchantment

Golden Member
Mar 3, 2017
240 CU's seems extremely unlikely given AMD probably had a good idea of fab capacity constraints coming before Zen3 and RDNA2 were announced last October.
240 CUs would indeed be a behemoth. I would hazard a wild guess of something upwards of 75 billion transistors for a complete chip, if there is one: ~1000 mm² at N5.
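One way to sanity-check a guess like that is to scale Navi 21 (80 CUs, 26.8 billion transistors, roughly 520 mm² on N7) linearly by CU count. The N7-to-N5 effective density gain below is an assumption (TSMC's quoted logic-density gain is higher, but SRAM and analog scale less), and linear scaling overcounts blocks like the display engine and memory PHYs that don't grow with CU count:

```python
NAVI21_CUS = 80
NAVI21_TRANSISTORS_B = 26.8  # billions, as published for Navi 21
NAVI21_AREA_MM2 = 520        # approximate, TSMC N7

def naive_scaled_die(cus: int, density_gain: float) -> tuple[float, float]:
    """Scale Navi 21 linearly by CU count, then shrink area by an assumed
    N7 -> N5 density gain. Returns (billions of transistors, area in mm^2)."""
    factor = cus / NAVI21_CUS
    transistors_b = NAVI21_TRANSISTORS_B * factor
    area_mm2 = NAVI21_AREA_MM2 * factor / density_gain
    return transistors_b, area_mm2

t, a = naive_scaled_die(240, density_gain=1.6)  # 1.6x is a guess, not a spec
print(f"~{t:.0f}B transistors, ~{a:.0f} mm^2")
```

Under these assumptions it lands at roughly 80B transistors and ~975 mm², in the same ballpark as the guess above.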
Not something they would make in current supply environment.
 

uzzi38

Platinum Member
Oct 16, 2019
240 CUs would indeed be a behemoth. I would hazard a wild guess of something upwards of 75 billion transistors for a complete chip, if there is one: ~1000 mm² at N5.
Not something they would make in current supply environment.
...I mean, I'd argue that 160CUs on a single package is also a bit too much for the current supply environment as well, but you know.
 

moinmoin

Diamond Member
Jun 1, 2017
Once AMD is going chiplets for its GPUs you can bet they will also max out the technical possibility of the design. AMD isn't one to hold back in that regard (remember when many thought AMD would really stop at 12 cores for AM4 when they drip fed the Zen 2 announcements?).
 

Kepler_L2

Senior member
Sep 6, 2020
You're the dude that spread the 160 CU rumor. I don't see what MI200 has to do with RDNA3; I don't follow that stuff.

What do you think the difference is between Navi31 and Navi32?
A 240 CU RDNA3 GPU would be significantly larger than a 240 CU CDNA GPU, and there's no chance AMD would sell a larger die for a lower price to gamers vs selling the insane high profit margins datacenter GPUs.

So far I haven't seen Navi32 anywhere so I don't know what it is.
 
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
-Don't see why they'd have to, with IC basically relieving the bandwidth pressure in a huge way.
There's more to it than bandwidth alone, though.

If they can reduce cost significantly with HBM3, the total package size benefits are significant vs GDDRx, which takes up far more room on a PCB alongside the main chip package.

Going forward to more 3/2.5D integration and chiplets it seems a natural move to go with HBM in the long term if the price is right.

In fact I do wonder about Raphael and whether it uses HBM or not.

Infinity Cache (IC) from RDNA2 will reduce the bandwidth constraints, but the iGPU will still have to share bandwidth with the CPU cores, which even on DDR5 won't be great unless Raphael isn't much of an increase in CU count over Rembrandt.
 
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
A 240 CU RDNA3 GPU would be significantly larger than a 240 CU CDNA GPU
Where did you get this idea from?

I think CDNA adds at least as much as it takes away from Vega, with the matrix units.

The huge 120 CU Arcturus die (750 mm²?) seems viable to me only because of the higher prices datacenter/enterprise/supercomputer customers are willing to pay for them.
 
Reactions: Tlh97

Kepler_L2

Senior member
Sep 6, 2020
Where did you get this idea from?

I think CDNA adds at least as much as it takes away from Vega, with the matrix units.

The huge 120 CU Arcturus die (750 mm²?) seems viable to me only because of the higher prices datacenter/enterprise/supercomputer customers are willing to pay for them.
Literally everything is larger in RDNA vs CDNA/Vega. CUs, caches, command processor, wavefront launcher... Not to mention everything that RDNA has that CDNA doesn't (display engine, ROPs, etc).

My point is there is no chance AMD would launch a 1200mm² total die size GPU for gamers for $2000 when they could launch a 1000mm² GPU for $9000 for datacenter clients.
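The arithmetic behind that point, using the hypothetical prices and total die areas from the post (it ignores that card prices also cover memory, board and margins, so it is only a rough revenue-per-silicon comparison):

```python
def dollars_per_mm2(price_usd: float, total_die_mm2: float) -> float:
    """Revenue per mm^2 of total die area (very rough)."""
    return price_usd / total_die_mm2

gaming = dollars_per_mm2(2000, 1200)      # hypothetical consumer part
datacenter = dollars_per_mm2(9000, 1000)  # hypothetical datacenter part
print(f"gaming: ${gaming:.2f}/mm^2, datacenter: ${datacenter:.2f}/mm^2, "
      f"ratio: {datacenter / gaming:.1f}x")
```

That works out to about $1.67/mm² vs $9.00/mm², a 5.4x gap in favor of the datacenter part.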
 

Tup3x

Golden Member
Dec 31, 2016
A crazy speed increase might be possible if it's some kind of multi-die design and they run those dies at their ideal clocks (power-consumption-wise). In any case, massive monolithic chips are not the future.
 

DisEnchantment

Golden Member
Mar 3, 2017
My point is there is no chance AMD would launch a 1200mm² total die size GPU for gamers for $2000 when they could launch a 1000mm² GPU for $9000 for datacenter clients.
To add to this, god knows how much money they are sinking into ROCm, LLVM, amdgpu, upstreaming all the compute framework changes, etc.
Not to say the gaming side is any less expensive on the software side; there's lots of development work there too: drivers, the FidelityFX suite, AMF, etc.
 

moinmoin

Diamond Member
Jun 1, 2017
Let's ignore demand for a second: AMD creates designs and orders a specific amount of chips/wafers in advance, striking a balance between recovering the R&D cost of the design and the cost of producing the resulting chip packages. (We also know AMD has in the past been curiously conservative about order size.) AMD uses the chiplet approach to leapfrog the competition, as has been seen with Epyc, Threadripper and the chiplet-based Ryzen. If RDNA3 is chiplet based, then I expect AMD to worry about the cost of the individual chiplets, not the total die size. If easily increasing the latter (the whole point of going chiplet, imo) allows them to leapfrog the competition, I fully expect them to go for it.
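Why per-chiplet cost matters more than total die size can be illustrated with a simple Poisson yield model, Y = exp(-A · D0). This is a textbook approximation swapped in for illustration, and the defect density and die sizes below are assumed values, not AMD or TSMC figures:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model: Y = exp(-A * D0)."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)  # convert mm^2 -> cm^2

D0 = 0.1  # assumed defect density in defects/cm^2, illustrative only
mono = poisson_yield(600, D0)     # one hypothetical 600 mm^2 monolithic die
chiplet = poisson_yield(150, D0)  # one hypothetical 150 mm^2 chiplet
print(f"600 mm^2 die yield: {mono:.1%}")
print(f"150 mm^2 chiplet yield: {chiplet:.1%}")
```

With these numbers the big die yields about 54.9% while each small chiplet yields about 86.1%. Since good chiplets can be combined from anywhere on the wafer, adding more of them grows total silicon without dragging the per-chiplet yield down (packaging yield is not modeled here).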
 

Bigos

Member
Jun 2, 2019
To add to this, god knows how much money they are sinking into ROCm, LLVM, amdgpu, upstreaming all the compute framework changes, etc.
Not to say the gaming side is any less expensive on the software side; there's lots of development work there too: drivers, the FidelityFX suite, AMF, etc.

I think you meant amdkfd, which is the AMD compute driver, as amdgpu is the graphics kernel-space driver for gfx6+ hardware.

I believe there are many parts shared between these drivers (like hardware description headers, initialization routines, etc.) but otherwise they are separate. They implement different kernel interfaces and I don't think they interop well. amdgpu is used as the kernel interface for both closed-source and open-source AMD user-space graphics drivers, while amdkfd is used only by ROCm, IIRC.

BTW, the LLVM AMDGPU backend is used by the ROCm compute stack, the radeonsi OpenGL driver and the amdvlk Vulkan driver. That said, radeonsi is moving towards the Valve-developed aco compiler (because it is better, at least for graphics workloads), and amdvlk should not be used in practice (radv, which already uses aco, should be the preferred AMD Vulkan driver).
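The two kernel interfaces show up as different device nodes on Linux: ROCm opens /dev/kfd (amdkfd), while graphics user-space drivers talk to DRM render nodes under /dev/dri. A minimal sketch that just reports which nodes exist; the function name is hypothetical, and a render node alone doesn't prove an AMD GPU, since other vendors' DRM drivers expose them too:

```python
import glob
import os

def amd_gpu_interfaces() -> dict:
    """Report which GPU kernel interfaces are exposed on this Linux machine.

    /dev/kfd is the amdkfd compute node used by ROCm.
    /dev/dri/renderD* are DRM render nodes (amdgpu on AMD hardware).
    """
    return {
        "kfd": os.path.exists("/dev/kfd"),
        "render_nodes": sorted(glob.glob("/dev/dri/renderD*")),
    }

print(amd_gpu_interfaces())
```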
 

DisEnchantment

Golden Member
Mar 3, 2017
I think you meant amdkfd, which is the AMD compute driver, as amdgpu is the graphics kernel-space driver for gfx6+ hardware.

I believe there are many parts shared between these drivers (like hardware description headers, initialization routines, etc.) but otherwise they are separate. They implement different kernel interfaces and I don't think they interop well. amdgpu is used as the kernel interface for both closed-source and open-source AMD user-space graphics drivers, while amdkfd is used only by ROCm, IIRC.
You are right, it is amdkfd, but a lot of things are shared: smu, psp, ras, sdma, mes, umc, df, gmc, atombios... quite a lot actually.
 

scineram

Senior member
Nov 1, 2020
It's possible that the I/O die is 6/7nm and that's why AMD is saying "advanced node".
This doesn't actually make sense. They had no problem putting 7 nm on the Zen 3 and 5 nm on the Zen 4 roadmaps.

Although thinking about this some more in 2023 I suppose AMD would want a monolithic APU based on Zen 4. That chip would have to have N5 graphics IP. So I guess RDNA 3 being 5 nm is more likely than I thought, and they really just want to hide it from Nvidia.
 

DiogoDX

Senior member
Oct 11, 2012
Kopite is a good deal more credible when it comes to rumors, but there's still a matter of what 2.5x performance actually means. The easiest answer is that it's just a matter of raw FLOPs because that's easiest to measure from the theoretical perspective before you even have the finished silicon. There's also the possibility that it's in reference to something more specific and if you were to tell me it was in reference to RT performance, then that's not too difficult to believe since there's far more room for improvement there than in a general sense.

A 240 CU GPU is almost assuredly a professional card where the absolute FLOPs is a better indication of performance than it would be in gaming where cards with fewer shaders at higher clock speeds will have better performance over something designed to be run at much lower clocks in order to keep the power draw under control. Apple doesn't care much about gaming performance, but they would want a high-end GPU for their professional users regardless of how many CUs it ultimately has.
2.5x raw TFLOPs makes much more sense. Maybe with the same doubled FP32 pipes as Ampere.
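For reference, peak FP32 TFLOPs is just CUs × FP32 lanes per CU × 2 FLOPs per FMA × clock. A sketch using the RX 6900 XT (80 CUs, 64 lanes per CU, 2.25 GHz boost); the doubled-lane option models the Ampere-style idea above and is hypothetical, not a confirmed RDNA3 feature:

```python
def fp32_tflops(cus: int, clock_ghz: float, lanes_per_cu: int = 64,
                dual_issue: bool = False) -> float:
    """Peak FP32 TFLOPs = CUs x lanes x 2 (FMA) x clock in GHz / 1000.

    dual_issue=True doubles FP32 lanes per CU (hypothetical Ampere-style change).
    """
    lanes = lanes_per_cu * (2 if dual_issue else 1)
    return cus * lanes * 2 * clock_ghz / 1000.0

rx6900xt = fp32_tflops(80, 2.25)
doubled = fp32_tflops(80, 2.25, dual_issue=True)
print(f"{rx6900xt:.2f} -> {doubled:.2f} TFLOPs ({doubled / rx6900xt:.1f}x)")
```

This prints 23.04 -> 46.08 TFLOPs (2.0x), which shows how a raw-FLOPs figure can jump far more than gaming performance would from the same change.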
 
Reactions: Tlh97 and Gideon