Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,698
6,393
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
Speculation unless proven otherwise:

5nm RDNA2+ (Not RDNA3)
~100 mm2
>3 GHz GPU boost (~3.8 non-FMA TFlops)
3.2 GHz HBM2E (409.6 gigabyte/s)
MP by Q2 2021.

Supposedly, the secret is that the successor will be scaled to 8x MGPU via InfArch => 8 small GPUs with 8 HBM2E stacks (~1440(if same die size and HBM2E) on 1700 interposer(gen 2 version)).
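
The 409.6 GB/s figure above follows from the quoted data rate if one assumes the standard 1024-bit interface per HBM2E stack and reads "3.2 GHz" as 3.2 Gbit/s per pin. A minimal sketch of that arithmetic (the function name and the 8-stack total are illustrative, not from the post):

```python
# Peak bandwidth of one HBM stack from its per-pin data rate.
# Assumption: a 1024-bit-wide stack interface, as in HBM2/HBM2E.
HBM_BUS_WIDTH_BITS = 1024


def hbm_stack_bandwidth_gbs(pin_rate_gbps: float,
                            bus_width_bits: int = HBM_BUS_WIDTH_BITS) -> float:
    """Return the peak bandwidth of a single stack in GB/s."""
    return pin_rate_gbps * bus_width_bits / 8  # bits -> bytes


one_stack = hbm_stack_bandwidth_gbs(3.2)
print(f"1 stack : {one_stack:.1f} GB/s")
# Hypothetical aggregate for the rumoured 8-stack configuration.
print(f"8 stacks: {8 * one_stack:.1f} GB/s")
```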
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,413
2,906
136
Speculation unless proven otherwise:

5nm RDNA2+ (Not RDNA3)
~100 mm2
>3 GHz GPU boost (~3.8 non-FMA TFlops)
3.2 GHz HBM2E (409.6 gigabyte/s)
MP by Q2 2021.

Supposedly, the secret is that the successor will be scaled to 8x MGPU via InfArch => 8 small GPUs with 8 HBM2E stacks (~1440(if same die size and HBM2E) on 1700 interposer(gen 2 version)).
These specs are at first glance simply wrong.
1. >3 GHz GPU boost? Very, very unlikely.
2. 10CU -> 640SP at 3 GHz, that's 3.84 TFlops FP32. A 10CU GPU with HBM can't be only 100mm2 on a 5nm process, when Navi 14 has 24CU and 128-bit GDDR6 and its die size is just 158mm2 on a 7nm process.
3. A GPU with 3.8TFlops certainly doesn't need 410GB/s when a Navi 10 with 2.5x more TFlops has only 448GB/s.
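
The numbers in points 2 and 3 can be checked with a short sketch. It assumes the usual FMA-counted FP32 formula (64 ALUs per CU, 2 flops per ALU per clock) and takes Navi 10's 1.905 GHz boost clock from later in the thread; the function name is made up for illustration.

```python
def fp32_tflops(cus: int, clock_ghz: float,
                alus_per_cu: int = 64,
                flops_per_alu_per_clock: int = 2) -> float:
    """Peak FP32 TFLOPS, counting an FMA as two flops."""
    return cus * alus_per_cu * flops_per_alu_per_clock * clock_ghz / 1000


rumoured = fp32_tflops(10, 3.0)    # the 10CU reading of the rumour
navi10 = fp32_tflops(40, 1.905)    # Navi 10 at its boost clock

print(f"rumoured part: {rumoured:.2f} TFLOPS")
print(f"Navi 10      : {navi10:.2f} TFLOPS ({navi10 / rumoured:.2f}x)")

# Memory bandwidth available per TFLOP of compute.
print(f"rumoured part: {409.6 / rumoured:.1f} GB/s per TFLOP")
print(f"Navi 10      : {448 / navi10:.1f} GB/s per TFLOP")
```

On these assumptions the rumoured part would have a bit more than twice the bandwidth per TFLOP of Navi 10, which is the mismatch point 3 is objecting to.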
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,141
136
>3 GHz GPU boost (~3.8 non-FMA TFlops)
Hhhahahahha, damn I needed a good laugh.

Ladies and gentlemen, I think we found our next space heater!

Unless those are DP FLOPS, that can't be more than about 10 CUs at 3 GHz.

Is it meant to be Rembrandt or something?

Because a 3 GHz GPU boost for an APU sounds insanely excessive, given it's more than a 70% clock increase from Renoir.

Edit: If it is supposed to be derivative of the PIM Exascale chiplet design, with HBM stacked on logic, then it seems even less likely, given that the PIM design was very thermally sensitive (i.e. logic hotspots cooking HBM is bad) and didn't go above 800 MHz efficiently (the logic, not the memory).
 
Reactions: Tlh97 and FaaR

soresu

Platinum Member
Dec 19, 2014
2,921
2,141
136
A GPU with 3.8TFlops certainly doesn't need 410GB/s when a Navi 10 with 2.5x more TFlops has only 448GB/s.
While I do think that the rest is unlikely, it does seem likely that such a GPU would need HBM to achieve some degree of areal efficiency, and HBM2E simply has that bandwidth capacity even without going to the absolute limits, which I believe are currently 3.6 GHz?

I'm not sure if higher bandwidth would be necessary with up to 8 chiplets sampling data from a single stack of HBM, but it seems possible that separating the logic would incur some bandwidth overhead in memory access.
 

Saylick

Diamond Member
Sep 10, 2012
3,362
7,062
136
Speculation unless proven otherwise:

5nm RDNA2+ (Not RDNA3)
~100 mm2
>3 GHz GPU boost (~3.8 non-FMA TFlops)
3.2 GHz HBM2E (409.6 gigabyte/s)
MP by Q2 2021.

Supposedly, the secret is that the successor will be scaled to 8x MGPU via InfArch => 8 small GPUs with 8 HBM2E stacks (~1440(if same die size and HBM2E) on 1700 interposer(gen 2 version)).
<4 non-FMA TFLOPS at the price of a 100mm2 5nm die... That's pretty bad area efficiency, no? I mean, Navi 10 already achieves ~10 TFLOPS using ~250mm2 on 7nm (or 4 TFLOPS/100mm2). From using 5nm alone, AMD should be able to get at least 1.5x perf/area if we assume 5nm is 1.7x the density of 7nm, even with not everything scaling perfectly. Are you implying there's a ton of other dedicated, fixed function acceleration hardware in there that we're not aware of?
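
As a rough sketch of the expectation in this post, using only the round numbers quoted in it (the helper name is hypothetical):

```python
def tflops_per_100mm2(tflops: float, area_mm2: float) -> float:
    """Compute density expressed as TFLOPS per 100 mm2 of die."""
    return tflops * 100 / area_mm2


navi10_density = tflops_per_100mm2(10.0, 250.0)   # ~10 TFLOPS on ~250 mm2
PERF_PER_AREA_GAIN_5NM = 1.5                       # the post's assumed minimum
expected_5nm = navi10_density * PERF_PER_AREA_GAIN_5NM

# The rumoured figure is a non-FMA count, while the Navi 10 figure above
# counts an FMA as two flops, so the two are not directly comparable.
rumoured_non_fma = tflops_per_100mm2(3.84, 100.0)

print(f"Navi 10 (7nm)        : {navi10_density:.2f} TFLOPS/100mm2")
print(f"expected on 5nm      : {expected_5nm:.2f} TFLOPS/100mm2")
print(f"rumoured (non-FMA)   : {rumoured_non_fma:.2f} TFLOPS/100mm2")
```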
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
I mean, Navi 10 already achieves ~10 TFLOPS using ~250mm2 on 7nm (or 4 TFLOPS/100mm2).
251 mm2
1.905 GHz GPU Boost * 40 CUs * 64 ALUs => 4.8768 TFlops

251 / 4.8768 => 51.468175853 mm2 per non-FMA TFlop

~100 mm2
~3.000 GHz GPU * 20 CUs * 64 ALUs => ~3.8400 TFlops

~100 / 3.8400 => 26.041666667 mm2 per non-FMA TFlop

Which is 1.976377953x better area efficiency.

Area and frequency aren't really pinned down, but if you lock the frequency at exactly 3 GHz, then the actual die will probably be closer to 112 mm2.

28nm(early) => 185.8108 mm2 per TFlop
28nm(later) => 155.5398 mm2 per TFlop
~1.95x density
14nm(early) => 79.5375 mm2 per TFlop
14nm(later) => 65.1744 mm2 per TFlop
~1.25x density
7nm => 51.4682 mm2 per TFlop
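
The area-per-TFlop arithmetic in this post can be reproduced with a few lines. This is only a sketch of the calculation as written, using the post's non-FMA convention (one flop per ALU per clock); the function names are illustrative, and the per-node values are copied from the list above.

```python
def non_fma_tflops(cus: int, clock_ghz: float, alus_per_cu: int = 64) -> float:
    """Peak TFlops counting one flop per ALU per clock (no FMA doubling)."""
    return cus * alus_per_cu * clock_ghz / 1000


def mm2_per_tflop(area_mm2: float, tflops: float) -> float:
    return area_mm2 / tflops


navi10 = mm2_per_tflop(251, non_fma_tflops(40, 1.905))
rumoured = mm2_per_tflop(100, non_fma_tflops(20, 3.0))

print(f"Navi 10 : {navi10:.4f} mm2 per non-FMA TFlop")
print(f"rumoured: {rumoured:.4f} mm2 per non-FMA TFlop")
print(f"ratio   : {navi10 / rumoured:.3f}x")

# Node-to-node steps implied by the per-node figures listed in the post.
step_28_to_14 = 155.5398 / 79.5375
step_14_to_7 = 65.1744 / 51.4682
print(f"28nm(later) -> 14nm(early): {step_28_to_14:.2f}x")
print(f"14nm(later) -> 7nm        : {step_14_to_7:.2f}x")
```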
 
Reactions: Tlh97 and Saylick

soresu

Platinum Member
Dec 19, 2014
2,921
2,141
136
Navi 10 - 251 mm2
1.905 GHz GPU Boost * 40 CUs * 64 ALUs * 2 => 9.7536 TFlops FP32
You didn't multiply the result by 2, which is why you got only half the TFlops. No wonder everyone was confused, when it wasn't a 10CU but actually a 20CU GPU.
Mathematically the TFLOPS number has always worked out the same since the first iteration of GCN (SI/Southern Islands):

x CUs * 128 * y GHz = z TFLOPS.

Currently no change from RDNA has upset this metric, and I doubt it will for the foreseeable future because of the need to retain backwards compatibility on future game consoles for Sony/MS.

Sometimes I really do wonder just how much the console co ventures are restricting AMD in terms of uArch development for their GPU's.

At least CDNA is free of that leash, and perhaps that was the main driving force for its development in the first place - giving them free rein to develop new compute uArch iterations unencumbered by GFX specific considerations, or any specific customer considerations.

Going forward we might see RDNA dividing into a 'legacy' branch as it were for console compatibility, and a more CDNA inspired branch as and when such features may become beneficial with increased use if nVidia decides to push them.
 

Geranium

Member
Apr 22, 2020
83
101
61
Sometimes I really do wonder just how much the console co ventures are restricting AMD in terms of uArch development for their GPU's.

At least CDNA is free of that leash, and perhaps that was the main driving force for its development in the first place - giving them free rein to develop new compute uArch iterations unencumbered by GFX specific considerations, or any specific customer considerations.

Going forward we might see RDNA dividing into a 'legacy' branch as it were for console compatibility, and a more CDNA inspired branch as and when such features may become beneficial with increased use if nVidia decides to push them.
Console and Apple laptop/desktop deals kept AMD's graphics division afloat, compared to their dedicated GPU business. Without the money from these two sources, the Radeon division wouldn't have the money for R&D.
 
Reactions: Tlh97 and moinmoin

DisEnchantment

Golden Member
Mar 3, 2017
1,672
6,150
136
Currently no change from RDNA has upset this metric, and I doubt it will for the foreseeable future because of the need to retain backwards compatibility on future game consoles for Sony/MS.
The compatibility with Sony/MS has nothing to do with the flops per cycle. For a very long time now, each SIMD ALU has been able to do two flops per cycle.
xCUs * 4 SIMDs/CU * 16ALUs/SIMD * 2 Flops/Hz * y GHz = z TFLOPs [GCN]
xCUs * 2 SIMDs/CU * 32ALUs/SIMD * 2 Flops/Hz * y GHz = z TFLOPs [RDNA]
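
A small sketch of the two formulas above, showing that both layouts come out at the same 128 flops per CU per clock. The class and function names are invented for illustration, and the division by 1000 converts the GHz-based product from GFLOPS to TFLOPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuLayout:
    simds_per_cu: int
    alus_per_simd: int
    flops_per_alu_per_clock: int = 2  # one FMA counted as two flops

    def flops_per_cu_per_clock(self) -> int:
        return self.simds_per_cu * self.alus_per_simd * self.flops_per_alu_per_clock


GCN = CuLayout(simds_per_cu=4, alus_per_simd=16)
RDNA = CuLayout(simds_per_cu=2, alus_per_simd=32)


def peak_tflops(layout: CuLayout, cus: int, clock_ghz: float) -> float:
    return cus * layout.flops_per_cu_per_clock() * clock_ghz / 1000


print(GCN.flops_per_cu_per_clock(), RDNA.flops_per_cu_per_clock())
# Navi 10 figures quoted earlier in the thread: 40 CUs at 1.905 GHz boost.
print(f"{peak_tflops(RDNA, 40, 1.905):.4f} TFLOPS")
```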

Compatibility with GCN has to do with wave size which is 64 in GCN and 32 in RDNA. RDNA can operate in wave 64 to handle the shaders written for GCN wave64 mode.
In native wave 32 mode, RDNA dispatches one wave in one cycle, compared to 4 cycles for GCN to dispatch a wave 64. RDNA can therefore achieve very high shader occupancy in native wave 32 and has lower latency for wave 64.
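
The cycle counts above fall out of dividing the wave size by the SIMD width. The sketch below is a deliberately simplified model (it ignores scheduling, dependencies and dual-issue details), and the two-cycle figure for wave64 on RDNA is an assumption derived from the same division rather than something stated in the post.

```python
def issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles needed to push every lane of one wave through one SIMD."""
    return -(-wave_size // simd_width)  # integer ceiling division


GCN_SIMD_WIDTH = 16
RDNA_SIMD_WIDTH = 32

print("GCN  wave64:", issue_cycles(64, GCN_SIMD_WIDTH), "cycles")
print("RDNA wave32:", issue_cycles(32, RDNA_SIMD_WIDTH), "cycle")
print("RDNA wave64:", issue_cycles(64, RDNA_SIMD_WIDTH), "cycles")
```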

Going forward we might see RDNA dividing into a 'legacy' branch as it were for console compatibility, and a more CDNA inspired branch as and when such features may become beneficial with increased use if nVidia decides to push them.
CDNA cannot evolve into a graphics architecture because it loses all the geometry processing capability and the Micro Engine and Micro Engine Scheduler (ME and MES) aka graphics command processor.
CDNA will retain the Micro Engine Compute / MEC which is just a command processor for Compute shaders. The MEC also known as ACE is quite suitable for Compute loads.
RDNA is the future of graphics for AMD.
 
Reactions: Tlh97 and Elfear

soresu

Platinum Member
Dec 19, 2014
2,921
2,141
136
Console and Apple laptop/desktop deals kept AMD's graphics division afloat, compared to their dedicated GPU business. Without the money from these two sources, the Radeon division wouldn't have the money for R&D.
I fail to see the point you are making.

Financing for RDNA iteration R&D need not have anything to do with CDNA development directions - quite the opposite in fact, which was my point, CDNA frees them from uArch design dependency on the likes of Sony and MS.

Besides which, the Apple part of their custom division only accounts for a small fraction of their Sony/MS business, and likely did not even exist in the time of RDNA's early development - the fact that Navi 12 is a more errata free iteration on the same RDNA1 uArch in Navi 10 and 14 supports this.

Given Apple are breaking away from contracting AMD it was and is clearly a good direction to be going in - doubly so if there is any lack of certainty over mid term console refreshes, let alone another full console generation after PS5/XSX.

Though returning to your "keeping afloat" point, to be brutally accurate, the entire Radeon/RTG division first kept the CPU division afloat during the Bulldozer mess - back then consoles were not the only driving earner for AMD during the early GCN era, before either intrinsic uArch shortfalls or iteration R&D mismanagement broke its scalability.

The pendulum then swung around during the late Volcanic Islands to Polaris/Vega timeframe, when the newly rebounding CPU group under Zen was keeping the GPU group afloat, in combination with semi custom deals from various sources*.

*Including mid term and next gen Sony/MS console SoC's, Subor, Apple, and the Hygon licensing deal.
 

soresu

Platinum Member
Dec 19, 2014
2,921
2,141
136
CDNA cannot evolve into a graphics architecture because it loses all the geometry processing capability and the Micro Engine and Micro Engine Scheduler (ME and MES) aka graphics command processor.
Cannot is a big and very final word in such a changeable industry.

Just because it looks that way now does not mean that it will always be that way.

Future RDNA need not take all of CDNA, it may just take pieces, à la the ML optimised tensor silicon in smaller doses.

ML compute is the next big thing - quite possibly even bigger than fixed RT acceleration, and while it is not there in real time graphics yet, there are already indications of it becoming important in the future, even if just to reduce the considerable load that RT brings to graphics.

Something that is even more important in the mobile arena - where power consumption is everything, and regular RDNA CU's will not be as efficient as what they are cooking up for CDNA tensor silicon.

People are so distracted by the more obvious denoising/DLSS angle that nVidia are taking with ML - to the point that they don't realise there is further potential for ML to become a far more important part of graphics and gaming compute in the future.

This extends to areas like fluid procedurally generated animation, NPC AI, and player avatar generation (i.e. accurate personal photo based avatars, not the kludgy parameter based avatars provided by the game developer).
 