Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have since backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136


Some Observations

Cache

The L2 slice size is the same as on Navi10. Well, that is a bummer. I was hoping to see big jumps there for BW amplification.
Hoping for more passthrough modes from L0 to L2.

Multi-core Command Processor
It is the same dual GFX pipe as in Sienna. If this is implemented in XSX, then RDNA2 could possibly schedule and keep track of multiple shader wavefronts in flight. In addition, the ACEs can already run compute shaders without using the Command Processor.
This is something.

Unified Geometry Engine
I think they finally got NGG into the shape they envisioned. I heard devs saying the GE doubled the number of primitives culled per clock compared to N10.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Power is a cubic relation to clocks, if I'm not mistaken.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 263W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 283W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 303W

Also, assuming the PS5 is similar in architecture:

130W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 164W
140W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 177W
150W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 189W

EDIT: Adding in some more power ranges.
I believe power rises as a square not cube.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 240W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 258W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 277W

140W * (80 CUs / 52 CUs) * (2.2 GHz / 1.825 GHz)^2 = 312W

Now slap in some HBM and we could see some really impressive power figures.
Issue though could be the 7nm heat density which is troublesome on Zen2 as well.
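
The estimates above can be reproduced with a short Python sketch. It assumes, as the posts do, that power is linear in CU count and proportional to clock raised to some exponent (2 or 3); the function name `scaled_power` and its parameters are made up for illustration, and none of this is a measured model.

```python
def scaled_power(base_watts, base_cus, base_ghz, target_cus, target_ghz, clock_exponent):
    """Scale a known GPU power figure to a different CU count and clock.

    Assumption from the thread: power is linear in CU count and
    proportional to clock ** clock_exponent. Real silicon will differ.
    """
    cu_ratio = target_cus / base_cus
    clock_ratio = target_ghz / base_ghz
    return base_watts * cu_ratio * clock_ratio ** clock_exponent


# Series X GPU estimates (52 CUs at 1.825 GHz) scaled to 80 CUs at 2 GHz.
for base in (130, 140, 150):
    square = scaled_power(base, 52, 1.825, 80, 2.0, clock_exponent=2)
    cube = scaled_power(base, 52, 1.825, 80, 2.0, clock_exponent=3)
    print(f"{base} W -> square: {square:.1f} W, cube: {cube:.1f} W")
```

The figures quoted in the posts are the same values with the decimals dropped, so a result may differ by 1 W from a rounded one.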
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700XT has 9.754 TFLOPS of FP32 performance.

EDIT: based on available information, it looks like Big Navi will be around 30% faster than a 2080ti.
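
For anyone checking the TFLOPS figures, here is a minimal sketch of the usual peak FP32 arithmetic: CUs x 64 lanes x 2 FLOPs per fused multiply-add x clock. The function name `fp32_tflops` is just for illustration, and the 72 CU configurations are the hypothetical ones from the post, not confirmed specs.

```python
LANES_PER_CU = 64      # FP32 shader lanes per RDNA compute unit
FLOPS_PER_LANE = 2     # one fused multiply-add counts as two FLOPs


def fp32_tflops(cus, clock_ghz):
    """Peak theoretical FP32 throughput in TFLOPS."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE * clock_ghz / 1000.0


print(f"72 CUs @ 2.0 GHz:   {fp32_tflops(72, 2.0):.3f} TFLOPS")
print(f"72 CUs @ 2.2 GHz:   {fp32_tflops(72, 2.2):.3f} TFLOPS")
# 5700 XT: 40 CUs at its 1.905 GHz boost clock
print(f"40 CUs @ 1.905 GHz: {fp32_tflops(40, 1.905):.3f} TFLOPS")
```

These are peak numbers only; they say nothing about sustained clocks or real game performance.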
 

MrTeal

Diamond Member
Dec 7, 2003
3,611
1,813
136
I believe power rises as a square not cube.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 240W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 258W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 277W

140W * (80 CUs / 52 CUs) * (2.2 GHz / 1.825 GHz)^2 = 312W

Now slap in some HBM and we could see some really impressive power figures.
Issue though could be the 7nm heat density which is troublesome on Zen2 as well.
P=CV²f
You need more voltage for higher frequencies, so if voltage rises linearly with frequency then you end up with a cubic relation for power; the square from increased voltage and power dissipated per cycle, and another from the increased frequency itself.
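
Written out, the argument looks like this. The proportionality constant k is introduced here purely for illustration, and the linear V-f relation is the assumption the whole cubic result rests on:

```latex
P = C V^{2} f, \qquad V = k f
\;\Longrightarrow\;
P = C k^{2} f^{3}, \qquad
\frac{P_2}{P_1} = \left(\frac{f_2}{f_1}\right)^{3}
```

If voltage could instead be held constant, the same formula gives P_2 / P_1 = f_2 / f_1, so the real exponent depends entirely on how much extra voltage the higher clock needs.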
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
For every 10% clock frequency increase, power rises by 23%.

Also CU power does not scale linearly. You may have 50% higher number of CUs in a particular GPU, clocked at the same frequency and power may increase only by 30%.

Big Navi power targets are 250-275W at this very moment, and I'm confident about this last piece of information.
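
To compare these two rules of thumb with the square and cube guesses earlier in the thread, one can back out the exponent they imply. This is only a sketch that assumes a pure power law (power proportional to input ** n); the helper name `implied_exponent` is made up for illustration.

```python
import math


def implied_exponent(input_increase, power_increase):
    """Exponent n such that (1 + input_increase) ** n == 1 + power_increase."""
    return math.log(1 + power_increase) / math.log(1 + input_increase)


clock_exp = implied_exponent(0.10, 0.23)   # +10% clock -> +23% power
cu_exp = implied_exponent(0.50, 0.30)      # +50% CUs   -> +30% power
print(f"clock exponent ~ {clock_exp:.2f}, CU exponent ~ {cu_exp:.2f}")
```

Under that assumption the clock exponent lands a little above 2 and the CU exponent well below 1, i.e. closer to the square estimate than the cube one, with CU count scaling sub-linearly.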
 

exquisitechar

Senior member
Apr 18, 2017
683
940
136
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700XT has 9.754 TFLOPS of FP32 performance.

EDIT: based on available information, it looks like Big Navi will be around 30% faster than a 2080ti.
Navi21 has 80 CUs, why are you using 72 CUs? Don’t tell me it’s because a YouTuber said it...
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
Yields? Also last I checked, AMD hasn’t released the official specs.

EDIT: It has nothing to do with YouTube. The only channel that isn’t complete garbage there is GN.

Also, power constraints. AMD could definitely sell an 80 CU part, but at lower clocks. 80 CUs at 2.2 GHz would still consume more than 400W, assuming a 50% power-efficiency improvement over RDNA1.
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
P=CV²f
You need more voltage for higher frequencies, so if voltage rises linearly with frequency then you end up with a cubic relation for power; the square from increased voltage and power dissipated per cycle, and another from the increased frequency itself.

You can’t compare console power consumption. The console chip will always be more efficient than a GPU because some components are shared, such as memory.

EDIT: That was meant as a general comment for those who are attempting to estimate power consumption.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136

130 to 140W of power for the GPU portion.

Raghu78 was right.

I had estimated the Series X GPU with 16GB GDDR6 at 140-150W. In reality it's slightly better. Series X GPU power draw is the same as Xbox One X. But since the Series X SoC has 8 Zen 2 cores at 3.66 GHz with SMT (3.8 GHz with SMT off), the CPU portion will draw roughly 55W. The entire SoC will draw around 200W.

Based on this data I am even more confident that Navi 21 with 80 CUs can deliver 21 TF at 275W. Nvidia is going to require > 350W to deliver the same performance if the current rumours are true.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
P=CV²f
You need more voltage for higher frequencies, so if voltage rises linearly with frequency then you end up with a cubic relation for power; the square from increased voltage and power dissipated per cycle, and another from the increased frequency itself.

While P=CV²f is totally correct - V and f are not proportional. In essence you cannot simply assume a cubic relation.
 

Konan

Senior member
Jul 28, 2017
360
291
106
Hmmmm. The RDNA2 implementation of RT shows that RT operations are shared with the texture units, meaning you can do either one or the other but not both at the same time. Won't that impact overall RT performance delivery??
Why can't they be done at the same time?
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Just to put in perspective the improvement in GPU perf from Xbox Series X vs Xbox One X.

RDNA2 is expected to deliver higher perf/clock. If we assume that RDNA2 delivers 1.15x the perf/clock of RDNA:

12 RDNA2 TFLOPS = 13.8 RDNA TFLOPS

According to the latest pcgameshardware review, a 13.4 TF Radeon VII (avg clock of 1750 MHz) is 3% faster than a 9.2 TF Radeon RX 5700XT (avg clock of 1800 MHz).

That's roughly 3% better perf for 45% more raw FLOPS on the Radeon VII. 1.45/1.03 = 1.41. It would be reasonable to say 1 RDNA FLOP = 1.4 Vega GCN FLOPS.

Multiplying this 1.4x we get

12 RDNA2 TFLOPS = 13.8 * 1.4 = 19.32 GCN TFLOPS

That's 3.22x the perf of the Xbox One X GPU in the same power. This has been delivered in 3 years (Xbox One X - Nov 2017, Xbox Series X - Nov 2020). BTW, Xbox One X is a mid-gen console refresh. If we take the OG Xbox One at 1.3 TF, the improvement is roughly 15x. This next-gen console is going to challenge PC gaming like no other previous console gen, as it has a desktop-class CPU with 8 Zen 2 cores at 3.66 GHz. With a mid-gen console refresh in 2023 or 2024, the mainstream PC GPU in the $350-$400 price range will continue to be challenged till at least 2025. PC gaming will never be the same again.
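
The chain of conversions above is easy to re-run with different assumptions. The sketch below just multiplies the factors from the post; the 1.15x and 1.4x ratios are the post's own estimates, the 6.0 TF Xbox One X figure is the one the 3.22x result implies, and the function name is invented for illustration.

```python
def gcn_equivalent_tflops(rdna2_tflops, rdna2_vs_rdna=1.15, rdna_vs_gcn=1.4):
    """Convert RDNA2 TFLOPS into 'GCN-equivalent' TFLOPS.

    Both ratios are assumptions taken from the post, not measured values.
    """
    return rdna2_tflops * rdna2_vs_rdna * rdna_vs_gcn


# Radeon VII (13.4 TF) is ~3% faster than RX 5700 XT (9.2 TF),
# which is where the ~1.4x per-FLOP advantage comes from.
per_flop_advantage = (13.4 / 9.2) / 1.03
print(f"RDNA vs GCN per-FLOP advantage: {per_flop_advantage:.2f}x")

series_x = gcn_equivalent_tflops(12.0)
print(f"Series X: {series_x:.2f} GCN-equivalent TFLOPS")
print(f"vs Xbox One X (6.0 TF): {series_x / 6.0:.2f}x")
print(f"vs OG Xbox One (1.3 TF): {series_x / 1.3:.1f}x")
```

Changing `rdna2_vs_rdna` to 1.0 shows how much of the headline number depends on the assumed perf/clock gain.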
 

DXDiag

Member
Nov 12, 2017
165
121
116
The RDNA2 implementation of RT shows that RT operations are shared with the texture units, meaning you can do either one or the other but not both at the same time. Won't that impact overall RT performance delivery??
Yes it will impact it, RT with RDNA 2 will be slower than RT in Turing.
 

DXDiag

Member
Nov 12, 2017
165
121
116
Nope, DirectML is just an API. AMD will need to provide a model that competes with DLSS 2 on top of it (a tall order), and then provide the hardware (tensor units) to allow it to run fast enough on their GPUs. That means AMD won't have any DLSS 2 competitor any time this gen, as they lack both the model and the tensor units with RDNA 2.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
Yes it will impact it, RT with RDNA 2 will be slower than RT in Turing.
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method to Turing so we can't judge them just yet.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
Nope, DirectML is just an API. AMD will need to provide a model that competes with DLSS 2 on top of it (a tall order), and then provide the hardware (tensor units) to allow it to run fast enough on their GPUs. That means AMD won't have any DLSS 2 competitor any time this gen, as they lack both the model and the tensor units with RDNA 2.

What makes you think they haven't been working on a DLSS competitor for a while?

As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary, but ever since Navi14 (so Navi12 as well) rapid packed math has been extended to INT4 and INT8 as well (packing 8 and 4 at the same time respectively).

Depending on the model, this could be more than enough processing power to be able to perform the algorithm on RDNA2. Remember how DLSS1.9 worked on just shaders? Sure, the algorithm wasn't as potent as DLSS2.0, but to my knowledge Turing doesn't support INT4 or INT8 packing on shaders anyway.

It'll depend on the game and on how computationally expensive DLSS2.0-tier algorithms are. They're a step in the middle of the pipeline after all. The key will just be how much upscaling they can do at a minimal cost in frame render time.
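
As a rough feel for what packed math could buy, here is a back-of-the-envelope sketch of peak throughput with 4-wide INT8 and 8-wide INT4 packing. It assumes every packed value completes one multiply-accumulate per clock, which is an idealised upper bound, and it uses the speculated 80 CU / 2 GHz configuration from this thread rather than any confirmed spec. The function name `peak_tera_ops` is invented for illustration.

```python
LANES_PER_CU = 64  # shader lanes per RDNA compute unit


def peak_tera_ops(cus, clock_ghz, pack_factor):
    """Peak multiply-accumulate throughput in tera-ops per second.

    pack_factor is how many values share one 32-bit lane:
    1 for FP32, 4 for INT8, 8 for INT4 (the packing quoted in the post).
    Assumption: each packed value does one multiply-accumulate (2 ops)
    per clock, so this is a ceiling, not an expected result.
    """
    return cus * LANES_PER_CU * 2 * pack_factor * clock_ghz / 1000.0


for name, pack in (("FP32", 1), ("INT8", 4), ("INT4", 8)):
    print(f"{name}: {peak_tera_ops(80, 2.0, pack):.2f} tera-ops/s")
```

Whether that is "enough" still depends on the model size and on how much of the frame budget the upscaling pass is allowed to take.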
 

DXDiag

Member
Nov 12, 2017
165
121
116
You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.
Your post is nothing more than damage control at this point. A shared unit can never beat a dedicated unit. Worse yet, RDNA2 does BVH traversal on the shaders as well (Turing does it on RT cores). So RT acceleration is shared on two levels with RDNA2, not one.

And no, texture units are not over-budgeted on modern GPUs; they are just the right amount for regular texturing, 16X AF filtering, and texture-heavy shaders and effects.






As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary,
More damage control. Tensor cores are not necessary, but they are fast enough to offset any performance loss due to using ML to upscale the image; without tensor cores the loss would be bigger.
 

exquisitechar

Senior member
Apr 18, 2017
683
940
136
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method to Turing so we can't judge them just yet.
Indeed, plus Turing’s implementation has its own deficiencies that RDNA2 doesn’t.
 