Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146

adroc_thurston

Diamond Member
Jul 2, 2023
3,324
4,794
96
Still uses up transistors.
Yea but again, it's very spendy on raw xtors with none of the perf to back it up.
50% perf/watt increase never means a 50% performance increase.
No, it means exactly what it means.
comparison like running 7900 XTX at lower power than standard, then comparing to a 6950 XT running flat out...
RDNA1 and RDNA2 perf/W quotes are both for biggest configs running at prod clocks so...
...you shouldn't be doing some really esoteric cope.
AMD did a dookie and by doing that accidentally lied to their investors.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
No it doesn't; it's a speed demon design and the N5 implementation clocks nowhere near as fast as it should.

You can see that by the way it keeps clocking and gaining performance the more power you feed it.

I do wonder what they would have done with memory and cache had it clocked higher at 355W, because it does seem to need more bandwidth as you approach 3 GHz core speeds. Makes me wonder if that is where the 3D-stacked rumours came from: a design that would have been built if it clocked higher, but was dropped because it wasn't clocking high enough to need it.
 
Reactions: Tlh97 and Joe NYC

Kepler_L2

Senior member
Sep 6, 2020
474
1,927
106
You can see that by the way it keeps clocking and gaining performance the more power you feed it.

I do wonder what they would have done with memory and cache had it clocked higher at 355W because it does seem like it needs more bandwidth as you approach 3 GHz core speeds. Makes me wonder if that is where the 3d stacked rumours came from, a design that would have been built if it clocked higher but was not needed because it wasn't needed.
3D cache stuff isn't really a rumor, it's been in the Linux drivers since April last year.
 

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
It is exactly 8 months from the N31 announcement, and that does not mean they could not have started working on an N32 fix prior to that. So I think there was enough time to make the necessary changes / fixes.

Or, if no fixes were made, just release it months ago in the same state as N31, which did not happen.

IMO, connecting the dots, I think there will be an N32 release, it will perform measurably better than N31 (proportionally), but after all this time, the release date will be whenever it is ready.
Yeah 8+ months since announcement and products were in stores a hair over 7 months ago today; AMD must have had test silicon in lab closer to a year at this point. If they don't know what the issue is by this point, they're going to be in for a bad time.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
Looks like only a little over a month until we find out the mystery of Navi 32.

According to MLID's latest video:
- Navi 32 will launch at Gamescom, August 23-27, with availability in September
- at 260W for full Navi 32
- and apparently, it has been ready for about a month

Above is a lossless compression of the 25-minute video. Tom does not seem to know any other details.

 

Saylick

Diamond Member
Sep 10, 2012
3,392
7,156
136
Okay, so what are the performance and power expectations for N32? I've been out of the loop for so long since it's been MIA for... well, so long.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,430
2,915
136
The MCM approach means that the connecting chiplets need transistors for extra communication links that wouldn't otherwise need to be there.

Even though the CUs didn't increase a lot, they each are now capable of executing two FP operations per cycle. This isn't always possible or useful, but the raw compute power did increase.

They probably threw in some more RT stuff as well. Again, not always useful, but it does increase the transistors.
Look at the amount of transistors N31 has!

An RDNA3-based GPU with the same specs needs only ~20% more transistors:
Navi 23 -> 11.06B transistors with a 237mm2 die on the 7nm process.
Navi 33 -> 13.3B transistors with a 204mm2 die on the 6nm process.

Navi 21 -> 26.8B transistors with a 520mm2 die on the 7nm process.
The same GPU built on RDNA3 should have 26.8*1.2 = 32.16B transistors.

N31 has 45.7B transistors in GCD and another 6*2.05B = 12.3B transistors in MCDs for a total of 58B transistors.

This GCD alone has 42% more transistors than an 80CU, 256-bit, 128ROP, 128MB IC RDNA3-based GPU would.
This GCD alone has 14.5% more transistors than 3x N33 combined. If I compare the whole N31, it has 45% more transistors than 3x N33 combined while having the same specs.

N31 has a ridiculous number of transistors for its specs. Why? You can't say it's because of MCM or because N31 WGPs have 50% more registers. The latter needs just a few hundred million more transistors, and as for the former, if MCM really needed that many transistors and that much die space then it's simply worthless.
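
For anyone who wants to re-check the ratios above, here is a quick sketch in Python. It only uses the transistor counts quoted in this post (in billions); the variable names are made up for illustration, and the 1.2 factor is the rounded N33/N23 scaling from above, not an official figure.

```python
# Transistor counts in billions, exactly as quoted in the post above.
N23, N33 = 11.06, 13.3
N21 = 26.8
N31_GCD = 45.7
MCD, MCD_COUNT = 2.05, 6

# RDNA2 -> RDNA3 scaling at identical specs (N23 vs N33).
rdna3_scaling = N33 / N23                      # roughly 1.20

# Hypothetical "N21 rebuilt on RDNA3", using the rounded 1.2 factor.
n21_as_rdna3 = N21 * 1.2                       # roughly 32.16

# Whole N31 = GCD + six MCDs.
n31_total = N31_GCD + MCD * MCD_COUNT          # roughly 58.0

# How far N31 overshoots the two reference points, in percent.
gcd_vs_n21_rdna3 = (N31_GCD / n21_as_rdna3 - 1) * 100   # roughly 42
gcd_vs_3x_n33 = (N31_GCD / (3 * N33) - 1) * 100         # roughly 14.5
total_vs_3x_n33 = (n31_total / (3 * N33) - 1) * 100     # roughly 45

for name, value in [
    ("RDNA3 scaling factor", rdna3_scaling),
    ("N21 on RDNA3 (B)", n21_as_rdna3),
    ("N31 total (B)", n31_total),
    ("GCD vs N21-on-RDNA3 (%)", gcd_vs_n21_rdna3),
    ("GCD vs 3x N33 (%)", gcd_vs_3x_n33),
    ("N31 total vs 3x N33 (%)", total_vs_3x_n33),
]:
    print(f"{name}: {value:.2f}")
```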

I do wonder what they would have done with memory and cache had it clocked higher at 355W because it does seem like it needs more bandwidth as you approach 3 GHz core speeds. Makes me wonder if that is where the 3d stacked rumours came from, a design that would have been built if it clocked higher but was not needed because it wasn't needed.
N31 has 66.7% higher BW.
That should be enough for 66.7% higher TFLOPs or 20% more CU + 39% higher clocks than RX 6950XT.
N31 can be clocked at 3.2GHz and still have enough BW to feed it. Yeah, I know I excluded dual-issue capability, but it's not important when N33 has only 3% higher BW than N23.
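
A rough sketch of that bandwidth argument in Python. The bus widths and per-pin data rates below are the public board specs (256-bit at 18 Gbps for the RX 6950 XT, 384-bit at 20 Gbps for the RX 7900 XTX), which are not stated in the post itself, so treat them as assumptions. Infinity Cache hit rates and dual-issue are deliberately ignored, as in the post.

```python
def gddr6_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak DRAM bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8

# Assumed public specs, not taken from the post.
bw_6950xt = gddr6_bandwidth_gbs(256, 18.0)    # 576 GB/s
bw_7900xtx = gddr6_bandwidth_gbs(384, 20.0)   # 960 GB/s

bw_ratio = bw_7900xtx / bw_6950xt             # roughly 1.667

# The post's split of that headroom: 20% more CUs times 39% higher clocks.
compute_ratio = 1.20 * 1.39                   # roughly 1.668

print(f"bandwidth ratio: {bw_ratio:.3f}")
print(f"CU x clock ratio: {compute_ratio:.3f}")
```

The two ratios line up, which is all the post claims: bandwidth per unit of (single-issue) FP32 throughput would match the RX 6950 XT at those clocks. Whether that is actually enough in games is a separate question.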
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
Okay, so what are the performance and power expectations for N32? I've been out of the loop for so long since it's been MIA for... well, so long.

I am going to assume that N32 clocks a little better relative to N22 than N31 does relative to N21 at similar-ish power envelopes. AIB 6950XT seems to sustain around 2.6GHz and AIB 7900XTX seems to sustain around 2.9GHz, so let's just call this a 12% increase at similar power draw. For N32 over N22 I expect better than that, so let's call it an 18% bump at similar power draw.

So for N31 vs N21, a 12% clock bump + 20% CU bump + 50% SE bump gets you about 33% more performance.

N22 vs N32 is an assumed 18% clock bump + 50% CU bump + 50% SE bump. Alternatively vs the 60CU 6800 it would be a 46% clock bump with the same CU count and SE count.

Just by roughing it that would make me think that full N32 could be in the ballpark of 70% faster than the 6750XT and maybe 20-30% faster than the 6800. In both cases that would be in and around 6950XT performance.

Of course if it clocks more like how N31 does relative to N21 then it will probably perform closer to the 6800XT maybe the 6900XT.

The 18% clock bump would mean it clocking around 3.25GHz. We know N31 can clock that high with enough power, so RDNA3 can hit those speeds; the question really comes down to whether it can hit them in a 200-250W envelope.
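
The back-of-envelope numbers above can be reproduced with a naive model that just multiplies the clock ratio by the CU ratio. This is only a sketch: perfect linear scaling is an assumption (an upper bound at best), the SE bump is not modelled at all, and `naive_speedup` is a made-up helper name.

```python
def naive_speedup(clock_ratio, cu_ratio):
    """Upper-bound speedup assuming perfect scaling with clocks and CU count."""
    return clock_ratio * cu_ratio

# Sustained AIB clocks quoted above: 2.6 GHz (6950XT) vs 2.9 GHz (7900XTX).
observed_clock_gain = 2.9 / 2.6                 # roughly 1.115, rounded to 12%

n31_vs_n21 = naive_speedup(1.12, 1.20)          # roughly 1.34
n32_vs_n22 = naive_speedup(1.18, 1.50)          # roughly 1.77
n32_vs_6800 = naive_speedup(1.46, 1.00)         # 1.46, same CU count

# Baseline N22 clock implied by "18% bump lands at ~3.25 GHz".
implied_n22_clock_ghz = 3.25 / 1.18             # roughly 2.75 GHz

print(f"N31 vs N21: +{(n31_vs_n21 - 1) * 100:.0f}%")
print(f"N32 vs N22: +{(n32_vs_n22 - 1) * 100:.0f}%")
print(f"N32 vs 6800: +{(n32_vs_6800 - 1) * 100:.0f}%")
print(f"implied N22 clock: {implied_n22_clock_ghz:.2f} GHz")
```

The 70% and 20-30% estimates in the post sit below these upper bounds, which is what you would expect once scaling losses are allowed for.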

N31 has 66.7% higher BW from 384-bit GDDR6.
That should be enough for 66.7% higher TFLOPs or 20% more CU + 39% higher clocks than RX 6950XT.
N31 can be clocked at 3.2GHz and have enough BW to feed It.

It really does not. Look at TPU, who show the OC'd scores of various AIB models: it takes both a memory and a core overclock to get the highest scores. Once you are hitting 2.9GHz core clocks you need the RAM to be around 2.8GHz or you start seeing bottlenecks. If the core was hitting 3.5GHz in the 355W window then it would need faster RAM and/or more cache to give it enough bandwidth.
 
Reactions: Tlh97 and Joe NYC

TESKATLIPOKA

Platinum Member
May 1, 2020
2,430
2,915
136
It really does not. Look at TPU, who show the OC'd scores of various AIB models: it takes both a memory and a core overclock to get the highest scores. Once you are hitting 2.9GHz core clocks you need the RAM to be around 2.8GHz or you start seeing bottlenecks.
I am looking at it and there is no example of an OC'd GPU with stock memory; everything is OC'd GPU + OC'd memory.

Or do you want to say that it's bottlenecked because XFX has a higher score than ASRock or Sapphire? The difference is <=1%.
I need better proof than this.
 
Reactions: Tlh97

Heartbreaker

Diamond Member
Apr 3, 2006
4,263
5,260
136
If you didn't notice, those "AI accelerators" are also present in N33 or in the 20% increase in transistor count for the same specs.
Anything else?

IMO, AMD still doesn't have dedicated Machine Learning Tensor HW in RDNA3 cards. They are just using General FP Compute HW to brute force it, and RDNA 3 boosted FP compute a lot.

For AI, AMD is touting RDNA3 Bfloat16 improvements over RDNA 2, but they are only proportional to the overall improvement in RDNA3 floating point throughput.

Here is the AMD AI improvement claim for RDNA3:
Based on AMD internal measurements, November 2022, comparing the Radeon RX 7900 XTX at 2.505 GHz boost clock with 96 CUs issuing 2X the Bfloat16 math operations per clocks vs. the RX 6900 XT GPU at 2.25 GHz boost clock and 80 CUs issue 1X the Bfloat16 math operations per clock. RX-821.

This is just the proportion of General FP compute performance, not some new dedicated HW.
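
To put a number on that, here is a small Python sketch using only the figures in the RX-821 footnote quoted above (boost clock, CU count, and relative Bfloat16 ops per clock). `relative_bf16_rate` is a made-up helper, and the result is a unitless ratio, not a TFLOPS figure.

```python
def relative_bf16_rate(boost_ghz, cus, ops_per_clock_factor):
    """Relative Bfloat16 throughput: clock * CU count * per-clock issue factor."""
    return boost_ghz * cus * ops_per_clock_factor

rx_7900_xtx = relative_bf16_rate(2.505, 96, 2)   # 2X ops per clock
rx_6900_xt = relative_bf16_rate(2.25, 80, 1)     # 1X ops per clock

bf16_ratio = rx_7900_xtx / rx_6900_xt            # roughly 2.67
clock_and_cu_only = (2.505 * 96) / (2.25 * 80)   # roughly 1.34 without dual-issue

print(f"BF16 ratio from the footnote: {bf16_ratio:.2f}x")
print(f"of which clocks and CUs alone: {clock_and_cu_only:.2f}x")
```

So the whole claimed uplift falls out of clocks, CU count, and the doubled per-clock issue rate, with nothing left over that would need separate hardware to explain.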

IMO RDNA 4 will get the dedicated AI/ML HW that the Phoenix APU already appears to have.
 
Reactions: Tlh97 and Tigerick