Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 31

uzzi38

Platinum Member
Oct 16, 2019

DisEnchantment

Golden Member
Mar 3, 2017
AMD has not synced the repo yet, but many folks went ahead and checked the sources from the mailing lists.
I pulled that branch, but nothing is there yet; it's only on the mailing lists.

One thing I noticed is the gigantic patch set for MES support.
I was under the impression that MES (Micro Engine Scheduler) had always been working, but that seems not to be the case; they probably never had the resources to work on it from the software and firmware side.

I guess they needed it working for RDNA3.

There is a new microcontroller for power management: the IMU.

New major blocks:
  • IMU
  • GC
  • SDMA
  • SMUIO
  • MES
  • HDP
  • GMC
  • IH
  • SOC
  • PSP
  • ATHUB
  • GFXHUB
DCN and VCN are yet to come, I think.

The SMU seems to be based on the Yellow Carp version.

Once LLVM adds support for GC 11, that should shed more light on its capabilities.

Also, a recap of the bunch of new RT-related patents.
 

Saylick

Diamond Member
Sep 10, 2012

jpiniero

Lifer
Oct 1, 2010

Navi 31 might be ~92 TF. I think AD102 will be more than this but of course compute power is only one part of the equation.
 

uzzi38

Platinum Member
Oct 16, 2019
Looks like a power saving feature. Basically, if the screen isn't changing, the display will be refreshed using the previous frame that is already stored in cache, so that the rest of the GPU can be powered down. It's more useful for mobile GPUs where power consumption is important, e.g. as shown in this older AT article.
Bingo; to be even more specific, it's for saving idle power. I like this addition, much like PSR on Rembrandt.
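The mechanism described above (reuse the cached frame while the screen isn't changing, so the rest of the GPU can power down) can be pictured with a toy model. This is only an illustration of the idea under stated assumptions: the names are hypothetical, frames are compared byte-for-byte purely for simplicity, and it says nothing about how AMD actually detects static frames or gates power.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefreshStats:
    gpu_renders: int = 0      # refreshes that needed the render path powered up
    cache_refreshes: int = 0  # refreshes scanned out from the cached frame


def simulate_idle_refresh(frames: List[bytes]) -> RefreshStats:
    """Hypothetical toy model: when a refresh's frame matches the cached one,
    scan it out from cache and count the rest of the GPU as idle for it."""
    stats = RefreshStats()
    cached = None  # last frame written to the cache
    for frame in frames:
        if frame == cached:
            stats.cache_refreshes += 1
        else:
            stats.gpu_renders += 1
            cached = frame  # new content: keep it for later idle refreshes
    return stats


# A mostly static desktop: one update, a pause, another update, another pause.
stats = simulate_idle_refresh([b"A", b"A", b"A", b"B", b"B"])
print(stats)  # RefreshStats(gpu_renders=2, cache_refreshes=3)
```

The longer the screen stays static, the larger the share of refreshes served from cache, which is why the payoff is mostly at idle.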
 

Saylick

Diamond Member
Sep 10, 2012

Navi 31 might be ~92 TF. I think AD102 will be more than this but of course compute power is only one part of the equation.
Dang. Crazy that we're getting 100 TFLOPS of FP32 in a consumer product. I remember buying a Radeon 5770 in 2009 and it had 1 TFLOP of compute, and that was considered a big deal. We're going to get 3 GHz GPUs and 100x the floating point perf. Absolutely ridiculous.
 

Timorous

Golden Member
Oct 27, 2008

Navi 31 might be ~92 TF. I think AD102 will be more than this but of course compute power is only one part of the equation.

The 6900XT has 2.35x the TFLOPs of the 5700XT, and it is about 2x faster at 4K.

If (big big big IF here) the 7900XT has the same scaling, then 4x the TFLOPs should be around 3.4x the performance. That makes me think there will be a fairly large regression in perf/TFLOP, like there was with Turing -> Ampere, since I doubt N31 will exceed 2.5x the performance of the 6900XT.
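The 3.4x figure is a ratio of ratios: the perf gained per unit of TFLOPs gained going from the 5700XT to the 6900XT, applied to a 4x TFLOPs jump. A minimal sketch, assuming perf/TFLOP carries over unchanged (the same big IF the post flags); the function name is illustrative only:

```python
def projected_speedup(tflops_ratio: float, ref_tflops_ratio: float, ref_perf_ratio: float) -> float:
    """Linear extrapolation: scale a TFLOPs ratio by the perf-per-TFLOP
    efficiency observed in a reference generation-to-generation comparison."""
    efficiency = ref_perf_ratio / ref_tflops_ratio  # ~0.85 for 5700XT -> 6900XT
    return tflops_ratio * efficiency


# 5700XT -> 6900XT: 2.35x the TFLOPs gave ~2x the performance at 4K.
speedup = projected_speedup(4.0, 2.35, 2.0)
print(f"{speedup:.2f}x")  # 3.40x
```

Anything well below ~3.4x for N31 would then indicate the perf/TFLOP regression the post expects.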
 

xpea

Senior member
Feb 14, 2014
Baseless speculation time.
.../...
This means you have 500mm^2 of silicon on top, so you need around 500mm^2 of silicon underneath.
So you're fighting a ~600mm^2 N4 AD102 with an N31 made of 500mm^2 of N6 + 500mm^2 of N5 + 3D stacking + substrate.
AMD had better be much, much faster than AD102, because this thing is way more expensive!

PS: I don't have any source at AMD, so I have no idea whether the leak is plausible. However, one thing is sure: if it's true, N31 will be over $2500.
 

Timorous

Golden Member
Oct 27, 2008
So you're fighting a ~600mm^2 N4 AD102 with an N31 made of 500mm^2 of N6 + 500mm^2 of N5 + 3D stacking + substrate.
AMD had better be much, much faster than AD102, because this thing is way more expensive!

PS: I don't have any source at AMD, so I have no idea whether the leak is plausible. However, one thing is sure: if it's true, N31 will be over $2500.

If that 600mm^2 AD102 is pushing 600W or more TBP, then its cooling solution is going to be very, very expensive vs. what a 375W card needs. Then there is the GDDR6X RAM as well, which is not cheap.

Far more goes into a GPU BOM than just the size of the main chip(s).

As for price: if it is 2.5x faster than a 6900XT at 4K in raster, and even faster still with RT on, then a $2,000 MSRP is still a perf/$ improvement.
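A quick check of the perf/$ claim. This sketch assumes the 6900XT's $999 launch MSRP as the baseline; the 2.5x speedup and the $2,000 price are the post's hypotheticals, not known figures:

```python
def perf_per_dollar_gain(perf_ratio: float, old_price: float, new_price: float) -> float:
    """Relative perf/$ of a new card vs. an old one (>1.0 means better value)."""
    return perf_ratio / (new_price / old_price)


# Assumption: 6900XT at its $999 launch MSRP; hypothetical N31 card at $2,000 and 2.5x faster.
gain = perf_per_dollar_gain(2.5, 999.0, 2000.0)
print(f"{gain:.2f}x perf/$")  # 1.25x perf/$
```

At roughly double the price, anything above 2x the performance keeps the ratio above 1.0.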

Quickly playing with a die-per-wafer calculator at 0.05 defects/cm^2 (I think that is the N6/N7 figure; I know N5/N4 is a tad better, but I don't think we have an official number): for a 16x16 (256mm^2) die I get 195 good dies and 26 defective (my first numbers, 229 good and 27 defective, were for a 15x15 die; see the edit below), vs. 63 good and 21 defective for a 25x24 (600mm^2) die.
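The calculator's numbers can be approximated with the textbook first-order die-per-wafer formula plus a Poisson defect model. This is a sketch under stated assumptions (300mm wafer, no scribe lines or edge exclusion, Poisson yield `exp(-D0 * A)`), so it won't reproduce the calculator's exact die counts; the yields it gives, about 88% and 74%, land close to the post's ratios (195/221 and 63/84):

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300mm wafer


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """First-order estimate: wafer area / die area minus an edge-loss term.
    Ignores scribe lines and edge exclusion, so it overestimates slightly."""
    radius = wafer_diameter_mm / 2
    dies = (math.pi * radius ** 2) / die_area_mm2 - (math.pi * wafer_diameter_mm) / math.sqrt(2 * die_area_mm2)
    return math.floor(dies)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a Poisson defect model: exp(-D0 * A)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


for name, area in [("16x16", 16 * 16), ("25x24", 25 * 24)]:
    gross = gross_dies_per_wafer(area)
    y = poisson_yield(area, 0.05)
    print(f"{name}: {gross} gross, ~{math.floor(gross * y)} good ({y:.1%} yield)")
```

The larger die loses twice: fewer candidates fit on the wafer, and each one is more likely to catch a defect.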

That means from one wafer NV can make 63 full-die AD102 parts, while AMD can make 97 full N31 parts. Considering how expensive N5/N4 is, that seems a lot more cost-effective to me, and it probably more than makes up for the extra N6 silicon AMD needs, especially if you factor in what I mentioned above regarding cooling and GDDR6X. I don't think AD102 will actually have a lower BOM than N31.

EDIT: I had provided the numbers for a 15x15 die, not 16x16. Corrected now.
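The per-wafer comparison above reduces to integer division. This sketch assumes, as the 97 figure implies, that each N31 uses two of the 256mm^2 N5 dies, and it compares only N5/N4 silicon at an identical but unknown wafer price; N6 silicon, packaging, memory, and cooling are deliberately left out:

```python
def parts_per_wafer(good_dies: int, dies_per_part: int) -> int:
    """Complete products that can be built from one wafer's good dies."""
    return good_dies // dies_per_part


ad102_parts = parts_per_wafer(63, 1)  # one ~600mm^2 die per AD102
n31_parts = parts_per_wafer(195, 2)   # assumption: two 256mm^2 N5 dies per N31

# Assuming the same wafer price W for both, cost per part is W / parts, so
# AD102's leading-edge silicon cost relative to N31's is n31_parts / ad102_parts.
ad102_vs_n31 = n31_parts / ad102_parts
print(ad102_parts, n31_parts, f"{ad102_vs_n31:.2f}")  # 63 97 1.54
```

The ratio does not depend on the actual wafer price, which is why it is useful even without an official number.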
 

Frenetic Pony

Senior member
May 1, 2012
The best part about AMD's strategy is that you can just use one die and sell mostly that. Obviously a $2k+ card isn't going to sell super well unless there's yet another crypto rush, but hey, any dies you don't sell in a $2k card you can easily sell for $600 or so individually. Less profit margin, but a sale is a sale.

Totally aside: what if there's no IO die at all? Each of the two main chips just has its own memory PHYs, a 256-bit bus apiece. No loss from bad packaging alignment, no worries about designing tons of dies. We know it works, works on TSMC, and has low power overhead, because that's how Apple's M1 Ultra works. Plus, the packaging of just connecting two main chips is cheaper.
 

Kepler_L2

Senior member
Sep 6, 2020

Ooh, speculation that AMD cut the chiplet core counts in the end. I think this is wrong, though, and these are the intended core counts for the N31/N32 cut-down products.

As to why they would do this, the obvious answer is TSMC prices and a lack of confidence in being able to sell for over $2k.
Nope, people assumed that Navi31 would use the same 5 WGP/SA design as Navi21, and they were wrong. Navi23 clearly shows that 4 WGP/SA is more efficient.
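The 5 vs. 4 WGP/SA point maps directly onto the two rumored ALU counts. A minimal sketch, assuming 12 shader arrays and 256 FP32 ALUs per WGP (2 CUs x 128, the rumored RDNA3 layout); both values are assumptions picked because they reproduce the rumored totals, not confirmed specs:

```python
ALUS_PER_WGP = 256  # assumption: rumored RDNA3 WGP = 2 CUs x 128 FP32 ALUs


def fp32_alus(shader_arrays: int, wgp_per_sa: int, alus_per_wgp: int = ALUS_PER_WGP) -> int:
    """Total FP32 ALUs for a given shader-array layout."""
    return shader_arrays * wgp_per_sa * alus_per_wgp


print(fp32_alus(12, 5))  # 15360: Navi21-style 5 WGP/SA (the older rumor)
print(fp32_alus(12, 4))  # 12288: Navi23-style 4 WGP/SA (the revised rumor)
```

Under these assumptions the "cut" is just one fewer WGP per shader array, not a smaller chiplet count.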
 

Saylick

Diamond Member
Sep 10, 2012
Even with the revised/correct specs, seems like >72 TF was always the intended target. That's still a respectable ~3.5x improvement over Navi 21 (20.6 TF). It will just come down to how well RDNA 3 can scale upwards. N33 is rumored to be slower than N21 at 4K, and if N31 is 3x N33, then it does seem within the realm of possibility that N31 is ~2.5x over N21.

 

Timorous

Golden Member
Oct 27, 2008
Even with the revised/correct specs, seems like >72 TF was always the intended target. That's still a respectable ~3.5x improvement over Navi 21 (20.6 TF). It will just come down to how well RDNA 3 can scale upwards. N33 is rumored to be slower than N21 at 4K, and if N31 is 3x N33, then it does seem within the realm of possibility that N31 is ~2.5x over N21.


N33 is supposed to beat the 6900XT at 1080p and 1440p but lose at 4K. This makes sense: it is also supposed to have 128MB of cache, but with half the bus width and only 8GB of VRAM it is bound to bottleneck at 4K. We might see some issues at 1440p as well due to the VRAM amount, which leads me to think N33 will be the 7600XT part rather than the 7700XT.
 

jpiniero

Lifer
Oct 1, 2010
Even with the revised/correct specs, seems like >72 TF was always the intended target. That's still a respectable ~3.5x improvement over Navi 21 (20.6 TF).

Still presumably decently lower than AD102. Although I do agree that it's questionable as to how much that would translate into games. I do think you can say that AD102 will be a better miner for sure now.
 

Karnak

Senior member
Jan 5, 2017
Still presumably decently lower than AD102. Although I do agree that it's questionable as to how much that would translate into games.
Yeah, you just need to ignore the fancy double-FP32 TFLOPs marketing speak for now. It's just pointless (3090 vs. 6900XT, >=35 TFLOPs vs. ~20 TFLOPs...).

Without that, and going by "real" TFLOPs, we have RDNA3 at >73 TFLOPs (12288 x 3000 MHz x 2 or 15360 x 2400 MHz x 2, whichever happens) and Ada at about 46 TFLOPs, assuming a clock speed of 2.5 GHz (9216 x 2500 MHz x 2).

Maybe add a few % for Ada, but I'm just not a fan of using double-FP32 TFLOPs as an indicator of gaming performance, because it's not one.
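The "real" TFLOPs figures above all come from the same formula: ALUs x clock x 2, since one fused multiply-add counts as two FLOPs. A minimal sketch using the post's ALU counts and clocks, which are themselves rumors and assumptions:

```python
def fp32_tflops(alus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs: ALUs x clock (GHz) x 2 FLOPs per FMA, scaled to tera."""
    return alus * clock_ghz * 2 / 1000.0


print(f"{fp32_tflops(12288, 3.0):.1f}")  # 73.7: N31, 12288 ALUs at 3.0 GHz
print(f"{fp32_tflops(15360, 2.4):.1f}")  # 73.7: N31, 15360 ALUs at 2.4 GHz
print(f"{fp32_tflops(9216, 2.5):.1f}")   # 46.1: AD102 "real" FP32 at 2.5 GHz
```

Both N31 configurations land on the same number because 12288 x 3.0 and 15360 x 2.4 are equal products.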
 

Saylick

Diamond Member
Sep 10, 2012
N33 is supposed to beat the 6900XT at 1080p and 1440p but lose at 4K. This makes sense: it is also supposed to have 128MB of cache, but with half the bus width and only 8GB of VRAM it is bound to bottleneck at 4K. We might see some issues at 1440p as well due to the VRAM amount, which leads me to think N33 will be the 7600XT part rather than the 7700XT.
Given that N23 was the 6600XT, I think we'll see N33 as the 7600XT. Spec-wise, they are quite comparable: N23 and N33 both have 16 WGPs and a 128-bit bus. The improvement areas are the doubling of CUs due to the RDNA 3 shader structure, a quadrupling of Infinity Cache, and higher clocks. TPU has the 6900XT being 80% faster than the 6600XT. If the 6600XT's successor can match the 6900XT with less power, at lower cost, and on only a slightly better node, that's a pretty good improvement in my opinion. AMD seems poised to achieve yet another 50% perf/W improvement per generation.

Still presumably decently lower than AD102. Although I do agree that it's questionable as to how much that would translate into games. I do think you can say that AD102 will be a better miner for sure now.
Slower than AD102 on a TFLOP basis, yes, and I agree it will come down to how well Lovelace can translate TFLOPs into fps. If it does use Hopper's SM, then we should see an improvement in fps/TFLOP over Ampere.
 

Glo.

Diamond Member
Apr 25, 2015
Even with the revised/correct specs, seems like >72 TF was always the intended target. That's still a respectable ~3.5x improvement over Navi 21 (20.6 TF). It will just come down to how well RDNA 3 can scale upwards. N33 is rumored to be slower than N21 at 4K, and if N31 is 3x N33, then it does seem within the realm of possibility that N31 is ~2.5x over N21.

The TFLOP number remains the same, even for the updated specs: 92 TFLOPs.

3.7 GHz clock speed?
 

Saylick

Diamond Member
Sep 10, 2012
The TFLOP number remains the same, even for the updated specs: 92 TFLOPs.

3.7 GHz clock speed?
Sounds like the 92 TFLOP rumor came from people putting 3 GHz and the previous core count rumor together, so just because the core count got reduced doesn't mean the clocks went up to compensate. Again, the performance goal for N31 appears to have been >2x N21 from the start. 75 TFLOPs should still be enough to achieve that goal.
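Both readings of the 92 TF rumor can be checked with the same ALUs x clock x 2 formula. A minimal sketch using the thread's rumored ALU counts (15360 before the revision, 12288 after), which are assumptions rather than confirmed specs:

```python
def fp32_tflops(alus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs: ALUs x clock (GHz) x 2 FLOPs per FMA."""
    return alus * clock_ghz * 2 / 1000.0


def implied_clock_ghz(tflops: float, alus: int) -> float:
    """Invert the formula: the clock needed to reach a given TFLOPs figure."""
    return tflops * 1000.0 / (alus * 2)


rumor_origin = fp32_tflops(15360, 3.0)         # old core count at 3 GHz
revised_at_3ghz = fp32_tflops(12288, 3.0)      # revised core count, same clock
clock_for_92 = implied_clock_ghz(92.0, 12288)  # clock needed to keep 92 TF
print(f"{rumor_origin:.2f} TF, {revised_at_3ghz:.2f} TF, {clock_for_92:.2f} GHz")
```

The old core count at 3 GHz gives almost exactly 92 TF, while keeping 92 TF on 12288 ALUs would need roughly 3.74 GHz, which is where the 3.7 GHz question comes from.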
 

jpiniero

Lifer
Oct 1, 2010
Yeah, you just need to ignore the fancy double-FP32 TFLOPs marketing speak for now. It's just pointless (3090 vs. 6900XT, >=35 TFLOPs vs. ~20 TFLOPs...).

Clearly it's not translating into that much faster gaming performance. Some of that might be because RDNA2 has a higher fill rate due to the clock speed.
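The fill-rate point can be made concrete: peak pixel fill rate is ROPs x clock. This sketch assumes the commonly listed reference specs (6900XT: 128 ROPs at a 2250 MHz boost; 3090: 112 ROPs at a 1695 MHz boost); real in-game clocks vary, and fill rate is only one factor:

```python
def pixel_fill_rate_gpix(rops: int, boost_clock_mhz: float) -> float:
    """Peak pixel fill rate in GPixel/s: ROPs x clock (MHz) / 1000."""
    return rops * boost_clock_mhz / 1000.0


rx6900xt = pixel_fill_rate_gpix(128, 2250)  # assumed 6900XT reference spec
rtx3090 = pixel_fill_rate_gpix(112, 1695)   # assumed 3090 reference spec
print(f"{rx6900xt:.1f} vs {rtx3090:.1f} GPixel/s ({rx6900xt / rtx3090:.2f}x)")
```

So on paper the 6900XT has roughly 1.5x the fill rate despite about 0.6x the marketed FP32 TFLOPs.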
 

Glo.

Diamond Member
Apr 25, 2015
Sounds like the 92 TFLOP rumor came from people putting 3 GHz and the previous core count rumor together, so just because the core count got reduced doesn't mean the clocks went up to compensate. Again, the performance goal for N31 appears to have been >2x N21 from the start. 75 TFLOPs should still be enough to achieve that goal.
I cannot tell you anything more just besides this:

"3 GHz clock speed for RDNA3 GPUs is... conservative."

Do with this hint what you like.
 

Saylick

Diamond Member
Sep 10, 2012
I cannot tell you anything more just besides this:

"3 GHz clock speed for RDNA3 GPUs is... conservative."

Do with this hint what you like.
We might see something like 3.1 or 3.2 GHz, but I would be blown away if we get 3.7 GHz out of N31.
 

Glo.

Diamond Member
Apr 25, 2015
We might see something like 3.1 or 3.2 GHz, but I would be blown away if we get 3.7 GHz out of N31.
To be honest, me too.

I know that 3 GHz is conservative. The question remains: what exactly is AMD capable of achieving with the 5 nm process and its physical design team?

If it's 92 TFLOPs, and if it's 12288 ALUs, that's a 3.7 GHz clock speed.
 