Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 67 - AnandTech community

uzzi38

Platinum Member
Oct 16, 2019

eek2121

Diamond Member
Aug 2, 2005
So I was thinking about what we can glean from AMD's official information regarding the top N31.

All we have is the >50% perf-per-watt increase, but we know neither the baseline nor the wattage, so here are some potential numbers.

Baseline: 6900XT (reference, 300W)
At 375W we have 1.5x perf/watt × 1.25x the watts, which is 1.875x the baseline performance. That is a direct match for the 4090 in raster performance, and probably a low-end estimate given AMD's recent sandbagging on claims.

If AMD push to 450W and performance scaling is still decent, then we get 1.5x × 1.5x for 2.25x the performance of the 6900XT, which is in the middle of the rumoured gains.

If perf/watt is closer to 1.6x, then at 375W we get 2x the performance and at 450W we get 2.4x.

6950XT baseline (reference, 335W): according to TPU, this is 1.07x the 6900XT, and the 3090Ti is 1.1x ahead of the 6950XT. That puts the 4090 at 1.76x the 6950XT.

A 375W N31 with that 1.5x perf/watt would be 1.68x the 6950XT and about 0.95x the 4090. At 450W it would be 2x the 6950XT.

With 1.6x perf/watt, it would be 1.79x the 6950XT at 375W and 2.15x at 450W.

Also worth noting: in measured power draw, the reference 6950XT is actually very frugal and more efficient than the 6900XT.

Ultimately, I think we have reasonable lower and upper bounds here. My estimate is that at worst the top N31 will match the 4090, and at best it will be a performance tier ahead, depending on the final TDP and the exact perf/watt scaling.

Bottom line is I think AMD have levers to pull here and plenty of choices on how to approach the SKU list.

The 4090 is only 40-50% faster than the 3090. Given what we know so far about N31, 4090 performance numbers are an easy target.

What I suspect we will get is a very power-efficient card that comes close to (or matches) the 4090 with a >20% lower TGP. Next year, AMD will drop an even faster card.
 

eek2121

Diamond Member
Aug 2, 2005
Nice, but it's just a $20-50 price cut for the 6800/6800XT. Doesn't bode well for upcoming 7000 series pricing.

I bet they will cost less than NVIDIA's offerings. The 4090's die is almost twice the size of N31's GCD. While packaging will cost more, AMD's card will be much cheaper to produce thanks to the use of N6 for the MCDs. The only question is whether AMD will pass the savings on to consumers.

EDIT: The 6900XT is $300 cheaper, not $50 cheaper, btw. The 6800XT is $100 cheaper.

The 6900XT would be a steal if we weren't getting new GPUs soon.
 

fleshconsumed

Diamond Member
Feb 21, 2002
My bad, I was off by $10. The 6800 went from $579 to $549, and the 6800XT went from $649 to $599. That's a $30-50 cut for the most commonly purchased AMD cards.

Yes, the 6900XT got a larger price cut, but it was always a poor value proposition; most gamers went for the sweet-spot 6800XT/6800/6700XT, and those barely got any price cut.

I'm sure AMD's 7000 series will undercut Nvidia as they typically do, but the question is by how much. Given the very minor price cuts to the 6000-series sweet-spot cards, there is a good chance AMD will follow suit and raise MSRP. If they keep MSRP the same, I'll probably be first in line at Micro Center trying to get one (although who am I kidding, there will probably be a huge line of people camping out overnight just like for the 6800 release).
 

SteveGrabowski

Diamond Member
Oct 20, 2014

Paul98

Diamond Member
Jan 31, 2010
I assume this is a response to the Nvidia launch, and in certain cases a way to clear inventory before the RDNA3 launch, rather than a reflection of where RDNA3 pricing will land.

I had already been looking forward to RDNA3, as I am expecting something quite interesting; I am even more interested now, given how disappointing the 40 series is.
 

Mopetar

Diamond Member
Jan 31, 2011
The Nvidia launch is only the 4090 until November, so it doesn't affect much besides the 6900XT and up. If there's a lot of leftover Ampere stock, a price cut now might help them move more of their old cards. The 6600/XT at $240 looks really good after the last few years. Even better if the 3050 is still hanging out at $300.
 

SteveGrabowski

Diamond Member
Oct 20, 2014

It's strange: the 6600 XT is still $350 everywhere except one ASRock model that's $330. Two models of the 6650 XT can be found for $300, though.
 

Joe NYC

Platinum Member
Jun 26, 2021
CoWoS-R doesn't have a silicon interposer, right? It uses a high-pitch RDL layer right in the organic substrate itself, so it eliminates much of the cost of traditional 2.5D methods due to the lack of extra silicon, i.e. it's cheaper than a full interposer and cheaper than using embedded bridges. The bandwidth isn't going to be as high as with silicon due to the lower interconnect density, but it's likely enough to connect the Infinity Cache to the GCD.

It does not have a silicon interposer, but it has an interposer-like wafer on top of which the individual dies are placed, and then additional layers can be added on the bottom.

Then the whole unit can be strengthened and made into something that resembles a larger chip, which is then placed on top of the organic substrate.

So I think this method can likely deliver quite high bandwidth, though some additional power will be needed to cross between the dies and the carrier wafer with RDL (vs. a monolithic die), likely using micro bumps. But we will see if AMD somehow pulls a rabbit out of a hat on this connection...

I think this approach offers high bandwidth and can save layers of organic substrate, which, according to the latest AMD presentation, is still a bottleneck (TSMC wafers no longer are).
 

Stuka87

Diamond Member
Dec 10, 2010
It's strange: the 6600 XT is still $350 everywhere except one ASRock model that's $330. Two models of the 6650 XT can be found for $300, though.

I think this is partly because the resellers purchased those cards when prices were high and don't want to lose money on them.

That, or they have lost touch with reality and still think they can con people out of money.
 

jpiniero

Lifer
Oct 1, 2010
Yes, the 6900XT got a larger price cut, but it was always a poor value proposition; most gamers went for the sweet-spot 6800XT/6800/6700XT

That's why I am expecting AMD to 'correct' that with RDNA 3.

I suspect N31 is going to be much faster than the 4090 in raster and maybe even faster in RT at 4K (ignoring the frame projection). So I don't see why they would charge much less than the 4090's $1599 for any N31 card. People are gonna be mad, but let's face it, unless it were $100, people are going to be mad regardless.
 

Kaluan

Senior member
Jan 4, 2022
Yes, the 6900XT got a larger price cut, but it was always a poor value proposition; most gamers went for the sweet-spot 6800XT/6800/6700XT, and those barely got any price cut.
Wacha mean? The RX 6600-6700 are what the bulk of gamers get, and those got pretty sizeable cuts. And it's not like (if you're from the US at least 😡) we won't be seeing even lower prices in some places soon after (like how the 5800X dropped to $300, but you can currently find it for $260 or even less).
It's strange: the 6600 XT is still $350 everywhere except one ASRock model that's $330. Two models of the 6650 XT can be found for $300, though.
That's likely because the RX 6600 XT got silently phased out; the 6650 XT permanently took its place.

Still, it's a shame the RX 6700 never became an official SKU. A $300-330 MSRP would've been pretty good.

Interesting that they specify 4GB for the 6500 XT (and not for the RX 6400). Is an official 8GB SKU finally coming? Sapphire's custom 8GB one still hasn't been reviewed, sadly, but I expect an 8GB N24 to smooth out most of its limitations.

BTW, the $700 RX 6900 XT already looks like a better (upper-price-range) deal than Nvidia's $900 '4070 rebranded as a 4080'. Another big oof for Nvidia lmao. Hope AMD continues this trend with RX 7000.
 

jpiniero

Lifer
Oct 1, 2010
Sapphire's custom 8GB one still hasn't been reviewed, sadly, but I expect an 8GB N24 to smooth out most of its limitations.

That's because I suspect it was intended to be a mining card and a mining card only. I did see them on Newegg, but I imagine they stopped production once mining slid, and they didn't produce many units.
 

DisEnchantment

Golden Member
Mar 3, 2017
Lel, AMD devs inadvertently leaking stuff themselves


Now I am wondering whether the L0 and VGPR sizes mentioned by @Kepler_L2 are precise.

Seems @Kepler_L2 was right on the money about VGPRs

C++:
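unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  // Targets with gfx90a instructions (e.g. CDNA2): 512 registers
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 512;
  // Pre-GFX10 targets: 256 VGPRs
  if (!isGFX10Plus(*STI))
    return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  // GFX11 targets with the enlarged ("full") VGPR file
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 1536 : 768;
  // Default GFX10+ register file
  return IsWave32 ? 1024 : 512;
}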

This code shows that the VGPR file per SIMD is 1536 * 32 * 4 B = 192 KiB (+50% over the previous 1024 * 32 * 4 B = 128 KiB), which is the same as 256 * 6 * 32 * 4 B = 192 KiB.
Since the number of VGPRs per bank has not changed (i.e. 256), this means a full six VGPR banks for full-blown dual x32 ALUs in one SIMD.

It also hints at a 1-cycle wave64 mode, because it seems they can now pair two adjacent VGPR banks (note that the number of VGPRs is halved when not in wave32) to form 3 banks of wave64 operands for full 1-cycle wave64.

C++:
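unsigned getVGPRAllocGranule(const MCSubtargetInfo *STI,
                             Optional<bool> EnableWavefrontSize32) {
  // Targets with gfx90a instructions: granule of 8
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 8;

  // An explicit wave-size override takes precedence over the subtarget default
  bool IsWave32 = EnableWavefrontSize32 ?
      *EnableWavefrontSize32 :
      STI->getFeatureBits().test(FeatureWavefrontSize32);

  // GFX11 full-VGPR targets: granule of 24 (wave32) or 12 (wave64)
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 24 : 12;

  // ... (rest of the function not quoted)
}

The allocation granule is also 24 (2*2*6).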

Looks like N31 and N32 will be compute monsters. As expected, 11.0.2 (N33) and 11.0.3 don't have this feature, and they go the VOPD route.

Another unique thing about N31 is native FP16 ops: vector registers 0-127 contain Lo and Hi 16-bit floats. In theory, they can do 4x native FP16 ops (not matrix) per cycle per SIMD. This will be great for FSR-type workloads.
 

Saylick

Diamond Member
Sep 10, 2012

Fully dual-pumped FP32...


So the 12288 shaders for N31 are actually a true 12288 shaders... That's a true 75 TFLOPS in terms of gaming performance, not some sort of Ampere-like dual pumping that results in only a 1.33x increase in FPS when the TFLOPS are doubled.
 

Glo.

Diamond Member
Apr 25, 2015
So, based solely on the sheer throughput increase, we should see not only higher utilization but also higher throughput from those ALUs/shaders.

2.5 times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it (full-fat N31) will be faster than the 4090. But how much more performance is there?
 

DiogoDX

Senior member
Oct 11, 2012
I think with all this shader power, maybe the rumored 384-bit bus and 96MB cache are not enough. Scaling should be better than Ampere, but by how much?

Looking at the Nvidia performance slides, AD102 with >2x the TFLOPS is only ~50-60% faster than GA102 at native 4K, even with the huge L2 cache increase, but on the same 384-bit memory bus.
 

Saylick

Diamond Member
Sep 10, 2012
2.5 times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.
The memory bandwidth question, i.e. can RDNA 3 be fully fed, is the million-dollar question (2 million dollars now, adjusted for inflation).

Goal = 3x more effective bandwidth over N21

We have, thus far....
- 50% wider memory bus
- 11% higher memory clocks
- Higher bandwidth on the Infinity Cache
- Better caching algorithms so that the Infinity Cache is better utilized
- End-to-end data compression

Might be an effective 2x at the end of the day, hence why we should only expect a doubling over N21.
 

biostud

Lifer
Feb 27, 2003
Might be an effective 2x at the end of the day, hence why we should only expect a doubling over N21.
And maybe a 3D cache version with double the amount of cache? Or is that rumor buried again?
 