Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146

Karnak

Senior member
Jan 5, 2017
399
767
136
Considering the 384-bit bus for N31 and if we're going with 20 Gbps GDDR6 that's a total of 960 GB/s bandwidth without any kind of IF$.

And with that in mind you really don't need more than 192 MByte of IF$. That's still a +50% increase over N21 on top of the already way higher bandwidth. Just quoting GN here: anything else would be "a waste of sand". I'm 100% certain that 192 MByte will be the maximum we'll see.
 
Reactions: Tlh97 and Kepler_L2

Saylick

Diamond Member
Sep 10, 2012
3,390
7,156
136
Considering the 384-bit bus for N31 and if we're going with 20 Gbps GDDR6 that's a total of 960 GB/s bandwidth without any kind of IF$.

And with that in mind you really don't need more than 192 MByte of IF$. That's still a +50% increase over N21 on top of the already way higher bandwidth. Just quoting GN here: anything else would be "a waste of sand". I'm 100% certain that 192 MByte will be the maximum we'll see.
What are you basing this assertion that 192 MB is enough on? Is it from the hit rate chart that AMD provided for N21? If so, the curve for 4K resolution still didn't look like it was plateauing just yet. Furthermore, I assume the hit rate is a function of the number of local caches it needs to service. With more CUs, the Infinity Cache needs to scale up proportionally at the least in order to maintain the same hit rate.
 
Reactions: Tlh97 and maddie

Kepler_L2

Senior member
Sep 6, 2020
473
1,927
106
What are you basing this assertion that 192 MB is enough on? Is it from the hit rate chart that AMD provided for N21? If so, the curve for 4K resolution still didn't look like it was plateauing just yet. Furthermore, I assume the hit rate is a function of the number of local caches it needs to service. With more CUs, the Infinity Cache needs to scale up proportionally at the least in order to maintain the same hit rate.
N21 is 128MB for 80 CUs, N31 is 192MB for 96 CUs.
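As a quick check of those two figures, here is a minimal sketch that works out cache per CU. The N31 numbers are the rumoured ones quoted in this thread, and the helper name `cache_per_cu` is made up for illustration.

```python
# Figures quoted in the thread; the N31 values are rumours, not confirmed specs.
configs = {
    "N21": {"cache_mb": 128, "cus": 80},
    "N31": {"cache_mb": 192, "cus": 96},
}


def cache_per_cu(cache_mb: float, cus: int) -> float:
    """Infinity Cache capacity per compute unit, in MB."""
    return cache_mb / cus


per_cu = {name: cache_per_cu(**cfg) for name, cfg in configs.items()}

for name, value in per_cu.items():
    print(f"{name}: {value:.2f} MB of Infinity Cache per CU")
```

On these numbers the cache grows faster than the CU count (1.6 MB per CU on N21 against 2.0 MB per CU on N31).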
 

Karnak

Senior member
Jan 5, 2017
399
767
136
What are you basing this assertion that 192 MB is enough on?
Again: There's a 384-bit bus and assuming 20 Gbps GDDR6 (which is most likely IMO by the end of 2022, if not even 24 Gbps) that already would be a total of 960 GB/s.

512 GB/s (N21) vs. 960 GB/s (N31) = around +88%
128 MByte IF$ (N21) vs. 192 MByte IF$ (N31) = +50%

Literally I see no reason - since you need to consider both the raw bandwidth and not just the IF$ - why you need more than 192 MByte for the latter if you're having almost 1 TB/s of raw bandwidth already. Waste of sand and I'm sure that's how AMD feels and why we won't see more than that.
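The arithmetic behind those percentages can be sketched as follows. Peak bandwidth is bus width times per-pin data rate, divided by 8 bits per byte. The N21 line uses its shipping 256-bit, 16 Gbps configuration; the N31 lines use the 384-bit bus and 20/24 Gbps speeds assumed in this post. Function names are illustrative only.

```python
def gddr6_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) * 100 / old


n21_bw = gddr6_bandwidth_gbs(256, 16)     # N21 as shipped
n31_bw = gddr6_bandwidth_gbs(384, 20)     # N31 as assumed in the post
n31_bw_24 = gddr6_bandwidth_gbs(384, 24)  # the optimistic 24 Gbps case

print(f"N21: {n21_bw:.0f} GB/s, N31: {n31_bw:.0f} GB/s")
print(f"Raw bandwidth increase: {percent_increase(n21_bw, n31_bw):.1f}%")
print(f"Infinity Cache increase: {percent_increase(128, 192):.1f}%")
print(f"N31 with 24 Gbps GDDR6: {n31_bw_24:.0f} GB/s")
```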
 

leoneazzurro

Golden Member
Jul 26, 2016
1,015
1,610
136
IC provides for bandwidth amplification. We are probably looking at an effective bandwidth @4K of around 3-3.5 TB/s even with only 192 MB of cache. It then depends on how much bandwidth the chip needs with the increased FP capability (including all the other improvements). A 32 MB cache die + 64-bit VRAM channel is certainly small, but not as small as the V-Cache on Ryzen, because it will also integrate the RAM controller (which scales quite badly compared to the rest). So maybe 40-50 mm^2 on N6 if 32 MB and 70 mm^2 if 64 MB? A 64-bit channel is needed if it has to be reused on N32...
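One rough way to put numbers on "bandwidth amplification" is the sketch below. It rests on a simplified model that is an assumption here, not something stated in the thread or by AMD: if a fraction h of memory traffic hits the Infinity Cache, DRAM only serves the remaining 1 - h, so the traffic the GPU can sustain is roughly raw / (1 - h). It ignores the cache's own bandwidth ceiling and any DRAM inefficiency. Function names are made up for illustration.

```python
def effective_bandwidth_gbs(raw_gbs: float, hit_rate: float) -> float:
    """Sustainable traffic if only cache misses reach DRAM (simplified model)."""
    if not 0 <= hit_rate < 1:
        raise ValueError("hit_rate must be in [0, 1)")
    return raw_gbs / (1 - hit_rate)


def required_hit_rate(raw_gbs: float, target_gbs: float) -> float:
    """Hit rate needed to reach target_gbs under the same simplified model."""
    return 1 - raw_gbs / target_gbs


RAW_N31 = 960  # GB/s, the 384-bit / 20 Gbps assumption used in this thread

for target in (3000, 3500):
    needed = required_hit_rate(RAW_N31, target)
    print(f"{target} GB/s effective needs a hit rate of about {needed:.1%}")
```

Under this model, 3-3.5 TB/s on top of 960 GB/s of raw bandwidth corresponds to a hit rate of roughly 68% to 73%.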
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
Literally I see no reason - since you need to consider both the raw bandwidth and not just the IF$ - why you need more than 192 MByte for the latter if you're having almost 1 TB/s of raw bandwidth already. Waste of sand and I'm sure that's how AMD feels and why we won't see more than that.
"No reason" doesn't seem like an engineering principle to me.
IC provides for bandwidth amplification.
DRAM BW is only one design consideration. The moment you allow client requests to go straight to RAM, it gets oversaturated. DRAM has recovery times, row and column strobes take time, and there are burst lengths to respect.
The purpose of cache design is not only to provide the data on a hit, but also to cluster loads and stores so that the DRAM does not slow to a crawl.
On a CPU with 8 cores it won't be visible. On GPUs it is a totally different game.
Imagine a thread running on a CU requests memory that is not there: it stalls, the wave controller switches to another thread, and that one requests memory again. Now think about how many threads a wave controller can support, and then how many CUs there are.
 
Reactions: Tlh97 and Ajay

tomatosummit

Member
Mar 21, 2019
184
177
116
The RDNA2 IC graph certainly looked like 4K resolutions needed 192 MB for N21.
But N31 could need more for any number of reasons.
Why stop at 4K? Ultra-widescreen and supersampling are options for halo graphics cards.
Ray tracing was suggested to benefit from cache.
The memory controller chiplets are probably limited by the PHY sizes, so why not put as much cache as you can on there? It might not all get used in every situation, but it's hardly going to hurt performance.

I hope the Phoenix rumour about using the cache chiplet is true, though. I can't believe the performance claims with only DDR5, but I wonder if Phoenix will take the option to use a couple of GDDR6 modules coming off the chiplet for its top-end options.
 
Reactions: Tlh97

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
Enhanced Wave64 mode/dual mode Wave32 confirmed?
It is the same stuff discussed 2 pages ago, dual wave32 or one cycle wave64. (caveats being that 4 operand from VGPR banks with scalar or immediate)
At least dual 32 lane VALU is confirmed pretty much though.

For more background, that value in the twitter post is a bit mask related to the perf counter logged by the instruction sequencer (or frontend) in a WGP.

Actually, there is more crazy stuff in there, but I'm not sure if it is just placeholders for a future SoC.
e.g.
32 threads in the Texture Addressing Unit which perform RT with traversal capability, texture (i.e. BVH) load capability.
Dual Geometry Engine
 

leoneazzurro

Golden Member
Jul 26, 2016
1,015
1,610
136
It is the same stuff discussed 2 pages ago, dual wave32 or one cycle wave64. (caveats being that 4 operand from VGPR banks with scalar or immediate)
At least dual 32 lane VALU is confirmed pretty much though.

For more background, that value in the twitter post is a bit mask related to the perf counter logged by the instruction sequencer (or frontend) in a WGP.

Actually more crazy stuff are there but not sure if they are just placeholders for future SoC.
e.g.
32 threads in the Texture Addressing Unit which perform RT with traversal capability, texture (i.e. BVH) load capability.
Dual Geometry Engine

Yes, I know it was referring to what was already discussed, but as these drivers seem to be confirming it more and more, I posted this tweet. Also, there is a reference to the patch.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
Furthermore, I assume the hit rate is a function of the number of local caches it needs to service.

It's not. The hit rate plateaus when it successfully starts caching all cacheable data between frames. This means that if you get up near that regime, how fast or wide your GPU is has no direct impact on how much cache you need. Only your workload matters. Cache demand will go up in the future if/when resolutions go up or when devs use more data per pixel.

(Of course, faster GPUs tend to be used for larger workloads.)
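A toy simulation illustrates the plateau argument. This is a minimal sketch under strong assumptions that are not from the thread: an LRU cache, and every frame touching exactly the same set of cache lines in the same order. Real workloads and the real Infinity Cache replacement policy will behave less sharply. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines: int, working_set: int, frames: int) -> float:
    """Hit rate of an LRU cache when each frame touches the same
    `working_set` distinct lines in the same order."""
    cache = OrderedDict()
    hits = 0
    accesses = 0
    for _ in range(frames):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


WORKING_SET = 1000  # lines reused from frame to frame (arbitrary toy value)
for size in (500, 1000, 2000, 4000):
    rate = lru_hit_rate(size, WORKING_SET, frames=10)
    print(f"cache of {size} lines: hit rate {rate:.2f}")
```

Once the cache holds the whole per-frame working set, making it bigger does not raise the hit rate any further, and nothing in the model depends on how many clients issue the requests. The hard cliff below that size is an artifact of strict cyclic access with LRU; measured hit-rate curves are smoother.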
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
So what does the stack look like based on the latest rumours?

N31 = 7900 series with 384-bit bus and 24 GB RAM
N32 = 7800 series with 256-bit bus and 16 GB RAM
N33 = 7600 series with 128-bit bus and 8 GB RAM

Is there an N34 for the 7700 series, or will AMD go the 5600 XT route and make the 7700 from cut-down N32s with a 192-bit bus? I guess it makes some sense to use cut N32, because I expect N32 will be the most produced N5 GPU chip, so creating a SKU that uses parts that are partially defective or fail 7800 series binning is probably a smart play to maximise yields.

Also, is a single GCD absolutely confirmed for N31 and N32, or is dual GCD still possible? If dual GCD is still on the table, could AMD use a single N31 GCD to power the 7700 series? That would make the 7900 2x N31, the 7800 2x N32 and the 7700 1x N31. 1x N32 would match N33 in spec, but could AMD use that chiplet in laptops, or maybe in a 7600 series refresh SKU as more 5nm capacity comes online?
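The bus widths and VRAM sizes in the list above line up if you assume one GDDR6 chip per 32 bits of bus and 2 GB (16 Gbit) chips. That chip density is an assumption made here for illustration, and clamshell layouts are ignored. A small sketch:

```python
def vram_gb(bus_width_bits: int, chip_gb: int = 2, chip_bus_bits: int = 32) -> int:
    """VRAM capacity assuming one GDDR6 chip per 32-bit slice of the bus."""
    chips = bus_width_bits // chip_bus_bits
    return chips * chip_gb


# Bus widths mentioned in this thread, including the cut-down rumours.
for name, bus in [("N31", 384), ("N32", 256), ("N33", 128), ("cut N32", 192)]:
    print(f"{name}: {bus}-bit bus -> {vram_gb(bus)} GB")
```

With the same assumption, a 192-bit cut-down N32 would end up with 12 GB.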
 

Kepler_L2

Senior member
Sep 6, 2020
473
1,927
106
Yes, the 7700XT will be a cut-down N32. One interesting thing about the Navi3x MCM design is that when they cut down VRAM/cache they will actually ship less silicon! This makes configurations like a 320-bit 7900XT and a 192-bit 7700XT much more palatable for AMD.
 

jpiniero

Lifer
Oct 1, 2010
14,845
5,457
136
Given what they are likely charging for it, putting the N33 in the 7600 range would be a bad idea.

It'd just be simple enough to call N31 7900/XT, N32 7800/XT, and N33 7700/XT.
 

Kepler_L2

Senior member
Sep 6, 2020
473
1,927
106
Given what they are likely charging for it, putting the N33 in the 7600 range would be a bad idea.

It'd just be simple enough to call N31 7900/XT, N32 7800/XT, and N33 7700/XT.
N33 in the 7700XT would mean a 20% reduction in CU count, 33% reduction in VRAM and 50% reduction in PCIe lanes vs the 6700 XT. Yes it's quite a bit faster, but I feel the community would destroy AMD over this kind of downgrade.
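Those percentages can be reproduced with a short sketch. The 6700 XT figures are its shipping specs (40 CUs, 12 GB, PCIe x16); the N33 figures are the rumoured ones used in this thread (32 CUs, 8 GB, x8). The dictionary and function names are illustrative.

```python
def percent_reduction(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) * 100 / old


rx_6700_xt = {"CUs": 40, "VRAM (GB)": 12, "PCIe lanes": 16}
n33_rumoured = {"CUs": 32, "VRAM (GB)": 8, "PCIe lanes": 8}  # rumours only

reductions = {
    key: percent_reduction(rx_6700_xt[key], n33_rumoured[key])
    for key in rx_6700_xt
}

for key, value in reductions.items():
    print(f"{key}: {value:.0f}% lower than the 6700 XT")
```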
 

jpiniero

Lifer
Oct 1, 2010
14,845
5,457
136
N33 in the 7700XT would mean a 20% reduction in CU count, 33% reduction in VRAM and 50% reduction in PCIe lanes vs the 6700 XT. Yes it's quite a bit faster, but I feel the community would destroy AMD over this kind of downgrade.

I thought N33 was 40 CUs or the equivalent of that.

Might be time for a new numbering scheme.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,015
1,610
136
I thought N33 was 40 CUs or the equivalent of that.

Might be time for a new numbering scheme.

N33 seems to be 4096 SPs, organized in 32 CUs, with each CU having double the FP resources of an RDNA2 CU and probably a (way) higher clock.
Rumors seem to say that its performance is on par with a 6900XT (at Full HD). I don't know how reliable that is, but if it is true, an N31 (3x N33) could quite possibly be 2.5x or more a 6900XT in raster.
I'd add that, due to the extreme modularity of the memory configuration, it's very possible we will get a lot of different SKUs this time.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
AMD can still cut the N33 die down heavily, into 3 SKUs.

If Kepler is correct, and N33 is going to be the 7600 series, then we are looking at 7600 XT, 7600, and potentially 7500 XT.


However, we know that AMD was required to disable pairs of CUs in order to get usable dies.

Is it possible that AMD is required to disable pairs of WGPs now?

7600 XT - 32 CUs,
7600 - 24 CUs,
7500 XT - 16 CUs, or...
7600 XT - 32 CUs,
7600 - 28 CUs,
7500 XT - 24 CUs?

For the 7500 XT, AMD can give it a 96-bit bus (6 GB VRAM) and sell it for over $200 ($250-270).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,430
2,915
136
N33 in the 7700XT would mean a 20% reduction in CU count, 33% reduction in VRAM and 50% reduction in PCIe lanes vs the 6700 XT. Yes it's quite a bit faster, but I feel the community would destroy AMD over this kind of downgrade.
32 CUs instead of 40 CUs is not a downgrade when the performance increases by a lot. The real downgrade is 8 GB of VRAM. If it had 12 GB, then I don't think 8 PCIe 4.0 lanes would really matter.

The dies are 96, 64 and 32 CUs, with 128 SPs per CU now.
Do we get 64 SPs (shaders) per SIMD32, or are there 2x more SIMD32s per CU?
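For reference, the shader counts implied by those CU numbers can be sketched like this. The die-to-CU mapping and the 128 SPs per CU are the rumoured values from this thread. The two layouts at the end are just the two hypotheses in the question; both are arithmetically consistent, so the totals alone cannot tell them apart.

```python
SP_PER_CU = 128  # rumoured RDNA3 value from this thread (an RDNA2 CU has 64)

dies = {"N31": 96, "N32": 64, "N33": 32}  # rumoured CU counts per die

total_sps = {name: cus * SP_PER_CU for name, cus in dies.items()}
for name, sps in total_sps.items():
    print(f"{name}: {dies[name]} CUs -> {sps} SPs")

# Two hypothetical ways to reach 128 SPs per CU:
layouts = {
    "2 SIMDs x 64 lanes (dual-issue SIMD32)": 2 * 64,
    "4 SIMDs x 32 lanes (twice as many SIMD32s)": 4 * 32,
}
for description, sps in layouts.items():
    print(f"{description}: {sps} SPs per CU")
```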
 
Reactions: Mopetar