Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 48 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

uzzi38

Platinum Member
Oct 16, 2019

eek2121

Diamond Member
Aug 2, 2005
Having a 384-bit bus would really surprise me. RDNA2's whole thing was to reduce bandwidth requirements by adding on-die cache. Now these rumors are claiming AMD is adding way more cache, but then also greatly increasing the bus width??

This makes no sense. All that work added cost to the die, only to also increase the cost and power consumption of the board and memory modules.
RDNA2 was bandwidth starved. AMD has to compete with NVIDIA, and their cache won't get them to that point by itself.

I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: data locality is maximized. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and each memory controller should be as close as possible to the IF cache it services. This will work well for most data.

AMD maintained memory size for all models using the same die. This is not an accident: with IF cache, they need to do this to optimize energy efficiency.

Having said all that, we have an RX 6700 GPU with a 160-bit memory bus, obviously a product used to sell partially defective dies. The utility of any explanation is in its predictive power. I predict a change in expected perf/power for this model.

The cache absolutely does NOT need to be close to the shaders. Placing it with the memory controllers actually makes a ton of sense. I'm not saying for sure they took this route; however, having the cache sit next to the controller(s) (there is more than one) means you can query the cache before going out to slower, higher-latency GDDR6. It is also far more cost-effective: in theory AMD can just add or remove MCDs to scale a product up or down as needed. They can even release *50 models down the road with double the cache in a very cost-effective manner. The latency between the GCD and MCD is likely to be minimal compared to the latency between GDDR6 and the memory controllers.
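The latency side of this argument is the usual average-memory-access-time (AMAT) relation: every request pays the cache lookup, and only misses pay the trip out to DRAM, so a small extra GCD-to-MCD hop on hits matters little next to the miss penalty. A minimal Python sketch; the latencies and the 60% hit rate are hypothetical placeholders, not measured RDNA2/RDNA3 figures.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time_ns + miss_rate * miss_penalty_ns


# All numbers below are hypothetical, chosen only to illustrate the argument.
MISS_RATE = 0.4            # i.e. a 60% cache hit rate
GDDR6_PENALTY_NS = 300.0   # extra latency of going out to GDDR6 on a miss

cache_on_gcd = amat(100.0, MISS_RATE, GDDR6_PENALTY_NS)  # cache next to the shaders
cache_on_mcd = amat(120.0, MISS_RATE, GDDR6_PENALTY_NS)  # +20 ns for the GCD->MCD hop

print(f"cache on GCD: {cache_on_gcd:.0f} ns, cache on MCD: {cache_on_mcd:.0f} ns")
```

With these placeholder numbers the hop adds 20 ns to a roughly 220 ns average; real latency and hit-rate figures would decide how much it actually costs.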
 

Karnak

Senior member
Jan 5, 2017
RDNA2 was bandwidth starved.
Wut? It isn't at all. Get a 6900XT and overclock the memory: the gains you'll get, relative to the increase in memory clock and bandwidth over stock, are nowhere near enough to support a "bandwidth starved" conclusion.

Ampere is not stronger at higher resolutions because of higher bandwidth, if that's what you're thinking of. That's simply because of its doubled FP32 throughput.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
There's 1 MCD for each SE on Navi3x. Cutdown configs will disable 1 SE and also lose 1 MCD. It will not affect efficiency at all.
So does it mean the RDNA3 lineup will look like this?
N31(Full): 6SE -> 48WGP -> 96CU -> 12288SP; 384bit GDDR6 and 192MB IF
N31(Cutdown): 5SE -> 40WGP -> 80CU -> 10240SP; 320bit GDDR6 and 160MB IF
N32(Full): 4SE -> 32WGP -> 64CU -> 8192SP; 256bit GDDR6 and 128MB IF
N32(Cutdown): 3SE -> 24WGP -> 48CU -> 6144SP; 192bit GDDR6 and 96MB IF
and a monolithic chip
N33(Full): 2SE -> 16WGP -> 32CU -> 4096SP; 128bit GDDR6 and 64MB IF
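Every line in this list follows from the shader-engine count via fixed ratios. The Python sketch below regenerates the list from the SE count alone. The per-SE ratios (8 WGP, 2 CU per WGP, 128 SP per CU, one 64-bit MCD with 32 MB of IF per SE) are assumptions obtained by dividing the rumored totals, not confirmed AMD specifications.

```python
from dataclasses import dataclass

# Per-SE ratios inferred by dividing the rumored totals; assumptions only.
WGP_PER_SE = 8
CU_PER_WGP = 2
SP_PER_CU = 128
BUS_BITS_PER_SE = 64   # one MCD per SE, assumed to carry a 64-bit GDDR6 interface
IF_MB_PER_SE = 32      # and 32 MB of Infinity Cache


@dataclass(frozen=True)
class Config:
    name: str
    se: int

    @property
    def wgp(self) -> int:
        return self.se * WGP_PER_SE

    @property
    def cu(self) -> int:
        return self.wgp * CU_PER_WGP

    @property
    def sp(self) -> int:
        return self.cu * SP_PER_CU

    @property
    def bus_bits(self) -> int:
        return self.se * BUS_BITS_PER_SE

    @property
    def if_mb(self) -> int:
        return self.se * IF_MB_PER_SE


LINEUP = [
    Config("N31 (full)", 6),
    Config("N31 (cutdown)", 5),
    Config("N32 (full)", 4),
    Config("N32 (cutdown)", 3),
    Config("N33 (full)", 2),
]

for c in LINEUP:
    print(f"{c.name}: {c.se}SE -> {c.wgp}WGP -> {c.cu}CU -> {c.sp}SP; "
          f"{c.bus_bits}bit GDDR6 and {c.if_mb}MB IF")
```

N33 is monolithic, so it has no MCDs; the sketch simply applies the same per-SE ratio, which the rumored N33 numbers happen to follow.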
 

GodisanAtheist

Diamond Member
Nov 16, 2006
So does it mean the RDNA3 lineup will look like this?
N31(Full): 6SE -> 48WGP -> 96CU -> 12288SP; 384bit GDDR6 and 192MB IF
N31(Cutdown): 5SE -> 40WGP -> 80CU -> 10240SP; 320bit GDDR6 and 160MB IF
N32(Full): 4SE -> 32WGP -> 64CU -> 8192SP; 256bit GDDR6 and 128MB IF
N32(Cutdown): 3SE -> 24WGP -> 48CU -> 6144SP; 192bit GDDR6 and 96MB IF
and a monolithic chip
N33(Full): 2SE -> 16WGP -> 32CU -> 4096SP; 128bit GDDR6 and 64MB IF

-Isn't N33 supposed to be ~400mm²? That looks too anemic for a 400mm² chip, even if it is on N6; either that, or the size estimates we've gotten so far are wrong.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
-Isn't N33 supposed to be ~400mm²? That looks too anemic for a 400mm² chip, even if it is on N6; either that, or the size estimates we've gotten so far are wrong.
~400mm² looks like too much, considering the specs are reduced quite a lot and the process is a bit smaller.
It's true that we don't know how much bigger an RDNA3 shader engine or WGP is compared to RDNA2's.
But I think halving the bus, IF and ROPs (relative to N21), plus the smaller process, should reduce the size to 400mm² or maybe a bit less.

To be honest, N33 looks like N23, only beefed up in some places, and N23 is only 237mm². I would be surprised if N33 ends up much bigger than the 335mm² N22.
Not sure if the number of TMUs per CU will be doubled in RDNA3 or not.

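| Chip | Size [mm²] | SE | WGP | CU (TMU) | ALU | ROP | Bus [bit] | IF [MB] |
|---|---|---|---|---|---|---|---|---|
| N23 | 237 | 2 | 16 | 32 (128) | 2048 | 64 | 128 | 32 |
| N33 | 400? | 2 | 16 | 32 (128?) | 4096 (+100%) | 64? | 128 | 64 (+100%) |
| N21 | 520 | 4 (+100%) | 40 (+150%) | 80 (320) (+150%) | 5120 (+150%) | 128 (+100%) | 256 (+100%) | 128 (+300%) |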
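The "(+x%)" annotations in the table are percentage changes relative to N23. A small Python check that recomputes them from the unit counts; TMUs and die size are left out, and the N33 values (including the "64?" ROP count) are only the rumored figures from the table.

```python
# Unit counts copied from the table; N33 values are rumors, not confirmed specs.
N23 = {"SE": 2, "WGP": 16, "CU": 32, "ALU": 2048, "ROP": 64, "Bus": 128, "IF": 32}
N33 = {"SE": 2, "WGP": 16, "CU": 32, "ALU": 4096, "ROP": 64, "Bus": 128, "IF": 64}
N21 = {"SE": 4, "WGP": 40, "CU": 80, "ALU": 5120, "ROP": 128, "Bus": 256, "IF": 128}


def pct_vs(chip: dict, base: dict) -> dict:
    """Percentage change of each unit count relative to the base chip."""
    return {k: round((chip[k] - base[k]) / base[k] * 100) for k in base}


print("N33 vs N23:", pct_vs(N33, N23))
print("N21 vs N23:", pct_vs(N21, N23))
```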
 

Saylick

Diamond Member
Sep 10, 2012
Do we know what speed the RAM will be for RDNA3?
Regular ol' GDDR6 RAM this fast almost makes GDDR6X pointless.
 

Timorous

Golden Member
Oct 27, 2008
-Isn't N33 supposed to be ~400mm²? That looks too anemic for a 400mm² chip, even if it is on N6; either that, or the size estimates we've gotten so far are wrong.

For a part that is supposed to target a $400 price point, I do think 400mm² is rather large. I think something targeting 1080p does not really need more than 64MB of cache. Then you think about ROPs, TMUs etc., and while the ALU count seems to be double N23's, a lot of stuff is staying the same (although higher clocks will boost performance even where the number of functional units stays the same). I think 300mm² is a more realistic size for something that is supposed to come in at $400.

There's 1 MCD for each SE on Navi3x. Cutdown configs will disable 1 SE and also lose 1 MCD. It will not affect efficiency at all.

That is what I meant. SE not WGP.

Anyhow, it makes sense to me. I'm just not 100% sure what the actual product stack will look like, but with the latest rumours I think I am getting a clearer picture now, and I do think something like this

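| Card | Bus | Memory | Cache | Shaders | Die |
|---|---|---|---|---|---|
| 7950XT (7900XT) | 384-bit | 24GB | 384MB (192MB) | 12,288 | N31 |
| 7900XT (7900) | 384-bit | 24GB | 192MB | 12,288 (11,520) | N31 |
| 7850XT (7800XT) | 320-bit | 20GB | 160MB | 10,240 | N31 |
| 7800XT (7800) | 256-bit | 16GB | 128MB | 8,192 | N32 |
| 7700XT | 192-bit | 12GB | 96MB | 6,144 | N32 |
| 7600XT | 128-bit | 8GB | 64MB | 4,096 | N33 |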

could work pretty well, although I am not convinced that is what they will be called. I am happy with N33 and cut N32 being where they are, but above that there is a lot of room for filling niches.
 

biostud

Lifer
Feb 27, 2003
Wut? It isn't at all. Get a 6900XT and overclock the memory: the gains you'll get, relative to the increase in memory clock and bandwidth over stock, are nowhere near enough to support a "bandwidth starved" conclusion.

Ampere is not stronger at higher resolutions because of higher bandwidth, if that's what you're thinking of. That's simply because of its doubled FP32 throughput.
Maybe it's cache starved at 4K? The cache has much higher bandwidth and lower latency than the memory, so even if you overclock the memory, it is nowhere near the speed of the cache.

So if I'm finally starting to grasp what N31 and N32 are about, they're kind of the inverse of the EPYC and Threadripper CPUs. Instead of a central memory controller/IOD and several compute dies with optional V-Cache, you have a central compute die with several memory controllers carrying V-Cache on N31 and N32. When making smaller dies, the payoff from going MCM diminishes, so N33 is monolithic.
 

Aapje

Golden Member
Mar 21, 2022
So does it mean the RDNA3 lineup will look like this?
N31(Full): 6SE -> 48WGP -> 96CU -> 12288SP; 384bit GDDR6 and 192MB IF
N31(Cutdown): 5SE -> 40WGP -> 80CU -> 10240SP; 320bit GDDR6 and 160MB IF
N32(Full): 4SE -> 32WGP -> 64CU -> 8192SP; 256bit GDDR6 and 128MB IF
N32(Cutdown): 3SE -> 24WGP -> 48CU -> 6144SP; 192bit GDDR6 and 96MB IF
and a monolithic chip
N33(Full): 2SE -> 16WGP -> 32CU -> 4096SP; 128bit GDDR6 and 64MB IF

There needs to be a cutdown N33 to sell the dies with defects.
 

Olikan

Platinum Member
Sep 23, 2011
Do we know what speed the RAM will be for RDNA3?
What are the odds of something crazy like LPDDR5x? RDNA2 was surprising with its small bus.
RDNA3 is rumored to have an even bigger IF, more compression and clocks over 3GHz... bandwidth shouldn't be a problem, but latency might be. A cheap, low-latency and power-efficient memory might be good enough.

ofc the patches may say otherwise XD, I'm really a newbie at this memory jargon
 

tomatosummit

Member
Mar 21, 2019
What are the odds of something crazy like LPDDR5x? RDNA2 was surprising with its small bus.
RDNA3 is rumored to have an even bigger IF, more compression and clocks over 3GHz... bandwidth shouldn't be a problem, but latency might be. A cheap, low-latency and power-efficient memory might be good enough.

ofc the patches may say otherwise XD, I'm really a newbie at this memory jargon
Well, RDNA3 in Phoenix Point is probably going to have some LPDDR5 variants, if you want to count that.
As for the big boy graphics cards, it's GDDR, or HBM as a long shot.
The UMC and separate memory chiplets do actually open up some possibilities for memory types, but I doubt they'll be used, doubly so in the consumer space.
It's the CDNA3 APUs that will run with it, with something like HBM plus Gen-Z (deprecated?) attached DDR5.

On second thought, it might be worth humouring something like N33 with LPDDR5X at low clocks for a portable scenario, like that MacBook-only GPU last generation.
 

Stuka87

Diamond Member
Dec 10, 2010
What are the odds of something crazy like LPDDR5x? RDNA2 was surprising with its small bus.
RDNA3 is rumored to have an even bigger IF, more compression and clocks over 3GHz... bandwidth shouldn't be a problem, but latency might be. A cheap, low-latency and power-efficient memory might be good enough.

ofc the patches may say otherwise XD, I'm really a newbie at this memory jargon

There is no way any of these cards are going to use LPDDR5X, which has a peak per-pin data rate of 8.5Gbps. GDDR6 runs at 14 or 16Gbps on most current cards, and will be cheaper to produce than LPDDR5X, which is tailored to low-power applications (like phones).
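The 8.5Gbps and 14/16Gbps figures are per-pin data rates; peak bandwidth is that rate times the bus width, divided by 8 bits per byte. A quick Python comparison, assuming (hypothetically) the same 128-bit bus for both memory types, which ignores how LPDDR5X packages are actually organized:

```python
def peak_bandwidth_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s: per-pin rate (Gb/s) * bus width / 8."""
    return gbps_per_pin * bus_width_bits / 8


BUS_BITS = 128  # hypothetical N33-class bus width, used for both memory types
for name, rate in (("LPDDR5X", 8.5), ("GDDR6", 16.0)):
    print(f"{name} @ {rate} Gbps: {peak_bandwidth_gbs(rate, BUS_BITS):.0f} GB/s")
```

At equal bus width this gives 136 GB/s versus 256 GB/s, so 16Gbps GDDR6 delivers roughly 1.9x the peak bandwidth.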
 

Mopetar

Diamond Member
Jan 31, 2011
It'd be weird for them not to have commitments already in place. In that case, the GPUs and the memory launch when both are ready enough.

Has there been any talk about v-cache on MCDs? It seems practically obvious with the only real question being how much practical benefit it adds if you already have 384 MB of cache.

It would be pretty nuts if they were able to basically pack the v-cache twice as tight for the same area again, because that theoretically means a top end card with over a gigabyte of cache. That's just an interesting product no matter how you slice it.
 

jpiniero

Lifer
Oct 1, 2010
It'd be weird for them not to have commitments already in place. In that case, the GPUs and the memory launch when both are ready enough.

Has there been any talk about v-cache on MCDs? It seems practically obvious with the only real question being how much practical benefit it adds if you already have 384 MB of cache.

Feels like a Radeon Pro exclusive feature, at least in this gen. How about taking N31, adding a 512-bit memory bus and also V-Cache? With dual-sided (clamshell) GDDR6, that'd be 64 GB of memory. Of course it would cost megabucks.
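The 64 GB figure follows from each GDDR6 chip having a 32-bit interface: a 512-bit bus holds 16 chips, and clamshell (dual-sided) mounting puts two chips on each 32-bit slice of the bus. A Python sketch, assuming 2 GB (16 Gb) chips, the density that also gives the 24/20/16/12/8 GB configurations discussed in this thread:

```python
def gddr6_capacity_gb(bus_width_bits: int, chip_gb: int = 2, clamshell: bool = False) -> int:
    """Total memory for a GDDR6 bus built from 32-bit chips; clamshell doubles the chip count."""
    if bus_width_bits <= 0 or bus_width_bits % 32:
        raise ValueError("bus width must be a positive multiple of 32 bits")
    chips = (bus_width_bits // 32) * (2 if clamshell else 1)
    return chips * chip_gb


for bus in (384, 512):
    print(f"{bus}-bit: {gddr6_capacity_gb(bus)} GB, "
          f"clamshell: {gddr6_capacity_gb(bus, clamshell=True)} GB")
```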
 

gdansk

Platinum Member
Feb 8, 2011
Going from 16->24Gbps and 256->384-bit would be 2.25x the raw bandwidth (1.5 × 1.5). Consider also that there is more effective bandwidth from a better last-level cache and perhaps more compression. Every rumor keeps pointing to at least 2x again.

But I wouldn't be surprised if they use cheaper GDDR6 chips (21Gbps?).
 

SteinFG

Senior member
Dec 29, 2021
Samsung has said they're ramping up to coincide with new GPU launches this year. It would be strange for them to make all this effort and pomp just to miss the new GPU launches IMO.
Nothing about this year's GPUs was mentioned, if I read the press release right. It just says next-gen GPUs, which could be any GPU in the future.
 

Saylick

Diamond Member
Sep 10, 2012
Going from 16->24Gbps and 256->384-bit would be 2.25x the raw bandwidth (1.5 × 1.5). Consider also that there is more effective bandwidth from a better last-level cache and perhaps more compression. Every rumor keeps pointing to at least 2x again.

But I wouldn't be surprised if they use cheaper GDDR6 chips (21Gbps?).
You're probably right. 21 Gbps GDDR6 seems like a given since it's been available for a while now, and like you said would offer nearly a 2x increase in raw bandwidth (not including Infinity Cache) over N21 due to the 50% wider bus (21/16 * 1.5 = 1.97). Add in end-to-end data compression and they likely have enough raw bandwidth to hit the rumored >2x performance increase.
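The 1.97x figure and the 24Gbps case can both be checked with the peak-bandwidth formula (per-pin rate × bus width / 8), taking the RX 6900 XT's 16Gbps on a 256-bit bus (512 GB/s) as the N21 baseline. The 384-bit N31 bus is the rumor under discussion, not a confirmed spec, and Infinity Cache is excluded.

```python
def peak_bandwidth_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak theoretical DRAM bandwidth in GB/s, excluding Infinity Cache."""
    return gbps_per_pin * bus_width_bits / 8


N21_GBS = peak_bandwidth_gbs(16, 256)  # RX 6900 XT: 512 GB/s

for rate in (21, 24):
    n31 = peak_bandwidth_gbs(rate, 384)  # rumored 384-bit N31 bus
    print(f"{rate} Gbps x 384-bit: {n31:.0f} GB/s ({n31 / N21_GBS:.2f}x N21)")
```

This prints 1008 GB/s (1.97x) for 21Gbps and 1152 GB/s (2.25x) for 24Gbps.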
 

Frenetic Pony

Senior member
May 1, 2012
Nothing about this year's GPUs was mentioned, if I read the press release right. It just says next-gen GPUs, which could be any GPU in the future.

This news is the product release; the development announcement for this speed was made last year. As far as I remember, product availability announcements for memory like this have shortly preceded products using it, so AMD cards showing up with it could be just around the corner.

Since Samsung is the only vendor producing this speed, it might only show up in the highest-end models, while other GPUs get cheaper 21Gbps memory from multiple vendors.
 

Aapje

Golden Member
Mar 21, 2022
Nothing about this year's GPUs was mentioned, if I read the press release right. It just says next-gen GPUs, which could be any GPU in the future.

No, the next gen is the upcoming gen, not the gen after next gen. After reading the press release very carefully, I'm fairly confident that they intend to launch with this upcoming generation of cards.
 

HurleyBird

Platinum Member
Apr 22, 2003
Wut? It isn't at all. Get a 6900XT and overclock the memory: the gains you'll get, relative to the increase in memory clock and bandwidth over stock, are nowhere near enough to support a "bandwidth starved" conclusion.

The usual "bandwidth starved" conclusion is that a card is bandwidth starved if increased memory clocks yield more improvement than increased core clocks, which is facile and wrong. You also need to account for die area of different blocks. For example, if the memory system takes up 20% of the area that the shaders, TMUs, etc. do, then even if increased core clocks yield twice the performance of increased memory clocks, the design is still very bandwidth starved.

Of course, even that's a bit simplified. To name a couple of other considerations, the power consumption of different blocks and the behavior at different resolutions should also be factored in. As for the latter, at the moment 4K is arguably more important than every other resolution combined (and will be more so in the next generation, as capable 4K gaming will be pushed lower in the product stack), and it is also the point where AMD's current implementation starts to falter a bit against Nvidia's.
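HurleyBird's area-normalized test can be made concrete: divide the performance gained by scaling a block by the die area that block occupies, then compare. A Python sketch using the post's own hypothetical numbers (memory system = 20% of the shader/TMU area; a core overclock gains twice what an equal memory overclock does). Like the simplified version in the post, it ignores power and resolution.

```python
def gain_per_area(perf_gain: float, block_area: float) -> float:
    """Performance gained by scaling a block, per unit of die area it occupies."""
    if block_area <= 0:
        raise ValueError("block_area must be positive")
    return perf_gain / block_area


# Hypothetical figures from the post, in relative units.
core = gain_per_area(perf_gain=2.0, block_area=1.0)    # shaders, TMUs, etc.
memory = gain_per_area(perf_gain=1.0, block_area=0.2)  # memory system

print(f"memory returns {memory / core:.1f}x as much per unit area as the core")
```

By the naive test the core "wins" 2:1, yet per unit area the memory system returns 2.5x as much, which is the post's point.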
 