Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 23

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
Another thing they mentioned in another patent is that they can extract motion vectors from pixel activity across multiple images instead of relying on the game engine to provide the motion vectors.
This means they can stick the upscaler in something like RSR without game engine integration.
It does mean that, but it would definitely be inferior to engine integration, as that is explicit and inferred MVs are implicit.

The latter means estimation/approximation rather than certainty, so you will end up with artifacts when it makes mistakes.

Though I guess for games where you have no choice it's better than nothing if you are perf constrained.
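
To make the explicit-versus-inferred distinction concrete, here is a minimal sketch of inferring a motion vector purely from pixel data, using exhaustive block matching with a sum-of-absolute-differences (SAD) cost. This is a textbook technique used for illustration, not the method from the patent mentioned above; the function names, block size, and search radius are all assumptions.

```python
# Illustrative sketch only: exhaustive block matching with a SAD cost.
# All names and parameters are hypothetical, not taken from any patent or SDK.

def make_frame(width, height, x0, y0, size, value=255):
    """Black frame with one bright size x size square whose top-left is (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = value
    return frame

def sad(prev, curr, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in curr and the block displaced by (dx, dy) in prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def estimate_motion(prev, curr, bx, by, size=4, radius=2):
    """Return ((dx, dy), cost): the displacement into prev that best matches the curr block.

    Ties keep the earliest candidate, starting from (0, 0).
    """
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(prev, curr, bx, by, 0, 0, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the previous frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = sad(prev, curr, bx, by, dx, dy, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# A bright square moves 2 pixels to the right between two frames.
prev = make_frame(12, 12, 2, 2, 4)
curr = make_frame(12, 12, 4, 2, 4)
print(estimate_motion(prev, curr, 4, 2))  # textured block: the match is unambiguous
print(estimate_motion(prev, curr, 0, 8))  # flat block: every candidate costs the same
```

The second call is the interesting one: in a flat region every displacement matches equally well, so the search has nothing to lock onto. That kind of ambiguity is where inferred vectors go wrong, whereas an engine simply knows how each object moved.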

Between one thing and another, though, I would expect some open technique to end up winning the day. It would be great if both AMD and Intel made their future scalers open and some collaborative effort turned out a less branded solution that many would be fine getting behind, like with Vulkan.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
That gets murky because your definition of "viable" changes depending on what you're discussing. Desktop CPU TDPs continue to be on the rise (just look at top end Alder Lake). You can still SLI 2 RTX 3090s and play the handful of games that support it. That's over 700W (going towards 800W with power caps) and poor game support. That's not much of a decline from 4-way 980 Ti's roughly 1kW power consumption and completely awful performance scaling.

While the fraction-of-a-percent top end has indeed fluctuated over the years, the Steam Hardware Survey alone has shown that, overall, gaming PCs are taken a (little) more seriously than before. Hardware has moved from 65W -> 95W -> 125W CPUs over the years, and 100W+ GPUs are now used more consistently, compared with the days when 25-50W GPUs were most common.

It's a hypothesis on my part, but I believe this has occurred because the growing rift in just-enough computing has driven much higher laptop sales over desktop sales. Those buying desktops are more likely doing so to fulfill a need like gaming, whereas in the past a desktop was simply what you got unless portability was the primary concern.

Intel TDPs may be going up, but AMD’s TDPs are unchanged.

I'm gonna play "try to predict the specs!"

3 dies. 96, 64, and 32 work group processors. They've already got multiples of 16 going for series X and the Rx6600(s).

600 watt tdp/OAM only: 2 96wgp dies, 512mb SRAM, 4 HBM3 stacks, 64-96gb of ram, 2.5ghz. I imagine studios that use the new micro LED stages (like The Mandalorian) would love to render on a big farm of these.

500 watts tdp/OAM only: 2 82wgp dies (cut down 96 dies), otherwise same as above.

450 watts tdp/PCI Express 5.0: 2 80wgp (cut down 96, you can disable an entire shader engine or cut down a die even more, whichever!), 512-384mb SRAM, 24gb of ram, 384bit GDDR6 bus @20-24ghz, 2.5ghz. The big consumer card.

350 watts/PCI Express 4.0/5.0: 2 64wgp dies, 256mb SRAM, 16gb of ram, 256bit GDDR6@20-24ghz, 2.5ghz

300 watts (etc.): 1 96wgp die, 256mb SRAM, 16gb of ram, 256bit GDDR6@slower than the above, 2.75ghz.

250 watts: 1 80/82wgp die, 192mb SRAM, 16gb, 256bit GDDR6@slowish, 2.75ghz. Kinda a better 6900xt for a lot cheaper.

200 watts: 1 64wgp die, 192mb SRAM, 12gb, 192bit GDDR6@20ghz+, 2.75-2.9ghz. Mid range big seller.

175 watts: 1 56wgp die, 192-128mb SRAM , same as above but even cheaper!

125 watts: 1 32wgp monolithic die (other than SRAM), 64-128mb SRAM. 128bit GDDR6@20ghz+, 2.9ghz. Low end best seller, better than 6600xt all around for the same price or lower! (Chip shortage is expected to ease next year, and competition will be high).

100 watts: 28wgp, same as above just cut down, $250 (yay competition, I hope).

There, that was fun.
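
For a sense of what the bus widths in that guess would mean, here is a quick sketch of the peak memory bandwidth arithmetic. It assumes the "@20-24ghz" figures are per-pin data rates in Gbps, which is the usual way GDDR6 speeds are quoted; the function name is made up for illustration.

```python
# Illustrative arithmetic only. Assumes the quoted "20-24ghz" figures are
# per-pin data rates in Gbps; the function name is hypothetical.

def gddr_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: one bit per pin per transfer, 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# Bus widths taken from the guessed configurations above.
for bus_width in (384, 256, 192, 128):
    low = gddr_bandwidth_gb_s(bus_width, 20)
    high = gddr_bandwidth_gb_s(bus_width, 24)
    print(f"{bus_width}-bit: {low:.0f} to {high:.0f} GB/s")
```

These are theoretical peaks. The point of the large SRAM in each configuration would be to cut how often the GPU has to go out to GDDR6 at all.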

Unfortunately you are off the mark. Good guess, however.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
Intel TDPs may be going up, but AMD’s TDPs are unchanged.

I'm not sure I follow. It's been believed that AM5 will likely move the needle up to 120W similar to what Intel has. Likewise, Genoa will almost undoubtedly track the same 400W TDP targets Intel is targeting for the next-gen Server space (there's no reason to leave the power on the table when OEMs will already be designing primarily around Intel TDPs).
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
I didn't understand half of that, but I'm very skeptical of upscaling that is not multi-frame. No matter how cheap, all we've seen so far is artifacts and failing to match the quality of newer DLSS versions. After all, what these techniques set out to do is AA cheaply, and only DLSS delivers somewhat satisfactorily on that. FSR and similar techniques are surely a lot better than nothing, don't get me wrong, but hardware support for stuff that is not multi-frame seems like a waste to me.

Asking out of interest, do "hardware" scalers like in every TV not work well enough for gaming? Or could we just use them either in the GPU or the display itself? I watch a lot of 720p and 1080p content on my 4K TV and it looks just fine.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Asking out of interest, do "hardware" scalers like in every TV not work well enough for gaming? Or could we just use them either in the GPU or the display itself? I watch a lot of 720p and 1080p content on my 4K TV and it looks just fine.
Those usually introduce a significant latency which is why most TVs also offer a "game mode" to avoid that.
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136

N31 and N32 on 5nm/6nm is no surprise. What's surprising is that MI300 is still 6nm (same as MI200). How is it going to compete with Hopper if they haven't moved to 5nm for that generation?
 
Reactions: Tlh97 and Joe NYC

Frenetic Pony

Senior member
May 1, 2012
218
179
116
It should fit too on OAM Module card

I think that's the main thing. Sure 5nm has better area and power. But it's an HPC like product, they probably don't care about power. And as for money, taping out a new chip on 5nm is ridiculously expensive and right now there seems to be limited supply. Going with 6nm, something like a bigger refresh of MI200 (just double the compute dies and switch to HBM3?) might be much faster to market and cheaper/more available.

I'd expect MI400 to come out on 4x in like two years or so, as there'll probably be good availability with TSMC seeing so much HPC work (heavy investment) and the ability to drive clockspeeds up to ridiculous highs will be beneficial.

Also Navi 33 is likely to look like the 6600xt. Ditch the on compute die SRAM and use one of their 64mb bonded SRAM chips to get better yields (alongside 6nm) while improving capabilities. Or it'll be a refresh of the 6600xt altogether, though with simultaneous competition from Intel and Nvidia that doesn't seem as good an idea (arch improvements should boost performance over a refresh). Either way the performance configuration/die size ratio of the 6600 is in the sweet spot for a mass market card given the new console performance.
 
Reactions: Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
I think that's the main thing. Sure 5nm has better area and power. But it's an HPC like product, they probably don't care about power. And as for money, taping out a new chip on 5nm is ridiculously expensive and right now there seems to be limited supply. Going with 6nm, something like a bigger refresh of MI200 (just double the compute dies and switch to HBM3?) might be much faster to market and cheaper/more available.

I'd expect MI400 to come out on 4x in like two years or so, as there'll probably be good availability with TSMC seeing so much HPC work (heavy investment) and the ability to drive clockspeeds up to ridiculous highs will be beneficial.

I have seen rumors that MI300 will be (in effect) an APU, meaning there will also be a CPU in the package. Say 8 cores per GPU unit.

If that is the case, and there will be 8 CPU cores on the same die with the GPU, and MI300 is to be released in H2 2022, then time to market would dictate that the CPU cores are Zen 3, and Zen 3 is already on N6 (in Rembrandt).

So these individual chiplets of MI300 may be a compute Rembrandt on steroids.

Also Navi 33 is likely to look like the 6600xt. Ditch the on compute die SRAM and use one of their 64mb bonded SRAM chips to get better yields (alongside 6nm) while improving capabilities. Or it'll be a refresh of the 6600xt altogether, though with simultaneous competition from Intel and Nvidia that doesn't seem as good an idea (arch improvements should boost performance over a refresh). Either way the performance configuration/die size ratio of the 6600 is in the sweet spot for a mass market card given the new console performance.

It will be interesting to see if there is a stacked SRAM Infinity Cache on Navi 33, or if it is a single monolithic die.

As far as 6600 itself, Navi23, it is notably absent from RDNA2 "refresh" being talked about, which just covers Navi21 and maybe Navi22.

It would not surprise me if they have a real refresh, on N6, of the Navi23 die, because, as you say, that would be a mass market sweet spot, and it could co-exist with RDNA3.

As far as Navi33, I think it will be a bigger die, if it is to reach performance near Navi31.
 

biostud

Lifer
Feb 27, 2003
18,397
4,963
136
I'm not sure I follow. It's been believed that AM5 will likely move the needle up to 120W similar to what Intel has. Likewise, Genoa will almost undoubtedly track the same 400W TDP targets Intel is targeting for the next-gen Server space (there's no reason to leave the power on the table when OEMs will already be designing primarily around Intel TDPs).
But the question is how much performance do you get in the same TDP.
 

biostud

Lifer
Feb 27, 2003
18,397
4,963
136
Probably not "much" more since the whole point of increased TDP is to get more performance.
So, by not much, you don't think the rumor of 96 Zen4 cores within a 400W TDP is going to be substantially more than what Intel can offer?

 

Tup3x

Golden Member
Dec 31, 2016
1,011
1,001
136
So, by not much, you don't think the rumor of 96 Zen4 cores within a 400W TDP is going to be substantially more than what Intel can offer?

I was talking about AM5 in comparison to LGA1700. Not sure what that would be competing against and if Intel has something significantly different coming up.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
There isn’t. None of the mainstream products will have die stacking.

That's a question of cost. Increased die yield versus packaging cost is an unknown outside TSMC and its close customers at this point. Certainly it's probably costly right now with the introduction of such technology. But the SRAM is a variation of N6, and so is Navi33. If they're made in the same facility there won't be any transportation costs, and if Navi 33 comes out next year packaging costs might come down enough that it's worth the tradeoff.

That's just speculation of course. But the SRAM chips being 64mb indicate that's around the sweet spot for AMD in terms of yield tradeoff, and sounds about right for a next gen mainstream GPU cache.
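
As a rough illustration of the yield side of that tradeoff, here is a sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density and die areas are invented placeholders, not figures for N6 or for any AMD die, and packaging cost and bonding yield are deliberately left out because, as said above, those are unknown outside TSMC and its close customers.

```python
import math

# Illustrative sketch only: Poisson yield model with made-up inputs.
# Packaging cost and bonding yield are NOT modelled here.

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under the Poisson model."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

def silicon_per_good_die(area_mm2, defects_per_cm2):
    """Wafer area (mm^2) spent, on average, to obtain one defect-free die."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)

D0 = 0.1             # hypothetical defect density, defects per cm^2
COMPUTE_MM2 = 200.0  # hypothetical compute die without on-die cache
SRAM_MM2 = 40.0      # hypothetical stacked SRAM die

# Monolithic: a defect anywhere scraps compute and cache together.
monolithic = silicon_per_good_die(COMPUTE_MM2 + SRAM_MM2, D0)
# Split: each die is tested on its own, so a defect only scraps that die.
split = silicon_per_good_die(COMPUTE_MM2, D0) + silicon_per_good_die(SRAM_MM2, D0)

print(f"monolithic: {monolithic:.1f} mm^2 of wafer per good unit")
print(f"split:      {split:.1f} mm^2 of wafer per good unit")
```

Under this model the split always uses less silicon per good unit, since only the defective piece gets thrown away. Whether that saving beats the packaging cost is exactly the open question.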
 

scineram

Senior member
Nov 1, 2020
361
283
106
It’s not just a question of cost, but also capacity. This is low volume cutting edge packaging tech, so only for niche products for a while.
 

KompuKare

Golden Member
Jul 28, 2009
1,072
1,111
136
It’s not just a question of cost, but also capacity. This is low volume cutting edge packaging tech, so only for niche products for a while.
There's versatility too.
Zen3 only had four SKUs (5600X, 5800X, 5900X, 5950X).
Zen3 APUs even fewer.
But precisely for the APUs or GPUs, being able to make a chiplet with the forward planning for die stacking makes a lot of sense.
Suddenly you can then sell 5700G as 5850G-3D with more cache than mainstream desktop.
Sell a Navi 21 die as 6800, 6800 XT, 6900 XT and 6950 XT-3D with more infinity cache.
Halo parts generate crazy mindshare, and if you can re-purpose an existing die with extra 3D cache stacked onto it as a halo part, you get that mindshare without the huge expense of new masks and designs.

Same united die (Zen4 / Navi 31 / whatever) designed from the outset to have the stacking vias. Then take some of those chiplets and send them to TSMC's 3D packaging facility. Halo tends to be low volume anyhow.
 
Reactions: Tlh97

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
From what I remember of the NV21 layout the Infinity Cache area configuration is not ideal at all for adding stacked cache on top.

It's more stretched out than the relatively square like configuration of the Matisse-X cache chiplet isn't it?
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
From what I remember of the NV21 layout the Infinity Cache area configuration is not ideal at all for adding stacked cache on top.

It's more stretched out than the relatively square like configuration of the Matisse-X cache chiplet isn't it?

It kind of just fills in space around the outside with gaps for the memory PHYs.

So yea, not ideal for a stacked design at all.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Alternatively it was the most space efficient way to lay it out.
I doubt it.

Data movement reduction, both the amount of and the distance moved, is crucial for GPU efficiency. Highly reused code and data, those cases that benefit the most from IF cache, can be situated closer to needed shaders with a spatially distributed cache that is also closer on average to the memory controllers for those subgroups of data.
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
I doubt it.

Data movement reduction, both the amount of and the distance moved, is crucial for GPU efficiency. Highly reused code and data, those cases that benefit the most from IF cache, can be situated closer to needed shaders with a spatially distributed cache that is also closer on average to the memory controllers for those subgroups of data.

In which case N31 and N32 are doomed to fail, since the cache will be on another die (or dies).
 