There is a patent from AMD about stacked dies related to machine learning and GPUs:
https://www.freepatentsonline.com/20210374607.pdf
Seems Greymon55 is also saying that N3x will have an ML die.
Let's rekindle this thread, Computex is only 4 months away.
And of course multi-die GPU chiplets have been discussed at length, so I won't repeat that.
Many new RT patents from AMD have been published since the well-known 11200724:
Texture processor based ray tracing acceleration method and system (filed in 2017), which we know was used in RDNA 2/XSX/PS5.
These are the new patents since then:
10692271 Robust ray-triangle intersection
10706609 Efficient data path for ray triangle intersection
20200380761 COMMAND PROCESSOR BASED MULTI DISPATCH SCHEDULER
10930050 Mechanism for supporting discard functionality in a ray tracing context
20210209832 BOUNDING VOLUME HIERARCHY TRAVERSAL
Possible HW BVH Traversal
20210287421 RAY-TRACING MULTI-SAMPLE ANTI-ALIASING
20210287422 PARTIALLY RESIDENT BOUNDING VOLUME HIERARCHY
11158112 Bounding volume hierarchy generation
HW Assisted BVH structure generation
20210304484 BOUNDING VOLUME HIERARCHY COMPRESSION
BVH Compression
20210407175 EARLY CULLING FOR RAY TRACING
20210407176 EARLY TERMINATION OF BOUNDING VOLUME HIERARCHY TRAVERSAL
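To make "early termination of bounding volume hierarchy traversal" concrete, here is a minimal software sketch of the general idea: once you have a hit, any subtree whose bounding box starts farther away than that hit can be skipped entirely. The node layout and names are my own illustration, not anything from the patents.

```python
# Minimal sketch of BVH traversal with early termination: skip any node
# whose box entry distance is already beyond the closest hit found so far.
# Node layout is illustrative only, not from AMD's patents.

class Node:
    def __init__(self, tmin, hit_t=None, children=()):
        self.tmin = tmin          # ray's entry distance into this node's box
        self.hit_t = hit_t        # leaf: primitive hit distance (None = inner node)
        self.children = children  # inner node: child nodes

def traverse(root):
    best = float('inf')
    stack = [root]
    visited = 0
    while stack:
        node = stack.pop()
        visited += 1
        if node.tmin >= best:
            continue              # early termination: box is behind the best hit
        if node.hit_t is not None:
            best = min(best, node.hit_t)
        else:
            # push the nearer child last so it is visited first and
            # 'best' shrinks as quickly as possible
            stack.extend(sorted(node.children, key=lambda c: c.tmin, reverse=True))
    return best, visited
```

The payoff is that whole subtrees (here, the children of the skipped node) never get pushed at all, which is exactly the memory traffic a HW traversal unit wants to avoid.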
AMD is going all in on RT
But the most interesting part for me is the HW-assisted convolution.
20200184002 HARDWARE ACCELERATED CONVOLUTION
This, coupled with the ML chiplet (which is actually a cache+ML chiplet in one), could be the target implementation for the Gaming Super Resolution we discussed here before.
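For reference, this is the operation a fixed-function convolution unit would accelerate: the inner multiply-accumulate loops of a 2D convolution. A pure-Python reference with "valid" padding; dedicated hardware would run these MACs massively in parallel.

```python
# Reference 2D convolution ("valid" padding, single channel).
# Each output pixel is kh*kw multiply-accumulates -- the hot loop that
# a hardware convolution engine exists to parallelize.

def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out
```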
A new patent for AMD's FSR came up today:
20210150669 GAMING SUPER RESOLUTION
Abstract
A processing device is provided which includes memory and a processor. The processor is configured to receive an input image having a first resolution, generate linear down-sampled versions of the input image by down-sampling the input image via a linear upscaling network and generate non-linear down-sampled versions of the input image by down-sampling the input image via a non-linear upscaling network. The processor is also configured to convert the down-sampled versions of the input image into pixels of an output image having a second resolution higher than the first resolution and provide the output image for display.
It uses inferencing for upscaling. As with all ML models, how you assemble the layers, which parameters you choose, which activation functions you use, etc. matters a lot, and the difference can be night and day in accuracy, performance and memory.
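My rough reading of the abstract as code: two branches (one linear, one non-linear) each produce several "down-sampled versions" of the input, which are then rearranged into a single higher-resolution output, much like a depth-to-space / pixel-shuffle step. The weights here are random placeholders; the real network structure and training are in the patent, not in this sketch.

```python
# Hand-wavy sketch of the GSR abstract: linear branch + non-linear branch
# each emit r*r feature maps, which are interleaved into an image r times
# larger in each dimension (depth-to-space, a.k.a. pixel shuffle).
# Weights are random placeholders, NOT the patented network.
import numpy as np

def depth_to_space(maps, r):
    """Interleave r*r feature maps of shape (H, W) into one (H*r, W*r) image."""
    c, h, w = maps.shape
    assert c == r * r
    return maps.reshape(r, r, h, w).transpose(2, 0, 3, 1).reshape(h * r, w * r)

def gaming_super_res_sketch(img, r=2, seed=0):
    rng = np.random.default_rng(seed)
    h, w = img.shape
    # "linear network": r*r linear combinations of the input (placeholder weights)
    lin = np.stack([img * wgt for wgt in rng.uniform(0.5, 1.5, r * r)])
    # "non-linear network": same, followed by a ReLU non-linearity
    nonlin = np.maximum(np.stack([img * wgt for wgt in rng.uniform(-1, 1, r * r)]), 0)
    # convert the feature maps into pixels of the higher-resolution output
    return depth_to_space(lin + nonlin, r)
```

The interesting consequence is that all the heavy math runs at the *low* input resolution; only the final rearrangement touches output-resolution pixels, which is why this style of upscaler is cheap.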
(attached: patent figures)
CNNs are the best suited for ML on images, and having that ML in the chiplet doesn't sound like a terrible idea.
For one thing, AMD could perform the upscaling without motion vectors by using the image at the end of the rendering pipeline, the output of which could be kept in the L3/chiplet die.
Another thing they mentioned in another patent is that they can extract motion vectors from pixel activity across multiple images instead of relying on the game engine to provide them.
This means they could stick the upscaler in something like RSR without game engine integration.
The ML+cache chiplet could actually be 3D IFC, which could run on a different clock domain than the CUs.
And being specially optimized for convolution could mean it is better at ML inferencing on images than general-purpose matrix units.
This could be the real Gaming Super Resolution: basically Radeon Super Resolution taken to the next level, requiring no game integration at all.
/speculation
One hint: patents that are continuations of, or take advantage of, provisional patents are good candidates for being actual product technology.
The Linux merge window for 5.18 is coming in two months, so watch out for that.
Infinity Cache is practically tailor-made for BVH traversal. Similarly, ML units sitting that close to the cache is like PIM.
They just need to glue them together well. Fingers crossed.