Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Could one use one of these for gaming?
Wouldn't be optimal, as that's essentially an SLI setup. For gaming you want one gpulet to pose as a single GPU for the whole package, and have massive bandwidth between all the parts. MI200 is more like Epyc Naples, while for gaming we need something like Epyc Rome.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Wouldn't be optimal, as that's essentially an SLI setup. For gaming you want one gpulet to pose as a single GPU for the whole package, and have massive bandwidth between all the parts. MI200 is more like Epyc Naples, while for gaming we need something like Epyc Rome.
The one pictured is actually a MI250X, but I get what you are saying
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
For the past year or so we’ve been getting briefed by our partners indicating ~300W CPU / ~400W GPU configurations air cooled, and 400W CPU / 600W GPU configurations liquid cooled. This is why companies like Cisco released the UCS-X chassis. 2023 is supposed to see higher-wattage GPU configurations than that. 1kW is not far off.

Can't get to 1kW even with normal water cooling, but TSMC has some weird experiments with "on chip" water cooling, so sure, eventually. But even these new 600W cards only make sense as OAM-only GPUs (that's above even PCI Express 5.0 power specs), mostly used in datacenters with open-loop water cooling. And that's not to mention the sheer size of the standard heatsinks there; they make the 3090 look small. I can't see these cards being viable even for most high-end consumers (air cooling absolutely maxes out at 450 watts, and closed-loop water cooling isn't much if any better). But hey, maybe if you've got 5k+ to drop on the system alone you can get one. We can all watch Henry Cavill play Crysis 4 at 8K 120 and really put that HDMI 2.1 to use.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
Can't get to 1kW even with normal water cooling, but TSMC has some weird experiments with "on chip" water cooling, so sure, eventually. But even these new 600W cards only make sense as OAM-only GPUs (that's above even PCI Express 5.0 power specs), mostly used in datacenters with open-loop water cooling. And that's not to mention the sheer size of the standard heatsinks there; they make the 3090 look small. I can't see these cards being viable even for most high-end consumers (air cooling absolutely maxes out at 450 watts, and closed-loop water cooling isn't much if any better). But hey, maybe if you've got 5k+ to drop on the system alone you can get one. We can all watch Henry Cavill play Crysis 4 at 8K 120 and really put that HDMI 2.1 to use.

Yes, I agree, all of this is for the datacenter. While we've seen the power consumption of PCs increase substantially in 10 years, there are reasonable limits that we won't see home-use PCs cross, especially as laptops continue to outsell desktops at an over 2:1 ratio.

As far as closed loop cooling goes, in Datacenter several OEMs seem to have some interesting investments up their sleeves, and considering they're developing Closed Loop Liquid Cooling as an option in a number of next generation designs, it stands to reason that there's something there that must make it worthwhile vs. jumping directly from air to facility cooling. In traditional server designs it's less appealing, but in modular chassis designs and in multi-node design (2U4N and similar) this makes way more sense, as there is far more density (Even most recent Ice Lake 2U4N designs are limited to 205W TDPs).
 

biostud

Lifer
Feb 27, 2003
18,402
4,965
136
Yes, I agree, all of this is for the datacenter. While we've seen the power consumption of PCs increase substantially in 10 years, there are reasonable limits that we won't see home-use PCs cross, especially as laptops continue to outsell desktops at an over 2:1 ratio.

When SLI and CF stopped being a viable solution, the power consumption of a top gaming PC declined.
 
Reactions: Tlh97 and Saylick

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
When SLI and CF stopped being a viable solution, the power consumption of a top gaming PC declined.
That gets murky because your definition of "viable" changes depending on what you're discussing. Desktop CPU TDPs continue to be on the rise (just look at top end Alder Lake). You can still SLI 2 RTX 3090s and play the handful of games that support it. That's over 700W (going towards 800W with power caps) and poor game support. That's not much of a decline from 4-way 980 Ti's roughly 1kW power consumption and completely awful performance scaling.

While the fraction-of-a-percent top end has indeed fluctuated over the years, the Steam Hardware Survey alone has shown how, overall, gaming PCs are taken a (little) more seriously than before: hardware has moved from 65W -> 95W -> 125W CPUs over the years, as well as to more consistent use of 100W+ GPUs from the days when 25-50W GPUs were most common.

It's a hypothesis on my part, but my belief is that this has occurred due to the growing rift of just-enough computing driving much higher laptop sales over desktop sales. Those buying desktops are more encouraged to do so to fulfill a need like gaming, vs. in the past when a desktop was simply what you got unless portability was the primary concern.
 
Reactions: Tlh97 and blckgrffn

blckgrffn

Diamond Member
May 1, 2003
9,198
3,185
136
www.teamjuchems.com
That gets murky because your definition of "viable" changes depending on what you're discussing. Desktop CPU TDPs continue to be on the rise (just look at top end Alder Lake). You can still SLI 2 RTX 3090s and play the handful of games that support it. That's over 700W (going towards 800W with power caps) and poor game support. That's not much of a decline from 4-way 980 Ti's roughly 1kW power consumption and completely awful performance scaling.

While the fraction-of-a-percent top end has indeed fluctuated over the years, the Steam Hardware Survey alone has shown how, overall, gaming PCs are taken a (little) more seriously than before: hardware has moved from 65W -> 95W -> 125W CPUs over the years, as well as to more consistent use of 100W+ GPUs from the days when 25-50W GPUs were most common.

It's a hypothesis on my part, but my belief is that this has occurred due to the growing rift of just-enough computing driving much higher laptop sales over desktop sales. Those buying desktops are more encouraged to do so to fulfill a need like gaming, vs. in the past when a desktop was simply what you got unless portability was the primary concern.

On the other side, if you need a "desktop" for some reason, the supply of off-lease business machines is immense.

In the "just enough" desktop side we've got Synology/Qnap units replacing that old PC you used a server and even the R-Pi as a standard computing platform that makes setting up a lot of software trivial. I am not recommending SBC's for desktop use but I have replaced several PCs with dedicated, much lower power appliances.

I am not excited about moving to north of 400W for GPUs. I mean... that's such a burden on everything else in the system. When the consoles are like sub 200W by themselves with pretty capable GPUs it seems so gratuitous.

(Coming from a guy who rocked a 290X for years because of the value.)
 

biostud

Lifer
Feb 27, 2003
18,402
4,965
136
That gets murky because your definition of "viable" changes depending on what you're discussing. Desktop CPU TDPs continue to be on the rise (just look at top end Alder Lake). You can still SLI 2 RTX 3090s and play the handful of games that support it. That's over 700W (going towards 800W with power caps) and poor game support. That's not much of a decline from 4-way 980 Ti's roughly 1kW power consumption and completely awful performance scaling.

NVIDIA has stopped promoting SLI, so while you could run dual 3090s, it is not a fair comparison to when SLI and CF were actually a realistic solution for a gaming setup.

The 12900K might use a lot of power under full load, but not when gaming.

Also, if you are gaming you wouldn't need to buy the most expensive desktop CPU, as much of the extra cost lies in MT performance from lots of cores, which games rarely use. In 95% of cases the GPU will be the limiting factor whether you run a 12600K, 12700K, 12900K, 5800X, 5900X or 5950X at 1440p or higher resolution.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
NVIDIA has stopped promoting SLI, so while you could run dual 3090s, it is not a fair comparison to when SLI and CF were actually a realistic solution for a gaming setup.
That's exactly what I meant by murky. Calling it not fair simply because NVIDIA stopped promoting it is indeed an arbitrary point, but that's fine. All the same, calling dual 3090 SLI "not realistic" is not based on any resilient definition; after all, what was less realistic than taking single-digit performance gains, or even performance losses, in going from dual SLI to triple or quad SLI? Top-end SLI / Crossfire was never a "realistic" solution, which is why it's going away.

As to the CPU differences, I'll say again that I agree with you, but that is entirely different from how top-end gaming PCs are sold and marketed. "Top-end gaming PCs" are a fractional-percentage niche category with trends entirely separate from what's seen in the broader PC market.
 

biostud

Lifer
Feb 27, 2003
18,402
4,965
136
That's exactly what I meant by murky. Calling it not fair simply because NVIDIA stopped promoting it is indeed an arbitrary point, but that's fine. All the same, calling dual 3090 SLI "not realistic" is not based on any resilient definition; after all, what was less realistic than taking single-digit performance gains, or even performance losses, in going from dual SLI to triple or quad SLI? Top-end SLI / Crossfire was never a "realistic" solution, which is why it's going away.

As to the CPU differences, I'll say again that I agree with you, but that is entirely different from how top-end gaming PCs are sold and marketed. "Top-end gaming PCs" are a fractional-percentage niche category with trends entirely separate from what's seen in the broader PC market.

Also, once AMD (and NVIDIA) start using MCM, it is two GPUs on one video card.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
Also, once AMD (and NVIDIA) start using MCM, it is two GPUs on one video card.
I think this will be fluid. MCMs themselves are nothing new at this point, but our definitions have had to be malleable as technology has changed. Back in the early days of dual-core CPUs, I can remember this very same debate over whether a CPU was two CPUs in one socket, vs. the concept of cores. Indeed, software had to navigate the same differences, as NUMA as a concept always existed. It didn't help that in those early days Intel had its CPUs communicating across the Front Side Bus while AMD used its internal SRQ bus. Either way, as designs evolved, we got used to saying systems had multiple cores per socket, and systems may have multiple sockets. When Ryzen was released with its concept of the CCX, we did not immediately return to saying Ryzen had multiple CPUs per socket. We further delineated that a CPU may have multiple cores, multiple cores may reside on a core complex, and a CPU may have multiple core complexes.
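
As an aside, that hierarchy is something you can already inspect on shipping hardware. A minimal sketch, assuming a Linux system where a core complex corresponds to a shared-L3 domain (true for Zen parts; an assumption elsewhere):

```python
# Sketch: recover the socket / core-complex hierarchy from Linux sysfs.
# Assumes a CCX maps 1:1 to a shared-L3 domain (true for Zen parts).
from collections import defaultdict
from pathlib import Path

sockets = defaultdict(set)    # package id -> logical CPUs
complexes = defaultdict(set)  # shared-L3 group -> logical CPUs

for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
    pkg = (cpu / "topology/physical_package_id").read_text().strip()
    sockets[pkg].add(cpu.name)
    l3 = cpu / "cache/index3/shared_cpu_list"
    if l3.exists():
        complexes[l3.read_text().strip()].add(cpu.name)

print(f"{len(sockets)} socket(s), {len(complexes)} L3 domain(s) (~ core complexes)")
```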

If CDNA2 ends up being multiple GPU Complexes attached via PCIe, then I think calling it 2 GPUs on one video card is accurate still, just like with SLI and Crossfire. However, as AMD alludes to having a high bandwidth die interconnect (reminding me a bit of SRQ with AMD's first dual core CPU), and if the Memory footprint is Unified, then perhaps a new intermediary term like GPU Complex would begin to be more appropriate in this area.
 
Reactions: wanderica and Tlh97

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
If CDNA2 ends up being multiple GPU Complexes attached via PCIe, then I think calling it 2 GPUs on one video card is accurate still, just like with SLI and Crossfire. However, as AMD alludes to having a high bandwidth die interconnect (reminding me a bit of SRQ with AMD's first dual core CPU), and if the Memory footprint is Unified, then perhaps a new intermediary term like GPU Complex would begin to be more appropriate in this area.

I thought AMD confirmed it is being exposed as 2 GPUs.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,673
580
126
I thought AMD confirmed it is being exposed as 2 GPUs.
If anyone knows, I'd like to know that too! From AMD's presentation, they refer to the OAM module as a "Multi-Die GPU" (singular, noted on the slide as well as by the speaker). But that could still mean that the individual dies of the GPU are exposed to the workloads. I haven't had a lot of time to look for any insights journalists have gotten further out of AMD, so if anyone has them, I'd welcome seeing them!
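
Short of a statement from AMD, the runtime itself will eventually settle it. A minimal sketch of the check, assuming a ROCm build of PyTorch (ROCm devices show up under the torch.cuda namespace):

```python
import torch  # assumes a ROCm (or CUDA) build of PyTorch is installed

if torch.cuda.is_available():
    n = torch.cuda.device_count()
    # An MI250X exposed as two dies should report two devices here.
    print(f"runtime sees {n} device(s)")
    for i in range(n):
        print(f"  device {i}: {torch.cuda.get_device_name(i)}")
else:
    print("no GPU runtime available")
```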

 
Reactions: Tlh97 and Krteq

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I feel really uncomfortable with calling the CDNA family "GPUs" at this point. For compute, being multi-die, multi-socket etc. is no issue. For graphics workloads it is, since there needs to be a final step assembling all the separately computed results into one frame. The more distributed that work is, the more effort (especially bandwidth and low latency) is needed to assemble the final picture. That's why SLI and CF are as inefficient as they are and lost popularity, even though DX12 technically even enabled mixed setups. For pure compute this major bottleneck of targeting a unified picture just doesn't apply.
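
For a sense of scale, here's a back-of-envelope sketch of just the pixel traffic in a naive split-frame setup; the buffer format, resolution and frame rate are all assumptions:

```python
# Back-of-envelope: inter-die traffic just to assemble a split frame.
width, height = 3840, 2160   # 4K target (assumption)
bytes_per_px = 8             # e.g. FP16 RGBA render target (assumption)
fps = 120

half_frame = width * height * bytes_per_px / 2
print(f"{half_frame * fps / 1e9:.1f} GB/s to ship half of every frame")
# ~4 GB/s sounds tame next to HBM, but real SLI/CF traffic also carries
# shared shadow maps, reflections and other cross-frame dependencies,
# and frame pacing suffers from the latency, not just the bandwidth.
```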
 
Reactions: Tlh97 and Zepp

andermans

Member
Sep 11, 2020
151
153
76
I feel really uncomfortable with calling the CDNA family "GPUs" at this point. For compute, being multi-die, multi-socket etc. is no issue. For graphics workloads it is, since there needs to be a final step assembling all the separately computed results into one frame. The more distributed that work is, the more effort (especially bandwidth and low latency) is needed to assemble the final picture. That's why SLI and CF are as inefficient as they are and lost popularity, even though DX12 technically even enabled mixed setups. For pure compute this major bottleneck of targeting a unified picture just doesn't apply.

I think it is safe to say they aren't GPUs. MI100 didn't have any rasterizer hardware, so if you want to do graphics you have to do it all in compute. I suspect MI200 doesn't have rasterizer HW either.
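
For anyone curious what "all in compute" means at its most minimal, here's a toy sketch of rasterizing one triangle with edge functions in plain array code; the resolution, winding and the triangle itself are made up for illustration:

```python
# Toy compute rasterizer: coverage test for one triangle, no fixed-function HW.
import numpy as np

def edge(ax, ay, bx, by, px, py):
    # Signed-area test: which side of edge (a -> b) the pixel center lies on.
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)

W, H = 64, 64
ys, xs = np.mgrid[0:H, 0:W] + 0.5       # pixel centers
a, b, c = (5, 5), (20, 55), (60, 20)    # counter-clockwise triangle

inside = (
    (edge(*a, *b, xs, ys) >= 0)
    & (edge(*b, *c, xs, ys) >= 0)
    & (edge(*c, *a, xs, ys) >= 0)
)
print(f"{inside.sum()} of {W * H} pixels covered")
```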
 
Reactions: Tlh97 and Leeea

Frenetic Pony

Senior member
May 1, 2012
218
179
116
I'm gonna play "try to predict the specs!"

3 dies: 96, 64, and 32 workgroup processors (WGPs). They've already got multiples of 16 going for the Series X and the RX 6600(s).

600W TDP / OAM only: 2x 96WGP dies, 512MB SRAM, 4 HBM3 stacks, 64-96GB of RAM, 2.5GHz. I imagine studios that use the new micro-LED stages (like The Mandalorian) would love to render on a big farm of these.

500W TDP / OAM only: 2x 82WGP dies (cut-down 96 dies), otherwise same as above.

450W TDP / PCI Express 5.0: 2x 80WGP (cut-down 96; you can disable an entire shader engine or cut a die down even more, whichever!), 512-384MB SRAM, 24GB of RAM, 384-bit GDDR6 bus @ 20-24Gbps, 2.5GHz. The big consumer card.

350W / PCI Express 4.0/5.0: 2x 64WGP dies, 256MB SRAM, 16GB of RAM, 256-bit GDDR6 @ 20-24Gbps, 2.5GHz.

300W (etc.): 1x 96WGP die, 256MB SRAM, 16GB of RAM, 256-bit GDDR6 slower than the above, 2.75GHz.

250W: 1x 80/82WGP die, 192MB SRAM, 16GB, 256-bit GDDR6 at slowish speeds, 2.75GHz. Kind of a better 6900 XT for a lot cheaper.

200W: 1x 64WGP die, 192MB SRAM, 12GB, 192-bit GDDR6 @ 20Gbps+, 2.75-2.9GHz. Mid-range big seller.

175W: 1x 56WGP die, 192-128MB SRAM, same as above but even cheaper!

125W: 1x 32WGP die, monolithic (other than SRAM), 64-128MB SRAM, 128-bit GDDR6 @ 20Gbps+, 2.9GHz. Low-end best seller, better than the 6600 XT all around for the same price or lower! (The chip shortage is expected to ease next year, and competition will be high.)

100W: 28WGP, same as above just cut down, $250 (yay competition, I hope).

There, that was fun.
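
Out of curiosity, a quick sanity check of the FP32 throughput those guesses imply, assuming RDNA2's 128 FP32 ALUs per WGP carry over (itself speculation):

```python
# FP32 throughput implied by a WGP count and clock (RDNA2-style assumption).
def tflops(wgps, clock_ghz, alus_per_wgp=128):
    return wgps * alus_per_wgp * 2 * clock_ghz / 1000  # 2 FLOPs per FMA

print(f"2x 96 WGP @ 2.5 GHz : {tflops(192, 2.50):.0f} TFLOPs")
print(f"1x 64 WGP @ 2.75 GHz: {tflops(64, 2.75):.0f} TFLOPs")
```

Even the single-die mid-range guess lands around double a 6900 XT's ~23 TFLOPs, which shows how aggressive these numbers are.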
 
May 17, 2020
123
233
116
There is a patent from AMD about stacked dies related to machine learning and GPUs: https://www.freepatentsonline.com/20210374607.pdf

In the two diagrams posted within the patent, it is stated that the APD die is both a memory and machine-learning accelerator die, which includes memory, machine learning accelerators, memory interconnects, inter-die interconnects, and controllers. The memory within the APD die can be used either as a cache for the APD core die or directly by the operations performed on the machine learning accelerators, such as matrix multiplication.
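
That dual role makes sense for matrix workloads in particular: a tiled multiply reuses every operand tile many times, which is exactly the reuse that memory sitting next to the accelerators captures. A toy sketch of the access pattern (not AMD's implementation):

```python
# Tiled matmul: each TxT tile loaded into fast memory is reused many times.
import numpy as np

def tiled_matmul(A, B, T=64):
    n = A.shape[0]  # assumes square n x n inputs with T dividing n
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, T):
        for j in range(0, n, T):
            for k in range(0, n, T):
                # These tiles stay resident across many accumulations,
                # the reuse an on-die SRAM next to the ML units would capture.
                C[i:i+T, j:j+T] += A[i:i+T, k:k+T] @ B[k:k+T, j:j+T]
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```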
 

Attachments

  • AMD-RDNA-3-Gaming-GPUs-With-APD-Accelerator-Core-_1-1001x1480.png

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
There is patent from AMD about stacked die related on machine learning and GPU : https://www.freepatentsonline.com/20210374607.pdf
Seems Greymon55 is also saying that N3x will have an ML die.

Let's rekindle this thread; Computex is only 4 months away.
And of course the multi-die GPU chiplets have been discussed at length, so I will not repeat that.

Many new RT patents from AMD have been published since the well-known 11200724: Texture processor based ray tracing acceleration method and system (filed in 2017), which we know was used in RDNA2/XSX/PS5.

These are the new patents since then
10692271 Robust ray-triangle intersection

10706609 Efficient data path for ray triangle intersection

20200380761 COMMAND PROCESSOR BASED MULTI DISPATCH SCHEDULER

10930050 Mechanism for supporting discard functionality in a ray tracing context

20210209832 BOUNDING VOLUME HIERARCHY TRAVERSAL
Possible HW BVH Traversal

20210287421 RAY-TRACING MULTI-SAMPLE ANTI-ALIASING

20210287422 PARTIALLY RESIDENT BOUNDING VOLUME HIERARCHY

11158112 Bounding volume hierarchy generation
HW Assisted BVH structure generation

20210304484 BOUNDING VOLUME HIERARCHY COMPRESSION
BVH Compression

20210407175 EARLY CULLING FOR RAY TRACING

20210407176 EARLY TERMINATION OF BOUNDING VOLUME HIERARCHY TRAVERSAL

AMD is going all in on RT
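
For context on why there's so much patent activity here, the inner loop being accelerated is conceptually tiny; a toy sketch of stack-based BVH traversal with a slab test (node layout invented for illustration):

```python
# Toy BVH traversal: slab test + explicit stack. The pointer-chasing through
# node data is exactly the access pattern HW traversal (or a big cache) helps.
import numpy as np

def hit_aabb(origin, inv_dir, lo, hi):
    t0, t1 = (lo - origin) * inv_dir, (hi - origin) * inv_dir
    tmin = np.minimum(t0, t1).max()
    tmax = np.maximum(t0, t1).min()
    return tmax >= max(tmin, 0.0)

def traverse(nodes, root, origin, direction):
    inv_dir = 1.0 / direction  # assumes no zero components, for brevity
    stack, hits = [root], []
    while stack:
        node = nodes[stack.pop()]
        if not hit_aabb(origin, inv_dir, node["lo"], node["hi"]):
            continue
        if "leaf" in node:
            hits.append(node["leaf"])      # ray-triangle test would go here
        else:
            stack.extend(node["children"])
    return hits
```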

But the most interesting part for me is the HW-assisted convolution.

20200184002 HARDWARE ACCELERATED CONVOLUTION
This, coupled with the ML chiplet (which is actually a cache+ML chiplet in one), could be the target implementation for the Gaming Super Resolution we discussed here before.
A new patent came up today for AMD's FSR:

20210150669
GAMING SUPER RESOLUTION

Abstract
A processing device is provided which includes memory and a processor. The processor is configured to receive an input image having a first resolution, generate linear down-sampled versions of the input image by down-sampling the input image via a linear upscaling network, and generate non-linear down-sampled versions of the input image by down-sampling the input image via a non-linear upscaling network. The processor is also configured to convert the down-sampled versions of the input image into pixels of an output image having a second resolution higher than the first resolution and provide the output image for display.
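
Reading between the lines of that abstract, here's a loose PyTorch sketch of the two-path structure it describes; the layer sizes, activation choice and the pixel-shuffle step are my assumptions, not AMD's:

```python
import torch
import torch.nn as nn

class GSRSketch(nn.Module):
    """Two paths (one linear, one non-linear) combined into a higher-res image."""
    def __init__(self, c=32, scale=2):
        super().__init__()
        self.linear = nn.Sequential(            # convolutions only
            nn.Conv2d(3, c, 3, padding=1),
            nn.Conv2d(c, 3 * scale**2, 3, padding=1),
        )
        self.nonlinear = nn.Sequential(         # same shape, with activations
            nn.Conv2d(3, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, 3 * scale**2, 3, padding=1),
        )
        self.to_pixels = nn.PixelShuffle(scale)  # channels -> spatial detail

    def forward(self, x):
        return self.to_pixels(self.linear(x) + self.nonlinear(x))

out = GSRSketch()(torch.rand(1, 3, 270, 480))
print(out.shape)  # torch.Size([1, 3, 540, 960])
```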

It uses inferencing for upscaling. As with all ML models, how you assemble the layers, what kinds of parameters you choose, which activation functions you choose, etc. matters a lot, and the difference can be night and day in accuracy, performance and memory.

CNNs are the best suited for performing ML on images, and having that ML in the chiplet doesn't sound like a terrible idea.
One thing being that AMD could perform the upscaling without using motion vectors, utilizing the image at the end of the rendering pipeline, the output of which could be kept in the L3/chiplet die.

Another thing they mentioned in another patent is that they can extract motion vectors from pixel activity across multiple images instead of relying on the game engine to provide them.
This means they could stick the upscaler in something like RSR without game-engine integration.
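
The crudest form of that idea is plain block matching between two frames, with no engine involvement at all; a toy sketch (block size, search radius and SAD cost are arbitrary choices, not the patent's method):

```python
# Toy engine-free motion estimation: brute-force block matching with SAD.
import numpy as np

def motion_vector(prev, curr, y, x, block=8, radius=4):
    # Where did the block at (y, x) in `curr` come from in `prev`?
    patch = curr[y:y+block, x:x+block]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sy, sx = y + dy, x + dx
            if sy < 0 or sx < 0:               # skip out-of-frame candidates
                continue
            cand = prev[sy:sy+block, sx:sx+block]
            if cand.shape != patch.shape:
                continue
            err = np.abs(cand - patch).sum()   # SAD cost
            if err < best:
                best, best_mv = err, (dy, dx)
    return best_mv
```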

The ML+cache chiplet could actually be 3D IFC, which could run on a different clock domain than the CUs.
And being specially optimized for convolution could mean they are better at ML inferencing on images than general-purpose matrix units.
This could be the real Gaming Super Resolution, basically Radeon Super Resolution taken to the next level, requiring no game integration at all.
/speculation
One hint: patents which are continuations of, or take advantage of, provisional patents are good candidates for being actual product technology.

Linux merge window for 5.18 is coming in two months, so watch out for that

Infinity Cache is like tailor-made for BVH traversal. Similarly, ML units sitting so close to the cache is like PIM.
They just need to glue them well. Fingers crossed.
 
Last edited:

CakeMonster

Golden Member
Nov 22, 2012
1,428
535
136
I didn't understand half of that, but I'm very skeptical of upscaling that is not multi-frame. No matter how cheap, all we've seen so far is artifacts and a failure to match the quality of newer DLSS versions. After all, what these techniques set out to do is AA cheaply, and only DLSS delivers somewhat satisfactorily on that. FSR and similar techniques are surely a lot better than nothing, don't get me wrong, but hardware support for stuff that is not multi-frame seems like a waste to me.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,063
7,489
136
Regardless of what new or exciting technology makes it into these cards, I think we're all just hoping that they'll be available without having to spend $$$$ just to be able to get one.

-I'd just be happy if they make this gen of cards cheap, used or otherwise.

Even a 6600 XT is a huge leap in performance from my 980 Ti; if I could get something like a 67/6800 XT for $400, I'd again be set for the next 6 years.
 