Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Do we know for sure that it is a 96 CU config? I thought the most likely rumour was 80 CU?

No we do not know for sure. But we can make an estimate based on the Xbox Series X SoC die size of 360 sq mm. Based on my calculation the 56CU RDNA2 GPU with 320 bit GDDR6 memory controller takes up <= 300 sqmm of die space. I extrapolated how many CUs AMD can fit in 505 sq mm and 96 CU is the answer based on my calculations.
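
A rough sketch of that kind of extrapolation, assuming die area scales linearly with CU count. The exact method is not given above, so the linear model and the function name `estimate_cus` are assumptions made for illustration. Fixed-size blocks such as memory controllers and display/media engines are ignored, so treat the result as a ballpark only.

```python
# Naive linear extrapolation of CU count from die area.
# Assumption (not from the post): GPU area scales linearly with CU count.

XSX_GPU_AREA_MM2 = 300.0  # upper bound quoted above for the 56CU GPU + 320-bit GDDR6
XSX_GPU_CUS = 56          # CU count quoted above for the Xbox Series X GPU
NAVI21_AREA_MM2 = 505.0   # rumoured Navi21 die size


def estimate_cus(target_area_mm2, ref_area_mm2=XSX_GPU_AREA_MM2, ref_cus=XSX_GPU_CUS):
    """Scale the reference CU count by the ratio of die areas."""
    return target_area_mm2 * ref_cus / ref_area_mm2


navi21_cu_estimate = estimate_cus(NAVI21_AREA_MM2)
print(f"Linear estimate for {NAVI21_AREA_MM2:.0f} mm^2: {navi21_cu_estimate:.1f} CUs")
```

Under these assumptions the estimate lands in the mid-90s, the same neighbourhood as the 96 CU figure.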
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Given that PS5 has to work in a power and form factor constrained console with the ability to work in varying ambient temperatures from cold (25 celsius) to warm (35+ celsius)
Where do you live that 25C is considered cold??!!

For me anything below 10C is cold, and 35+C is me waiting to die cooking in my own juices.

Also, I think you have overestimated density of the N2X series based on XSX's chip, which is running at a significantly lower max clock than PS5 and likely N2X as well.

For a start, neither PS5 nor XSX exactly matches the upcoming off-the-shelf RDNA2 GPU uArch; they may even diverge significantly from it (RPM-less Vega in XB1X, and RPM + Polaris in PS4 Pro).

To accommodate higher clocks, I think AMD may well have spread out the design a bit to dissipate heat more effectively; this has been AMD's strategy for a while now (Vega 64?) with its PC GPUs.
 

CastleBravo

Member
Dec 6, 2019
119
271
96
Do we know for sure that it is a 96 CU config? I thought the most likely rumour was 80 CU?

Latest leak from MLID says 427mm^2, 72 CU, 2.15GHz boost, 40-50% more performance than the 2080 Ti. The info came from an Nvidia leaker, not AMD, so this is supposedly what Nvidia is trying to beat with RTX 3k.

 

Konan

Senior member
Jul 28, 2017
360
291
106
Latest leak from MLID says 427mm^2, 72 CU, 2.15GHz boost, 40-50% more performance than the 2080 Ti. The info came from an Nvidia leaker, not AMD, so this is supposedly what Nvidia is trying to beat with RTX 3k.


wccftech got the same info

300W card

Also, if it is 427mm2 with just 72CU, where do the hardware-accelerated RT/Tensor equivalents go? (Considering the % of die space Nvidia allocated, I'm wondering what AMD's solution is there.)
That thing looks like it's very power hungry.... if true...

Unsure of this but MLiD says only partially vetted.
 

Saylick

Diamond Member
Sep 10, 2012
3,509
7,766
136
Hmm, 2.05 GHz game clock with 72 CUs gives roughly 19 TFLOPS of compute or about 40% more than what a 2080 Ti would give you, so if it's already 40% faster than a 2080 Ti, then it's got IPC parity with Turing at least.
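
For reference, the arithmetic behind that figure can be sketched as below. The 64 shaders per CU and the 2080 Ti reference specs (4352 CUDA cores, 1.545 GHz boost) are not stated in the post and are filled in here as assumptions; `peak_tflops` is just an illustrative helper name.

```python
# Peak FP32 throughput = shaders x 2 FLOPs per clock (one fused multiply-add) x clock.

def peak_tflops(shaders, clock_ghz):
    """Theoretical peak FP32 TFLOPS for a given shader count and clock."""
    return shaders * 2 * clock_ghz / 1000.0


# Assumption: 64 stream processors per CU, as on RDNA1.
RDNA_SHADERS_PER_CU = 64

# Rumoured part: 72 CUs at a 2.05 GHz game clock.
navi_tflops = peak_tflops(72 * RDNA_SHADERS_PER_CU, 2.05)

# Assumption: RTX 2080 Ti reference specs, 4352 CUDA cores at 1.545 GHz boost.
turing_tflops = peak_tflops(4352, 1.545)

ratio = navi_tflops / turing_tflops
print(f"Rumoured Navi: {navi_tflops:.1f} TFLOPS")
print(f"RTX 2080 Ti:   {turing_tflops:.1f} TFLOPS")
print(f"Ratio:         {ratio:.2f}x")
```

Real cards often boost above their rated clocks, so the ratio shifts depending on which clocks you plug in.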

I'll believe it when I see it but 2x36 CUs seems like such an odd configuration unless it's really 2x40 CU but cut down. If they somehow managed to fit 80 CUs in a die only 427mm2 while still finding a way to increase max clocks to 2 GHz and keeping TDP at 300W, that's super impressive. I almost want it to be a 500mm2 die because 427mm2 seems a little too dense and won't be able to OC that much higher.
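
To put "a little too dense" into numbers, here is a quick whole-die area-per-CU comparison. The Navi10 reference point (251mm^2, 40 CUs) is not from this thread and is added as an assumption for comparison. Whole-die area includes memory controllers, display logic and so on, so this is only a crude density proxy.

```python
# Whole-die area per CU for a few configurations (crude density proxy).

def mm2_per_cu(die_area_mm2, cus):
    """Die area divided by CU count; includes all non-CU logic on the die."""
    return die_area_mm2 / cus


configs = {
    # Assumption: Navi10 (RX 5700 XT) at 251 mm^2 with 40 CUs, added for reference.
    "Navi10, 251mm^2, 40 CU": (251.0, 40),
    # Rumoured figures discussed above.
    "Rumour, 427mm^2, 72 CU": (427.0, 72),
    # Hypothetical: the same die actually carrying 80 CUs.
    "What-if, 427mm^2, 80 CU": (427.0, 80),
}

area_per_cu = {name: mm2_per_cu(area, cus) for name, (area, cus) in configs.items()}

for name, value in area_per_cu.items():
    print(f"{name}: {value:.2f} mm^2 per CU")
```

Both 427mm^2 cases come out at less area per CU than Navi10, and that is before counting any extra RT logic.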
 
Reactions: Tlh97 and Konan

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
I'll believe it when I see it but 2x36 CUs seems like such an odd configuration unless it's really 2x40 CU but cut down.
Think PS5 x2 - this matches to PS4 Pro being PS4 x2 (GPU), I would not be surprised to find that 'PS5 Pro' is in development.

Also the cluster terminology makes me think of Zen - if the rumour turns out to be true this could be a prelude to RDNA chiplets.
 
Reactions: Tlh97 and Saylick

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Where do you live that 25C is considered cold??!!

For me anything below 10C is cold, and 35+C is me waiting to die cooking in my own juices.

My home country is India, even though I now live in CA, US. I have lived in India long enough to tell you that in summer, room temperatures easily go higher than 30C, and in some cities even 35C. Peak summer day temperatures hit 40-45C in Southern India and up to 50C in Northern India. What's more, lots of people do not have air conditioners. So there is definitely a case that a consumer electronics device sold for use in many countries around the world, with different climates, should function reliably at 30-35C room temperatures.

Also, I think you have overestimated density of the N2X series based on XSX's chip, which is running at a significantly lower max clock than PS5 and likely N2X as well.

For a start, neither PS5 nor XSX exactly matches the upcoming off-the-shelf RDNA2 GPU uArch; they may even diverge significantly from it (RPM-less Vega in XB1X, and RPM + Polaris in PS4 Pro).

To accommodate higher clocks, I think AMD may well have spread out the design a bit to dissipate heat more effectively; this has been AMD's strategy for a while now (Vega 64?) with its PC GPUs.

You seem not to realize that AMD's IP design for both CPUs and GPUs is highly modular and reusable. The Zen 2 CPU core is designed using a relaxed metal stack with a 57nm M1 pitch, which is the same as the CPP (contacted poly pitch). The CPU core used in the desktop, server, notebook and game console chips is the same. What differs is the part of the voltage/frequency curve each specific SoC is optimized for, and that could include certain transistor-level optimizations. But that's about it. The CPU core itself, from a physical design and area point of view, is the same. This holds true for the GPU IP used in Navi desktop/notebook GPUs too. So the GPU in PS5 and Xbox Series X is custom RDNA2, but the basic GPU IP blocks are developed once and used across various SoCs. The transistor-level optimizations could be different for a game console GPU vs a PC desktop GPU, but the actual physical design is not different. I am quite confident about this based on information I have access to.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
Think PS5 x2 - this matches to PS4 Pro being PS4 x2 (GPU), I would not be surprised to find that 'PS5 Pro' is in development.

Also the cluster terminology makes me think of Zen - if the rumour turns out to be true this could be a prelude to RDNA chiplets.
What makes you think 36CUs is the full die for the PS5?

Note: You should probably look at the diagram for Navi10. 36CUs max is not a sensible config.
 
Reactions: Tlh97 and Glo.

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
Also, if it is 427mm2 with just 72CU, where do the hardware-accelerated RT/Tensor equivalents go?
AMD does RT in the shader processors themselves, and not parceled out onto separate units like in at least the current NV RTX series. There's been no suggestion AMD's next gen has anything similar to tensor units either. The makers of next-gen AMD-powered games consoles certainly haven't made any such announcements, and you'd think AI acceleration would have been something they would have liked to reveal, to create even more buzz and excitement.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
AMD does RT in the shader processors themselves, and not parceled out onto separate units like in at least the current NV RTX series. There's been no suggestion AMD's next gen has anything similar to tensor units either. The makers of next-gen AMD-powered games consoles certainly haven't made any such announcements, and you'd think AI acceleration would have been something they would have liked to reveal, to create even more buzz and excitement.
In the TMUs, but it still requires specific functionality being added to the TMUs to allow for RTRT.

Tensors aren't coming to RDNA2 though yeah. Not that I think they need to tbh. We'll have to see how things go but Tensors on consumer products are only as useful as the number of games that support DLSS2.0
 

FaaR

Golden Member
Dec 28, 2007
1,056
412
136
Tensors on consumer products are only as useful as the number of games that support DLSS2.0
Doesn't running the tensor units on current RTX mean stopping the rest of the GPU entirely?

So there's a built-in and unavoidable performance hit there. Maybe NV can engineer around that issue in the next gen of chips. This upcoming gen is looking much more interesting than any in a long time. The 2080 series was kind of meh (1080 on steroids basically) and AMD hasn't been doing well in the high end for absolutely ages now. Fury X and Vega/Radeon VII were all very meh on the whole, and had a bad power/performance ratio. And Polaris/Navi 10 weren't high-end chips to begin with.

But I'm hoping for some decent big gun competition this time around. The fact AMD hasn't been doing any deceptive early hyping like with Vega in particular speaks in favor of that I would like to think!
 
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Tensors on consumer products are only as useful as the number of games that support DLSS2.0
For now yes - but I wouldn't expect that to remain so in the future, given all the research I have seen going into ML augmented graphics and other games related fields.

Maybe not as soon as RDNA3, but certainly I think it's possible that we might see ML hardware from the CDNA+ design teams filter down to RDNA4+.

It would be shortsighted of Wang and Papermaster not to at least be keeping such a strategy on the backburner when nVidia already have tensors integrated into gaming products.

Not to mention that however good the ray intersection HW is in either RDNA2+ or Turing/Ampere, they will always need denoising for the MC noise that comes with running RT at low SPP for real-time speeds, a task tensor/ML HW is well suited for.

I'm assuming that for now RDNA2 is doing this denoising stage in a compute shader, but power wise that is certainly not optimal vs dedicated ML/tensors if you are doing it in any mobile scenarios.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Doesn't running the tensor units on current RTX mean stopping the rest of the GPU entirely?

So there's a built-in and unavoidable performance hit there. Maybe NV can engineer around that issue in the next gen of chips. This upcoming gen is looking much more interesting than any in a long time. The 2080 series was kind of meh (1080 on steroids basically) and AMD hasn't been doing well in the high end for absolutely ages now. Fury X and Vega/Radeon VII were all very meh on the whole, and had a bad power/performance ratio. And Polaris/Navi 10 weren't high-end chips to begin with.

But I'm hoping for some decent big gun competition this time around. The fact AMD hasn't been doing any deceptive early hyping like with Vega in particular speaks in favor of that I would like to think!

Not sure if it stops it entirely. But it certainly halts the work of anything that is waiting on the RT calculations. So when a frame is rendered, the ray tracing then needs to be applied. That frame will wait for those calculations to finish. This is why RT on lower end cards like the 2060 is basically worthless. There isn't enough RT hardware for the rasterizing side to run at full speed. If the 2060 had the same number of RT cores as a 2080, it would be a very different story.

I think this is where AMD may come out ahead with RT being on each shader. More shaders, more RT performance. And you won't get this weird imbalanced setup like Turing has.
 
Reactions: Tlh97 and FaaR

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
It exactly matches PS4 Pro and is a 2x multiple of PS4 - that makes backwards compatibility a bit simpler to say the least.
I think the point is that makes 36CU for the whole die very unlikely. The console dies all have redundant logic like CUs for improving yield, so the full die more likely has something like 40CUs, of which only the best 36CUs or so are enabled.
 
Reactions: Tlh97 and Mopetar

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
I think the point is that makes 36CU for the whole die very unlikely. The console dies all have redundant logic like CUs for improving yield, so the full die more likely has something like 40CUs, of which only the best 36CUs or so are enabled.
Ah, fair enough.
 

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
I think this is where AMD may come out ahead with RT being on each shader. More shaders, more RT performance. And you won't get this weird imbalanced setup like Turing has.
Considering the quoted intersection performance of XSX, I can only imagine how good Big Navi is in this regard.

Waits patiently for offline PT renderer that uses it.......
 

Veradun

Senior member
Jul 29, 2016
564
780
136
No we do not know for sure. But we can make an estimate based on the Xbox Series X SoC die size of 360 sq mm. Based on my calculation the 56CU RDNA2 GPU with 320 bit GDDR6 memory controller takes up
For what it's worth, my calculations with the ballparkmeter came out at around 100CU for 505mm^2, even before the Xbox Series X SoC was revealed.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Considering the quoted intersection performance of XSX, I can only imagine how good Big Navi is in this regard.

Waits patiently for offline PT renderer that uses it.......
Radeon ProRender 2.0 has a closed library which is related to the RT IP implemented in RDNA2.
Additionally, they have procedurally generated textures for their material library and also an AI/ML denoiser.

 
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
3,206
2,474
136
Interesting that HBCC is only mentioned in the context of Vega, not Vega onwards - I wonder if it is in RDNA at all, and if not whether it is in CDNA either.

As a note, I have tested RPR 2.0 experimentally, I can't say I was impressed vs Arnold running on CPU only - though I guess I'm only running a runty RX 580.

Albeit I'd wager that AMD are not giving RPR even a fraction of the developer/engineer time for optimisation that Autodesk is putting into Arnold.

Given this slow performance I don't think it likely that any commercial renderers like Arnold and Renderman will make the effort to port to a Radeon Rays based backend, much to my eternal disappointment.

Edit: Perhaps HBCC might better explain AMD's holding on to Vega in APUs even as far as Cezanne.

APUs are often memory constrained in graphics loads, so it doesn't seem like such a stretch that they might be saving a uArch upgrade for them until they have improved and ported HBCC to RDNA.
 