Question Speculation: RDNA2 + CDNA Architectures thread

Page 123

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,404
146
All die sizes are accurate to within 5mm^2. The poster here has been right on some things in the past, afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Mopetar

Diamond Member
Jan 31, 2011
8,004
6,445
136
Are we talking about Raja's RTG, or David Wang's/Lisa Su's RTG?

It's not the same RTG anymore, guys.

The predecessor doing a terrible job doesn't say anything about the current group. It's hard to imagine they could be worse, but if you're rolling dice having the last toss come up as a 1 doesn't mean the next must come up a 6.

What good reason do you have to believe that the team could accurately predict how a hypothetical new approach to GPU design will turn out, when they don't have a track record for you to trust?

You could still be correct, but your reasons for believing what you believe aren't good. I think what you believe is more likely than what I've proposed, but a lot of what you're saying is bad reasoning unless you've got some information about the team behind these cards that you're not sharing.

Chip developers use simulations to avoid getting into exactly such situations. The only way it can still happen then is bad management (Koduri) or over-promised capabilities for the node used (Ampere on Samsung). AMD's track record with TSMC is pretty flawless so far, and RDNA2 isn't even on a new node, so simulations should have become better, not worse, than with RDNA1.

I'm not sure how well you can simulate a pretty radical change like including a big cache and a smaller bus. I suppose you could use FPGAs to model the behavior, but this isn't just tweaking the design in small ways on an established process. The proposed inclusion of what's being dubbed the Infinity Cache, together with the reduction in bus width, is a major architectural shift.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Also.

If Navi 21 has 128 ROPs with a 256-bit GDDR6 memory bus, wouldn't it be logical for Navi 22, with 40 CUs and a 192-bit bus, to have 96 ROPs, and for Navi 23, with 32 CUs and a 128-bit bus, to have 64 ROPs?
 

Mopetar

Diamond Member
Jan 31, 2011
8,004
6,445
136
I am willing to bet that they only have 2 SKUs per die. As others have said, 7nm is very mature at this point, and having 2 parts will give a 90+% success rate. You have to realize that the cost of a Big Navi die is much higher, even if you bin. AMD would love to sell tiny dies all day long, so you can bet that they will only use large dies where they need them competitively. All the mainstream stuff will be on smaller chips.

A big die can have a lot more reasons to fail, which makes a larger number of products more likely, particularly because throwing out a big die is a lot more costly than a chiplet.
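The point that a big die has more ways to fail can be made concrete with the textbook Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. The sketch below applies it to the rumored die sizes from the ptt.cc post earlier in the thread. The defect density of 0.1 defects/cm^2 and the 300 mm wafer are assumptions for illustration, not AMD or TSMC figures, and real yield models (plus salvage through binning) are more involved.

```python
import math

# Rumored Navi 2x die sizes from the ptt.cc post quoted earlier in the thread (mm^2).
DIE_AREAS_MM2 = {"Navi21": 505, "Navi22": 340, "Navi23": 240}


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model.

    d0_per_cm2 is the defect density in defects per cm^2 (an assumed input).
    """
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


def gross_dies_per_wafer(area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation for candidate dies on a round wafer (ignores scribe lines)."""
    r = wafer_diameter_mm / 2.0
    dies = math.pi * r * r / area_mm2 - math.pi * wafer_diameter_mm / math.sqrt(2.0 * area_mm2)
    return int(dies)


if __name__ == "__main__":
    D0 = 0.1  # hypothetical defect density, not a published number
    for name, area in DIE_AREAS_MM2.items():
        y = poisson_yield(area, D0)
        gross = gross_dies_per_wafer(area)
        print(f"{name}: {area} mm^2, ~{gross} candidates/wafer, "
              f"{y:.0%} defect-free, ~{int(gross * y)} perfect dies")
```

The big die loses twice: fewer candidates fit on the wafer, and each candidate is less likely to be defect-free, which is why cut-down bins matter more for it.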

Even if AMD gets a lot of fully functional dies, not all will be able to hit some arbitrary clock speed. There's probably two bins at the top for that reason alone.

Depending on the defect density, there's at least one cut-down bin. The question is whether it's a single catch-all or whether there are enough defective dies to segment the cut dies into different bins. I'm thinking we'll see a 72 CU bin just to allow some ability to guarantee a particular clock speed, and a 64 CU bin for any chips with a lot of defects, or with defects that took out part of the chip needed to use the full memory bus.

If there's a shot at them matching or even beating the 3090 in some way, we'll definitely see that kind of bin, even if it's only a top-10% situation. I also think we'll see more cards from Big Navi, just because a lot of the rumors have a big gap between the biggest die and the next step down. There's just too much room in between for them not to target at least three different performance levels.
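The bin structure guessed at above can be written as a small decision function. Everything here is hypothetical: the 80 CU full-die count comes from the 40-to-80 CU rumors in this thread, the 2.2 GHz top-clock cutoff is borrowed from the clock rumor later in the thread, and the bin names are placeholders rather than real SKUs.

```python
from collections import Counter
from typing import Optional

FULL_DIE_CUS = 80     # assumed CU count of a fully enabled Navi 21
TOP_CLOCK_MHZ = 2200  # hypothetical cutoff for the fastest bin


def assign_bin(working_cus: int, max_stable_mhz: int, full_bus_ok: bool) -> Optional[str]:
    """Map a tested die to a hypothetical SKU bin; None means the die is scrapped."""
    if not full_bus_ok:
        # Per the scenario above, the 64 CU bin also catches dies whose
        # memory interface can't run at full width.
        return "64 CU" if working_cus >= 64 else None
    if working_cus >= FULL_DIE_CUS:
        # Fully working dies are split by how fast they can run.
        return "80 CU, top clock" if max_stable_mhz >= TOP_CLOCK_MHZ else "80 CU"
    if working_cus >= 72:
        return "72 CU"
    if working_cus >= 64:
        return "64 CU"
    return None


if __name__ == "__main__":
    # Made-up test results: (working CUs, max stable clock in MHz, memory bus intact?)
    tested = [(80, 2250, True), (80, 2050, True), (78, 2300, True),
              (70, 2100, True), (80, 2200, False), (58, 2000, True)]
    print(Counter(assign_bin(*die) for die in tested))
```

Dies with spare working CUs (like the 78 CU example) still land in the 72 CU bin; the extra units are simply disabled.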
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
The counter-argument to Renoir is that "harvesting" a tiny, cheaper die is more cost-effective than disabling 30% of the shader array on a 500mm2+ chip. That being said, I agree with you. Without knowing the economies of scale, I would expect AMD to cover the gap between 40 and 80 CUs with defective (or just cut-down) big chips, instead of developing a mid-sized solution from scratch, with all the costs and market headaches that involves.

The binning process is designed to maximize the usage of dies from a wafer. So whatever SKU strategy AMD adopts for Navi 21 will be done to maximize the total product revenue from a wafer of Navi 21 chips.

Also.

If Navi 21 has 128 ROPs with a 256-bit GDDR6 memory bus, wouldn't it be logical for Navi 22, with 40 CUs and a 192-bit bus, to have 96 ROPs, and for Navi 23, with 32 CUs and a 128-bit bus, to have 64 ROPs?


RDNA already moved the RBEs/ROPs into the Shader Array, and each Shader Engine consisted of 2 Shader Arrays. So ROP count scaled with Shader Array count, not with memory bus width as in previous architectures like Polaris. In RDNA each Shader Array had 4 RBEs with 4 ROPs per RBE, i.e. 16 ROPs. Navi 10 had 2 SE/4 SA, so 4 x 4 x 4 = 64 ROPs. RDNA2 made 2 changes:

1.) Halved the number of RBE per Shader Array from 4 to 2
2.) Doubled the number of ROPs per RBE from 4 to 8

So effective ROP throughput per Shader Array stayed the same from RDNA to RDNA2. But RDNA2 scaled to double the Shader engine / Shader Array count of RDNA.

N21 - 4 SE/8 SA = 8 x 2 x 8 = 128 ROPs
N22 - 2 SE/4 SA = 4 x 2 x 8 = 64 ROPs
N23 - 2 SE/4 SA = 4 x 2 x 8 = 64 ROPs

Incidentally Ampere moved the ROP/RBE into the GPC. So they have 2 RBE with 8 ROPs each per GPC. GA102 = 7 x 2 x 8 = 112 ROPs
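The ROP arithmetic above is easy to check mechanically. This sketch encodes each chip as units x RBEs per unit x ROPs per RBE, using the counts given in the post (they are the post's figures, not official specs), and contrasts them with the bus-width scaling assumed in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RopLayout:
    name: str
    units: int          # shader arrays for RDNA/RDNA2, GPCs for Ampere
    rbe_per_unit: int   # render backends (RBEs) per unit
    rops_per_rbe: int   # ROPs per RBE

    @property
    def rops(self) -> int:
        return self.units * self.rbe_per_unit * self.rops_per_rbe


# Counts as stated in the post above.
LAYOUTS = [
    RopLayout("Navi 10 (RDNA, 2 SE / 4 SA)", 4, 4, 4),
    RopLayout("Navi 21 (RDNA2, 4 SE / 8 SA)", 8, 2, 8),
    RopLayout("Navi 22 (RDNA2, 2 SE / 4 SA)", 4, 2, 8),
    RopLayout("Navi 23 (RDNA2, 2 SE / 4 SA)", 4, 2, 8),
    RopLayout("GA102 (Ampere, 7 GPC)", 7, 2, 8),
]


def rops_scaled_by_bus(bus_bits: int, ref_rops: int = 128, ref_bus_bits: int = 256) -> int:
    """ROP count if ROPs tracked memory bus width (the assumption in the question)."""
    return ref_rops * bus_bits // ref_bus_bits


if __name__ == "__main__":
    for layout in LAYOUTS:
        print(f"{layout.name}: {layout.rops} ROPs")
    for name, bus in [("Navi 22", 192), ("Navi 23", 128)]:
        print(f"{name} if ROPs scaled with a {bus}-bit bus: {rops_scaled_by_bus(bus)}")
```

The two models agree on Navi 23 (64 ROPs) and disagree only on Navi 22, where shader-array scaling gives 64 rather than 96.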


Look at the Navi 2x table for specs. The front end of the rasterizer (scan converter) has doubled output compared to Navi 1x.


num_packer_per_sc = 2 for Navi 1x
num_packer_per_sc = 4 for Navi 2x

All RDNA and RDNA2 GPUs have 1 scan converter per Shader Array but RDNA2 has twice the packer per scan converter as RDNA.
num_sc_per_sh = 1
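The per-scan-converter packer counts can be rolled up to a chip-wide figure the same way. This sketch assumes the quoted driver fields apply per shader array (amdgpu calls a shader array an "SH") and reuses the shader-array counts from the ROP breakdown above; it counts packers only and says nothing about actual primitive or pixel rates.

```python
def total_packers(shader_arrays: int, num_packer_per_sc: int, num_sc_per_sh: int = 1) -> int:
    """Chip-wide packer count: shader arrays x scan converters per array x packers per converter."""
    return shader_arrays * num_sc_per_sh * num_packer_per_sc


if __name__ == "__main__":
    navi10 = total_packers(shader_arrays=4, num_packer_per_sc=2)  # Navi 1x value from the post
    navi21 = total_packers(shader_arrays=8, num_packer_per_sc=4)  # Navi 2x value from the post
    print(f"Navi 10: {navi10} packers, Navi 21: {navi21} packers")
    print(f"Per shader array: {navi10 // 4} vs {navi21 // 8}")
```

Per shader array the count doubles (2 to 4); the chip-wide total quadruples only because Navi 21 also has twice the shader arrays.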
 

Vope45

Member
Oct 4, 2020
114
168
86
The only GPU architecture that Koduri possibly designed, or rather laid the groundwork for, was Navi 1.

Plenty of Zen engineers were added to the team designing RDNA2 to optimize the physical design (which is a key aspect of ANY ARCHITECTURE), long after Koduri left AMD.

Do not mistake RDNA1 for RDNA2. Raja could've had a hand in RDNA1. There is a strong chance that RDNA2 is vastly different from what he himself knew.

And here, we are talking only about consumer GPUs. Knowing Raja, he builds more compute-focused GPUs (Vega, GCN, DGX from Intel).

Judging from RDNA 2 density, their physical design team still has a lot of work to do.
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,140
136
Judging from RDNA 2 density, their physical design team still has a lot of work to do.
Not sure what you mean here. Going for the highest density isn't the end all, be all of silicon design. Density and clocks are generally inversely related, and judging by how highly clocked RDNA 2 can be, it makes sense that the density isn't going up much further.
 

Vope45

Member
Oct 4, 2020
114
168
86
And judging from the clocks of those GPUs?

Yes. It's nothing compared to the jump from Kepler to Maxwell to Pascal. Ampere, even with so much more added compute capability, still clocks quite high on Samsung 8nm.

It all boils down to R&D.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,860
3,407
136
Yes. It's nothing compared to the jump from Kepler to Maxwell to Pascal. Ampere, even with so much more added compute capability, still clocks quite high on Samsung 8nm.

It all boils down to R&D.
This is nothing but hand waving. What in all of Ampere's "added compute capabilities" is a limiting factor on clocks, given they have almost gone nowhere?
 

Vope45

Member
Oct 4, 2020
114
168
86
This is nothing but hand waving. What in all of Ampere's "added compute capabilities" is a limiting factor on clocks, given they have almost gone nowhere?

From 2013 until 2018, AMD was compute-focused, per Raja's quote, while Nvidia focused solely on rasterization with much higher clocks. RDNA stripped out all unnecessary compute functions, and suddenly it clocks sky high compared to Vega. At the same time, Nvidia was pushing ray tracing, which is compute heavy, and suddenly Turing clocks about the same as Pascal. Same with Turing.

How do you explain that ?
 

Vope45

Member
Oct 4, 2020
114
168
86
Not sure what you mean here. Going for the highest density isn't the end all, be all of silicon design. Density and clocks are generally inversely related, and judging by how highly clocked RDNA 2 can be, it makes sense that the density isn't going up much further.

What I meant is that, compared to similar Nvidia products, AMD's physical design team still has a lot of work to do. I still remember when AMD tried to cut costs by using software for that task; this was 2 or 3 years before Bulldozer.

Also, with difficulty and cost going way up, getting smaller and more efficient is now the be-all and end-all.
 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,404
146
From 2013 until 2018, AMD was compute-focused, per Raja's quote, while Nvidia focused solely on rasterization with much higher clocks. RDNA stripped out all unnecessary compute functions, and suddenly it clocks sky high compared to Vega. At the same time, Nvidia was pushing ray tracing, which is compute heavy, and suddenly Turing clocks about the same as Pascal. Same with Turing.

How do you explain that ?

RDNA1 didn't clock sky high compared to Vega on the same node though? Look at the Radeon VII vs. the 5700 XT. Unless you think a 10-15% lead is sky high.
 

Vope45

Member
Oct 4, 2020
114
168
86
RDNA1 didn't clock sky high compared to Vega on the same node though? Look at the Radeon VII vs. the 5700 XT. Unless you think a 10-15% lead is sky high.

You are right, I meant Vega 20. My 5700 XT boosts to 2.05 GHz; that's sky high compared to 1.75 GHz on Vega 20. Both are undervolted btw.
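Part of this disagreement is plain arithmetic. Using the undervolted boost clocks quoted in the post (2.05 GHz for the 5700 XT, 1.75 GHz for Vega 20), the uplift works out to roughly 17%, slightly above the 10-15% range mentioned. These are one user's cards, so this is an illustration, not a benchmark.

```python
def clock_uplift(new_mhz: float, old_mhz: float) -> float:
    """Relative clock increase of new over old, as a fraction."""
    return new_mhz / old_mhz - 1.0


if __name__ == "__main__":
    # Boost clocks quoted in the post above (both cards undervolted).
    rx_5700_xt_mhz = 2050
    vega_20_mhz = 1750
    print(f"{clock_uplift(rx_5700_xt_mhz, vega_20_mhz):.1%} higher")
```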
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Yes. It's nothing compared to the jump from Kepler to Maxwell to Pascal. Ampere, even with so much more added compute capability, still clocks quite high on Samsung 8nm.

It all boils down to R&D.
This is top-down BS.

First of all, the Maxwell-to-Pascal jump was on a new node.

RDNA2 is a jump in clock speeds on the same node.

Nvidia were not able to clock their GPUs higher specifically BECAUSE they added all of that compute capability into a gaming architecture. Remember, the 3080 consumes 320W of power, and the 3090 350W. USING A BRAND NEW NODE.

AMD will still be within a reasonable 250-280W power draw range while clocking its largest GPU at 2.2 GHz. ON THE SAME NODE as the previous-generation architecture!

Dude. It's all due to physical design. Stop moving the goalposts.
This is nothing but hand waving. What in all of Ampere's "added compute capabilities" is a limiting factor on clocks, given they have almost gone nowhere?
Those compute capabilities cause excessive power draw, which prevents Nvidia from clocking their GPUs higher.
 