Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 
Apr 30, 2020
68
170
76
You guys realize that TSMC has been given little credit for the success of AMD products in the last few years. I think TSMC should be given much of the credit for their silicon.
What do you mean? Pretty much everyone acknowledges and even rants/raves about TSMC's 7nm process being superior to just about anyone else's right now. But you also must keep in mind that 7nm isn't a magic problem solver. Look at the Radeon VII vs the 5700XT. The Radeon VII is only 5% faster despite having 50% more CUs and consuming significantly more power. Both are on the same 7nm process. Now Navi2 is pushing that efficiency even further. That's AMD's design work there, iterating and improving.
 

Zoal

Junior Member
Oct 25, 2020
5
4
41
Consoles don't need it.

PS5 35 CU with 256 bus is enough.

SX with 52 CU and 320 bus is enough.

Consoles share memory, so there are fewer transfers and less duplication than on PC; they need relatively less raw bandwidth.
Not to mention they're trying to speed up frame processing by leveraging the low-latency SSD subsystem.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Expect ray tracing performance to be slightly below expectations since most current games with ray tracing implementations aren't optimized to take advantage of DXR 1.1 or inline ray tracing ...

Ray tracing performance will improve as upcoming titles start to integrate this feature next year, so I'm not sure we should judge RDNA2's ray tracing performance so soon after its launch based on the current set of games ...
 

dzoni2k2

Member
Sep 30, 2009
153
198
116
Expect ray tracing performance to be slightly below expectations since most current games with ray tracing implementations aren't optimized to take advantage of DXR 1.1 or inline ray tracing ...

Ray tracing performance will improve as upcoming titles start to integrate this feature next year, so I'm not sure we should judge RDNA2's ray tracing performance so soon after its launch based on the current set of games ...

Of course. Let's not forget RT ran like dogsh* on Turing at first. BF5 was literally unplayable.
 

BlitzWulf

Member
Mar 3, 2016
165
73
101

Welp, seems a 3080 Ti is incoming now. That looks like almost everything from NV: the 60 Ti, the rumored cut-down GA102 "70 Ti", and now a supposed 78 SM 80 Ti.

It looks like Jensen is willing to paper launch the entire stack from top to bottom at once, but to what end?
Does Nvidia think that customers will wait until next year to get cards based on mindshare and DLSS?
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Do keep in mind that 128MB of cache is a huge chunk of silicon with an accompanying massive cost. Standard SRAM uses 6 transistors per bit, so 128MB of "LLC" in a GPU would be ~6.4 billion transistors. That's 6.4 billion transistors just for the LLC, never mind any other caches that may be there. The Radeon R9 290X's Hawaii core has fewer transistors in total than just Navi2's last-level cache. That is nuts!
It is nuts! But the alternatives are worse.

A 384-bit bus is already ~96mm2, at very high defect density... way more than SRAM...

HBM2 is costly, and GDDR6X seems like a stretch (insanely hot, PAM4)
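
For anyone who wants to check the ~6.4 billion figure quoted above, here is a minimal sketch of the arithmetic. It assumes "128MB" means 128 MiB and counts only the 6T bit cells; tag arrays, decoders, sense amps and spare rows are ignored, so a real cache would need more. The function name is made up for illustration.

```python
# Rough transistor count for an SRAM array, counting only the bit cells.
# Assumptions: 128MB is read as 128 MiB, every bit is a classic 6T cell,
# and peripheral logic (tags, decoders, sense amps, redundancy) is ignored.

TRANSISTORS_PER_BIT = 6  # 6T SRAM cell, as stated in the quoted post


def sram_cell_transistors(capacity_mib: int) -> int:
    """Return the number of bit-cell transistors for a cache of capacity_mib MiB."""
    bits = capacity_mib * 1024 * 1024 * 8
    return bits * TRANSISTORS_PER_BIT


llc = sram_cell_transistors(128)
print(f"128 MiB of 6T SRAM: {llc / 1e9:.2f} billion transistors")
```

That lands at about 6.44 billion, in line with the ~6.4 billion in the quote.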
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,423
2,914
136
It is nuts! But the alternatives are worse.

A 384-bit bus is already ~96mm2, at very high defect density... way more than SRAM...

HBM2 is costly, and GDDR6X seems like a stretch (insanely hot, PAM4)
You still have a 256-bit bus, so ~32mm2 more for additional 128bit is hardly a worse alternative than using up another >100mm2 just for extra cache.
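
The ~32mm2 figure follows from scaling the quoted ~96mm2 for a 384-bit bus linearly with width. A quick sketch, assuming interface area really is proportional to bus width (real PHY and controller area may not scale that cleanly); the helper name is hypothetical:

```python
# Linear scaling of memory-interface area with bus width.
# Assumption: area is proportional to width, anchored on the ~96 mm^2
# for a 384-bit bus quoted earlier in the thread.

AREA_384_BIT_MM2 = 96.0


def bus_area_mm2(width_bits: int) -> float:
    """Estimated interface area in mm^2 for a bus of width_bits."""
    return AREA_384_BIT_MM2 / 384 * width_bits


extra = bus_area_mm2(384) - bus_area_mm2(256)
print(f"256-bit: {bus_area_mm2(256):.0f} mm^2, extra for 384-bit: {extra:.0f} mm^2")
```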
 
Reactions: Olikan

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,599
136
You have to add costs for increased PCB complexity and additional RAM chips - also, if you want to increase bandwidth even more, you'll need more exotic RAM types (HBM, etc). Cache cost will instead decrease rapidly with new nodes (bus controllers will not scale as well). This may also be a necessity for multi-die scalable GPUs, and it will definitely give a plus point to AMD in the case of mobile parts (reduced bus size at same performance point).
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,058
7,478
136
Is it possible to do unequal transistor densities across a die (maybe it's actually normal)? As in, can AMD throw in a huge cache but pattern that portion of the die to be extremely dense (60Mtr) while the remainder of the die that actually houses logic is less dense (40Mtr)?

Or does that already happen, and the 40Mtr density that's thrown around is an average of the entire die, with large variations in density between different portions of the chip?
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,423
2,914
136
You have to add costs for increased PCB complexity and additional RAM chips - also, if you want to increase bandwidth even more, you'll need more exotic RAM types (HBM, etc). Cache cost will instead decrease rapidly with new nodes (bus controllers will not scale as well). This may also be a necessity for multi-die scalable GPUs, and it will definitely give a plus point to AMD in the case of mobile parts (reduced bus size at same performance point).
That's true, but a big 128MB cache in RDNA2 doesn't only increase the die size and cost per die, it also lowers the number of GPUs made from one wafer, and it's not like AMD has infinite capacity at its disposal. So I have to question whether adding a bigger cache is a better option than a bigger memory controller.
BTW I am interested in the price difference between, let's say, 1GB and 2GB GDDR6 chips.
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
Is it possible to do unequal transistor densities across a die (maybe it's actually normal)? As in, can AMD throw in a huge cache but pattern that portion of the die to be extremely dense (60Mtr) while the remainder of the die that actually houses logic is less dense (40Mtr)?

Or does that already happen, and the 40Mtr density that's thrown around is an average of the entire die, with large variations in density between different portions of the chip?
They could also use on-die eDRAM (what IBM uses for L3 cache on its CPUs). Or it could be a separate die altogether. Interesting times!
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
That's true, but a big 128MB cache in RDNA2 doesn't only increase the die size and cost per die, it also lowers the number of GPUs made from one wafer, and it's not like AMD has infinite capacity at its disposal. So I have to question whether adding a bigger cache is a better option than a bigger memory controller.
BTW I am interested in the price difference between, let's say, 1GB and 2GB GDDR6 chips.
What's more likely, a defective chunk of memory controller or a defective cache? Maybe the additional costs of going the cache route are offset by better yields? Either way, I'm pretty sure they ran the numbers; this change would have been in the works long enough for them to know if it's worthwhile.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,423
2,914
136
What's more likely, a defective chunk of memory controller or a defective cache? Maybe the additional costs of going the cache route are offset by better yields? Either way, I'm pretty sure they ran the numbers; this change would have been in the works long enough for them to know if it's worthwhile.
Production cost of the whole card could be comparable, but if that cache increases the die size by, let's say, 20%, then you get roughly 20% fewer GPUs per wafer, so for example instead of 250 GPUs to sell you would have only about 200. That's a lot of money, but as you said, AMD should have made the calculations to know what is better for them.
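
To put rough numbers on the "20% bigger die" argument, here is a sketch using a commonly cited dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). Everything here is an assumption for illustration: a 300mm wafer, the rumored 505mm^2 Navi21 size as the "with cache" case, and a hypothetical die 1/1.2 that size as the "without cache" case. Defect yield is not included.

```python
import math

# Approximate gross dies per wafer:
#   DPW ~ pi * (d/2)^2 / A  -  pi * d / sqrt(2 * A)
# where d is the wafer diameter and A the die area. This is a rule of
# thumb; it ignores scribe lines, die aspect ratio and defect yield.

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafer


def dies_per_wafer(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> int:
    """Estimate gross (not yielded) dies per wafer for a given die area."""
    wafer_area = math.pi * (wafer_d_mm / 2) ** 2
    edge_loss = math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


with_cache = dies_per_wafer(505.0)           # rumored Navi21 size from this thread
without_cache = dies_per_wafer(505.0 / 1.2)  # hypothetical die without the extra 20%

loss = 1 - with_cache / without_cache
print(f"without cache: {without_cache}, with cache: {with_cache}, {loss:.0%} fewer")
```

From area alone, a 20% larger die gives 1/1.2 of the dies, so about 17% fewer; the edge-loss term pushes the estimate a bit higher, and defect yield would push it higher still.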
 

Panino Manino

Senior member
Jan 28, 2017
846
1,061
136
Consoles don't need it.

PS5 35 CU with 256 bus is enough.

SX with 52 CU and 320 bus is enough.

Consoles share memory, so there are fewer transfers and less duplication than on PC; they need relatively less raw bandwidth.

But the IF doesn't need to be 128MB big, right?
The way Cerny talked, some seem to believe that there's a big amount of cache in the IO "die".
 

Mopetar

Diamond Member
Jan 31, 2011
8,004
6,446
136
You guys realize that TSMC has been given little credit for the success of AMD products in the last few years. I think TSMC should be given much of the credit for their silicon.

I'm not sure if that's the case. Go look back at older threads when AMD was doing a lot of their manufacturing at GlobalFoundries and you'll find plenty of comments talking about how it was holding them back and that they needed to get more production over to TSMC. There are also a lot of comments that suggest Ampere would have been a lot better if Nvidia had used TSMC instead of Samsung. I think there's plenty of recognition for TSMC being the best fab around right now and there's so little debate about it, particularly with all of Intel's troubles the last few years, because it just goes without saying.

What's more likely, a defective chunk of memory controller or a defective cache? Maybe the additional costs of going the cache route are offset by better yields? Either way, I'm pretty sure they ran the numbers; this change would have been in the works long enough for them to know if it's worthwhile.

I don't think it's just a case of likelihood, but also of how much silicon must be disabled in response to a defect in one of those areas. The memory controllers probably take up less space overall, but it seems like the type of silicon that's more likely to be an all-or-nothing area of the chip where a defect would necessitate disabling the whole controller. There are certainly parts of the cache where a defect could have a similar impact, but it also seems that it would be much easier to bake in some redundancy so that a defect could be worked around or that a much smaller amount of silicon would need to be disabled in response to it.
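
The redundancy point can be illustrated with a simple Poisson defect model: the number of defects landing in a block is Poisson-distributed with mean area x defect density, an all-or-nothing block survives only with zero defects, and a block with spare rows survives as long as the defect count does not exceed what can be repaired. This is a sketch, not how any fab actually models yield, and the areas, defect density and repair count below are made-up illustrative values.

```python
import math

# Poisson yield model: defects in a block ~ Poisson(area * defect_density).
# A block with no redundancy is good only with 0 defects; a block that can
# repair up to `repairable` defects (e.g. via spare rows) is good when the
# defect count is <= repairable. All numbers below are hypothetical.


def poisson_yield(area_mm2: float, defects_per_mm2: float, repairable: int = 0) -> float:
    """Probability that a block has at most `repairable` defects."""
    lam = area_mm2 * defects_per_mm2
    return sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(repairable + 1)
    )


D0 = 0.001  # hypothetical defect density, defects per mm^2

controller = poisson_yield(32.0, D0)                 # small, all-or-nothing block
cache_plain = poisson_yield(100.0, D0)               # large cache, no spares
cache_spare = poisson_yield(100.0, D0, repairable=2) # large cache, 2 repairable defects

print(f"controller: {controller:.3f}")
print(f"cache, no redundancy: {cache_plain:.3f}")
print(f"cache, 2 repairable defects: {cache_spare:.3f}")
```

Under these assumptions the larger cache block with a couple of repairable defects ends up more likely to be usable than the much smaller all-or-nothing block, which is the intuition in the post above.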
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,058
7,478
136
I won't be surprised if someone credible says this is a good trade-off... SRAM is known to be easy to yield

- It's possible that an increase in cache is required for their RTRT implementation to work, so there was always going to be more cache anyway and AMD just decided to lean into that instead of going with the minimum amount of cache required and a larger bus.

Just one more day before we can start asking the real questions...
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Well, as many people here are debating about bus size, I firmly believe that the cache was/is meant for a multi-GPU setup on a single PCB. The architecture, patents, all of the Nvidia/AMD testing has led to this.

You will see multiple full 40CU GPUs sharing the cache in the near future. Nvidia is already doing it with Hopper.
 
Reactions: Tlh97 and Saylick