Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
As a reminder, TSMC N7 and N6 use the same equipment and the same production lines.
N5 uses a different line that is shared with N4.

N7 uses EUV, which allows for less machine time per wafer. If everyone moved over to N6 tomorrow (assuming TSMC has enough EUV capacity), TSMC would have far more capacity available. That is why they are discounting it over N7. As a bonus, the chips themselves end up being smaller, which means that more chips will fit on each wafer.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
That's what I mean. Doing RDNA3 on N6 makes no sense. A shrink of RDNA2 to N6 with some gutting would make more sense (and that doesn't appear to be what they are doing). But either of these isn't going to work if mining collapses.

If AMD went with RDNA2 for lower end parts, they'd get soundly beaten in a number of areas compared to the competition, including Ray Tracing. By moving to RDNA3 even on low end to midrange, they ensure they are competitive across the board.
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
The only reason to do a shrink of some existing RDNA2 part to N6 would be to release it sooner because RDNA3 isn't ready. Otherwise AMD already has a really low-end N6 RDNA2 part in Navi 24.
 
Reactions: Tlh97

DeathReborn

Platinum Member
Oct 11, 2005
2,755
751
136
N7 uses EUV, which allows for less machine time per wafer. If everyone moved over to N6 tomorrow (assuming TSMC has enough EUV capacity), TSMC would have far more capacity available. That is why they are discounting it over N7. As a bonus, the chips themselves end up being smaller, which means that more chips will fit on each wafer.

N7 & N7P do not use EUV, only N7+ & N6 do.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
If AMD went with RDNA2 for lower end parts, they'd get soundly beaten in a number of areas compared to the competition, including Ray Tracing. By moving to RDNA3 even on low end to midrange, they ensure they are competitive across the board.

AD106 is going to be slower than the 3080. It's just a matter of how much. Navi 21 as is is fine. If you have to cut costs because you don't like selling it at $600, then okay.
Given that the high end parts are using a completely different chiplet method, it seems like it would have just been easier to gut Navi 21 instead of doing a new RDNA3 part that isn't going to be any faster.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,243
1,680
136
As a bonus, the chips themselves end up being smaller, which means that more chips will fit on each wafer.

TSMC literature heavily implies you get smaller dies just by switching your existing design to N6, but I've been told numerous times that isn't true unless you redesign for N6 to take advantage of the shrink.

Maybe someone here knows the truth.
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
TSMC literature heavily implies you get smaller dies just by switching your existing design to N6, but I've been told numerous times that isn't true unless you redesign for N6 to take advantage of the shrink.

Maybe someone here knows the truth.

It uses the same design rules so N7 designs can be ported without changes and possibly even reuse some mask layers depending on how it's all set up.

However you don't get a free shrink. N6 allows for greater density, but you'd have to adjust the design to accommodate this as it doesn't just scale down an existing design.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
Not sure if this is old news or not:


7 chiplets on an N31 package: 2 Logic Dies, 4 memory controllers, 1 I/O die.

Not what I would expect: figured the memory controllers would be built into the logic or I/O die (so we'd have 2 logic dies and 1 I/O die).

I have to imagine there will be a lot of latency hiding tricks employed by AMD on this one to make it work, each die to die hop is an eternity by modern computing standards, and anything that requires more than 1 die hop would basically have to be thrown away.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
Not sure if this is old news or not:


7 chiplets on an N31 package: 2 Logic Dies, 4 memory controllers, 1 I/O die.

Not what I would expect: figured the memory controllers would be built into the logic or I/O die (so we'd have 2 logic dies and 1 I/O die).

I have to imagine there will be a lot of latency hiding tricks employed by AMD on this one to make it work, each die to die hop is an eternity by modern computing standards, and anything that requires more than 1 die hop would basically have to be thrown away.

I think this way you can reuse the IO die with different bus sizes (e.g. by varying the number of memory controller dies).
 
Reactions: Tlh97 and coercitiv

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Not sure if this is old news or not:


7 chiplets on an N31 package: 2 Logic Dies, 4 memory controllers, 1 I/O die.

Not what I would expect: figured the memory controllers would be built into the logic or I/O die (so we'd have 2 logic dies and 1 I/O die).

I have to imagine there will be a lot of latency hiding tricks employed by AMD on this one to make it work, each die to die hop is an eternity by modern computing standards, and anything that requires more than 1 die hop would basically have to be thrown away.

It will be interesting to see how high these cards can clock and what the temps are. One advantage of going with multiple chiplets is that heat is somewhat more distributed.

It would be pretty amazing if we saw 3+GHz out of this card.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Not sure if this is old news or not:


7 chiplets on an N31 package: 2 Logic Dies, 4 memory controllers, 1 I/O die.

Not what I would expect: figured the memory controllers would be built into the logic or I/O die (so we'd have 2 logic dies and 1 I/O die).

I have to imagine there will be a lot of latency hiding tricks employed by AMD on this one to make it work, each die to die hop is an eternity by modern computing standards, and anything that requires more than 1 die hop would basically have to be thrown away.
I'm very dubious about this. Why have a small 64 bit controller on a separate die? What is the cost of bonding, seeing that SoIC tech is probably used?

Maybe the total N3X family consists of 7 dies and the whisper got distorted with repetition.

N31 compute
N31 IO/cache
N32 compute
N32 IO/cache
N33
N34
N35
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
The MC dies, if they exist, would likely contain the Infinity Cache which would significantly increase their size.

But that would also rule out the use of IC as bridges between logic dies. Or wouldn't it? In any case, it sounds like there are quite a few different ways to overhaul the package hierarchy; it will be interesting to see which route AMD picks.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
I'm very dubious about this. Why have a small 64 bit controller on a separate die? What is the cost of bonding, seeing that SoIC tech is probably used?

Reuse, I'm guessing. Navi 31 would have 4 of them and Navi 32 would have 3. Maybe they could do an 8 die version for Radeon Pro or something.
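
To put numbers on the reuse idea, here is a minimal sketch of how the total bus width would scale with die count. It assumes each memory controller die carries a single 64-bit controller, as in the post I quoted; the die counts are the speculative ones above, and `BITS_PER_MCD` and `bus_width_bits` are made-up names for illustration.

```python
# Hypothetical sketch: total memory bus width if every memory controller die
# (MCD) carried one 64-bit controller, as the rumor under discussion assumes.
BITS_PER_MCD = 64  # assumption taken from the "small 64 bit controller" remark


def bus_width_bits(mcd_count: int) -> int:
    """Return the aggregate bus width for a package with mcd_count MCDs."""
    if mcd_count < 0:
        raise ValueError("mcd_count must be non-negative")
    return mcd_count * BITS_PER_MCD


# Die counts are the speculative ones from this post, not confirmed specs.
configs = {"Navi 31": 4, "Navi 32": 3, "8-die variant": 8}
widths = {name: bus_width_bits(count) for name, count in configs.items()}

for name, width in widths.items():
    print(f"{name}: {width}-bit")
```

Under that assumption the same small die would cover 192-bit, 256-bit and 512-bit products.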
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I'm very dubious about this. Why have a small 64 bit controller on a separate die? What is the cost of bonding, seeing that SoIC tech is probably used?

Maybe the total N3X family consists of 7 dies and the whisper got distorted with repetition.

N31 compute
N31 IO/cache
N32 compute
N32 IO/cache
N33
N34
N35
The reason why cache could be on separate die(s) is that it may be used for ANOTHER next generation product from AMD. Not only GPU.

Anyone can think of anything?
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
I don't know how TPU came to the conclusion that the MCDs contain memory controllers but my take on it was that the IOD contains all of the IO and memory controllers (256-bit bus) and each MCD is simply just a cache chiplet that contains 64MB or 128MB of cache, so that 4 of these would be either 256 MB or 512 MB of Infinity Cache. It is likely that the MCDs are vertically stacked onto the IOD.
 
Reactions: Tlh97 and Glo.

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
I'm very dubious about this. Why have a small 64 bit controller on a separate die? What is the cost of bonding, seeing that SoIC tech is probably used?

Maybe the total N3X family consists of 7 dies and the whisper got distorted with repetition.

N31 compute
N31 IO/cache
N32 compute
N32 IO/cache
N33
N34
N35

- I agree that it seems like a far too reductionist approach to chiplet design, especially at this stage in the substrate packaging game: you still want the number of chiplets to be as small as possible because having stuff on die is still orders of magnitude faster than off die. To your point as well, there is a cost and additional failure points for every additional chiplet that goes onto the interposer.

There is a sweet spot between monolithic die and having every last piece of the chip design get broken out into its own tiny 25mm2 chiplet.

Who knows, maybe AMD is already there thanks to everything they've learned in the CPU space, but it would be surprising to say the least.
 
Reactions: Tlh97 and maddie

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Here is a graphical, but in my opinion fairly accurate, relatively proportioned die layout of N21.

The top and bottom contain four 32-bit memory controllers each. Are we saying that AMD is fabbing 2 of those (64 bit) on a separate chiplet? Why? Maybe if we remember the reasons for chiplets, we can see if this is reasonable.

I can list 2 right away.
Higher yields for leading edge expensive nodes. It's important to know that this advantage is reduced as the node yield rises.
A wider range of products can be assembled from fewer unique dies.

The rumor of 7 dies for N31 has the largest sections of the GPU on expensive 5nm and the most & smallest dies on much cheaper 6nm. Sounds like the opposite application of the chiplet strategy to improve production costs.
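
A rough way to see the first point (the yield advantage shrinking as the node matures) is a simple Poisson yield model. This is only a sketch under stated assumptions: the 500 mm^2 total area and the defect densities are illustrative values, not TSMC figures, and the model ignores packaging cost, test cost and the extra area needed for die-to-die links. The function names are hypothetical.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model: exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_areas_mm2, defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per good product: sum of area / yield per die."""
    return sum(a / poisson_yield(a, defects_per_cm2) for a in die_areas_mm2)


def chiplet_saving(total_area_mm2: float, n_chiplets: int, defects_per_cm2: float) -> float:
    """Fractional silicon saved by splitting one die into n equal chiplets."""
    mono = silicon_per_good_product([total_area_mm2], defects_per_cm2)
    part = total_area_mm2 / n_chiplets
    split = silicon_per_good_product([part] * n_chiplets, defects_per_cm2)
    return 1.0 - split / mono


# Illustrative defect densities (defects per cm^2), from immature to mature node.
for d0 in (0.5, 0.1, 0.05):
    saving = chiplet_saving(500.0, 2, d0)
    print(f"D0={d0}: splitting 500 mm^2 in two saves {saving:.1%} of silicon")
```

In this model the saving falls steadily as the defect density drops, which is the effect described in the first point.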


 

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
The rumor of 7 dies for N31 has the largest sections of the GPU on expensive 5nm and the most & smallest dies on much cheaper 6nm. Sounds like the opposite application of the chiplet strategy to improve production costs.

I think the strategy is performance (and ASP)
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
The rumor of 7 dies for N31 has the largest sections of the GPU on expensive 5nm and the most & smallest dies on much cheaper 6nm. Sounds like the opposite application of the chiplet strategy to improve production costs.


Neither caches nor memory controllers scale with die shrinks like ALUs do.

I don't believe that memory controllers are separate from the IODie. Cache - it can be separate, from anything, and scalable with everything.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
I think the strategy is performance (and ASP)
Did you read what I wrote?

Chiplets allow equivalent performance at lower cost when implemented wisely. This 20-25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost reduction viewpoint, and I'll let you know that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

6nm is already yielding well enough that shaving off a tiny die to then reconnect using an advanced technique (SoIC) seems to me, working without detailed data, as possibly lowering yield and raising costs even if yields are constant, for NO performance improvement.
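
Since I'm working without detailed data, here is the kind of back-of-envelope check I have in mind, as a sketch only. It asks how good the per-bond success rate must be before carving small dies out of a larger one stops costing extra silicon. The assumptions are all mine: Poisson yield, only known-good dies get bonded, one failed bond scraps the whole assembly, and the cost of the bonding step itself is ignored. The 300 mm^2 host die and the defect density are hypothetical; 22.5 mm^2 is just the midpoint of the 20-25 mm^2 range.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per good die of the given size."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)


def break_even_bond_yield(host_area_mm2: float, small_area_mm2: float,
                          n_small: int, defects_per_cm2: float) -> float:
    """Per-bond success rate at which split and integrated designs use equal silicon.

    Below this rate the split design consumes more silicon per good assembly.
    """
    remaining = host_area_mm2 - n_small * small_area_mm2
    if remaining <= 0:
        raise ValueError("small dies cannot exceed the host die area")
    integrated = silicon_per_good_die(host_area_mm2, defects_per_cm2)
    split = (silicon_per_good_die(remaining, defects_per_cm2)
             + n_small * silicon_per_good_die(small_area_mm2, defects_per_cm2))
    # Each of the n_small bonds must succeed, so assembly yield is p ** n_small.
    return (split / integrated) ** (1.0 / n_small)


# Hypothetical inputs: 300 mm^2 host die, four 22.5 mm^2 dies, D0 = 0.1 per cm^2.
p = break_even_bond_yield(300.0, 22.5, 4, 0.1)
print(f"break-even per-bond yield: {p:.3f}")
```

The better the node yields, the closer to perfect the bonding has to be before the split breaks even, and that is before paying for the bonding step.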
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Did you read what I wrote?

Chiplets allow equivalent performance at lower cost when implemented wisely. This 20 -25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost reduction viewpoint, and I'll let you know that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

6nm is already yielding well enough that shaving off a tiny die to then reconnect using an advanced technique (SoIC) seems to me, working without detailed data, as possibly lowering yield and raising costs even if yields are constant, for NO performance improvement.
I don't think anyone is disagreeing with you here.

Again, I am of the opinion that of the seven (7) rumored dies, they are as follows:
1) GCD0: 5nm, contains ALUs and other graphics related blocks, contains TSVs to connect to IOD
2) GCD1: See above
3) IOD: 6nm, contains all memory controllers, PHY for VRAM, TSVs to connect to MCDs and GCDs, PCIe interface, video encoders, etc.
4) MCD0: 6nm, contains 64 MB or 128 MB cache and TSVs to connect to other chips, likely directly to the IOD
5) MCD1: See above
6) MCD2: See above
7) MCD3: See above
 