Speculation: RDNA3 + CDNA2 Architectures Thread


jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
Chiplets allow equivalent performance at lower cost when implemented wisely. This 20-25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost-reduction viewpoint, and I'll note that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

It's the building-blocks concept. If they could make that work, it might save them time in the future by not having to worry about the memory controller. Or it could easily be dropped onto an EPYC or APU product, for instance, to get GDDR bandwidth. If the reticle limit is going to get smaller as the nodes progress, you're going to have to do this anyway. May as well start now.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
- I agree that it seems like a far too reductionist approach to chiplet design, especially at this stage in the substrate packaging game: you still want the number of chiplets to be as low as possible, because having stuff on die is still orders of magnitude faster than off die. To your point as well, there are costs and additional failure points for every additional chiplet that goes onto the interposer.

There is a sweet spot between a monolithic die and having every last piece of the chip design broken out into its own tiny 25 mm^2 chiplet.

Who knows, maybe AMD is already there thanks to everything they've learned in the CPU space, but it would be surprising to say the least.

AFAIK it's Intel with the cheap chiplet bridge tech, and even they're not planning to go this crazy with it. A single unified IO die, if work distribution needs to come from that IO die anyway, seems OK. But breaking the bus up into yet more chiplets is just silly no matter who's doing it.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Mask/design costs at new nodes may drive chiplet size as low as possible (no matter how silly) as they have increased massively with each silicon shrink.
What is this? AFAIK, most of the cost is design, with the physical masks a small fraction of that.

The issue, however, is that these costs cannot be reduced by using chiplets. You still have to design the blocks of circuitry whether the chip is monolithic or chiplet-based. AND it's even possible that design costs could increase, as you now have to do the layout in a way that reconnects all these separate dies, plus make additional masks.

However, when using chiplets wisely, you can trade off node choice, design costs on that node, optimized libraries for each block, power efficiency, etc., to arrive at a lower total product cost.

What started this was the claim that the memory controllers would be split off from the IO die into 64-bit segments. Saving maybe 15% area, all on the same 6nm node, just to reattach the pieces using cutting-edge SoIC, is more expensive. The tiny increase in yield from saving a small amount of area is not worth it. Remember, you also get defects when some dies fail the bonding process.
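
You can put rough numbers on that argument. Here is a minimal sketch, assuming a Poisson yield model and made-up areas, wafer cost, die-to-die PHY overhead and bond yield (none of these are AMD figures): every mm^2 you split off yields slightly better, but the PHYs add area back and every bond must succeed.

```python
import math

def die_yield(area_mm2, d0=0.09):
    """Poisson yield: fraction of dies with zero defects at d0 defects/cm^2."""
    return math.exp(-d0 * area_mm2 / 100.0)

def cost(area_mm2, cost_per_mm2=0.10):
    """Cost of one good die: raw area cost divided by yield."""
    return area_mm2 * cost_per_mm2 / die_yield(area_mm2)

# Illustrative assumptions, not AMD data; everything on the same 6nm node.
IO_AREA    = 150.0  # monolithic IO die, mm^2
MC_AREA    = 22.0   # one 64-bit memory-controller chiplet, mm^2
N_MC       = 4      # MC chiplets split off the IO die
LINK_AREA  = 3.0    # extra die-to-die PHY area per link end, mm^2
BOND_YIELD = 0.99   # assumed per-bond success rate for SoIC stacking

monolithic = cost(IO_AREA)
split = (cost(IO_AREA - N_MC * MC_AREA + N_MC * LINK_AREA)
         + N_MC * cost(MC_AREA + LINK_AREA)) / BOND_YIELD**N_MC

print(f"monolithic IO die:        ${monolithic:.2f}")
print(f"split IO + {N_MC} MC chiplets: ${split:.2f}")
```

With these made-up inputs the chiplet route comes out roughly 10% more expensive, and it only gets worse as bond yield drops.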
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Did you read what I wrote?

Chiplets allow equivalent performance at lower cost when implemented wisely. This 20-25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost-reduction viewpoint, and I'll note that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

6nm is already yielding well enough that shaving off a tiny die and then reconnecting it using an advanced technique (SoIC) seems to me (working without detailed data) likely to lower yield and raise costs even if yields stay constant, for NO performance improvement.

I don’t know about the accuracy of the article; however, the memory controllers need to be as close to the cache and main memory as possible. The IO die sounds like it controls PCIe data and video output.

It sounds like in this setup, each memory controller is responsible for a segment of memory, and that segment is cached in its own slice of Infinity Cache. I could be wrong though.

EDIT: If they ARE doing it this way, I can definitely see why. Adding more MCDs will cause a linear uplift in bandwidth. Larger cache dies can be added, and MORE dies can be added. The dependency on insanely expensive super-fast GDDR6X (or even GDDR7) would be eliminated.
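
The linear part checks out with simple arithmetic. A minimal sketch, assuming plain GDDR6 at a made-up 20 Gbps pin rate (the formula is just pins × rate / 8):

```python
# Back-of-the-envelope check on the "linear uplift" claim. The 20 Gbps
# GDDR6 pin rate is an assumed figure, not a confirmed spec.
PIN_RATE_GBPS = 20   # assumed per-pin data rate
BUS_PER_MCD   = 64   # one 64-bit controller per MCD

per_mcd_gbs = BUS_PER_MCD * PIN_RATE_GBPS / 8   # GB/s contributed by one MCD
for n in (2, 4, 6):
    print(f"{n} MCDs -> {n * BUS_PER_MCD}-bit bus, {n * per_mcd_gbs:.0f} GB/s")
# e.g. 4 MCDs -> 256-bit bus, 640 GB/s: bandwidth scales linearly with MCD count.
```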
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Consider the possibility that this V-Cache will not be used only for GPUs.

Once you realize that, the masterpiece of chiplet design truly shines.
Ryzen V-cache chiplets being reused for RDNA 3 is genius. If they are literally reusing four of those 64 MB chiplets for RDNA 3, that implies a 256 MB total Infinity Cache. I was really hoping they'd go for the full enchilada 512 MB to be honest.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Ryzen V-cache chiplets being reused for RDNA 3 is genius. If they are literally reusing four of those 64 MB chiplets for RDNA 3, that implies a 256 MB total Infinity Cache. I was really hoping they'd go for the full enchilada 512 MB to be honest.
You are still thinking too small in terms of what AMD can actually do with that V-Cache, and their products.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136

I suggest waiting for the APU video that Paul mentioned.

And yes, it will be interesting in the context of the recent discussion in this thread.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
Consider the possibility that this V-Cache will not be used only for GPUs.

Once you realize that, the masterpiece of chiplet design truly shines.

- Now that would be the kind of CPU/GPU paradigm-shifting synergy everyone has been waiting for since AMD picked up ATI way back when...
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
A 36mm^2 64MB L3 cache costs pennies.
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.
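
For anyone who wants to redo the math, a minimal sketch of the standard gross-die and Poisson-yield formulas (it ignores scribe lines and edge exclusion, so it lands a bit above the 1580 figure, but the cost per die comes out in the same ballpark):

```python
import math

def dies_per_wafer(die_w_mm, die_h_mm, wafer_d_mm=300):
    """Classic gross-die estimate: wafer area over die area,
    minus an edge-loss correction term."""
    area = die_w_mm * die_h_mm
    return int(math.pi * (wafer_d_mm / 2)**2 / area
               - math.pi * wafer_d_mm / math.sqrt(2 * area))

def poisson_yield(area_mm2, d0=0.09):
    """Fraction of dies with zero defects at d0 defects/cm^2."""
    return math.exp(-d0 * area_mm2 / 100.0)

gross = dies_per_wafer(6, 6)            # ~1850 gross dies
good  = int(gross * poisson_yield(36))  # ~1790 good dies
print(f"good dies: {good}, cost/die: ${6000 / good:.2f}")
```

That's roughly $3.35 per good die with these inputs: dollars rather than pennies, as argued above.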

 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.


The die space isn't the only cost; it is actually the cheapest part.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.

I used a 12mm x 12mm die and got this, using your $6000/wafer:

(4) 6mm x 6mm = $15.19
(1) 12mm x 12mm = $17.00

A 144mm^2 die costs just $1.81, or 12%, more than (4) x 36mm^2.
I can't see great cost savings here from using tiny dies, especially as you have 4X the fusing operations to mess up.
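
Running the same comparison through the standard formulas as a standalone sketch (my constants clearly differ from whatever calculator produced the $15.19/$17.00 figures, so the absolute numbers shift, but the gap behaves the same way once an assumed per-bond yield takes its cut):

```python
import math

def good_die_cost(w_mm, h_mm, wafer_cost=6000, d0=0.09, wafer_d=300):
    """Cost of one good die: gross dies per wafer, derated by Poisson yield."""
    area  = w_mm * h_mm
    gross = (math.pi * (wafer_d / 2)**2 / area
             - math.pi * wafer_d / math.sqrt(2 * area))
    return wafer_cost / (gross * math.exp(-d0 * area / 100.0))

BOND_YIELD = 0.99                 # assumed per-bond success rate
small = 4 * good_die_cost(6, 6)   # four 36 mm^2 dies
big   = good_die_cost(12, 12)     # one 144 mm^2 die

print(f"4 x 36mm^2:  ${small:.2f} (${small / BOND_YIELD**4:.2f} after 4 bonds)")
print(f"1 x 144mm^2: ${big:.2f}")
```

With these assumptions the small dies still win on silicon, but the bonding penalty eats a chunk of the saving, which is the point about fusing operations.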

 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
I used a 12mm x 12mm die and got this, using your $6000/wafer:

(4) 6mm x 6mm = $15.19
(1) 12mm x 12mm = $17.00

A 144mm^2 die costs just $1.81, or 12%, more than (4) x 36mm^2.
I can't see great cost savings here from using tiny dies, especially as you have 4X the fusing operations to mess up.

It makes the most sense if you have another use for the smaller die. Yes, 12mm x 12mm isn't that much less efficient, but those dies can't be used for products that use the 6mm x 6mm dies.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
It makes the most sense if you have another use for the smaller die. Yes, 12mm x 12mm isn't that much less efficient, but those dies can't be used for products that use the 6mm x 6mm dies.
Can they even fuse 3+ dies in one operation, taking the same time as a 1+1 operation? If not, then no one is thinking about the production bottleneck in assembly. We have focused on fabbing limits, but not on the other assembly steps needed.

If yes, then forget everything I'm writing.
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
7 dies?
So 2x dies for compute (7680 shaders each)
1x memory controller
4x 3D-Vcache chips
I think it's reasonable to assume more than just memory/IO on that single chip.

There will probably be some degree of control/sync circuitry in there too to hide latency between the chiplets and make them appear as a single device to the system.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, 12" wafer, 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000 then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single digit dollar amounts per wafer.

View attachment 60063

Cache dies cost less than normal logic dies, as they have a lot fewer layers to get through, which seems to be a big part of the reason AMD went this route. Packaging might cost as much as the chip itself.

I'm also assuming the cache dies take up a lot of the interconnect bandwidth. If any logic chiplet can address any LLC chiplet, then that should be a lot of the work: you retire waves to the LLC, then the next available CUs consume from the LLC.
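
A minimal sketch of how "any logic chiplet can address any LLC chiplet" is usually done: interleave physical addresses across the slices at cache-line granularity. The 128-byte line size and 4-slice count here are assumptions for illustration, not confirmed RDNA3 details.

```python
# Interleave addresses across LLC chiplets at cache-line granularity.
LINE_BYTES = 128   # assumed GPU cache-line size
N_SLICES   = 4     # assumed number of LLC chiplets

def llc_slice(addr: int) -> int:
    """Pick the LLC chiplet that owns this address's cache line."""
    return (addr // LINE_BYTES) % N_SLICES

# Consecutive lines land on consecutive chiplets, so each slice (and the
# link feeding it) carries roughly 1/N_SLICES of the total traffic.
for addr in range(0, 5 * LINE_BYTES, LINE_BYTES):
    print(f"addr {addr:#06x} -> LLC chiplet {llc_slice(addr)}")
```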
 

Bigos

Member
Jun 2, 2019
138
322
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.
I think the upcoming video that Paul from RedGamingTech touted will be very important on this topic, and will shed some light.

Stay tuned, guys.
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.

Just store the two halves of the line in two different chips. Since the v-cache chips don't have the controller (or any more of it than necessary for basic functionality) built in, it's probably not too difficult to recycle them. There's also no reason they couldn't run the cache at a different frequency than the core clock. RDNA can already clock high, but if RDNA3 won't see uplifts there, just clock the cache at 2x the core if synchronization is a big issue.
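
As a sketch of that half-line idea (the even/odd die assignment and the example address are made up for illustration, not a known AMD scheme):

```python
# Split each 128-byte GPU cache line across two cache dies that
# natively track 64-byte lines, so the v-cache dies never need to
# know about 128-byte lines at all.
GPU_LINE = 128   # assumed GPU cache-line size
CPU_LINE = 64    # native line size of the v-cache die

def split_line(addr: int) -> list[tuple[int, int]]:
    """Map one 128B line to (die, 64B-line address) pairs:
    low half on die 0, high half on die 1."""
    base = addr - addr % GPU_LINE
    return [(0, base), (1, base + CPU_LINE)]

# Both halves can be fetched in parallel and stitched back together.
print(split_line(0x1A2C0))
# -> [(0, 107136), (1, 107200)], i.e. 0x1A280 on die 0, 0x1A2C0 on die 1
```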
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Did I read right that the first RDNA3 card to be released will be the entry model, which will use a legacy monolithic GPU die and not MCM?
 