Speculation: RDNA3 + CDNA2 Architectures Thread


jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
Chiplets allow equivalent performance at lower cost when implemented wisely. This 20-25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost-reduction viewpoint, and I'll note that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

It's the building-blocks concept. If they could make that work, it might save them time in the future by not having to worry about the memory controller. Or it could easily be dropped onto an EPYC or APU product, for instance, to get GDDR bandwidth. If the reticle limit is going to get smaller as the nodes progress, you're going to have to do this anyway. May as well start now.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
- I agree that it seems like a far too reductionist approach to chiplet design, especially at this stage in the substrate packaging game: you still want the number of chiplets to be as low as possible, because having stuff on die is still orders of magnitude faster than off die. To your point as well, there are costs and additional failure points for every additional chiplet that goes onto the interposer.

There is a sweet spot between a monolithic die and having every last piece of the chip design broken out into its own tiny 25 mm^2 chiplet.

Who knows, maybe AMD is already there thanks to everything they've learned in the CPU space, but it would be surprising to say the least.

AFAIK it's Intel with the cheap chiplet bridge tech, and even they're not planning to go this crazy with it. A single unified IO die, if work distribution needs to come from that IO die anyway, seems OK. But breaking the bus up into yet more chiplets is just silly no matter who's doing it.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Mask/design costs at new nodes may drive chiplet size as low as possible (no matter how silly) as they have increased massively with each silicon shrink.
What is this? AFAIK, most of the cost is design, with the physical masks a small fraction of that.

The issue, however, is that these costs cannot be reduced by using chiplets. You still have to design the blocks of circuitry whether the chip is monolithic or chiplet-based. AND it's even possible that design costs could increase, as you now have to do the layout in a way that reconnects all these separate dies, plus make additional masks.

However, when using chiplets wisely, you can trade off node choice, design costs on that node, optimized libraries for each block, power efficiency, etc., to arrive at a lower total product cost.

What started this was the claim that the memory controllers would be split off from the IO die into 64-bit segments. Saving maybe 15% area, all on the same 6nm node, just to reattach the pieces using cutting-edge SoIC, is more expensive. The tiny increase in yield from saving a small amount of area is not worth it. Remember, you also get defects when some dies fail the bonding process.
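
You can put rough numbers on that argument. Here is a minimal sketch, assuming a Poisson yield model and made-up areas, wafer cost, die-to-die PHY overhead and bond yield (none of these are AMD figures): every mm^2 you split off yields slightly better, but the PHYs add area back and every bond must succeed.

```python
import math

def die_yield(area_mm2, d0=0.09):
    """Poisson yield: fraction of dies with zero defects at d0 defects/cm^2."""
    return math.exp(-d0 * area_mm2 / 100.0)

def cost(area_mm2, cost_per_mm2=0.10):
    """Cost of one good die: raw area cost divided by yield."""
    return area_mm2 * cost_per_mm2 / die_yield(area_mm2)

# Illustrative assumptions, not AMD data; everything on the same 6nm node.
IO_AREA    = 150.0  # monolithic IO die, mm^2
MC_AREA    = 22.0   # one 64-bit memory-controller chiplet, mm^2
N_MC       = 4      # MC chiplets split off the IO die
LINK_AREA  = 3.0    # extra die-to-die PHY area per link end, mm^2
BOND_YIELD = 0.99   # assumed per-bond success rate for SoIC stacking

monolithic = cost(IO_AREA)
split = (cost(IO_AREA - N_MC * MC_AREA + N_MC * LINK_AREA)
         + N_MC * cost(MC_AREA + LINK_AREA)) / BOND_YIELD**N_MC

print(f"monolithic IO die:        ${monolithic:.2f}")
print(f"split IO + {N_MC} MC chiplets: ${split:.2f}")
```

With these made-up inputs the chiplet route comes out roughly 10% more expensive, and it only gets worse as bond yield drops.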
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Did you read what I wrote?

Chiplets allow equivalent performance at lower cost when implemented wisely. This 20-25 mm^2 chiplet memory controller as a unique die makes no sense to me from an engineering & cost-reduction viewpoint, and I'll note that a large part of production & design engineering is cost analysis. There is a view that design engineering and cost analysis are completely separate disciplines, but they are not, and the better engineers do both well.

6nm is already yielding well enough that shaving off a tiny die and then reconnecting it using an advanced technique (SoIC) seems to me (working without detailed data) likely to lower yield and raise costs even if yields stay constant, for NO performance improvement.

I don’t know about the accuracy of the article; however, the memory controllers need to be as close to the cache and main memory as possible. The IO die sounds like it controls PCIe data and video output.

It sounds like in this setup, each memory controller is responsible for a segment of memory, and that segment is cached in its own slice of Infinity Cache. I could be wrong though.

EDIT: If they ARE doing it this way, I can definitely see why. Adding more MCDs will cause a linear uplift in bandwidth. Larger cache dies can be added, and MORE dies can be added. The dependency on insanely expensive super-fast GDDR6X (or even GDDR7) would be eliminated.
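
The linear part checks out with simple arithmetic. A minimal sketch, assuming plain GDDR6 at a made-up 20 Gbps pin rate (the formula is just pins × rate / 8):

```python
# Back-of-the-envelope check on the "linear uplift" claim. The 20 Gbps
# GDDR6 pin rate is an assumed figure, not a confirmed spec.
PIN_RATE_GBPS = 20   # assumed per-pin data rate
BUS_PER_MCD   = 64   # one 64-bit controller per MCD

per_mcd_gbs = BUS_PER_MCD * PIN_RATE_GBPS / 8   # GB/s contributed by one MCD
for n in (2, 4, 6):
    print(f"{n} MCDs -> {n * BUS_PER_MCD}-bit bus, {n * per_mcd_gbs:.0f} GB/s")
# e.g. 4 MCDs -> 256-bit bus, 640 GB/s: bandwidth scales linearly with MCD count.
```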
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
Consider the possibility that this V-Cache will not be used only for GPUs.

Once you realize that, the masterpiece of chiplet design truly shines.
Ryzen V-cache chiplets being reused for RDNA 3 is genius. If they are literally reusing four of those 64 MB chiplets for RDNA 3, that implies a 256 MB total Infinity Cache. I was really hoping they'd go for the full enchilada 512 MB to be honest.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Ryzen V-cache chiplets being reused for RDNA 3 is genius. If they are literally reusing four of those 64 MB chiplets for RDNA 3, that implies a 256 MB total Infinity Cache. I was really hoping they'd go for the full enchilada 512 MB to be honest.
You are still thinking too small in terms of what AMD can actually do with that V-Cache, and their products.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136

I suggest waiting for the APU video that Paul mentioned.

And yes, it will be interesting in the context of the recent discussion in this thread.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
Consider the possibility that this V-Cache will not be used only for GPUs.

Once you realize that, the masterpiece of chiplet design truly shines.

- Now that would be the kind of CPU/GPU paradigm-shifting synergy everyone has been waiting for since AMD picked up ATI way back when...
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
A 36mm^2 64MB L3 cache costs pennies.
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.
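
For anyone who wants to redo the math, a minimal sketch of the standard gross-die and Poisson-yield formulas (it ignores scribe lines and edge exclusion, so it lands a bit above the 1580 figure, but the cost per die comes out in the same ballpark):

```python
import math

def dies_per_wafer(die_w_mm, die_h_mm, wafer_d_mm=300):
    """Classic gross-die estimate: wafer area over die area,
    minus an edge-loss correction term."""
    area = die_w_mm * die_h_mm
    return int(math.pi * (wafer_d_mm / 2)**2 / area
               - math.pi * wafer_d_mm / math.sqrt(2 * area))

def poisson_yield(area_mm2, d0=0.09):
    """Fraction of dies with zero defects at d0 defects/cm^2."""
    return math.exp(-d0 * area_mm2 / 100.0)

gross = dies_per_wafer(6, 6)            # ~1850 gross dies
good  = int(gross * poisson_yield(36))  # ~1790 good dies
print(f"good dies: {good}, cost/die: ${6000 / good:.2f}")
```

That's roughly $3.35 per good die with these inputs: dollars rather than pennies, as argued above.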

 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.


The die space isn't the only cost; it is actually the cheapest part.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, a 12" wafer, and 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000, then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single-digit dollar amounts per die.

I used a 12mm x 12mm die and got this, using your $6000/wafer:

(4) 6mm x 6mm = $15.19
(1) 12mm x 12mm = $17.00

A 144mm^2 die costs just $1.81, or 12%, more than (4) x 36mm^2.
I can't see great cost savings here from using tiny dies, especially as you have 4X the fusing operations to mess up.
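
Running the same comparison through the standard formulas as a standalone sketch (my constants clearly differ from whatever calculator produced the $15.19/$17.00 figures, so the absolute numbers shift, but the gap behaves the same way once an assumed per-bond yield takes its cut):

```python
import math

def good_die_cost(w_mm, h_mm, wafer_cost=6000, d0=0.09, wafer_d=300):
    """Cost of one good die: gross dies per wafer, derated by Poisson yield."""
    area  = w_mm * h_mm
    gross = (math.pi * (wafer_d / 2)**2 / area
             - math.pi * wafer_d / math.sqrt(2 * area))
    return wafer_cost / (gross * math.exp(-d0 * area / 100.0))

BOND_YIELD = 0.99                 # assumed per-bond success rate
small = 4 * good_die_cost(6, 6)   # four 36 mm^2 dies
big   = good_die_cost(12, 12)     # one 144 mm^2 die

print(f"4 x 36mm^2:  ${small:.2f} (${small / BOND_YIELD**4:.2f} after 4 bonds)")
print(f"1 x 144mm^2: ${big:.2f}")
```

With these assumptions the small dies still win on silicon, but the bonding penalty eats a chunk of the saving, which is the point about fusing operations.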

 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
I used a 12mm x 12mm die and got this, using your $6000/wafer:

(4) 6mm x 6mm = $15.19
(1) 12mm x 12mm = $17.00

A 144mm^2 die costs just $1.81, or 12%, more than (4) x 36mm^2.
I can't see great cost savings here from using tiny dies, especially as you have 4X the fusing operations to mess up.

It makes the most sense if you have another use for the smaller die. Yes, 12mm x 12mm isn't that much less efficient, but those dies can't be used for products that use the 6mm x 6mm dies.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
It makes the most sense if you have another use for the smaller die. Yes, 12mm x 12mm isn't that much less efficient, but those dies can't be used for products that use the 6mm x 6mm dies.
Can they even fuse 3+ dies in one operation, taking the same time as a 1+1 operation? If not, then no one is thinking about the production bottleneck in assembly. We have focused on fabbing limits, but not on the other assembly steps needed.

If yes, then forget everything I'm writing.
 

soresu

Platinum Member
Dec 19, 2014
2,959
2,181
136
7 dies?
So 2x dies for compute (7680 shaders each)
1x memory controller
4x 3D-Vcache chips
I think it's reasonable to assume more than just memory/IO on that single chip.

There will probably be some degree of control/sync circuitry in there too to hide latency between the chiplets and make them appear as a single device to the system.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Pennies is a little exaggerated. I think the true cost per die is closer to dollars.

Assuming 6mm x 6mm dies, 12" wafer, 0.09 defect density, you get 1580 dies. If each N6 wafer is $6000 then that's around a few dollars per die. Even with the cost of packaging, it likely is still within single digit dollar amounts per wafer.

View attachment 60063

Cache dies cost less than normal logic dies, as they have a lot fewer layers to get through, which seems to be a big part of the reason AMD went this route. Packaging might cost as much as the chip itself.

I'm also assuming the cache dies take up a lot of the interconnect bandwidth. If any logic chiplet can address any LLC chiplet, then that should be a lot of the work: you retire waves to the LLC, then the next available CUs consume from the LLC.
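
A minimal sketch of how "any logic chiplet can address any LLC chiplet" is usually done: interleave physical addresses across the slices at cache-line granularity. The 128-byte line size and 4-slice count here are assumptions for illustration, not confirmed RDNA3 details.

```python
# Interleave addresses across LLC chiplets at cache-line granularity.
LINE_BYTES = 128   # assumed GPU cache-line size
N_SLICES   = 4     # assumed number of LLC chiplets

def llc_slice(addr: int) -> int:
    """Pick the LLC chiplet that owns this address's cache line."""
    return (addr // LINE_BYTES) % N_SLICES

# Consecutive lines land on consecutive chiplets, so each slice (and the
# link feeding it) carries roughly 1/N_SLICES of the total traffic.
for addr in range(0, 5 * LINE_BYTES, LINE_BYTES):
    print(f"addr {addr:#06x} -> LLC chiplet {llc_slice(addr)}")
```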
 

Bigos

Member
Jun 2, 2019
138
322
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.
I think the upcoming video that Paul from RedGamingTech touted will be very important on this topic, and will shed some light.

Stay tuned, guys.
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
I don't see how AMD could reuse the same cache dies between the CPU and GPU. There are various factors that make them incompatible:
  • The CPU uses 64-byte cache lines and the GPU uses 128-byte ones, unless the whole cache hierarchy in RDNA3 is overhauled.
  • The CPU L3 cache is a high-frequency design that needs to operate at over 3GHz with low latency; the GPU has no such requirement.
  • The current v-cache die is an add-on on top of 32MB of base-die cache; it will not work without it. I also doubt AMD will remove the L3 cache from Zen base dies until they get VERY confident about SoIC, including its cost being low.
I believe that the CPU v-cache die has been specifically tailored to the Zen 3 base die. If they want to use SoIC with RDNA3, they would need to design something specific for that.

Just store the two halves of the line in two different chips. Since the v-cache chips don't have the controller (or any more of it than necessary for basic functionality) built in, it's probably not too difficult to recycle them. There's also no reason they couldn't run the cache at a different frequency than the core clock. RDNA can already clock high, but if RDNA3 won't see uplifts there, just clock the cache at 2x the core if synchronization is a big issue.
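
As a sketch of that half-line idea (the even/odd die assignment and the example address are made up for illustration, not a known AMD scheme):

```python
# Split each 128-byte GPU cache line across two cache dies that
# natively track 64-byte lines, so the v-cache dies never need to
# know about 128-byte lines at all.
GPU_LINE = 128   # assumed GPU cache-line size
CPU_LINE = 64    # native line size of the v-cache die

def split_line(addr: int) -> list[tuple[int, int]]:
    """Map one 128B line to (die, 64B-line address) pairs:
    low half on die 0, high half on die 1."""
    base = addr - addr % GPU_LINE
    return [(0, base), (1, base + CPU_LINE)]

# Both halves can be fetched in parallel and stitched back together.
print(split_line(0x1A2C0))
# -> [(0, 107136), (1, 107200)], i.e. 0x1A280 on die 0, 0x1A2C0 on die 1
```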
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Did I read right that the first RDNA3 card to be released will be the entry model, which will use a legacy monolithic GPU die and not MCM?
 