I don't think these 5x32b (20GB) and 3x32b (12GB) memory configurations are realistic. The IF cache has to be distributed close to the shaders for energy efficiency. There's a reason why Navi2 has the IF cache spread out on several sides of the die: it maximizes data locality. If you start having odd memory controller configs, this can't be preserved. The IF cache needs to be spread out and situated close to the shaders it serves, and the memory controller should be as close as possible to the IF cache it services. This will work well for most data.
The IF cache is actually close to the memory controllers and quite far from the shaders. The hierarchy is ALU --> VGPR --> L0 --> L1 --> L2 --> IFC/LLC --> DRAM.
The L2 slices are actually in the center of the chip, close to each other and near the CP (command processor), because there are crossbars between them.
Shaders (from different waves) that export to memory feed another set of shaders at L2, without going out to DRAM. To maximize parallelism (and therefore to avoid multiple shaders hammering specific DRAM channels), each L2 slice is associated with an IFC/LLC chunk.
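To make the slice-to-chunk association a bit more concrete, here is a minimal Python sketch of address interleaving. Everything in it is an assumption for illustration: the slice count (16), the 256-byte interleave granularity, the one-to-one slice/chunk pairing and the function names are hypothetical, not AMD's actual address mapping.

```python
# Hypothetical parameters, chosen only for illustration (not AMD's real values).
NUM_L2_SLICES = 16        # assumed number of L2 slices
INTERLEAVE_BYTES = 256    # assumed interleave granularity


def l2_slice_for(address: int) -> int:
    """Pick an L2 slice by interleaving consecutive blocks across slices."""
    return (address // INTERLEAVE_BYTES) % NUM_L2_SLICES


def llc_chunk_for(address: int) -> int:
    """Assume each L2 slice is paired one-to-one with an IFC/LLC chunk."""
    return l2_slice_for(address)


def spread(addresses):
    """Count how many accesses land on each IFC/LLC chunk."""
    counts = [0] * NUM_L2_SLICES
    for a in addresses:
        counts[llc_chunk_for(a)] += 1
    return counts


# A linear sweep over 64 KiB, one access per interleave block.
sweep = range(0, 64 * 1024, INTERLEAVE_BYTES)
print(spread(sweep))
```

With this kind of mapping a linear sweep lands evenly on every chunk, so no single chunk (and the DRAM channel behind it) gets all the traffic.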
However, I am quite perplexed by so many MCDs. Why make so many tiny MCDs instead of fewer, bigger ones?
One reason I can imagine is broad reuse of these MCDs across product lines. Otherwise I am scratching my head; it is not as if 100mm2+ dies are getting poor yields.
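As a back-of-the-envelope check on the yield remark, here is a small sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density and the die areas below are hypothetical placeholders, not foundry figures or actual MCD sizes.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


D0 = 0.1  # hypothetical defect density in defects/cm^2, not a foundry figure

for area in (40.0, 80.0, 120.0):  # hypothetical die areas in mm^2
    print(f"{area:.0f} mm^2 -> {poisson_yield(area, D0):.1%}")
```

Under these assumed numbers, the gap between the smallest and the largest die stays well under ten percentage points, so yield alone would not obviously justify going very small.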
I am wondering about the packaging steps too. I think it is EFB, not CoWoS.
And why 3D? In AMD's slide they specified 3D for MI300, but RDNA3 is just plain "Advanced Chiplet Packaging".
I am not sure about 3D IFC, because that would mean packaging at TSMC for SoIC and at Tongfu for EFB.
But let's see, another 3 months.