Thanks for your diagrams! Here is my attempt to reconcile the chip layout with my ideas about a "quad-tree" topology discussed in my CCX speculation thread (here), assuming each CPU chiplet consists of two 4-core CCXs, where each pair of chiplets forms a fully connected cluster of 4 CCXs (16 cores), and where these clusters are fully connected to each other.
There's close to no chance that there's enough room in each chiplet for 2 distinct core complexes if there's also 32MB of L3, and even with only 16MB of L3 you certainly aren't going to find the space for that much IO.
Perhaps that's why the chiplets forming each pair are mounted so close to each other? For some edge connection? Or the pair sits on top of an interposer which provides the interconnect?
I also agree that all of those IF links would require a small silicon interposer under each pair of chiplets. I don't see any interposers there, but then again there's too much glue to see anything. IFOP links (Infinity Fabric On-Package) have a power efficiency (PE) of ~2 pJ/b, while IFIS links (Infinity Fabric InterSocket) have a PE of ~11 pJ/b (source), or rather ~9 pJ/b (source). On-die links have a PE of ~0.1 pJ/b, and PCIe/DDR something like ~20 pJ/b. Technologies like silicon interposers or EMIB have a PE of under 1 pJ/b (source), so AMD should really look into those bridge chiplets in the future (and they probably already have).
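Just to put those pJ/b figures in perspective, here's a quick back-of-envelope calculation of per-link power. The 32 GB/s sustained bandwidth per link is my own assumption for illustration, not a confirmed figure:

```python
# Link power (W) = energy per bit (pJ/b) * bitrate (b/s) * 1e-12
BITS_PER_GBYTE = 8e9

def link_power_watts(pj_per_bit, gbytes_per_sec):
    """Power drawn by one link at a given sustained bandwidth."""
    return pj_per_bit * gbytes_per_sec * BITS_PER_GBYTE * 1e-12

bandwidth = 32  # GB/s per link -- assumed for illustration
for name, pe in [("on-die", 0.1), ("IFOP", 2), ("IFIS", 9), ("PCIe/DDR", 20)]:
    print(f"{name:9s} {pe:5.1f} pJ/b -> {link_power_watts(pe, bandwidth):5.2f} W")
```

At that bandwidth an IFOP link costs ~0.5 W while an IFIS link costs ~2.3 W, which is why keeping traffic on-package (or on a bridge/interposer at <1 pJ/b) matters so much once you multiply by the number of links.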
I have no idea what the underlying topology would be, either for the chiplets or for the I/O die, but they both have the same number of nodes (eight cores and eight chiplets). I'm interested to hear any ideas anyone might have, though.
Edit: Let me rephrase that. If each chiplet has two CCXs, each with four fully connected cores (crossbar topology), then what's the point of having 8 chiplets instead of 16 4-core chiplets? If there really are silicon interposers under those pairs of chiplets, then the four CCXs (on two chiplets) would be fully connected (a crossbar again) at a higher level, like Vattila has shown in the diagram. As long as there's enough room for all the microbumps in the chiplets (I think normal C4 bumps are much larger), this should be somewhat possible using a silicon interposer.
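A quick way to see why the node count matters for all this crossbar talk: fully connecting n nodes takes n(n-1)/2 point-to-point links, so the link count blows up fast. Purely illustrative:

```python
def crossbar_links(n):
    """Point-to-point links needed to fully connect n nodes."""
    return n * (n - 1) // 2

print(crossbar_links(4))   # 4 CCXs on one interposer pair: 6 links
print(crossbar_links(8))   # 8 chiplets fully connected: 28 links
print(crossbar_links(16))  # 16 hypothetical 4-core chiplets: 120 links
```

Fully connecting 4 CCXs locally on an interposer is cheap (6 links); trying the same trick globally across 8 or 16 nodes is where the wiring gets out of hand.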
The highest level is currently a bit weird, because some of the connections run through the I/O die (as I understand it) and some are directly between chiplets. If the interposers were active (containing logic and transistors), then maybe all four active interposers would again be fully connected to each other (a crossbar once more) at an even higher level. That would probably be a routing nightmare, even worse than Naples, because now the I/O die is also in the way. On top of that there would be additional IF links from each chiplet to the I/O die. I suspect the organic package in Rome only contains 8 links, one from each chiplet to the I/O die, with all the routing complexity hidden inside each node (either a chiplet or the I/O die). Choosing a good routing topology for more than 4 nodes seems to be quite a hard problem.
While Vattila's basic idea is good, there are a lot of problems with running that much wiring in the organic package that the Rome chiplets and I/O die sit on. Then again, there are a lot of problems with 8-core CCXs too.
And then there is always the possibility that they (AMD) have switched to some kind of ring bus topology, which might be fine. I think a mesh or something like the ButterDonut topology should probably be reserved for cases with more nodes and maybe active interposers.
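A rough sketch of why a ring is tolerable at 8 nodes but scales poorly: the average hop distance on a bidirectional ring grows roughly like n/4, while a crossbar is always 1 hop. These numbers are just illustrative, not anything AMD has published:

```python
def ring_avg_hops(n):
    """Average shortest-path hops between distinct nodes on a bidirectional ring."""
    dists = [min(d, n - d) for d in range(1, n)]
    return sum(dists) / (n - 1)

for n in (4, 8, 16, 32):
    print(f"{n:2d} nodes: ring avg {ring_avg_hops(n):.2f} hops, crossbar 1 hop")
```

At 8 nodes a ring averages a bit over 2 hops, which is probably acceptable; at 32 nodes it's already over 8, which is why meshes and fancier topologies like ButterDonut start to make sense only with more nodes.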