There might be more reason to split an IO die if it were only doing IO and the different market segments differed only in how much IO they needed.
Now that AMD has put graphics capabilities on its desktop Zen CPUs, a split die isn't going to happen. It would require duplicating too many resources that a server IO die has no need for at all.
Furthermore, even within the server market it isn't that simple. There is a niche where someone doesn't care all that much about core count but does want the maximum number of PCIe lanes or memory channels.
Designing and building a split die isn't going to be as economical as one might at first assume. The requirements of desktop and server have diverged enough that you need two separate designs, and server isn't as simple as matching the number of cores to the amount of IO either.
I am not sure that is true. The consumer IO die is around 125 mm²; the Genoa IO die is close to 400 mm², even with its extra switches and connectivity. The consumer IO die contains things servers don't need, but that is easy to get around by stacking another chiplet on top (like a GPU chiplet) or embedding another die underneath with the needed functionality. If they use embedded dies with micro-solder-ball style stacking, then the embedded die can be made anywhere, such as at GlobalFoundries; SoIC stacking would require both dies to be made at TSMC. Stacking tech could allow for a very general base die: since the embedded dies underneath can attach to any type of memory, the base die can be independent of memory type. Just use a different embedded bridge die with a common interface to the top die.
You may be partially right; I don’t know if I would expect to see something like this in most consumer parts. Most consumer parts should really be APUs anyway. Maybe a high-end consumer part with more than one stack, but I am not sure where that would fit in the market. For all but the highest-end consumer parts, I have wondered if they could basically use an APU with GMI link(s) or just bridge dies: embed some low-power Zen 4 based cores in an APU (basically an IO die) and then connect a Zen 5 chiplet for when more power is needed. That would make an excellent laptop chip and could possibly cover most of the consumer product stack.
Splitting the server IO die would likely not be difficult. There are internal connectivity diagrams for it showing internal switches and such. It is already split into 4 quadrants with different latencies between them, and it has NUMA nodes per socket (NPS) settings to take advantage of this. It can be set to NPS1, NPS2, or NPS4, and the NPS setting also changes the memory interleave: NPS1 interleaves across all 8 channels, NPS2 across the 4 channels in each half, and NPS4 just interleaves across the 2 channels in each quadrant. These settings trade maximum bandwidth against lower latency, but the lower-latency modes require the application to be NUMA-aware. This seems like it would be very easy to split into separate dies; logically, it has never really been monolithic.
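To illustrate the bandwidth-versus-locality trade-off, here is a minimal Python sketch of how the NPS setting changes which channels a stream of addresses lands on. The 256-byte interleave granule and plain modulo mapping are my assumptions for illustration, not AMD's actual interleave/hash scheme.

```python
# Toy model of NPS memory interleaving. The 256-byte granule and modulo
# mapping are illustrative assumptions, not AMD's real address hashing.

GRANULE = 256          # bytes per interleave unit (assumed)
TOTAL_CHANNELS = 8     # 8-channel socket, 2 channels per IO-die quadrant

def channel_for(offset: int, nps: int) -> int:
    """Channel (within the owning NUMA node) serving a node-local offset.

    NPS1: one node, interleaved across all 8 channels (max bandwidth).
    NPS2: two nodes, 4 channels each.
    NPS4: one node per quadrant, 2 channels each (lowest latency, but the
          application must be NUMA-aware to keep its traffic local).
    """
    channels_per_node = TOTAL_CHANNELS // nps
    return (offset // GRANULE) % channels_per_node

# A 2 KiB streaming access spreads over all 8 channels under NPS1,
# but only over the 2 channels of one quadrant under NPS4.
for nps in (1, 2, 4):
    used = {channel_for(off, nps) for off in range(0, 2048, GRANULE)}
    print(f"NPS{nps}: channels used = {sorted(used)}")
```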
One other thing I have been thinking about is that Zen 5 will likely have massively increased FP power, so they will likely be adding HBM to HPC processors in addition to mixing CPU and GPU chiplets in the same package. This seems to imply that the CPU chiplets will need to be stacked somehow; I don’t know if it could be embedded dies and/or GMI for everything. If you think about the layout of Genoa, with a centralized IO die, where would you put HBM? You want the HBM as close to the CPU or GPU cores as possible, not limited by a GMI link. How do you also scale it up to at least 8 compute chiplets? This makes me believe that the CPU cores may use a similar, if not the same, setup as the GPUs. A base die with the CPU (or GPU or FPGA or whatever accelerator) stacked on top, and then bridge chips to HBM or system memory, would allow memory access without going through a GMI link.
The diagrams I have seen showed that the next-gen GPUs would be two elements, possibly stacks with embedded dies and/or SoIC-stacked dies on top. The two are connected together with very high bandwidth, likely an embedded bridge. They have HBM along one side, and the other side is used to connect to another dual-GPU module. Two such sets can then be connected together to make an 8-GPU-chiplet device. This is why I was pointing at the diagram for the Crusher system here:
ORNL has published the overview of its Crusher system, which is powered by AMD's Optimized 3rd Gen EPYC CPUs & Instinct MI250X GPUs (wccftech.com).
This Crusher system seems very similar to upcoming systems that will mix CPUs and GPUs; it may actually be a test platform to some extent. The diagram shows 200 GB/s links between adjacent GPUs and 50 GB/s links for “remote” GPUs. The adjacent devices may move to a silicon bridge connection. The MI250X appears to have 4 high-speed GPU-GPU links (200 GB/s in aggregate) to the adjacent GPU, 3 GPU-GPU links (50 GB/s each) for other GPUs or the network (3 GPUs, or 2 GPUs + 1 network), and one link (36 GB/s) for the connection to the CPU (off package). That works out to 7x 50 GB/s links plus 1x 36 GB/s link per GPU. That is a fair amount of die area, so moving it to a stacked die seems like a good idea. Also, they would want to use the same chiplets everywhere, so putting all of these links on the compute die itself doesn’t make that much sense. It would also waste die area on the compute die, which generally uses the latest and greatest (and most expensive) process tech. Not all of the interfaces would be needed on all products, but splitting them out and making them on a cheaper node can be a win, since you are then wasting cheap silicon rather than expensive silicon.
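As a quick sanity check on those numbers, here is a small Python tally of the per-GPU off-die link bandwidth. Treating the 200 GB/s adjacent connection as 4x 50 GB/s links is my reading of the diagram, not a confirmed spec.

```python
# Tally of per-GPU (per-GCD) off-die link bandwidth using the Crusher/MI250X
# figures quoted above. Interpreting the adjacent-GPU connection as 4 x 50 GB/s
# links is an assumption based on the diagram, not a confirmed spec.

links = {
    "adjacent GPU (4 x 50 GB/s)":         4 * 50,
    "other GPUs / network (3 x 50 GB/s)": 3 * 50,
    "CPU link (1 x 36 GB/s)":             1 * 36,
}

for name, gbps in links.items():
    print(f"{name:38s} {gbps:4d} GB/s")
print(f"{'total off-die bandwidth per GPU':38s} {sum(links.values()):4d} GB/s")
```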
It gets very difficult to speculate once 2.5D and 3D stacking become common, since there are a lot of possibilities. This is more Zen 5 speculation than Zen 4. Although, we don’t seem to know exactly what Bergamo or Siena will actually be at this point. I was hoping for stacking with Bergamo, but it seems very unlikely to be anything other than a normal Genoa IO die. It is also still hard to tell whether Siena will be salvage die only or a new IO die layout. With 64 cores, they would have to do something more complicated than just half of a Genoa IO die: the 4 quadrants are somewhat independent, but two quadrants would only support 6 chiplets (3 GMI links per quadrant), not the 8 needed for 64 cores with 8-core CCDs. That means a completely new layout with a lot of units removed would be necessary.