Okay, time for a reality check.
Let's lay down a base context first:
* AMD's R&D budget has been heavily constrained for most of the past 10 years
* AMD probably started working on znver2 back in 2013 or 2014
* Naples was the first iteration of the Zen server line - its roots go back to the Family 10h Magny-Cours MCMs
* Naples uses the Zeppelin die, which is reused in client Ryzen and Threadripper; the Zeppelin CCX is reused in the Raven Ridge APU
* back in summer 2017 there was no Rome but a Starship instead - a 48c/96t znver2 part
* later in 2017 the reliable Canard PC specified "EPYC 2" as 64c, 256MB L3 (4MB per core), PCIe 4.0
* nowadays Charlie@SA seems happy with the current Rome config, AMD sounds confident, etc.

So Rome looks like a 64c chip, or, somewhat less likely, a 48c one.
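As a quick sanity check, the Canard PC numbers are internally consistent, and they also imply a doubling of L3 per core over Naples. The figures below come straight from the rumor and from Naples' known specs; the comparison itself is just arithmetic:

```python
# Canard PC "EPYC 2" rumor: 256 MB of L3 shared across 64 cores.
total_l3_mb = 256
cores = 64
print(total_l3_mb / cores)  # 4.0 - matches the quoted "4MB per core"

# Naples for comparison: 64 MB of L3 across 32 cores.
print(64 / 32)  # 2.0 - so the rumor implies double the L3 per core
```

That doubled-L3 figure is worth keeping in mind for both scenarios below.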
Now, let's introduce the current "rumor mill" favorite, aka chiplets. According to a YouTuber, the top Rome SKU consists of 9 chips - 1 IO die and 8 compute dies. Details are sparse, but it seems the IO die would be manufactured on an older process than the compute ones. This idea was further detailed in the diagram posted by the OP.
== Naples scaled ==
* double the L3 per core - Keeps the traffic levels down.
* 8 cores in a CCX - The core interconnect probably can't stay a Nehalem-ish crossbar; it would need something like Sandy Bridge's ring bus instead. That adds complexity (as it did for Sandy Bridge in 2011) and requires a special, smaller CCX for the APUs.
* 2 CCXs on a die - This opens up possibilities for a nice TR line and scaled-down Ryzens, while identical CCXs keep the level of complexity down. Uniform intercore latency for the ubiquitous 8c parts is a nice bonus.
* 4 dies on a package - Simply keeps the socket, NUMA mapping, etc. the same.
=> Major investments are: a new CCX for the APUs, a redone intra-CCX interconnect, and cut-down Ryzen dies.
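To see why the 4c-to-8c CCX jump forces that interconnect rework, here is a toy hop-count model - average shortest-path distance between stops on a bidirectional ring versus a crossbar's constant one hop. It is only a proxy for latency, not a model of any real AMD fabric:

```python
# Average hop distance between distinct stops on a bidirectional ring.
def ring_avg_hops(n):
    # Shortest-path hops from one stop to each of the other n - 1 stops.
    dists = [min(d, n - d) for d in range(1, n)]
    return sum(dists) / len(dists)

print(ring_avg_hops(4))  # 4-stop ring: ~1.33 hops on average
print(ring_avg_hops(8))  # 8-stop ring: ~2.29 hops - latency grows with stops

# A crossbar is always 1 hop, but its wiring grows roughly O(n^2),
# which is why it stops being attractive past a handful of cores.
```

So the 8c CCX trades the crossbar's flat latency for scalable wiring - exactly the trade Intel made with Sandy Bridge's ring.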
== The chiplets ==
* 8 cores in a CCX - The same issues as above; the 8c intercore latency is also the same.
* New type of "low latency" interconnect - It has to deliver low latency at super-high power efficiency (all traffic past L3 goes off the chip, over to the IO die, then to RAM) => R&D
* The IO ccMaster - A coherency hub dealing with traffic from all 64 cores at low latency => R&D
* L4 - R&D
* IO chip itself - Can it be reused for ordinary Ryzens, i.e. 1x IO + 1x compute? Wouldn't that waste server-grade IO and L4 on the desktop? A different die, then?
=> Major investments: ???
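The power question above can be made concrete with a back-of-the-envelope energy model for an L3 miss. The pJ/bit figures below are purely illustrative assumptions of mine - not measured values for any AMD part - chosen only to show that off-die hops are an order of magnitude costlier than on-die wires:

```python
# Toy energy model: an L3 miss served by an on-die memory controller
# (Naples style) vs. one that must cross to a separate IO die.
# ASSUMED figures, for illustration only (pJ per bit moved):
ON_DIE_WIRE = 0.1    # traversing on-die fabric
OFF_DIE_LINK = 2.0   # serdes hop across the package substrate
DRAM_ACCESS = 20.0   # the DRAM access itself

line_bits = 64 * 8   # one 64-byte cache line

naples_miss = (ON_DIE_WIRE + DRAM_ACCESS) * line_bits
# Chiplet miss: compute die -> IO die -> DRAM, and the data comes
# back over the same off-die link, so the link is paid twice.
chiplet_miss = (ON_DIE_WIRE + 2 * OFF_DIE_LINK + DRAM_ACCESS) * line_bits

print(f"Naples-style miss:  {naples_miss / 1000:.1f} nJ")
print(f"Chiplet-style miss: {chiplet_miss / 1000:.1f} nJ")
```

Under these assumptions every single miss pays an extra off-die round trip, which is why the interconnect's pJ/bit (and any L4 on the IO die that can absorb misses) dominates the chiplet question.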
Now it's time to apply Occam's razor: the chiplet solution vs. an ordinary one.
Does it make sense to throw away the Magny-Cours-to-Naples know-how given the budget? Mind you, this was really a decision made back in ~2014 (the times when Kaveri was struggling with its crippled fw).
Does it make sense to reject the znver1 layout and go to a super-radical design nobody has ever tried in the x86 world, on what is an evolutionary arch revision (znver2)?
Are you sure you can justify the power of going in and out to the NB all the time? The same goes for minimal latency. And can you scale the IO ccMaster, etc.?
Are the benefits worth it? UMA, yields, etc.