TR has plenty of cache for the total package, but we're looking at very specific use cases here. A large L4 cache on each die would let every die keep a local copy of the contents of the remote L3 caches. That would ease the strain on the IF links between the dies and cut their power draw as a result. It would also alleviate the problems that distant RAM on TR-WX packages can cause. Yes, there would be invalidation and copy traffic, but on an OS with a properly NUMA-aware scheduler, that would be a "less than frequent" case for most workloads.

Now, will it be expensive? You betcha! Will it be worth the cost at 12nm? Absolutely not. At 7nm? Maybe on 7nm+ for a hypothetical die targeted at EPYC/TR/HEDT AM4.

I think that as AMD gets better market penetration and volume on their products, they'll also have enough revenue to keep three different dies in active development: a power- and size-optimized Ryzen Mobile (current Raven Ridge), a clock-optimized Ryzen Desktop (current Zeppelin) for AM4/TR-X, and a balanced Ryzen HCC for EPYC/TR-WX/HEDT AM4. Mobile and Desktop can stay at GloFo and Server can live at TSMC, each on a process node that fits its use case. With a separate HCC die, they can afford to make tweaks that address these shortcomings, such as adding extra CCXs and an L4 cache.
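To make the NUMA-aware scheduling point concrete: the reason cross-die invalidation traffic stays infrequent is that a pinned workload's working set lives in its own die's local L3 (and hypothetical L4). Here's a minimal Linux sketch of that kind of pinning; the core-to-die mapping is a placeholder assumption (real code would read the topology from `lscpu`, `hwloc`, or `/sys`), and `pin_to_die0` is a hypothetical helper name.

```python
import os

# Hypothetical layout: assume the first eight logical CPUs sit on
# "die 0" of a multi-die package. A real tool would discover this
# from the actual topology instead of hard-coding it.
DIE0_CORES = set(range(min(8, os.cpu_count())))

def pin_to_die0(pid=0):
    """Restrict a process (0 = the caller) to die 0's cores so its
    working set stays in that die's local caches instead of bouncing
    across the IF links to remote L3s."""
    os.sched_setaffinity(pid, DIE0_CORES)
    return os.sched_getaffinity(pid)
```

The same effect is what tools like `numactl --cpunodebind` give you from the shell; a NUMA-aware OS scheduler just does this placement automatically, which is why the remaining cache-coherence traffic between dies would be the rare case rather than the common one.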