My guess is that it will be an all-inclusive cache and will therefore grow to hold the L2 contents of all cores in the CCD (so that it doesn't effectively shrink compared to Zen 2). This means an extra 4MB, provided the L2 remains unchanged (8 cores x 512KB), so 36MB of L3 per chiplet in total.
As the L3 latency in Zen 2 is already measurably higher than in Zen+ (11ns vs 9ns), and unifying the cache will probably make it a tad worse still, I wouldn't rule out the L2 being enlarged to compensate, so 1MB of L2 per core + 40MB of L3 per chiplet is also a (less likely) possibility.
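To make the arithmetic behind those two numbers explicit, here is a quick sketch. It assumes the Zen 2 CCD layout (8 cores, 512KB of L2 each, 2 x 16MB of L3) and that an inclusive L3 has to duplicate every L2 line; the function name is just something I made up for illustration.

```python
# Zen 2 CCD figures the estimate rests on: 8 cores, 2 x 16 MB of L3.
CORES_PER_CCD = 8
ZEN2_L3_MB = 32.0


def inclusive_l3_mb(l2_per_core_mb, cores=CORES_PER_CCD, base_l3_mb=ZEN2_L3_MB):
    """L3 size needed to keep base_l3_mb of unique capacity
    when the L3 must also hold a copy of every core's L2."""
    return base_l3_mb + cores * l2_per_core_mb


print(inclusive_l3_mb(0.5))  # L2 unchanged at 512 KB per core -> 36.0
print(inclusive_l3_mb(1.0))  # L2 doubled to 1 MB per core     -> 40.0
```

This is only the "don't lose effective capacity" reasoning, not a claim about how AMD would actually size the cache.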
If one is to believe the ~15% IPC gain rumors, I think the entire cache hierarchy will be redesigned, as the unification of L3 is a major redesign anyway. My (somewhat wild and wishful) predictions for the Milan memory hierarchy in that case are:
- Memory compression for chiplet-to-chiplet communication, at least on server (probably configurable in BIOS). They filed patents for it a while ago, and it would save a considerable amount of power (in EPYC and Threadripper) that could be used in the core chiplets instead of being wasted transporting data.
- 40MB of all-inclusive L3 cache per CCD (36MB if L2 stays the same)
- 1MB of L2 per core
- 48KB of 12-way L1 data cache, "Ice Lake style" (this is the least likely prediction IMO)
- Improvements to the uop cache, so that it is competitively shared between SMT threads rather than statically partitioned (effectively doubling it for lightly threaded loads).