Die stacking is still a non-trivial expense and still adds to the power draw of the CCD/cache combo. It is strongly rumored that mobile and desktop will use the same CCDs, with server and low-end mobile being different. If that's the case, they still need product differentiation up and down the stack. That means they need some other use for CCDs that don't meet binning requirements, and having any L3 at all on the CCD is still important to enable that.
Since SRAM is scaling slower than logic, it makes sense to reduce it on the CCD going forward, but it still needs to be sufficient for the task. 2MB per core is adequate for lower-stack parts, and it also divides cleanly in binary terms. It also follows AMD's mobile strategy.
I do suspect that the 3D cache die will grow in capacity with the switch to N4C, and that it will be 128MB, giving a total of 152MB of L3 per CCD stack (128MB stacked plus 24MB on the CCD itself). That's an important mark to hit if Intel tops out at a rumored 144MB.
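For anyone who wants to check the arithmetic behind the 152MB figure, here is a quick sketch. The cache sizes are the speculated and rumored numbers from this post, not confirmed specs, and the 12-core count is only what falls out of dividing the on-CCD L3 by 2MB per core, so treat it as an inference.

```python
# All sizes in MB. These are the speculated/rumored figures from the post,
# not confirmed specifications.
L3_PER_CORE_MB = 2        # on-CCD L3 per core
STACKED_CACHE_MB = 128    # guessed capacity of the 3D cache die
TOTAL_L3_MB = 152         # guessed total L3 per CCD stack
RUMORED_INTEL_MB = 144    # rumored Intel maximum

# L3 left on the CCD itself once the stacked die is subtracted.
on_ccd_l3_mb = TOTAL_L3_MB - STACKED_CACHE_MB

# Core count implied by 2MB per core (an inference, not a stated fact).
implied_cores = on_ccd_l3_mb // L3_PER_CORE_MB

# How far the stacked total would sit above the rumored Intel figure.
margin_mb = TOTAL_L3_MB - RUMORED_INTEL_MB

print(f"On-CCD L3: {on_ccd_l3_mb}MB")
print(f"Implied cores per CCD: {implied_cores}")
print(f"Margin over rumored Intel max: {margin_mb}MB")
```

So the guess works out to 24MB on the CCD and an 8MB edge over the rumored 144MB, if both rumors hold.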