The MS team clearly thought 256-bit GDDR6 bandwidth wasn't enough to feed 12 TFLOPS of RDNA2 + Zen 2 (which shouldn't take much in itself), hence the awkward dual-bandwidth memory solution. Curious to see how AMD plans to feed 20+ TFLOPS RDNA2 SKUs on a 256-bit GDDR6 bus if that's true.
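For rough context, here's the back-of-the-envelope bytes-per-FLOP math behind that worry (a minimal sketch; the 14 Gbps pin speed is an assumed, typical GDDR6 figure, and the 20 TFLOPS part is hypothetical):

```python
# Bytes-per-FLOP on a 256-bit GDDR6 bus (illustrative numbers, not confirmed specs).
def gddr6_bandwidth_gb_s(bus_width_bits: int, pin_speed_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * pin_speed_gbps

bw = gddr6_bandwidth_gb_s(256, 14.0)  # 448 GB/s
for tf in (12.0, 20.0):
    print(f"{tf:>4.0f} TFLOPS: {bw:.0f} GB/s -> {bw / (tf * 1000):.3f} bytes/FLOP")
# 12 TFLOPS gets ~0.037 bytes/FLOP; 20 TFLOPS only ~0.022, i.e. roughly 40% less
# data per FLOP from the same bus, which is the feeding problem described above.
```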
I'd guess that's because of the new overall data system tied to the NAND implementation. They probably need some dedicated channels just to manage the data to/from it.
And the point would be? I don't see a good reason to use up more die space for 128 MB of L2 or L3 instead of adding another 128-bit GDDR6 memory controller + PHY, when the effect would be the same at best.
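Rough sketch of why it wouldn't necessarily be "the same at best": a cache trades pad-limited PHY area for hit rate, and the blended bandwidth scales with that hit rate. A first-order model (every number here is a guess for illustration, including the cache bandwidth):

```python
# First-order blended bandwidth: hits served from cache, misses from DRAM.
# All figures are assumptions for illustration, not leaked or confirmed specs.
def effective_bandwidth(hit_rate: float, cache_bw: float, dram_bw: float) -> float:
    return hit_rate * cache_bw + (1.0 - hit_rate) * dram_bw

dram_256bit = 448.0   # GB/s, 256-bit GDDR6 @ 14 Gbps
dram_384bit = 672.0   # GB/s, the wider-bus alternative (extra controller + PHY)
cache_bw = 1000.0     # GB/s, a guessed on-die cache bandwidth

for hit_rate in (0.3, 0.5, 0.7):
    blended = effective_bandwidth(hit_rate, cache_bw, dram_256bit)
    print(f"hit rate {hit_rate:.0%}: cache ~{blended:.0f} GB/s vs. 384-bit {dram_384bit:.0f} GB/s")
# ~30% hits: 614 GB/s (loses); ~50%: 724 GB/s (wins); ~70%: 834 GB/s.
# So whether the cache beats the wider bus hinges entirely on hit rate.
```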
And your evidence showing that it's the same? I'd guess the weird setup in the consoles is due to the NAND, where they probably need some channels linked to it (serving as a buffer). Plus, the consoles have specialized compression/decompression blocks, so it's possible Microsoft wants all the bandwidth they can get, since they're working with a total memory pool comparable to just a dGPU's VRAM. A dGPU can use that extra memory to simply keep data resident in VRAM, whereas the new consoles will apparently be juggling it to and from the SSD so you can switch between games quickly.
I'm catching up now, so has it been confirmed that's what the cache is? If not, it seems odd to be making arguments without knowing. If it is, I think you also have to consider that RDNA2 isn't meant to be the final implementation (people talk about it as though it's revolutionary, and while it's obviously very significant, keep in mind it's an iteration on their GPU development path). So it's possible this cache implementation is just the start and we'll see it change in the future, perhaps later becoming a separate chip altogether, or being integrated into the I/O die or some other chiplet. I think that was the idea/plan for HBM (where it would function both as cache and memory), but for whatever reason it didn't work out that way.
As for why the consoles wouldn't have it, cost would be a big reason. It's why console versions of chips tend to have smaller caches and the like: it's an easy place to cut cost, and the highly leveraged programming of consoles, mixed with other limitations, mitigates the loss somewhat.
One last bit. I personally have a hunch that Microsoft actually had a stronger chip planned (I think they were looking at 15 TF, possibly with room to push higher), but reined it in when they saw the PS5 was going to be substantially below them. Perhaps they kept the CU count low for yields (maybe they were looking at 60-64 CUs or something), or maybe they plan on iterating more quickly (since the new consoles apparently still won't be quite enough for flawless 4K, and then add ray tracing on top), in which case perhaps they can add CUs without messing with memory configs or anything.
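For reference on how those CU/TF figures pencil out, RDNA-style FP32 throughput is just CUs × 64 lanes × 2 FLOPs (FMA) × clock; the 60/64 CU clocks below are assumptions to show how a ~15 TF part could have looked:

```python
# FP32 TFLOPS for an RDNA2-style GPU: 64 lanes per CU, 2 FLOPs per clock (FMA).
def tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000.0

print(f"52 CUs @ 1.825 GHz: {tflops(52, 1.825):.2f} TF")  # ~12.15 TF (Series X as shipped)
print(f"60 CUs @ 1.950 GHz: {tflops(60, 1.950):.2f} TF")  # ~14.98 TF (hypothetical clock)
print(f"64 CUs @ 1.825 GHz: {tflops(64, 1.825):.2f} TF")  # ~14.95 TF (hypothetical)
```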