I think a full Tonga would take a small hit from memory bandwidth limits.
I don't. The 384-bit bus on Tahiti was probably overkill to begin with, and even more so if faster RAM is used. (The 7970 debuted with GDDR5 running at 1375 MHz.) AMD erred on the side of more bandwidth than needed because Tahiti was a dual-use chip, intended as much for professional HPC/GPGPU applications as for gaming.
Note that when tested at stock core clocks (1000 MHz in both cases), the R9 280X shows absolutely no benefit over the 7970 GHz Edition from the additional memory bandwidth its faster GDDR5 provides. This indicates that whatever was holding Tahiti back, it wasn't memory bandwidth. With Tonga's delta color compression, a 256-bit bus should be plenty. And if a bit more bandwidth is needed, it would be easier to get it by raising the RAM speed from 1375 to 1500 MHz than by widening the bus. Maxwell got good results by using narrower buses and faster RAM.
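To put rough numbers on the bus-width argument: peak bandwidth is the bus width in bytes times the per-pin data rate. This is a minimal sketch that assumes GDDR5 moves 4 bits per pin per quoted memory clock (so 1375 MHz is 5.5 Gbps per pin); it ignores compression and real-world efficiency, and the function name is just for illustration.

```python
# Theoretical peak GDDR5 bandwidth.
# Assumption: the quoted memory clock (e.g. 1375 MHz) is the command clock and
# GDDR5 transfers 4 bits per pin per clock, so 1375 MHz -> 5.5 Gbps per pin.
GDDR5_BITS_PER_CLOCK = 4


def peak_bandwidth_gbs(bus_width_bits: int, mem_clock_mhz: float) -> float:
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    gbps_per_pin = mem_clock_mhz * GDDR5_BITS_PER_CLOCK / 1000
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * gbps_per_pin


configs = {
    "384-bit @ 1375 MHz (7970 at launch)": (384, 1375),
    "256-bit @ 1375 MHz": (256, 1375),
    "256-bit @ 1500 MHz": (256, 1500),
}

for name, (bus, clock) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bus, clock):.0f} GB/s")
```

Under that assumption this prints 264, 176, and 192 GB/s. Narrowing the bus at the same clock cuts raw bandwidth by a third, and the 1500 MHz RAM recovers only part of that. The argument above counts on delta color compression to cover the rest.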
Tahiti, Tonga, and Fiji stick out as having fewer ROPs than their shader counts call for:
Cape Verde - 640 shaders, 16 ROPs
Pitcairn - 1280 shaders, 32 ROPs
Tahiti/Tonga - 2048 shaders, 32 ROPs
Hawaii - 2816 shaders, 64 ROPs
Fiji - 4096 shaders, 64 ROPs
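One way to make the imbalance in that list explicit is to divide shaders by ROPs. A small sketch using only the counts listed above:

```python
# Shader and ROP counts exactly as listed in the post.
chips = {
    "Cape Verde": (640, 16),
    "Pitcairn": (1280, 32),
    "Tahiti/Tonga": (2048, 32),
    "Hawaii": (2816, 64),
    "Fiji": (4096, 64),
}


def shaders_per_rop(shaders: int, rops: int) -> float:
    """How many shaders each ROP has to keep up with."""
    return shaders / rops


for name, (shaders, rops) in chips.items():
    print(f"{name}: {shaders_per_rop(shaders, rops):.0f} shaders per ROP")
```

Cape Verde and Pitcairn come out at 40 and Hawaii at 44, while Tahiti/Tonga and Fiji both land at 64.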
It's clear that Tahiti, Tonga, and Fiji are being shortchanged on ROP count, and with Fiji we can immediately see how that affects performance. I think if Tonga had 64 ROPs, its performance would be quite a bit closer to GM204's at 1080p. If Fiji had 128 ROPs, it would likely not show the embarrassing performance deficit it exhibits at that same resolution.
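For a sense of scale, peak pixel fill rate is roughly ROPs times core clock. This sketch uses the simplification of one pixel per ROP per clock and an assumed 1000 MHz core clock for every chip; the clock is a placeholder for comparison, not a spec for any particular card.

```python
# Peak pixel fill rate sketch.
# Simplification: one pixel per ROP per clock.
# Assumption: a 1000 MHz core clock for every chip, for comparison only.
ASSUMED_CLOCK_MHZ = 1000


def fill_rate_gpix(rops: int, clock_mhz: float = ASSUMED_CLOCK_MHZ) -> float:
    """Return peak fill rate in gigapixels per second."""
    return rops * clock_mhz / 1000


# (chip, actual ROPs, hypothetical ROPs from the post)
for name, actual, hypothetical in [("Tonga", 32, 64), ("Fiji", 64, 128)]:
    before = fill_rate_gpix(actual)
    after = fill_rate_gpix(hypothetical)
    print(f"{name}: {before:.0f} -> {after:.0f} GPixel/s")
```

At a fixed clock, the hypothetical ROP counts in the post would double peak fill rate (32 to 64 GPixel/s for Tonga, 64 to 128 for Fiji). Whether that closes the 1080p gap depends on how ROP-bound those games actually are.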
AMD seems to like strapping huge shader arrays onto its chips. It paid off with Tahiti versus Pitcairn, with Tahiti pulling ahead as games became more shader-intensive.
Pitcairn is actually one of their better-balanced chips. Take the same number of shaders, TMUs, and ROPs, rebuild it on GCN 1.2, make some tweaks for higher clocks (a longer pipeline?), update the fixed-function blocks, and put it on the 14nm FinFET process: you'd have something reasonably competitive at the low end.