The more I consider this situation, the more I think that GloFo 14LPP yields are still very poor and that, in the RX 480 reference cards, AMD is using silicon that really should have been binned down to a lower SKU or discarded. I suspect the best bins are being held back for mobile/AIO products (especially for Apple). And I wouldn't be at all surprised if the next tier down was sold to AIBs, with the reference cards getting the worst, most marginal chips.
What I can't piece together is why many of the cards out there don't seem to be running optimally. The release-day slide deck made a big deal of features like adaptive clocking and boot-time power supply calibration. I was under the impression that these were supposed to eliminate the need for AMD to overvolt everything out of the box, but that doesn't seem to be the case: multiple users report substantial power reductions (and less throttling!) with no loss of stability by undervolting in the new WattMan software. So why isn't this being done automatically using the new features? There would still be chip-to-chip inconsistency if AMD wants to maximize yields, but at least the cards that are physically capable of better economy would achieve it out of the box. Could the new features simply be broken or inoperable? If true, it would be incredibly embarrassing for AMD that such an oversight got past pre-release quality control, but at least it would mean the existing cards could be made much more efficient with BIOS and/or driver updates.
GloFo should improve their process in time - everyone does, and they eventually got the 32nm and 28nm processes well-refined despite some early stumbles. By the time AMD rebrands P10 next year, parametric yields should have improved enough to provide a decent boost to efficiency, performance, or both.
Various tests indicate that RX 480 is bottlenecked by memory bandwidth, so I'm not quite clear why the core clock was set as high as 1266 MHz. Comparing with rumors and early benchmarks, it looks like clocks were increased fairly late in development. Why? The cooler and board were obviously designed for something closer to 110-120W than 150-160W. It would be interesting to see how the card does if it were downclocked to, say, 1150 MHz, with a concomitant voltage reduction. I bet that perf/watt would go way up.
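To put a rough number on that bet, here's a back-of-the-envelope sketch using the standard CMOS dynamic-power relation (P roughly proportional to f * V^2). The voltages are purely illustrative guesses on my part - I don't have measured RX 480 VID figures - so treat the result as a ballpark, not a prediction:

```python
# Rough dynamic-power estimate, assuming P ~ f * V^2 (CMOS dynamic power).
# Leakage is ignored and the voltage figures below are hypothetical,
# NOT measured RX 480 values.

def dynamic_power_ratio(f_new_mhz, v_new, f_ref_mhz, v_ref):
    """Return new dynamic power as a fraction of the reference point."""
    return (f_new_mhz / f_ref_mhz) * (v_new / v_ref) ** 2

# Hypothetical: 1.15 V at stock 1266 MHz, 1.00 V stable at 1150 MHz.
power_ratio = dynamic_power_ratio(1150, 1.00, 1266, 1.15)
perf_ratio = 1150 / 1266  # performance tracks core clock at best

print(f"power: {power_ratio:.0%} of stock")
print(f"performance: {perf_ratio:.0%} of stock (upper bound)")
print(f"perf/watt gain: {perf_ratio / power_ratio:.2f}x")
```

Under those assumed voltages, a ~9% clock cut buys roughly a 30% power cut, and since the card is memory-bandwidth-limited anyway, the real-world performance loss would likely be even smaller than the clock ratio suggests.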