Nvidia says a lot of things, not all of them true. Of course they want to play up the amount of R&D they put into Pascal, so they can justify increasing the price point for medium-sized chips yet again.
They did put a lot of R&D into Pascal, and that R&D has clearly paid off in terms of very high clocks and an efficient, compact design.
Don't forget that the increased clock speeds of Pascal do come with a cost: lower shader density. 2560 shaders for a 314mm^2 chip on 16nm FinFET really isn't that high. Nvidia presumably would have had to settle for lower clocks if they'd packed the shaders more densely on the die, as AMD did.
GP104 features more TMUs than Polaris 10 (160 vs 144), twice the ROPs (64 vs 32), and there are obviously other parts of the GPU that aren't tied to shader count (PolyMorph Engine, Simultaneous Multi-Projection block, GDDR5X controller, etc.) that may add to the area/xtor count without adding any shaders.
In this particular case it appears the trade-off was worth it. But it's closer than you think. GTX 1080 peaks at ~8.9 TFLOPS at max default boost clock. RX 480 with its 232mm^2 die peaks at ~5.8 TFLOPS at max default boost. This means Polaris 10 has ~65% of the raw computing power of GP104, at ~74% of the die size. Nvidia comes out much further ahead than that in practice because its drivers and architecture are much better at translating TFLOPS into real-world gaming performance - at least in DX11.
As I said above, there's more to gaming performance than just raw FLOPs.
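For anyone who wants to check the raw-compute numbers quoted above, here is a rough sketch of the arithmetic. It assumes the usual peak FP32 formula (shaders x 2 FLOPs per clock x boost clock) and reference-card specs that aren't stated in this thread: 1733 MHz boost for the GTX 1080, and 2304 shaders at 1266 MHz boost for the RX 480. The function and variable names are just for illustration.

```python
# Peak FP32 throughput = shaders x 2 FLOPs per clock (one fused multiply-add) x clock.
# Assumed reference specs (not from the thread): GTX 1080 boost 1733 MHz,
# RX 480 with 2304 shaders at 1266 MHz boost.

def peak_tflops(shaders: int, boost_mhz: float) -> float:
    """Theoretical peak single-precision throughput in TFLOPS."""
    return shaders * 2 * boost_mhz * 1e6 / 1e12

gtx1080_tflops = peak_tflops(2560, 1733)
rx480_tflops = peak_tflops(2304, 1266)

# Polaris 10 relative to GP104: raw compute and die area.
compute_ratio = rx480_tflops / gtx1080_tflops
area_ratio = 232 / 314  # die sizes in mm^2, as quoted in the thread

print(f"GTX 1080: {gtx1080_tflops:.2f} TFLOPS")
print(f"RX 480:   {rx480_tflops:.2f} TFLOPS")
print(f"Compute ratio: {compute_ratio:.1%}, area ratio: {area_ratio:.1%}")
```

Working from the unrounded TFLOPS values nudges the compute ratio up by a fraction of a percent compared with dividing 5.8 by 8.9, but the picture is the same: roughly two thirds of the peak compute on roughly three quarters of the area. None of this says anything about delivered frame rates, which is the whole point of the reply.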
In terms of xtor density, NVIDIA put 7.2 billion xtors in a 314mm^2 area, while AMD put 5.7 billion in 232mm^2. AMD's chip has ~24.6 million transistors/mm^2, while NVIDIA's is at ~22.9 million/mm^2.
AMD's design is slightly denser (by about 7%), but NVIDIA's slight density disadvantage is more than offset by its perf/mm^2 advantage.
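The density comparison is simple division, sketched here using only the transistor counts and die sizes quoted above. The helper name is made up for the example.

```python
# Transistor density from the figures quoted in the thread.

def density_m_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors_billion * 1000 / area_mm2

gp104_density = density_m_per_mm2(7.2, 314)      # NVIDIA GP104
polaris10_density = density_m_per_mm2(5.7, 232)  # AMD Polaris 10

# How much denser Polaris 10 is than GP104, as a fraction.
density_gap = polaris10_density / gp104_density - 1

print(f"GP104:      {gp104_density:.2f} M xtors/mm^2")
print(f"Polaris 10: {polaris10_density:.2f} M xtors/mm^2")
print(f"Polaris 10 is {density_gap:.1%} denser")
```

Note this only covers xtor density. The perf/mm^2 side of the argument depends on measured gaming performance, which these spec-sheet numbers can't give you.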