GP100 does have 60 streaming multiprocessors (SMs), and each one contains only 64 CUDA cores, compared to 128 in Maxwell and consumer Pascal.
The thing is, each GP100 core will effectively be twice as fast: the resources available to each group of 64 cores are doubled compared to Maxwell and consumer Pascal (consumer Pascal is essentially the Maxwell architecture on TSMC's 16 nm process). Nvidia has done this before.
Kepler - Maxwell - Pascal: the resources available per SM stay roughly the same, but the number of cores sharing them changes.
192 cores per SM for Kepler, 128 for Maxwell, 64 for GP100. Nvidia has said that 128 Maxwell cores deliver 90% of the performance of Kepler's 192 cores for exactly this reason, so 64 GP100 cores should deliver 90-100% of the performance of 128 Maxwell/consumer Pascal cores.
End of off-topic.
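The per-core claim in that post can be sanity-checked with a quick calculation (the 90% figure is the one quoted above, not an independently verified number):

```python
# Per-core efficiency implied by the claim that 128 Maxwell cores
# deliver ~90% of the performance of 192 Kepler cores.

kepler_cores, maxwell_cores = 192, 128
relative_perf = 0.90  # Nvidia's claimed figure, as quoted above

per_core_gain = relative_perf * kepler_cores / maxwell_cores
print(f"Each Maxwell core does ~{per_core_gain:.2f}x the work of a Kepler core")
# ~1.35x
```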
The only resource that is actually doubled per CUDA core, relative to GP102, is the register file, and that alone won't let 64 GP100 cores get anywhere near 128 GP102 cores in gaming (which is what we're talking about here). Bandwidth per core also goes up, but not by double (from 480 GB/s to 732 GB/s, a 52.5% increase), and so does on-chip shared memory (from 96 KB per 128 cores to 64 KB per 64 cores, i.e. 0.75 KB to 1 KB per core, a 33% increase).
So no, 64 GP100 cores will not deliver 90-100% of the performance of 128 GP102 cores; at best they'll manage 50-60%, mainly thanks to the extra memory bandwidth (in reality probably less than 50%, since GP100 likely can't clock as high as GP102).
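For reference, the per-core resource deltas above work out like this (using the figures quoted in this thread, not official spec sheets):

```python
# Per-core resource comparison between GP100 and GP102,
# using the bandwidth and shared-memory figures quoted above.

gp102 = {"cores_per_sm": 128, "bandwidth_gbps": 480, "shared_kb_per_sm": 96}
gp100 = {"cores_per_sm": 64,  "bandwidth_gbps": 732, "shared_kb_per_sm": 64}

# Total memory bandwidth increase for the whole chip.
bw_gain = gp100["bandwidth_gbps"] / gp102["bandwidth_gbps"] - 1
print(f"Bandwidth increase: {bw_gain:.1%}")  # ~52.5%

# Shared memory per CUDA core: 0.75 KB vs 1.0 KB.
per_core_102 = gp102["shared_kb_per_sm"] / gp102["cores_per_sm"]
per_core_100 = gp100["shared_kb_per_sm"] / gp100["cores_per_sm"]
shared_gain = per_core_100 / per_core_102 - 1
print(f"Shared memory per core increase: {shared_gain:.1%}")  # ~33.3%
```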
That sort of very big die can (reasonably safely, I think) be presumed to belong to big Volta, due out whenever (circa 2018). There isn't much power headroom left in GP102 for a gaming GPU - especially not a fully enabled one - so they've got to improve efficiency a fair bit for that to make sense as an idea.
I absolutely agree with you that Volta will probably be the next performance jump we see from Nvidia, and not a 600 mm2 gaming Pascal GPU.
That said, I do think there's theoretically a bit more headroom than one might initially assume. With GP102 at 471 mm2 and a 250 W TDP there is obviously very little room, but this theoretical 600 mm2 GPU would probably need a 512-bit memory interface, at which point Nvidia might as well switch over to HBM2. That would leave more room on the die for shaders and whatnot (40-44 SMs, or 33-47% more than GP102's 30), and it would lower power usage (going from 512-bit GDDR5 at 8 GHz to 4096-bit HBM1 at 1 GHz allegedly saves 55 W, and GDDR5X to HBM2 would likely be in the same ballpark), thus hopefully keeping the whole thing within 300 W.
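The back-of-the-envelope numbers in that paragraph can be sketched like this (all inputs are the rough estimates from the post - GP102's 30 SMs and 250 W TDP are known, the 55 W HBM saving is the alleged figure):

```python
# Speculative SM count and power budget for a hypothetical 600 mm^2
# Pascal gaming chip, using the ballpark figures from the post above.

gp102_sms = 30          # GP102: 3840 CUDA cores / 128 per SM
big_die_sms = (40, 44)  # speculative range for a 600 mm^2 part

for sms in big_die_sms:
    print(f"{sms} SMs = {sms / gp102_sms - 1:.0%} more than GP102")
# 40 SMs = 33% more, 44 SMs = 47% more

# Power headroom under a 300 W cap if switching to HBM really
# saves ~55 W off GP102's 250 W baseline (alleged figure).
gp102_tdp_w = 250
hbm_saving_w = 55
extra_shader_budget_w = 300 - (gp102_tdp_w - hbm_saving_w)
print(f"Budget left for extra shaders under 300 W: ~{extra_shader_budget_w} W")
```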
But again this is all theorycrafting, and I don't really see this happening.