I expect that Pascal will be a relatively incremental improvement over Maxwell. Basically, third-generation Maxwell with HBM2 support and some extra features (like the Mixed Precision mode touted by JHH). The biggest gains will come from the 16nm/FinFET+ die shrink.
Given that the pace of node shrinks has slowed way down, I expect Nvidia to transition to a "Tick-Tock" cadence much like Intel's: first a slightly modified shrink of a proven architecture on a new node (Tick), then a new architecture once that node is fully mature (Tock). Releases will probably follow a roughly two-year schedule, so assuming Pascal arrives in 2016, I don't expect to see Volta until 2018. By the time the FinFET processes are viable for GPU production, 28nm will have lasted over four years, and a prudent corporation must assume that TSMC's execution delays will continue to be an issue. By switching to a new architecture halfway through a node's life cycle, Nvidia can continue to sell upgrades that aren't just rebrands. After all, Maxwell was a massive hit, and AMD never really mounted a coherent response to it. On the R&D side, only having to tackle either a new node or a new architecture, rather than both at once, should help prevent Fermi-style delays from happening again.
I think there's also a good chance that Nvidia will make a tradition of putting full Double Precision hardware on the top chip only every other generation. We've already seen them skip Double Precision on Maxwell, relying instead on a Kepler refresh (GK210) to fill that role. Pascal will have Double Precision support on the big-die professional cards (and maybe Titan Y, or whatever they call it), but I think Volta will probably omit it and instead focus on gaming and Single Precision throughput, like Maxwell.
Nvidia's 28nm GPUs averaged about 12.5 million transistors per square millimeter, though the exact density varies based on a number of factors. The general assumption is that 16nm FinFET+ will give roughly double the transistor density of 28nm, so a reasonable estimate is that the new chips will average about 25 million transistors per square millimeter. The number of transistors per CUDA core varies, but hovers around 2.5 million on the 28nm products. Part of the variance is due to fixed-function blocks (which take up a larger proportion of die space on the smaller chips) and memory controllers. HBM2 should drastically reduce the die space needed for memory controllers, but I don't think Nvidia is going to use it across the board on all chips; for the less expensive SKUs, it may not yet be economical. My guess is that the two biggest chips get HBM2, while their lesser cousins remain on GDDR5 for the time being.
All that having been said, here are my predictions about the Pascal chip lineup:
GP107: 150 sq. mm. die, 3.75 billion transistors, 1280 CUDA cores. 192-bit GDDR5 memory bus. The GP107-based cards will come with 3GB of RAM, and have a TDP of about 65 watts, thus not requiring an external power connector. They will start around $199, and drop to $149 once they've been on the market for a while and the process becomes less expensive. Performance will be roughly 30%-40% better than the GTX 960.
GP106: 225 sq. mm. die, 5.6 billion transistors, 2048 CUDA cores. 256-bit GDDR5 memory bus. The GP106-based cards will come with 4GB of RAM, and have a TDP of about 120 watts, requiring one 6-pin PCIe power connector. Most likely, GP106 cards will arrive after the GP107 card has been on the market for a while, debuting at $199 and triggering the GP107 card's price drop to $149. Performance will be slightly better than the GTX 980 (maybe a 10% improvement).
GP104: 360 sq. mm. die, 9.0 billion transistors, 4096 CUDA cores. 3072-bit HBM2 memory bus with 6GB of RAM in three 4-high stacks on the interposer (Quadro versions will offer up to 12GB using either 8-high stacking or higher density RAM chips). TDP will be around 180 watts, and the cards will have one 8-pin plus one 6-pin PCIe power connector. This will probably be the first consumer-focused Pascal product to hit the market, though professional GP100 products may make an appearance first. At debut, I expect Nvidia to price similarly to other mid-size chips that temporarily take the flagship role: $549 for the full-fat version, and probably about $379 for the cut-down SKU. Depending on yields, there may be a third-tier salvage part as well, maybe starting around $299. Those prices will eventually drop when yields improve and GP100 makes its consumer debut. Performance will be considerably better than any of today's single-GPU cards, probably outclassing the Titan X by 40%-50% (30% or so for the cut-down version).
GP100: 550 sq. mm. die, 13.75 billion transistors, 6144 CUDA cores. Double Precision support at 1/2 on Titan, Quadro, and Tesla; 1/32 on GeForce. 4096-bit HBM2 memory bus with 8GB of RAM in four 4-high stacks on the interposer (Titan and Quadro versions will offer 16GB using either 8-high stacking or higher density RAM chips, and Tesla cards may offer up to 32GB). TDP will be around 250 watts, with one 8-pin plus one 6-pin PCIe power connector. It's rumored that this chip may already have taped out. Even so, as has been the case in the past, I expect Nvidia to hold this chip back from the consumer market for a while, focusing on the far more lucrative Tesla sales at first. Again following precedent, when it does arrive on the consumer market, it will first do so in a $999 Titan card (Titan II? Titan Y?). After about three months, it will then appear in a cut-down version in a $649 consumer card. Depending on competition from AMD, the full-fat version (lacking only Double Precision support) may appear later on in a consumer card at the $749 price point. In terms of performance, we'll be looking at no less than a full doubling of the Titan X's power.
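For anyone who wants to check my arithmetic: the transistor counts above all fall straight out of die area times the assumed ~25 million transistors/mm² density, and the implied transistors-per-CUDA-core figure stays in the 2.2-2.9 million range (higher on the small chips, where fixed-function blocks eat proportionally more area). Here's a quick Python sketch of that sanity check; the names (lineup, DENSITY) are just mine, and all the figures are, of course, speculative.

```python
# Sanity-check the speculated Pascal lineup: transistor count should equal
# die area times the assumed density, and transistors-per-core should land
# near the ~2.5M/core seen on 28nm parts.
DENSITY = 25e6  # assumed transistors per sq. mm (~2x the ~12.5e6 of 28nm)

# (chip, die area in sq. mm, CUDA cores) -- all speculative figures
lineup = [
    ("GP107", 150, 1280),
    ("GP106", 225, 2048),
    ("GP104", 360, 4096),
    ("GP100", 550, 6144),
]

for name, area, cores in lineup:
    transistors = area * DENSITY
    per_core = transistors / cores
    print(f"{name}: {transistors / 1e9:.2f}B transistors, "
          f"{per_core / 1e6:.2f}M per CUDA core")
```

Running it shows GP107 at about 2.9M transistors per core down to GP100 at about 2.2M, which is consistent with the bigger chips amortizing their fixed-function overhead over more cores.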
What does everyone else think? Do these speculations seem plausible?