Nvidia Pascal Lineup Speculation

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I expect that Pascal will be a relatively incremental improvement over Maxwell. Basically, third-generation Maxwell with HBM2 support and some extra features (like the Mixed Precision mode touted by JHH). The biggest gains will come from the 16nm/FinFET+ die shrink.

Given that the pace of node shrinks has slowed way down, I expect to see Nvidia transition to a "Tick-Tock" cadence, much like Intel's: using a new node with a slightly modified shrink of a proven architecture (Tick), then developing a new architecture once that node is fully mature (Tock). Releases will probably be on roughly a two-year schedule, so assuming Pascal arrives in 2016, I don't expect to see Volta until 2018. This is based on the fact that 28nm will have lasted for over four years by the time the FinFET processes are viable for GPU production, and a prudent corporation must assume that TSMC's execution delays will continue to be an issue. By switching to a new architecture halfway through a node's life cycle, Nvidia can continue to sell upgrades that aren't just rebrands. After all, Maxwell was a massive hit, to which AMD didn't really have any coherent response. On the R&D side, only having to tackle either a new node or an innovative architecture, rather than both at once, should help prevent Fermi-style delays from happening again.

I think there's also a good chance that Nvidia will adopt a tradition of focusing on Double Precision performance on the top chip only one out of every two generations. We've already seen them skip Double Precision on Maxwell, relying instead on a Kepler refresh (GK210) to fill that role. Pascal will have Double Precision support on the big-die professional cards (and maybe Titan Y or whatever they call it), but I think that Volta will probably omit it, and instead be a product focused on gaming and Single Precision throughput like Maxwell.

Nvidia's 28nm GPUs averaged about 12.5 million transistors per square millimeter, though the exact density varies based on a number of factors. The general assumption is that 16nm FinFET+ will give roughly double the transistor density of 28nm, so a reasonable estimate is that the new chips will average about 25 million transistors per square millimeter. The number of transistors per CUDA core varies, but tends to fluctuate around 2.5 million on the 28nm products. Part of the variance is due to fixed-function blocks (which take up a larger proportion of die space on the smaller chips) and memory controllers. HBM2 should drastically reduce the die space needed for memory controllers, but I don't think Nvidia is going to use it across the board on all chips. For the less expensive SKUs, it may not yet be economical to do so. My guess is that the two biggest chips get HBM2, but their lesser cousins must remain on GDDR5 for the time being.
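To make that arithmetic concrete, here's a quick back-of-the-envelope in Python. The density figures and the ~2.5M transistors per CUDA core are the rough averages discussed above, not measured values, so treat the output as a ballpark:

```python
# Back-of-the-envelope sizing math from the paragraph above.
# All inputs are rough assumptions, not measured values.
density_28nm = 12.5e6              # transistors per mm^2, 28nm average
density_16ff = 2 * density_28nm    # assumed ~2x density gain on 16FF+
xtors_per_core = 2.5e6             # rough transistors per CUDA core

# Example: a hypothetical 300 mm^2 die at the assumed 16FF+ density
die_area_mm2 = 300
transistors = die_area_mm2 * density_16ff
cores = transistors / xtors_per_core
print(f"{transistors / 1e9:.2f}B transistors, ~{cores:.0f} CUDA cores")
# -> 7.50B transistors, ~3000 CUDA cores
```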

All that having been said, here are my predictions about the Pascal chip lineup:

GP107: 150 sq. mm. die, 3.75 billion transistors, 1280 CUDA cores. 192-bit GDDR5 memory bus. The GP107-based cards will come with 3GB of RAM, and have a TDP of about 65 watts, thus not requiring an external power connector. They will start around $199, and drop to $149 once they've been on the market for a while and the process becomes less expensive. Performance will be roughly 30%-40% better than the GTX 960.

GP106: 225 sq. mm. die, 5.6 billion transistors, 2048 CUDA cores. 256-bit GDDR5 memory bus. The GP106-based cards will come with 4GB of RAM, and have a TDP of about 120 watts, requiring one 6-pin PCIe power connector. Most likely, the arrival will come after the GP107 card has been on the market for a while, and will debut at $199, thus triggering the GP107 card's price drop to $149. Performance will be slightly better than the GTX 980 (maybe 10% improvement).

GP104: 360 sq. mm. die, 9.0 billion transistors, 4096 CUDA cores. 3072-bit HBM2 memory bus with 6GB of RAM in three 4-high stacks on the interposer (Quadro versions will offer up to 12GB using either 8-high stacking or higher density RAM chips). TDP will be around 180 watts, and the cards will have one 8-pin plus one 6-pin PCIe power connector. This will probably be the first consumer-focused Pascal product to hit the market, though professional GP100 products may make an appearance first. At debut, I expect Nvidia to price similarly to other mid-size chips that temporarily take the flagship role: $549 for the full-fat version, and probably about $379 for the cut-down SKU. Depending on yields, there may be a third-tier salvage part as well, maybe starting around $299. Those prices will eventually drop when yields improve and GP100 makes its consumer debut. Performance will be considerably better than any of today's single-GPU cards, probably outclassing the Titan X by 40%-50% (30% or so for the cut-down version).

GP100: 550 sq. mm. die, 13.75 billion transistors, 6144 CUDA cores. Double Precision support at 1/2 rate on Titan, Quadro, and Tesla; 1/32 on GeForce. 4096-bit HBM2 memory bus with 8GB of RAM in four 4-high stacks on the interposer (Titan and Quadro versions will offer 16GB using either 8-high stacking or higher density RAM chips, and Tesla cards may offer up to 32GB). TDP will be around 250 watts, with one 8-pin plus one 6-pin PCIe power connector. It's rumored that this chip may already have taped out. Nonetheless, as has been the case in the past, I expect Nvidia to hold this chip back from the consumer market for a while, focusing on the far more lucrative Tesla sales at first. Again following precedent, when it does arrive on the consumer market, it will first do so in a $999 Titan card (Titan II? Titan Y?). After about three months, it will then appear in a cut-down version in a $649 consumer card. Depending on competition from AMD, the full-fat version (lacking only Double Precision support) may appear later on in a consumer card at the $749 price point. In terms of performance, we'll be looking at no less than a full doubling of the Titan X's power.
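For anyone who wants to check the math, here's a small sketch comparing each predicted die's area at the assumed 25M transistors/mm² against the listed transistor counts, plus the implied transistors per core. All lineup numbers are speculation, not confirmed specs:

```python
# Cross-check: die area at the assumed 25M transistors/mm^2 vs. the
# transistor counts listed above, plus implied transistors per core.
# These are speculative numbers, not confirmed specs.
lineup = {
    # name:   (mm^2, transistors, CUDA cores)
    "GP107": (150,  3.75e9, 1280),
    "GP106": (225,  5.60e9, 2048),
    "GP104": (360,  9.00e9, 4096),
    "GP100": (550, 13.75e9, 6144),
}
for name, (area, xtors, cores) in lineup.items():
    implied = area * 25e6            # transistors implied by die area
    per_core = xtors / cores / 1e6   # millions of transistors per core
    print(f"{name}: {implied / 1e9:5.2f}B implied vs {xtors / 1e9:5.2f}B"
          f" listed, {per_core:.2f}M xtors/core")
```

Note how the per-core figure drifts from ~2.9M on the small die down to ~2.2M on the big ones, consistent with fixed-function blocks taking a proportionally larger share of the smaller chips.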

What does everyone else think? Do these speculations seem plausible?
 

dave1029

Member
May 11, 2015
94
1
0
If that GP100 prediction is anywhere close to being accurate, I'm going to be all over that [redacted]. I don't care that I already have two Titan Xs.


Profanity isn't allowed in VC&G.

- Elfear
 
Last edited by a moderator:

NTMBK

Lifer
Nov 14, 2011
10,322
5,352
136
No way the GP106 will have 256-bit GDDR5. Too much die area, and too power hungry. I'd expect it to have 2 HBM2 stacks.
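For rough context on the power point, here's a sketch using AMD's Fiji-era marketing figures (~10.66 GB/s per watt for GDDR5, ~35 GB/s per watt for HBM1) as assumptions; HBM2 numbers may differ:

```python
# Rough memory-power comparison behind the "too power hungry" point.
# Efficiency figures are AMD's Fiji-era marketing numbers and are
# approximations at best.
GDDR5_GBPS_PER_WATT = 10.66
HBM_GBPS_PER_WATT = 35.0

# A 256-bit GDDR5 bus at 7 Gbps per pin:
gddr5_bw = 256 * 7 / 8                          # = 224 GB/s
gddr5_power = gddr5_bw / GDDR5_GBPS_PER_WATT    # ~21 W
hbm_power = gddr5_bw / HBM_GBPS_PER_WATT        # ~6.4 W at the same bandwidth
print(f"{gddr5_bw:.0f} GB/s: GDDR5 ~{gddr5_power:.0f} W vs HBM ~{hbm_power:.0f} W")
```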
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I agree with the basic premise that Pascal will be an incremental architectural improvement over Maxwell: Maxwell + mixed precision + NVLink + 1/2 rate fp64 (for GP200 alone). I don't expect GP200 to launch before Q3 2017.

With HBM2, Nvidia can get rid of those massive L2 caches and cut die size. By the way, HBM2 brings 4 times the capacity and twice the bandwidth of HBM1 (quick math in the sketch after the list). So my guesses are:

GP204 - 4096 sp, 128 ROPs, 2048-bit HBM2 memory bus, 8 GB HBM2, 300-320 sq mm, 200W TDP (will power at least 3 GPUs)
GP206 - 2048 sp, 64 ROPs, 1024-bit HBM2 memory bus, 4 GB HBM2, 200-220 sq mm, 100W TDP
GP208 - 1024 sp, 32 ROPs, 128-bit GDDR5, 100 sq mm, 50W TDP
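A quick check on those per-stack HBM numbers, using commonly cited spec figures as assumptions:

```python
# Per-stack HBM1 vs HBM2, using commonly cited spec numbers as
# assumptions: HBM1 = 4-Hi of 2Gbit dies at 1 Gbps/pin; HBM2 = 4-Hi of
# 8Gbit dies at 2 Gbps/pin, both on a 1024-bit bus per stack.
hbm1_gb = 4 * 2 / 8        # 1 GB per stack
hbm1_bw = 1024 * 1 / 8     # 128 GB/s per stack
hbm2_gb = 4 * 8 / 8        # 4 GB per stack
hbm2_bw = 1024 * 2 / 8     # 256 GB/s per stack
print(f"capacity: {hbm2_gb / hbm1_gb:.0f}x, bandwidth: {hbm2_bw / hbm1_bw:.0f}x")
# -> capacity: 4x, bandwidth: 2x
```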

Volta will be a true ground-up new architecture, built on mature 16/14nm FinFET, and will come sometime in H2 2018.
 
Last edited:

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
With HBM2, Nvidia can get rid of those massive L2 caches and cut die size.

The L2 cache is tiny. It's 2 MB on GM204, maybe 5 mm^2 at the absolute most and likely far smaller.
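A very rough sanity check on that area figure, assuming 6T SRAM cells and a ballpark 28nm high-density SRAM bit cell of ~0.15 um^2 (both textbook approximations, not GM204-specific data):

```python
# Very rough sanity check of the "maybe 5 mm^2" figure.
# Both cell size and overhead factor are textbook approximations.
CACHE_BYTES = 2 * 1024 * 1024    # 2 MiB of L2 on GM204
SRAM_CELL_UM2 = 0.15             # ~28nm HD SRAM bit cell, approx.
ARRAY_OVERHEAD = 1.5             # tags, sense amps, routing (guess)

bits = CACHE_BYTES * 8
area_mm2 = bits * SRAM_CELL_UM2 * ARRAY_OVERHEAD / 1e6
print(f"~{area_mm2:.1f} mm^2")   # ~3.8 mm^2 -- "tiny" indeed
```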
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
I'd guess L2 cache is much faster than HBM2 as well. Even as memory speeds have increased, Intel has added more and more cache to their processors.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,923
3,550
136
I'd guess L2 cache is much faster than HBM2 as well. Even as memory speeds have increased, Intel has added more and more cache to their processors.

They haven't added much cache; we're still under 10 MB for "consumer" CPUs. The issue with big caches is that the bigger they are, the slower they become, so there's a balance between hit rate and latency (or complexity).

Now a GPU, on the other hand, doesn't care much about latency, but caching plays a very important role in power consumption. Remember: moving data is expensive, executing is cheap, so you want to keep data as close to the execution units as possible. There is no way NV is going to lower the amount of cache they have, HBM or not, because that would increase power consumption.
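The "moving data is expensive" point has well-known ballpark numbers behind it. Here's a quick illustration using the widely cited ~45nm energy estimates from Horowitz's ISSCC 2014 keynote; these are approximations and don't map exactly onto a 28nm GPU:

```python
# Illustration of "moving data is expensive, executing is cheap",
# using widely cited ~45nm ballpark energies (Horowitz, ISSCC 2014).
ENERGY_PJ = {
    "32-bit FP multiply":      3.7,
    "32-bit read, small SRAM": 5.0,     # on-chip cache
    "32-bit read, DRAM":       640.0,   # off-chip memory
}
base = ENERGY_PJ["32-bit FP multiply"]
for op, pj in ENERGY_PJ.items():
    print(f"{op:25s} ~{pj:6.1f} pJ ({pj / base:5.1f}x a multiply)")
# A cache hit costs on the order of a flop; a DRAM access costs ~100x
# more, which is why shrinking the cache to save area can cost you power.
```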
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
No way the GP106 will have 256-bit GDDR5. Too much die area, and too power hungry. I'd expect it to have 2 HBM2 stacks.

Depends on HBM pricing. Not sure we will see HBM on lower end cards this generation.
 

NTMBK

Lifer
Nov 14, 2011
10,322
5,352
136
Depends on HBM pricing. Not sure we will see HBM on lower end cards this generation.

The pricing should scale the same way that GDDR5 does- lower end parts have fewer memory chips and lower memory capacity, so e.g. a GTX960 spends half as much on memory as a GTX970 does. And interposer cost should scale linearly with area (GPU + HBM area, that is).

Not sure about the fixed cost adders that might arise from changing testing + packaging methods, though.
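As a toy illustration of that scaling argument, here's a sketch with made-up placeholder numbers; the dollar figures, areas, and the board_memory_cost helper are all invented for illustration, not real BOM data:

```python
# Toy cost model for the scaling argument above. All dollar figures and
# areas are made-up placeholders showing the *shape* of the scaling.
def board_memory_cost(n_stacks, stack_cost, gpu_mm2, stack_mm2,
                      interposer_cost_per_mm2):
    """Memory cost scales with stack count; interposer with total area."""
    interposer_area = gpu_mm2 + n_stacks * stack_mm2
    return n_stacks * stack_cost + interposer_area * interposer_cost_per_mm2

# Hypothetical mid-range part (2 stacks) vs high-end part (4 stacks):
mid = board_memory_cost(2, 25, 225, 100, 0.05)
high = board_memory_cost(4, 25, 550, 100, 0.05)
print(f"mid ~${mid:.0f}, high ~${high:.0f}")  # roughly 2x, as with GDDR5
```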
 

happy medium

Lifer
Jun 8, 2003
14,387
480
126
I think Pascal will be 2x faster than a GTX 980, with 16GB of memory, released Summer/Fall of 2016 @ $700.
Die shrink + new architecture = speed
 
Last edited:

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
The pricing should scale the same way that GDDR5 does- lower end parts have fewer memory chips and lower memory capacity, so e.g. a GTX960 spends half as much on memory as a GTX970 does. And interposer cost should scale linearly with area (GPU + HBM area, that is).

Not sure about the fixed cost adders that might arise from changing testing + packaging methods, though.
With 20nm GDDR5 (from Samsung) there's still ample room for low/mid range cards to be paired with GDDR5, considering it'll still be cheaper than HBM2. Also, what about GDDR3 -> when will those dinosaurs retire D:
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Your die sizes are way too ambitious. Maybe in the second or 3rd generation of 14/16nm.

Also, 4x 4-Hi stacks means 16GB. Remember, HBM2 = 8Gbit dies.

I say 16GB highend, 8GB middle, 4GB lowend.

And don't expect any 8-Hi stacks. That product could get cancelled just like it did for HBM1.
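The capacity arithmetic, for anyone who wants to play with stack counts (a trivial sketch; the hbm2_capacity_gb helper is just for illustration):

```python
# The arithmetic behind "4x 4-Hi = 16GB" with 8Gbit HBM2 dies.
def hbm2_capacity_gb(n_stacks, stack_height, die_gbit=8):
    """Total HBM2 capacity in GB for a given stack configuration."""
    return n_stacks * stack_height * die_gbit / 8

print(hbm2_capacity_gb(4, 4))   # 16.0 GB -> high end
print(hbm2_capacity_gb(2, 4))   #  8.0 GB -> middle
print(hbm2_capacity_gb(1, 4))   #  4.0 GB -> low end
```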
 
Last edited:

bryanW1995

Lifer
May 22, 2007
11,144
32
91
You basically just doubled all specs.

Maybe NV hired away a few of the D3 devs, they're notorious for taking everything and "doubling" it.

It seems unlikely that either NV or AMD will be able to double current performance just from the node shrink. The only truly transformative GPUs that I can recall were the 9700 Pro and 8800 GTX; it seems a bit premature for the OP to predict the Third Coming.

http://www.anandtech.com/show/970

http://www.anandtech.com/show/2116

Man, it makes me sad to go back and read Anand's articles now. I wonder what he's up to at Apple.
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,322
5,352
136
Maybe NV hired away a few of the D3 devs, they're notorious for taking everything and "doubling" it.

It seems unlikely that either NV or AMD will be able to double current performance just from the node shrink. The only truly transformative GPUs that I can recall were the 9700 Pro and 8800 GTX; it seems a bit premature for the OP to predict the Third Coming.

http://www.anandtech.com/show/970

http://www.anandtech.com/show/2116

Man, it makes me sad to go back and read Anand's articles now. I wonder what he's up to at Apple.

To be fair, 14nm should be a big jump from 28nm.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
I think Pascal will be 2x faster

Node shrink + new architecture + HBM. I think Nvidia has a chance to make the biggest leap in performance per watt they've ever made since power consumption became a meaningful metric.

At release, the GTX 680 was about 50% better perf/w than the GTX 560 Ti (GF114) at 1080/1200p. It grew to about 70% better perf/w vs. the GTX 560 Ti during its lifespan with driver improvements.

At release, the Maxwell GM204 GTX 980 was about 70% more efficient than the GTX 680, but only about 52% more efficient than the GTX 770 (newer GK104) at 1440p. As driver improvements have trickled out, the GTX 980 is now about 64% more efficient than the GTX 770.

I think out of the gate we'll get higher perf/w improvements than both what Kepler had over Fermi and what Maxwell had over Kepler. Assuming Nvidia targets the same TDP with GP104 that GM204 has, getting literally twice the perf/w would yield a chip 55% faster than the GTX 980 Ti at 4K, which would make it the first single GPU from Nvidia capable of driving 2015 games at 60fps with little to no compromises.

**All of my perf/w claims come from TechPowerUp graphs, from the first reviews of the GTX 680 and GTX 980 to the most recent reviews that have all the relevant comparisons.
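For the curious, here's the arithmetic behind that 55% figure. The ~29% gap between the GTX 980 Ti and GTX 980 at 4K is an assumption chosen to make the numbers close; TechPowerUp's exact figures may differ slightly:

```python
# Reverse-engineering the "55% faster than GTX 980 Ti" figure above.
perf_gtx980 = 1.00
perfw_gain = 2.00     # assumed GP104 perf/w vs GM204, at the same TDP
perf_gp104 = perf_gtx980 * perfw_gain
perf_980ti = 1.29     # GTX 980 Ti vs GTX 980 at 4K (assumed gap)

print(f"GP104 vs 980 Ti: +{(perf_gp104 / perf_980ti - 1) * 100:.0f}%")
# -> GP104 vs 980 Ti: +55%
```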
 
Last edited:

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Guys.... pascal...


pascal...

pascal is a year away.

Speculating for the fun of it might be cool, but it's way too far off to start this now.

I don't think HBM2 will be 8GB for Nvidia or AMD. If things go as those slides say, one stack can hold a full 8GB. Sure, you could stop there, but 16GB is just 2 stacks, bruh. Actually, lower-end GPUs might well be just 4GB or 8GB, with the higher end above that.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Your die sizes are way too ambitious. Maybe in the second or 3rd generation of 14/16nm.

Also, 4x 4-Hi stacks means 16GB. Remember, HBM2 = 8Gbit dies.

I say 16GB highend, 8GB middle, 4GB lowend.

And don't expect any 8-Hi stacks. That product could get cancelled just like it did for HBM1.

I don't think the die sizes are out of line. TSMC's 16FF+ process is specifically designed for high performance parts like GPUs, so large dice are expected. And Nvidia is probably going to be the biggest customer for this process (especially since AMD will use GloFo's 14nm FinFET process instead), so there's a good chance that 16FF+ is being designed with their particular needs in mind. And Nvidia likes to do big GPU dice for the top part.

With 28nm, the very first part Nvidia released was the GK104 (294 sq. mm.), and they were clearly already working on GK110 (561 sq. mm.) at that time. The GTX 680 was released in March 2012. Around October, the first GK110-based Tesla cards were installed in the Oak Ridge Titan supercomputer. The Tesla K20X became publicly available in November. All this implies that both the GK104 and GK110 were being developed simultaneously, but the GK104 made it to market first because of the smaller die size. It wouldn't be surprising to see the same thing happen here. As I said in the original post, I think that Nvidia will release the GP104 as their initial consumer flagship and save the GP100 for later - just like they did with Kepler and Maxwell. But I do think the GP100 will appear in Tesla cards as soon as it's ready, and this probably won't be more than 6 to 9 months after GP104 hits the market. And while I may be wrong about the exact sizes, I would be quite surprised if the GP104 was substantially smaller than 300 sq. mm., or GP100 was smaller than 500 sq. mm.

Regarding HBM sizes, isn't it possible to get HBM2 chips in different capacities? Nvidia is definitely going to want to give their Quadro cards more RAM than their GeForce counterparts, and I suspect they'd probably rather not use multiple interposer designs (it would increase costs and complicate binning). If they could just use different capacity HBM2 chips, then the same interposer and GPU design could serve both professional and consumer purposes.

I don't expect to see 16GB of RAM on anything except Titan, Quadro, and Tesla cards. There seems to be a consensus that 6GB is fairly future-proof now; 8GB will probably be fine for consumer cards for the next two to four years at least. Even with SLI, it's going to be hard to hit that cap. Remember, most AAA games are console ports, and the consoles don't allow more than 6GB of combined RAM usage for both CPU and GPU (the remainder is taken by the OS).

There's no reason to believe that 8-Hi stacking will be cancelled for HBM2. This process will see much more widespread use compared to first-generation HBM, which is pretty much a beta project. I expect them to do it right this time.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I agree with the basic premise that Pascal will be an incremental architectural improvement over Maxwell: Maxwell + mixed precision + NVLink + 1/2 rate fp64 (for GP200 alone). I don't expect GP200 to launch before Q3 2017.

Reports indicate that Nvidia has already taped out GP100, and TSMC says that they achieved "risk production" on the 16FF+ process in November 2014, with volume production beginning in July 2015.

Skepticism of TSMC's timelines is understandable, but it's starting to look like the delays are finally nearing an end. I think we'll see GP104 in Q1-Q2 2016, and GP100 in Q3 (though probably it will debut only in ultra-expensive Tesla cards). If it doesn't show up until Q3 2017, then that would be an indication that something went very wrong.
 

Prefix-NA

Junior Member
May 17, 2015
8
0
36
The 1080 Ti may be 30% better at 4K than the 980 Ti; I don't expect much more. Pascal is basically going to be a shrink + HBM gen 2 with very little architectural improvement; it's going to be more focused on HBM and the shrink.

Just expect the top cards to be a big improvement. The 1080 will be maybe 10-15% faster than the 980 at 1080p/1440p, with a $600 tag.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The 1080 Ti may be 30% better at 4K than the 980 Ti; I don't expect much more. Pascal is basically going to be a shrink + HBM gen 2 with very little architectural improvement; it's going to be more focused on HBM and the shrink.

Just expect the top cards to be a big improvement. The 1080 will be maybe 10-15% faster than the 980 at 1080p/1440p, with a $600 tag.

This has never been the case in the past, though. Drop a node and pack in transistors until the die is the same size the old one was, which means you have way more transistors than the previous gen.
 

xpea

Senior member
Feb 14, 2014
451
153
116
Bump. After the GP100 announcement and before the Pascal consumer cards arrive, let's review predictions and make new ones...
 