NVIDIA Pascal Thread


antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Yes, you are right on that; it does not use FP32 to get FP64, my bad. But it is still not a 53something-core GPU either. It is still a 3584 CUDA core GPU.

P.S. They still show DP on Maxwell and Kepler; that should tell you how to understand Pascal as well.

For what it's worth, using FP32 cores to run FP64 is how it was done with Fermi and earlier Nvidia architectures (and how it is still done by AMD with GCN). But ever since Kepler, Nvidia has used separate FP64 cores to run DP, and the FP32 cores are incapable of running DP (even at a reduced rate, as on Fermi). Moreover, the FP64 cores are also incapable of running SP.

So the GPU is very much a 5376 core GPU, since there are 5376 distinct physical cores on it. However, only 3584 cores are available for SP stuff (and games), and only 1792 cores are available for DP. The only way to make use of all 5376 cores would be to run both SP and DP stuff at the same time, but I'm not sure if the front end allows for that.
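
As a rough tally of the core counts above, here is a minimal sketch. The 56 enabled SMs and 64 FP32 cores per SM are the figures quoted in this thread; the 32 FP64 cores per SM is an assumption, simply what 1792 / 56 implies, and the variable names are made up for illustration.

```python
# Core tally for Tesla P100 as discussed in this post.
SMS_ENABLED = 56   # enabled SMs on Tesla P100 (figure quoted in the thread)
FP32_PER_SM = 64   # FP32 CUDA cores per SM (figure quoted in the thread)
FP64_PER_SM = 32   # assumption: implied by 1792 DP cores / 56 SMs

fp32_cores = SMS_ENABLED * FP32_PER_SM    # usable for SP work and games
fp64_cores = SMS_ENABLED * FP64_PER_SM    # usable for DP work only
total_cores = fp32_cores + fp64_cores     # distinct physical cores

print(fp32_cores, fp64_cores, total_cores)
```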
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
3584 CUDA cores.

If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).

If we ignore overclocking, it means a 1480MHz, 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce be clocked? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980Ti in the 1500+ range.

Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.
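
The 72% figure can be reproduced as a plain cores x clock ratio. This is only a paper comparison under the assumption that performance scales linearly with both; it ignores architecture changes, memory bandwidth and real boost behaviour. The function name is made up for illustration.

```python
def relative_throughput(cores_a, mhz_a, cores_b, mhz_b):
    """Ratio of (cores x clock) for GPU A over GPU B; a paper figure only."""
    return (cores_a * mhz_a) / (cores_b * mhz_b)

# Full GP100 (3840 cores @ 1480 MHz) vs. Titan X (3072 cores @ 1075 MHz Boost)
ratio = relative_throughput(3840, 1480, 3072, 1075)
percent_faster = (ratio - 1) * 100

print(f"{percent_faster:.0f}% faster on paper")
```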

This reminds me of the original Titan launch where they didn't release the full Big chip right away.

610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

What's interesting is NV is gunning for clocks while hardly changing the number of CUDA cores. 780Ti was 2880, TX was 3072 and now they are at 3840. If someone told me 3 years ago that after Kepler's 2880 CUDAs we'd be on 3840 with Pascal, I wouldn't have believed it. Looks like the FP64 and massive focus on compute ate a TON of transistor space on GP100; hence they went all in with Boost clocks to compensate. Pretty impressive considering it's usually harder to clock larger chips higher, but NV keeps breaking this rule over and over.

Seems AMD has been going the opposite way -- low clocks but wider and wider chips. Will be interesting to see if AMD reverses course and sticks with 4096 SPs for Vega or goes 5000-6000 SPs with much lower 1100MHz clocks.
 

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
This thing is an engineering marvel. Would be willing to pay $$ for a pair of these for gaming!

Get another four guys together, and you can go quarters on a DGX1.

I agree though, it's an amazing piece of technology. For gamers expecting a successor to Maxwell it may end up being a letdown, but for the market they're targeting it's a massive step forward.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6~1.7 GHz OC for Pascal is not out of reach.

I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
They already needed to use 300W to hit the clocks we're seeing now. The Kepler and Maxwell Teslas used 235W/250W respectively, with clocks not being that much lower than the desktop counterparts, especially for Maxwell. How much power is a 1700MHz GPU going to need?
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Guys, if I'm correct, Pascal will only bring around a 75% increase in gaming performance. Looking at GM200 to P100, the GFLOPS increase is only 75%.

I'm not a computer expert, but is this correct?

Maybe it will or maybe it won't ...

If the geometry processing is not a bottleneck there should be a 70% uplift in performance ...
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).

If we ignore overclocking, it means a 1480MHz, 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce be clocked? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980Ti in the 1500+ range.

Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.

This reminds me of the original Titan launch where they didn't release the full Big chip right away.

610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

JHH said that GP100 will have 10.6 TFLOPs FP32, and that is the amount you get with 1480 MHz and 3584 CUDA cores. So either two versions of the GPU will have the same compute power, or something is wrong with their specs.
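
A quick way to check that arithmetic, as a sketch: peak FP32 throughput is usually quoted as cores x 2 FLOPs per clock (one fused multiply-add) x clock. The function name is made up for illustration, and the 3840-core figure at 1480 MHz is hypothetical, since no clock has been announced for a full chip.

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs

def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Theoretical peak FP32 throughput in TFLOPS."""
    return cuda_cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12

p100_tflops = peak_fp32_tflops(3584, 1480)        # Tesla P100 as announced
full_chip_tflops = peak_fp32_tflops(3840, 1480)   # hypothetical: full chip, same clock

print(f"3584 cores: {p100_tflops:.1f} TFLOPS")
print(f"3840 cores: {full_chip_tflops:.1f} TFLOPS")
```

So 10.6 TFLOPS lines up with 3584 cores; 3840 cores at the same clock would land higher.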
 

Maverick177

Senior member
Mar 11, 2016
411
70
91
They already needed to use 300W to hit the clocks we're seeing now. The Kepler and Maxwell Teslas used 235W/250W respectively, with clocks not being that much lower than the desktop counterparts, especially for Maxwell. How much power is a 1700MHz GPU going to need?

Remember Maxwell?
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).

If we ignore overclocking, it means a 1480MHz, 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce be clocked? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980Ti in the 1500+ range.

Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.

This reminds me of the original Titan launch where they didn't release the full Big chip right away.

610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

A more interesting question, imho, is how fast a theoretical big Pascal GPU with only FP32 cores would be. Assuming a Pascal SM and a Maxwell SM are the same size, and that an FP64 core takes up twice the space of an FP32 core, Pascal should also be capable of fitting 128 FP32 cores in an SM. With 60 SMs, that would be a total of 7680 cores running at 1480MHz.

Nvidia tends to get pretty close to linear scaling so such a GPU would be approximately 275% faster than a stock 980 Ti. Make it happen Nvidia

Another interesting thing: if Nvidia could squeeze 128 FP32 cores into an SM by removing the FP64 cores as described, they could make a gaming-oriented GPU with 3584 FP32 cores in just 28 SMs, half of what P100 has enabled, and thus theoretically half the size. So a GP104 with 3584 cores running at ~1400-1500 MHz and a size of just 305 mm2.
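
The back-of-the-envelope numbers in this post can be sketched as follows. Everything here is hypothetical: 128 FP32 cores per SM is the assumption made above, the 980 Ti figures (2816 cores, 1075 MHz reference boost) are assumed stock values, and both performance and die area are assumed to scale linearly.

```python
FP32_PER_SM = 128          # assumption from the post: FP64 area reused for FP32
CLOCK_MHZ = 1480
GTX_980_TI_CORES = 2816    # assumed stock 980 Ti core count
GTX_980_TI_MHZ = 1075      # assumed stock 980 Ti reference boost clock

# Big FP32-only chip: 60 SMs
big_cores = 60 * FP32_PER_SM
speedup = (big_cores * CLOCK_MHZ) / (GTX_980_TI_CORES * GTX_980_TI_MHZ)
percent_faster = (speedup - 1) * 100

# Small FP32-only chip matching P100's 3584 FP32 cores
small_sms = 3584 // FP32_PER_SM
small_area_mm2 = 610 * small_sms / 56   # scale P100's 610 mm2 by SM count

print(big_cores, f"{percent_faster:.0f}% faster", small_sms, small_area_mm2)
```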
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

Did they have working silicon?
He said it will ship next year as a professional GPU. A Titan would probably be half a year later, so 3Q2017.
Nice to see NV is going for HBM.
 

Adored

Senior member
Mar 24, 2016
256
1
16
If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).

If we ignore overclocking, it means a 1480MHz, 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce be clocked? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980Ti in the 1500+ range.

Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.

This reminds me of the original Titan launch where they didn't release the full Big chip right away.

610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

610mm2 with any kind of yield would be impressive - note that they didn't show any silicon. At $10K+ per GPU they only need 1 or 2 out of the 90 on the wafer to work though.
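
For reference, the "90 on the wafer" figure is in line with the usual gross dies-per-wafer approximation. A minimal sketch, assuming a 300 mm wafer and ignoring scribe lines, edge exclusion and die aspect ratio; the function name is made up for illustration.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)

dies = gross_dies_per_wafer(300, 610)   # assumption: 300 mm wafer
print(dies, "candidate dies per wafer")
print(f"1-2 good dies would be a yield of {1 / dies:.1%} to {2 / dies:.1%}")
```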

P100 here is a 300W GPU compared to 250W of the Titan X so you have to factor that into maximum clock speeds. There will be a lot less headroom and 16FF+ likely falls off the cliff faster than 28nm.
 

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
JHH said that GP100 will have 10.6 TFLOPs FP32, and that is the amount you get with 1480 MHz and 3584 CUDA cores. So either two versions of the GPU will have the same compute power, or something is wrong with their specs.

JHH said P100 (the module) would have 10.6 TFLOPS. That's also what's listed in the table, under Tesla P100. It will likely be a situation similar to Kepler, where the chip shipped with one SMX disabled to improve yields. Whether we'll see a fully enabled version a year or so later, similar to the 780Ti, remains to be seen.

A more interesting question, imho, is how fast a theoretical big Pascal GPU with only FP32 cores would be. Assuming a Pascal SM and a Maxwell SM are the same size, and that an FP64 core takes up twice the space of an FP32 core, Pascal should also be capable of fitting 128 FP32 cores in an SM. With 60 SMs, that would be a total of 7680 cores running at 1480MHz.

Nvidia tends to get pretty close to linear scaling so such a GPU would be approximately 275% faster than a stock 980 Ti. Make it happen Nvidia

Well, and that's a really interesting question. Will we ever see GP100 in a video card at all? If we do get a 300mm² GPU without NVLink, without DP compute, and somewhere in the ballpark of 3k CUDA cores, at similar clocks it's going to get pretty darn close to the performance of P100, HBM2 or not. Could GP100 be relegated to a Quadro (and maybe eventually a Titan) card, with the 1080Ti (or 1180Ti) coming from the rumoured GP102?
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
What's so special about these Tesla cards that you could not use a Radeon or FirePro? Nvidia is going hard in this market, but how easily could others take their cake and eat it? Is it just a matter of DP FLOPS or is there something else? If it's pure compute performance, the field is wide open to anyone bringing a powerful chip.

610mm2 with any kind of yield would be impressive - note that they didn't show any silicon. At $10K+ per GPU they only need 1 or 2 out of the 90 on the wafer to work though.

P100 here is a 300W GPU compared to 250W of the Titan X so you have to factor that into maximum clock speeds. There will be a lot less headroom and 16FF+ likely falls off the cliff faster than 28nm.

Related to my question above: let's say this chip is rocking 18 TFLOPS at 300W and $10k+.

Why would you buy that over a couple of 8 TFLOPS $1000 chips at <200W, etc.? If the only thing these chips have is pure power, they haven't got things cornered at all. If it's CUDA and other software, that can be overcome more easily.
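
To put rough numbers on that comparison, here is a sketch using only the hypothetical figures from this post (18 TFLOPS / 300W / $10k versus 8 TFLOPS / 200W / $1000). The 200W value is the upper bound of "<200W", the class and names are made up for illustration, and memory, interconnect and software are ignored entirely.

```python
from dataclasses import dataclass

@dataclass
class Card:
    name: str
    tflops: float
    watts: float
    price_usd: float

    def gflops_per_watt(self) -> float:
        return self.tflops * 1000 / self.watts

    def gflops_per_dollar(self) -> float:
        return self.tflops * 1000 / self.price_usd

# Hypothetical figures taken from the post, not real product specs
big = Card("18 TFLOPS part", tflops=18, watts=300, price_usd=10_000)
small = Card("8 TFLOPS part", tflops=8, watts=200, price_usd=1_000)

for card in (big, small):
    print(f"{card.name}: {card.gflops_per_watt():.0f} GFLOPS/W, "
          f"{card.gflops_per_dollar():.1f} GFLOPS/$")
```

On these made-up numbers the big chip is ahead per watt and the cheaper chips are well ahead per dollar.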
 

Osjur

Member
Sep 21, 2013
92
19
81
So, this is the first time ever that they did not reveal a new-generation GeForce at GTC.

I guess we can be pretty sure we will not see any new consumer GPUs before Computex.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
What's so special about these Tesla cards that you could not use a Radeon or FirePro? Nvidia is going hard in this market, but how easily could others take their cake and eat it? Is it just a matter of DP FLOPS or is there something else? If it's pure compute performance, the field is wide open to anyone bringing a powerful chip.

CUDA. Their hardware is inferior in comparison to AMD, but they have CUDA.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).

If we ignore overclocking, it means a 1480MHz, 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce be clocked? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980Ti in the 1500+ range.

Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.

This reminds me of the original Titan launch where they didn't release the full Big chip right away.

610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.

What's interesting is NV is gunning for clocks while hardly changing the number of CUDA cores. 780Ti was 2880, TX was 3072 and now they are at 3840. If someone told me 3 years ago that after Kepler's 2880 CUDAs we'd be on 3840 with Pascal, I wouldn't have believed it. Looks like the FP64 and massive focus on compute ate a TON of transistor space on GP100; hence they went all in with Boost clocks to compensate. Pretty impressive considering it's usually harder to clock larger chips higher, but NV keeps breaking this rule over and over.

Seems AMD has been going the opposite way -- low clocks but wider and wider chips. Will be interesting to see if AMD reverses course and sticks with 4096 SPs for Vega or goes 5000-6000 SPs with much lower 1100MHz clocks.

The Nvidia GF100 was an engineering marvel compared to ATI Cypress and nearly 60% larger in surface area, but that did not translate into the same level of improvement in gaming performance, and AMD is apparently using a denser process node too this time.

It also debuted an enhanced focus on compute, GDDR5 and a new process node, and was the last Nvidia chip over 500mm2 to debut on a new process node. It was also around six months late compared to the competition and cost significantly more.

The GK110, GK210 and GM200 debuted on more mature process nodes.

It is more likely that a smaller Pascal-based GPU will be fighting Vega, not a 610mm2 P100-based card. That is my opinion, OFC.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
A more interesting question, imho, is how fast a theoretical big Pascal GPU with only FP32 cores would be. Assuming a Pascal SM and a Maxwell SM are the same size, and that an FP64 core takes up twice the space of an FP32 core, Pascal should also be capable of fitting 128 FP32 cores in an SM. With 60 SMs, that would be a total of 7680 cores running at 1480MHz.

Nvidia tends to get pretty close to linear scaling so such a GPU would be approximately 275% faster than a stock 980 Ti. Make it happen Nvidia

Look what NV did on 28nm with GK110 (high compute) -> GM200 (low compute, focus on gaming). I would guesstimate that with a 610mm2 die size, they will repeat the same strategy with GP100 -> Volta. Being stuck on the same node, getting another 50-75% boost probably means going all in on a lean, gaming-focused Volta. Then GP100 would serve as their compute backbone until 7/10nm. If they create a straight-up FP32 gaming Pascal with the performance characteristics you describe, they will have nothing to sell during the Volta generation.

This could also explain the aggressive 2017/2018 span between Pascal and Volta. NV could neuter compute and improve the architecture. Pretty surprising turn of events after some people were predicting a 3500-4072 CUDA GP104. This means if GP104 has 3072 CUDA cores and it overclocks better than GP100, the gap between them could be 25% or less. A max overclocked Titan X/980Ti smashes a 980 OC by 30-40%. Interesting times ahead.
 

jpiniero

Lifer
Oct 1, 2010
14,842
5,457
136
The haters were right about one thing - no public demo does kind of make a launch in June seem unlikely. September might be a more realistic timeframe for any consumer GPU.

This means if GP104 has 3072 CUDA cores

I think you should assume it will have less than that now.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Look what NV did on 28nm with GK110 (high compute) -> GM200 (low compute, focus on gaming). I would guesstimate that with a 610mm2 die size, they will repeat the same strategy with GP100 -> Volta. Being stuck on the same node, getting another 50-75% boost probably means going all in on a lean, gaming-focused Volta. Then GP100 would serve as their compute backbone until 7/10nm. If they create a straight-up FP32 gaming Pascal with the performance characteristics you describe, they will have nothing to sell during the Volta generation.

This could also explain the aggressive 2017/2018 span between Pascal and Volta. NV could neuter compute and improve the architecture. Pretty surprising turn of events after some people were predicting a 3500-4072 CUDA GP104. This means if GP104 has 3072 CUDA cores and it overclocks better than GP100, the gap between them could be 25% or less. A max overclocked Titan X/980Ti smashes a 980 OC by 30-40%. Interesting times ahead.

You're absolutely right that it would be financially stupid to release such a GPU right off the bat; it's just fun to dream a little.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Hitting the die size on a node is entirely a question of economics. I'm sure any ASIC vendor could have pushed the limits of nearly any node in recent memory. It just has never made economic sense to do so, because the cost per unit is very high with such low yields. That's why 600mm2 came last for 28nm for consumers -- only then did it make financial sense.

But this chip is going into a 130 thousand dollar machine. You can afford to throw away a lot of dead silicon when you just need 8 working chips per 130k machine.

My prediction is both AMD and nVidia pull Kepler->Maxwell style architectures on 16/14. One arch on the node for compute, second arch on the node more specialized for gaming w/ fp64 stripped and using it for more cores. The stripped architecture gets larger chips as time goes on and yield improves since $$ is the limiting factor.

Regardless, extremely impressive stuff. Modern AI is truly outstanding.
 