Arachnotronic
Lifer
- Mar 10, 2006
- 11,715
- 2,012
- 126
Exceptional, too bad we mortals will have to play with 300mm2 or less for now.
This thing is an engineering marvel. Would be willing to pay $$ for a pair of these for gaming!
Yes, you are right on that, it does not use the FP32 cores to get FP64, my bad. But it is still not a 53something-core GPU. It is still a 3584 CUDA core GPU.
P.S. They still show DP on Maxwell and Kepler; that should tell you how to understand Pascal as well.
3584 CUDA cores.
They already needed to use 300W to hit the clocks we're seeing now. The Kepler and Maxwell Teslas used 235W/250W respectively, with clocks not being that much lower than the desktop counterparts, especially for Maxwell. How much power is a 1700MHz GPU going to need?
AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6-1.7GHz OC for Pascal is not out of reach.
I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
Guys, if I'm correct, Pascal will only bring around a 75% increase in gaming performance. Going from GM200 to P100, the GFLOPS increase is only about 75%.
I'm not a computer expert but is this correct?
If you read the NV article, GP100 is 3840 CUDA cores / 240 Texture Units. 64 CUDA cores/SM with 4 TMUs per SM. Sounds like the fully unlocked GP100 is 60 SMs and they disabled 4 SMs to increase yields (Tesla P100 is 56/60 SMs with corresponding 224 / 240 TMUs).
If we ignore overclocking, it means a 1480MHz 3840-core full chip is 72% faster than the 3072 CUDA core Titan X with its 1075MHz Boost. The key questions are: how high will a stock GP100 GeForce clock? And how much overclocking headroom does it have? We know Titan X can hit 1400-1450MHz (air), with 980 Ti in the 1500+ range.
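For anyone who wants to check the arithmetic in the post above, here is a rough sketch. It only uses the numbers quoted in the thread (64 cores and 4 TMUs per SM, 60 vs. 56 SMs, 1480MHz vs. 1075MHz), and it assumes performance scales with cores x clock, which is a crude proxy and not a benchmark. The function names are made up for illustration.

```python
# Per-SM figures as quoted from the NV article in the post above.
CORES_PER_SM = 64
TMUS_PER_SM = 4


def sm_config(sm_count):
    """Return (CUDA cores, texture units) for a given number of enabled SMs."""
    return sm_count * CORES_PER_SM, sm_count * TMUS_PER_SM


def relative_throughput(cores_a, mhz_a, cores_b, mhz_b):
    """Ratio of cores x clock: a crude stand-in for relative FP32 throughput."""
    return (cores_a * mhz_a) / (cores_b * mhz_b)


full_gp100 = sm_config(60)  # fully enabled chip
tesla_p100 = sm_config(56)  # 4 SMs disabled
ratio = relative_throughput(3840, 1480, 3072, 1075)  # full GP100 vs. Titan X

print(full_gp100, tesla_p100)
print(f"{(ratio - 1) * 100:.0f}% faster on paper")
```

Real games will not scale this cleanly, so treat the percentage as an upper-bound style estimate.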
Stock vs. stock though should wipe the floor with GM200. Also, the full GP100 should have 1TB/sec HBM2 vs. 720GB/sec on the Tesla P100.
This reminds me of the original Titan launch where they didn't release the full Big chip right away.
610mm2 on FinFET right off the bat is the most impressive thing to me. That means there should be no reason at all GP104 cannot be 400-450mm2 if NV wanted to. AMD is probably shocked that NV managed to hit 610mm2 on FinFET.
https://devblogs.nvidia.com/parallelforall/inside-pascal/
Nvidia specifically says 610mm² for the GPU die itself, and shows the known 601mm² die size for Maxwell.
Maxwell overclocks very well, but has lower clocks and lower TDP.
Remember Maxwell?
JHH said that GP100 will have 10.6 TFLOPs FP32, and that is the amount you get with 1480 MHz and 3584 CUDA cores. So either two versions of the GPU will have the same compute power, or something is wrong with their specs.
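The 10.6 TFLOPs figure can be reproduced with the usual peak-throughput estimate, sketched below. The assumption (not stated in the thread) is that each CUDA core retires one fused multiply-add per clock, counted as 2 FLOPs. The helper name is hypothetical.

```python
def peak_fp32_tflops(cuda_cores, clock_mhz, flops_per_core_per_clock=2):
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


cut_down = peak_fp32_tflops(3584, 1480)   # 56 SMs, as on Tesla P100
full_chip = peak_fp32_tflops(3840, 1480)  # 60 SMs at the same clock

print(f"3584 cores: {cut_down:.1f} TFLOPS")
print(f"3840 cores: {full_chip:.1f} TFLOPS")
```

At the same clock the two configurations do not land on the same number, which is the discrepancy the post is pointing at: 10.6 matches the 3584-core part, not a 3840-core one.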
A more interesting question imho, is how fast a theoretical big Pascal GPU with only FP32 cores would be. Assuming a Pascal and a Maxwell SM is the same size, and that a FP64 core takes up twice the amount of space as a FP32 core, then Pascal should also be capable of fitting 128 FP32 cores in a SM. With 60 SMs, that would be a total of 7680 cores running at 1480MHz.
Nvidia tends to get pretty close to linear scaling so such a GPU would be approximately 275% faster than a stock 980 Ti. Make it happen Nvidia
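Here is the same back-of-the-envelope estimate written out. Everything in it is an assumption: 32 FP64 cores per GP100 SM (taken from Nvidia's published figures, not from this thread), an FP64 core costing twice the area of an FP32 core (the post's guess), perfectly linear scaling, and a stock 980 Ti at 2816 cores with a 1075MHz boost.

```python
# Assumed per-SM layout for GP100; the FP64 count is not stated in the thread.
FP32_PER_SM = 64
FP64_PER_SM = 32
FP64_AREA_IN_FP32_UNITS = 2  # the post's guess: one FP64 core = two FP32 cores

# Trade every FP64 core for FP32 cores within the same SM area.
fp32_only_per_sm = FP32_PER_SM + FP64_PER_SM * FP64_AREA_IN_FP32_UNITS
total_cores = 60 * fp32_only_per_sm  # 60 SMs on the full chip

# Crude cores x clock comparison against a stock 980 Ti (assumed specs).
speedup = (total_cores * 1480) / (2816 * 1075)

print(fp32_only_per_sm, total_cores)
print(f"about {(speedup - 1) * 100:.0f}% faster, assuming linear scaling")
```

The result is only as good as the area assumption, and it ignores bandwidth, power and front-end limits entirely.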
610mm2 with any kind of yield would be impressive - note that they didn't show any silicon. At $10K+ per GPU they only need 1 or 2 out of the 90 on the wafer to work though.
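The "90 on the wafer" figure lines up with a common gross-dies-per-wafer approximation, sketched here. It assumes a 300mm wafer (not stated in the thread) and ignores scribe lines, edge exclusion and the actual die aspect ratio, so it is a ballpark only.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


dies = gross_dies_per_wafer(300, 610)  # 300mm wafer is an assumption
print(f"~{dies:.0f} candidate dies per wafer")

# Yield implied by "1 or 2 working dies per wafer".
for good in (1, 2):
    print(f"{good} good die(s) per wafer -> {good / dies:.1%} yield")
```

So "1 or 2 out of 90" corresponds to a yield of only a percent or two, which is the point being made about the $10K+ price tag.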
P100 here is a 300W GPU compared to 250W of the Titan X so you have to factor that into maximum clock speeds. There will be a lot less headroom and 16FF+ likely falls off the cliff faster than 28nm.
What's so special about these Tesla cards that you could not use a Radeon or FirePro? Nvidia is going hard in this market, but how easily could others take their cake and eat it? Is it just a matter of DP FLOPS, or is there something else? If it's pure compute performance, the field is wide open to anyone bringing a powerful chip.
What's interesting is that NV is gunning for clocks while hardly changing the number of CUDA cores. The 780 Ti had 2880, Titan X had 3072, and now they are at 3840. If someone had told me 3 years ago that after Kepler's 2880 CUDA cores we'd be on 3840 with Pascal, I wouldn't have believed it. Looks like FP64 and the massive focus on compute ate a TON of transistor space on GP100; hence they went all in on Boost clocks to compensate. Pretty impressive considering it's usually harder to clock larger chips higher, but NV keeps breaking this rule over and over.
Seems AMD has been going the opposite way -- low clocks but wider and wider chips. Will be interesting to see if AMD reverses course and sticks with 4096 SPs for Vega or goes 5000-6000 SPs with much lower 1100MHz clocks.
So is it 3584 cores or 3584 + 1792?
Look what NV did on 28nm with GK110 (high compute) -> GM200 (low compute, focus on gaming). I would guesstimate that with a 610mm2 die size, they will repeat the same strategy with GP100 -> Volta. Being stuck on the same node means that getting another 50-75% boost probably requires going all in on a lean, gaming-focused Volta. Then GP100 would serve as their compute backbone until 7/10nm. If they create a straight-up FP32 gaming Pascal with the performance characteristics you describe, they will have nothing to sell during the Volta generation.
This could also explain the aggressive 2017/2018 span between Pascal and Volta. NV could neuter compute and improve the architecture. Pretty surprising turn of events after some people were predicting a 3500-4072 CUDA core GP104. This means if GP104 has 3072 CUDA cores and it overclocks better than GP100, the gap between them could be 25% or less. A max overclocked Titan X/980 Ti smashes an overclocked 980 by 30-40%. Interesting times ahead.