NVIDIA Pascal Thread

3DVagabond · Apr 6, 2016

Silverforce11 said:
Buddy, it was all the FUD that has NV releasing Titan Pascal in April 2016 that was cheered upon. Then it was later pushed to June. Then no, it's a GP104 mid-range, not the big Pascal. Now we're hearing no, it's not even GP104, but GP106/107 first. -_-

Turns out GP100 Tesla is Q1 2017.

In fact, that's much later than many of us expected. I was expecting GP100 Tesla in June. But it seems NV doesn't want it to go out then, keeping it for their $127K cluster only until Q1 2017.

They need to sell it to gamers because "that's where the big bucks are". /s

HurleyBird · Apr 6, 2016

Silverforce11 said:
You see, they count FP64 as half: when they present the specs for FP64 TFlops, it's exactly half of the FP32. But yes, only 1 in 3 CC are FP64.

Yes, 1-in-3 are FP64. What you aren't getting (hey, it's late, no worries) is that means 2-in-3 are FP32. Out of every group of 3 shaders, 1 is FP64 and 2 are FP32. Ergo there are half as many FP64 shaders as there are FP32 shaders. The ratio is 1:2, not 1:3.

Silverforce11 · Apr 6, 2016

beginner99 said:
In terms of GP102 being faster than GP100 in FP32 due to lack of DP:

What actually are the scenarios you need DP compute? Seems like mostly for deep learning.

GP102 focusing on SP makes sense. You can use it as 1080Ti, Titan and in the Quadro line. You can then also cram more stuff into GP100, as it does not need any silicon for actual display stuff (ROPs), and get a much smaller die for GP102 with at least the same if not higher FP32.

Actually, what they call "deep learning" or this new category of HPC application doesn't need high precision maths, they make do with FP16, which prior NV GPUs could not do. So it's wasted GPU FP32 flops basically.

The question is what can FP64/DP be used for? Usually anything scientific that requires the precision in their numbers basically. How accurate do you want to be, FP64 is for those that want the most accuracy.

Gaming is FP32. In fact, I've seen several presentations about using FP16 for many tasks: since those run much faster, it's a performance benefit for GPUs that can run 2x FP16 per clock instead of 1x FP32.

Would not be surprised if some new Pascal "Optimized" GameWorks utilize FP16, and Pascal will easily be 3x faster than Maxwell at those features.

As to why not make a pure gaming chip? They did, Maxwell was it. But at some point, once they dominate the marketshare, gaming profit is not going to continue to grow. There's a finite mature gaming market and it's shared with AMD for dGPU. This point was raised several times by myself and others. Even if they continue to dominate 80:20, they aren't going to spike up their profits much more than it is currently.

So gaming profit growth will stagnate.

But it's also very risky to rely on maintaining 80:20. AMD could be very competitive and take some back, and that means a drop in revenue & profit for NV's gaming financials.

Rather, they can make a very flexible uarch that can split up FP32 for 2x FP16, join it up for FP64, and add some specialized FP64 for the high end that needs it. This means they can attack and expand into multiple markets that are growing very fast while still having very strong gaming performance.

Silverforce11 · Apr 6, 2016

HurleyBird said:
Yes, 1-in-3 are FP64. What you aren't getting (hey, it's late, no worries) is that means 2-in-3 are FP32. Out of every group of 3 shaders, 1 is FP64 and 2 are FP32. Ergo there are half as many FP64 shaders as there are FP32 shaders. The ratio is 1:2, not 1:3.

Lol I know that.

I am talking about performance. Throughput here.

FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32. This means their new scheduler is taking the FP32 CC and making it work in FP64 mode, albeit slower.

The scheduler can also take FP32 and make it work twice as fast in FP16 mode. This is the major uarch change.

HurleyBird · Apr 6, 2016

Silverforce11 said:
FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32.

No, it should be half, which is exactly what it is. You're going to feel a bit silly when this all dawns on you... which I'm sure it will in a little bit

But like I said before, no worries man, it's late

Erenhardt · Apr 6, 2016

Silverforce11 said:
Lol I know that.

I am talking about performance. Throughput here.

FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32. This means their new scheduler is taking the FP32 CC and making it work in FP64 mode, albeit slower.

There are 2 FP32 cores and 1 FP64 core. At 1Hz you have 2 FP32 FLOPS and 1 FP64 FLOPS

Result 2 FP32 and 1 FP64
ratio 2:1
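
To put numbers on the ratio argument, here is a minimal Python sketch. It assumes each core retires one operation per clock, as in the post above; the helper name `peak_flops` is made up for illustration. Counting a fused multiply-add as 2 FLOPs instead would scale both figures equally and leave the ratio unchanged.

```python
def peak_flops(cores, clock_hz, flops_per_core_per_clock=1):
    """Peak FLOPS for a pool of identical cores (one op per clock is an assumption)."""
    return cores * clock_hz * flops_per_core_per_clock

# One group of 3 shaders as described in the thread: 2 FP32 cores and 1 FP64 core.
fp32 = peak_flops(cores=2, clock_hz=1)
fp64 = peak_flops(cores=1, clock_hz=1)

# FP64 throughput relative to FP32 throughput.
ratio = fp64 / fp32
print(f"FP32: {fp32} FLOPS, FP64: {fp64} FLOPS, FP64/FP32 = {ratio}")
```

The 1-in-3 figure describes the share of all cores that are FP64, while the 1:2 figure compares FP64 cores against FP32 cores only, which is what the quoted TFlops numbers do.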

TheRyuu · Apr 6, 2016

Techreport's article on Pascal[1] is claiming that it supports full pre-emption for compute tasks. I'm not sure if that translates to allowing full async compute support (graphics + compute) but at least it seems to have a greater capability than the current Maxwell chips in that regard.

GP100 includes full pre-emption support for compute tasks. It also uses an improved unified memory architecture to simplify its programming model. GP100's 49-bit virtual address space allows programs to address the full address spaces of both the CPU and the GPU. Older Tesla accelerators could only have a shared memory address space as large as the memory on board the GPU.
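
For a sense of scale, a rough Python sketch of what a 49-bit virtual address space covers. It assumes byte addressing, and the 16 GiB on-board memory figure is a hypothetical value used only for comparison.

```python
ADDRESS_BITS = 49  # from the quoted article

# Assuming byte addressing, the number of distinct addresses equals the byte count.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_tib = addressable_bytes // 2 ** 40

# Hypothetical on-board memory size, for comparison only.
card_memory_gib = 16
times_larger = addressable_bytes // (card_memory_gib * 2 ** 30)

print(f"{addressable_bytes} bytes = {addressable_tib} TiB")
print(f"{times_larger}x a {card_memory_gib} GiB card")
```

That gap is why the older limit (shared address space no larger than the GPU's own memory) matters for programs that want to address CPU and GPU memory together.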

[1] https://techreport.com/news/29946/pascal-makes-its-debut-on-nvidia-tesla-p100-hpc-card

nvgpu · Apr 6, 2016

http://www.computerbase.de/bildstrecke/71580/12/

"Nvidia only showed renders" said the trolls here.

antihelten · Apr 6, 2016

Silverforce11 said:
It can also split up the CC for FP16 operations, which is where this deep learning ~21 TFlops half-precision comes from. Kepler & Maxwell could NOT do that.

Actually Maxwell was perfectly capable of this as well, it just wasn't enabled in the desktop cards, only in mobile SoCs (i.e. Tegra X1):

Last but certainly not least however, X1 will also be launching with a new mobile-centric GPU feature not found on desktop Maxwell. For X1 NVIDIA is implanting what they call “double speed FP16” support in their CUDA cores, which is to say that they are implementing support for higher performance FP16 operations in limited circumstances.

So the only new thing with Pascal is that it will finally be enabled on the desktop as well (there are also some indications that the implementation might be more flexible than on Maxwell, i.e. it doesn't have to be two identical FP16 OPs, but I'm not entirely certain on that yet)
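
As a loose illustration of the "two FP16 in the space of one FP32" idea, here is a small Python sketch using the standard `struct` module. It only shows the storage side (two half-precision values fit in the same 32 bits as one single-precision value); it says nothing about how Maxwell or Pascal actually schedule the operations.

```python
import struct

# Two values chosen because they are exactly representable in FP16.
a, b = 1.5, -2.0

# 'e' is IEEE 754 half precision (binary16), 'f' is single precision (binary32).
packed_pair = struct.pack("<2e", a, b)
single_fp32 = struct.pack("<f", 1.5)

# Both occupy the same number of bytes.
print(len(packed_pair), len(single_fp32))

# The two halves come back out unchanged.
unpacked = struct.unpack("<2e", packed_pair)
print(unpacked)
```

Whether the two packed operands have to go through the same operation is exactly the flexibility question raised above, and this sketch does not answer it.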

nvgpu said:
http://www.computerbase.de/bildstrecke/71580/12/

...

"Nvidia only showed renders" said the trolls here.

Interestingly enough, those pictures would seem to confirm the rumor from yesterday that GP100 is indeed using a 1200mm2 interposer, since the 610mm2 GPU appears to take up roughly half the space.

Edit: based upon this render and a bit of pixel counting, I get the size of the Interposer to be 1167mm2
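
The pixel-counting approach can be sketched in Python roughly like this. The pixel dimensions below are hypothetical placeholders, not the measurements actually taken from the render, and the method assumes a flat, head-on view with no perspective distortion.

```python
def area_from_pixels(ref_area_mm2, ref_px_w, ref_px_h, target_px_w, target_px_h):
    """Scale a known reference area to a target region measured in the same image."""
    mm2_per_px2 = ref_area_mm2 / (ref_px_w * ref_px_h)
    return target_px_w * target_px_h * mm2_per_px2

GPU_DIE_MM2 = 610.0  # die size quoted in the thread

# Hypothetical pixel measurements: GPU die, then interposer.
estimate = area_from_pixels(GPU_DIE_MM2, 200, 200, 300, 250)
print(f"Estimated interposer area: {estimate:.0f} mm2")
```

Any error in the die outline or a tilted render feeds straight into the result, so an estimate like this is good to maybe a few percent at best.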

Aristotelian · Apr 6, 2016

I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.

CentroX · Apr 6, 2016

Aristotelian said:
I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.

Not sure what you need 32GB for at this point? 8GB is fine as of today and 16GB is killing it.

It's like desktop RAM. I've had 16GB since 2009 and don't see the need for 32GB.

Cookie Monster · Apr 6, 2016

Aristotelian said:
I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.

This product is definitely tailor-made for the HPC market. It's not a gaming product and may well never be one (e.g. a Titan). That said, we may see a 3D/game-tailored GPU(s) instead, hopefully tomorrow or during Computex in June.

nvgpu · Apr 6, 2016

There won't be any 32GB SKUs until Samsung can mass produce 8 stack HBM2 and it will only be on professional Quadro and Tesla products, not GeForce.

Max you'll get is 16GB on TITAN Pascal, whatever they'll call the final name of it.

dacostafilipe · Apr 6, 2016

nvgpu said:

The left GPUs look strange. Why are the HBM stacks so dark compared to the right dies?

3DVagabond · Apr 6, 2016

NeoLuxembourg said:
The left GPUs look strange. Why are the HBM stacks so dark compared to the right dies?

Are those wood screws I see?

antihelten · Apr 6, 2016

NeoLuxembourg said:
The left GPUs look strange. Why are the HBM stacks so dark compared to the right dies?

The dark stuff might be some sort of epoxy covering, but it is definitely weird that there are two clearly different versions of the GPU.

Erenhardt · Apr 6, 2016

NeoLuxembourg said:
The left GPUs look strange. Why are the HBM stacks so dark compared to the right dies?

Different angle of lighting makes different parts more and less visible. I'm more worried about those Woodstain stains on VRMs, what happened here?!

antihelten · Apr 6, 2016

3DVagabond said:
Are those wood screws I see?

Well, those are definitely countersunk screws, which you would generally never see on a final product, but there are so many things that scream mock-up on that board that it's not really terribly surprising.

Erenhardt said:
Different angle of lighting makes different parts more and less visible. I'm more worried about those Woodstain stains on VRMs, what happened here?!

It's quite clearly not simply an issue of angle, as you can clearly see the outline of the HBM modules on the "dark" GPU. This outline is much smaller than what you see on the "shiny" GPUs.

If I didn't know any better I might be inclined to think that the "dark" GPUs are actually early development samples using smaller HBM1 modules, whereas the "shiny" GPUs are final (or close to final) samples with the larger HBM2 modules.

Also notice how the "dark" GPUs have very clear scuff marks around some of the screw holes, indicating that they have been moved around quite a bit, whereas the "shiny" ones look comparatively pristine.

To be honest, I'm quite certain that we're looking at 2 separate batches of GP100 attached to the same board, one of these batches possibly using HBM1 modules.

Silverforce11 · Apr 6, 2016

Cookie Monster said:
This product is definitely tailor-made for the HPC market. It's not a gaming product and may well never be one (e.g. a Titan). That said, we may see a 3D/game-tailored GPU(s) instead, hopefully tomorrow or during Computex in June.

So were Fermi 480 and Kepler Titan/780Ti. These big chips were HPC focused with 1/3 FP64. They were good for gaming too.

Why the expectations that it will be different this time?

airfathaaaaa · Apr 6, 2016

perhaps they are demoing new parts to take over a piece of the cake from ikea

dacostafilipe · Apr 6, 2016

Erenhardt said:
Different angle of lighting makes different parts more and less visible.

Does not seem to be the case.

The left dies also have reflections on them, but the HBM stacks don't.

Erenhardt · Apr 6, 2016

Oh herp derp. Just noticed the difference in HBM sizes. Like they ran out of stickers or smth. LOL

Cookie Monster · Apr 6, 2016

Could be leftover prototypes with HBM1 instead for early validation tests when HBM2 wasn't available.

Aristotelian · Apr 6, 2016

CentroX said:
Not sure what you need 32GB for at this point? 8GB is fine as of today and 16GB is killing it.

It's like desktop RAM. I've had 16GB since 2009 and don't see the need for 32GB.

It's not about "need" - just about buying the proper flagship Pascal product.

nvgpu said:
There won't be any 32GB SKUs until Samsung can mass produce 8 stack HBM2 and it will only be on professional Quadro and Tesla products, not GeForce.

Max you'll get is 16GB on TITAN Pascal, whatever they'll call the final name of it.

Thanks for that clarification. Hopefully the GP100 leads quickly to a consumer card then, and a high-end one at that.

Silverforce11 · Apr 6, 2016

HurleyBird said:
No, it should be half, which is exactly what it is. You're going to feel a bit silly when this all dawns on you... which I'm sure it will in a little bit

But like I said before, no worries man, it's late

Holy crap you are right. My bad!

I just compared GM200 vs GP100 diagrams again carefully.

They only count the 64 FP32 CC in each SM. The 32x FP64 CC is not counted at all.

I misunderstood and thought that it is capable of running FP64 through FP32 CC. It can't.

That means the throughput they list for FP64 comes directly from the FP64 CC only, there's 32 of them per SM.

In total each SM has 96 CC, but only 64x FP32 CC usable in gaming.

Okay... that is such a strange design. Basically the FP64 CC are die space that's useless for gaming, and the FP32 CC are a ton of die space wasted for people who need FP64.

They could make the same chip without any FP64 CC and it'll be ~400mm2. Hmm.
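
The back-of-envelope behind a ~400mm2 figure can be reproduced with a crude Python sketch. It assumes die area scales linearly with CC count and that an FP64 CC costs the same area as an FP32 CC. Both are rough assumptions, and the sketch ignores everything on the die that is not a CC (memory controllers, caches, NVLink and so on), so it overstates the savings.

```python
GP100_DIE_MM2 = 610.0  # die size quoted earlier in the thread
FP32_PER_SM = 64       # per-SM counts as read off the diagrams
FP64_PER_SM = 32

def scaled_area(die_mm2, kept_cores, total_cores):
    """Naive estimate: area shrinks in proportion to the cores that remain."""
    return die_mm2 * kept_cores / total_cores

estimate = scaled_area(GP100_DIE_MM2, FP32_PER_SM, FP32_PER_SM + FP64_PER_SM)
print(f"FP32-only estimate: about {round(estimate)} mm2")
```

Since the non-CC parts of the chip would not shrink, a real FP32-only design of the same width would likely land somewhere above this number.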

Any more guesses for GP104?
