NVIDIA Pascal Thread

Page 45

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Buddy, it was all the FUD about NV releasing Titan Pascal in April 2016 that got cheered on. Then it was later pushed to June. Then no, it's a GP104 mid-range, not the big Pascal. Now we're hearing no, it's not even GP104, but GP106/107 first. -_-

Turns out GP100 Tesla is Q1 2017.

In fact, that's much later than many of us expected. I was expecting GP100 Tesla in June. But it seems NV doesn't want it to go out then, keeping it for their $127K cluster only until Q1 2017.

They need to sell it to gamers because "that's where the big bucks are". /s
 

HurleyBird

Platinum Member
Apr 22, 2003
2,735
1,357
136
You see, they count it as half: when they present the specs for FP64 TFlops, it's exactly half of the FP32. But yes, only 1 in 3 CC are FP64.

Yes, 1-in-3 are FP64. What you aren't getting (hey, it's late, no worries) is that means 2-in-3 are FP32. Out of every group of 3 shaders, 1 is FP64 and 2 are FP32. Ergo there are half as many FP64 shaders as there are FP32 shaders. The ratio is 1:2, not 1:3.
 
Feb 19, 2009
10,457
10
76
In terms of GP102 being faster than GP100 in FP32 due to lack of DP:

What are the actual scenarios where you need DP compute? Seems like mostly for deep learning.

GP102 focusing on SP makes sense. You can use it as 1080Ti, Titan and in the Quadro line. You can then also cram more into GP100, as it does not need any silicon for actual display work (ROPs), and get a much smaller die for GP102 with at least the same, if not higher, FP32.

Actually, what they call "deep learning", this new category of HPC application, doesn't need high-precision maths. It makes do with FP16, which prior NV GPUs could not do, so running it on them basically wastes FP32 flops.

The question is what can FP64/DP be used for? Usually anything scientific that requires precision in its numbers. It comes down to how accurate you want to be: FP64 is for those who want the most accuracy.
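To put a rough number on "how accurate", here is a minimal sketch that stores the same value in FP16, FP32 and FP64 and reads it back. It runs on the CPU using Python's `struct` module (format codes `e`, `f`, `d`), so it only illustrates the number formats, not how any GPU executes them. The test value 0.1 and the helper name `round_trip` are my own choices for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given IEEE 754 format and read it back.

    struct format codes: 'e' = FP16, 'f' = FP32, 'd' = FP64.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 0.1  # arbitrary test value, not exactly representable in binary
errors = {}
for name, fmt in [("FP16", "<e"), ("FP32", "<f"), ("FP64", "<d")]:
    stored = round_trip(value, fmt)
    # Error is measured against the Python float, which is already FP64.
    errors[name] = abs(stored - value)
    print(f"{name}: stored {stored!r}, error {errors[name]:.3e}")
```

The FP16 error comes out several orders of magnitude larger than the FP32 one, which is fine for workloads that tolerate noise and a problem for simulations that accumulate it.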

Gaming is FP32, and in fact I've seen several presentations about using FP16 for many tasks. Because those run much faster, it equates to a performance benefit for GPUs that can run 2x FP16 per clock instead of 1x FP32.

Would not be surprised if some new Pascal "Optimized" GameWorks utilize FP16, and Pascal will easily be 3x faster than Maxwell at those features.

As to why not make a pure gaming chip? They did, Maxwell was it. But at some point, once they dominate the market share, gaming profit is not going to continue to grow. There's a finite, mature gaming market, and for dGPUs it's shared with AMD. This point was raised several times by myself and others. Even if they continue to dominate 80:20, they aren't going to push their profits much higher than they currently are.

So gaming profit growth will stagnate.

But it's also very risky to rely on maintaining 80:20. AMD could be very competitive and take some share back, and that means a drop in revenue and profit for NV's gaming financials.

Rather, they can make a very flexible uarch that can split up FP32 for 2x FP16, join it up for FP64, and go with some specialized FP64 for the high end that needs it. This means they can attack and expand into multiple fast-growing markets while still having very strong gaming performance.
 
Feb 19, 2009
10,457
10
76
Yes, 1-in-3 are FP64. What you aren't getting (hey, it's late, no worries) is that means 2-in-3 are FP32. Out of every group of 3 shaders, 1 is FP64 and 2 are FP32. Ergo there are half as many FP64 shaders as there are FP32 shaders. The ratio is 1:2, not 1:3.

Lol I know that.

I am talking about performance. Throughput here.

FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32. This means their new scheduler is taking the FP32 CC and making it work in FP64 mode, albeit slower.

The scheduler can also take FP32 and make it work twice as fast in FP16 mode. This is the major uarch change.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,735
1,357
136
FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32.

No, it should be half, which is exactly what it is. You're going to feel a bit silly when this all dawns on you... which I'm sure it will in a little bit

But like I said before, no worries man, it's late
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Lol I know that.

I am talking about performance. Throughput here.

FP64 Flops is counted at half ratio of FP32. But looking at the CC, it should only be one third of FP32. This means their new scheduler is taking the FP32 CC and making it work in FP64 mode, albeit slower.

There are 2 FP32 cores and 1 FP64 core. At 1Hz you have 2 FP32 FLOPS and 1 FP64 FLOPS

Result 2 FP32 and 1 FP64
ratio 2:1
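The same arithmetic as a small sketch, in case it helps separate the two fractions that keep getting mixed up in this thread: FP64's share of all cores versus FP64 throughput relative to FP32. It assumes every core retires one operation per clock, which is a simplification for illustration only, and the function name is made up.

```python
from fractions import Fraction


def fp64_to_fp32_ratio(fp32_cores: int, fp64_cores: int) -> Fraction:
    """FP64 throughput relative to FP32, assuming one op per core per clock."""
    return Fraction(fp64_cores, fp32_cores)


# Out of every group of 3 cores, 1 is FP64 and the other 2 are FP32.
group = 3
fp64 = 1
fp32 = group - fp64

ratio = fp64_to_fp32_ratio(fp32, fp64)
share_of_all_cores = Fraction(fp64, group)

print(f"FP64 share of all cores: {share_of_all_cores}")
print(f"FP64:FP32 throughput:    {ratio}")
```

The "1 in 3" figure is the share of all cores; the "half rate" figure is FP64 measured against the FP32 cores only. Both describe the same layout.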
 

TheRyuu

Diamond Member
Dec 3, 2005
5,479
14
81
Techreport's article on Pascal[1] is claiming that it supports full pre-emption for compute tasks. I'm not sure if that translates to allowing full async compute support (graphics + compute) but at least it seems to have a greater capability than the current Maxwell chips in that regard.

GP100 includes full pre-emption support for compute tasks. It also uses an improved unified memory architecture to simplify its programming model. GP100's 49-bit virtual address space allows programs to address the full address spaces of both the CPU and the GPU. Older Tesla accelerators could only have a shared memory address space as large as the memory on board the GPU.

[1] https://techreport.com/news/29946/pascal-makes-its-debut-on-nvidia-tesla-p100-hpc-card
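For a sense of scale on the 49-bit figure in that quote, a quick back-of-the-envelope sketch. The 16 GiB on-board memory used for comparison is my assumption for illustration and is not taken from the Techreport piece.

```python
def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses in a `bits`-wide address space."""
    return 1 << bits


GIB = 1 << 30
TIB = 1 << 40

va_space = addressable_bytes(49)
onboard = 16 * GIB  # assumed on-board memory, for comparison only

print(f"49-bit virtual address space: {va_space // TIB} TiB")
print(f"Multiple of 16 GiB on-board memory: {va_space // onboard}x")
```

So an address space capped at the size of on-board memory and a 49-bit one differ by several orders of magnitude, which is why the quote treats it as a programming-model change rather than a capacity bump.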
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
It can also split up the CC for FP16 operations, which is where this deep learning ~21 TFlops half-precision comes from. Kepler & Maxwell could NOT do that.

Actually Maxwell was perfectly capable of this as well, it just wasn't enabled in the desktop cards, only in mobile SoCs (i.e. Tegra X1):

Last but certainly not least however, X1 will also be launching with a new mobile-centric GPU feature not found on desktop Maxwell. For X1 NVIDIA is implanting what they call “double speed FP16” support in their CUDA cores, which is to say that they are implementing support for higher performance FP16 operations in limited circumstances.



So the only new thing with Pascal is that it will finally be enabled on the desktop as well. (There are also some indications that the implementation might be more flexible than on Maxwell, i.e. it doesn't have to be two identical FP16 ops, but I'm not entirely certain on that yet.)

http://www.computerbase.de/bildstrecke/71580/12/

...

"Nvidia only showed renders" said the trolls here.

Interestingly enough, those pictures would seem to confirm the rumor from yesterday that GP100 is indeed using a 1200mm2 interposer, since the 610mm2 GPU appears to take up roughly half the space.

Edit: based on this render and a bit of pixel counting, I get an interposer size of about 1167mm2.
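For anyone who wants to repeat the pixel counting, the method is just scaling a known area by a pixel-area ratio. A minimal sketch follows; the pixel dimensions are made-up placeholders, not the measurements behind the figure above, and the only number taken from the thread is the 610mm2 die size. It also assumes the render shows the package flat-on, with no perspective distortion.

```python
def scale_area(known_area_mm2: float, known_pixels: float, target_pixels: float) -> float:
    """Estimate a target area from a reference object in the same image.

    Assumes both objects lie in the same plane and the image has no
    perspective distortion, so area scales linearly with pixel count.
    """
    return known_area_mm2 * target_pixels / known_pixels


DIE_AREA_MM2 = 610.0  # GPU die size quoted in the thread

# Hypothetical measurements (width * height in pixels); substitute your own.
die_pixels = 200 * 220
interposer_pixels = 290 * 290

estimate = scale_area(DIE_AREA_MM2, die_pixels, interposer_pixels)
print(f"Estimated interposer area: {estimate:.0f} mm2")
```

A few pixels of error on each edge moves the result by tens of mm2, so treat any number obtained this way as a rough estimate.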
 

Aristotelian

Golden Member
Jan 30, 2010
1,246
11
76
I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.
 

CentroX

Senior member
Apr 3, 2016
351
152
116
I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.

Not sure what you need 32GB for at this point. 8GB is fine as of today and 16GB is killing it.

It's like desktop RAM. I've had 16GB since 2009 and don't see the need for 32GB.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
I'm trying to keep up with all of the information that is being shared here, guys. As far as I can tell, this is a GP100 - full Pascal flagship kind of card, but it is NOT the basis for a Titan release - is that true? I hope I misunderstand. I may be one of the few(ish) people willing to pay serious cash for a 32GB HBM2 flagship Pascal product now, but it doesn't seem like this is what Nvidia is offering - and I'm referring to numerous posts about how lucrative the GTX market is for Nvidia.

This product is definitely tailor-made for the HPC market. It's not a gaming product and may well never become one (e.g. a Titan). That said, we may see a 3D/game-tailored GPU (or GPUs) instead, hopefully tomorrow or during Computex in June.
 

nvgpu

Senior member
Sep 12, 2014
629
202
81
There won't be any 32GB SKUs until Samsung can mass produce 8 stack HBM2 and it will only be on professional Quadro and Tesla products, not GeForce.

Max you'll get is 16GB on TITAN Pascal, whatever they'll call the final name of it.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Are those wood screws I see?

Well, those are definitely countersunk screws, which you would generally never see on a final product, but there are so many things that scream mock-up on that board that it's not really terribly surprising.

Different angle of lighting makes different parts more and less visible. I'm more worried about those Woodstain stains on VRMs, what happened here?!

It's quite clearly not simply an issue of angle, as you can clearly see the outline of the HBM modules on the "dark" GPU. This outline is much smaller than what you see on the "shiny" GPUs.

If I didn't know any better I might be inclined to think that the "dark" GPUs are actually early development samples using smaller HBM1 modules, whereas the "shiny" GPUs are final (or close to final) samples with the larger HBM2 modules.

Also notice how the "dark" GPUs have very clear scuff marks around some of the screw holes, indicating that they have been moved around quite a bit, whereas the "shiny" ones look comparatively pristine.

To be honest, I'm quite certain that we're looking at 2 separate batches of GP100 attached to the same board, one of these batches possibly using HBM1 modules.
 
Feb 19, 2009
10,457
10
76
This product is definitely tailor-made for the HPC market. It's not a gaming product and may well never become one (e.g. a Titan). That said, we may see a 3D/game-tailored GPU (or GPUs) instead, hopefully tomorrow or during Computex in June.

So were Fermi 480 and Kepler Titan/780Ti. These big chips were HPC focused with 1/3 FP64. They were good for gaming too.

Why the expectations that it will be different this time?
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Oh herp derp. Just noticed the difference in HBMs sizes. Like they run out of stickers or smth. LOL
 

Aristotelian

Golden Member
Jan 30, 2010
1,246
11
76
Not sure what you need 32GB for at this point. 8GB is fine as of today and 16GB is killing it.

It's like desktop RAM. I've had 16GB since 2009 and don't see the need for 32GB.

It's not about "need" - just about buying the proper flagship Pascal product.

There won't be any 32GB SKUs until Samsung can mass produce 8 stack HBM2 and it will only be on professional Quadro and Tesla products, not GeForce.

Max you'll get is 16GB on TITAN Pascal, whatever they'll call the final name of it.

Thanks for that clarification. Hopefully the GP100 leads quickly to a consumer card then, and a high-end one at that.
 
Feb 19, 2009
10,457
10
76
No, it should be half, which is exactly what it is. You're going to feel a bit silly when this all dawns on you... which I'm sure it will in a little bit

But like I said before, no worries man, it's late

Holy crap you are right. My bad!

I just compared GM200 vs GP100 diagrams again carefully.

They only count the 64 FP32 CC in each SM. The 32x FP64 CC is not counted at all.



I misunderstood and thought that it is capable of running FP64 through FP32 CC. It can't.

That means the throughput they list for FP64 comes directly from the FP64 CC only, there's 32 of them per SM.

In total each SM has 96 CC, but only 64x FP32 CC usable in gaming.
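Here's a rough sketch of how those per-SM counts turn into the headline numbers. The 64 FP32 and 32 FP64 CC per SM come from the diagrams; the 56 enabled SMs and the 1.48 GHz boost clock are assumptions I'm plugging in for the Tesla P100 and are not from this thread. It also assumes each core does one fused multiply-add (2 flops) per clock and that FP16 runs at twice the FP32 rate.

```python
def peak_tflops(sm_count: int, cores_per_sm: int, clock_ghz: float,
                flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS; an FMA counts as 2 flops per clock."""
    return sm_count * cores_per_sm * flops_per_core_per_clock * clock_ghz / 1000.0


SM_COUNT = 56      # assumption: enabled SMs on Tesla P100
CLOCK_GHZ = 1.48   # assumption: boost clock
FP32_PER_SM = 64   # from the GP100 SM diagram
FP64_PER_SM = 32   # from the GP100 SM diagram

fp32 = peak_tflops(SM_COUNT, FP32_PER_SM, CLOCK_GHZ)
fp64 = peak_tflops(SM_COUNT, FP64_PER_SM, CLOCK_GHZ)
fp16 = 2 * fp32    # assumes two FP16 ops per FP32 core per clock

print(f"FP32: {fp32:.1f} TFLOPS")
print(f"FP64: {fp64:.1f} TFLOPS")
print(f"FP16: {fp16:.1f} TFLOPS")
```

Under those assumptions the FP64 number is exactly half of FP32 using only the dedicated FP64 CC, and the FP16 number lands at the ~21 TFlops figure being quoted.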

Okay... that is such a strange design. Basically there's a lot of wasted die space that's useless for gaming, and a ton of die space wasted for people who need FP64.

They could make the same chip without any FP64 CC and it'll be ~400mm2. Hmm.

Any more guesses for GP104?
 