NVIDIA Pascal Thread


Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
inside Pascal:
https://devblogs.nvidia.com/parallelforall/inside-pascal/

again, who said Pascal is Maxwell on 16FF ?
Nvidia themselves:
http://cdn.wccftech.com/wp-content/uploads/2015/09/NVIDIA-Pascal-GPU_Compute-Performance.jpg

That's FP32 cores. + 1792 FP64 cores means it is actually 5376.

Also confirmed 610mm² - even bigger than GM200. New node and right up to the reticle limit right away. Incredible.

Nope. It looks like FP64 on Tesla P100 uses two FP32 cores, to get the code executed.

Yup:
Nvidiablog said:
GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.

It is a 3584 CUDA core FP32 GPU, and FP64 is executed by using two FP32 cores at the same time.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Nope. It looks like FP64 on Tesla P100 uses two FP32 cores, to get the code executed.

Yup:

It is a 3584 CUDA core FP32 GPU, and FP64 is executed by using two FP32 cores at the same time.

No it doesn't, look at the diagram (separate SP and DP cores):



Just like Kepler:

 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
I'm not sure if they changed the arrangement of their geometry processors, but if the rule still holds that 1 GPC = 1 raster engine, then Pascal is somewhat of a disappointment in triangle output: it can still only rasterize 6 triangles per cycle, which is the same triangle rate as GM200 ...

I can't be too sure but I don't see dedicated compute engines either on the GP100 diagram so interpret that as you will ...
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
antihelten, go back to my post again. What's more, your screenshot of GP100 shows exactly what I said. Look at the cache, and read what Nvidia said about it.

It is a 3584 CUDA core GPU that uses two FP32 cores to execute FP64.

I have to say: it is an efficient design for FP64...
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
antihelten, go back to my post again. What's more, your screenshot of GP100 shows exactly what I said. Look at the cache, and read what Nvidia said about it.

It is a 3584 CUDA core GPU that uses two FP32 cores to execute FP64.

I have to say: it is an efficient design for FP64...

Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you'd read a bit further on the blog you would find this:

Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput.
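The totals being argued over follow directly from the per-SM figures in the blog quotes. A rough sketch of the arithmetic, under two assumptions that are not stated at this point in the thread: the SM count is inferred by dividing 3584 by 64, and the 1480 MHz boost clock is used for the peak numbers. One FMA per unit per clock is counted as 2 FLOPs.

```python
# Per-SM figures from the Nvidia blog quotes in this thread.
FP32_CORES_PER_SM = 64
FP64_UNITS_PER_SM = 32

# Total FP32 CUDA cores on Tesla P100, as stated in this thread.
TOTAL_FP32_CORES = 3584

# Assumption: SM count inferred from the two numbers above.
sm_count = TOTAL_FP32_CORES // FP32_CORES_PER_SM
total_fp64_units = sm_count * FP64_UNITS_PER_SM


def peak_gflops(units: int, clock_mhz: float) -> float:
    """Theoretical peak, counting one FMA per unit per clock as 2 FLOPs."""
    return 2 * units * clock_mhz / 1000.0


# Assumption: 1480 MHz boost clock.
BOOST_MHZ = 1480

fp32_peak = peak_gflops(TOTAL_FP32_CORES, BOOST_MHZ)
fp64_peak = peak_gflops(total_fp64_units, BOOST_MHZ)

print(f"SMs: {sm_count}")
print(f"FP64 units: {total_fp64_units}")
print(f"FP32 cores + FP64 units: {TOTAL_FP32_CORES + total_fp64_units}")
print(f"Peak FP32: {fp32_peak:.0f} GFLOPS")
print(f"Peak FP64: {fp64_peak:.0f} GFLOPS")
print(f"FP32:FP64 throughput ratio: {fp32_peak / fp64_peak:.0f}:1")
```

The 5376 figure only shows up if the FP64 units are added on top of the FP32 cores; the 3584 number counts FP32 CUDA cores alone.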
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
Holy shit, 1500MHz boost for big Pascal. Also a new architecture.
This GPU will annihilate GM200.

I am eager to see GP104... Without DP units... Base clock 1400MHz and boost 1600MHz?
Maybe:
2560 SP
160 TMU

TDP 180W

Base clock 1400MHz
Boost clock 1600MHz
 

MrTeal

Diamond Member
Dec 7, 2003
3,611
1,813
136
Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you'd read a bit further on the blog you would find this:

FP64 unit doesn't necessarily mean it's a full FP64 core that runs independently of the FP32 cores.

Does Kepler have 2880 FP32 cores + 960 FP64 cores, or does it have 2880 cores that can do FP64 at 1/3rd rate?
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
Try counting the cores in the diagram. There are 64 separate SP cores and 32 separate DP cores (or units as they are called in the diagram).

And what you quoted from Nvidia makes no mention whatsoever of SP cores doing double duty as DP cores. In fact, if you'd read a bit further on the blog you would find this:

If they were separate, then OK. But it does not work the way you described.

Overall you are right, it does not use FP32 to get FP64. But it still has 3584 CUDA cores.
 

Creig

Diamond Member
Oct 9, 1999
5,170
13
81
It's the GPU die size, not the substrate.
I don't mean the size of the substrate itself. I was talking about the components mounted on the substrate (GPU + memory).

Both Fiji and Maxwell had GPUs around 600mm^2 at 28nm. I'm a bit skeptical that Nvidia will come out with a GPU the exact same size on 16nm. The transistor count would have to be incredibly high to account for a GPU that large on 16nm. I think it's more likely that the 610mm^2 reference was for the area of the GPU + the area of the HBM2, since they're now packaged together on the substrate as a single unit.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
inside Pascal:
https://devblogs.nvidia.com/parallelforall/inside-pascal/

again, who said Pascal is Maxwell on 16FF ?

The only true leak that came from Nvidia was this one:

Pascal was going to be a Maxwell with bigger SP and DP perf. This was the only actual leak we had from Nvidia themselves.

Turns out it's true, and it turns out to be the only true rumor/leak so far.

This is a Titan X:
http://images.anandtech.com/doci/9059/TITAN_X_Block_Diagram_FINAL.png
This is Pascal:
https://devblogs.nvidia.com/paralle...ads/2016/04/gp100_block_diagram-1-624x368.png

Take out NVLink and the HBM memory links and what you have is what they actually said.

(Also, on that devblog they said they have a bigger number of threads compared to Kepler, while they have the same as Maxwell...)
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
FP64 unit doesn't necessarily mean it's a full FP64 core that runs independently of the FP32 cores.

Does Kepler have 2880 FP32 cores + 960 FP64 cores, or does it have 2880 cores that can do FP64 at 1/3rd rate?

I'm not saying it's running independently as such, merely that an FP64 unit/core is not simply an abstraction of 2 FP32 cores joining together to run DP, but that it is instead a separate functional unit of its own.

If they were separate, then OK. But it does not work the way you described.

I never said anything about how it worked; you were the only one to do so (saying that DP workloads were handled by 2 FP32 cores instead of by a separate FP64 core).

I simply said that the FP64 cores were distinct, separate units.
 

MrTeal

Diamond Member
Dec 7, 2003
3,611
1,813
136
I don't mean the size of the substrate itself. I was talking about the components mounted on the substrate (GPU + memory).

Both Fiji and Maxwell had GPUs around 600mm^2 at 28nm. I'm a bit skeptical that Nvidia will come out with a GPU the exact same size on 16nm. The transistor count would have to be incredibly high to account for a GPU that large on 16nm. I think it's more likely that the 610mm^2 reference was for the area of the GPU + the area of the HBM2, since they're now packaged together on the substrate as a single unit.

https://devblogs.nvidia.com/parallelforall/inside-pascal/

Nvidia specifically says 610mm² for the GPU die itself, and shows the known 601mm² die size for Maxwell.
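Whether a 610mm² die on 16nm is plausible comes down to transistor density. A back-of-the-envelope sketch, assuming the commonly cited transistor counts of 15.3 billion for GP100 and 8 billion for GM200 (those counts are an assumption here, they are not given in this post):

```python
# Die areas quoted in this thread (mm^2).
GP100_AREA_MM2 = 610
GM200_AREA_MM2 = 601

# Assumption: commonly cited transistor counts, not taken from this post.
GP100_TRANSISTORS = 15.3e9
GM200_TRANSISTORS = 8.0e9


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


gp100_density = density_mtr_per_mm2(GP100_TRANSISTORS, GP100_AREA_MM2)
gm200_density = density_mtr_per_mm2(GM200_TRANSISTORS, GM200_AREA_MM2)
density_gain = gp100_density / gm200_density

print(f"GP100: {gp100_density:.1f} MTr/mm^2")
print(f"GM200: {gm200_density:.1f} MTr/mm^2")
print(f"Density gain: {density_gain:.2f}x")
```

Under those assumed counts, a die of roughly the same area holds nearly twice the transistors, which is consistent with the 610mm² figure being the GPU die alone.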
 

xorbe

Senior member
Sep 7, 2011
368
0
76
So much for the posters claiming that the new single gpu Titan was imminent! I think we're in for another year's wait.
 

Adored

Senior member
Mar 24, 2016
256
1
16
It's a big 610mm² single die.

The clocks are ok but not that impressive given we're looking at a 300W part compared to a 250W Maxwell. Plenty of 980 Ti's out there at 1300MHz and beyond...
 

Actaeon

Diamond Member
Dec 28, 2000
8,657
20
76
Hearing some excitement from some folks and some disappointment from others. I am not as tech savvy as many folks here to have such an opinion, so if you have a reaction can you translate why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but we won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15B transistors sounds really impressive, but given the focus on compute, I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

EDIT: I am disappointed in the timelines being so far out. If we're looking at Q1 2017 for Tesla, we won't see consumer versions of this for at least another year.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
I'm not saying it's running independently as such, merely that an FP64 unit/core is not simply an abstraction of 2 FP32 cores joining together to run DP, but that it is instead a separate functional unit of its own.



I never said anything about how it worked; you were the only one to do so (saying that DP workloads were handled by 2 FP32 cores instead of by a separate FP64 core).

I simply said that the FP64 cores were distinct, separate units.

Yes, you are right on that, it does not use FP32 to get FP64, my bad. But it is still not a 53-something GPU. It is still a 3584 CUDA core GPU.

P.S. They still show DP on Maxwell and Kepler; that should tell you how to understand Pascal as well.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Hearing some excitement from some folks and some disappointment from others. I am not as tech savvy as many folks here to have such an opinion, so if you have a reaction can you translate why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but I won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15B transistors sounds really impressive, but given the focus on compute, I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

It's meh, probably noticeably worse in perf/transistor in gaming scenarios but the engineering feat is nothing short of great ...
 

Maverick177

Senior member
Mar 11, 2016
411
70
91
Hearing some excitement from some folks and some disappointment from others. I am not as tech savvy as many folks here to have such a reaction, so if you have a reaction can you translate why you feel the way you do about GP100?

From my less tech-savvy perception, the increase in CUDA cores seems a bit incremental, less than a 20% increase over GM200. On the other hand, the factory base/boost clocks are pretty impressive at 1328/1480. You can get there with GM200, but that requires an overclock. So I guess it really depends on how much overclocking headroom is available over those factory clocks, but I won't know that for years. The HBM2 is also very impressive. Memory bandwidth limits will be a thing of the past with a 4096-bit memory interface. 15B transistors sounds really impressive, but given the focus on compute, I wonder how much of that will help with gaming performance, which was Maxwell's primary focus.

Anyway, I am having a bit of a hard time understanding where things sit with GP100 compared to today's stuff. Surely it's better, but by how much?

AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6~1.7 GHz OC for Pascal is not out of reach.

I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
 

Trumpstyle

Member
Jul 18, 2015
76
27
91
Guys, if I'm correct, Pascal will only bring around a 75% increase in gaming performance. Going from GM200 to P100, the GFLOPS increase is only 75%.

I'm not a computer expert but is this correct?
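The GFLOPS part of that can be checked from core counts and clocks. A rough sketch, assuming GM200 means a full 3072-core part at Titan X reference clocks (1000 MHz base, 1075 MHz boost; those GM200 numbers are an assumption, not from this thread) and P100 at 3584 cores with the 1328/1480 clocks mentioned earlier. Peak FP32 is taken as 2 FLOPs per core per clock.

```python
def peak_fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32, counting one FMA per core per clock as 2 FLOPs."""
    return 2 * cuda_cores * clock_mhz / 1000.0


def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# P100 figures as quoted in this thread.
P100 = {"cores": 3584, "base_mhz": 1328, "boost_mhz": 1480}

# Assumption: full GM200 at Titan X reference clocks.
GM200 = {"cores": 3072, "base_mhz": 1000, "boost_mhz": 1075}

core_uplift = percent_increase(P100["cores"], GM200["cores"])

boost_vs_boost = percent_increase(
    peak_fp32_gflops(P100["cores"], P100["boost_mhz"]),
    peak_fp32_gflops(GM200["cores"], GM200["boost_mhz"]),
)

boost_vs_base = percent_increase(
    peak_fp32_gflops(P100["cores"], P100["boost_mhz"]),
    peak_fp32_gflops(GM200["cores"], GM200["base_mhz"]),
)

print(f"CUDA core increase: {core_uplift:.1f}%")
print(f"Peak FP32, boost vs boost: {boost_vs_boost:.1f}%")
print(f"Peak FP32, P100 boost vs GM200 base: {boost_vs_base:.1f}%")
```

With these assumed clocks, the increase only gets close to 75% when P100's boost clock is compared against GM200's base clock; boost against boost it is nearer 60%. Either way this is theoretical peak FP32 throughput, which is not the same thing as gaming performance.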
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
AMD themselves stated they have "uplifted" the frequency in Polaris and/or Vega, so a 1.6~1.7 GHz OC for Pascal is not out of reach.

I think Pascal is surely impressive, but not to the degree Maxwell was compared to Kepler.
Not to mention the ~18B transistors in Vega will hold them in good stead, 14nm might just be the difference between the two GPU makers this time around.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Exceptional, too bad we mortals will have to play with 300mm2 or less for now.
 

Maverick177

Senior member
Mar 11, 2016
411
70
91
Well AMD has a long history of impressive specs accompanied with disappointing reviews so there's that.
 