NVIDIA Pascal Thread

Silverforce11 · Apr 6, 2016

nvgpu said:
Nvidia will be selling every GP100 they can get from TSMC, demand is high.

Yes, and the Pascal Tesla is a harvested chip. Imagine the profits!

NV has no incentive to sell big Pascal for gamers when they can sell out at 10x the price, 20x the margins, for a harvested chip!

Just think about the 16nm wafers: does NV make a big Tesla that is so profitable, or a mid-range GP104 to compete in a price war with AMD as a GTX 970 replacement?

They would make more $$ using all the 16nm production they have access to for GP100 and cut prices on 28nm (does not compete for 16nm wafers!) to compete with AMD in the short term.

Once TSMC has ramped up even more 16nm production, that's when they'll go for consumer SKUs: GP104, 106, 107, etc.

AtenRa · Apr 6, 2016

Guys, GP100 has 3840 Cuda Cores. There are no extra 64bit cores. There are LD/ST and SFUs though.

Each Cuda Core is composed of an ALU and an FPU. For a Cuda Core to compute at 64bit, it needs both 64bit registers AND 64bit datapaths. So for GP100, in every Cuda Core pair only one of them has access to a 64bit datapath, and only half of the Cuda Cores in the entire GP100 can compute at 64bit. Thus we get 1/2 the performance at 64bit precision vs 32bit.

The PDF diagram is drawn that way (DP Units) so we can see both the single and double precision capabilities. It doesn't mean there is an extra DP Cuda Core.

Timmah! · Apr 6, 2016

AtenRa said:
Guys, GP100 has 3840 Cuda Cores. There are no extra 64bit cores. There are LD/ST and SFUs though.

Each Cuda Core is composed of an ALU and an FPU. For a Cuda Core to compute at 64bit, it needs both 64bit registers AND 64bit datapaths. So for GP100, in every Cuda Core pair only one of them has access to a 64bit datapath, and only half of the Cuda Cores in the entire GP100 can compute at 64bit. Thus we get 1/2 the performance at 64bit precision vs 32bit.

The PDF diagram is drawn that way (DP Units) so we can see both the single and double precision capabilities. It doesn't mean there is an extra DP Cuda Core.

There seems to be major confusion about this, and I still don't know whether to believe you or the people saying the exact opposite.

AtenRa · Apr 6, 2016

Timmah! said:
There seems to be major confusion about this, and I still don't know whether to believe you or the people saying the exact opposite.

Well, if anyone else has more technical data to share I'm all ears, but I find it completely illogical for NV to have an extra dedicated 64bit Cuda Core that only works at 64bit. It is a complete and utter waste of resources, because you could use one of the other two CCs for the same task and also use it for 32bit.

Unless GP100 has the ability to run both 32bit and 64bit simultaneously. Then yes, for that scenario you would need a third CC capable of 64bit compute. But I find that completely unrealistic.

Sweepr · Apr 6, 2016

Inside Pascal: NVIDIA’s Newest Computing Platform

Today at the 2016 GPU Technology Conference in San Jose, NVIDIA CEO Jen-Hsun Huang announced the new NVIDIA Tesla P100, the most advanced accelerator ever built. Based on the new NVIDIA Pascal GP100 GPU and powered by ground-breaking technologies, Tesla P100 delivers the highest absolute performance for HPC, technical computing, deep learning, and many computationally intensive datacenter workloads.

In this blog post I’ll provide an overview of the Pascal architecture and its benefits to you as a developer.

Because of the importance of high-precision computation for technical computing and HPC codes, a key design goal for Tesla P100 is high double-precision performance. Each GP100 SM has 32 FP64 units, providing a 2:1 ratio of single- to double-precision throughput. Compared to the 3:1 ratio in Kepler GK110 GPUs, this allows Tesla P100 to process FP64 workloads more efficiently.

https://mygtc.gputechconf.com/form/...s=&session-tag-filters=&session-type-filters=

Some SKU predictions from Videocardz:

As of now, it’s unclear if NVIDIA is planning more GPUs. There are rumors that GP102 could be the GP100 for gamers, where NVLink and FP64 computing is not important. However the time for speculating about Big Pascal for gamers will definitely come at later time.

http://videocardz.com/58865/nvidia-1st-generation-pascal-speculation

#304 posts and 21K views in one day. Computex should be interesting.

moonbogg · Apr 6, 2016

Some people doubt GP100 will be a gaming chip. People said the same exact thing about GK110. Nvidia released GTX 680 and it was pretty good at the time so people thought the big chip was reserved for professional stuff only. Well no. No it wasn't. They will sell us the big chips again for our games and we will buy them with our body parts.

Face2Face · Apr 6, 2016

I'm pretty sure we'll see a GP100 in a consumer gaming variant, just not anytime soon.

It's also interesting to see the SM/CUDA arrangement, as it looks like what Mahigan said is coming true.

http://forums.anandtech.com/showpost.php?p=38117833&postcount=74

NVIDIA is evolving towards a more GCN-like architecture while AMD are refining GCN. GCN is more advanced than any NVIDIA architecture. People who claim that GCN is "old" simply don't understand GPU architectures.

Interestingly, the changes that Nvidia has been implementing in its streaming multiprocessors over the past several years, starting with 192 CUDA core Kepler SMX in 2011 to the Maxwell 128 CUDA core SMM and finally to Pascal have been morphing the company’s graphics architecture to something that’s much closer to that of AMD’s GCN. The basic building block of which, the Compute Unit, has 64 GCN cores.

Read more: http://wccftech.com/nvidia-gp100-pascal-gpu-specs-3840-cuda-cores/#ixzz453zR1x00

sontin · Apr 6, 2016

No, look at Fermi. Pascal is nearly the same.

Qwertilot · Apr 6, 2016

moonbogg said:
Some people doubt GP100 will be a gaming chip. People said the same exact thing about GK110. Nvidia released GTX 680 and it was pretty good at the time so people thought the big chip was reserved for professional stuff only. Well no. No it wasn't. They will sell us the big chips again for our games and we will buy them with our body parts.

Well, I doubt anyone sane is questioning the future existence of some huge, and very expensive, consumer chip from NV.

I think the question is more whether it'll be this chip or something else fairly big and distinct. This thing is really heavily specialised towards compute and wouldn't be (relative to its size!) all that special for gaming.

Essentially unimportant of course.

antihelten · Apr 6, 2016

AtenRa said:
Well, if anyone else has more technical data to share I'm all ears, but I find it completely illogical for NV to have an extra dedicated 64bit Cuda Core that only works at 64bit. It is a complete and utter waste of resources, because you could use one of the other two CCs for the same task and also use it for 32bit.

Unless GP100 has the ability to run both 32bit and 64bit simultaneously. Then yes, for that scenario you would need a third CC capable of 64bit compute. But I find that completely unrealistic.

Nvidia has been using dedicated FP64 units ever since Kepler. Fermi (and earlier) used multiple FP32 cores to simulate a FP64 core.

GF1x4 has 48 FP32 cores per SM, 16 of which can perform FP64 operations at half speed (thus being equal to 8 dedicated FP64 units).
GF11x has 32 FP32 cores per SM, all of which can perform FP64 operations at half speed (thus being equal to 16 dedicated FP64 units).
GK10x has 8 dedicated FP64 units per SM, in addition to the 192 FP32 cores.
GK110 has 64 dedicated FP64 units per SM, in addition to the 192 FP32 cores.
GM107 and GM20x have 4 dedicated FP64 units per SM, in addition to the 128 FP32 cores.
GP100 has 32 dedicated FP64 units per SM, in addition to the 64 FP32 cores.

None of the above can utilize the FP32 and FP64 units within a single SM at the same time to my knowledge (although I might be wrong on that one).
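For anyone who wants to check the ratios, here is a minimal sketch that turns the per-SM counts listed above for the chips with dedicated FP64 units (Kepler onwards) into FP64:FP32 throughput ratios. It assumes each unit retires one operation per clock and that FP32 and FP64 units run at the same clock; the dictionary and function names are made up for illustration.

```python
from fractions import Fraction

# (FP32 cores per SM, dedicated FP64 units per SM), taken from the list above.
units_per_sm = {
    "GK10x": (192, 8),
    "GK110": (192, 64),
    "GM107/GM20x": (128, 4),
    "GP100": (64, 32),
}


def fp64_ratio(fp32_cores, fp64_units):
    """FP64 throughput as a fraction of FP32 throughput.

    Assumption: one operation per unit per clock, same clock for both.
    """
    return Fraction(fp64_units, fp32_cores)


ratios = {chip: fp64_ratio(*counts) for chip, counts in units_per_sm.items()}

for chip, ratio in ratios.items():
    print(f"{chip}: FP64 rate = {ratio} of FP32")
```

With those counts, GK110 comes out at 1/3 and GP100 at 1/2, which matches the 3:1 and 2:1 figures in Nvidia's article quoted earlier in the thread.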

AtenRa · Apr 6, 2016

antihelten said:
Nvidia has been using dedicated FP64 units ever since Kepler. Fermi (and earlier) used multiple FP32 cores to simulate a FP64 core.

GF1x4 has 48 FP32 cores per SM, 16 of which can perform FP64 operations at half speed (thus being equal to 8 dedicated FP64 units).

GF11x has 32 FP32 cores per SM, all of which can perform FP64 operations at half speed (thus being equal to 16 dedicated FP64 units).

GK10x has 8 dedicated FP64 units per SM, in addition to the 192 FP32 cores.

GK110 has 64 dedicated FP64 units per SM, in addition to the 192 FP32 cores.

GM107 and GM20x have 4 dedicated FP64 units per SM, in addition to the 128 FP32 cores.

GP100 has 32 dedicated FP64 units per SM, in addition to the 64 FP32 cores.

None of the above can utilize the FP32 and FP64 units within a single SM at the same time to my knowledge (although I might be wrong on that one).

It's one thing to talk about dedicated 64bit FP units and another to talk about 64bit Cuda Cores.
If GP100 only has dedicated 64bit FP Units then OK, I can go with that. But people here were talking about another 64bit Cuda Core.

antihelten · Apr 6, 2016

AtenRa said:
It's one thing to talk about dedicated 64bit FP units and another to talk about 64bit Cuda Cores.
If GP100 only has dedicated 64bit FP Units then OK, I can go with that. But people here were talking about another 64bit Cuda Core.

Actually it isn't different at all, Nvidia uses the names FP64 unit / FP64 CUDA core interchangeably.

Just have a look at their GP100 article

Each GP100 SM has 32 FP64 units

AtenRa · Apr 6, 2016

antihelten said:
Actually it isn't different at all, Nvidia uses the names FP64 unit / FP64 CUDA core interchangeably.

An FP Unit and a Cuda Core are not the same.

A Cuda Core is composed of an ALU + FPU.

MrTeal · Apr 6, 2016

I think some people are getting extremely optimistic on the area savings we'd see on a DP stripped GP104

With Fermi GF100, nVidia had each CUDA core capable of executing FP64, at 1/2 rate of FP32. For GF104, the DP was removed from two of three cores, giving a 1/12 rate. Effectively though, the one FP64 core was also counted as one of the FP32 cores. source

With GK104, each SMX had 192 CUDA cores, plus eight special FP64 CUDA cores that weren't counted in the general number and could only do FP64. source

With GK110, each SMX had 192 FP32 CUDA cores plus 64 FP64 CUDA cores.

Looking at the die shots and measuring the area of the GPCs (since they're easier to isolate in GK110 than the SMXs) a three SMX GK110 GPC is ~66.47mm² while a two SMX GK104 GPC is 36.84mm². That's not a wildly different number of FP32 per area between the two. For GK110, you get 8.67 FP32 CC / mm², while for GK104 it's 10.42 FP32 CC/mm². Overall on the whole chips though, Big Kepler at 1/3 FP64 had basically an identical (actually a little higher) number of FP32 CC/mm² than GK104.

There's more to strip with Pascal since it's 1/2, so you might get better scaling there than in the Kepler generation. If AMD's experience with HBM is anything to go by, switching back to a GDDR5(X) interface for GP104 would cause an area increase vs GP100. I think anyone expecting a 4 GPC 2560 CC GP104 to come in at or less than 250mm² is being quite optimistic. That's 63% more FP32 cores per mm² of chip than GP100, over the whole chip.
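The density figures above can be reproduced with a few lines. This is only a sketch: the GPC areas and core counts are the ones given in this post, while the 610mm² GP100 die size is an assumption brought in from Nvidia's published spec and is not stated in the post itself. The helper name is hypothetical.

```python
def cores_per_mm2(cores, area_mm2):
    """FP32 CUDA cores per square millimetre."""
    return cores / area_mm2


# Kepler GPCs, using the die-shot measurements quoted above.
gk110_gpc = cores_per_mm2(3 * 192, 66.47)  # three SMX per GPC
gk104_gpc = cores_per_mm2(2 * 192, 36.84)  # two SMX per GPC

# Whole-chip comparison for Pascal.
GP100_DIE_MM2 = 610.0  # assumption: not given in the post
gp100_chip = cores_per_mm2(3840, GP100_DIE_MM2)
gp104_chip = cores_per_mm2(2560, 250.0)  # the hypothetical 250mm² GP104

increase = gp104_chip / gp100_chip - 1

print(f"GK110 GPC: {gk110_gpc:.2f} FP32 CC/mm²")
print(f"GK104 GPC: {gk104_gpc:.2f} FP32 CC/mm²")
print(f"250mm² GP104 vs GP100: {increase:.0%} more FP32 CC/mm²")
```

Changing the 250.0 to another target area shows how quickly the required density gain drops as the assumed GP104 die gets larger.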

antihelten · Apr 6, 2016

AtenRa said:
An FP Unit and a Cuda Core are not the same.

A Cuda Core is composed of an ALU + FPU.

First off, I never said FP64 FP unit, I said FP64 unit (this may be where the confusion stems from). Nvidia clearly uses the two terms (FP64 unit and FP64 CUDA core) interchangeably, and has done so ever since the introduction of Kepler.

It may very well be that what Nvidia interchangeably calls an FP64 unit or an FP64 CUDA core does not contain all the hardware you would normally expect from a CUDA core, but then you would have to take that up with Nvidia, since they are the ones muddling their own terms.

AtenRa · Apr 6, 2016

antihelten said:
First off, I never said FP64 FP unit, I said FP64 unit (this may be where the confusion stems from). Nvidia clearly uses the two terms (FP64 unit and FP64 CUDA core) interchangeably, and has done so ever since the introduction of Kepler.

NV uses the FP64 CUDA Core nomenclature to illustrate how many CUDA Cores can compute at 64bit precision.

From your link,

What this means is that half (32) of the Cuda Cores per SM (64) are 64bit FP capable. It doesn't mean there is a third 64bit Cuda Core present.

hawtdawg · Apr 6, 2016

We'll see the exact same type of release schedule as last time. GP104 for a year, then probably a Titan GP100 and then a 1080ti. Then they'll do the same with Volta, only will hopefully be on 10nm, otherwise they're stuck in the same position as they were with Maxwell, and I have a tough time believing they'll be able to get a similar jump from Pascal on the same node.

antihelten · Apr 6, 2016

AtenRa said:
NV uses the FP64 CUDA Core nomenclature to illustrate how many CUDA Cores can compute at 64bit precision.

From your link,

What this means is that half (32) of the Cuda Cores per SM (64) are 64bit FP capable. It doesn't mean there is a third 64bit Cuda Core present.

Wrong, Nvidia has used dedicated FP64 cores/units (whatever you want to call them) ever since Kepler.

From the Anandtech GTX 680 review:

The other change coming from GF114 is the mysterious block #15, the CUDA FP64 block. In order to conserve die space while still offering FP64 capabilities on GF114, NVIDIA only made one of the three CUDA core blocks FP64 capable. In turn that block of CUDA cores could execute FP64 instructions at a rate of ¼ FP32 performance, which gave the SM a total FP64 throughput rate of 1/12th FP32. In GK104 none of the regular CUDA core blocks are FP64 capable; in its place we have what we’re calling the CUDA FP64 block.

From the GK110 whitepaper:

Every hardware unit in Kepler was designed and scrubbed to provide outstanding performance per watt. The best example of great perf/watt is seen in the design of Kepler GK110’s new Streaming Multiprocessor (SMX), which is similar in many respects to the SMX unit recently introduced in Kepler GK104, but includes substantially more double precision units for compute algorithms.

...

SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

From the GM204 whitepaper:

Figure 3: GM204 SMM Diagram (GM204 also features 4 DP units per SMM, which are not depicted on this diagram)

And of course, finally, from Nvidia's GP100 article:

Each GP100 SM has 32 FP64 units

Kris194 · Apr 6, 2016

Silverforce11 said:
If you notice in the big diagram, what the heck are TPCs by the way? Only info I found, dates back a long time. Texture/Thread Processing Cluster. It is separate from TMUs.

If I had to guess, I would say its a Thread Processing Cluster and that's the Hardware Scheduler.

Where do you see it? What I see is Tex, not TPC.

antihelten · Apr 6, 2016

Kris194 said:
Where do you see it? What I see is Tex, not TPC.

It's quite clearly TPC in this figure:

AtenRa · Apr 6, 2016

antihelten said:
Wrong, Nvidia has used dedicated FP64 cores/units (whatever you want to call them) ever since Kepler.

From the Anandtech GTX 680 review:

From the GK110 whitepaper:

From the GM204 whitepaper:

And of course, finally, from Nvidia's GP100 article:

As I have said before, there is not an extra 64bit CUDA Core but a dedicated 64bit FP Unit.

CUDA Core = ALU + FP Unit

DP Unit in GP100 = 64bit FP Unit.

Those two are different. The 64bit FP Unit is very small compared to a CUDA Core.

So whether this 64bit FP Unit sits inside half of the CUDA Cores or outside of them, we still only have 3840 CUDA Cores.

So in the end we have one of these:

CUDA Core 1 = 32bit ALU + 32bit FPU
CUDA Core 2 = 32bit ALU + 32bit FPU
+
Dedicated 64bit FP Unit.

Or

CUDA Core 1 = 32bit ALU + 32bit FPU
CUDA Core 2 = 32bit ALU + 32bit FPU/64bit FPU

But always only 2x CUDA Cores.

96Firebird · Apr 6, 2016

Kris194 said:
Where do you see it? What I see is Tex, not TPC.

I believe he means from this layout:

Edit - Beat!

antihelten · Apr 6, 2016

AtenRa said:
As I have said before, there is not an extra 64bit CUDA Core but a dedicated 64bit FP Unit.

CUDA Core = ALU + FP Unit

DP Unit in GP100 = 64bit FP Unit.

Those two are different. The 64bit FP Unit is very small compared to a CUDA Core.

And as I've said countless times now, Nvidia also calls these units FP64 CUDA cores.

AtenRa · Apr 6, 2016

antihelten said:
And as I've said countless times now, Nvidia also calls these units FP64 CUDA cores.

I'm with you, but people here were talking about 3840 + 1920 = 5760 CUDA Cores, which of course is not true.
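For reference, the totals being argued over fall straight out of the per-SM figures quoted in this thread. A small sketch, assuming the full GP100 die has 60 SMs (that SM count is not stated in the thread; it is simply the value for which 64 cores per SM gives the 3840 quoted here):

```python
SM_COUNT = 60        # assumption: full GP100 die, not stated in the thread
FP32_PER_SM = 64     # FP32 CUDA cores per SM (from the thread)
FP64_PER_SM = 32     # FP64 units per SM (from Nvidia's article quoted above)

fp32_total = SM_COUNT * FP32_PER_SM
fp64_total = SM_COUNT * FP64_PER_SM

# The 5760 figure only appears if the FP64 units are added on top of the
# FP32 cores, which is the counting this post objects to.
naive_sum = fp32_total + fp64_total

print(f"FP32 CUDA cores: {fp32_total}")
print(f"FP64 units:      {fp64_total}")
print(f"Naive sum:       {naive_sum}")
```

Whether the 1920 FP64 units are called cores or units, the advertised CUDA core count stays at the FP32 total.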

Kris194 · Apr 6, 2016

antihelten said:
such as returning to the same 128 cores per SM layout that Maxwell had.

Why would they do that? It doesn't make any sense when you consider that a GCN-like architecture will give an additional boost in games developed with GCN in mind. It's a win-win for Nvidia. Also, why do people say that the consumer version of GP100 (GP102?) will have 25% higher clocks than GM200 when Tesla P100 offers a 40% higher base clock than Tesla M40?
