NVIDIA Pascal Thread

AtenRa · Apr 4, 2016

poofyhairguy said:
The $300+ market is better. Easily. Why?

1. Most of the margin/profit is made on these cards.

2. Having the fastest overall card on the market creates a halo effect for lesser cards: basically, uninformed customers buy something like a 750 Ti because they hear "nvidia is the best" (based on the 980 Ti).

nope.

You may have higher margins, but you will lose not only the higher-volume $100-250 desktop segment but also the higher-margin laptop market.

So in the end you lose both volume and margins.

Sweepr · Apr 4, 2016

4GB GDDR5 (GP106?)

Vesku · Apr 4, 2016

Sweepr said:
4GB GDDR5 (GP106?)

Or some version of GP104, i.e. not necessarily the full die, with sub-100W targets per card. That would put it close to 2x the efficiency of the GTX 980 (4.6 TFLOPS @ ~165W), right? The whole Drive PX2 is supposed to draw about 250W. GDDR5X on the first wave of node-shrink GPUs is looking more unlikely, imo.

jpiniero · Apr 4, 2016

It's actually probably the cut GP106. I'll guess that the GP108 (?) IGP probably gives 1 TF for both cores. So it's probably 3.5 TF for each dGPU, which is roughly the performance of the 970.

C@mM! · Apr 4, 2016

Considering the availability of GDDR5X, I can't see any high-end cards releasing from Nvidia until September at the earliest. I daresay even those will be 'low high end' cards like the 980, with HBM2 being saved for 'Ti' versions.

Personally, I reckon we will have lower-end cards launch w/ GDDR5 to beat the gun on Polaris-based GPUs, with Nvidia relying on its better memory compression and earlier launch to head off the midrange assault by AMD, then drip-feeding cards as memory becomes available throughout the 2nd half.

Vesku · Apr 5, 2016

jpiniero said:
It's actually probably the cut GP106. I'll guess that the GP108 (?) IGP probably gives 1 TF for both cores. So it's probably 3.5 TF for each dGPU, which is roughly the performance of the 970.

But then shouldn't JHH have had GM107/108 on his mockup of Drive PX2 instead of GM204? To better represent actual die size.

xpea · Apr 5, 2016

At GTC2016 some partners have started talking about servers with Pascal and NVLink:
http://www.servethehome.com/quanta-qct-announcing-x86-server-nvidia-pascal-nvlink-support/
Pascal will be announced for sure tomorrow at the opening keynote by JHH.

Timmah! · Apr 5, 2016

so, six more hours to the keynote?

Glo. · Apr 5, 2016

24 Deep Learning TFLOPs of compute power translates to 8 TFLOPs of Single Precision compute power. So the GPUs in DrivePX module have to have 2048 CUDA cores and core clock locked at 975 MHz.

It is a GP106 die on the module, in my humble opinion.
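
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the usual 2 FLOPs per CUDA core per clock (one fused multiply-add) and that the 8 TFLOPs is split evenly across two discrete GPUs, ignoring the Tegra iGPUs and ARM cores. The function name is made up for illustration.

```python
def sp_tflops(cuda_cores, clock_mhz, gpus=1, flops_per_core_per_clock=2):
    """Peak single-precision TFLOPS.

    Assumption: each CUDA core retires one FMA (2 FLOPs) per clock.
    """
    return gpus * cuda_cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


# Two discrete GPUs, 2048 cores each, at 975 MHz (figures from the post above).
total = sp_tflops(cuda_cores=2048, clock_mhz=975, gpus=2)
print(f"{total:.2f} TFLOPS")  # prints "7.99 TFLOPS"
```

So 2 x 2048 cores at 975 MHz lands almost exactly on 8 TFLOPs, but only if the discrete GPUs account for the whole figure.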

JDG1980 · Apr 5, 2016

Vesku said:
Or some version of GP104, i.e. not necessarily the full die, with sub-100W targets per card. That would put it close to 2x the efficiency of the GTX 980 (4.6 TFLOPS @ ~165W), right? The whole Drive PX2 is supposed to draw about 250W. GDDR5X on the first wave of node-shrink GPUs is looking more unlikely, imo.

GTX 980M (which is the MXM module that was shown in the prototype) uses a cut GM204 GPU (1536 of 2048 shaders enabled) and can do over 3 TFlops each. The slide says 6 TFlops total, so two GTX 980M modules alone already get you there, even without taking into account the integrated GPUs in the Tegra chips. Are you seriously arguing that GP104 will show no performance improvement whatsoever over GM204?

No, it has to be GP106 at most. I suppose it could be GP107, depending on how powerful the Tegra iGPUs are. (GM107 does 1.3 TFlops, double that for the die-shrink and you've got 2.6, and there are two modules so you've got a total of 5.2, now the two Tegras only need to do 0.8 TFlops combined to hit the specified target. Could work.)
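
The GP107 back-of-envelope math above can be written out as a small sketch. All inputs are the figures quoted in this post, and the 2x scaling from the die shrink is the post's assumption, not a measured number.

```python
TARGET_TFLOPS = 6.0    # total from the slide, as quoted in this post
GM107_TFLOPS = 1.3     # GM107 throughput, as quoted in this post
SHRINK_SCALING = 2.0   # assumption: the die shrink doubles throughput
MODULES = 2            # two discrete GPU modules on the board

# Combined throughput of the two hypothetical GP107 modules.
dgpu_total = GM107_TFLOPS * SHRINK_SCALING * MODULES

# Whatever is left has to come from the two Tegra iGPUs combined.
tegra_needed = TARGET_TFLOPS - dgpu_total

print(f"dGPUs: {dgpu_total:.1f} TFLOPS, Tegras must supply: {tegra_needed:.1f} TFLOPS")
```

Changing `SHRINK_SCALING` shows how sensitive the GP107 guess is: at 1.5x the Tegras would have to cover more than 2 TFlops between them.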

Vesku said:
But then shouldn't JHH have had GM107/108 on his mockup of Drive PX2 instead of GM204? To better represent actual die size.

I suspect the board was an actual working prototype and thus needed to use GM204 in order to have adequate computing power.

FatherMurphy · Apr 5, 2016

Perhaps this isn't the proper thread for this but: http://videocardz.com/58795/quantaplex-is-the-first-x86-server-with-pascal-nvlink

I thought NVlink was not compatible with x86, but there you go. Or does this not necessarily mean that the Pascal GPUs are connected to the CPU via NVlink, just that the GPUs are connected to each other via NVlink?

Anywho, I thought it might suggest that Nvidia has updated the capabilities of Pascal and its new technologies since Nvidia last spoke in depth about it.

On a side note, perhaps NVlink will be used for XDMA-type mGPU?

Silverforce11 · Apr 5, 2016

http://www.pcgameshardware.de/Nvidia-Pascal-Hardware-261713/News/Drive-PX-2-GDDR5-GP106-1191334/

80GB/s VRAM bandwidth via GDDR5 on the Drive module. If true, it's a chip with a small, weak bus, GP107 potentially.

dacostafilipe · Apr 5, 2016

FatherMurphy said:
I thought NVlink was not compatible with x86, but there you go.

It's an interconnect and should work on everything. It's just that x86 can't "talk" NVLink so you need to use a PCIe-switch (for example).

AtenRa · Apr 5, 2016

Silverforce11 said:
http://www.pcgameshardware.de/Nvidia-Pascal-Hardware-261713/News/Drive-PX-2-GDDR5-GP106-1191334/

80GB/s VRAM bandwidth via GDDR5 on the Drive module. If true, it's a chip with a small, weak bus, GP107 potentially.

Polaris 11 competition ??

I'm getting very excited about these new 14/16nm GPUs, 28nm seems so long ago :thumbsup:

Glo. · Apr 5, 2016

Silverforce11 said:
http://www.pcgameshardware.de/Nvidia-Pascal-Hardware-261713/News/Drive-PX-2-GDDR5-GP106-1191334/

80GB/s VRAM bandwidth via GDDR5 on the Drive module. If true, it's a chip with a small, weak bus, GP107 potentially.

No chance. Whole thing has 8 TFLOPs of compute power. So it has to have 4096 CUDA cores, or 2 GPUs with 2048 CUDA cores. It is GP106, or simply GTX1060.

That's why it also has a 128-bit memory bus with 80 GB/s.
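
As a rough check on the bus claim, peak bandwidth is bus width in bytes times the per-pin data rate. The sketch below (helper names are made up for illustration) works out what data rate a 128-bit bus needs to reach 80 GB/s. It says nothing about which chip is actually on the module.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes moved per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def required_data_rate_gbps(bus_width_bits, target_gb_s):
    """Per-pin data rate (Gbps) needed to reach target_gb_s on a bus of the given width."""
    return target_gb_s / (bus_width_bits / 8)


rate = required_data_rate_gbps(128, 80.0)
print(f"128-bit bus needs {rate:.1f} Gbps per pin for 80 GB/s")
print(f"check: {bandwidth_gb_s(128, rate):.0f} GB/s")
```

A 128-bit bus hits 80 GB/s at 5 Gbps per pin. The same 80 GB/s would also fit a 64-bit bus at 10 Gbps, so the bandwidth figure alone does not pin down the bus width.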

Adored · Apr 5, 2016

Must be some heavy duty compression going on in that. Starting to look like GDDR5 + 256-bit is going to be enough to even beat the 980 Ti.

Silverforce11 · Apr 5, 2016

Glo. said:
No chance. Whole thing has 8 TFLOPs of compute power. So it has to have 4096 CUDA cores, or 2 GPUs with 2048 CUDA cores. It is GP106, or simply GTX1060.

That's why it also has a 128-bit memory bus with 80 GB/s.

That thing has 2 dGPU and 2 iGPU, and also the ARM clusters. 8TFlops is combined between them all.

Don't be so quick to rule out a very small chip. It makes more sense for yields and profits.

xthetenth · Apr 5, 2016

If they can chain two small GPUs together and get good scaling that would make a lot of sense depending on yields.

Silverforce11 · Apr 5, 2016

Adored said:
Must be some heavy duty compression going on in that. Starting to look like GDDR5 + 256-bit is going to be enough to even beat the 980 Ti.

No, it's just weird that some folks are leaving the iGPUs and ARM cores out of the 8 TFlops.

2x GP107 + 2x iGPU + 2x ARM cluster = 8TFlops, easily achieved.

Adored · Apr 5, 2016

Silverforce11 said:
No, it's just weird that some folks are leaving the iGPUs and ARM cores out of the 8 TFlops.

2x GP107 + 2x iGPU + 2x ARM cluster = 8TFlops, easily achieved.

Ah yeah, I should have looked at the actual slide. Yeah GP107 makes more sense.

nvgpu · Apr 5, 2016

http://vrworld.com/2016/04/05/nvidias-drive-px2-shows-next-gen-tegra-pascal-gpu/

DRIVE PX 2 already shipped to initial Tier 1 customers

All those people posting FUD back in January are eating crow now again.

Arachnotronic · Apr 5, 2016

nvgpu said:
http://vrworld.com/2016/04/05/nvidias-drive-px2-shows-next-gen-tegra-pascal-gpu/

All those people posting FUD back in January are eating crow now again.

Yep.

antihelten · Apr 5, 2016

Silverforce11 said:
No, it's just weird that some folks are leaving the iGPUs and ARM cores out of the 8 TFlops.

2x GP107 + 2x iGPU + 2x ARM cluster = 8TFlops, easily achieved.

That would mean an unusually powerful 107 chip, when compared to previous 107 chips (i.e. stuff like GM107, GK107 and GF107).

The iGPUs are probably a single Pascal SMM (or whatever Nvidia calls their shader clusters this time around), capable of 0.25-0.3 TFLOPS a piece. The 12 ARM cores are probably only capable of a very negligible amount (say 0.01 TFLOPS a piece), which then leaves roughly 7-7.5 TFLOPS for the 2 discrete Pascal GPUs, which as previously mentioned would put them at 970 level.

I don't think a 107 chip has ever matched a cut down 104 chip from the previous generation before. For instance the GTX 650 (GK107) was roughly 35-40% slower than the GTX 560 (cut down GF114), in fact it took a cut down 106 chip (GTX 650 Ti), to roughly match the 560.

Whilst anything is possible of course, it seems more likely to me that we're looking at a pair of cut-down GP106 chips.
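
The TFLOPS budget above can be sketched as follows. Every input is one of the guesses from this post (a single shader cluster per iGPU at 0.25-0.3 TFLOPS, 12 ARM cores at roughly 0.01 TFLOPS each), so the output is only as good as those guesses.

```python
TOTAL_TFLOPS = 8.0                 # headline figure for the whole board
IGPU_COUNT = 2                     # one iGPU per Tegra
IGPU_TFLOPS_RANGE = (0.25, 0.30)   # guess from this post, per iGPU
ARM_CORES = 12
ARM_TFLOPS_EACH = 0.01             # guess from this post, per core
DGPU_COUNT = 2


def dgpu_budget(igpu_tflops):
    """Return (TFLOPS left for both dGPUs, TFLOPS per dGPU) for a given iGPU guess."""
    rest = TOTAL_TFLOPS - IGPU_COUNT * igpu_tflops - ARM_CORES * ARM_TFLOPS_EACH
    return rest, rest / DGPU_COUNT


for igpu in IGPU_TFLOPS_RANGE:
    both, each = dgpu_budget(igpu)
    print(f"iGPU at {igpu:.2f}: {both:.2f} TFLOPS for both dGPUs, {each:.2f} each")
```

Either end of the iGPU range leaves a bit over 3.6 TFLOPS per discrete GPU, which is where the 970-level comparison comes from.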

nvgpu said:
http://vrworld.com/2016/04/05/nvidias-drive-px2-shows-next-gen-tegra-pascal-gpu/

All those people posting FUD back in January are eating crow now again.

That's for the Maxwell-based PX 2s. The Pascal-based ones aren't shipping until Q3:

The topic was a deep dive into the DRIVE PX2, autonomous drive development kit which will start shipping later this year in its full performance capability – as the current units are only being shipped with Maxwell-class GPUs to Tier 1 customers.

...

Do note that Pascal-based DRIVE PX 2, one we describe in this article should ship during the third quarter of 2016.

tential · Apr 5, 2016

Why are you asking people to read the actual article antihelten? Not cool!

poofyhairguy · Apr 5, 2016

So the big April announcement came and went and we learned nothing really about desktop GPU plans?

I am starting to get a bad feeling Nvidia won't have any new GPUs till like September or something.
