NVIDIA Pascal Thread

Adored · Apr 8, 2016

ShintaiDK said:
I own a GTX980, the answer is yes.

Do you get different results from TPU? Or how can you tell it's a bandwidth issue instead of something else?

ShintaiDK · Apr 8, 2016

Adored said:
Do you get different results from TPU? Or how can you tell it's a bandwidth issue instead of something else?

There is no memory OC on a GTX980 vs stock in the TPU you linked. So the answer is yes.

Adored · Apr 8, 2016

ShintaiDK said:
There is no memory OC on a GTX980 vs stock in the TPU you linked. So the answer is yes.

The 980 Ti (the Gigabyte one) has the overclocked memory, not the 980. Everything else is at stock.

ShintaiDK · Apr 8, 2016

Adored said:
The 980 Ti (the Gigabyte one) has the overclocked memory, not the 980. Everything else is at stock.

We are not talking about the GTX980TI but the GTX980.

jpiniero · Apr 8, 2016

ShintaiDK said:
We are not talking about the GTX980TI but the GTX980.

You have to remember that any X70/X80 would have 8 GHz GDDR5 though, so you would get higher bandwidth than what the 980 has.

ShintaiDK · Apr 8, 2016

jpiniero said:
You have to remember that any X70/X80 would have 8 GHz GDDR5 though, so you would get higher bandwidth than what the 980 has.

It's only 14% in theoretical bandwidth, though. It's still going to need something extra.
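As a rough check of the 14% figure, here is a minimal sketch. It assumes the GTX 980's 256-bit bus and 7 Gbps effective GDDR5 data rate, and a hypothetical successor that keeps the same bus width at 8 Gbps (the "8 GHz" memory mentioned above). The function name is illustrative only.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Theoretical memory bandwidth in GB/s: bytes per transfer times effective data rate."""
    return bus_width_bits / 8 * data_rate_gbps

# Assumed figures: GTX 980 = 256-bit bus at 7 Gbps effective.
gtx980 = bandwidth_gb_s(256, 7.0)
# Hypothetical successor: same 256-bit bus, 8 Gbps GDDR5.
faster = bandwidth_gb_s(256, 8.0)

gain_pct = (faster / gtx980 - 1) * 100
print(f"{gtx980:.0f} GB/s -> {faster:.0f} GB/s (+{gain_pct:.1f}%)")
```

With the bus width unchanged, the gain is just the ratio of the data rates (8/7), which is where the roughly 14% comes from.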

Adored · Apr 8, 2016

We're still talking small margins here. Even with a heavy memory overclock, any halfway-sensible GPU is ultimately limited by its raw shader power.

A 980 Ti with a 256-bit bus would be unlikely to lose more than 10% performance, just as a 980 with a 384-bit bus would be unlikely to gain more than 5%. Obviously a lot depends on the actual game being played too.

Silverforce11 · Apr 8, 2016

Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!

Sweepr · Apr 8, 2016

2/3 of GP100's shaders at <1/2 the die size. Probably very aggressively clocked too.

Bodes well for a future 'GP102' gaming GPU. We could be looking at up to 5120 SPs.

Kris194 · Apr 8, 2016

5120SPs? Maybe with Volta, definitely not with Pascal.

Silverforce11 · Apr 8, 2016

Sweepr said:
2/3 of GP100's shaders at <1/2 the die size. Probably very aggressively clocked too.

Bodes well for a future 'GP102' gaming GPU. We could be looking at up to 5120 SPs.

GP104 is expected to be 4/6 GPCs compared to GP100, it's what NV has been doing for a long time. There is no precedent for change.

Vega 11 is the competitor to GP104. Polaris 10 is just too small to compete and it will face off versus GP106 as expected. If NV releases in retail Q3, they have a few months where there's no competition for the $400 - $550 segment.

USER8000 · Apr 8, 2016

Silverforce11 said:
Vega 11 is the competitor to GP104. Polaris 10 is just too small to compete and it will face off versus GP106 as expected. If NV releases in retail Q3, they have a few months where there's no competition for the $400 - $550 segment.

The problem is that you are forgetting that the two companies are not using the same process node. A 232mm2 Polaris 10 is probably closer to a 250mm2 to 260mm2 16nm GPU, so in the end Polaris 10 is closer to the GP104. Plus AMD has a history of producing more transistor-dense designs too.

nvgpu · Apr 8, 2016

https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-13#post-1905723

GP100 is fully graphics capable. They aren't drawn on the diagrams, but it has display controllers, ROPs, etc. And I agree a Quadro is a good bet at some point.

Silverforce11 · Apr 8, 2016

nvgpu said:
https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-13#post-1905723

We know that already. Only silly people would suggest it has no ROPs and only made for HPC. -_-

You guys can expect to wait until Q2 or Q3 2017 for a GTX GP100 Titan-class GPU.

GP104 is all there's gonna be for a long time at the high end.

Kris194 · Apr 8, 2016

Who is this Ryan Smith guy? Is he reliable?

Silverforce11 · Apr 8, 2016

Kris194 said:
Who is Ryan Smith?

Dude!

Anandtech editor.

NV's prior Tesla announcements don't list ROPs in their charts and don't show ROPs in their diagrams, but the chips have ROPs. Nothing surprising, and I have to question why some folks are failing so hard at basic analysis.

There's no GP102 for gaming.

There's a huge GP100 for HPC, at massive profits per chip. The rest gets made into a GTX after HPC demands are met. On each wafer, NV still banks a ton of money because the HPC sales subsidize the consumer GTX.

GP104 is the chip where they can yield better, at ~294mm2. Expect the good dies to go into a Tesla refresh as well, along with the $550-600 GTX 980 replacement. The harvested chip, aka the 970 replacement, can go for good bang for buck, depending on how generous JHH feels.

The only reason there's no GP102 for gaming is that yields of a bigger ~450mm2 chip will be worse. It competes with GP100 for wafers, but per wafer it earns much less because it can't go into an expensive Tesla (nobody sane would pay $12K each for a chip with neutered FP64), and it's big enough to suffer yield issues. As a gaming-only chip it's DoA: impossible unless the node is very mature, high volume and excellent yielding enough to make it worthwhile.

The ultimate driving logic at NVIDIA: How does JHH get the most $$ per 16nm wafer?

Silverforce11 · Apr 8, 2016

Oh, btw, guys!! Good news!!

Pascal has basic Async Compute support on the hardware level!

http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/

Software running on the P100 can be preempted on instruction boundaries, rather than at the end of a draw call.

This means a thread can immediately give way to a higher priority thread, rather than waiting to the end of a potentially lengthy draw operation. This extra latency – the waiting for a call to end – can really mess up very time-sensitive applications, such as virtual reality headsets. A 5ms delay could lead to a missed Vsync and a visible glitch in the real-time rendering, which drives some people nuts.

By getting down to the instruction level, this latency penalty should evaporate, which is good news for VR gamers. Per-instruction preemption means programmers can also single step through GPU code to iron out bugs.

BOOOM! That's your real DX12 hardware, your real VR hardware. Any VR gamer that buys a Maxwell for VR, expect to upgrade to Pascal for vastly better motion to photon latency and less puke-factor.

NV is going to bank it big time on Pascal because of how gimped Maxwell is in VR for latency, people will realize soon enough, they bought obsolete GPUs for VR and will all upgrade.

What genius marketing PR. I am impressed. They held back talking about Async Compute all this time, promising it in their drivers... when Pascal comes, Async Compute is going to be "enabled" in the drivers, for Pascal.

There's gonna be a lot of VR coverage from NV for Pascal, a LOT. You'll see blog posts about how improvements in Pascal enable Async Timewarp latencies to drop below that magical 20ms barrier that is deemed acceptable. Much better than Maxwell. All the early VR adopters will upgrade in droves and NV is gonna laugh all the way to the bank. Pure genius.
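To make the quoted Register passage concrete, here is a minimal sketch of the worst-case wait under the two preemption schemes. The draw call durations, the request time and the per-instruction latency are made-up illustrative numbers, not measurements of Maxwell or Pascal, and both function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def wait_for_draw_boundary(draw_call_ms, request_ms):
    """Delay before a high-priority task can start when the GPU can
    only be preempted between draw calls."""
    ends = list(accumulate(draw_call_ms))   # finish time of each draw call
    i = bisect_left(ends, request_ms)       # draw call running at request time
    if i == len(ends):
        return 0.0                          # GPU already idle
    return ends[i] - request_ms

def wait_for_instruction_boundary(instruction_ms=0.001):
    """With per-instruction preemption the wait is bounded by one
    instruction (assumed here to take about a microsecond)."""
    return instruction_ms

# Hypothetical frame: three back-to-back draw calls of 2, 6 and 3 ms,
# with a timewarp request arriving 3 ms into the frame.
draws = [2.0, 6.0, 3.0]
coarse = wait_for_draw_boundary(draws, 3.0)
fine = wait_for_instruction_boundary()
print(f"draw-call boundary: {coarse} ms, instruction boundary: {fine} ms")
```

In this made-up frame the request lands 1 ms into the 6 ms draw call, so draw-call-level preemption waits the remaining 5 ms, which is the size of delay the article says can cause a missed Vsync.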

raghu78 · Apr 8, 2016

Silverforce11 said:
Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!

Silverforce, Polaris 11 is roughly 110 sq mm and smaller than GK107 (118 sq mm) and Cape Verde (123 sq mm).

http://www.anandtech.com/show/9886/amd-reveals-polaris-gpu-architecture

"In any case, the GPU RTG showed off was a small GPU. And while Raja's hand is hardly a scientifically accurate basis for size comparisons, if I had to guess I would wager it's a bit smaller than RTG's 28nm Cape Verde GPU or NVIDIA's GK107 GPU, which is to say that it's likely smaller than 120mm2"

If Polaris 11 is 1280 sp then Polaris 10 is definitely 3072 sp, assuming a 232 sq mm GPU. If you look at past GPUs like HD 7770/HD 7870 or GTX 960/GTX 980, the larger GPU with twice the resources is not 2x the die size. It's smaller, as there are many components like the PCI-E/video controller, video codec engines etc. which are not doubled.

http://techreport.com/review/22573/amd-radeon-hd-7870-ghz-edition

HD 7770 - 123 sq mm
HD 7870 - 212 sq mm

http://techreport.com/review/27702/nvidia-geforce-gtx-960-graphics-card-reviewed

GTX 960 - 227 sq mm
GTX 980 - 398 sq mm

If we take Polaris 11 as 110 sq mm it would put a doubling of Polaris 10 at 110 x 1.75 = 192 sq mm. Instead we have a 232 sq mm GPU which hints at 3072 sp. There is also another hint that Polaris 10 has a slightly different organization.
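The scaling argument above can be checked with a few lines of arithmetic. This is a minimal sketch using only the die sizes quoted in this post. The 1.75 factor is the post's own rounding of the two observed ratios, and the variable names are illustrative.

```python
# Die sizes in sq mm, as quoted above (small chip, chip with twice the shaders).
pairs = {
    "HD 7770 -> HD 7870": (123, 212),
    "GTX 960 -> GTX 980": (227, 398),
}

# How much die area grew when the shader count doubled.
ratios = {name: big / small for name, (small, big) in pairs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x area for 2x shaders")

# Apply the post's rounded 1.75x factor to a ~110 sq mm Polaris 11.
polaris11_mm2 = 110
doubled_estimate = polaris11_mm2 * 1.75
print(f"Doubled Polaris 11 estimate: {doubled_estimate} sq mm (Polaris 10 is 232)")
```

Both historical pairs land between 1.7x and 1.8x, so a straight doubling of a 110 sq mm chip would be expected near 190 sq mm, well short of the 232 sq mm figure.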

http://videocardz.com/58634/amd-confirms-polaris-10-is-ellesmere-and-polaris-11-is-baffin

"Here's a difference between ELLESMERE and BAFFIN. It appears that ELLESMERE (Polaris 10) will have 256-bit (8*32) memory bus, while BAFFIN (Polaris 11) will feature 128-bit (4*32). Also the graphics pipes are different, where Polaris 10 has 6, while Polaris 11 has 5. I don't know if these correspond to Compute Units, but those two lines were the only differences between Polaris 10 and 11 in this file."

If we take Polaris 11 as 1280 = 640 x 2 (shader engines SE)
(64 x 5) x 2 = 640
(2 groups of 5 CUs per SE)

Polaris 10 as 3072 sp = 768 x 4 (shader engines SE)
(64 x 6) x 2 = 768
(2 groups of 6 CUs per SE)
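The two breakdowns above follow the same formula, sketched here under the post's assumptions: 64 stream processors per GCN compute unit, and the "graphics pipes" value (5 or 6) read as CUs per group. The shader engine counts (2 and 4) are the post's guess, not a confirmed spec, and the function name is hypothetical.

```python
SP_PER_CU = 64  # stream processors per GCN compute unit

def shader_count(cus_per_group, groups_per_se, shader_engines):
    """Total stream processors for a given CU arrangement."""
    return SP_PER_CU * cus_per_group * groups_per_se * shader_engines

# Polaris 11 guess: 2 shader engines, 2 groups of 5 CUs each.
polaris11 = shader_count(cus_per_group=5, groups_per_se=2, shader_engines=2)
# Polaris 10 guess: 4 shader engines, 2 groups of 6 CUs each.
polaris10 = shader_count(cus_per_group=6, groups_per_se=2, shader_engines=4)

print(polaris11, polaris10)
```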

So we can see there is a good chance that Polaris 10 is 3072 sp. If we combine that with higher clocks, an improved command processor, improved sp (higher IPC), and an improved memory controller, L2 cache and memory compression, I think we can see why Polaris 10 might be a potent combination of perf and perf/watt and likely to beat Fury X at 1080p/1440p. 4K is not going to be Polaris 10's forte. That's for the Vega family.

Silverforce11 · Apr 8, 2016

@raghu78
I suspect that, given the new uarch (read that patent paper in the 16 compute unit thread), each extra SP requires an increase in back-end, front-end and registers to support its newly gained "Hyper-threading" per SIMD based on ALUs, Vector and Scalar threads. So the die sizes need to go UP disproportionately to support the increase in SP.

There's more info in that other thread, but GCN 4 is damn amazing on paper at this point, basically much better than I expected.

sontin · Apr 8, 2016

Silverforce11 said:
Oh, btw, guys!! Good news!!

Pascal has basic Async Compute support on the hardware level!

http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/

BOOOM! That's your real DX12 hardware, your real VR hardware. Any VR gamer that buys a Maxwell for VR, expect to upgrade to Pascal for vastly better motion to photon latency and less puke-factor.

NV is going to bank it big time on Pascal because of how gimped Maxwell is in VR for latency, people will realize soon enough, they bought obsolete GPUs for VR and will all upgrade.

What genius marketing PR. I am impressed. They held back talking about Async Compute all this time, promising it in their drivers... when Pascal comes, Async Compute is going to be "enabled" in the drivers, for Pascal.

There's gonna be a lot of VR coverage from NV for Pascal, a LOT. You'll see blog posts about how improvements in Pascal enable Async Timewarp latencies to drop below that magical 20ms barrier that is deemed acceptable. Much better than Maxwell. All the early VR adopters will upgrade in droves and NV is gonna laugh all the way to the bank. Pure genius.

Do you even understand what you are talking about? Preemption has nothing to do with Async Compute. :\

Kris194 · Apr 8, 2016

remove please

Silverforce11 · Apr 8, 2016

sontin said:
Do you even understand what you are talking about? Preemption has nothing to do with Async Compute. :\

Note I said "basic" Async Compute.

Do you understand that one of the crippling problems for NV's current GPUs is their crushing context switches at draw call boundaries whenever compute needs to run?

This new pre-emption feature basically nullifies that weakness.

When an Async Compute task is called, like Async Timewarp (go read the Oculus blog), Maxwell can't run it right away; it has to wait for the current graphics work in the pipeline to finish first. Stall. It's going to miss the timewarp window and the user sees a major stutter/lag.

With their change in hardware to allow priority context switches immediately, they don't have to wait for the graphics rendering to finish, that async timewarp call is processed instantly.

This is a major change for VR, and while they may or may not be able to run graphics + compute in parallel, they won't be as neutered losing performance when Async Compute is in play.

You wait and see. I will be 100% correct on this. Keep an eye on those NV VR blogs, they will go all wild on this when Pascal is launched.

sontin · Apr 8, 2016

Silverforce11 said:
Do you understand that one of the crippling problems for NV's current GPUs is their crushing context switches at draw call boundaries whenever compute needs to run?

This new pre-emption feature basically nullifies that weakness.

When an Async Compute task is called, like Async Timewarp (go read the Oculus blog), Maxwell can't run it right away; it has to wait for the current graphics work in the pipeline to finish first. Stall. It's going to miss the timewarp window and the user sees a major stutter/lag.

With their change in hardware to allow priority context switches immediately, they don't have to wait for the graphics rendering to finish, that async timewarp call is processed instantly.

This is a major change for VR, and while they may or may not be able to run graphics + compute in parallel, they won't be as neutered losing performance when Async Compute is in play.

You wait and see. I will be 100% correct on this.

Like I said: you haven't understood what you are talking about. :thumbsdown:
It would be great if you and others could stop this buzzword bingo and not create fanfiction.

And no, with Async Timewarp NVIDIA is preempting after a draw call, interrupting the regular workload to run this one. They don't let it run concurrently with the graphics pipeline. And there is no context switch with Async Compute because the compute queue will be put after the graphics queue...

railven · Apr 8, 2016

Silverforce11 said:
Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!

Sign me up! Polaris/Pascal whomever tickles my fancy shall get me monies!

Silverforce11 · Apr 8, 2016

sontin said:
They don't let it run concurrently with the graphics pipeline. And there is no context switch with Async Compute because the compute queue will be put after the graphics queue...

So you can tell VR gamers to not move their heads until they are at a graphics draw call completion?

Don't be silly man.

That Async Timewarp needs to fire as soon as people move their heads, that's the entire point of it. You cannot control when people move to fall in-line to prevent stalls of Async Timewarp. -_-

But you just wait, when Pascal GTX debuts, the VR hype from NV is gonna be all aboard that latency train. We can come back and discuss how you are wrong, again.
