NVIDIA Pascal Thread


jpiniero

Lifer
Oct 1, 2010
15,176
5,717
136
The only reason there's no GP102 for gaming is that yields of a bigger ~450mm2 chip will be worse.

Any working GP100 chip is going to need to be multiple K. It's just unrealistic to think they could get many gamers to buy into that. High-end Quadros, yes, and I'm sure they could get Tesla buyers with fewer cores too. Yields at TSMC are obviously better than you think, otherwise you wouldn't be seeing a 610 mm2 die at this point.

Even a theoretical 450 mm2 GP102 would still be crazy expensive, but you could probably sell full dies as mid-range Quadros and cut-down dies as $1099+ Titans.
 
Feb 19, 2009
10,457
10
76
@jpiniero
Not necessarily.

It's about the $ per wafer they can get. If a bunch of dies go to $12K Teslas, a bunch more go to $5K Quadros, the rest can be $1.5K GTX Titans. Overall they will make a lot per wafer that way. Tesla/Quadro, in effect, subsidizes the existence of a huge high-end gaming chip from NV, like it has for many generations.
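
A rough sketch of that $/wafer argument; the defect density, bin split and prices below are illustrative assumptions, not real TSMC/NVIDIA numbers:

```python
import math

WAFER_DIAMETER_MM = 300
DEFECT_DENSITY = 0.2  # defects per cm^2 -- assumed for illustration

def dies_per_wafer(die_area_mm2):
    """Gross dies per wafer, classic approximation (area minus edge loss)."""
    r = WAFER_DIAMETER_MM / 2
    return int(math.pi * r ** 2 / die_area_mm2
               - math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2):
    """Poisson defect model: fraction of dies with zero defects."""
    return math.exp(-(die_area_mm2 / 100) * DEFECT_DENSITY)

for area in (610, 450, 300):
    good = dies_per_wafer(area) * poisson_yield(area)
    print(f"{area} mm2: ~{good:.0f} fully good dies per wafer")

# Hypothetical bin split for the 610 mm2 die: 30% Tesla, 30% Quadro,
# 40% Titan. Salvaged (partially defective) dies sold as cut-down parts
# would add to this, so it's a conservative floor.
good = dies_per_wafer(610) * poisson_yield(610)
revenue = good * (0.3 * 12_000 + 0.3 * 5_000 + 0.4 * 1_500)
print(f"~${revenue:,.0f} per wafer -- Tesla/Quadro carry the Titans")
```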
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
If Polaris 10 doesn't get GDDR5X, it's bottlenecked. Then a 2500-2800sp cut-down Vega 11 part is going to be much faster. Not to mention it could have a different TMU/ROP layout as well.

Dude, look back at AMD's true last-gen stack (without Fiji):

HD 7770 - 123 mm2 - 640 SP
HD 7870 - 212 mm2 - 1280 SP
HD 7970 - 360 mm2 - 2048 SP
R9 290X - 438 mm2 - 2816 SP

So whatever Polaris 10's SP count is, Vega 10 will double it easily.
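
For what it's worth, SP count per mm2 is roughly flat across that stack, which is what the "double the area, double the SPs" argument leans on:

```python
# SPs per mm2 across the stack listed above:
stack = [("HD 7770", 123, 640), ("HD 7870", 212, 1280),
         ("HD 7970", 360, 2048), ("R9 290X", 438, 2816)]
for name, mm2, sp in stack:
    print(f"{name}: {sp / mm2:.1f} SP/mm2")
# ~5.2 to ~6.4 SP/mm2: roughly linear, so ~2x the area buys ~2x the SPs.
```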
 

alcoholbob

Diamond Member
May 24, 2005
6,311
357
126
If GP104 really only has 2560 shaders, Polaris 10 might actually beat it.

I think if the GP100 launch is any indication, Nvidia may be trying to win this generation with clockspeed. It's 2560 cores, but if the base clock is 1.5 or 1.6GHz you are looking at 60-65% faster than a stock 980, and about as fast as an overclocked 1.3-1.4GHz 980 Ti.
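
A back-of-envelope version of that estimate, assuming throughput scales linearly with cores x clock (a best case; real games scale worse) and taking the 980's stock boost as ~1216MHz:

```python
cores_980, boost_980 = 2048, 1.216  # GTX 980 stock boost clock, GHz
cores_gp104 = 2560                  # rumoured GP104 core count

for clock in (1.5, 1.6):
    gain = cores_gp104 * clock / (cores_980 * boost_980) - 1
    print(f"GP104 @ {clock:.1f} GHz: ~{gain:.0%} over a stock 980")
# ~54% and ~64% -- in the ballpark of the 60-65% guess above.
```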
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
The 980 Ti is only ~22% faster than the GTX 980. If GP104 is 65% faster than the GTX 980 then it will be ~35% faster than the 980 Ti.
I really don't know why everyone overestimates 980 Ti performance. It's really not some miracle card.
It's just 20-23% faster than the GTX 980.

Btw if it's ~35% faster than the GTX 980 Ti then the 980 Ti can't match it even at 1500MHz.
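
For the record, the ratios divide rather than subtract; a minimal check:

```python
gp104_vs_980 = 1.65  # hypothetical GP104 vs GTX 980, from the post above
ti_vs_980 = 1.22     # 980 Ti vs GTX 980

print(f"GP104 vs 980 Ti: ~{gp104_vs_980 / ti_vs_980 - 1:.0%}")
# ~35%: 1.65 / 1.22, not 65% - 22%.
```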

Btw I also think Polaris 10 = GP106.
GP104 is a 300mm2 SKU; Polaris can't match it with 232mm2, but it will probably beat the 200mm2 GP106/1060.
 

Mondozei

Golden Member
Jul 7, 2013
1,043
41
86
But you just wait, when Pascal GTX debuts, the VR hype from NV is gonna be all aboard that latency train.


Nvidia will hype VR regardless of what happens, simply because it's the new hot thing in gaming. That in and of itself doesn't prove anything, even if it needs to be said that I basically agree with your premise about the importance of preemption for VR in general.

I think if the GP100 launch is any indication, Nvidia may be trying to win this generation with clockspeed. It's 2560 cores, but if the base clock is 1.5 or 1.6GHz you are looking at 60-65% faster than a stock 980, and about as fast as an overclocked 1.3-1.4GHz 980 Ti.

Yep.
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
The 980 Ti is only ~22% faster than the GTX 980. If GP104 is 65% faster than the GTX 980 then it will be ~35% faster than the 980 Ti.
I really don't know why everyone overestimates 980 Ti performance. It's really not some miracle card.
It's just 20-23% faster than the GTX 980.

Btw if it's ~35% faster than the GTX 980 Ti then the 980 Ti can't match it even at 1500MHz.

Btw I also think Polaris 10 = GP106.
GP104 is a 300mm2 SKU; Polaris can't match it with 232mm2, but it will probably beat the 200mm2 GP106/1060.

We don't REALLY know that Polaris 10 is 232mm^2. I would say that Polaris 10 is actually NOT the chip you are talking about; the one on the LinkedIn profile is something that will never come to retail, especially since it was posted publicly. That's my opinion on the matter.
 

coercitiv

Diamond Member
Jan 24, 2014
6,631
14,069
136
What Pascal chip goes into the Drive PX2?

I'm asking because it occurs to me that not only is there not going to be a gaming-focused big chip this generation, it may just be that all the chips in the Pascal line are HPC-focused this time around.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
We don't REALLY know that Polaris 10 is 232mm^2. I would say that Polaris 10 is actually NOT the chip you are talking about; the one on the LinkedIn profile is something that will never come to retail, especially since it was posted publicly. That's my opinion on the matter.

Plus it is a denser process AMD is using too.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Plus it is a denser process AMD is using too.

Plus everybody keeps forgetting about the primitive discard accelerator which ought to help with memory bandwidth as well. I think a 232mm2 Polaris can be close to a 294mm2 Pascal.
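
A toy model of where the discard savings would come from; the fractions are assumptions for illustration, not measured numbers:

```python
tris = 1_000_000
backfacing = 0.40      # a closed mesh faces roughly half its triangles away
zero_coverage = 0.10   # tiny triangles that hit no sample -- assumed fraction

survivors = tris * (1 - backfacing) * (1 - zero_coverage)
print(f"rasterized: {survivors:,.0f} of {tris:,} ({survivors / tris:.0%})")
# Everything discarded up front never costs raster, shading or bandwidth.
```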
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
So you can tell VR gamers not to move their heads until a graphics draw call completes?

Don't be silly man.

Async Timewarp needs to fire as soon as people move their heads; that's the entire point of it. You cannot make people time their movements to fall in line and prevent stalls of Async Timewarp. -_-

But you just wait, when Pascal GTX debuts, the VR hype from NV is gonna be all aboard that latency train. We can come back and discuss how you are wrong, again.

Async timewarp does not need to fire as soon as people move their heads (people are constantly moving their heads, and that movement is measured at 1000Hz by the IMUs). Async timewarp needs to fire at the last possible moment of the rendering pipeline, but still early enough that it can be applied in time for the next frame refresh.

And as Sontin said, Async Timewarp and Async Compute have essentially nothing to do with each other. An Async Timewarp could in theory be performed with Async Compute (and thus be allowed to run alongside the graphics rendering), or you can do it the way Nvidia is doing it here, by preempting, which basically just means that it gets inserted into the rendering pipeline at the most opportune time (instead of having to wait for another draw call to finish first).
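
A small sketch of that "last possible moment" scheduling; the refresh rate, warp cost and margin are assumed numbers:

```python
refresh_hz = 90               # e.g. Rift/Vive panels
frame_ms = 1000 / refresh_hz  # ~11.1 ms between refreshes
warp_cost_ms = 2.0            # assumed GPU time for the warp pass
margin_ms = 1.0               # assumed scheduling safety margin

latest_start_ms = frame_ms - warp_cost_ms - margin_ms
print(f"kick the warp ~{latest_start_ms:.1f} ms into the {frame_ms:.1f} ms frame,")
print("using the freshest ~1000 Hz IMU sample available at that moment")
```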
 
Feb 19, 2009
10,457
10
76
Plus everybody keeps forgetting about the primitive discard accelerator which ought to help with memory bandwidth as well. I think a 232mm2 Polaris can be close to a 294mm2 Pascal.

Depends on how the new GCN turns out, but they have actually got Hyper-Threading for SPs. For REAL!

http://forums.anandtech.com/showpost.php?p=38154409&postcount=19

^ There's a patent paper there for next-gen GCN. Take some time to read it, it's mind blowing stuff.

On paper, there's potential for 4x the throughput for each SP. I suspect that's under a perfect scenario, but still, 1x to 2x per-SP performance (game-load dependent) vs the older GCN SP is on the table.

Polaris GCN has gone wide, with each SP being able to run multiple threads in parallel, a feat that's pretty crazy when you realize the amount of synchronization it requires to keep the hardware scheduler aware of each ALU's uptime, so the warp scheduler can keep it busy.

There's also per-SP power gating and clock boost, so if an SP is only running one thread, it will auto-boost to finish the task quicker.

Insane changes TBH, more than I expected.
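
An illustrative toy model of the latency-hiding idea (classic SMT reasoning, not a claim about the patent's actual mechanism):

```python
def alu_busy(n_threads, compute=1, stall=3, cycles=1000):
    """One issue slot per cycle; each thread computes 1 cycle, stalls 3."""
    ready_at = [0] * n_threads  # cycle at which each thread can issue again
    busy = 0
    for cycle in range(cycles):
        for t in range(n_threads):
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + compute + stall
                busy += 1
                break  # only one thread can issue per cycle
    return busy / cycles

for n in (1, 2, 4):
    print(f"{n} thread(s) per SP: ALU busy {alu_busy(n):.0%}")
# 1 thread: ~25%. 2 threads: ~50%. 4 threads hide the stall entirely,
# which lines up with the "up to 4x, realistically 1-2x" reading above.
```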
 
Feb 19, 2009
10,457
10
76
you can do it the way Nvidia is doing it here, by preempting, which basically just means that it gets inserted into the rendering pipeline at the most opportune time (instead of having to wait for another draw call to finish first).

Which wasn't possible on Maxwell and older, because it has to wait for graphics to finish first before being able to context switch the pipeline to handle compute.



With Pascal, that's an instantaneous change very much like GCN.

It works best if the compute timewarp can run in parallel, as Async Compute.

It works okay if the timewarp can go on a priority context and there's no delay for context switching. This is Pascal's uarch change, based on that article.

It works the least well the current way.

This is why NV says Maxwell at BEST is only capable of 25ms motion-to-photon latency via async timewarp. Still above the recommended 20ms.

http://www.geforce.com/whats-new/ar...us-the-only-choice-for-virtual-reality-gaming

The standard VR pipeline from input in (when you move your head) to photons out (when you see the action occur in-game) is about 57 milliseconds (ms). However, for a good VR experience, this latency should be under 20ms.

Combined, and with the addition of further NVIDIA-developer tweaks, the VR pipeline is now only 25ms.

With this change in Pascal, they will get below that 20ms mark and there will be a lot of hoorah!

Why do I say this is "basic" Async Compute?

Because currently NV GPUs actually take a performance hit when AC is run. This is again because of their slow context switch. It causes stalls in the pipeline, wasting time where no work can be done.

The change in Pascal means even if they cannot run graphics + compute in parallel, devs calling for Async Compute, or even general games that use a lot of compute, won't cause stalls. In theory, it should behave like GCN where the graphics/compute context switch is fast.

Short of actually having multi-engine hardware like ACEs, this is a good fix by NV to add to Pascal, as it resolves their weakness of performance regression with AC, and of poor VR preemption due to stalls from slow context switches.



^ In the first case, preemption: think of the slow context switch as adding idle time where the shaders cannot run while the pipeline is switching between graphics and compute. This is the problem with NV's current uarch, as pointed out by AMD's Robert Hallock when the Async Compute debacle started. This leads to NV losing performance when AC is used.

Basically, with Pascal it's much faster: what they call "fine-grained preemption" is more flexible and can happen at any time. And compute on a priority context can even override the current graphics task if needed.
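
A toy timeline of the difference; the draw-call lengths and switch costs are made up for illustration:

```python
draw_calls_ms = [0.3, 4.0, 1.2, 0.5]  # queued draw calls, made-up lengths
ctx_switch_ms = 0.5                   # assumed cost of the heavyweight switch

# Draw-call-boundary preemption (Maxwell-style): worst case, the warp
# request lands just as the longest call starts and must wait it out.
boundary_worst = max(draw_calls_ms) + ctx_switch_ms
# Fine-grained preemption (Pascal-style, per the article): stop within
# roughly an instruction/pixel -- modelled here as a small fixed cost.
fine_grained_ms = 0.1

print(f"boundary preemption, worst case: {boundary_worst:.1f} ms")
print(f"fine-grained preemption:        ~{fine_grained_ms:.1f} ms")
# The gap is added motion-to-photon latency, exactly what VR can't afford.
```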
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Which wasn't possible on Maxwell and older, because it has to wait for graphics to finish first before being able to context switch the pipeline to handle compute.



With Pascal, that's an instantaneous change very much like GCN.

It works best if the compute timewarp can run in parallel, as Async Compute.

It works okay if the timewarp can go on a priority context and there's no delay for context switching. This is Pascal's uarch change, based on that article.

It works the least well the current way.

Obviously this is an improvement over Maxwell, but that still doesn't make it Async compute.

Why do I say this is "basic" Async Compute?

Because currently NV GPUs actually take a performance hit when AC is run. This is again because of their slow context switch. It causes stalls in the pipeline, wasting time where no work can be done.

The change in Pascal means even if they cannot run graphics + compute in parallel, devs calling for Async Compute, or even general games that use a lot of compute, won't cause stalls. In theory, it should behave like GCN where the graphics/compute context switch is fast.

Just because Pascal no longer incurs a performance penalty from async compute doesn't mean that it supports async compute, not even a "basic" async compute. The whole point of async compute is to improve performance, not simply maintain a status quo in performance.

Pascal no longer incurring a performance penalty simply means that they have fixed their "workaround" (preemption) for not having async compute support. Of course, instruction-level preemption will likely be useful in many other areas as well, so it's not just a fix for the lack of async support.
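
That distinction in one back-of-envelope calc, with all the durations assumed:

```python
G, C = 12.0, 3.0  # per-frame graphics and compute work, ms (assumed)
stall = 1.5       # assumed loss from heavyweight context switches

with_stalls = G + C + stall   # Maxwell-style behaviour
status_quo = G + C            # penalty fixed, still serial
overlapped = max(G, C) + 1.0  # true async; +1.0 ms assumed contention

print(f"with stalls: {with_stalls} ms | stalls fixed: {status_quo} ms "
      f"| real overlap: {overlapped} ms")
# Fixing preemption recovers 16.5 -> 15 ms; actual async compute is
# what buys the 15 -> 13 ms.
```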
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Which wasn't possible on Maxwell and older, because it has to wait for graphics to finish first before being able to context switch the pipeline to handle compute.

Nonsense. They must wait until a draw call is finished to run the Async Timewarp workload. There is no context switch involved.

This is why NV say Maxwell at the BEST is only capable of 25ms motion to photon latency via async timewarp. Still above the 20ms recommended.

http://www.geforce.com/whats-new/ar...us-the-only-choice-for-virtual-reality-gaming
The 13ms comes from the 75Hz display. :\
Did you even read the article?!

Basically with Pascal, it's much faster, what they call "fine-grained preemption", more flexible, anytime. And compute on priority can even override the current graphics task if that is needed.
This is possible today, too - after a draw call.
The performance penalty happens because of wrongly scheduled compute queues.
 
Feb 19, 2009
10,457
10
76
@sontin

Read and learn man. You come up with such random stuff that goes against what even NVIDIA tells developers about what their hardware is capable of.

https://developer.nvidia.com/sites/...works/vr/GameWorks_VR_2015_Final_handouts.pdf



^ NV claims it supports priority context...



^ They even say their priority context takes over the whole GPU and preempts whatever it's working on to switch to the new task... LOL



^ Except it can't. It's not actually a priority context at all. It gets stuck in traffic like everything else.

It's the same fault as when they claimed their hardware supports Async Compute... except it can't.

Pascal changes this; it's a major change for them. Pascal has real priority preemption and instant context switching between graphics and compute rendering.

Celebrate it instead of pretending Maxwell can do it when NV clearly says it cannot.

Need I remind you, you were so insistent that Maxwell also supports Async Compute. There were lots of threads where I tried to educate you otherwise, but nope, you were determined... and wrong.
 
Feb 19, 2009
10,457
10
76
Just because Pascal no longer incurs a performance penalty from async compute doesn't mean that it supports async compute, not even a "basic" async compute. The whole point of async compute is to improve performance, not simply maintain a status quo in performance.

Pascal no longer incurring a performance penalty simply means that they have fixed their "workaround" (preemption) for not having async compute support. Of course, instruction-level preemption will likely be useful in many other areas as well, so it's not just a fix for the lack of async support.

In many ways it's similar to GCN 1.0: it doesn't gain much performance from Async Compute, but it doesn't regress in performance either. The outcome is that Pascal will do better for NV in DX12 games that use AC, and much better in VR.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
@sontin

Read and learn man. You come up with such random stuff that goes against what even NVIDIA tells developers about what their hardware is capable of.

And in none of these slides is it mentioned that preemption has a penalty.

You are the one who doesn't care what is said and is making up fanfiction. Stop it. :thumbsdown:
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
@sontin

Read and learn man. You come up with such random stuff that goes against what even NVIDIA tells developers about what their hardware is capable of.

https://developer.nvidia.com/sites/...works/vr/GameWorks_VR_2015_Final_handouts.pdf



^ NV claims it supports priority context...



^ They even say their priority context takes over the whole GPU and preempts whatever it's working on to switch to the new task... LOL



^ Except it can't. It's not actually a priority context at all. It gets stuck in traffic like everything else.

It's the same fault as when they claimed their hardware supports Async Compute... except it can't.

Pascal changes this; it's a major change for them. Pascal has real priority preemption and instant context switching between graphics and compute rendering.

Celebrate it instead of pretending Maxwell can do it when NV clearly says it cannot.

Need I remind you, you were so insistent that Maxwell also supports Async Compute. There were lots of threads where I tried to educate you otherwise, but nope, you were determined... and wrong.

And in none of these slides is it mentioned that preemption has a penalty.

You are the one who doesn't care what is said and is making up fanfiction. Stop it. :thumbsdown:

Nvidia is making up fanfiction?? What??
 
Feb 19, 2009
10,457
10
76
Nvidia is making up fanfiction?? What??

Heh. They even tell developers not to frequently mix graphics and compute in queues because their context switch is a costly one. Read the dev handout.

http://wccftech.com/nvidia-devs-computegraphics-toggle-heavyweight-switch/

On this topic, I have to commend zlatan; he has clearly known what he is talking about ever since he graced this forum.

Outside of us forum warriors, he's probably one of the very few actual developers with deep experience.

Read what he says, everything turned out true.



There are earlier posts where he said Maxwell could not do Async Compute (as I did, from 2014 on).

He called it on Pascal having this fine-grained preemption ability.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You can only have a context switch within a queue. Async Compute uses different queues, so there is no context switch involved. The performance penalty happens because of the barriers and fences used to synchronise these queues. If you can't let them run in parallel, every architecture will see a performance regression.
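
Schematically, with assumed durations:

```python
gfx_ms, compute_ms = 10.0, 4.0  # assumed per-frame queue workloads

# Fence makes compute wait on the *end* of the graphics queue: serial.
badly_fenced = gfx_ms + compute_ms
# Fence only guards the one resource written early on: queues overlap.
well_fenced = max(gfx_ms, compute_ms)

print(f"badly placed fence: {badly_fenced} ms/frame")
print(f"well placed fence:  {well_fenced} ms/frame")
# Same queues, same hardware; where the sync points sit decides whether
# "async" is a gain or a regression.
```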
 

jpiniero

Lifer
Oct 1, 2010
15,176
5,717
136
It's about the $ per wafer they can get. If a bunch of dies go to $12K Teslas, a bunch more go to $5K Quadros, the rest can be $1.5K GTX Titans. Overall they will make a lot per wafer that way. Tesla/Quadro, in effect, subsidizes the existence of a huge high-end gaming chip from NV, like it has for many generations.

There's only so far you can cut, though, before it starts to not make sense.

The P100 is 56 SMs. I bet they could sell a cheaper 50-52 SM Tesla and then 45-50 SM Quadros sometime in early 2017.

Theoretically the full GP104 is 40? (2560). I guess doing 2560 cores with high clock speeds shouldn't be a surprise, but at 300 mm2 it's way too big for the anticipated price range. This node is going to be all about transistor usage efficiency, and it looks bad for nVidia right now. Maybe that's why Volta has also shown up on the radar so soon: nVidia knows that Pascal is basically Fermi 2.0.
 

nvgpu

Senior member
Sep 12, 2014
629
202
81
http://www.anandtech.com/show/7166/nvidia-announces-quadro-k6000

Quadro K6000 launched with full 2880 CUDA cores enabled.

http://www.anandtech.com/show/9096/nvidia-announces-quadro-m6000-quadro-vca-2015

http://www.anandtech.com/show/10179/nvidia-announces-24gb-quadro-m6000

Quadro M6000 launched with full 3072 CUDA cores enabled.

The Quadro P6000(?) will launch with all 3840 CUDA cores enabled and hopefully with 8-Hi HBM2 stacks for 32GB of RAM, mass production permitting, since you don't want to ship a 16GB Quadro flagship a year after you shipped the 24GB Quadro M6000.
 

Kris194

Member
Mar 16, 2016
112
0
0
There's only so far you can cut, though, before it starts to not make sense.

The P100 is 56 SMs. I bet they could sell a cheaper 50-52 SM Tesla and then 45-50 SM Quadros sometime in early 2017.

Theoretically the full GP104 is 40? (2560). I guess doing 2560 cores with high clock speeds shouldn't be a surprise, but at 300 mm2 it's way too big for the anticipated price range. This node is going to be all about transistor usage efficiency, and it looks bad for nVidia right now. Maybe that's why Volta has also shown up on the radar so soon: nVidia knows that Pascal is basically Fermi 2.0.

Two years later is soon? How? And how is 2560 CC too big when the GTX 980 on the 28nm node has 2048 CC?
 