NVIDIA Pascal Thread

Page 52

Adored

Senior member
Mar 24, 2016
256
1
16
We're still talking small margins here. Even with a heavy memory overclock, any halfway-sensible GPU is ultimately limited by its raw shader power.

A 980 Ti with a 256-bit bus would be unlikely to lose more than 10% performance, just as a 980 with a 384-bit bus would be unlikely to gain more than 5%. Obviously a lot depends on the actual game being played too.
 
Feb 19, 2009
10,457
10
76
Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
2/3 of GP100's shaders at <1/2 the die size. Probably very aggressively clocked too.

Bodes well for a future 'GP102' gaming GPU. We could be looking at up to 5120 SPs.
 
Feb 19, 2009
10,457
10
76
2/3 of GP100's shaders at <1/2 the die size. Probably very aggressively clocked too.

Bodes well for a future 'GP102' gaming GPU. We could be looking at up to 5120 SPs.

GP104 is expected to have 4 of GP100's 6 GPCs; it's what NV has been doing for a long time. There is no precedent for change.

Vega 11 is the competitor to GP104. Polaris 10 is just too small to compete and it will face off versus GP106 as expected. If NV releases in retail Q3, they have a few months where there's no competition for the $400 - $550 segment.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
Vega 11 is the competitor to GP104. Polaris 10 is just too small to compete and it will face off versus GP106 as expected. If NV releases in retail Q3, they have a few months where there's no competition for the $400 - $550 segment.

The problem is that you are forgetting that the two companies are not using the same process node. A 232mm2 Polaris 10 is probably closer to a 250mm2 to 260mm2 16nm GPU, so in the end Polaris 10 is closer to the GP104. Plus AMD has a history of producing more transistor-dense designs too.
 
Feb 19, 2009
10,457
10
76
Who is Ryan Smith?

Dude!

AnandTech editor.

NV's prior Tesla announcements don't list ROPs in their charts and don't show ROPs in their diagrams, but the chips have ROPs. Nothing surprising, and I have to question why some folks are failing so hard at basic analysis.

There's no GP102 for gaming.

There's a huge GP100 for HPC, at massive profits per chip. The rest gets made into a GTX after HPC demand is met. On each wafer NV still banks a ton of money, because the HPC sales subsidize the consumer GTX.

GP104 is the chip where they can yield better, at ~294mm2. Expect the good dies to go into a Tesla refresh as well, along with the $550-600 GTX 980 replacement. The harvested chip, aka the 970 replacement, can offer good bang for buck, depending on how generous JHH feels.

The only reason there's no GP102 for gaming is that yields of a bigger ~450mm2 chip will be worse. It competes with GP100 for wafers, but per wafer it earns much less because it can't go into an expensive Tesla (nobody sane would pay $12K each with neutered FP64), and it's big enough to suffer yield issues. As a gaming-only chip it's DoA: impossible unless the node is very mature, high volume and yielding well enough to make it worthwhile.
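
To put rough numbers on that yield argument, here is a minimal sketch in Python. It uses the textbook Poisson yield model and a common dies-per-wafer approximation. The 0.2 defects/cm2 figure is a made-up placeholder, not a known value for 16nm, and the function names are purely illustrative.

```python
import math

WAFER_DIAMETER_MM = 300.0      # standard 300 mm wafer
DEFECTS_PER_CM2 = 0.2          # hypothetical defect density, NOT a real 16nm figure


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Approximate die candidates per wafer, with a correction for edge loss."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def poisson_yield(die_area_mm2, defects_per_cm2=DEFECTS_PER_CM2):
    """Fraction of dies with zero defects under the Poisson model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)


def good_dies_per_wafer(die_area_mm2):
    """Expected number of fully working dies per wafer."""
    return gross_dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)


# Die sizes taken from the post: ~294mm2 (GP104) vs a ~450mm2 gaming-only chip.
for area in (294, 450):
    print(f"{area} mm2: {gross_dies_per_wafer(area)} candidates, "
          f"{poisson_yield(area):.0%} yield, "
          f"{good_dies_per_wafer(area):.0f} good dies")
```

Whatever defect density you plug in, the bigger die loses twice: fewer candidates fit on the wafer, and a smaller fraction of them come out fully working.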

The ultimate driving logic at NVIDIA: How does JHH get the most $$ per 16nm wafer?
 
Feb 19, 2009
10,457
10
76
Oh, btw, guys!! Good news!!

Pascal has basic Async Compute support on the hardware level!

http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/

Software running on the P100 can be preempted on instruction boundaries, rather than at the end of a draw call.

This means a thread can immediately give way to a higher priority thread, rather than waiting to the end of a potentially lengthy draw operation. This extra latency - the waiting for a call to end - can really mess up very time-sensitive applications, such as virtual reality headsets. A 5ms delay could lead to a missed Vsync and a visible glitch in the real-time rendering, which drives some people nuts.

By getting down to the instruction level, this latency penalty should evaporate, which is good news for VR gamers. Per-instruction preemption means programmers can also single step through GPU code to iron out bugs.

BOOOM! That's your real DX12 hardware, your real VR hardware. Any VR gamer that buys a Maxwell for VR, expect to upgrade to Pascal for vastly better motion to photon latency and less puke-factor.

NV is going to bank it big time on Pascal because of how gimped Maxwell is in VR for latency, people will realize soon enough, they bought obsolete GPUs for VR and will all upgrade.

What genius marketing PR. I am impressed. They held back talking about Async Compute all this time, promising it in their drivers... when Pascal comes, Async Compute is going to be "enabled" in the drivers, for Pascal.

There's gonna be a lot of VR coverage from NV for Pascal, a LOT. You'll see blog posts about how improvements in Pascal enable Async Timewarp latencies to drop below that magical 20ms barrier that is deemed acceptable. Much better than Maxwell. All the early VR adopters will upgrade in droves and NV is gonna laugh all the way to the bank. Pure genius.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!

Silverforce, Polaris 11 is roughly 110 sq mm and smaller than GK107 (118 sq mm) and Cape Verde (123 sq mm).

http://www.anandtech.com/show/9886/amd-reveals-polaris-gpu-architecture

"In any case, the GPU RTG showed off was a small GPU. And while Raja’s hand is hardly a scientifically accurate basis for size comparisons, if I had to guess I would wager it’s a bit smaller than RTG’s 28nm Cape Verde GPU or NVIDIA’s GK107 GPU, which is to say that it’s likely smaller than 120mm"

If Polaris 11 is 1280 sp, then Polaris 10 is definitely 3072 sp, assuming a 232 sq mm GPU. If you look at past GPU pairs like HD 7770/HD 7870 or GTX 960/GTX 980, the larger GPU with twice the resources is not 2x the die size. It's smaller, as there are many components like the PCI-E/video controller, video codec engines etc. which are not doubled.

http://techreport.com/review/22573/amd-radeon-hd-7870-ghz-edition

HD 7770 - 123 sq mm
HD 7870 - 212 sq mm

http://techreport.com/review/27702/nvidia-geforce-gtx-960-graphics-card-reviewed

GTX 960 - 227 sq mm
GTX 980 - 398 sq mm
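
A quick sketch of the scaling ratio those die sizes imply, in Python. The four sizes are the ones listed above; the 110 and 232 sq mm Polaris figures are the estimates used in this post, not confirmed numbers.

```python
# Die sizes in sq mm, as listed above: (smaller GPU, GPU with ~2x the resources)
pairs = {
    "HD 7770 -> HD 7870": (123, 212),
    "GTX 960 -> GTX 980": (227, 398),
}

# Area ratio for each pair; a value below 2.0 means the fixed blocks were not doubled.
ratios = {name: big / small for name, (small, big) in pairs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the die area for ~2x the resources")

POLARIS_11_MM2 = 110  # estimate from the AnandTech preview, not confirmed
POLARIS_10_MM2 = 232  # rumoured size, not confirmed

polaris_ratio = POLARIS_10_MM2 / POLARIS_11_MM2
print(f"Polaris 10 / Polaris 11: {polaris_ratio:.2f}x")
```

If the rumoured sizes hold, the Polaris ratio comes out above 2x, which is more than either historical doubling needed.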

If we take Polaris 11 as 110 sq mm, the roughly 1.75x scaling seen in the pairs above would put a doubled Polaris 10 at about 110 x 1.75 = 192 sq mm. Instead we have a 232 sq mm GPU, which hints at 3072 sp. There is also another hint that Polaris 10 has a slightly different organization.

http://videocardz.com/58634/amd-confirms-polaris-10-is-ellesmere-and-polaris-11-is-baffin

"Here’s a difference between ELLESMERE and BAFFIN. It appears that ELLESMERE (Polaris 10) will have 256-bit (8*32) memory bus, while BAFFIN (Polaris 11) will feature 128-bit (4*32). Also the graphics pipes are different, where Polaris 10 has 6, while Polaris 11 has 5. I don’t know if these correspond to Compute Units, but those two lines were the only differences between Polaris 10 and 11 in this file."

If we take Polaris 11 as 1280 sp = 640 x 2 (shader engines, SE)
640 = (64 x 5) x 2
(2 groups of 5 CUs per SE)

Polaris 10 as 3072 sp = 768 x 4 (shader engines, SE)
768 = (64 x 6) x 2
(2 groups of 6 CUs per SE)
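
The same arithmetic as a small Python sketch. The 64 sp per CU figure is standard for GCN; the shader engine and CU-group layout is the guess made in this post, and the function name is just for illustration.

```python
SP_PER_CU = 64  # stream processors per GCN compute unit


def stream_processors(shader_engines, groups_per_se, cus_per_group):
    """Total sp for a layout of SEs, each holding some groups of CUs."""
    return shader_engines * groups_per_se * cus_per_group * SP_PER_CU


# Guessed layouts: 2 groups of CUs per SE, sized by the "graphics pipes" value (5 vs 6).
polaris_11 = stream_processors(shader_engines=2, groups_per_se=2, cus_per_group=5)
polaris_10 = stream_processors(shader_engines=4, groups_per_se=2, cus_per_group=6)

print("Polaris 11:", polaris_11, "sp")
print("Polaris 10:", polaris_10, "sp")
```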

So we can see there is a good chance that Polaris 10 is 3072 sp. If we combine that with higher clocks, an improved command processor, improved sp (higher IPC), and an improved memory controller, L2 cache and memory compression, I think we can see why Polaris 10 might be a potent combination of perf and perf/watt, and likely to beat Fury X at 1080p/1440p. 4K is not going to be Polaris 10's forte. That's for the Vega family.
 
Feb 19, 2009
10,457
10
76
@raghu78
I suspect, given the new uarch (read that patent paper in the 16 compute unit thread), each extra SP requires an increase in back-end, front-end and registers to support its newly gained "Hyper-threading" per SIMD based on ALUs, Vector and Scalar threads. So the die sizes need to go UP disproportionately to support the increase in SP.

There's more info in that other thread, but GCN 4 is damn amazing on paper at this point, basically much better than I expected.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Oh, btw, guys!! Good news!!

Pascal has basic Async Compute support on the hardware level!

http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/



BOOOM! That's your real DX12 hardware, your real VR hardware. Any VR gamer that buys a Maxwell for VR, expect to upgrade to Pascal for vastly better motion to photon latency and less puke-factor.

NV is going to bank it big time on Pascal because of how gimped Maxwell is in VR for latency, people will realize soon enough, they bought obsolete GPUs for VR and will all upgrade.

What genius marketing PR. I am impressed. They held back talking about Async Compute all this time, promising it in their drivers... when Pascal comes, Async Compute is going to be "enabled" in the drivers, for Pascal.

There's gonna be a lot of VR coverage from NV for Pascal, a LOT. You'll see blog posts about how improvements in Pascal enable Async Timewarp latencies to drop below that magical 20ms barrier that is deemed acceptable. Much better than Maxwell. All the early VR adopters will upgrade in droves and NV is gonna laugh all the way to the bank. Pure genius.

Do you even understand what you are talking about? Preemption has nothing to do with Async Compute. :\
 
Feb 19, 2009
10,457
10
76
Do you even understand what you are talking about? Preemption has nothing to do with Async Compute. :\

Note I said "basic" Async Compute.

Do you understand that one of the crippling problems for NV's current GPUs is their crushing context switches at draw call boundaries whenever compute needs to run?

This new pre-emption feature basically nullifies that weakness.

When an Async Compute task is called, like Async Timewarp (go read the Oculus blog), Maxwell can't run it right away; it has to wait for the current graphics in the pipeline to finish first. Stall. It's going to miss the timewarp window and the user sees a major stutter/lag.

With their change in hardware to allow priority context switches immediately, they don't have to wait for the graphics rendering to finish, that async timewarp call is processed instantly.
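
A toy model of the point being argued here, in Python. It only compares how long a high-priority task waits under the two preemption granularities; it says nothing about whether graphics and compute run concurrently. Every number is hypothetical (the 5ms draw echoes the example in the article quoted earlier), and the function names are made up for the sketch.

```python
def wait_before_switch_us(time_into_draw_us, draw_length_us,
                          instruction_us, per_instruction):
    """Microseconds a high-priority task waits before the GPU can switch to it."""
    if per_instruction:
        # Only the instruction currently in flight has to finish.
        return (-time_into_draw_us) % instruction_us
    # Draw-call granularity: the rest of the draw call has to drain first.
    return draw_length_us - time_into_draw_us


# Hypothetical scenario, all values in microseconds.
DRAW_LENGTH_US = 5000         # one long 5 ms draw call
INSTRUCTION_US = 3            # made-up cost of a single instruction
TIMEWARP_WORK_US = 1500       # made-up cost of the timewarp pass itself
TIME_LEFT_TO_VSYNC_US = 3000  # made-up budget when the request arrives


def makes_vsync(wait_us):
    """True if waiting plus the timewarp work still fits before Vsync."""
    return wait_us + TIMEWARP_WORK_US <= TIME_LEFT_TO_VSYNC_US


# The timewarp request arrives 1 ms into the draw call.
draw_boundary_wait = wait_before_switch_us(1000, DRAW_LENGTH_US, INSTRUCTION_US, False)
instruction_wait = wait_before_switch_us(1000, DRAW_LENGTH_US, INSTRUCTION_US, True)

print("draw-call boundary:", draw_boundary_wait, "us, makes Vsync:",
      makes_vsync(draw_boundary_wait))
print("instruction boundary:", instruction_wait, "us, makes Vsync:",
      makes_vsync(instruction_wait))
```

With these made-up numbers the draw-call-boundary case blows through the budget while the instruction-boundary case fits easily; real behaviour depends on draw lengths and on the cost of the context switch itself, which this sketch ignores.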

This is a major change for VR, and while they may or may not be able to run graphics + compute in parallel, they won't lose as much performance when Async Compute is in play.

You wait and see. I will be 100% correct on this. Keep an eye on those NV VR blogs, they will go all wild on this when Pascal is launched.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Do you understand that one of the crippling problems for NV's current GPUs is their crushing context switches at draw call boundaries whenever compute needs to run?

This new pre-emption feature basically nullifies that weakness.

When an Async Compute task is called, like Async Timewarp (go read the Oculus blog), Maxwell can't run it right away; it has to wait for the current graphics in the pipeline to finish first. Stall. It's going to miss the timewarp window and the user sees a major stutter/lag.

With their change in hardware to allow priority context switches immediately, they don't have to wait for the graphics rendering to finish, that async timewarp call is processed instantly.

This is a major change for VR, and while they may or may not be able to run graphics + compute in parallel, they won't lose as much performance when Async Compute is in play.

You wait and see. I will be 100% correct on this.

Like I said: you haven't understood what you are talking about. :thumbsdown:
It would be great if you and others could stop this buzzword bingo and not create fanfiction.

And no, with Async Timewarp nVidia is preempting this workload after a draw call and interrupting a regular one. They don't let it run concurrently with the graphics pipeline. And there is no context switch with Async Compute because the Compute queue will be put after the graphics queue...
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Spot on with the predictions earlier, @JDC & others who nailed it.

2560 SP (4x GPC, GP100 has 6x GPC with FP64), ~300mm2 size. This is a lean mean gaming chip.

They are going to need some major improvements in their memory compression.

But expect ~Titan X +15% performance at ~150W. In newer titles and GCN-optimized/DX12 games, you can expect a much bigger lead, potentially Titan X + 25-35%. It's definitely going to look better against GCN in the new era of games than Maxwell, or LOL-Kepler.

With that down, GP106 will be 1280 SP and GP107 will be 680 SP.

^ The amazing thing, AMD's (Polaris 10: 2560 SP, Polaris 11: 1,280 SP) and NV's chips will have identical core counts overall at each segment here and in the same arrangement of cores per SM/TMU/ROP block!

Sign me up! Polaris/Pascal, whichever tickles my fancy shall get my monies!
 
Feb 19, 2009
10,457
10
76
They don't let it run concurrently with the graphics pipeline. And there is no context switch with Async Compute because the Compute queue will be put after the graphics queue...

So you can tell VR gamers to not move their heads until they are at a graphics draw call completion?

Don't be silly man.

That Async Timewarp needs to fire as soon as people move their heads, that's the entire point of it. You cannot make people time their head movements to avoid stalling Async Timewarp. -_-

But you just wait, when Pascal GTX debuts, the VR hype from NV is gonna be all aboard that latency train. We can come back and discuss how you are wrong, again.
 