NVIDIA Pascal Thread


exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Not sure if I am comforted or disappointed that you see the same issues on the EA forum, where a bumpgate article devolves into an AMD vs. NV rant. :/

Anyway, I don't think it matters much for the consumer Pascal products as it is HBM related. Charlie fishing for hits...
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
Not sure if I am comforted or disappointed that you see the same issues on the EA forum, where a bumpgate article devolves into an AMD vs. NV rant. :/

Anyway, I don't think it matters much for the consumer Pascal products as it is HBM related. Charlie fishing for hits...

Well, for now, nah, it doesn't matter...
but in 8 months we will get Volta
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Not sure if I am comforted or disappointed that you see the same issues on the EA forum, where a bumpgate article devolves into an AMD vs. NV rant. :/

Anyway, I don't think it matters much for the consumer Pascal products as it is HBM related. Charlie fishing for hits...
Au contraire, if they don't make enough $ in revenues & profits from their HPC cards, where do you think the next target would be? If the GP100 isn't a big hit, the Pascal consumer cards will be milked way harder than Maxwell & we'll see evidence of that soon enough D:
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Au contraire, if they don't make enough $ in revenues & profits from their HPC cards, where do you think their next target would be? If the GP100 isn't a big hit, the Pascal consumer cards will be milked way harder than Maxwell & we'll see evidence of that soon enough D:

If it were possible for Nvidia to milk consumer cards any more than they are doing now, they would already have done so; otherwise they wouldn't be running their business properly (with regard to profit maximization).

If consumer Pascal is more "milkable" than Maxwell, then it won't be because of GP100 failing, it will be because of AMD failing to provide a competitive product.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
If it were possible for Nvidia to milk consumer cards any more than they are doing now, they would already have done so; otherwise they wouldn't be running their business properly (with regard to profit maximization).

If consumer Pascal is more "milkable" than Maxwell, then it won't be because of GP100 failing, it will be because of AMD failing to provide a competitive product.
Not necessarily, the new buzzword being mindshare. There are enough products in the AMD stack that are way better VFM than anything Nvidia has to offer, & yet the "970gate" & the disabling of OC on mobile Maxwell, without serious repercussions, show us the power of brand Nvidia. So, no, I don't think AMD has too much to do with the success Nvidia has had in the last two years, which is even more baffling after GCN's DX12 exploits!
When did they change the roadmap again? Or did they add Volta after they added Pascal?
You answered that yourself
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Not necessarily, the new buzzword being mindshare. There are enough products in the AMD stack that are way better VFM than anything Nvidia has to offer, & yet the "970gate" & the disabling of OC on mobile Maxwell, without serious repercussions, show us the power of brand Nvidia. So, no, I don't think AMD has too much to do with the success Nvidia has had in the last two years, which is even more baffling after GCN's DX12 exploits!

I didn't say anything about AMD having anything to do with the success Nvidia has had for the last two years. I said that if Nvidia is going to milk consumer Pascal more than they are already milking Maxwell (i.e. something that would happen in the future, not the past), then it would be due to AMD failing (with Polaris/Vega), not because of GP100 failing.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
I'm pretty sure Volta has always been 2018 on Nvidia's roadmaps (except for the ones where it didn't have any date at all)
Volta didn't exist on any roadmap until early 2015; Pascal was in Volta's place (2018 wasn't even on there back then, it was 2016/17)
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
My final prediction:

Highest tier GP104 at this release - 25-30% faster than GTX 980 Ti - $599-649
Second tier GP104 - 5-10% faster than GTX 980 Ti - $449
Third tier GP104 - 10% slower than GTX 980 Ti - $329

That would put the 3rd tier card about 33% faster than a reference GTX 970 for about $30 more, and the second tier card about 33-40% faster than a GTX 980 for around the same price the 980 is currently going for. People currently at ~GTX 970 performance or less might find the $449 price bracket appealing for a >=60% upgrade in performance.
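(Sanity-checking those numbers, assuming typical review averages where a reference 980 Ti is roughly 45-50% faster than a reference 970 and 25-30% faster than a reference 980: 0.90 x 1.45-1.50 ≈ 1.31-1.35 for the third tier card, and 1.05-1.10 x 1.25-1.30 ≈ 1.31-1.43 for the second tier, so the 33% and 33-40% figures hold up.)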
 
Last edited:

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Very informative post by sebbbi @ Beyond3D.

It seems that people are still confusing the terms "async compute", "async shaders" and "compute queue". Marketing and the press don't seem to understand the terms properly and spread the confusion

Hardware:
AMD - Each compute unit (CU) on GCN can run multiple shaders concurrently. Each CU can run both compute (CS) and graphics (PS/VS/GS/HS/DS) tasks concurrently. The 64 KB LDS (local data store) inside a CU is dynamically split between currently running shaders. Graphics shaders also use it for intermediate storage. AMD calls this feature "Async shaders".

Intel / Nvidia: These GPUs do not support running graphics + compute concurrently on a single compute unit. One possible reason is the LDS / cache configuration (GPU on chip memory is configured differently when running graphics - CUDA even allows direct control for it). There most likely are other reasons as well. According to Intel documentation it seems that they are running the whole GPU either in compute mode or graphics mode. Nvidia is not as clear about this. Maxwell likely can run compute and graphics simultaneously, but not both in the same "shader multiprocessor" (SM).

Async compute = running shaders in the compute queue. The compute queue is like another "CPU thread". It doesn't have any ties to the main queue. You can use fences to synchronize between queues, but this is a very heavy operation and likely causes stalls. You don't want to do more than a few fences (preferably one) per frame. Just like "CPU threads", the compute queue doesn't guarantee any concurrent execution. The driver can time slice queues (just like the OS does for CPU threads when you have more threads than the CPU core count). This can still be beneficial if you have big stalls (GPU waiting for CPU, for instance). AMD's hardware works a bit like hyperthreading. It can feed multiple queues concurrently to all the compute units. If a compute unit has stalls (even small stalls can be exploited), the CU will immediately switch to another shader (also graphics<->compute). This results in higher GPU utilization.

You don't need to use the compute queue in order to execute multiple shaders concurrently. DirectX 12 and Vulkan by default run all commands concurrently, even from a single queue (at the level of concurrency supported by the hardware). The developer needs to manually insert barriers in the queue to represent synchronization points for each resource (to prevent read<->write hazards). All modern GPUs are able to execute multiple shaders concurrently. However, on Intel and Nvidia the GPU is running either graphics or compute at a time (but can run multiple compute shaders or multiple graphics shaders concurrently). So in order to maximize performance, you'd want to submit large batches of either graphics or compute to the queue at once (not alternating between both rapidly). You get a GPU stall ("wait until idle") on each graphics<->compute switch (unless you are AMD, of course).

If you assume that a single Pascal SM cannot run mixed graphics + compute, then splitting the SMs should improve the granularity. Compute and graphics might also share some higher level (more global) resources as well. Nvidia has quite sophisticated load balancing in their geometry processing. Distributed geometry data needs to be stored somewhere (SM L1 at least is partially pinned for graphics work, see this presentation: http://on-demand.gputechconf.com/gtc/2016/video/S6138.html). Also, Nvidia doesn't have separate ROP caches (AMD still does). Some portion of their L2 needs to serve the ROPs when rendering graphics. This might be transparent (just another client of the cache) or might be statically pinned based on the GPU state. I don't know.

https://forum.beyond3d.com/threads/...eculation-rumors-and-discussion.56719/page-72
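To make sebbbi's two points concrete, here's a minimal D3D12 sketch (not from his post; device setup is omitted, and the function and resource names are illustrative) of (1) barriers as the manual sync points within a single queue, and (2) a separate compute queue synchronized with the graphics queue via a fence:

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// (1) Single-queue concurrency: D3D12 runs recorded commands overlapped by
// default; a UAV barrier is the manual sync point that prevents
// read<->write hazards between two dependent compute dispatches.
void DependentDispatches(ID3D12GraphicsCommandList* cmdList,
                         ID3D12Resource* buffer)
{
    cmdList->Dispatch(64, 1, 1);        // pass 1 writes `buffer`

    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    barrier.UAV.pResource = buffer;     // wait for pass 1's writes to land
    cmdList->ResourceBarrier(1, &barrier);

    cmdList->Dispatch(64, 1, 1);        // pass 2 reads `buffer`
}

// (2) "Async compute": a dedicated compute queue, synchronized with the
// graphics queue through a fence. The API guarantees no concurrency;
// overlap (or time slicing) is up to the driver and hardware.
void SubmitAsyncCompute(ID3D12Device* device,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12CommandList* computeWork)
{
    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    ID3D12CommandList* lists[] = { computeWork };
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence.Get(), 1);

    // The heavy cross-queue sync the post warns about: the graphics queue
    // stalls here until the compute queue passes the fence. Keep it to
    // (ideally) one per frame.
    graphicsQueue->Wait(fence.Get(), 1);
}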
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86

It's a very good post.

Now the question is: in Maxwell, when running parallel graphics + compute tasks, does the entire GPU need to flip between graphics <-> compute modes?

It would be interesting to see if, in Pascal, they could do this at the TPC or GPC level, which would undoubtedly be better than the entire GPU having to change modes. I doubt they can do it at the SM level (similar to GCN, where they can do it at the CU level).
 

Game_dev

Member
Mar 2, 2016
133
0
0
Not necessarily, the new buzzword being mindshare. There are enough products in the AMD stack that are way better VFM than anything Nvidia has to offer, & yet the "970gate" & the disabling of OC on mobile Maxwell, without serious repercussions, show us the power of brand Nvidia. So, no, I don't think AMD has too much to do with the success Nvidia has had in the last two years, which is even more baffling after GCN's DX12 exploits!
You answered that yourself

AMD has had plenty of mistakes to affect their "mindshare":

Hawaii was too hot and loud
Fiji was too slow and memory limited
Crimson killed cards
Terrible directx 11 driver overhead
Terrible Linux drivers
Etc.
 
Feb 19, 2009
10,457
10
76
It's a very good post.

Now the question is: in Maxwell, when running parallel graphics + compute tasks, does the entire GPU need to flip between graphics <-> compute modes?

It would be interesting to see if, in Pascal, they could do this at the TPC or GPC level, which would undoubtedly be better than the entire GPU having to change modes. I doubt they can do it at the SM level (similar to GCN, where they can do it at the CU level).

He said so right in that quote.

"However on Intel and Nvidia, the GPU is running either graphics or compute at a time (but can run multiple compute shaders or multiple graphics shaders concurrently). So in order to maximize the performance, you'd want submit large batches of either graphics or compute to the queue at once (not alternating between both rapidly). You get a GPU stall ("wait until idle") on each graphics<->compute switch (unless you are AMD of course)."

It's the context switch that hurts Kepler and Maxwell, even without async compute - just normal graphics -> compute serial rendering suffers.

The more compute you have in a game, the worse it tanks due to this slow context switch.

NV has actually been saying the same thing since late 2014 in their official VR programming guide PDF for developers: an async timewarp compute job can and does get stuck behind a graphics draw call, and even on a priority preemption queue it doesn't work.

Pascal resolves this flaw in the uarch. There's no slow context switch for graphics <-> compute workloads, and it supports fine-grained preemption, where a priority compute queue can SUSPEND a graphics queue in progress and proceed immediately, as it should.

This is why I say don't underestimate Pascal's uarch gains over Maxwell. The effect will be more pronounced in recent games with more compute usage. Take Quantum Break, which has a lot of compute and copy queues mixed in with graphics all the time (per GPUView results); Pascal will easily crush Maxwell there.
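For reference, this is roughly what that priority queue looks like from the D3D12 side - a hedged sketch rather than NV's actual guide code, with illustrative names like `timewarpCmdList`. Whether the work actually preempts in-flight graphics is up to the GPU, which is exactly the Maxwell vs. Pascal difference above:

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// A high-priority compute queue, as a VR runtime might use for async
// timewarp. The priority is a scheduling hint, not a preemption guarantee:
// on Maxwell the reprojection can still sit behind a long draw call, while
// Pascal's finer-grained preemption lets it cut in.
void SubmitTimewarp(ID3D12Device* device, ID3D12CommandList* timewarpCmdList)
{
    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    queueDesc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
    ComPtr<ID3D12CommandQueue> timewarpQueue;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&timewarpQueue));

    // Submitted as late as possible before vsync, on its own queue, so the
    // scheduler can run it ahead of already-queued graphics work.
    ID3D12CommandList* lists[] = { timewarpCmdList };
    timewarpQueue->ExecuteCommandLists(1, lists);
}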
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91

http://lmgtfy.com/?q=nvidia+drivers+killing+gpu

You'll see Nvidia has been shown to kill far more GPUs, across multiple driver releases.

I've yet to see anyone actually have their AMD card die from Crimson, since the hardware had failsafes and the fans were just limited, not completely turned off / disabled.

Good to see you are still trolling though.
 

Game_dev

Member
Mar 2, 2016
133
0
0
Like I said "swept under the rug".






This sort of troll posting is not acceptable to discussion in VC&G.


esquared
Anandtech Forum Director
 
Last edited by a moderator:
Mar 10, 2006
11,715
2,012
126
Anyway, on topic: the 1080 is going to be significantly faster due to GDDR5X, my guess is 20-25%. This means Nvidia can get more users to pick the 1080 over the 1070, because overclocking the 1070 might not be enough to catch the 1080.

As long as they can make the additional price worth it, I agree. The GTX 980 was actually very poorly positioned against the GTX 970, with the 970 offering a staggeringly better value at launch (for the price of one 980 + $150, you could get GTX 970s in SLI).

Here's my guess, on the product stack:

GP104-400 -> GTX 1080 Ti @ $649
GP104-200 -> GTX 1080 @ $499
GP104-150 -> GTX 1070 @ $339

Whenever GP102 drops, I expect that this will be priced at $799 for the cut down version, and $999 for a full-blown Titan version. I expect these to be branded GeForce GTX 1090 and GeForce GTX Titan [whatever]. I doubt that dual GPU flagships will be in vogue anytime soon, so the X90 can be reserved for very high end single GPUs.

If I were in charge of branding these cards, I would absolutely NOT call GP104-150 a "1060 Ti" or even a "1060" because, on some level, customers will feel that last gen they got an X70 card for $339 and now this generation they get only an X60 card. Extremely bad for business/image.

NVIDIA must make it clear to customers that they are getting a better value today than they did before Pascal launched, and the branding scheme I outlined above would do that.
 