GP100 and GP104 are different architectures


Mahigan

Senior member
Aug 22, 2015
573
0
0
Pascal has the same performance in lower resolutions as higher resolutions under AotS.

Each AotS run is different. See Pascal has the same performance at 1080p, slightly higher at 1440p, and a loss at 4K. This is just due to the slight differences in every run of the benchmark.

Pascal neither gains nor loses performance with Async compute + graphics turned on.

Fences synchronize work from various queues. They're needed for hardware which executes work from various contexts in a serial fashion. They're not required for GCN because the ACEs are capable of handling synchronization on their own, meaning that the ACEs can wait for work from a graphics task without needing a fence in place. GCN's hardware scheduler has this added flexibility. Case in point, GCN doesn't make use of fences under Vulkan.
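
To make the fence idea concrete, here is a minimal toy model in Python. It is not graphics API code: the `Fence` class, the two "queue" functions, and the log strings are all hypothetical names made up for illustration, and ordinary CPU threads stand in for the GPU queues. It only shows the signal/wait pattern, where work on one queue is held back until another queue signals a fence value.

```python
import threading


class Fence:
    """Toy stand-in for a GPU fence: a counter that only moves forward."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        # Raise the fence value and wake anything waiting on it.
        with self._cond:
            self._value = max(self._value, value)
            self._cond.notify_all()

    def wait(self, value):
        # Block until the fence has reached at least `value`.
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)


def run_frame():
    log = []
    fence = Fence()

    def graphics_queue():
        log.append("graphics: render shadow map")
        fence.signal(1)  # tell other queues the shadow map is ready

    def compute_queue():
        fence.wait(1)  # hold back until graphics has signalled value 1
        log.append("compute: filter shadow map")

    compute = threading.Thread(target=compute_queue)
    graphics = threading.Thread(target=graphics_queue)
    compute.start()  # started first on purpose; the fence still orders the work
    graphics.start()
    compute.join()
    graphics.join()
    return log


print(run_frame())
```

Even though the compute thread starts first, the fence forces the graphics entry into the log ahead of the compute entry.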

I made a mistake in that first paragraph you quoted. Replace preemption with context switch. If you execute two different contexts one after the other in serial, a context switch is involved. Switching between contexts too often is one of the don'ts in NVIDIA's DX12 do's and don'ts.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Oh god, I didn't mean for this to turn into yet another Async Compute argument D: I just found it interesting that NVidia has made such a different design for their gaming card vs. their compute card.

But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.
 

4K_shmoorK

Senior member
Jul 1, 2015
464
43
91
Oh god, I didn't mean for this to turn into yet another Async Compute argument D: I just found it interesting that NVidia has made such a different design for their gaming card vs. their compute card.

What has been done cannot be undone.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.

GK210 and GM200 aren't really the proper comparison here. The proper comparison would be GK104 to GK110/GK210, since Maxwell never had a FP64 focused GPU.

GK104 and GK110/GK210 both had 192 CUDA cores per SM.

GF104/114 to GF100/110 is a much more apt comparison to the GP104/GP100 situation, with GF104/114 having 48 shaders per SM and GF100/110 having 32.

So this is neither the norm, nor is it unusual as such.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.

I didn't pay as much attention to architectures back in the Fermi days, so I hadn't realised GF104 was so different from GF100. It's a very apt comparison.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

What Pascal does is exactly what I stated above. End of story.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.
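
As a quick sanity check on what those percentages imply, here is a small Python sketch. The function name `relative_uplift` is made up for illustration, and the only inputs are the 30% and 9-10% leads quoted above; nothing here is measured data.

```python
def relative_uplift(dx11_lead, dx12_lead):
    """How much more the trailing card gained from the DX12 path than the leading card.

    Leads are fractions, e.g. 0.30 means the leading card is 30% faster.
    """
    return (1 + dx11_lead) / (1 + dx12_lead) - 1


for dx12_lead in (0.09, 0.10):
    uplift = relative_uplift(0.30, dx12_lead)
    print(f"DX12 lead {dx12_lead:.0%}: relative uplift {uplift:.1%}")
```

Going from a 30% lead to a 9-10% lead works out to the Fury X gaining roughly 18-19% relative to the GTX 1080. If the 1080 really neither gains nor loses with the feature on, that whole amount would be the Fury X's own gain from the DX12 + Async path.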

So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic, wait and see though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

And you are still making this up. Can you not stop it?
BTW: Instruction level preemption only works with CUDA, and pixel level preemption isn't supported by DX12 yet. :\

The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.
The kicker is that a GTX1080 with DX11 is faster than a Fury X with DX12 and Async Compute. Guess this makes AotS another DX11 game with an attached DX12 path. :thumbsdown:

So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic, wait and see though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.
So, let me rephrase this:
Pascal is a let down because it doesn't meet your fanfiction? :sneaky:
Oh and I like how hardware changes will improve software problems. Doesn't make sense, huh?
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
What??? You're grasping at straws at this point mate.

Pixel level preemption is finer grained preemption. The terms are interchangeable. It simply means instruction level preemption.

Preemption requires a context switch, which Pascal can do quickly, like GCN. This is what I stated before.

A GTX 1080 is expected to be quicker than a FuryX. The let down is that under DX12 loads, which make use of Async compute + graphics, the performance difference is only 9-10% between a GTX 1080 and a FuryX. Seeing as we're pretty much in the DX12 era now, this difference is minimal. Vega will certainly be more than 9-10% faster than Fiji.

Nothing I said is fanfiction. I'm stating pretty obvious truths. What is fanfiction is your grasping of straws and unwillingness to accept truths.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
The kicker is that a GTX1080 with DX11 is faster than a Fury X with DX12 and Async Compute. Guess this makes AotS another DX11 game with an attached DX12 path. :thumbsdown:

This has got to be one of the funniest things I've ever read in my life. A newer, more expensive card is faster than an older card and you are surprised at this...
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,949
504
126
In DX12 the 1080 is marginally faster than Fury X, and DX12 titles are still in their infancy. This doesn't look good for Nvidia going forward seeing Polaris will no doubt take async performance up a few notches.
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.

So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic, wait and see though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.
Spot on. In my opinion this is the reason Nvidia went all out and rushed the 1080 launch, and emphasized high clocks (flashback to the megahertz wars from the 90s and early 2000s): they know they will be on the back foot once Polaris and Vega roll out and DX12 is commonplace.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

What Pascal does is exactly what I stated above. End of story.

Sorry for the OT but I find this rather hilarious.

Is that why you don't post over at Beyond3D? You seem to like participating in actual technical discussions, so why not?

Plus, it's a discussion. Over there they do that instead of preaching their thoughts in a condescending manner as 100% fact elsewhere.
 

Mopetar

Diamond Member
Jan 31, 2011
8,024
6,479
136
In DX12 the 1080 is marginally faster than Fury X, and DX12 titles are still in their infancy. This doesn't look good for Nvidia going forward seeing Polaris will no doubt take async performance up a few notches.

By the time DX12 really becomes mainstream Nvidia will have Volta out and it's hard to imagine that won't be a big architectural shift. Pascal is likely just a refinement of their existing architecture as it would be too much of a risk to bring a new architecture onto a new node at the same time as there's too much that can go wrong.

NVidia will be fine going forward, but I think that Pascal will not be an architecture that is going to age particularly well. At least they managed to get really good clock speeds out of it.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
By the time DX12 really becomes mainstream Nvidia will have Volta out and it's hard to imagine that won't be a big architectural shift. Pascal is likely just a refinement of their existing architecture as it would be too much of a risk to bring a new architecture onto a new node at the same time as there's too much that can go wrong.

NVidia will be fine going forward, but I think that Pascal will not be an architecture that is going to age particularly well. At least they managed to get really good clock speeds out of it.
You really think DX12 is gonna take 2 more years to become mainstream? Already most of the big titles of 2016 are on DX12 or Vulkan.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,422
1,759
136
Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

No offence, but do you write shader code? Do you do this for a living?

Pixel level preemption is finer grained preemption. The terms are interchangeable. It simply means instruction level preemption.

No, it doesn't. In fragment shaders, each pixel spawns a thread. Pixel level preemption means that all the threads must complete before workload can be switched. Instruction level preemption means you can switch at any moment.

nVidia cards, up to and including GP104, can not have simultaneous graphics and compute tasks on the same GPC. To switch between the two types, they need to clear out every current task (this is thread-level, or pixel level if you will). GP104 makes this better by having more GPCs, and by reducing the overhead of switching, but the granularity doesn't actually change. Within the same task type they have fine grained preemption.
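
The difference between the two granularities can be sketched with a toy latency model in Python. Everything here is an illustrative assumption rather than a description of real hardware: the function name `switch_latency`, the one-instruction-per-cycle cost, and the sample workload are all made up, and state save/restore cost is ignored.

```python
def switch_latency(remaining_instructions, granularity):
    """Cycles to wait before a unit can switch to other work (toy model).

    remaining_instructions: instructions left for each in-flight thread.
    granularity: "thread" drains every in-flight thread first;
                 "instruction" only lets the current instruction retire.
    Assumes one instruction per cycle and no state save/restore cost.
    """
    if not remaining_instructions:
        return 0  # nothing in flight, so the switch is immediate
    if granularity == "thread":
        return max(remaining_instructions)  # wait for the slowest thread
    if granularity == "instruction":
        return 1  # wait for the current instruction only
    raise ValueError(f"unknown granularity: {granularity!r}")


in_flight = [120, 45, 300, 8]  # hypothetical pixel threads mid-shader
print(switch_latency(in_flight, "thread"))       # 300
print(switch_latency(in_flight, "instruction"))  # 1
```

With thread (pixel) level granularity the wait is set by the longest-running thread still in flight, while with instruction level granularity it does not depend on how much work each thread has left.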
 