GP100 and GP104 are different architectures

Mahigan · May 19, 2016

Pascal has the same performance in lower resolutions as higher resolutions under AotS.

Each AotS run is different. Note that Pascal shows the same performance at 1080p, slightly higher at 1440p, and a loss at 4K. That is just the slight run-to-run variation of the benchmark.

Pascal neither gains nor loses performance with Async compute + graphics turned on.

Fences synchronize work from various queues. They're needed for hardware which executes work from various contexts in a serial fashion. They're not required for GCN because the ACEs are capable of handling synchronization on their own, meaning that the ACEs can wait for work from a graphics task without needing a fence in place. GCN's hardware scheduler has this added flexibility. Case in point, GCN doesn't make use of fences under Vulkan.
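To make the fence idea concrete, here is a toy CPU-side analogy in Python. It is only a sketch: `threading.Event` stands in for a GPU fence, the two threads stand in for a graphics queue and a compute queue, and the task names are made up. It says nothing about how any particular GPU or driver implements this.

```python
import threading

def run_with_fence():
    """Toy analogy: a compute 'queue' waits on a fence signalled by a graphics 'queue'."""
    log = []
    fence = threading.Event()  # stand-in for a GPU fence (assumption, not a real GPU API)

    def graphics_queue():
        log.append("graphics: render shadow map")
        fence.set()  # signal that the result the compute work depends on is ready

    def compute_queue():
        fence.wait()  # block until the graphics work has signalled the fence
        log.append("compute: blur shadow map")

    compute = threading.Thread(target=compute_queue)
    graphics = threading.Thread(target=graphics_queue)
    compute.start()   # started first on purpose; the fence still enforces the order
    graphics.start()
    compute.join()
    graphics.join()
    return log

print(run_with_fence())
```

Even though the compute thread is started first, the graphics entry always lands in the log first, which is the ordering guarantee a fence provides.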

I made a mistake in that first paragraph you quoted. Replace preemption with context switch. If you execute two different contexts one after the other in serial, a context switch is involved. Switching between contexts too often is one of NVIDIA's don'ts in their DX12 do's and don'ts.

Lepton87 · May 19, 2016

NTMBK said:
Oh god, I didn't mean for this to turn into yet another Async Compute argument D: I just found it interesting that NVidia has made such a different design for their gaming card vs. their compute card.

But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.

4K_shmoorK · May 19, 2016

NTMBK said:
Oh god, I didn't mean for this to turn into yet another Async Compute argument D: I just found it interesting that NVidia has made such a different design for their gaming card vs. their compute card.

What has been done cannot be undone.

antihelten · May 19, 2016

Lepton87 said:
But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.

GK210 and GM200 aren't really the proper comparison here. The proper comparison would be GK104 to GK110/GK210, since Maxwell never had a FP64 focused GPU.

GK104 and GK110/GK210 both had 192 CUDA cores per SM.

GF104/114 to GF100/110 is a much more apt comparison to the GP104/GP100 situation, with GF104/114 having 48 shaders per SM and GF100/110 having 32.

So this is neither the norm, nor is it unusual as such.

NTMBK · May 19, 2016

Lepton87 said:
But this was the norm. The most recent example is GK210 for Tesla and GM200 for games.

GF114 was quite different from GF110 and so on. There's nothing unusual going on.

I didn't pay as much attention to architectures back in the Fermi days, so I hadn't realised GF104 was so different from GF100. It's a very apt comparison.

Mahigan · May 20, 2016

See, I was right...

http://wccftech.com/nvidia-gtx-1080-async-compute-detailed/

Mahigan · May 20, 2016

Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

What Pascal does is exactly what I stated above. End of story.

Mahigan · May 20, 2016

Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.
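Taking the figures quoted above at face value (they are the poster's numbers, not measurements verified here), the implied relative shift between the two cards can be worked out with a couple of lines of Python. The function name and the fraction-based inputs are just illustrative choices.

```python
def relative_shift(lead_before, lead_after):
    """How much the slower card gains on the faster one between two settings.

    Leads are fractions: 0.30 means 'the faster card is 30% ahead'.
    """
    return (1 + lead_before) / (1 + lead_after) - 1

# 30% lead under DX11, 9-10% lead with Async Compute + Graphics (figures from the post)
for lead_dx12 in (0.09, 0.10):
    print(f"{relative_shift(0.30, lead_dx12):.1%}")
# prints 19.3% then 18.2%
```

In other words, if those leads are accurate, the Fury X picks up roughly 18-19% relative to the GTX 1080 when moving from the DX11 path to the DX12 async path.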

So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic; wait and see, though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.

sontin · May 20, 2016

Mahigan said:
Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

And you are still making this up. Can you not stop it?
BTW: Instruction level preemption only works with CUDA, and pixel level preemption isn't supported by DX12 yet. :\

Mahigan said:
The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.

The kicker is that a GTX1080 with DX11 is faster than a Fury X with DX12 and Async Compute. Guess this makes AotS another DX11 game with an attached DX12 path. :thumbsdown:

Mahigan said:
So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic, wait and see though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.

So, let me rephrase this:
Pascal is a let down because it doesn't meet your fanfiction? :sneaky:
Oh, and I like how hardware changes will improve software problems. Doesn't make sense, huh?

Mahigan · May 20, 2016

What??? You're grasping at straws at this point mate.

Pixel level preemption is finer grained preemption. The terms are interchangeable. It simply means instruction level preemption.

Preemption requires a context switch, which Pascal can do quickly, like GCN. This is what I stated before.

A GTX 1080 is expected to be quicker than a FuryX. The let down is that under DX12 loads, which make use of Async compute + graphics, the performance difference is only 9-10% between a GTX 1080 and a FuryX. Seeing as we're pretty much in the DX12 era now, this difference is minimal. Vega will certainly be more than 9-10% faster than Fiji.

Nothing I said is fanfiction. I'm stating pretty obvious truths. What is fanfiction is your grasping at straws and unwillingness to accept truths.

Bacon1 · May 20, 2016

sontin said:
The kicker is that a GTX1080 with DX11 is faster than a Fury X with DX12 and Async Compute. Guess this makes AotS another DX11 game with an attached DX12 path. :thumbsdown:

This has got to be one of the funniest things I've ever read in my life. A newer, more expensive card is faster than an older card and you are surprised at this...

AnandThenMan · May 20, 2016

In DX12 the 1080 is marginally faster than Fury X, and DX12 titles are still in their infancy. This doesn't look good for Nvidia going forward, seeing as Polaris will no doubt take async performance up a few notches.

sirmo · May 20, 2016

Mahigan said:
Pascal is still executing the code in serial but the gap between executing the graphic and compute context is shorter, less latency, due to improved context switching over Maxwell (switching at the instruction level rather than thread block level).

The kicker is that while the GTX 1080 is 30% faster than the FuryX under DX11, when Async Compute + Graphics is used the difference narrows to 9-10%, as I stated above.

So, as I stated above, Vega will bury the GTX 1080 in newer DX12 titles. I'm not even talking about big Vega, just small Vega. Under DX11, the determining factor will be AMD's GCN architectural changes and mostly their improvements to single threaded performance. AMD states that their changes were dramatic, wait and see though.

So is the GTX 1080 a let down? Yes. Massively imo. I was expecting more.

Spot on. In my opinion this is the reason Nvidia went all out and rushed the 1080 launch, and emphasized high clocks (flashback to the megahertz wars of the 90s and early 2000s): they know they will be on the back foot once Polaris and Vega roll out and DX12 is commonplace.

Cookie Monster · May 22, 2016

Mahigan said:
Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

What Pascal does is exactly what I stated above. End of story.

Sorry for the OT, but I find this rather hilarious.

Is that why you ~~don't~~ post over at Beyond3D? You seem to like participating in actual technical discussions, so why not?

Plus it's a discussion. Over there they do that instead of preaching their thoughts in a condescending manner as 100% fact elsewhere.

Mopetar · May 22, 2016

AnandThenMan said:
In DX12 the 1080 is marginally faster than Fury X, and DX12 titles are still in their infancy. This doesn't look good for Nvidia going forward seeing Polaris will no doubt take async performance up a few notches.

By the time DX12 really becomes mainstream Nvidia will have Volta out and it's hard to imagine that won't be a big architectural shift. Pascal is likely just a refinement of their existing architecture as it would be too much of a risk to bring a new architecture onto a new node at the same time as there's too much that can go wrong.

NVidia will be fine going forward, but I think that Pascal will not be an architecture that is going to age particularly well. At least they managed to get really good clock speeds out of it.

airfathaaaaa · May 22, 2016

Mopetar said:
By the time DX12 really becomes mainstream Nvidia will have Volta out and it's hard to imagine that won't be a big architectural shift. Pascal is likely just a refinement of their existing architecture as it would be too much of a risk to bring a new architecture onto a new node at the same time as there's too much that can go wrong.

NVidia will be fine going forward, but I think that Pascal will not be an architecture that is going to age particularly well. At least they managed to get really good clock speeds out of it.

You really think DX12 is gonna take 2 more years to become mainstream? Already most of the big titles of 2016 are on DX12 or Vulkan.

Tuna-Fish · May 22, 2016

Mahigan said:
Async Compute + graphics is a subject I've studied closely. I know EXACTLY how it works. Folks at Beyond3D and elsewhere don't seem to have a full grasp of the feature.

No offence, but do you write shader code? Do you do this for a living?

Mahigan said:
Pixel level preemption is finer grained preemption. The terms are interchangeable. It simply means instruction level preemption.

No, it doesn't. In fragment shaders, each pixel spawns a thread. Pixel level preemption means that all the threads must complete before workload can be switched. Instruction level preemption means you can switch at any moment.

nVidia cards, up to and including GP104, cannot have simultaneous graphics and compute tasks on the same GPC. To switch between the two types, they need to clear out every current task (this is thread-level, or pixel level if you will). GP104 makes this better by having more GPCs, and by reducing the overhead of switching, but the granularity doesn't actually change. Within the same task type they have fine grained preemption.
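The granularity distinction can be illustrated with a deliberately simplified Python model. The assumptions are loud ones: one instruction costs one cycle, the remaining-instruction counts are made up, and the cost of the switch itself is ignored. It only shows why draining in-flight threads is bounded by the slowest thread, while switching at an instruction boundary is not.

```python
def switch_delay(remaining_instructions, granularity):
    """Cycles before a context switch can begin (toy model, 1 instruction = 1 cycle)."""
    if granularity == "instruction":
        # Threads can be paused at the next instruction boundary.
        return 1 if remaining_instructions else 0
    if granularity == "thread":
        # Every in-flight thread must run to completion first (drain).
        return max(remaining_instructions, default=0)
    raise ValueError(f"unknown granularity: {granularity}")

in_flight = [12, 40, 7, 95]  # hypothetical remaining instructions per in-flight thread
print(switch_delay(in_flight, "thread"))       # 95
print(switch_delay(in_flight, "instruction"))  # 1
```

In this model the thread-level delay scales with the longest-running shader, which is why long compute or pixel work makes coarse-grained switching expensive.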
