[bitsandchips]: Pascal to not have improved Async Compute over Maxwell

airfathaaaaa · May 18, 2016

2is said:
It won't stop him. It won't stop me, it won't stop anyone. Notice how the AMD praise is already starting to narrow. First it was AMD is better at DX12, now it's AMD is better at DX12 WITH Async Compute.

There's a reason why AOTS keeps being brought up... It's all the ADF has got, and it isn't much... So yeah, it isn't going to stop anyone from doing anything. Sorry to break the news, but AMD will have to do a lot more than continue to fall further behind to gain market share. Deluding their existing fanbase isn't going to get them there.

actually if you take out gow and rotr, in most cases when people compare reference to reference and not fury x with a highly clocked 980ti, amd is ahead

and i think you don't understand why we play the aots card... it's not because of the perf, it's about revealing what nvidia keeps saying
also newsflash http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/

sontin · May 18, 2016

Despoiler said:
https://www.youtube.com/watch?v=Bh7ECiXfMWQ&app=desktop

-The dynamic load balancing feature of Pascal is confirmed to be in the driver. 3m:25s

And the hardware can override the driver setting. I like how you ignore this part. :thumbsup:

Det0x · May 18, 2016

sontin said:
And the hardware can override the driver setting. I like how you ignore this part. :thumbsup:

I will copy some posts from another forum

What I noticed, is that a couple of reviewers got confused by the term "Async Compute", using it both for the DX11 extension for explicit preemption by high priority context, and the asynchronous queues in DX12. And mixing these together badly, stating that Pascal would now fully support Async Compute in DX12 because it can do preemption now, or that Maxwell could perform the context switch (the reassignment of SMMs) in DX12 at draw call borders.

I would say NVs marketing for fuzzing this term was a complete success.

Pre-emption is something else entirely, i.e. an explicit interrupt for high priority tasks.

Yes, preemption and async are different. Sebbi over at beyond3d explained this well. Nvidia didn't talk about async, only preemption, and it improved its granularity in Pascal, compiler level doable.

Async in AoTS means compute tasks. The feature is called async compute, but you can run it either on an "async-compute" architecture (a la GCN), which is capable of running compute and graphics tasks at the same time at CU level, or by dedicating an entire SM to either compute or graphics, like Nvidia does.

No async compute in sight.

You can read and learn more over at beyond3d here:

https://forum.beyond3d.com/threads/nvidia-pascal-reviews-1080-and-1070.57930/page-6

*edit*

The way I see it Pascal's dynamic load balancing is functionally equivalent to GCN's async shaders. At least I haven't seen anything to indicate otherwise.

Dynamic load balancing is a thing, yes, and it is a hardware feature. But it's nowhere near the same as, or even remotely comparable to, GCN's async execution via the independent command lists dispatched by the ACE units.

Dynamic load balancing is only for efficiently switching between compute and graphics workloads inside a single command list, and for eliminating the need for a full command buffer flush every time the partition scheme changes.

So you can essentially now:
Upload the next compute-only command list while the previous mixed command list is still in execution, as the SMMs may now switch the mode lazily after they have finished the graphics portion.
Vice versa also when switching back to graphics.
The penalty for a driver screwup when you mix compute and graphics inside a single command list is also eliminated.

Technically, that means there is no longer a scheduling problem just from having compute portions in there, and by that you avoid stalling the command processor.

What it doesn't provide yet is the resource sharing or the truly asynchronous scheduling that AMD's hardware features. So using asynchronous queues for compute is now (almost...) "for free", but it's still not gaining you anything.

And without triggering actual, explicit preemption, you are not gaining truly asynchronous, independent execution yet either. You are still subject to all side effects resulting from cooperative scheduling.

Nvidia didn't talk about async, only preemption.

But they are unfortunately still referring to their preemption extension for DX11 as "Async Compute" too. On purpose.
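
The difference between cooperative scheduling and finer-grained preemption described above can be sketched with a toy latency model. This is only an illustration under loud assumptions: the function name `start_delay`, the time units, and the idea that a running task can be interrupted only at fixed boundaries are all hypothetical and do not describe how any actual GPU or driver works.

```python
import math


def start_delay(arrival, running_length, granularity):
    """Time a high-priority task waits before it can start.

    Toy assumption: a long task began at t=0 and may only be interrupted at
    multiples of `granularity`; it ends at `running_length` regardless.
    """
    next_boundary = math.ceil(arrival / granularity) * granularity
    return min(next_boundary, running_length) - arrival


# Cooperative scheduling: the only boundary is the end of the running task.
cooperative = start_delay(arrival=1.25, running_length=10.0, granularity=10.0)
# Finer-grained preemption: a boundary every 0.5 time units.
preemptive = start_delay(arrival=1.25, running_length=10.0, granularity=0.5)
print(cooperative, preemptive)  # 8.75 0.25
```

In this model the urgent task waits for the whole running task under cooperative scheduling, while preemption only shortens that wait. Neither case makes the two workloads run concurrently, which is the separate thing the posts above call async.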

Bacon1 · May 18, 2016

airfathaaaaa said:
actually if you take out gow and rotr in most cases when people compare refrence to refrence and not fury x with highly clocked 980ti amd is ahead

and i think you dont understand why we play the aots card... its not because the perf its about to reveal what nvidia keep saying
also newsflash http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/

Actually in the latest GoW updates Fury X beats 980 TI.

All it took was a game patch that raised Fury performance 2x.

http://www.overclock3d.net/reviews/...erformance_retest_-_the_game_has_been_fixed/6

In Rise of the Tomb Raider, all cards saw negative scaling, and they didn't bother to use Async Compute even though they used it on the XB1 build for BTAO. Gotta say that was my biggest pre-order regret this year. They screwed the DX12 launch so hard.

Ashes of the Singularity is the only DX12 game that takes advantage of the new DX12 features:

* Async Compute
* Multi-Adapter for MGPU support (cross vendor)

They also worked closely with BOTH Nvidia and AMD, they even stated they worked more with Nvidia directly than AMD. Their main developer, Dan Baker, also helped create some of the core DX standards.

sontin · May 18, 2016

Det0x said:
What I noticed, is that a couple of reviewers got confused by the term "Async Compute", using it both for the DX11 extension for explicit preemption by high priority context, and the asynchronous queues in DX12. And mixing these together badly, stating that Pascal would now fully support Async Compute in DX12 because it can do preemption now, or that Maxwell could perform the context switch (the reassignment of SMMs) in DX12 at draw call borders.

I would say NVs marketing for fuzzing this term was a complete success.

Preemption and async are different. Sebbi over at beyond3d explained this well. Nvidia didn't talk about async, only preemption, and it improved its granularity in Pascal, compiler level doable.

Async in AoTS means compute tasks. The feature is called async compute, but you can run it either on an "async-compute" architecture (a la GCN), which is capable of running compute and graphics tasks at the same time at CU level, or by dedicating an entire SM to either compute or graphics, like Nvidia does.

No async compute in sight.

You can read and learn more over at beyond3d here:

https://forum.beyond3d.com/threads/nvidia-pascal-reviews-1080-and-1070.57930/page-6

Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.

Det0x · May 18, 2016

sontin said:
Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.

Read my updated post, or you can just follow the thread ive already linked.. its all in there.

Is that an assumption on your part or is there evidence to support that?

In the Pascal reviewers guide nVidia explicitly mentions overlapping PhysX kernels with graphics tasks. Are you saying that those DirectX and CUDA tasks are submitted to the GPU in the same command list?

Sorry, oversimplification on my side. I only referred to draw and dispatch calls going via the graphics command processor (which also handles the "compute" queues in DX12), and forgot about the independent command processor handling CUDA.

Yes, the same benefits also apply to grids dispatched from the independent HW queues used for CUDA, which makes perfect sense if the reallocation happens independent from either command queue.

But the point is: It looks as if Nvidia also managed to get rid of the stall on the GPC altogether, which previously caused a lot of problems. Several Maxwell performance guidelines, such as "avoid mixing compute and graphics" or "don't toggle between compute and graphics queues", are now void.

The old "CUDA has access to command queues which should be exposed as compute queues in DX12 rather than doing everything on the GPC" complaint appears to remain valid though. I've not seen any indicator that they've fixed this yet.

Isn't async compute simply the fact that a GPU can run compute shaders independently and asynchronously with graphics workloads?
If so, doing it inter-shader instead of intra-shader should be sufficient to meet that definition.

Yes, cooperative scheduling is perfectly sufficient to fulfill the specification. Maxwell did that already, or rather, you can do that on any hardware.

But the problem with Maxwell was that it would essentially flush the entire graphics pipeline, all SMMs and stall the command processors, in order to reconfigure the hardware for compute. That made the switch extremely expensive, as the GPU utilization suffers while the remaining draw calls complete, and the GPC isn't allowed to dispatch anything new.

The specs said nowhere that you had to gain anything from Async Compute, but that penalty should not have happened either.
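
A minimal sketch of why that switch penalty mattered, assuming a made-up cost model: every change between graphics and compute work in a command stream costs a fixed flush time. The function `timeline_cost`, the durations, and the penalty values are hypothetical and are not measurements of Maxwell or Pascal.

```python
def timeline_cost(commands, switch_penalty):
    """Total time for a serial command stream.

    commands: list of (kind, duration) with kind in {"gfx", "compute"}.
    switch_penalty: assumed fixed cost paid whenever the kind changes.
    """
    total = 0.0
    previous = None
    for kind, duration in commands:
        if previous is not None and kind != previous:
            total += switch_penalty  # flush and reconfigure before continuing
        total += duration
        previous = kind
    return total


interleaved = [("gfx", 2.0), ("compute", 1.0), ("gfx", 2.0), ("compute", 1.0)]
grouped = [("gfx", 2.0), ("gfx", 2.0), ("compute", 1.0), ("compute", 1.0)]

# With an expensive switch, interleaving costs more than grouping.
print(timeline_cost(interleaved, 0.5), timeline_cost(grouped, 0.5))  # 7.5 6.5
# With a free switch, the order no longer matters.
print(timeline_cost(interleaved, 0.0), timeline_cost(grouped, 0.0))  # 6.0 6.0
```

Under this model, guidelines like "avoid mixing compute and graphics" only make sense while the penalty is non-zero. Once it drops to zero, mixing is free, but nothing is gained from it either.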

airfathaaaaa · May 18, 2016

sontin said:
Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.

if it has hardware sc then how come it's still regressing?

sontin · May 18, 2016

Det0x said:
Read my updated post, or you can just follow the thread ive already linked.. its all in there.

And it is wrong. Listen to the youtube video:
The GPU gets segmented on the SM level through the driver. When the load balancer detects that there is no new workload coming towards those segments it can align those segments to the other workload.
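
The segment-and-reassign behavior described here can be sketched as a toy throughput model. It assumes perfectly divisible work measured in arbitrary "SM-time" units and a free, instant reassignment of idle SMs; the function names and numbers are hypothetical and only illustrate the idea, not Nvidia's actual implementation.

```python
def static_partition_time(graphics_work, compute_work, graphics_sms, compute_sms):
    """Each partition only runs its own workload; drained SMs sit idle."""
    return max(graphics_work / graphics_sms, compute_work / compute_sms)


def dynamic_balance_time(graphics_work, compute_work, graphics_sms, compute_sms):
    """When one partition drains, its SMs join the other workload."""
    t_graphics = graphics_work / graphics_sms
    t_compute = compute_work / compute_sms
    first_done = min(t_graphics, t_compute)
    total_sms = graphics_sms + compute_sms
    if t_graphics > t_compute:
        remaining = graphics_work - first_done * graphics_sms
    else:
        remaining = compute_work - first_done * compute_sms
    # The leftover work is finished by all SMs together.
    return first_done + remaining / total_sms


# 16 SMs assigned to graphics, 4 to compute; compute drains first.
print(static_partition_time(160, 20, 16, 4))  # 10.0
print(dynamic_balance_time(160, 20, 16, 4))   # 9.0
```

If the driver's initial split already matches the workload, both functions return the same time, so in this model the load balancer only helps when the static guess was off.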

Det0x · May 18, 2016

Similarly, nobody said that enabling async compute has to be faster than not enabling it: if a particular implementation is such that it can't find inefficiencies to exploit, then so be it.

It is well known that AMD hardware has suffered from underutilization for generations. Async helps them achieve better utilization. That doesn't mean NVIDIA should follow suit. There are other ways through which a certain architecture maximizes its throughput.

End result:

No async compute in sight, but at least there is no performance decline using AC, as there is with Maxwell

Erenhardt · May 18, 2016

Det0x said:
Seems like they were semi correct

http://www.bitsandchips.it/52-engli...scal-in-trouble-with-asyncronous-compute-code

Looks like nvidia didn't do their homework.

We should expect a lot of GameWorks titles in this generation. The only way they stay competitive is to deny this advantage from async compute to AMD.

3DVagabond · May 18, 2016

biostud said:
If Nvidia cards are faster (or the same) without AC than AMD cards are with AC, what is the problem?

Besides the obvious, that it's an unsupported feature that would improve performance, have you not noticed nVidia cards get slower over time vs. AMD? This is just more built-in obsolescence.

Games will either use this feature more to get better performance, or they'll hold it out because nVidia doesn't offer support. In the 1st instance nVidia owners will suffer. In the second we all suffer. Which do you think nVidia will put their weight behind?

3DVagabond · May 18, 2016

Arachnotronic said:
A single game that virtually nobody plays. Ashes of The Singularity has done its job though -- promote AMD and sow the seeds of doubt for the "DX12 performance" of NVIDIA based cards.

Hope the Oxide guys got compensated nicely for this because AMD seems to be drawing tremendous value from it.

If the game runs better on AMD then the game sucks. This is not new spin. The writing is on the wall and you don't want anyone to read it.

airfathaaaaa · May 18, 2016

Erenhardt said:
Looks like nvidia didn't do their homework.

We should expect a lot of GameWorks titles in this generation. The only way they stay competitive is to deny this advantage from async compute to AMD.

and then it's nvidia vs 7 bigger companies (well 6 since amd isn't that big nowadays)

2is · May 18, 2016

airfathaaaaa said:
actually if you take out gow and rotr, in most cases when people compare reference to reference and not fury x with a highly clocked 980ti, amd is ahead

and i think you don't understand why we play the aots card... it's not because of the perf, it's about revealing what nvidia keeps saying
also newsflash http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/

In most cases, you're wrong... And highly overclocked? You mean the OCs almost anyone can achieve with whatever air cooler came with their 980 Tis? Those overclocks? The ones that Fury X can't touch with its water-cooled GPU and HBM that was supposed to make all GDDR5 cards obsolete as soon as AMD graced us with the card? I hope those aren't the overclocks you're referring to, because that's downright embarrassing.

2is · May 18, 2016

3DVagabond said:
If the game runs better on AMD then the game sucks. This is not new spin. The writing is on the wall and you don't want anyone to read it.

He didn't say it sucks he said virtually nobody plays it. Do you play it? What's your Steam ID so I can check out how many hours you've logged into AOTS.

xthetenth · May 18, 2016

I too get easily confused and think that studios would finance the development of a new engine for one game.

airfathaaaaa · May 18, 2016

2is said:
In most cases, you're wrong... And highly overclocked? You mean the OCs almost anyone can achieve with whatever air cooler came with their 980 Tis? Those overclocks? The ones that Fury X can't touch with its water-cooled GPU and HBM that was supposed to make all GDDR5 cards obsolete as soon as AMD graced us with the card? I hope those aren't the overclocks you're referring to, because that's downright embarrassing.

and the fact remains they did bench it against oc cards, not reference

also yes, in most cases i am right, and as far as i know only in rotr amd doesn't win

AtenRa · May 18, 2016

We are not discussing if the game is good or bad or if it sold millions of copies.

Here we are talking about the Async Compute capabilities of the GPUs.

As of now, AoTS and Hitman do use Async Compute, and thus we are using those two games. I'm sure Warhammer will use AC, as will Deus Ex.
And knowing Dice, it is highly possible that BF1 will also be DX12 and will use Async Compute as well.

So, it is understandable that certain people are trying to divert the conversation away from the thread's title, but that doesn't change the fact that Pascal cannot do Async Compute like GCN does.
And the only way NV hardware will be faster in DX12 games is through its GameWorks initiative.

96Firebird · May 18, 2016

Yes... We just have to Wait and See™.

LTC8K6 · May 18, 2016

If Pascal runs DX12 well, no one is going to care if it has AC or not, and no one is going to care what NV calls whatever it does have.

Heck, if it "brute forces" its way to running DX12 well, no one will care, either. It may even appeal to a lot of people.

Krteq · May 18, 2016

sontin said:
Nobody got confused except this guy. No API describes how the hardware has to schedule work...

Well, it's described by Microsoft on this MSDN page - Synchronization and Multi-Engine. The "Multi-engine" part is what you are looking for.

Lepton87 · May 18, 2016

Can we resolve the performance hit by dedicating another card to compute? Like we can dedicate a whole card to PhysX? If so, I'll just keep the Titan in my computer along with the 980 Ti. I had it for PhysX, but there were so few titles that I was going to give up on that idea. If I can do that for DX12, then that would be great.
P.S. Is anyone else annoyed by the "if NV doesn't support something it might just as well not exist" crowd? This is a terrible approach that stalls progress.

sontin · May 18, 2016

Lepton87 said:
Can we resolve the performance hit by dedicating another card to compute? Like we can dedicate a whole card to PhysX? If so, I'll just keep the Titan in my computer along with the 980 Ti. I had it for PhysX, but there were so few titles that I was going to give up on that idea. If I can do that for DX12, then that would be great.

Yes, it is possible. nVidia is offloading the compute part to the second GPU.

Despoiler · May 18, 2016

sontin said:
And the hardware can override the driver setting. I like how you ignore this part. :thumbsup:

No, what he said is that "you", i.e. a dev, can specify static mode "if you really want to". Dynamic mode is the new mode, and saying "if you really want to" makes it sound like dynamic mode is the default behavior. Dynamic mode is done in software in the driver.

wilds · May 18, 2016

So how slow a card can do the offloading? We have seen low-end cards like the GTX 650/950 or 750 Ti being dedicated purely to PhysX.

If these companion cards work for Async as well, then that could be good news for users with dedicated PhysX cards.
