[bitsandchips]: Pascal to not have improved Async Compute over Maxwell


airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
It won't stop him. It won't stop me, it won't stop anyone. Notice how the AMD praise is already starting to narrow. First it was AMD is better at DX12, now it's AMD is better at DX12 WITH Async Compute.

There's a reason why AOTS keeps being brought up... It's all the ADF has got, and it isn't much... So yeah, it isn't going to stop anyone from doing anything. Sorry to break the news, but AMD will have to do a lot more than continue to fall further behind to gain market share. Deluding their existing fanbase isn't going to get them there.

Actually, if you take out GoW and ROTR, in most cases when people compare reference to reference, and not a Fury X against a highly clocked 980 Ti, AMD is ahead.

And I think you don't understand why we play the AOTS card... it's not because of the perf, it's about revealing what Nvidia keeps saying.
Also, newsflash: http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/
 

Det0x

Golden Member
Sep 11, 2014
1,061
3,105
136
And the hardware can override the driver setting. I like how you ignore this part. :thumbsup:

I will copy some posts from another forum

What I noticed, is that a couple of reviewers got confused by the term "Async Compute", using it both for the DX11 extension for explicit preemption by high priority context, and the asynchronous queues in DX12. And mixing these together badly, stating that Pascal would now fully support Async Compute in DX12 because it can do preemption now, or that Maxwell could perform the context switch (the reassignment of SMMs) in DX12 at draw call borders.

I would say NV's marketing for fuzzing this term was a complete success.

Pre-emption is something else entirely, i.e. an explicit interrupt for high priority tasks.

Yes, preemption and async are different. Sebbi over at beyond3d explained this very well. Nvidia didn't talk about async, only preemption, and it improved the preemption granularity in Pascal, doable at the compiler level.

The async work in AoTS consists of compute tasks. The feature is called async compute, but it can run either on an "async-compute" architecture (à la GCN), capable of running compute and graphics tasks at the same time at the CU level, or by dedicating an entire SM to either compute or graphics, like Nvidia does.

No async compute in sight.

You can read and learn more over at beyond3d here:

https://forum.beyond3d.com/threads/nvidia-pascal-reviews-1080-and-1070.57930/page-6

*edit*

The way I see it Pascal's dynamic load balancing is functionally equivalent to GCN's async shaders. At least I haven't seen anything to indicate otherwise.

Dynamic load balancing is a thing, yes, and it is a hardware feature. But it's nowhere near the same as, or even remotely comparable to, GCN's async execution via the independent command lists dispatched by the ACE units.

Dynamic load balancing is only for efficiently switching between compute and graphics workloads inside a single command list, and for eliminating the need for a full command buffer flush every time the partition scheme changes.

So you can essentially now:
Upload the next compute-only command list while the previous mixed command list is still in execution, as the SMMs may now switch mode lazily after they have finished the graphics portion.
Vice versa also when switching back to graphics.
The penalty for a driver screwup when you mix compute and graphics inside a single command list is also eliminated.

Technically, that means there is no longer a scheduling problem just from having compute portions in there, and by that you avoid stalling the command processor.

What it doesn't provide yet is the resource sharing or the truly asynchronous scheduling that AMD's hardware features. So using asynchronous queues for compute is now (almost...) "for free", but it's still not gaining you anything.

And without triggering actual, explicit preemption, you are not gaining truly asynchronous, independent execution yet either. You are still subject to all side effects resulting from cooperative scheduling.

Nvidia didn't talk about async, only preemption.

But they are unfortunately still referring to their preemption extension for DX11 as "Async Compute" too. On purpose.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Actually, if you take out GoW and ROTR, in most cases when people compare reference to reference, and not a Fury X against a highly clocked 980 Ti, AMD is ahead.

And I think you don't understand why we play the AOTS card... it's not because of the perf, it's about revealing what Nvidia keeps saying.
Also, newsflash: http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/

Actually in the latest GoW updates Fury X beats 980 TI.

All it took was a game patch that raised Fury performance 2x.

http://www.overclock3d.net/reviews/...erformance_retest_-_the_game_has_been_fixed/6

In Rise of the Tomb Raider, all cards saw negative scaling, and they didn't bother to use Async Compute even though they used it on the XB1 build for BTAO. Gotta say that was my biggest pre-order regret this year. They screwed the DX12 launch so hard.

Ashes of the Singularity is the only DX12 game that takes advantage of the new DX12 features:

* Async Compute
* Multi-Adapter for MGPU support (cross vendor)

They also worked closely with BOTH Nvidia and AMD, they even stated they worked more with Nvidia directly than AMD. Their main developer, Dan Baker, also helped create some of the core DX standards.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
What I noticed, is that a couple of reviewers got confused by the term "Async Compute", using it both for the DX11 extension for explicit preemption by high priority context, and the asynchronous queues in DX12. And mixing these together badly, stating that Pascal would now fully support Async Compute in DX12 because it can do preemption now, or that Maxwell could perform the context switch (the reassignment of SMMs) in DX12 at draw call borders.

I would say NV's marketing for fuzzing this term was a complete success.

Preemption and async are different. Sebbi over at beyond3d explained this very well. Nvidia didn't talk about async, only preemption, and it improved the preemption granularity in Pascal, doable at the compiler level.

The async work in AoTS consists of compute tasks. The feature is called async compute, but it can run either on an "async-compute" architecture (à la GCN), capable of running compute and graphics tasks at the same time at the CU level, or by dedicating an entire SM to either compute or graphics, like Nvidia does.

No async compute in sight.

You can read and learn more over at beyond3d here:

https://forum.beyond3d.com/threads/nvidia-pascal-reviews-1080-and-1070.57930/page-6

Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.
 

Det0x

Golden Member
Sep 11, 2014
1,061
3,105
136
Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.

Read my updated post, or you can just follow the thread I've already linked... it's all in there.

Is that an assumption on your part or is there evidence to support that?

In the Pascal reviewers guide nVidia explicitly mentions overlapping PhysX kernels with graphics tasks. Are you saying that those DirectX and CUDA tasks are submitted to the GPU in the same command list?
Sorry, oversimplification on my side. I only referred to draw and dispatch calls going via the graphics command processor (which also handles the "compute" queues in DX12), and forgot the independent command processor handling CUDA.

Yes, the same benefits also apply to grids dispatched from the independent HW queues used for CUDA, which makes perfect sense if the reallocation happens independent from either command queue.

But the point is: it looks as if Nvidia also managed to get rid of the stall on the GPC altogether, which previously caused a lot of problems. Several Maxwell performance guidelines, such as "avoid mixing compute and graphics" or "don't toggle between compute and graphics queues", are now void.

The old "CUDA has access to command queues which should be exposed as compute queues in DX12 rather than doing everything on the GPC" complaint appears to remain valid though. I've not seen any indicator that they've fixed this yet.

Isn't async compute simply the fact that a GPU can run compute shaders independently and asynchronously with graphics workloads?
If so, doing it inter-shader instead of intra-shader should be sufficient to meet that definition.
Yes, cooperative scheduling is perfectly sufficient to fulfill the specification. Maxwell did that already, and in fact you can do that on any hardware.

But the problem with Maxwell was that it would essentially flush the entire graphics pipeline, all SMMs and stall the command processors, in order to reconfigure the hardware for compute. That made the switch extremely expensive, as the GPU utilization suffers while the remaining draw calls complete, and the GPC isn't allowed to dispatch anything new.

The specs said nowhere that you had to gain anything from Async Compute, but that penalty should not have happened either.
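
To put rough numbers on that switch penalty, here is a toy model in Python. It is only a sketch: `frame_time`, the work durations and the `switch_cost` value are made up for illustration and are not taken from any Nvidia or Microsoft documentation. It just charges a fixed stall every time the work changes between graphics and compute.

```python
def frame_time(work_items, switch_cost):
    """Total time for a sequence of (kind, duration) items.

    Every change of kind between consecutive items adds `switch_cost`,
    standing in for the drain-and-reconfigure stall described above.
    """
    total = sum(duration for _, duration in work_items)
    switches = sum(
        1 for prev, cur in zip(work_items, work_items[1:]) if prev[0] != cur[0]
    )
    return total + switches * switch_cost


# Hypothetical frame: the same work, interleaved vs. grouped by kind.
interleaved = [("graphics", 4.0), ("compute", 1.0), ("graphics", 4.0), ("compute", 1.0)]
grouped = [("graphics", 8.0), ("compute", 2.0)]

print(frame_time(interleaved, switch_cost=0.5))  # 11.5: three switches
print(frame_time(grouped, switch_cost=0.5))      # 10.5: one switch
print(frame_time(interleaved, switch_cost=0.0))  # 10.0: switching is free
```

With a non-zero switch cost, grouping the work is cheaper than interleaving it, which is the reasoning behind guidelines like "avoid mixing compute and graphics". With the cost at zero, the order stops mattering, but nothing is gained either.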
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
Nobody got confused except this guy. No API describes how the hardware has to schedule work.

Pascal supports "load balancing" on the hardware to override the driver settings when the hardware detects that no new workload comes to the segmented parts of the GPU. This is Async Compute.
If it has hardware scheduling, then how come it's still regressing?
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Read my updated post, or you can just follow the thread I've already linked... it's all in there.

And it is wrong. Listen to the YouTube video:
The GPU gets segmented on the SM level through the driver. When the load balancer detects that there is no new workload coming towards those segments it can align those segments to the other workload.
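
As a back-of-the-envelope illustration of that claim, here is a toy model in Python. It is an idealized sketch, not a description of the real hardware: the function names, SM counts and work amounts are all hypothetical, work is measured in arbitrary "SM-time" units, and reassignment is assumed to be instant and free.

```python
def static_partition_time(graphics_work, compute_work, graphics_sms, compute_sms):
    """Both partitions run in parallel; idle SMs stay idle until the end."""
    return max(graphics_work / graphics_sms, compute_work / compute_sms)


def dynamic_balance_time(graphics_work, compute_work, graphics_sms, compute_sms):
    """When one partition runs dry, its SMs join the other workload."""
    t_graphics = graphics_work / graphics_sms
    t_compute = compute_work / compute_sms
    first_done = min(t_graphics, t_compute)
    if t_graphics > t_compute:
        remaining = graphics_work - first_done * graphics_sms
    else:
        remaining = compute_work - first_done * compute_sms
    # After the first partition finishes, every SM works on what is left.
    return first_done + remaining / (graphics_sms + compute_sms)


# Hypothetical 20-SM GPU split 10/10, with far more graphics than compute work.
print(static_partition_time(120, 20, 10, 10))  # 12.0
print(dynamic_balance_time(120, 20, 10, 10))   # 7.0
```

In this model the static split leaves the compute SMs idle for most of the frame, while rebalancing gets down to the ideal (120 + 20) / 20 = 7.0. Whether real hardware does this within or across command lists is exactly what is being argued in this thread, and the model says nothing about that.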
 

Det0x

Golden Member
Sep 11, 2014
1,061
3,105
136
Similarly, nobody said that enabling async compute has to be faster than not enabling it: if a particular implementation is such that it can't find inefficiencies to exploit, then so be it.
It is well known that AMD hardware has suffered from underutilization for generations. Async helps them achieve better utilization. That doesn't mean NVIDIA should follow suit. There are other ways through which a given architecture can maximize its throughput.

End result:

No async compute in sight, but at least there is no performance decline using AC, as there is with Maxwell
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
If Nvidia cards are faster (or the same) without AC than AMD cards are with AC, what is the problem?

Besides the obvious: it's an unsupported feature that would improve performance. Have you not noticed nVidia cards get slower over time vs. AMD? This is just more built-in obsolescence.

Games will either use this feature more to get better performance, or they'll hold it out because nVidia doesn't offer support. In the 1st instance nVidia owners will suffer. In the second we all suffer. Which do you think nVidia will put their weight behind?
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
A single game that virtually nobody plays. Ashes of The Singularity has done its job though -- promote AMD and sow the seeds of doubt for the "DX12 performance" of NVIDIA based cards.

Hope the Oxide guys got compensated nicely for this because AMD seems to be drawing tremendous value from it.

If the game runs better on AMD then the game sucks. This is not new spin. The writing is on the wall and you don't want anyone to read it.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
Looks like nvidia didn't do their homework.

We should expect a lot of GameWorks titles in this generation. The only way they stay competitive is to deny AMD this advantage from async compute.
And then it's Nvidia vs 7 bigger companies (well, 6, since AMD isn't that big nowadays).
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Actually, if you take out GoW and ROTR, in most cases when people compare reference to reference, and not a Fury X against a highly clocked 980 Ti, AMD is ahead.

And I think you don't understand why we play the AOTS card... it's not because of the perf, it's about revealing what Nvidia keeps saying.
Also, newsflash: http://www.eteknix.com/amd-makes-gains-discrete-graphics-market/

In most cases, you're wrong... And highly overclocked? You mean the OCs almost anyone can achieve with whatever air cooler came with their 980 Ti? Those overclocks? The ones that Fury X can't touch with its water-cooled GPU and HBM that was supposed to make all GDDR5 cards obsolete as soon as AMD graced us with the card? I hope those aren't the overclocks you're referring to, because that's downright embarrassing.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
If the game runs better on AMD then the game sucks. This is not new spin. The writing is on the wall and you don't want anyone to read it.

He didn't say it sucks he said virtually nobody plays it. Do you play it? What's your Steam ID so I can check out how many hours you've logged into AOTS.
 

xthetenth

Golden Member
Oct 14, 2014
1,800
529
106
I too get easily confused and think that studios would finance the development of a new engine for one game.
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
In most cases, you're wrong... And highly overclocked? You mean the OCs almost anyone can achieve with whatever air cooler came with their 980 Ti? Those overclocks? The ones that Fury X can't touch with its water-cooled GPU and HBM that was supposed to make all GDDR5 cards obsolete as soon as AMD graced us with the card? I hope those aren't the overclocks you're referring to, because that's downright embarrassing.
And the fact remains they did bench it against OC cards, not reference.

Also, yes, in most cases I am right, and as far as I know it's only in ROTR that AMD doesn't win.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
We are not discussing if the game is good or bad or if it sold millions of copies.

Here we are talking about the Async Compute capabilities of the GPUs.

As of now, AoTS and Hitman do use Async Compute, and thus we are using those two games. I'm sure Warhammer will use AC, as will Deus Ex.
And knowing DICE, it is highly possible that BF1 will also be DX-12 and will use Async Compute as well.

So, it is understandable that certain people try to divert the conversation away from the thread's title, but that doesn't change the fact that Pascal cannot do Async Compute like GCN does.
And the only way NV hardware will be faster in DX-12 games is through its GameWorks initiative.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
If Pascal runs DX12 well, no one is going to care if it has AC or not, and no one is going to care what NV calls whatever it does have.

Heck, if it "brute forces" its way to running DX12 well, no one will care, either. It may even appeal to a lot of people.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Can we resolve the performance hit by dedicating another card to compute? Like we can dedicate a whole card to PhysX? If so, I'll just keep the Titan in my computer along with the 980 Ti. I had it for PhysX, but there were so few titles that I was going to give up on that idea. If I can do that for DX12, then that would be great.
P.S. Is anyone else annoyed by the "if NV doesn't support something it might just as well not exist" crowd? This is a terrible approach that stalls progress.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Can we resolve the performance hit by dedicating another card to compute? Like we can dedicate a whole card to PhysX? If so, I'll just keep the Titan in my computer along with the 980 Ti. I had it for PhysX, but there were so few titles that I was going to give up on that idea. If I can do that for DX12, then that would be great.

Yes, it is possible. nVidia is offloading the compute part to the second GPU.
 

Despoiler

Golden Member
Nov 10, 2007
1,966
770
136
And the hardware can override the driver setting. I like how you ignore this part. :thumbsup:

No, what he said is that "you", i.e. a dev, can specify static mode "if you really want to". Dynamic mode is the new mode, and saying "if you really want to" makes it sound like dynamic mode is the default behavior. Dynamic mode is done in software in the driver.
 

ZGR

Platinum Member
Oct 26, 2012
2,054
661
136
So how slow of a card can do the offloading? We have seen low-end cards like the GTX 650/950 or 750 Ti being dedicated purely to PhysX.

If these companion cards work for Async as well, then that could be good news for users with dedicated PhysX cards.
 