[bitsandchips]: Pascal to not have improved Async Compute over Maxwell

Page 20 - AnandTech Forums

renderstate

Senior member
Apr 23, 2016
237
0
0
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3... waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.
 

nurturedhate

Golden Member
Aug 27, 2011
1,762
761
136
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

Or you can continue making ill-informed comments that clearly demonstrate your lack of understanding of the subject matter overall while still thinking you are correct.
 

Ma_Deuce

Member
Jun 19, 2015
175
0
0
Or you can continue making ill-informed comments that clearly demonstrate your lack of understanding of the subject matter overall while still thinking you are correct.

I don't really believe that he thinks he's correct. He's putting too much effort into discrediting it. If there was true confusion, I don't think he would be bringing AMD into it.
 

nurturedhate

Golden Member
Aug 27, 2011
1,762
761
136
I don't really believe that he thinks he's correct. He's putting too much effort into discrediting it. If there was true confusion, I don't think he would be bringing AMD into it.

You're probably right. Probably gets told what to post about.
 

selni

Senior member
Oct 24, 2013
249
0
41
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3... waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

Async is part of how they're addressing their utilization problem though - it's hardly a bad idea and will probably be necessary for both as GPUs keep getting wider and used for more general tasks. NV have done a better job with GPU utilization to date, but they're going to hit the same issues eventually.
 

xpea

Senior member
Feb 14, 2014
449
150
116
It's tailored toward async compute; either your hardware supports it and there's some gain, or it doesn't support it and at best there are no losses...

I guess that you didn't get that async compute is about gaining something...
God, for the thousandth time, how can you gain performance if your resources are already fully utilized (i.e. NVIDIA's)? You can't go to 110%!
You can only gain performance with DX12 async if you have idling ALUs under DX11. That's the whole story of GCN's poor utilization under DX11, and the reason why AMD's much higher FLOPs never materialize into a performance lead (under DX11).

What is so difficult to understand?
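The utilization argument above can be put into a toy model (all numbers below are invented for illustration, not measurements of any real GPU): async compute can only reclaim shader (ALU) time that would otherwise sit idle during the graphics pass.

```python
# Toy model of the "you can't go above 100%" argument. The function and
# its parameters are hypothetical; no real GPU behaves this simply.

def frame_time(graphics_ms, compute_ms, shader_idle_fraction, async_enabled):
    """Return total frame time in milliseconds.

    shader_idle_fraction: portion of the graphics pass during which the
    ALUs idle (e.g. rasterizer-bound work like shadow maps).
    """
    if not async_enabled:
        # Serial submission: the compute pass waits for graphics to finish.
        return graphics_ms + compute_ms
    # With async, compute fills the idle ALU time "for free"; anything
    # beyond that still runs after the graphics pass.
    overlap = min(compute_ms, graphics_ms * shader_idle_fraction)
    return graphics_ms + compute_ms - overlap

# Lots of idle ALU time (the GCN-under-DX11 situation): a clear win.
print(frame_time(10.0, 4.0, 0.5, False))   # 14.0
print(frame_time(10.0, 4.0, 0.5, True))    # 10.0

# Near-full utilization already: almost nothing left to gain.
print(frame_time(10.0, 4.0, 0.05, True))   # 13.5
```

In this model the async gain shrinks to zero as the idle fraction approaches zero, which is exactly the point being argued.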
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3... waiting for the informed replies of those that still don't understand how async compute works/what it does and doesn't do.

FTFY ...

It could be a number of things. We need performance counters to truly tell how well the Nvidia microarchitecture is being utilized ...

You cannot conclude that it supports async compute just from near-maximum utilization, nor can you claim high utilization without measuring hardware throughput ...

It would be nice to see some async compute experiments with several 16K-resolution shadow maps and a depth pre-pass for deferred lighting while a very heavy compute shader is running ...

I know how well Nvidia's competitor's hardware would react, but as for their own latest Pascal architecture it's very ambiguous, and I don't find that reassuring one bit ...

The killer app of async compute is being able to overlap compute work with rasterizer-bound and texture-sampler-bound work ...

FWIW, I agree with you that hardware should be ideal for the software, yet the same is true the other way around as well ...
 
Last edited:

casiofx

Senior member
Mar 24, 2015
369
36
61
God, for the thousandth time, how can you gain performance if your resources are already fully utilized (i.e. NVIDIA's)? You can't go to 110%!
You can only gain performance with DX12 async if you have idling ALUs under DX11. That's the whole story of GCN's poor utilization under DX11, and the reason why AMD's much higher FLOPs never materialize into a performance lead (under DX11).

What is so difficult to understand?
That is why AMD spent so much effort laying down the software groundwork for PC gaming, as well as capturing the console chip deals: to sow the seeds for the architecture they designed to bear fruit.
 

Despoiler

Golden Member
Nov 10, 2007
1,966
770
136
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3... waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

Oh boy. You ended up proving your opponent's point without even realizing it. So wait, Nvidia implemented a feature because they already have perfect or near-perfect utilization? If they had perfect utilization they would just say async compute is BS: "here is our already perfect utilization, because our hardware and software rock." They didn't, though. They tasked their driver team with implementing dynamic load balancing and then demoed it. Therefore they acknowledged they have gaps in their utilization, or they are straight up lying for marketing purposes. Not only that, but no two scenes ever take exactly the same resources. To claim otherwise is just plain silly. I think we can safely assume Nvidia is attempting to tackle an issue that is very much real.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Oh boy. You ended up proving your opponent's point without even realizing it. So wait, Nvidia implemented a feature because they already have perfect or near-perfect utilization? If they had perfect utilization they would just say async compute is BS: "here is our already perfect utilization, because our hardware and software rock." They didn't, though. They tasked their driver team with implementing dynamic load balancing and then demoed it. Therefore they acknowledged they have gaps in their utilization, or they are straight up lying for marketing purposes. Not only that, but no two scenes ever take exactly the same resources. To claim otherwise is just plain silly. I think we can safely assume Nvidia is attempting to tackle an issue that is very much real.

No, better utilization does not mean perfect utilization, I think you're deliberately misunderstanding what people are saying.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
Of course he's deliberately misunderstanding. Not only did I never talk about perfect utilization, I was also clearly referring to that specific application. Not to mention I've written many times before that async compute is a *great feature*. Unfortunately it's also the most opportunistically misunderstood feature.
 

Mikeduffy

Member
Jun 5, 2016
27
18
46
Thing I don't understand is this:

If concurrent graphics+compute didn't matter, then why would Sony have specifically requested an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs posted about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.
 
Feb 19, 2009
10,457
10
76
Thing I don't understand is this:

If concurrent graphics+compute didn't matter, then why would Sony have specifically requested an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs posted about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.

Because some people here deliberately spread FUD about Async Compute.

In DX12/Vulkan's programming guide, it's simply referred to as Multi-Engine. Look up the documents.

This Multi-Engine API allows different queues to run concurrently IF the hardware is capable.

Graphics, Compute, Copy.

Sony specifically wanted this Multi-Engine feature, and their lead architect made the point with an example: when you're rendering Shadow Maps, you are only using the Rasterizer (ROPs, which also handle other types of workloads) while the Shaders are idling. It is in this situation that you can run Compute queues separately, so that the ROPs and the Shader Clusters / CUs are both performing work concurrently.

Add to this, Copy queues can work directly on the DMA engines (GCN has 2 active; Kepler/Maxwell has 2, but 1 is disabled in DX, with both only accessible via CUDA for some reason), to get transfers going concurrently.

Without hardware support for this feature, no matter how great your shader utilization is, you cannot run the Rasterizer and the DMA engines concurrently while the shaders are running. That is a FLAW of prior APIs & GPUs which lack Multi-Engine / "Async Compute" hardware.

Required reading for some people, before they keep on spreading more FUD!

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php

Cerny expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.

"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"

http://ext3h.makegames.de/DX12_Compute.html

PS: anyone who remarks that Async Compute is only useful for GPUs with poor shader utilization, or that it's ineffective for GPUs that have 100% shader utilization, doesn't know what they are talking about and is just regurgitating the same FUD.
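Cerny's shadow-map example boils down to a simple scheduling picture, which can be sketched as a toy model (the phase names and ALU-use fractions below are invented for illustration; this is not how any real driver schedules work): the async compute queue simply soaks up whatever ALU capacity the graphics queue leaves free in each phase of the frame.

```python
# Hypothetical per-phase ALU occupancy of the graphics queue across one
# frame (invented numbers). Opaque shadow-map rendering uses vertex
# shaders + the rasterizer only, so its ALU use is near zero.
graphics_alu_use = {
    "shadow_maps": 0.05,
    "g_buffer":    0.80,
    "lighting":    0.95,
    "post":        0.60,
}

def fill_with_compute(phases):
    """Give the async compute queue whatever ALU fraction is left over."""
    return {phase: round(1.0 - used, 2) for phase, used in phases.items()}

compute_share = fill_with_compute(graphics_alu_use)
# The shadow-map phase is where compute can be "turned up to 11".
print(compute_share)
```

In this sketch the compute queue gets 95% of the ALUs during shadow-map rendering but only 5% during lighting, mirroring Cerny's observation that the opportunity "varies dramatically from instant to instant".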
 
Last edited:

renderstate

Senior member
Apr 23, 2016
237
0
0
Thing I don't understand is this:

If concurrent graphics+compute didn't matter, then why would Sony have specifically requested an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?

Why have so many prominent devs posted about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?

About Pascal, I'm just glad there isn't a performance regression in dx12, going to be very good for all owners of GCN cards.

Who said async compute doesn't matter?
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Having the ability to do proper Async Compute doesn't say anything about performance improvements.

Ashes is tailored towards AMD. So why would anyone expect the same or any gain on nVidia hardware?

You've repeated this falsehood too many times. Everyone knows that the source code is shared and that the dev has worked with nVidia specifically requesting an update to fix it. The hardware simply hasn't been capable.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Add to this, Copy queues can work directly on the DMA engines (GCN has 2 active; Kepler/Maxwell has 2, but 1 is disabled in DX, with both only accessible via CUDA for some reason), to get transfers going concurrently.

Only some cards have both DMA engines enabled. Last I checked, it was Quadros and Titans, and it was disabled on Geforce, though that could have changed. It also used to be the case that one engine was dedicated to uploads and one to downloads (when 2 are enabled), though that could also have changed.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Or it means NVIDIA's architecture is better than AMD's at fully utilizing the GPU in that particular application. If AMD could fix their utilization issue, suddenly async compute wouldn't look as cool.

1, 2, 3... waiting for the irate replies of those that still don't understand how async compute works/what it does and doesn't do.

You seem to not understand that without async compute, certain tasks have to wait until another task is completed before they can run. That creates a stall. Async allows these tasks to run simultaneously; without it, they simply can't. nVidia can't possibly do without it what AMD, or any hardware, does with it. nVidia is the one who has something to fix, not AMD.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
You seem to not understand that without async compute, certain tasks have to wait until another task is completed before they can run. That creates a stall. Async allows these tasks to run simultaneously; without it, they simply can't. nVidia can't possibly do without it what AMD, or any hardware, does with it. nVidia is the one who has something to fix, not AMD.


This has absolutely nothing to do with what I said. If you have something relevant to say please go ahead, otherwise ignore me.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
Your rate of flip flopping is higher than a quantum qubit D:


A quantum bit doesn't flip-flop; that's the very nature of quantum superposition. Quantum mechanics is clearly not your cup of tea.

Also, I never said async compute is useless, unnecessary, or anything of that sort, quite the contrary. But who am I to try to show people that they still don't understand (or don't want to understand) async compute?!
 
Last edited: