Thing I don't understand is this:
If concurrent graphics+compute didn't matter, then why would Sony have specifically requested an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?
Why have so many prominent devs posted about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?
As for Pascal, I'm just glad there isn't a performance regression in DX12. It's going to be very good for all owners of GCN cards.
Because some people here deliberately spread FUD about Async Compute.
In the DX12 programming guide it's simply referred to as Multi-Engine (Vulkan exposes the same concept through queue families). Look up the documents.
This Multi-Engine API allows different queues to run concurrently IF the hardware is capable.
Graphics, Compute, Copy.
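In D3D12 terms, those three engines show up as three command queue types. A minimal sketch of creating one queue per engine (assuming you already have an `ID3D12Device*`; error handling omitted, Windows SDK only):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One queue per engine type. On hardware with real Multi-Engine support,
// work submitted to these queues can execute concurrently.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfx,
                  ComPtr<ID3D12CommandQueue>& compute,
                  ComPtr<ID3D12CommandQueue>& copy)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // Graphics (a superset: can also do compute + copy)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfx));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // Compute (can also do copy)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // Copy only (maps onto the DMA engines)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copy));
}
```

Note the queue types are supersets: a DIRECT queue accepts compute and copy work too, but splitting work across the narrower queue types is what lets the driver route it to the separate hardware engines.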
Sony specifically wanted this Multi-Engine feature, and their lead architect gave a concrete example: when you're rendering shadow maps, you're mostly using the rasterizer and fixed-function hardware (the ROPs, which also handle other types of workloads), while most of the shader ALUs sit idle. That is exactly the situation where you can run compute queues alongside, so that the fixed-function units and the shader clusters / CUs are both performing work concurrently.
On top of that, Copy queues map directly onto the DMA engines (GCN has 2 active; Kepler/Maxwell also have 2, but 1 is disabled under DirectX, with both only accessible via CUDA for some reason), so transfers can run concurrently as well.
Without hardware support for this feature, no matter how great your shader utilization is, you cannot keep the rasterizer and the DMA engines busy while the shaders are running. That is a FLAW of prior APIs and of GPUs which lack Multi-Engine / "Async Compute" hardware.
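The overlap itself is just independent submission plus a fence at the one point where results are consumed. A sketch of the shadow-map example (all variable names here are illustrative, command lists and fence assumed already recorded/created; Windows SDK only):

```cpp
#include <d3d12.h>

// Sketch: overlap shadow-map rendering (graphics queue) with compute work
// (compute queue), fencing only where the results are actually consumed.
// Names are illustrative; error handling omitted.
void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12Fence* computeFence,
                 UINT64& fenceValue,
                 ID3D12CommandList* const* shadowMapLists,
                 ID3D12CommandList* const* physicsLists,
                 ID3D12CommandList* const* mainPassLists)
{
    // Shadow maps keep the rasterizer busy while most CUs sit idle...
    gfxQueue->ExecuteCommandLists(1, shadowMapLists);

    // ...so feed those idle CUs from the compute queue in the meantime
    // (e.g. physics middleware, as in Cerny's example below).
    computeQueue->ExecuteCommandLists(1, physicsLists);
    computeQueue->Signal(computeFence, ++fenceValue);

    // Only the pass that reads the compute results has to wait; everything
    // submitted before this point on either queue is free to overlap.
    gfxQueue->Wait(computeFence, fenceValue);
    gfxQueue->ExecuteCommandLists(1, mainPassLists);
}
```

The key point: `Wait` is on the GPU timeline, not the CPU, so the CPU never stalls and the two queues overlap right up to the fence.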
Required reading for some people, before they keep on spreading more FUD!
https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php
Cerny expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.
"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"
http://ext3h.makegames.de/DX12_Compute.html
P.S. Anyone who claims Async Compute is only useful for GPUs with poor shader utilization, or that it's ineffective for GPUs that already have 100% shader utilization, doesn't know what they're talking about and is just regurgitating the same FUD.