Thing I don't understand is this:
If concurrent graphics+compute didn't matter, then why would Sony have specifically requested an additional 6 ACEs in GCN? Why spend money on the hardware if it isn't important?
Why have so many prominent devs posted about the benefits of asynchronous compute? Is it common for them to tweet about trivial hardware features?
As for Pascal, I'm just glad there isn't a performance regression in DX12. It's going to be very good for all owners of GCN cards.
Because some people here deliberately spread FUD about Async Compute.
In the DX12 programming guide it's simply referred to as Multi-Engine (Vulkan exposes the same concept through queue families). Look up the documents.
This Multi-Engine API allows different queues to run concurrently IF the hardware is capable.
Graphics, Compute, Copy.
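In D3D12 terms, those three engines show up as three command queue types. A minimal sketch of creating one queue per engine (assuming you already have an `ID3D12Device*`; error handling omitted, Windows SDK only):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One queue per engine type. On hardware with real Multi-Engine support,
// work submitted to these queues can execute concurrently.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfx,
                  ComPtr<ID3D12CommandQueue>& compute,
                  ComPtr<ID3D12CommandQueue>& copy)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // Graphics (a superset: can also do compute + copy)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfx));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // Compute (can also do copy)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // Copy only (maps onto the DMA engines)
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copy));
}
```

Note the queue types are supersets: a DIRECT queue accepts compute and copy work too, but splitting work across the narrower queue types is what lets the driver route it to the separate hardware engines.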
Sony specifically wanted this Multi-Engine feature, and their lead architect gave a concrete example: when you're rendering shadow maps, you're mostly using the rasterizer and fixed-function hardware (the ROPs, which also handle other types of workloads), while most of the shader ALUs sit idle. That is exactly the situation where you can run compute queues alongside, so that the fixed-function units and the shader clusters / CUs are both performing work concurrently.
On top of that, Copy queues map directly onto the DMA engines (GCN has 2 active; Kepler/Maxwell also have 2, but 1 is disabled under DirectX, with both only accessible via CUDA for some reason), so transfers can run concurrently as well.
Without hardware support for this feature, no matter how great your shader utilization is, you cannot keep the rasterizer and the DMA engines busy while the shaders are running. That is a FLAW of prior APIs and of GPUs which lack Multi-Engine / "Async Compute" hardware.
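The overlap itself is just independent submission plus a fence at the one point where results are consumed. A sketch of the shadow-map example (all variable names here are illustrative, command lists and fence assumed already recorded/created; Windows SDK only):

```cpp
#include <d3d12.h>

// Sketch: overlap shadow-map rendering (graphics queue) with compute work
// (compute queue), fencing only where the results are actually consumed.
// Names are illustrative; error handling omitted.
void SubmitFrame(ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12Fence* computeFence,
                 UINT64& fenceValue,
                 ID3D12CommandList* const* shadowMapLists,
                 ID3D12CommandList* const* physicsLists,
                 ID3D12CommandList* const* mainPassLists)
{
    // Shadow maps keep the rasterizer busy while most CUs sit idle...
    gfxQueue->ExecuteCommandLists(1, shadowMapLists);

    // ...so feed those idle CUs from the compute queue in the meantime
    // (e.g. physics middleware, as in Cerny's example below).
    computeQueue->ExecuteCommandLists(1, physicsLists);
    computeQueue->Signal(computeFence, ++fenceValue);

    // Only the pass that reads the compute results has to wait; everything
    // submitted before this point on either queue is free to overlap.
    gfxQueue->Wait(computeFence, fenceValue);
    gfxQueue->ExecuteCommandLists(1, mainPassLists);
}
```

The key point: `Wait` is on the GPU timeline, not the CPU, so the CPU never stalls and the two queues overlap right up to the fence.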
Required reading for some people, before they keep on spreading more FUD!
https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php
Cerny expects developers to run middleware -- such as physics, for example -- on the GPU. Using the system he describes above, you can run at peak efficiency, he said.
"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"
http://ext3h.makegames.de/DX12_Compute.html
P.S. Anyone who claims Async Compute is only useful for GPUs with poor shader utilization, or that it's ineffective for GPUs that already have 100% shader utilization, doesn't know what they're talking about and is just regurgitating the same FUD.