There's no dynamic light source for spells & weapon effects. In the demo they showed where they mention async compute, there is. So from a viewer's PoV, one is very static, the other isn't.
Honestly, it's damn near impossible to see dynamic lighting in any of those videos, as the footage quality just isn't there. The only video where dynamic lighting is in your face is the one I posted earlier which demonstrated dynamic global illumination.
Edit: The contention here is that NV's drop-off in performance in Ashes in the full test (while gaining perf in the draw call test!) is perhaps due to poor async compute/shading performance OR a poor implementation by Oxide. For the blame on Oxide to stick, you would need to show proof that Kepler/Maxwell is great at async compute/shading. As I said, I only see examples of gamedevs praising async compute/shading on GCN, not on Kepler/Maxwell.
Let's first come to an agreement on what asynchronous compute actually is.
To my understanding, asynchronous compute allows devs to execute compute shaders in parallel with rendering, but out of sync with it. The last part is important, because it means that rendering performance should not be affected: the GPU is using SPARE cycles for asynchronous compute.
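To make that concrete in DX12 terms (a minimal C++ sketch of the general API, not anything from the Ashes codebase): asynchronous compute is exposed by creating a second command queue of type COMPUTE alongside the normal graphics (DIRECT) queue. Whether the two actually overlap is up to the GPU's scheduler:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a compute-only queue next to the usual graphics (DIRECT) queue.
// Work submitted here can run alongside rendering if the GPU's scheduler
// (the ACEs on GCN, the GMU on Maxwell) finds spare cycles for it.
ComPtr<ID3D12CommandQueue> CreateAsyncComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // not DIRECT: no draw calls here
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}
```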
Now AMD's approach is to use 8 dedicated ACEs, whose only duty is to process asynchronous compute. 8 ACEs is a very large number, but it probably works for AMD because their architecture has a big problem with underutilization and would be expected to have a lot of spare cycles.
This is in sharp contrast to NVidia's Maxwell, which has no dedicated ACEs or similar counterpart, but instead uses the GMU along with their Hyper-Q technology to keep the GPU as occupied as possible; an approach which seems to be very successful, as Maxwell is an extremely efficient architecture and does not have the underutilization problem that GCN has.
So in light of this, I submit that cross-comparisons between AMD and NVidia in the realm of asynchronous compute are FUTILE.
If you could magically strap on 8 ACEs to Maxwell, I doubt it would make one single bit of difference since Maxwell's single GMU has very little problem keeping the GPU occupied.
So what's causing the DX12 path to be slower than the DX11 path in AotS for Maxwell? I think it's because Oxide's DX12 optimization for NVidia isn't up to par with the driver optimizations NVidia made for DX11.
Remember, in DX12 the developers have much closer access to the hardware than in DX11, which puts the burden of responsibility for performance mostly on them. With DX11 it was the opposite: NVidia did a TON of driver optimization on the side to give themselves an edge.
With GCN, the ACEs are, I believe, scheduled in hardware, so the devs probably don't have to do much, if anything, to exploit them. But since NVidia does not have dedicated ACEs, tight management of the GMU and Hyper-Q will become critical for performance.
This explains why the GTX 980 Ti was able to stay pretty much on par with the Fury X in the benchmarks, whilst other Maxwell GPUs like the GTX 980, 970 and 960, with their smaller shader arrays, experienced slowdowns in DX12. They need greater optimization because they have fewer spare cycles than the GTX 980 Ti and Titan X.
In DX11, NVidia used their drivers to manage the GMU, but in DX12, the developers will probably have to get their hands dirty when it comes to tapping into it.
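For illustration, here's roughly what "getting their hands dirty" looks like in D3D12 (a hedged sketch; computeQueue, graphicsQueue, fence, fenceValue and the command list arrays are placeholder names I've made up, not anything from Oxide's code). The point is that the app itself now fences the compute queue against the graphics queue, a decision the DX11 driver used to make for you:

```cpp
// The app, not the driver, decides when the graphics queue may consume the
// async compute results. All objects below are assumed to exist already.
computeQueue->ExecuteCommandLists(1, computeLists);   // kick off compute work
computeQueue->Signal(fence.Get(), ++fenceValue);      // fence fires when it completes

graphicsQueue->Wait(fence.Get(), fenceValue);         // GPU-side wait, no CPU stall
graphicsQueue->ExecuteCommandLists(1, graphicsLists); // rendering that depends on it
```

Fence too aggressively and you serialize the queues, losing the "spare cycles" benefit entirely; that kind of per-architecture tuning is exactly the work that used to live in NVidia's DX11 driver.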