I will copy some posts from another forum.
What I noticed is that a couple of reviewers got confused by the term "Async Compute", using it both for the DX11 extension for explicit preemption by a high-priority context and for the asynchronous queues in DX12. They then mixed the two together badly, stating that Pascal would now fully support Async Compute in DX12 because it can do preemption now, or that Maxwell could perform the context switch (the reassignment of SMMs) in DX12 at draw call boundaries.
I would say NV's marketing effort to blur this term was a complete success.
Yes, preemption and async are different. Sebbi over at beyond3d explained this very well. Nvidia didn't talk about async, only about preemption, and it improved the preemption granularity in Pascal (doable at the compiler level).
The async work in AoTS consists of compute tasks. The feature is called async compute, but you can run it either on an "async-compute" architecture (à la GCN), capable of running compute and graphics tasks at the same time at the CU level, or by dedicating an entire SM to either compute or graphics, like Nvidia does.
No async compute in sight.
You can read and learn more over at beyond3d here:
https://forum.beyond3d.com/threads/nvidia-pascal-reviews-1080-and-1070.57930/page-6
*edit*
Dynamic load balancing is a thing, yes, and it is a hardware feature. But it's nowhere near the same as, or even remotely comparable to, GCN's async execution via the independent command lists dispatched by the ACE units.
Dynamic load balancing is only for efficiently switching between compute and graphics workloads inside a single command list, or rather for eliminating the need for a full command buffer flush every time the partition scheme changes.
So you can essentially now:
Upload the next compute-only command list while the previous mixed command list is still executing, as the SMMs may now switch modes lazily after they have finished the graphics portion.
The same applies, vice versa, when switching back to graphics.
The penalty for a driver screwup when you mix compute and graphics inside a single command list is also eliminated.
Technically, that means there is no longer a scheduling problem just from having compute portions in there, and by that you avoid stalling the command processor.
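To put rough numbers on the point above, here is a toy model in Python. It is only a sketch under loud assumptions: work is measured in arbitrary units, every SM processes one unit per time step, mode switches cost nothing, and the function names are made up for illustration. It says nothing about how the hardware actually schedules; it only shows why SMs that may switch mode lazily finish a mixed workload sooner than a fixed partition.

```python
def static_partition_time(gfx_work, compute_work, gfx_sms, compute_sms):
    """Finish time when every SM keeps its assigned mode until both portions are done."""
    return max(gfx_work / gfx_sms, compute_work / compute_sms)


def dynamic_balance_time(gfx_work, compute_work, gfx_sms, compute_sms):
    """Finish time when SMs of the portion that ends first switch mode and help with the rest.

    Toy assumption: the switch is free and the remaining work splits perfectly.
    """
    t_gfx = gfx_work / gfx_sms
    t_cmp = compute_work / compute_sms
    first_done = min(t_gfx, t_cmp)
    # Work still left in the slower portion at the moment the faster one ends.
    if t_gfx <= t_cmp:
        remaining = compute_work - first_done * compute_sms
    else:
        remaining = gfx_work - first_done * gfx_sms
    return first_done + remaining / (gfx_sms + compute_sms)


if __name__ == "__main__":
    # Hypothetical split: 12 SMs on graphics, 8 on compute.
    print(static_partition_time(60, 160, 12, 8))
    print(dynamic_balance_time(60, 160, 12, 8))
```

In this made-up case the graphics SMs would sit idle for most of the frame under the fixed split, which is the penalty described above for a bad partition guess.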
What it doesn't provide yet is the resource sharing or the truly asynchronous scheduling that AMD's hardware features. So using asynchronous queues for compute is now merely (almost...) "for free",
but it's still not gaining you anything.
And without triggering actual, explicit preemption, you are not gaining truly asynchronous, independent execution yet either. You are still subject to all side effects resulting from cooperative scheduling.
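The difference between cooperative scheduling and explicit preemption can be illustrated with another small toy model. Again this is just a sketch with assumed, hypothetical names and numbers: one long task occupies the GPU from time 0, a high-priority task arrives later, and preemption is only possible at evenly spaced points (the "granularity"). It is not a description of any real driver or hardware behavior.

```python
import math


def start_time_cooperative(running_len, arrival):
    """The new task has to wait until the running task finishes on its own."""
    return max(arrival, running_len)


def start_time_preemptive(running_len, arrival, granularity):
    """The new task starts at the next preemption point after it arrives.

    Toy assumption: preemption points sit at multiples of `granularity`
    and the context switch itself takes no time.
    """
    if arrival >= running_len:
        return arrival
    next_point = math.ceil(arrival / granularity) * granularity
    return min(next_point, running_len)


if __name__ == "__main__":
    # Hypothetical case: a 16 ms task is running, urgent work arrives at 3 ms.
    print(start_time_cooperative(16.0, 3.0))
    print(start_time_preemptive(16.0, 3.0, 2.0))
```

A finer granularity moves the start time closer to the arrival time, while the cooperative case always waits for the running work to finish.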
But they are unfortunately still referring to their preemption extension for DX11 as "Async Compute" too. On purpose.