What you and AMD are labeling "Async Compute" is not Async Compute. What they actually mean is "Multiple Command Processor Issuing."
Due to a flaw in GCN's design, the command processor can only issue commands to 1024 "shaders" at a time, as it was designed explicitly for Tahiti (which has 2048 "shaders").
GCN shaders run each instruction from the command processor for 4 cycles. This is how a single command processor can deal with more than 1024 shaders: while one group is still busy, it can issue to another.
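To put numbers on that, here is a rough sketch of the arithmetic using only the two figures given above (1024 shaders per issue, 4 cycles per instruction). The function name and the "one issue per clock, no stalls" model are my own simplifying assumptions, not a description of how the hardware actually schedules work.

```python
def shaders_covered(issue_width: int, cycles_per_instruction: int) -> int:
    """Upper bound on shaders one command processor keeps busy in this toy model.

    Assumes (hypothetically) that it issues to a fresh group of `issue_width`
    shaders every clock, and each group stays busy for `cycles_per_instruction`.
    """
    return issue_width * cycles_per_instruction


ISSUE_WIDTH = 1024      # figure quoted in the post
CYCLES_PER_INSTR = 4    # figure quoted in the post
TAHITI_SHADERS = 2048   # figure quoted in the post

covered = shaders_covered(ISSUE_WIDTH, CYCLES_PER_INSTR)
print(covered, covered >= TAHITI_SHADERS)
```

This is a best case; anything that stalls the command processor lowers the number of shaders it can actually keep fed.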
"Multiple Command Processor Issuing," or what AMD incorrectly calls "Asynchronous Computing," is the ability to use more than one command processor simultaneously.
The reason "Multiple Command Processor Issuing" is so beneficial for AMD cards is that each command processor can only issue commands to 1024 shaders per clock.
Of course, "Multiple Command Processor Issuing" has overhead associated with it, as any unnecessary multi-threading does. On GCN the benefits usually outweigh the costs simply because the command processor is so abysmal.
The reason you don't see any benefit from "Multiple Command Processor Issuing" on Nvidia cards is that their command processors are sufficient to issue commands to all of their "shaders" by themselves. They do not have the same design flaw as the GCN command processors.
Thus, enabling "Multiple Command Processor Issuing" on Nvidia simply adds overhead with zero possible benefit.
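The trade-off can be sketched with a toy model. Everything here is hypothetical and only illustrates the shape of the argument: the function name, the linear overhead penalty, and all the numbers in the two example cases are made up, not measurements of any real card.

```python
def effective_throughput(total_shaders: int,
                         per_cp_capacity: int,
                         n_cps: int,
                         overhead_per_extra_cp: float) -> float:
    """Fraction of peak throughput reached in this toy model.

    Shaders are either fed or idle; each command processor beyond the
    first costs a fixed fraction of throughput (assumed, not measured).
    """
    fed = min(total_shaders, per_cp_capacity * n_cps)
    utilization = fed / total_shaders
    penalty = overhead_per_extra_cp * (n_cps - 1)
    return utilization * (1.0 - penalty)


# Case A: one command processor cannot feed every shader (hypothetical numbers).
starved_one = effective_throughput(4096, 2048, 1, 0.05)
starved_two = effective_throughput(4096, 2048, 2, 0.05)

# Case B: one command processor already feeds every shader (hypothetical numbers).
sufficient_one = effective_throughput(4096, 4096, 1, 0.05)
sufficient_two = effective_throughput(4096, 4096, 2, 0.05)

print(starved_one, starved_two)
print(sufficient_one, sufficient_two)
```

In case A the second command processor helps despite the overhead. In case B it can only cost you, which is the point being made about Nvidia.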