Silverforce11
Looking over the documentation I can't find anything that says that AMD's 8 queues per ACE are submitted in parallel. The ACE units operate in parallel and can manage 8 queues but nothing appears to say that those 8 queues are done in parallel.
Furthermore, AMD's recent blog post states this:
http://developer.amd.com/community/blog/2015/06/05/concurrency-in-modern-3d-graphics-apis/
AMD drivers currently support one queue of each type.
- Copy queues support all kinds of copy operations, including format conversions, multi-sample anti-aliasing (MSAA) resolves, and swizzling
- Compute queues are a superset of copy queues, and also support dispatching compute tasks
- Graphics queues are a superset of compute queues and also support rendering operations
Which seems to indicate that while the hardware is there, more than 1 queue of each type is not enabled by drivers yet.
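To make the "superset" wording in that list concrete, here is a rough Python sketch. It only models the capability hierarchy as described in the quoted text; the names `Op`, `QUEUE_CAPS`, `can_submit` and `is_superset` are made up for illustration and are not part of any AMD driver or D3D12 API.

```python
from enum import Flag, auto


class Op(Flag):
    """Kinds of work a queue may accept (illustrative only)."""
    COPY = auto()
    COMPUTE = auto()
    RENDER = auto()


# Each queue type accepts everything the previous one does, plus one more kind.
QUEUE_CAPS = {
    "copy": Op.COPY,
    "compute": Op.COPY | Op.COMPUTE,
    "graphics": Op.COPY | Op.COMPUTE | Op.RENDER,
}


def can_submit(queue_type: str, op: Op) -> bool:
    """True if a queue of this type accepts the given kind of work."""
    return bool(QUEUE_CAPS[queue_type] & op)


def is_superset(a: str, b: str) -> bool:
    """True if queue type `a` accepts everything queue type `b` accepts."""
    return (QUEUE_CAPS[a] & QUEUE_CAPS[b]) == QUEUE_CAPS[b]
```

So under this model a compute queue can take copy work, a copy queue cannot take dispatches, and a graphics queue can take anything. It says nothing about how many queues of each type a driver actually exposes, which is the open question here.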
If anyone can find anything on this it would be appreciated.
Each ACE has up to 8 queues, but my understanding is that a queue can't handle different task types at the same time: it's copy, compute, or graphics, not more than 1 type at once in the queue. There's more info in the recent SIGGRAPH 2015 material.
Here's the thing though: GCN has a separate CP (command processor) that handles rendering, hence the 1 + 8 engines (1 CP + 8 ACEs), or 1 + 64 queues (8 ACEs x 8 queues each), so the ACEs can focus on compute tasks.
And it is true that GCN under-utilizes its shaders (a front-end bottleneck, because the ACEs sit idle in DX11). Computerbase.de's review of the Fury X includes a summary of their interview with AMD on the uarch, and they mention this specifically. It's why it sucks at 1080p/1440p in DX11 and gets better at 4K (and in DX12).
Also, gamedevs posting on b3d (Beyond3D) had this to say regarding queues: basically, don't be fooled by the total queue count of an engine, because using more than 1 in parallel is very difficult and can potentially cause issues. I posted the direct quote a while ago, but I can't be bothered finding it again.