It's even worse for ASC (Async Compute). It would be a great opportunity to compare the maximum possible gains between GPU vendors/generations with ASC, but with suboptimal utilisation on some vendors this is not possible at the moment.
It is not suboptimal utilisation. I have no idea where you pulled that from.
There is no way to magically switch between "optimal" and "suboptimal". A DX12 application simply submits work to two queues, DIRECT and COMPUTE, and the rest is up to the driver. AMD engineers themselves worked on suggesting (vendor-neutral) optimizations for the Time Spy async compute code, and as far as I know they are happy with it.
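To make the point concrete, this is roughly what "filing work to two queues" looks like at the D3D12 API level. This is a fragment-style sketch, not the actual Time Spy source; `device`, `gfxLists` and `computeLists` are assumed to exist, and it is not a complete compilable program.

```cpp
// Sketch: how a DX12 title exposes async compute to the driver.
// One DIRECT (graphics) queue and one COMPUTE queue are created;
// there is no API knob that forces them to overlap on the GPU.

D3D12_COMMAND_QUEUE_DESC directDesc = {};
directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
ID3D12CommandQueue* directQueue = nullptr;
device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

D3D12_COMMAND_QUEUE_DESC computeDesc = {};
computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
ID3D12CommandQueue* computeQueue = nullptr;
device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

// The application records graphics work on the DIRECT queue and
// compute work on the COMPUTE queue (gfxLists/computeLists are
// hypothetical command list arrays here).
directQueue->ExecuteCommandLists(1, &gfxLists);
computeQueue->ExecuteCommandLists(1, &computeLists);

// From here on, whether the two queues actually execute
// concurrently on the hardware is entirely up to the driver
// and the GPU's scheduling, not the application.
```

The application's only lever is submitting independent work on separate queues with correct synchronization; how much of it runs concurrently is a driver/hardware decision.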
Yes, there are other low-level optimizations (not related to compute) that could be done for specific architectures, but if you start down that road, where does it end? AMD and NVIDIA coding the benchmark for their own architectures? I mean, if we did vendor-specific paths ourselves, as soon as the first numbers were out, you would be posting in a brand new thread going on and on about how "Futuremark cannot optimize properly" or "Futuremark favored this or that vendor". With any luck, both green and red team "fans" would simultaneously claim the same thing.
Also, as soon as a new architecture shipped, the first reaction would be "well, it is not properly optimized for it, so the scores are not comparable". What use would such a benchmark be for launches of new hardware on a new architecture?
You seem to fundamentally misunderstand what a benchmark is and what it is supposed to do.