zlantan, can you expand on the predictable performance comment?
Well, I have several issues with how D3D and the driver use the hardware. One of my main problems is that real-time shader compilation is out of my control: I don't know when it will happen or how long it will take. I can't optimize directly for this scenario, because the performance depends heavily on the driver. For example, even if I make the code good enough that it doesn't microstutter in my tests, a new driver may break it, and then I have to patch the original code. This is embarrassing for me and for the industry, because my customers want fluid gameplay and I can't guarantee it. This is certainly not how real-time shader compilation should work. Mantle will help a lot here, because with this API I can save complete pipelines with pre-compiled shaders. I just load them, and you get fluid gameplay with every driver.
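The idea of saving pre-compiled pipelines can be sketched roughly like this. This is a minimal illustration, not Mantle's actual API: the names (`PipelineCache`, `getOrCompile`) and the hash-keyed blob map are assumptions, and the "compile" step is simulated. The point is only that the expensive compile happens once, not mid-frame.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical pipeline cache: map a hash of the shader/state description
// to an already-compiled binary blob, so the expensive compile happens
// once (e.g. at install or on first run), never in the middle of gameplay.
struct PipelineCache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> blobs;

    // FNV-1a hash of the pipeline description (a stand-in for a real key).
    static uint64_t hashDesc(const std::string& desc) {
        uint64_t h = 1469598103934665603ull;
        for (unsigned char c : desc) { h ^= c; h *= 1099511628211ull; }
        return h;
    }

    // Returns the cached blob, "compiling" (simulated) only on a miss.
    const std::vector<uint8_t>& getOrCompile(const std::string& desc,
                                             int* compileCount) {
        uint64_t key = hashDesc(desc);
        auto it = blobs.find(key);
        if (it == blobs.end()) {
            ++*compileCount;  // the expensive path was taken
            std::vector<uint8_t> blob(desc.begin(), desc.end());  // fake compile
            it = blobs.emplace(key, std::move(blob)).first;
        }
        return it->second;
    }
};
```

In a real engine the blob map would be serialized to disk, so a second run of the game (or the same pipeline requested twice in one frame) never pays the compile cost again.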
My second problem is how to deal with data hazards. This is one of the biggest problems in D3D: you create resources, but something has to manage them. The driver must track all the writes and reads, because write operations need to complete before a shader reads the resource. Right now, avoiding race conditions and hazards is handled by the driver, and that's bad, because I can't see what's going on under the hood. I just get bad performance, and I don't really know why. At that point I have to understand how the hardware works, which is kind of absurd, because Nvidia doesn't document this, so I have to use Intel and AMD GPUs and then somehow figure out which optimizations to use for Nvidia. Many times I just make minor tweaks in the code, but it takes a lot of time to get better performance, because the driver doesn't let me see what the problem is.
This is what predictable performance means. Today I write some code, and maybe it performs well, maybe it doesn't; and if it doesn't, I can't see where I should optimize. In Mantle, resource tracking is done by the application. Yes, I have to write my own code for it, but I will see how it works. If the performance is weak, I can see where the problem is and patch it before the day ends.
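Application-side hazard tracking, as described above, can be sketched very simply: remember which resources have an unflushed write, and emit a synchronization barrier only when a read actually follows one. The names here (`HazardTracker`, `onWrite`, `onRead`) are illustrative inventions, not any real API; the sketch just shows that when the app owns the tracking, it can count and inspect every barrier itself.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical app-side hazard tracker: record writes per resource and
// emit a "barrier" only when a read follows a still-pending write.
// Because the application owns this logic, it can see exactly where
// and how often synchronization happens.
struct HazardTracker {
    std::unordered_map<std::string, bool> pendingWrite;
    int barriersEmitted = 0;

    void onWrite(const std::string& res) { pendingWrite[res] = true; }

    void onRead(const std::string& res) {
        auto it = pendingWrite.find(res);
        if (it != pendingWrite.end() && it->second) {
            ++barriersEmitted;   // the write must finish before this read
            it->second = false;  // the barrier resolves the hazard
        }
    }
};
```

A redundant driver-side barrier is invisible to the developer; with this kind of tracker, an unexpected jump in `barriersEmitted` points straight at the code that caused it.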
The multi-threading model is still a problem in D3D. The API and the driver are the primary reasons we can't use efficient threading on PC. The kernel driver spawns hidden server threads, and these conflict with the application's threads. The best I can do is limit the number of app threads so the driver has two or three unused CPU cores where it can do its work. Deferred contexts just make this worse, because the driver uses more aggressive threading, so you need to free up more CPU cores, probably four. But most PCs don't have more than four logical CPU cores, so if I free those up for the driver, I have no resources left for the application.
I actually do manual threading with the immediate context, which is not good, but I don't have a better idea for solving this puzzle in D3D. I think the deferred context is a total failure: the command buffers can't be reused, and in many cases this solution is slower than the immediate context.
Mantle has a much more elegant multi-threading model. I have very good control over the hardware, and there are no hidden driver threads, so I can use all the resources, all the cores, and nothing conflicts with the app logic. This is how an API should work.
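The threading model being described is, at its core, each worker thread recording draw commands into its own command buffer with no shared driver state, and the main thread submitting the buffers in a fixed order afterwards. A minimal sketch of that pattern, with invented names and strings standing in for real GPU commands:

```cpp
#include <string>
#include <thread>
#include <vector>

// Sketch of engine-controlled command recording: every worker fills its
// own buffer (no locks, no hidden driver threads), then the caller
// submits cmdBufs[0..workers-1] in a deterministic order.
std::vector<std::vector<std::string>> recordInParallel(int workers,
                                                       int drawsPerWorker) {
    std::vector<std::vector<std::string>> cmdBufs(workers);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&cmdBufs, w, drawsPerWorker] {
            // Each thread touches only its own vector, so no
            // synchronization is needed while recording.
            for (int d = 0; d < drawsPerWorker; ++d)
                cmdBufs[w].push_back(
                    "draw " + std::to_string(w * drawsPerWorker + d));
        });
    }
    for (auto& t : pool) t.join();  // all recording finished
    return cmdBufs;                 // ready to submit in order
}
```

Since every core runs application code and nothing is reserved for a driver, scaling to however many cores the machine has is the engine's decision, not the driver's.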
And how do you feel about DirectX 12, which will supposedly have many of the "direct metal" traits that Mantle was developed for?
I actually haven't seen D3D12 myself. A friend of mine has, and he said it's very similar to Mantle, so the difference between these APIs is negligible. Personally, I will definitely support D3D12 when it launches.