Ever since Microsoft added the context queue with DX10 to separate the two halves of the GPU driver, we have had latency issues associated with DirectX. Queuing draw commands into a buffer up to three frames deep in front of the card, and then having vsync buffers behind it as well, is a lot of designed-in latency: at 60 Hz, three queued frames alone is roughly 50 ms before the swap chain adds its own. The C++ interface is also quite limited in its multithreading support, and since command submission tends to dominate CPU time in most games, it seems reasonable to want to remove that limitation. The fixed pipeline representation in DX is likewise overly strict compared to what the cards can actually do now, and since most games already bypass the fixed-function lighting, it seems reasonable to want to expand the API.
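For what it's worth, DX11 did try to address the threading side with deferred contexts, but they only parallelize command recording; submission still funnels through the single immediate context. A rough sketch of that model (the function names other than the D3D11 calls are mine, and all setup/error handling is omitted):

```cpp
#include <d3d11.h>

// Worker thread: records commands into its own deferred context.
void RecordOnWorkerThread(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferredCtx = nullptr;
    device->CreateDeferredContext(0, &deferredCtx);

    // ... bind state and issue draw calls on deferredCtx as usual ...

    // Bake everything recorded so far into a command list.
    deferredCtx->FinishCommandList(FALSE, outList);
    deferredCtx->Release();
}

// Main thread: every command list still has to be executed on the one
// immediate context, so final submission is serialized no matter how
// many threads did the recording.
void SubmitOnMainThread(ID3D11DeviceContext* immediateCtx,
                        ID3D11CommandList* cmdList)
{
    immediateCtx->ExecuteCommandList(cmdList, FALSE);
    cmdList->Release();
}
```

That serialization on the immediate context is exactly the kind of bottleneck a thinner API could plausibly remove.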
I think this release says a lot about where DX has gone, and about what Nvidia, AMD, and the big developers wanted but didn't get when negotiating with Microsoft. I wouldn't worry too much about the draw-call performance increase, as it's not something developers can really build around until DX/OpenGL are gone for gaming purposes anyway, though it might give performance boosts at some point in the future. But the other changes, like better threading support and skipping parts of the fixed-function pipeline, might very well bring a true change and performance benefits on today's CPUs and GPUs.
But the nagging feeling I have is that a low-level API is bad overall for PC gaming. It's not just that it's AMD-only; it's GCN-only with the current generation of cards. If a new fixed-function stage gets added, or what shaders can do gets expanded, this API could become obsolete or accumulate its own cruft and poorly-performing corners. The abstraction layer of DX and OpenGL has let GPUs keep getting better and changing their architectures, which has served us well for well over a decade. A low-level API might force vendors into keeping backwards compatibility in a way that is detrimental to the industry's hardware development.
I need to see what this API looks like to know how much of a problem it's going to be. Either Nvidia and Intel will be able to implement it as well, in which case it's just an OpenGL/DX competitor updated for modern concerns, or it's a genuinely low-level API and AMD will have as much trouble supporting it into the future as their competitors would. I'm hoping it's more of a competitor with the extra flexibility the big developers wanted than a true low-level API.
That is what I originally thought about NV's PhysX, that it was bad overall for PC gaming. ATI introduced tessellation with TruForm (used in Unreal Tournament) and tried to promote it again later with a Ruby demo for the HD 2900 XT (although they had dropped hardware TruForm with the 9700 Pro), yet Nvidia basically ignored tessellation for nearly a decade before it became a standard with DX11. GPU-based physics has been a tad more successful, but still, it's not quite ubiquitous yet.
However, Mantle is more like Nvidia's CUDA than a specific feature like PhysX (which itself runs on NV's CUDA). We have had CUDA for nearly 7 years now, and an 8800 GTX can pretty much run PhysX just as well as it ever did (within its own capacity). I know that Just Cause 2's CUDA filters require GT200 rather than G80 (IIRC), but the point is that with GCN being the baseline requirement of Mantle, I think AMD will simply build upon GCN without changing too much for the upcoming console generation of games. It will probably just be GCN 2.0, 2.1 and so on going forward, without moving to a completely new architecture - given that GCN, unlike the rather different VLIW-based architecture before it, is flexible and forward-looking in its essence.
It's not as if the console API is only going to be good for 2-3 years of PC ports and then start to hold everything back. Look at Skyrim, which is vastly better-looking than Oblivion. It came out many years later on the aging consoles, yet showed how the same hardware could still produce a dramatically better game: the devs were still finding more efficient ways of coding to the metal after all those years, and getting amazing results. Bottom line, we still got SKYRIM on those ancient consoles. GTA IV is another story - its problems had more to do with draw calls and memory management (and it could have benefited drastically from a PC API of the same nature, which would have made porting easier). Look at how long it took the devs to improve Skyrim performance with patches for the PC version.
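To make the draw-call point concrete: on DX11 the usual workaround is to batch, because every individual draw call burns CPU time in the runtime and driver. A hedged D3D11-flavoured sketch (buffer and shader setup omitted; the function names and parameters here are illustrative):

```cpp
#include <d3d11.h>

// Naive approach: one draw call per object. Each call crosses the
// runtime/driver boundary, so CPU overhead scales with object count.
void DrawNaive(ID3D11DeviceContext* ctx, UINT indexCount, UINT objectCount)
{
    for (UINT i = 0; i < objectCount; ++i)
    {
        // ... update a per-object constant buffer here ...
        ctx->DrawIndexed(indexCount, 0, 0);
    }
}

// Batched approach: one instanced call draws every copy of the mesh,
// with per-object data supplied via a second vertex buffer stream.
void DrawBatched(ID3D11DeviceContext* ctx, UINT indexCount, UINT objectCount)
{
    ctx->DrawIndexedInstanced(indexCount, objectCount, 0, 0, 0);
}
```

Mantle's pitch is essentially that the first pattern stops being prohibitively expensive, which is much closer to how the console APIs already behave.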
As for comparing it to Glide - heck, some games ran SEVERAL times faster on Glide than on DirectX (UT '99), or even OpenGL. DX7 and below were abysmal - just look at how Half-Life 1 ran in Direct3D compared to OpenGL on older rigs. I don't expect Mantle to run THAT much faster than DX11, DX11.1 or DX11.2, but a minimum 10% speed increase is a modest expectation, if I had to guess. The gain over DX11.2 might be smaller than over DX11.0 (which is obviously one of the reasons M$ just had to do DX11.2 for the XBone). :sneaky:
Imagine the Kaveri APU (Steamroller-based CPU with a GCN-based GPU) arriving on socket FM2+ in Q1 2014, benefiting from Mantle more than any other Radeon thanks to the big efficiency boost in CPU and bandwidth utilization! There could easily be more than a 20% gain to be seen there.
One thing for sure is that AMD is now working more closely with game devs than ever before. If only ATI had not given up on TruForm, though it did look funny in its infancy. AMD has been maturing as of late, learning from Nvidia and from their own failures. I just hope AMD keeps it on the up and up, without moments like the 65nm Athlon (weaker than the 90nm part), Phenom I, Bulldozer, or the HD 4890 (800 SPs instead of 960) ever happening again - because AMD really needs all the constructive momentum they can possibly achieve.