No, it is not. It is optimized for AMD hardware.
Otherwise nVidia hardware would be faster with DX12 like they are in King of Wushu or Fable Legends.
Umm no. I know... you're one of those fanbois who doesn't know the first thing about how GPUs function. The MSAA also wasn't broken. May I suggest you read the ExtremeTech article on the topic.
As for what's going on... here's a course...
It's not about optimizations. It's about Hardware limitations. These limitations affect both nVIDIA and AMD GPUs under Ashes of the Singularity but in different ways.
Take Parallelism for example...
The difference between Maxwell and Maxwell 2 is that Maxwell's Grid Management Unit can only send either a Graphics task or 32 Compute tasks to the Work Distributor. It cannot send both in Parallel.
Maxwell 2 changes this. With Maxwell 2, the communication between the Grid Management Unit and the Work Distributor works in Parallel.
The problem is that Maxwell 2 still only contains a single Grid Management Unit. That remains a bottleneck.
nVIDIA's Parallelism, under Maxwell 2, is thus limited to 1 Graphics and 31 Compute tasks. AMD's Parallelism, under GCN 1.1 (290 series) and GCN 1.2, is limited to 1 Graphics and 64 Compute tasks.
Another difference is that AMD's GCN 1.1 (290 series)/GCN 1.2 have 8 independent Asynchronous Compute Engines, each able to schedule and prioritize work independently of the others. With Maxwell 2, it's a single Grid Management Unit. You can see why GCN 1.1 (290 series)/GCN 1.2 can best take advantage of the available compute resources.
Take a look at all of those light sources floating around in Ashes of the Singularity. Each unit emits its own light sources in Parallel to other units. Each one of those light sources is a Compute task.
Therefore, if there are more than 31 Compute tasks (assuming there is also a Graphics task, which there ought to be given the amount of Rasterization going on), it takes two cycles for Maxwell 2 to assign the tasks to the Work Distributor. This looks to be the culprit, and it would explain why Maxwell 2 tends to match, but not beat, AMD's GCN 1.1/1.2 architecture.
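To put numbers on that, here's a quick Python sketch of the arithmetic. It is only a toy model of the limits quoted in this post (31 Compute tasks alongside a Graphics task for Maxwell 2, 64 for GCN 1.1/1.2). The function name and the example count of 40 light sources are made up for illustration, and it does not model real hardware scheduling.

```python
import math

# Per-cycle Compute limits as claimed in this post (not verified hardware specs).
MAXWELL2_COMPUTE_SLOTS = 31  # alongside 1 Graphics task
GCN_COMPUTE_SLOTS = 64       # alongside 1 Graphics task


def dispatch_cycles(compute_tasks, slots_per_cycle):
    """Cycles needed to hand all Compute tasks to the Work Distributor."""
    if compute_tasks <= 0:
        return 0
    return math.ceil(compute_tasks / slots_per_cycle)


# Hypothetical scene: 40 light sources, one Compute task each.
lights = 40
print(dispatch_cycles(lights, MAXWELL2_COMPUTE_SLOTS))  # 2
print(dispatch_cycles(lights, GCN_COMPUTE_SLOTS))       # 1
```

In this model, anything from 32 to 62 Compute tasks costs Maxwell 2 a second cycle, while GCN still fits them in one.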
I'm quite certain that Pascal will incorporate more than a single Grid Management Unit for this very reason.
Since Ars Technica showed that a 290x can nearly match a GTX 980 Ti, and a GTX 980 Ti is a near match for a Fury-X, we can conclude that the 290x and the Fury-X are a near match under Ashes of the Singularity. This points to a bottleneck shared by the Hawaii and Fiji architectures.
So we have to look at the nature of Ashes of the Singularity. Ashes of the Singularity does two things in a big way.
1. Makes ample use of Asynchronous Shading.
2. Draws MANY units onto the screen (requiring many Triangles or Polygons).
Since both Fury-X and the 290x share the same Asynchronous Compute Engines, but with Fury-X having more compute resources at its disposal, we can conclude that if Asynchronous Shading and Compute resources were the bottleneck for Fiji and Hawaii... we'd see Fiji faring better than Hawaii. This is not the case.
Since both Fiji and Hawaii retain the same number of Hardware Rasterizers (and the same Peak Rasterization rate expressed in Gtris/s), we can conclude that both are bottlenecked by their Peak Rasterization rate (ability to draw triangles/polygons).
Since the GTX 980 Ti has a much higher peak rasterization rate, we would expect the GTX 980 Ti to overpower the Fiji and Hawaii cards. This is not the case. Therefore we can conclude that the GTX 980 Ti is being limited by its Asynchronous Compute capabilities.
Fiji and Hawaii are bottlenecked by their Peak Rasterization rates under Ashes of the Singularity while Maxwell 2 is bottlenecked by its ability to handle Asynchronous Shading.
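The reasoning boils down to "the slower stage sets the pace". Here is a rough Python sketch of that idea. The numbers are placeholders in arbitrary units, picked only to mirror the argument (same raster limit for Hawaii and Fiji, more compute on Fiji, higher raster but tighter async compute on Maxwell 2). They are not measurements or real specs, and `limiting_stage` is a made-up helper.

```python
def limiting_stage(raster_limit, compute_limit):
    """Return (achieved rate, name of the stage that caps it)."""
    if raster_limit <= compute_limit:
        return raster_limit, "rasterization"
    return compute_limit, "async compute"


# (raster limit, async compute limit): placeholder values, NOT benchmarks.
gpus = {
    "290x (Hawaii)": (100, 140),
    "Fury-X (Fiji)": (100, 200),
    "GTX 980 Ti (Maxwell 2)": (150, 100),
}

for name, (raster, compute) in gpus.items():
    rate, stage = limiting_stage(raster, compute)
    print(f"{name}: rate={rate}, limited by {stage}")
```

With these placeholders all three cards land on the same rate, but for different reasons, which is the pattern the benchmarks seem to show.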
Warning issued for personal attack.
-- stahlhart