Tahiti is not pixel fill-rate limited, or at least that is not a bullet-proof assumption. The GPU clock scales all fill-rates and computational power by the same factor, so there simply is no way to tell which part is the bottleneck.
Yes it is. You can check it by looking at how overclocking affects performance in certain games. I am not going to go through this again. I've done it before. GCN is pixel fill-rate limited. The entire front-end of Tahiti is a bottleneck. AMD's cards in general have been pixel fill-rate deficient for many generations now. AMD managed to increase ROP efficiency by 50%, which is why they have kept the count at 32 ROPs since the 6970, but it's still not enough. NV is already on 48 ROPs. They have a better balanced design. The 580 had a major texture fill-rate limitation, which was corrected in the 680.
If you look at various games and analyze benchmarks and resolution, you can usually find a bottleneck in a particular architecture. We knew 5870/6970 were tessellation bottlenecked, 580 was texture fill-rate bottlenecked, and Tahiti needs more ROPs/faster ROPs. I called it a long time ago that 580 was texture fill-rate limited and NV nearly tripled it with 680.
Here is another hint: HD7870 and HD7970GE have nearly identical pixel fill-rate but HD7970GE has
60% more texture performance,
87% more memory bandwidth,
68% more shader/GFLOPs performance.
In Crysis 3, which stresses everything, HD7970GE is less than 50% faster than HD7870, but it should be much faster.
Looking at high-resolution gaming that stresses textures, we can note that HD7970GE is only 48% faster than HD7870 at 1600p:
http://www.computerbase.de/artikel/grafikkarten/2013/nvidia-geforce-gtx-760-im-test/4/
The common denominator is that HD7970GE has pixel fill-rate, ACE and geometry engine bottlenecks, since those are the areas least improved over the 7870. If HD7970GE were not pixel fill-rate limited, it would be beating the 7870 by a much larger margin, since GCN scales well with shaders & textures. That means pixel fill-rate is a key component in many games.
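To make the spec comparison above easy to re-check, here is a rough sketch of the arithmetic. The unit counts, clocks and bandwidth figures are not quoted anywhere in this thread; they are my assumption of the reference spec sheets (HD7870: 1280 shaders, 80 TMUs, 32 ROPs, 1000 MHz, 153.6 GB/s; HD7970GE: 2048 shaders, 128 TMUs, 32 ROPs, 1000 MHz base / 1050 MHz boost, 288 GB/s), and the names GpuSpec, throughput and advantage_pct are just made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    # Assumed reference specs, not measured values.
    name: str
    shaders: int
    tmus: int
    rops: int
    bandwidth_gb_s: float


def throughput(spec: GpuSpec, clock_mhz: float) -> dict:
    """Theoretical peak rates at a given core clock."""
    ghz = clock_mhz / 1000.0
    return {
        "pixel_fill_gpix_s": spec.rops * ghz,        # 1 pixel per ROP per clock
        "texture_fill_gtex_s": spec.tmus * ghz,      # 1 texel per TMU per clock
        "shader_gflops": spec.shaders * 2 * ghz,     # 2 FLOPs (FMA) per ALU per clock
        "bandwidth_gb_s": spec.bandwidth_gb_s,       # does not depend on core clock
    }


def advantage_pct(big: dict, small: dict) -> dict:
    """Percentage by which each rate in `big` exceeds the one in `small`."""
    return {k: (big[k] / small[k] - 1.0) * 100.0 for k in big}


HD7870 = GpuSpec("HD7870", shaders=1280, tmus=80, rops=32, bandwidth_gb_s=153.6)
HD7970GE = GpuSpec("HD7970GE", shaders=2048, tmus=128, rops=32, bandwidth_gb_s=288.0)

base_7870 = throughput(HD7870, 1000.0)
at_base = advantage_pct(throughput(HD7970GE, 1000.0), base_7870)
at_boost = advantage_pct(throughput(HD7970GE, 1050.0), base_7870)

for label, result in (("1000 MHz", at_base), ("1050 MHz boost", at_boost)):
    print(f"HD7970GE @ {label} vs HD7870:")
    for metric, pct in result.items():
        print(f"  {metric}: {pct:+.1f}%")
```

With those assumed specs, the pixel fill-rate gap is 0% at equal clocks and 5% at boost, while texture and shader throughput are 60% higher at equal clocks and 68% higher at boost, and memory bandwidth is 87.5% higher either way. That is the imbalance I am pointing at: everything except the ROP side grew substantially.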
The increased number of ACEs may help performance by better utilizing the ALUs in the CUs, but how that affects gaming remains to be seen. Bitcoin mining is irrelevant here.
Again, bitcoin mining benefits from efficient ALUs. With more ACEs, the ALUs are better utilized. If you read up on the GCN architecture, ACEs are critical for scheduling compute workloads.
"Meanwhile on the compute side, AMD’s new Asynchronous Compute Engines serve as the command processors for compute operations on GCN. The principal purpose of ACEs will be to accept work and to dispatch it off to the CUs for processing. As GCN is designed to concurrently work on several tasks, there can be multiple ACEs on a GPU, with the ACEs deciding on resource allocation, context switching, and task priority."
Since games use compute, more ACEs will allow GCN to perform compute calculations faster because 2 ACEs won't be bottlenecking as much.
Tessellation is another factor where GCN lags behind Kepler. Adding a 3rd geometry engine will help here too.
The card you just listed addresses 3 of the 4 weak areas in Tahiti: ROPs, ACEs and geometry engines. There is no way it would only be 25% faster on average based on the specs you listed. It is actually mathematically impossible, since I have already shown you that if texture & shader performance alone improve 26% on Tahiti, performance improves 19-21%. The problem is that Tahiti stops scaling linearly after a while: shader & texture performance increase by much more than pixel fill-rate, since the higher GPU clock is applied to far more shaders & TMUs but to only 32 ROPs.
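One rough way to read the 26% -> 19-21% scaling figure is through a simple Amdahl-style model: assume some fraction p of frame time scales with shader & texture throughput and the rest (ROPs, geometry, everything else) does not scale at all. That model is purely my simplification for illustration, not something AMD or any review states, and the function names below are made up. A minimal sketch:

```python
def predicted_speedup(p: float, unit_gain: float) -> float:
    """Overall speedup if a fraction p of frame time gets `unit_gain` times
    faster and the remaining (1 - p) stays the same (Amdahl-style model)."""
    return 1.0 / ((1.0 - p) + p / unit_gain)


def implied_scaled_fraction(unit_gain: float, observed_speedup: float) -> float:
    """Invert the model: which p explains `observed_speedup` for a given
    `unit_gain` on the scaled part?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / unit_gain)


# Figures from the post: +26% shader & texture throughput -> +19% to +21% fps.
UNIT_GAIN = 1.26
p_low = implied_scaled_fraction(UNIT_GAIN, 1.19)
p_high = implied_scaled_fraction(UNIT_GAIN, 1.21)

print(f"implied shader/texture-bound share: {p_low:.1%} to {p_high:.1%}")
print(f"share left for ROPs, geometry, etc.: {1 - p_high:.1%} to {1 - p_low:.1%}")
```

Under that assumption, roughly 77% to 84% of frame time tracks shader & texture throughput, and the remainder is the part that more shaders and TMUs cannot touch. Real games are not this clean, so treat it as a back-of-the-envelope check rather than a prediction.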
You seem to have added almost no performance increases related to:
1) 3rd geometry engine
2) doubling of ACEs
3) 50% increase in ROPs.
You are assigning just a 5-6% increase in performance from those 3 factors. Does not compute. Your specs are either too high or your estimate for increase in speed is too low.
The other thing AMD needs to do is up the texture performance. Once they fix the pixel fill-rate, they'll be quickly running into a TMU problem because that will be the next bottleneck.
For FP16 textures, Titan is 88% faster than Tahiti:
http://techreport.com/review/24381/nvidia-geforce-gtx-titan-reviewed/6
GFLOPs/shader performance & memory bandwidth are the least important areas to address in Tahiti. It has shader performance by the truckload. AMD needs to focus on ROPs, ACEs, geometry and textures. It's a little more complicated, though: while Tahiti already has stronger shader performance than even the Titan, compute features in games mean you still want more ALUs for global illumination, SSAA passes, etc. Strictly from a shader performance standpoint, though, GCN has more than enough.