Look, we've done enough talking. There are some pretty straightforward ways to test most of the 'theories' being floated around here.
To sum up, there are a couple of fundamental questions to be answered:
1) Why does the 9600GT perform as closely as it does to the 8800GT, given the extreme reduction in shaders?
2) How heavily are most modern games tied to shader (versus texture) performance?
With regard to question #1, two theories are being floated. First, that the 9600GT is simply a cut-down but tweaked G92 with higher clock speeds, and that it performs close to the 8800GT because the extra shading power of the 8800 just isn't required for a lot of games. Second, that the 9600GT's architectural improvements amount to more than a few simple 'tweaks' and that its shaders are just more efficient somehow.
This is an easy question to answer as long as we have two people, one with a 9600GT and another with a G92 (any flavor except GT 256MB), who are willing to run benchmarks. First, disable enough 'units' on the G92 to make it even with the 9600GT in terms of shaders and texture units. Then equalize the clocks (core, shader & memory). This should yield parts with the same number of shaders and texture units, with equal clocks. We benchmark and look at the results.
Question number two is simple to address as well. Take an 8800GT 512MB and reduce the shader clock in increments of approximately 1/7th of the maximum value, then compare the results against disabling one 16-shader 'unit' at a time. There are seven 16-shader 'units' in the 8800GT, so reducing the shader clock by 1/7th of its maximum value should cut shader power by about as much as disabling one of the units. The only difference between the two is that disabling units also removes their texture units, while lowering only the shader clock leaves all the texture units active.
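To make the run list concrete, here is a rough sketch of the pairing described above: each "disable k units" run is matched with the shader clock that should give the same shader power on the full card. The function name `sweep` and the 1500 MHz figure are my own placeholders (1500 MHz is an assumed shader clock, so substitute whatever your card actually reports), and the model assumes shader power scales linearly with units times clock.

```python
TOTAL_UNITS = 7        # seven 16-shader units on the 8800GT, per the post
SHADERS_PER_UNIT = 16


def sweep(max_shader_clock_mhz):
    """Pair each 'disable k units' run with the equivalent reduced shader clock.

    Assumes shader power is proportional to active units * shader clock.
    """
    rows = []
    for disabled in range(TOTAL_UNITS):      # 0..6, at least one unit stays on
        active = TOTAL_UNITS - disabled
        rows.append({
            "units_disabled": disabled,
            "active_shaders": active * SHADERS_PER_UNIT,
            # clock for the all-units-enabled run with matching shader power
            "matching_clock_mhz": round(max_shader_clock_mhz * active / TOTAL_UNITS),
        })
    return rows


if __name__ == "__main__":
    for row in sweep(1500):                  # 1500 MHz is an assumed value
        print(row)
```

The lowest clock steps may not be reachable with real tools, so in practice only the first few rows may be usable.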
Voila, you have a way to compare the 9600GT's architecture to G92, and a way to look at texturing power versus shader power. It's not perfect, as shaders and texture units are tied together, but you can easily get a number of useful combinations.
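You could, for example, disable three units in one run and four units in another, while making sure the run with four units disabled gets a correspondingly higher shader clock. With four units active in one run and three in the other, the clocks need to be in a 3:4 ratio, for example 3/7ths versus 4/7ths of the maximum value, which is one 1/7th step apart. This gives you two configurations with equal shader power, but with slightly fewer texture units in the second.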
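For anyone who wants to check the clock for a given pair of configurations, here is a quick sketch. It assumes shader power is simply active units times shader clock, and `matching_clock` is a made-up helper name, not part of any tool.

```python
def matching_clock(ref_units, ref_clock_mhz, test_units):
    """Shader clock at which test_units matches the reference units * clock product.

    Assumes shader power scales linearly with active units and shader clock.
    """
    if test_units <= 0:
        raise ValueError("test_units must be positive")
    return ref_units * ref_clock_mhz / test_units


if __name__ == "__main__":
    # Four active units at an arbitrary 1200 MHz versus three active units.
    print(matching_clock(4, 1200, 3))
```

With four active units at 1200 MHz (an arbitrary example value), the three-unit run would need 1600 MHz to match under this model.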
Let's stop talking and start benching. I've got an 8800GT 512MB and a monitor that goes up to 1920x1200, and I'm game. Does anyone have a 9600GT with a monitor going up to 1920x1200?