BFG:
You've repeatedly expressed the belief that two products should be benchmarked with identical settings. On the surface, I have no problem with that; however, there are other considerations to bear in mind, IMO.
Let's make a quick analogy here. Two video cards are like two machines in a factory that produce different manufactured items depending on their settings (a blue nail or a red nail, long or short). These two machines come from two different companies, and of course each company wants to sell your factory its machine, backing up its sales pitch with plenty of PR/marketing material. So your trusted factory QA guy goes through, makes sure each machine is set up identically, and then runs each one for, let's say, an hour. At the end of that hour, the number of nails each produced is counted. The machine that produced the most will obviously be more enticing to the factory manager when he makes his final purchase decision. That's pretty much the extent of your argument.
Here's where my problem is: NO ONE BOTHERED TO CHECK WHETHER THE NAILS THEMSELVES WERE IDENTICAL! You see, my contention is this: settings are just that, settings, and while our QA guys (hardware reviewers) are so busy making sure their test settings are identical, they never check whether the finished product is the same. Case in point: Nvidia's texture compression problems all year long. We all read the benchmark scores, tested with identical settings (except for the V5's inability to do trilinear filtering, which made it unfair to compare to the Nvidia cards), and that's probably what stuck in most people's minds: those numbers and how the GTS ruled the roost over the competition this summer. What didn't come to light until fairly recently is that the final output of those identical settings was in no way, shape, or form similar enough to make the comparison fair. I cannot stress this enough: almost every single reviewer out there utterly failed to comment on the image quality problems the GTS was creating with texture compression enabled.
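To make the "nails aren't identical" point a little more concrete, here's a rough, self-contained sketch in Python (the numbers and function name are mine, and it is not modelling what the GTS hardware actually does) of how reducing color precision, the kind of thing lossy texture compression can involve, collapses a smooth sky-like gradient into a handful of visible bands. Two cards can both report "texture compression: on" in the driver panel and still hand you very different pixels:

```python
import numpy as np

# A smooth sky-like gradient: all 256 shades of one 8-bit color channel.
gradient = np.arange(256, dtype=np.uint8)

def quantize(channel, bits):
    """Truncate an 8-bit channel to 'bits' bits of precision (coarser color steps)."""
    step = 256 // (1 << bits)
    return (channel // step) * step

# 16-bit RGB565 texture data keeps only 5 bits for red/blue and 6 for green.
banded = quantize(gradient, 5)

print("distinct shades before:", len(np.unique(gradient)))  # 256
print("distinct shades after: ", len(np.unique(banded)))    # 32
```

Thirty-two shades instead of 256 is exactly the sort of banding a frame rate counter will never show you.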
So what Dave is arguing for is, IMO, quite sound, fair, and logical. Consumers are buying the finished, boxed product, and drivers are a factor in that overall purchase. The V5 could take advantage of its texture compression in a really nasty benchmark like Reverend's Quaver demo without degrading the final product (the displayed image), while the Nvidia boards could not; that, to me, means the focus of a reviewer's attention, the onus of benchmarking, should shift towards the final product. And by bringing up Nvidia's texture compression problems, I'm not trying to bash Nvidia or play favorites for 3dfx. It's just a great example of how treating equivalent settings as the crux of supposedly objective testing completely fails to take into account the final product of that testing. These products are not identical in how they perform or in their hardware implementations, so blindly assuming that identical driver settings will result in identical hardware operations is a fallacy. Had reviewers been paying closer attention, they might've noticed that the V5 wasn't performing trilinear filtering in Q3 even though the game allowed it to be selected as a software setting.
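And checking the final product doesn't have to be a purely subjective eyeball test. As a sketch of what I mean (the filenames are hypothetical, and PSNR is just one crude fidelity measure among many), a reviewer could capture the same frame of the same demo on each card and publish an image-difference number right next to the fps number:

```python
import numpy as np
from PIL import Image

def psnr(path_a, path_b):
    """Peak signal-to-noise ratio between two same-sized screenshots (higher = closer)."""
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.float64)
    b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# Hypothetical filenames: the same frame of the same demo, captured on each card,
# plus a reference frame rendered with compression disabled.
print("card A vs. reference:", psnr("card_a.png", "reference.png"))
print("card B vs. reference:", psnr("card_b.png", "reference.png"))
```

If two cards post similar frame rates but one is measurably further from the reference image, then the settings were never really "identical" in any way that matters to the buyer.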
Now, you're going to come back and ask: what else can be used as a baseline for fair and objective testing? To be honest, I don't know, and things are only going to get worse next year as reviewers scramble to determine which company has the most powerful, useful, and flexible T&L for games. Will they stupidly use proprietary, synthetic benchmarks (like Treemark), or will they use only 3DMark 2001?