One thing I find mildly disturbing: 85% of reviewers are still using Intel's recommended scene to benchmark LightWave.
I found it fascinating that in an article at Ace's Hardware, alternative scenes were benchmarked and the AXP jumped from 20% slower to more than 5% faster than equivalent P4s.
I'm not singling out AnandTech when I specify LightWave, as I've no idea which scene(s) AnandTech may be using.
Photoshop 5/6/7 is another benchmark whose results can vary DRAMATICALLY depending on which filters you test and on what image.
If you know what you're doing, you can pretty easily pick out filters that show an AXP crushing the P4, or vice versa; you could just as easily pick out a decent set of filters that perform extremely well on a G4.
Hell, given enough time one could pick out a handful of filters on which even the VIA C3 performs respectably.
You have to be very careful about which filters are applied and to what image if you want to benchmark Photoshop performance fairly.
Even worse: a while back some sites started using nBench 1 as a supposedly reliable indicator of processor performance.
While nBench isn't as blatantly biased as one might expect, it still clearly and strongly favors AMD, as one would expect given that it was designed by AMD.
AnandTech's server benchmarks use an actual script taken from a typical "day in the life" of the web, database, and forum servers. This script is replayed against an old database using the hardware in question. Such a method is about as real-world as you can get, and highly relevant.
I agree; I wholeheartedly applaud AnandTech for their server benchmark based on a recording from their own web server. That was a 100% true real-world test of performance, and an extremely valuable benchmark.
I put more faith in that one benchmark than in virtually any other benchmark AT has ever used.
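Not that AnandTech has published their harness, but the record-and-replay idea is simple enough to sketch. Here's a minimal, hypothetical version in C using libcurl, assuming nothing more than a text file with one logged request URL per line (the file name and everything else here is made up for illustration):

/* replay.c -- a crude log-replay benchmark sketch (not AnandTech's tool).
 * Reads one URL per line from a recorded request log and replays them.
 * Build: cc replay.c -lcurl -o replay */
#include <curl/curl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Throw response bodies away; we only care how fast the server answers. */
static size_t discard(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    (void)ptr; (void)userdata;
    return size * nmemb;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s request-log.txt\n", argv[0]);
        return 1;
    }
    FILE *log = fopen(argv[1], "r");
    if (!log) { perror("fopen"); return 1; }

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, discard);

    char url[2048];
    long ok = 0;
    time_t start = time(NULL);

    /* Replay every recorded request against the box under test. */
    while (fgets(url, sizeof url, log)) {
        url[strcspn(url, "\r\n")] = '\0';
        if (url[0] == '\0')
            continue;
        curl_easy_setopt(curl, CURLOPT_URL, url);
        if (curl_easy_perform(curl) == CURLE_OK)
            ok++;
    }

    double secs = difftime(time(NULL), start);
    printf("%ld requests replayed in %.0f s (%.1f req/s)\n",
           ok, secs, secs > 0 ? ok / secs : (double)ok);

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    fclose(log);
    return 0;
}

The real thing would obviously preserve request timing, POST bodies, and concurrency, but even this crude loop captures why the method is so compelling: the load is literally the load the server saw in production.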
I've only briefly looked at CSA Research's OfficeBench and have precious little first-hand experience with it and what it does, so I've no idea whether it's a suitable candidate for use in reviews, but it sounds as though it may at least be worth looking into.
Funny that it needed to be "registry patched" to enable SSE support on the Palomino, when the software ought to detect such instructions independent of the CPU make or model. Funny too that Intel chips have almost consistently dominated the benchmark for the past two years.
I don't believe that was an example of bias at all; the version of WME used in SysMark 2001 was a perfect example of poor coding that queried only Intel processors for SSE capability.
The retail version suffered from the same problem.
It's not Bapco's fault, but poor coding practices on the part of MS; Bapco merely used the latest version of WME available at the time, which happened to carry the bug.
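For anyone curious what that bug actually looks like: SSE detection is supposed to key off the CPUID feature flags, which work identically on any x86, not off the vendor string. A rough sketch of the difference, assuming GCC/Clang's <cpuid.h> (I've obviously never seen WME's source; this is just the pattern being described):

/* sse_detect.c -- the right way vs. the vendor-gated way (a sketch,
 * assuming GCC/Clang's <cpuid.h> on x86; not WME's actual code). */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

/* Correct: CPUID leaf 1, EDX bit 25 flags SSE, regardless of vendor. */
static int has_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 25) & 1;
}

/* The buggy pattern: ask about SSE only after confirming "GenuineIntel",
 * so a Palomino that sets the very same SSE bit is never even queried. */
static int has_sse_vendor_gated(void)
{
    unsigned int eax, regs[3];
    char vendor[13];
    if (!__get_cpuid(0, &eax, &regs[0], &regs[2], &regs[1]))
        return 0;
    memcpy(vendor, regs, 12);   /* EBX, EDX, ECX spell out the vendor */
    vendor[12] = '\0';
    if (strcmp(vendor, "GenuineIntel") != 0)
        return 0;               /* wrong: AMD's SSE silently ignored  */
    return has_sse();
}

int main(void)
{
    printf("feature-flag check: %d, vendor-gated check: %d\n",
           has_sse(), has_sse_vendor_gated());
    return 0;
}

The fix is trivial: drop the vendor test and trust the feature bit, which is exactly what CPUID's feature flags are for.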
What I do find amazing is that simply making sure the Athlon's SSE units are recognized pushes the SysMark 2001 Internet score up by 18%!
The SysMark 2001 Internet score is a combination of SIX applications. Improve the media encoder's performance somewhat (most of the time SSE improves an application's score by 5-30%) and you get an 18% boost even though the rest of the benchmark does not change at all. It's not hard to see that WME must have had an INCREDIBLY high weighting for one application to have such a huge impact.
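Bapco has never published the formula, but assuming the composite is a weighted geometric mean (my assumption, a common construction for composite scores), you can back out the weight WME would need: if only WME speeds up by a factor of (1+g) and the overall score rises 18%, then (1+g)^w = 1.18, so w = ln(1.18)/ln(1+g). A back-of-envelope sketch:

/* implied_weight.c -- back-of-envelope only; assumes a weighted geometric
 * mean composite, which Bapco has never confirmed or documented.
 * Build: cc implied_weight.c -lm -o implied_weight */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double composite_gain = 0.18;               /* observed +18%      */
    const double wme_gains[] = { 0.20, 0.30, 0.50 };  /* plausible speedups */

    /* If one app improves by (1+g) and the composite by 18%, then
       (1+g)^w = 1.18, so the implied weight w = ln(1.18) / ln(1+g). */
    for (int i = 0; i < 3; i++) {
        double w = log(1.0 + composite_gain) / log(1.0 + wme_gains[i]);
        printf("WME speedup %2.0f%% -> implied weight %.2f\n",
               wme_gains[i] * 100.0, w);
    }
    return 0;
}

Even granting WME a generous 50% SSE speedup, that implies roughly 40% of the composite riding on one of six applications; at a 20% speedup the implied weight comes out above 0.9, which is absurd.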
My question would be: why did Bapco put such a hefty weight on WME performance?
Why include the WME encoder at all? According to research by eTesting Labs, very few people reported using WME. I presume Bapco must have done some research of their own, but they've never publicized any.
To me that has always spoken volumes about how poorly designed SysMark 2001 was.
WME is still weighted WAY too heavily even in SysMark 2002, from what I've seen. I recall Bapco used to have WME run in the background throughout the entire test; I wonder if that's still true for SysMark 2002.
Funny too that Intel chips have almost consistently dominated the benchmark for the past two years.
In my experience Bapco SysMark has favored Intel chips through revisions 98, 2000, 2001, and 2002. I did find it quite interesting that a new version of SysMark tended to be released almost immediately after AMD processors caught up to Intel on the previous version, and that each new revision initially favored Intel processors dramatically.
That said, it may simply be natural progression; Bapco has revised their benchmarks on a fairly regular basis for the last few years.
Hence I do not believe it is an indication of bias.
My reasons for disliking Bapco are that I find their testing methodology to be of dubious merit and that their tests do not accurately reflect the manner in which people typically multitask.
Add to that their almost complete lack of documentation on the operations and inner workings of SysMark, and the absence of any published research into people's actual application usage.
All of this, combined with Bapco's potentially dubious background, makes SysMark of very questionable reliability.
Winstone was once a viable alternative, but they've let the suite age, and the majority of its tests are no longer applicable to modern software.