It's also very easy to "cheat" on the SPECmarks by adjusting your compiler settings and operating-system parameters. People have been doing this for the past 15 years; Sun, HP, and IBM all got very good at pumping up their SPEC92 numbers to sell their Unix workstations.
The good thing about SPEC is that it's actually a composite of several tests. You can compare the results of each subtest to see the strengths and weaknesses of each machine, and each subtest has a well-documented algorithm that it's testing. But comparing just the final composite score is meaningless. A good final SPEC score could mean that you have a balanced system, or it could mean that you're AWESOME in one test while you totally suck at another.
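Here's a minimal sketch of why the composite hides that. It assumes the composite is a geometric mean of per-subtest ratios (which is how SPEC scores are built); the subtest numbers themselves are made up purely for illustration.

```python
from math import prod

def composite(ratios):
    """Geometric mean of per-subtest ratios (the way SPEC composites are formed)."""
    return prod(ratios) ** (1 / len(ratios))

# Hypothetical subtest ratios for two machines -- illustrative numbers only.
balanced = [10.0, 10.0, 10.0, 10.0]   # solid everywhere
lopsided = [40.0, 10.0, 5.0, 5.0]     # awesome at one test, weak at the rest

print(composite(balanced))  # 10.0
print(composite(lopsided))  # 10.0 -- identical composite, very different machines
```

Two machines with wildly different profiles can land on the same final number, which is exactly why you have to look at the subtests.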
Also, tuning your system for SPEC does not give you the same results as tuning it for, say, GUI responsiveness.
Then there's the question of the algorithms themselves. Anyone remember the ByteMark? It was created by Byte magazine and was great for comparing computers and workstations in the early 1990s. But when the PowerPC 750 "G3" came out, it had a certain instruction that was quite powerful... and that instruction also happened to be heavily used in the ByteMark. As a result, the G3 benchmarked as over 2x faster than the Pentium II on the ByteMark, when in reality the two were much closer in performance.
Lies, damned lies, and benchmarks.