So, SPEC2000 numbers (granted, probably not optimally compiled for the A9) show the A9 and the E6700 as close, but with the E6700 winning the majority of tests. Yet Geekbench shows the A9 crushing the E6700 at stock and matching the very best overclocked score listed, and that result holds even if you ignore the encryption tests.
Point is, these cross-arch benches aren't exactly reliable or consistent.
Geekbench does have a lot of problems, and it gets even worse going cross-platform because they don't actually use the same datasets between the Windows and iOS builds. They scale the scores to calibrate the difference out, but that only works if the difference is linear and repeatable between uarchs, and I doubt it's that close. Hopefully this is improved with GB4 (along with taking the accelerated encryption tests out of the main scores...). I probably wouldn't have brought up a GB comparison if not for the thread opening with one.
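To make the linearity objection concrete, here's a toy sketch (all numbers made up, not real Geekbench data) of why rescaling between different datasets only cancels out if every uarch is slowed by the same factor on the other dataset:

```python
# Hypothetical illustration: a scaling factor calibrated on one
# reference uarch is applied to a different, cache-sensitive uarch.

# raw times (seconds) on the reference uarch for the two datasets
ref_small, ref_large = 2.0, 3.0
scale = ref_large / ref_small          # calibration factor = 1.5

# another uarch: the larger dataset hurts it more (e.g. smaller caches)
other_small, other_large = 2.0, 4.0

# "calibrated" result from the large dataset vs. its true small-dataset time
calibrated = other_large / scale       # 2.67s, but the real time is 2.0s
error_pct = (calibrated / other_small - 1) * 100
print(f"calibration error: {error_pct:.0f}%")  # ~33% skew from calibration alone
```

If the slowdown ratio between datasets isn't constant across uarchs, the rescaled scores carry exactly this kind of hidden skew.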
But SPEC2k has its own problems. They're subtler, which in a way makes them more insidious. The big one is how heavily ICC is tuned, or outright gamed, for it. Now, I don't know how much this applies to a current GCC or Clang/LLVM score versus whatever ICC was used for these numbers, but back when VIA wrote about it in 2011 the difference was about a 20% improvement for their processor and 25% for Atom, with both builds optimized using PGO.
http://www.centtech.com/wp-content/uploads/2014/09/WP1-NanoX2-whitepaper-1-3.pdf
There's a score breakdown, and the biggest advantage is seen in 255.vortex and 300.twolf, which were over 50% faster using ICC on the Nano. Those are also the two tests that show the biggest leads in the linked article, with very similar ~50% margins. That makes it seem pretty evident to me that ICC is optimizing (maybe breaking...) these benchmarks in a way that GCC isn't.
Less obvious but still notable are 181.mcf and 197.parser, which both showed ~33% gains under ICC; 181.mcf again shows one of the biggest differences in the linked article.
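Back-of-envelope on why this matters for cross-vendor comparisons (the 50% figures here are the illustrative ones from the whitepaper discussion, not a new measurement): when an ICC build is paired against a GCC build, the compiler delta folds straight into the apparent hardware lead.

```python
# If chip A (ICC build) shows a 50% lead over chip B (GCC build) on
# 255.vortex, and ICC alone is ~50% faster than GCC on that test,
# dividing out the compiler gain leaves essentially no hardware lead.

observed_lead = 1.50   # chip A appears 50% faster in the published score
icc_gain      = 1.50   # ICC's advantage over GCC on the same test

hardware_lead = observed_lead / icc_gain
print(f"hardware-only lead: {(hardware_lead - 1) * 100:+.0f}%")  # +0%
```

So a headline "50% faster" on these subtests can be almost entirely a compiler artifact rather than a uarch difference.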
For all of GB's faults, at least it uses similar (same?) compiler versions for the different builds. That still doesn't ensure optimizations are applied equitably across platforms, but it makes the gap a lot smaller. Compilers like GCC are also far less likely to employ benchmark-breaking optimizations because they lack the marketing incentive ICC has. The AES scores are still really bad and should be ignored, but otherwise I'd rather take a benchmark that's of questionable relevance or representativeness but fair than one that's relevant and unfair.