Yes, we complain about SPEC a lot in the RWT forums - but we also agree pretty much universally that "SPEC isn't that great of a benchmark, but everything else is worse".
SPEC2017 at least addressed one of the bigger issues with SPEC2006: several of its benchmarks had been "broken" by compilers. AFAIK none of SPEC2017's benchmarks has been broken yet. The biggest remaining issue is that the way it is used in vendor submissions is unrealistic compared to how most applications (let alone phone apps) are delivered now - vendors use the highest optimization levels the compiler is capable of, substitute alternate malloc libraries, use feedback-directed optimization, etc.
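To make the contrast concrete, here's a minimal sketch (Python driving gcc) of the two build styles. Everything here is an illustrative assumption, not an actual SPEC config: the benchmark.c filename is hypothetical, the flag sets are just representative examples, and jemalloc stands in for "some alternate allocator":

```python
import subprocess

SRC = "benchmark.c"  # hypothetical workload source, a stand-in for a SPEC component

# "Vendor submission" style: max optimization, LTO, profile feedback, and a
# swapped-in allocator (jemalloc is one common substitute).
# Note: -fprofile-use needs profile data from a prior -fprofile-generate run.
vendor_flags = ["-Ofast", "-march=native", "-flto", "-fprofile-use", "-ljemalloc"]

# "Regular developer" style, closer to how AnandTech configures its runs.
developer_flags = ["-O2"]

for name, flags in [("vendor", vendor_flags), ("developer", developer_flags)]:
    subprocess.run(["gcc", SRC, "-o", f"bench_{name}", *flags], check=True)
```

The gap between those two binaries on the same silicon is exactly the realism problem with vendor submissions.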
The way AnandTech performs its SPEC runs is actually better IMHO, because they decided to use standard flags like regular developers would, and, unless they are specifically testing overclocking-type stuff, to run CPUs at default settings with default DRAM.
Complain about SPEC all you want, but all other benchmarks are WORSE.
Benchmarks... what's good, what's bad? It's a deep rabbit hole that goes on and on. Having been in this "game" of following tech for the past 30 years, for me the simpler the better when it comes to benchmarks. Some thoughts:
1. I don't like benchmarks I can't run on my computer for free. It's hard to verify benches, and it's hard to see how my rigs compare, especially if the newer benches haven't been run on older CPUs.
2. I don't like benches that aren't portable. The less mucking up of my rig, the better.
3. I don't like benches that aren't precise - there shouldn't be a lot of variability in results from run to run (see the sketch after this list for one way to quantify that).
4. I don't like benches that don't show much difference in performance among generations of CPUs. For example, years and years ago CPUmark99 was a widely used CPU bench, but by the time of Haswell just about every CPU performed the same on it, since it could exploit neither multiple cores nor even much instruction-level parallelism.
5. I like benches where I can "see" the work or task being completed.
6. I like benches that are widely published for obvious reasons.
7. Benches are helpful, but at the end of the day you've got to see how various CPUs handle the tasks that are critical to you.
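On point 3, the quick sanity check I'd use for precision is the coefficient of variation across repeated runs. A minimal sketch - the scores below are placeholders you'd replace with your own runs:

```python
import statistics

def run_to_run_cv(scores: list[float]) -> float:
    """Coefficient of variation (stddev / mean) as a percentage.
    Lower means the benchmark is more repeatable."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)

# Placeholder scores from five hypothetical runs of the same bench.
scores = [15021, 15088, 14975, 15102, 15043]
print(f"run-to-run CV: {run_to_run_cv(scores):.2f}%")  # ~0.3% here -> precise
```

A bench holding under 1% CV run to run is one I'd trust to separate two CPUs; one bouncing around by 5% can't tell you much about a 5% generational gain.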
For these reasons lately I've been focused on Cinebench R23. It ticks all of the right boxes for me and my workload: it highlights differences in CPUs, is precise, portable, and gives me a quick indication of the ST and MT "strength" of a CPU. After that I look at specific application results that are applicable to my workflow. Yes, CB is not perfect - no bench is - but as I said, it's subjective which bench a specific person will "take to heart."
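By ST and MT "strength" I mean something like the following - a minimal sketch, with the scores entirely made up just to show the arithmetic:

```python
def mt_scaling(st_score: float, mt_score: float) -> float:
    """How many 'copies' of the single-thread score the chip
    delivers when all cores are loaded."""
    return mt_score / st_score

# Hypothetical R23-style numbers, purely for illustration.
st, mt = 1600, 26400
print(f"MT/ST scaling: {mt_scaling(st, mt):.1f}x")  # 16.5x here
```

Two numbers and a ratio, and you already know roughly where a chip sits for lightly threaded versus embarrassingly parallel work.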
Early CB R23 leaks showed Alder Lake being very strong in ST and as good as the 5950X in MT. That turned out to be a pretty good reflection of how things averaged out over broad application testing.
The thing about leaks is that they rarely divulge clocks, which makes them just about meaningless, especially if the scores are low. If the scores are really good, then we can assume highish ~5GHz clocks and get a decent idea of performance.
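That clock caveat is easy to make concrete: dividing a leaked score by an assumed clock gives a rough per-GHz figure to compare across chips, and the answer moves in lockstep with whatever clock you guessed. A minimal sketch, with every number below assumed for illustration:

```python
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Crude clock-normalized score; only as good as the clock guess."""
    return score / clock_ghz

# Hypothetical leaked ST score, evaluated under two clock assumptions.
leak = 1900
for clock in (4.5, 5.0):
    print(f"assuming {clock} GHz -> {score_per_ghz(leak, clock):.0f} pts/GHz")
# A 10% error in the assumed clock shifts the per-GHz figure by ~10%,
# which is why leaked scores without clocks are so hard to read.
```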