GPU Benchmarking Methods Investigated: Fact vs. Fiction

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Quite an interesting article by Hardware Canucks on the subject of benchmarking. They look at several variables that affect the scores that result from benchmarking games.

link

link 2
 

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
Well I guess that resolves the debate in the other thread.
Canned benchmarks are often pretty useless indicators, but well-thought-out benchmarks can be indicative of performance.

And neither ATI nor NV fanboys can complain about this, because it shows it going both ways. Sometimes in real situations ATI catches up, and sometimes NV pulls further ahead. Which also confirms what I said. You can skew any benchmarks to paint either side more positively, depending on your aims, and it validates HWC's breakdown of benchmarking procedure.

In short: Never trust canned benchmarks until they've been at least somewhat proven accurate.
And always consider the benchmarks being used, settings, and methodology before relying on a conclusion, because benchmarks can be tweaked to favour one card or another.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Benchmarks can affect the CPU differently than actual gameplay too. The RE5 benchmark makes dual-core CPUs look like they would perform much worse than they do in the actual game.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Bottom line is: any benchmark can be useful (they all measure *something*), but it is up to the benchmarker to understand what the benchmark measures, and how to put the results in the proper context.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
I find many reviewers adjust the benchmarks to favor the card that is the subject of the review. They tend to run benchmarks and games at resolutions and filtering levels that show the subject card in its best light. I can understand this, because most likely the review sample was supplied with preferences from the supplier on running certain tests in specific ways.

Example: The supplier could say, "Make sure you run the tests with high AA. We've really made some major performance gains in our AA." Or the supplier might have said, "The higher the resolution, the stronger our card's performance is. Be sure to test it at 2560*1600." Or, "This card eats Crysis for breakfast. Runs 60fps @ 1920*1200. Be sure to show that in your tests." Or whatever. I'm sure you get the idea.

In all fairness, I can understand the card (or any other hardware) supplier wanting their piece shown like this. The last thing you would want is for a reviewer to test your product in a way that doesn't show it at its best. If you read more reviews, you'll see the strengths and weaknesses of the products. There are so many possible setups, settings and scenarios that there's no way for just one review to tell the whole story, IMO.
 

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
I find many reviewers adjust the benchmarks to favor the card that is the subject of the review. They tend to run benchmarks and games at resolutions and filtering levels that show the subject card in its best light. I can understand this, because most likely the review sample was supplied with preferences from the supplier on running certain tests in specific ways.

Example: The supplier could say, "Make sure you run the tests with high AA. We've really made some major performance gains in our AA." Or the supplier might have said, "The higher the resolution, the stronger our card's performance is. Be sure to test it at 2560*1600." Or, "This card eats Crysis for breakfast. Runs 60fps @ 1920*1200. Be sure to show that in your tests." Or whatever. I'm sure you get the idea.

In all fairness, I can understand the card (or any other hardware) supplier wanting their piece shown like this. The last thing you would want is for a reviewer to test your product in a way that doesn't show it at its best. If you read more reviews, you'll see the strengths and weaknesses of the products. There are so many possible setups, settings and scenarios that there's no way for just one review to tell the whole story, IMO.

http://forums.anandtech.com/showthread.php?t=2079718

All been said already

Which means that you can put forward support for either side and play the value and performance game until you're blue in the face, arguing both sides equally. But at the end of the day, what matters more for an individual user is not that the GTX480 can be on average 25% faster if you use the right benchmarks, or that the HD5850 can be faster than the GTX470 if you use the right benchmarks. What matters is which card performs best in the specific selection of games you are interested in, at your specific resolution, using the settings you are most likely to use (e.g. AA/AF levels).

That, more than anything, is what this shows (although this makes it no truer than it was before, it's worth repeating anyway).
It's also why most threads where people ask for a card recommendation break down: everyone has their own personal preference and can show benchmarks putting forward a certain card as better value.
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
Interesting article. Personally, I prefer timedemos. This puts both cards through the exact same paces and then we can compare.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
This puts both cards through the exact same paces and then we can compare.
So do in-game benchmarks, or demo versions, or third-party synthetic benchmarks. We're not really after making sure each card performs the exact same test. We can be pretty sure they do already.

The real issue is: are those tests (which we reasonably know to be repeatable and consistent) actually indicative of real gameplay? That's the reason for the superiority of timedemos: they are more indicative of actual gameplay than built-in benchmarking tools or demo versions of games.
 

scooterlibby

Senior member
Feb 28, 2009
752
0
0
I know many reviewers probably don't have the time or know-how, but a lot of this stuff could be cleared up with multiple runs and some simple statistical analysis (finding correlation coefficients, running regression analysis, etc.) to help answer questions like, "Do synthetics correlate with real-world performance?" or "Does GPU X have a higher impact on FPS than GPU Y?"

For example, with the last question, you could take data from multiple runs of benchmarks (at least 30 benchmark runs per game) and have a dummy variable that takes the value 1 for GPU X and 0 for GPU Y. Your dependent variable would be FPS in a given game. An Ordinary Least Squares regression would then estimate GPU Y's mean FPS as the intercept and the average FPS difference as the dummy's coefficient, telling you which card was faster and exactly how much faster it was.

This, to me, is the only way to have any confidence in benchmarks, and it is also the only way to reasonably protect yourself from 'fluke' results. This is the end of a stats-nerd rant, but I really hate how much trust people place in 'one-shot' (aka N=1) reviews. Like Canucks said, there is a whole host of variables that can affect performance, and that fact should be treated with more sophisticated analysis.
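The dummy-variable regression described above can be sketched in a few lines. The FPS numbers below are simulated purely for illustration (no real cards were measured); the point is that with a single 0/1 dummy, the OLS intercept recovers the baseline card's mean FPS and the dummy's coefficient is exactly the average FPS difference between the two cards.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated FPS samples: 30 runs per card (made-up numbers, illustration only).
fps_gpu_y = rng.normal(60.0, 2.0, 30)  # baseline card (dummy = 0)
fps_gpu_x = rng.normal(65.0, 2.0, 30)  # comparison card (dummy = 1)

fps = np.concatenate([fps_gpu_y, fps_gpu_x])
dummy = np.concatenate([np.zeros(30), np.ones(30)])

# OLS model: fps = intercept + beta * dummy + noise.
# intercept estimates GPU Y's mean FPS; beta estimates how much
# faster (or slower) GPU X is on average.
X = np.column_stack([np.ones_like(dummy), dummy])
(intercept, beta), *_ = np.linalg.lstsq(X, fps, rcond=None)

print(f"GPU Y mean FPS (intercept): {intercept:.2f}")
print(f"GPU X advantage (beta):     {beta:.2f}")
```

With one dummy this is equivalent to a plain two-sample comparison of means; the regression framing just makes it easy to add more controls later (CPU, chipset, driver version, etc.) as extra columns of X.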
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
but I really hate how much trust people place in 'one-shot' (aka N=1) reviews.
I don't think people actually "trust one-shot reviews". The real problem is that the testing methodologies are largely secret: the information about them is usually restricted to the title of the game and the graphical settings, with no real explanation or even disclosure of the methodology.

So people just end up trusting that the reviewer did a good job (i.e., formulating a sane methodology), perhaps only to realize later, when the cat is out of the bag, that the review methodology used was subpar (N=1 / "one-shot" type of review).
 

FragKrag

Member
May 27, 2010
99
0
0
Thanks for the link

Nice to see that HWC opened up how it benchmarks, so we can actually know whether we can trust their results
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,634
180
106
I know many reviewers probably don't have the time or know-how, but a lot of this stuff could be cleared up with multiple runs and some simple statistical analysis (finding correlation coefficients, running regression analysis, etc.) to help answer questions like, "Do synthetics correlate with real-world performance?" or "Does GPU X have a higher impact on FPS than GPU Y?"

For example, with the last question, you could take data from multiple runs of benchmarks (at least 30 benchmark runs per game) and have a dummy variable that takes the value 1 for GPU X and 0 for GPU Y. Your dependent variable would be FPS in a given game. An Ordinary Least Squares regression would then estimate GPU Y's mean FPS as the intercept and the average FPS difference as the dummy's coefficient, telling you which card was faster and exactly how much faster it was.

This, to me, is the only way to have any confidence in benchmarks, and it is also the only way to reasonably protect yourself from 'fluke' results. This is the end of a stats-nerd rant, but I really hate how much trust people place in 'one-shot' (aka N=1) reviews. Like Canucks said, there is a whole host of variables that can affect performance, and that fact should be treated with more sophisticated analysis.

And then, of course, there's the question of how much GPUs lose or gain when you have a slower or faster CPU, or chipset X instead of chipset Y, etc.

You will always have to decide how much "faith" to place in the reviewer, but more information is better.
 

schenley101

Member
Aug 10, 2009
115
0
0
I actually think the way HardOCP reviews cards is the best for end users. They find the highest playable settings at each resolution, so you can see which card will give you a better experience, not just the raw performance.
 