GPU Benchmarking Methods Investigated: Fact vs. Fiction

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Quite an interesting article by Hardware Canucks on the subject of benchmarking. They look at several variables that affect the scores that result from benchmarking games.

link

link 2
 

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
Well I guess that resolves the debate in the other thread.
Canned benchmarks are often pretty useless indicators, but well-thought-out benchmarks can be indicative of performance.

And neither ATI nor NV fanboys can complain about this, because it shows it going both ways. Sometimes in real situations ATI catches up, and sometimes NV pulls further ahead. Which also confirms what I said. You can skew any benchmarks to paint either side more positively, depending on your aims, and it validates HWC's breakdown of benchmarking procedure.

In short: Never trust canned benchmarks until they've been at least somewhat proven accurate.
And always consider the benchmarks being used, settings, and methodology before relying on a conclusion, because benchmarks can be tweaked to favour one card or another.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Benchmarks can affect the CPU differently than actual gameplay too. The RE5 benchmark makes dual-core CPUs look like they would perform much worse than they do in the actual game.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Bottom line is: any benchmark can be useful (they all measure *something*), but it is up to the benchmarker to understand what the benchmark measures, and how to put the results in the proper context.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
I find many reviewers adjust the benchmarks to favor the card that is the subject of the review. They tend to run benchmarks and games at resolutions and filtering levels that show the subject card in its best light. I can understand this, because most likely the review sample was supplied with preferences from the supplier on running certain tests in specific ways.

Example: The supplier could say, "Make sure you run the tests with high AA. We've really made some major performance gains in our AA." Or the supplier might have said, "The higher the resolution, the stronger our card's performance is. Be sure to test it at 2560*1600." Or, "This card eats Crysis for breakfast. Runs 60fps @ 1920*1200. Be sure to show that in your tests." Or whatever. I'm sure you get the idea.

In all fairness, I can understand the card (or any other hardware) supplier wanting their piece shown like this. The last thing you would want is for a reviewer to test your product in a way that doesn't show it at its best. If you read more reviews, you'll see the strengths and weaknesses of the products. There are so many possible setups, settings and scenarios that there's no way for just one review to tell the whole story, IMO.
 

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
I find many reviewers adjust the benchmarks to favor the card that is the subject of the review. They tend to run benchmarks and games at resolutions and filtering levels that show the subject card in its best light. I can understand this, because most likely the review sample was supplied with preferences from the supplier on running certain tests in specific ways.

Example: The supplier could say, "Make sure you run the tests with high AA. We've really made some major performance gains in our AA." Or the supplier might have said, "The higher the resolution, the stronger our card's performance is. Be sure to test it at 2560*1600." Or, "This card eats Crysis for breakfast. Runs 60fps @ 1920*1200. Be sure to show that in your tests." Or whatever. I'm sure you get the idea.

In all fairness, I can understand the card (or any other hardware) supplier wanting their piece shown like this. The last thing you would want is for a reviewer to test your product in a way that doesn't show it at its best. If you read more reviews, you'll see the strengths and weaknesses of the products. There are so many possible setups, settings and scenarios that there's no way for just one review to tell the whole story, IMO.

http://forums.anandtech.com/showthread.php?t=2079718

All been said already

Which means that you can put forward support for either side and play the value and performance game until you're blue in the face, arguing both sides equally. But at the end of the day, what matters more for an individual user is not that the GTX480 can be on average 25% faster if you use the right benchmarks, or that the HD5850 can be faster than the GTX470 if you use the right benchmarks. What matters is which card performs best in the specific selection of games you are interested in, at your specific resolution, using the settings you are most likely to use (e.g. AA/AF levels).

That, more than anything, is what this shows (although this makes it no truer than it was before, it's worth repeating anyway).
It's also why most threads where people ask for a card recommendation break down: everyone has their own personal preference and can show benchmarks putting forward a certain card as better value.
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
Interesting article. Personally, I prefer timedemos. This puts both cards through the exact same paces and then we can compare.
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
This puts both cards through the exact same paces and then we can compare.
So do in-game benchmarks, or demo versions, or third-party synthetic benchmarks. We're not really after making sure each card performs the exact same test. We can be pretty sure they do already.

The real issue is: are those tests (which we reasonably know to be repeatable and consistent) actually indicative of real gameplay? That's the reason for the superiority of timedemos: they are more indicative of actual gameplay than built-in benchmarking tools or demo versions of games.
 

scooterlibby

Senior member
Feb 28, 2009
752
0
0
I know many reviewers probably don't have the time or know-how, but a lot of this stuff could be cleared up with multiple runs and some simple statistical analysis (finding correlation coefficients, running regression analysis, etc.) to help answer questions like, "Do synthetics correlate with real-world performance?" or "Does GPU X have a higher impact on FPS than GPU Y?"

For example, with the last question, you could take data from multiple runs of benchmarks (at least 30 benchmark runs per game) and have a dummy variable that takes the value 1 for GPU X and 0 for GPU Y. Your dependent variable would be FPS in a given game. An Ordinary Least Squares regression would then estimate GPU Y's mean FPS as the intercept and the average FPS difference as the dummy's coefficient, telling you which card was faster and exactly how much faster it was.

This, to me, is the only way to have any confidence in benchmarks, and it is also the only way to reasonably protect yourself from 'fluke' results. This is the end of a stats-nerd rant, but I really hate how much trust people place in 'one-shot' (aka N=1) reviews. Like Canucks said, there is a whole host of variables that can affect performance, and that fact should be treated with more sophisticated analysis.
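The dummy-variable regression described above can be sketched in a few lines. The FPS numbers below are simulated purely for illustration (no real cards were measured); the point is that with a single 0/1 dummy, the OLS intercept recovers the baseline card's mean FPS and the dummy's coefficient is exactly the average FPS difference between the two cards.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated FPS samples: 30 runs per card (made-up numbers, illustration only).
fps_gpu_y = rng.normal(60.0, 2.0, 30)  # baseline card (dummy = 0)
fps_gpu_x = rng.normal(65.0, 2.0, 30)  # comparison card (dummy = 1)

fps = np.concatenate([fps_gpu_y, fps_gpu_x])
dummy = np.concatenate([np.zeros(30), np.ones(30)])

# OLS model: fps = intercept + beta * dummy + noise.
# intercept estimates GPU Y's mean FPS; beta estimates how much
# faster (or slower) GPU X is on average.
X = np.column_stack([np.ones_like(dummy), dummy])
(intercept, beta), *_ = np.linalg.lstsq(X, fps, rcond=None)

print(f"GPU Y mean FPS (intercept): {intercept:.2f}")
print(f"GPU X advantage (beta):     {beta:.2f}")
```

With one dummy this is equivalent to a plain two-sample comparison of means; the regression framing just makes it easy to add more controls later (CPU, chipset, driver version, etc.) as extra columns of X.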
 

jvroig

Platinum Member
Nov 4, 2009
2,394
1
81
but I really hate how much trust people place in 'one-shot' (aka N=1) reviews.
I don't think people actually "trust one-shot reviews". The real problem is that the testing methodologies are largely secret: the information about them is usually restricted to the title of the game and the graphical settings, with no real explanation or even disclosure of the methodology.

So people just end up trusting that the reviewer did a good job (i.e., formulating a sane methodology), perhaps only to realize later, when the cat is out of the bag, that the review methodology used was subpar (N=1 / "one-shot" type of review).
 

FragKrag

Member
May 27, 2010
99
0
0
Thanks for the link

Nice to see that HWC opened up how it benchmarks, so we can actually know whether we can trust their results
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,634
180
106
I know many reviewers probably don't have the time or know-how, but a lot of this stuff could be cleared up with multiple runs and some simple statistical analysis (finding correlation coefficients, running regression analysis, etc.) to help answer questions like, "Do synthetics correlate with real-world performance?" or "Does GPU X have a higher impact on FPS than GPU Y?"

For example, with the last question, you could take data from multiple runs of benchmarks (at least 30 benchmark runs per game) and have a dummy variable that takes the value 1 for GPU X and 0 for GPU Y. Your dependent variable would be FPS in a given game. An Ordinary Least Squares regression would then estimate GPU Y's mean FPS as the intercept and the average FPS difference as the dummy's coefficient, telling you which card was faster and exactly how much faster it was.

This, to me, is the only way to have any confidence in benchmarks, and it is also the only way to reasonably protect yourself from 'fluke' results. This is the end of a stats-nerd rant, but I really hate how much trust people place in 'one-shot' (aka N=1) reviews. Like Canucks said, there is a whole host of variables that can affect performance, and that fact should be treated with more sophisticated analysis.

And then, of course, there's the question of how much GPUs lose or gain when you have a slower or faster CPU, or chipset X instead of chipset Y, etc.

You will always have to decide how much "faith" to place in the reviewer, but more information is better.
 

schenley101

Member
Aug 10, 2009
115
0
0
I actually think the way HardOCP reviews cards is the best for end users. They find the highest playable settings at each resolution, so you can see which card will give you a better experience, not just the raw performance.
 