Originally posted by: awesomedude
For all the people claiming that Crysis is one game, and that HardOCP must be wrong because they did not show the same results for other games: in science it only takes one case to prove an entire theory wrong. Even if there are a million cases that support the theory, all you need is one case where the theory fails to invalidate it. The scientific method doesn't work on an "only when it's convenient for us" basis.
There is a big *if* that you really aren't mentioning here about scientific theory only requiring a single failure: that failure must be reproducible by anyone, given the exact same testing methods and materials. I think apoppin is rightly saying to Kyle: if you want us to believe your disproving test over numerous other, contradictory tests, we need to be able to reproduce it.
Originally posted by: awesomedude
Then there are the people who claim they need 100% repeatable proof before they are convinced of anything, but then turn around and make absurd claims that Kyle probably picks the best run for card A and the worst run for card B, or picks a scenario that favors card A over card B, without any sort of proof. If you are going to stick to your proof guns, at the very least don't make things up about Kyle because you don't like him calling AT out. I am not saying Kyle was right to call AT out, but to say that Kyle calling out AT was wrong and unprofessional, and then to turn around and say that Kyle picks and chooses data to fit the results he wants, without any sort of proof, is outright hypocritical.
I don't think many people here are claiming that they know with certainty that Kyle *is* manipulating benchmarks. Most posters on this thread have noted, however, that his method would allow for that because it is not at all transparent. Again, if you want credibility in the scientific world, your methods and materials must be transparent to everyone examining the data.
Originally posted by: awesomedude
To the guys discussing population and samples: the population would be all the frames one could render in the entire game, or at least that level, and Kyle's run-through would be a sample of that. Granted, both samples are likely different, and it is impossible to recreate the exact same sample by hand. But two random samples from the same population should give similar averages. I understand they aren't taking truly random samples, but we can conclude that two similar samples from the same population will give similar results.
The key here is knowing that the samples do not suffer from selection bias. The only way to confirm the presence or absence of selection bias is to submit your selection process to independent observers for critical review. Again, this is apoppin's point.
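The difference between a random sample and a biased one is easy to show numerically. Here is a minimal sketch, using made-up frame-rate numbers (not anyone's actual benchmark data): two random samples from the same population land close together, while a selectively chosen "sample" drifts far from the true average.

```python
import random

random.seed(0)

# Hypothetical population: per-frame FPS values across an entire level
# (invented numbers, purely for illustration).
population = [random.gauss(40, 10) for _ in range(100_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Two independent random samples: their averages land close together.
run_a = random.sample(population, 2_000)
run_b = random.sample(population, 2_000)

# A biased "sample": only the smoothest frames in the level.
biased = sorted(population, reverse=True)[:2_000]

print(f"population mean: {mean(population):.1f}")
print(f"random run A:    {mean(run_a):.1f}")
print(f"random run B:    {mean(run_b):.1f}")
print(f"biased run:      {mean(biased):.1f}")
```

The two random runs track the population mean, while the biased run overstates it badly, which is exactly why the selection process, not just the result, has to be open to review.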
Originally posted by: awesomedude
To the people saying that because HardOCP compares apples to oranges it is impossible to distinguish performance differences (since you could run card A at 800 x 600 and card B at 1600 x 1200, get the same frame rate, and conclude the cards are equally fast): that is again absurd. First, HardOCP determines the "fastest" card as the one that can play at the highest settings; i.e., the card run at 1600 x 1200 would be pronounced faster than the one run at 800 x 600. HardOCP doesn't rate the fastest card by the best average frame rate in its apples-to-oranges comparison, as everyone seems to assume. Since HardOCP gives its readers all the settings information, readers can draw their own conclusions. For example, if card A averages 20 fps and card B averages 20 fps at the same resolution, but card B is running 4x AA and 16x AF while card A runs no AA or AF, anyone can easily see that card B is the faster card. There is usually enough information present to tell which card is faster even though they are using apples-to-oranges comparisons.
I would agree in principle, but the real question is: has the test been conducted in an appropriate fashion? If so, then extrapolating results becomes possible. If not, then the initial results themselves, to say nothing of any extrapolations, are suspect.
Originally posted by: awesomedude
Also note that on this page they describe the exact map they chose to play, some of the various effects affecting the graphics card, and the length of time they played. This is for all the people claiming they chose specific effects to favor one video card over another, and for those saying they probably just played for 10 seconds. I understand that this isn't a save point or a video, but it does give a lot of data about their run through the game. Someone could easily go to the selected map and make a 10-or-so-minute run containing the listed effects, and if the data gathered in that scenario were drastically different from HardOCP's, we could conclude that something was wrong with someone's data.
Simply stating the map played, effects seen during play, and duration of play is wholly insufficient for reproducing a test, either to confirm or deny its results. I can do a run through Oblivion, describing the area I'm in (Kvatch), some of the effects I'm seeing (the Oblivion Gate), and how long my run is (10 minutes). If I run loops around the Oblivion Gate (which stresses the card heavily) for 10 minutes, that is a vastly different test than spending 9 minutes running around the refugee camp at the bottom of the hill (not taxing) and only 1 minute running around the Oblivion Gate. Again, to repeat the test, apoppin rightly calls for a video of the run, so we can attempt to repeat it.
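To put rough numbers on that point (the fps figures below are hypothetical, chosen only to illustrate it), the time-weighted average frame rate of the two runs just described diverges sharply even though both match the same written description:

```python
# Illustrative numbers only: assume the Oblivion Gate area averages
# 25 fps on some card, and the refugee camp averages 60 fps.
gate_fps, camp_fps = 25.0, 60.0

def avg_fps(minutes_gate, minutes_camp):
    """Time-weighted average FPS for a run split between the two areas."""
    total = minutes_gate + minutes_camp
    return (gate_fps * minutes_gate + camp_fps * minutes_camp) / total

# Both "runs" fit the description "Kvatch, Oblivion Gate, 10 minutes",
# yet their averages are nowhere near each other.
run_heavy = avg_fps(10, 0)   # 10 min looping the Gate
run_light = avg_fps(1, 9)    # 1 min at the Gate, 9 min in the camp
print(run_heavy, run_light)  # 25.0 vs 56.5
```

More than a 2x gap in reported "average fps" from two runs that satisfy the same prose description, which is why only a video (or a recorded timedemo path) makes the run reproducible.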
Originally posted by: awesomedude
Also, demanding that someone do something or else all their data is false does not make it so. If all data were held to a "meet my demands or your data is no good" standard, no data would ever be good, as everyone would have ever-increasing demands. While I would love to see more openness in HardOCP reviews, including save points and videos of the run-through, the fact that they don't provide them doesn't somehow make their data false. Making demands about how things should be done on someone else's forum, and making threats ("if you don't do it, you're a bunch of pussies/liars/corporate whores/etc."), has never been a good way to get things done the way you want them.
...
I realize that this is biased in favor of HardOCP. I also know I gave Kyle the benefit of the doubt in this thread, but it seems only fair to give someone the benefit of the doubt unless there is proof to the contrary.
Forgive me, but you seem to be missing the point. While making unfounded accusations against someone doesn't make their data false, neither does it make the data true. The entire point of the scientific method is to reduce the necessity of simply trusting one person and their results. There is no 'benefit of the doubt' in science. Unlike a criminal defendant at law, where the burden is on the prosecution to prove guilt beyond reasonable doubt, the burden on the scientific community is to continue doubting all results until they have been confirmed over and over again. Kyle doesn't get a 'benefit of the doubt' as a scientist, though he certainly deserves one as a human being (as do we all). We have to separate these two things.
The final 'fact' in this case is that, by everyone's admission, Kyle's benchmarking method is not transparent; thus, all squabbles over minor variances between runs aside, his results can't even begin to be validated through additional testing. He is essentially asking the community to take his results on faith: faith in his own integrity.
In a world (hardware review sites) driven by page-hits and the advertising revenue they generate, I am "disinclined to acquiesce to his request."
Cheers.