Because there is really no straightforward way of determining what is or isn't appropriate, by using a mixture of settings you should hopefully even out any outliers.
Because garbage data is garbage data. The only way to account for outliers is to eliminate them from your data set. That is what real statisticians do with outliers.
As I just said above, this is actually a strength of meta-analysis, not a weakness.
No, it isn't; see above. Any meta-analysis that contains widely variable numbers reflecting identical values is a meta-analysis that chooses to include bad or misleading data, which no statistician considers a strength. Garbage in, garbage out. Further, tossing outliers into that pool is amateur hour. There is no analysis that "compensates for outliers." This is why you simply toss them out on the high end and the low end. The N of your data set doesn't really matter, because outliers will always skew it.
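To make that concrete, here is a minimal sketch of what tossing the high and low ends looks like in practice. The FPS numbers are hypothetical, and this is just one simple trimming scheme, not the only valid one:

```python
# Minimal sketch: average a pool of results after tossing the k lowest
# and k highest values, rather than trying to "compensate" for outliers.
def trimmed_mean(values, k=1):
    """Mean of the values after discarding the k lowest and k highest."""
    s = sorted(values)
    core = s[k:len(s) - k]
    return sum(core) / len(core)

# Hypothetical FPS results for one card in one game across several sites;
# 74.5 is an obvious outlier.
fps = [58.2, 59.1, 60.4, 61.0, 61.3, 62.0, 62.8, 74.5]

print(f"naive mean:   {sum(fps) / len(fps):.1f} fps")    # 62.4, skewed upward
print(f"trimmed mean: {trimmed_mean(fps, k=1):.1f} fps")  # 61.1, outlier tossed
```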
And how do you determine what is bad data?
Well, that's the real problem, isn't it? It always depends on your subject and the question you are trying to answer. I am a biologist and a lot of our work is bioinformatics, though I'm more a molecular biologist...data kinda puts me to sleep. :\
But I can say that testing manufactured chunks of metal like this, which behave the way humans designed them to within set/expected limits, is incredibly uncomplicated and really only requires some extremely basic analysis, doesn't it?
You ask yourself what kind of reporting is relevant to these websites: user experience, right? What is the most valid data that will most accurately inform readers' purchasing decisions, right? I'm intrigued by the fact that I don't have to deal with living cells or fully developed organisms that, for whatever bastard reasons, simply don't behave the way they are supposed to when I arrest their heart development or inject a construct that, under optimal conditions, should see a population of flies developing with legs growing out of their heads instead of antennae. You know, when that doesn't happen like it is supposed to, it is frustrating, but that is life.
This is different. Hardware has human-imposed limits, blah blah--now I've gone off on a tangent.
Here, you take a collection of products from two vendors representing various designs. You standardize a class of games that reflects popularity (real use), and where popularity alone doesn't cover it, you add other games with various APIs that best represent both types of cards. For each card and each game, you choose the API that maximizes that card's performance in that specific usage scenario. The set variables would be your resolution, quality settings, this and that...it's really quite simple.
A worthwhile N in this kind of test is not the total number of games; it is the total number of chips tested per card, for each game. In this case, with each card properly representing itself in each game (say DX11 is better for nVidia in the same game where DX12 favors AMD--it is wholly unreasonable to test both cards with the same API in that game; that is garbage data), you test multiple chips of the same card in each game benchmark to average out binning issues. Just as there is a bit of process variation in manufacturing, I know I get the same kind of random output when testing piles and piles of cells, or individual mice and flies, or whatever.
Toss in whatever thermal/noise/power-draw/OC-potential measurements are relevant to each usage scenario within games. A minimal sketch of this whole design is below.
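Here is what that design looks like as a quick sketch. Every card name, game name, and number below is made up for illustration; the point is the structure: chips averaged per card, then the best API chosen per card per game:

```python
# Hypothetical sketch of the test design above: each (card, game, API)
# result is the mean of several physical chips of the same card, and each
# card is then represented in each game by whichever API it runs best.
from statistics import mean

# results[card][game][api] -> FPS from multiple chips of the same card
results = {
    "CardA": {
        "GameX": {"DX11": [61.0, 60.2, 61.8], "DX12": [55.1, 54.7, 55.9]},
        "GameY": {"DX11": [88.4, 89.0, 87.6], "DX12": [84.2, 85.0, 83.9]},
    },
    "CardB": {
        "GameX": {"DX11": [57.3, 56.8, 58.0], "DX12": [63.5, 64.1, 62.9]},
        "GameY": {"DX11": [79.9, 80.5, 79.2], "DX12": [86.3, 85.8, 86.9]},
    },
}

for card, games in results.items():
    for game, apis in games.items():
        # Average the chips first (smooths binning variance), then pick
        # the API that maximizes this card's performance in this game.
        per_api = {api: mean(chips) for api, chips in apis.items()}
        best = max(per_api, key=per_api.get)
        print(f"{card} / {game}: {best} at {per_api[best]:.1f} fps "
              f"(N = {len(apis[best])} chips)")
```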
Take TPU (seeing as this seems to be the site people are hating on): they currently have 2 games in their benchmark suite with DX12, but the DX12 implementation for both of those games is more or less broken.
Do you still think they should use DX12 in their testing then?
And as I said above, TPU already tests with the API that best represents the cards, so what's the problem?
So they chose 2 games with DX12 (while there are currently ~7?), and yet those 2 games have broken DX12 implementations compared to the others? Yet when you see something like this, you suggest that maybe DX12 as a whole is an invalid variable, and not TPU's specific selection of the 2 broken DX12 games...and for what reason would TPU do this?
I am puzzled by the way you interpret this situation. I doubt you will be able to sufficiently explain your reasoning behind that, but you are welcome to try.
As for the final, bolded claim: either you really aren't paying attention or you are outright lying.
The 2 most obvious examples:
--They chose the known broken DX12 implementation of RotTR (Rise of the Tomb Raider)--which performs worse than DX11 on both cards, yet they use it anyway. How does this support your claim?
--They do not test Vulkan in Doom, which not only favors both cards over OpenGL but highly favors AMD. How does this support your claim?
Further, they still do not use AMD's updated, current drivers, which are already known to boost the 480 by ~3-4% in most cases over the numbers they still report. Why do you think they do this?