You’re conflating “lots of data” with “data collected and presented in a statistically sound way”.
That’s a mistake. One does not make the other true.
Sampling the data is the easy part.
This is also fairly trivial data to present. You are just reporting the percentage of each category that you recorded. It doesn't need to be adjusted for presentation, and there isn't a multitude of confounding factors, as there would be if this were trying to present the impact of eating processed meat on health. This is just straight-up reporting of objective data.
Plus we aren't using it in a scientific paper, so we don't need to know confidence intervals or sample size. Though given the month-to-month consistency of results (with one exception noted below), the sample size is almost certainly adequate, and given the ease of data sampling, there is no pressure at all to undersample.
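To put a rough number on the sample size point, here is a minimal sketch using the textbook normal-approximation margin of error for a reported percentage. It assumes a simple random sample, and the 5% share and the sample sizes are made-up illustrations, not Valve's actual figures. It only covers random sampling error, not any systematic bias in who gets surveyed.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n samples.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes
    a simple random sample (an assumption, not a known property of the survey).
    """
    return z * math.sqrt(p * (1.0 - p) / n)

if __name__ == "__main__":
    share = 0.05  # hypothetical: a category reported at 5%
    for n in (1_000, 100_000, 1_000_000):  # hypothetical sample sizes
        moe = margin_of_error(share, n)
        print(f"n={n:>9,}: {share:.1%} +/- {moe:.3%}")
```

The margin shrinks with the square root of the sample size, so once sampling is cheap and automated, the random error on a simple percentage gets small quickly.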
This is barely Stats 101 stuff; it's among the simplest, least confounded data to collect and present. Valve has been at this for nearly two decades. I think they can handle it.
The Steam Survey is going to be MUCH more accurate than most surveys, which usually involve getting somewhat subjective or memory-dependent responses from humans, where how you ask questions can swing results significantly, and where some questions may be sensitive and skew results.
This is just easy, objective, automated data collection and presentation.
It's not perfect, but to pretend those imperfections make it useless is preposterous.
The main issue seems to be the mechanism for handling shared computers (Internet cafes), which recently had a one-month glitch and had a similar issue years ago. This is the only real snag in an otherwise trivial sampling and reporting effort.
Other than that, it comes down to whether there is a reason to distrust Valve... I don't think there is.
You sometimes see a similar negative response to the Steam Survey on Linux forums: the claim that the Steam Survey is biased against them and under-represents them. There is ZERO evidence that is happening; it's just that some in the Linux community don't like the results and want to shoot the messenger. Valve is a massive Linux proponent and has been trying for years to increase Linux gaming, so if they were inclined to color the results, it would be by over-representing Linux.
So again, it's just distrust based on disliking the results. Even if Valve published detailed methodologies, it would still face the same distrust from those who don't like the results, because Valve could simply be lying and manipulating the output. Then what, demand third-party auditors come in to check everything? Only then would looking at the Steam Survey make sense? Some still wouldn't trust it, because how can you trust the third-party auditors when we can't know Valve didn't bribe them after all...