The data is biased by people self-selecting the queries they know work. I use Google Now only for the types of queries I know it can answer (like the weather). I tried a variety of things on it and eventually found a guide to all the commands. Once I had that, I remembered what I could and couldn't do.
Google Now is far from answering 88% of my searches; rather, I suspect 88% of the queries people make are ones they already know will work. There is a big difference.
I don't know that you can even make that claim, as the study's methodology is not well described. Unless I've misread it, it appears they simply created a list of questions, fed them to each of the services, and then compared the results they received against what was expected.
The only way it might be a bad test is if the questions don't reflect the types of questions typically asked by users of each service. Even then, that doesn't make the study useless, since it still shows that Google has some clear advantages in its ability to process requests.
However, none of the companies are going to share what kinds of requests they're actually getting, which makes comparison difficult. You'd need to recruit a few hundred participants and record all of their voice searches to determine what a representative sample would look like.
Also, self-selection cuts both ways. If Siri or Cortana users know something won't work, they're less likely to issue those queries, which would artificially inflate those platforms' success rates. That's why you'd need to build a platform-independent question pool, or at least have enough participants that any differences in the types of questions users of each platform typically ask can be identified and investigated.