For the first part, AnandTech recorded and correlated the following data: power state distribution, frequency distribution, and run-queue depth - which is an exact representation of the number of threads running through the system. Not only did AnandTech measure the right thing, they did so in a comprehensive manner to give us a detailed representation of how the big.LITTLE SoC behaves under Android. Please take your time and read through the article; it really is worth the read.
They measured whether the system could use those resources while doing a given task, but never measured whether it was advantageous to use those resources in the way they were used. (The premise of big.LITTLE is that it is advantageous, but this has NOT been demonstrated.) In other words, there is no data to show that those cores are doing anything useful.
For instance, take loading a webpage: http://www.anandtech.com/show/9518/the-mobile-cpu-corecount-debate/2
What is interesting to see here is that even though it's mostly just 1 large thread that requires performance on the big cores, most of the other cores still have some sort of activity on them which causes them to not be able to fall back into their power-collapse state. As a result, we see them stay within the low-residency clock-gated state.
In other words, the other high-power cores are not being shut off despite there being little need for them. Is this efficient in terms of power and performance? Don't know - the article never discusses this. Multiple big cores are also used while scrolling. Is this needed? Probably not, much less multiple big cores (the big cores could be powered off entirely, with just the small cores, maybe 2 or so, handling tasks like scrolling).
AT's article demonstrates that multiple cores are used for everyday tasks, but it does not demonstrate that those cores are in any way, shape, or form needed.
There is no control group.
I'm not saying this article is useless. AT did disprove the idea that the extra cores in big.LITTLE CPUs just sit completely idle, but they did not show that big.LITTLE (or even having that many cores) was in any way advantageous.
For the second part, it seems to me you imply that optimised multithreaded software running on many cores needs proof of efficiency (perf, power, or both). No offense, but it is a bit much to claim that, in a scenario where the browser can use up to 6-8 threads, using the additional available cluster of power-optimised cores does not yield additional efficiency gains. You're entitled to your opinion though. Maybe we'll get the chance to compare results in a test with the little cores disabled, since that's the only way we can maintain data consistency (keep software and CPU arch/process the same).
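To make that comparison concrete, here is a rough sketch of what such a test would have to compute: energy per task, i.e. average power times completion time, for the normal configuration versus the one with the little cores disabled. The `Run` class, the `relative_energy` helper and every number below are hypothetical placeholders, not measurements from the article or anywhere else.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measured run of the same task on the same SoC and software."""
    label: str
    avg_power_w: float  # mean power draw over the task, in watts
    duration_s: float   # time to complete the task, in seconds

    @property
    def energy_j(self) -> float:
        # Energy per task = average power * time to completion.
        return self.avg_power_w * self.duration_s


def relative_energy(test: Run, control: Run) -> float:
    """Energy of `test` relative to `control`; below 1.0 means `test` used less."""
    return test.energy_j / control.energy_j


# Placeholder values only, chosen to show the arithmetic.
all_cores = Run("big + little clusters", avg_power_w=2.0, duration_s=3.0)
little_off = Run("little cores disabled", avg_power_w=2.5, duration_s=3.0)

print(f"{all_cores.label}: {all_cores.energy_j:.1f} J")
print(f"{little_off.label}: {little_off.energy_j:.1f} J")
print(f"relative energy: {relative_energy(all_cores, little_off):.2f}")
```

Whichever way the ratio comes out on real hardware, it only means something if the task, software, and silicon are identical between the two runs.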
Many programs will use quite a few threads - that doesn't mean that that many cores are needed. For example, a game like Battlefield will use dozens of threads, but because most of those threads are very low usage there is no gain from using a CPU with more than about 6-8 cores. All those low-usage threads can be thrown onto a single core, and even then that core is barely active.
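A toy illustration of the thread-count vs. core-count point, assuming each thread can be summarised by a steady load expressed as a percentage of one core. That assumption is a big simplification (it ignores burstiness, latency and synchronisation), and the loads below are made up for illustration. The `cores_needed` helper is hypothetical and just packs threads onto cores first-fit, largest first.

```python
def cores_needed(thread_loads, core_capacity=100):
    """Pack per-thread loads (percent of one core) onto as few cores as
    first-fit-decreasing finds. Returns the number of cores used."""
    cores = []  # summed load currently assigned to each core
    for load in sorted(thread_loads, reverse=True):
        for i, used in enumerate(cores):
            if used + load <= core_capacity:
                cores[i] = used + load  # thread fits on an existing core
                break
        else:
            cores.append(load)  # no room anywhere, bring up another core
    return len(cores)


# Made-up workload: 4 busy threads plus 30 nearly idle helper threads.
heavy = [90, 80, 70, 60]
light = [2] * 30

print(len(heavy + light), "threads")                # 34 threads
print(cores_needed(light), "core for the helpers")  # 1 core
print(cores_needed(heavy + light), "cores total")   # 4 cores
```

So 34 threads show up in the run queue, but under these assumptions the 30 helpers add up to only 60% of a single core, and the whole workload fits on 4.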