[The problem is that many tasks are single-threaded tasks. Those tasks that are parallel, eventually run out of work for multiple cores to do. In the supercomputing world, the solution is simple: devise a bigger problem to solve. That solution does not work in the real world for most people on PCs.]
What do you mean by this? A desktop workload is not a batch job. It is an interactive, event-driven workload. If you follow this logic, you could argue that no automobile needs more than 45 horsepower, because that is enough to get it to freeway speeds. The point is not whether there is always work to do, but how responsive the system is when there is more instantaneous demand than one CPU can handle. This happens constantly under Windows or any other time-slicing operating system. Don't confuse this with percent processor utilization. Even when average utilization is low, a single core can still only run one instruction stream at a time. With a single core, no two apps can have processor cycles at the same instant. With more than one core, they can. That means a more responsive system.
I have well over 100 threads running on my system now. Let's say I have exactly 100 of them. If I have 100 cores, then that system will be as responsive to load as it can possibly be at the rated speed of the processor, because no thread ever has to wait for a core. One fewer core will make it less responsive, as two threads will contend for processor cycles at some point. Yes, that difference will be very small in this theoretical case, but the difference between one processor and two is not small at all. On average it means half the thread contention of a single-core system, and half the context switches.
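To put rough numbers on the contention argument, here is a toy model in Python. It is only a sketch under assumptions of my own: each of the 100 threads is independently runnable at any given instant with some probability (the 2% figure below is made up, not measured), and every runnable thread beyond the number of cores has to wait. The function name expected_waiting is hypothetical. A real scheduler is far more complicated, so this shows the direction of the effect, not the exact "half" figure.

```python
from math import comb


def expected_waiting(n_threads: int, p_runnable: float, cores: int) -> float:
    """Expected number of runnable threads left without a core at one instant.

    Toy model (assumption): each thread is runnable independently with
    probability p_runnable, so the runnable count is binomially distributed.
    """
    total = 0.0
    for k in range(n_threads + 1):
        # Probability that exactly k of the n threads are runnable right now.
        prob_k = (
            comb(n_threads, k)
            * p_runnable ** k
            * (1 - p_runnable) ** (n_threads - k)
        )
        # Any runnable thread beyond the core count has to wait its turn.
        total += prob_k * max(0, k - cores)
    return total


if __name__ == "__main__":
    # 100 threads, each assumed runnable 2% of the time (hypothetical figure).
    for cores in (1, 2, 4, 99, 100):
        print(cores, expected_waiting(100, 0.02, cores))
```

In this model the expected number of waiting threads drops every time a core is added and reaches exactly zero at 100 cores. With 99 cores it is tiny but not zero, which is the "very small" difference in the theoretical case above.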
Context switches are very expensive in terms of processor cycles. All the registers have to be saved for the outgoing thread and restored for the incoming one, along with any thread-local memory mappings. If the context switch is between two processes, then the memory mappings always have to be swapped as well, because each process has its own address space.
All this arguing that multiple processors don't benefit an ordinary Windows user verges on silly. In general, the average Windows user will have a smoother computing experience with two 1 GHz processors than with one 2 GHz processor. For games this wouldn't hold, but when the speed differential narrows to a few hundred MHz, dual cores catch up fast and provide better response.