Dynamic power used by a CPU is proportional to the capacitance of the chip, multiplied by the frequency, multiplied by the square of the voltage. It takes higher voltages to achieve higher clocks, so not only is your power consumption going up linearly with the frequency, it is also going up quadratically with the voltage; and since voltage itself has to rise with frequency, power ends up scaling roughly with the cube of the frequency. For this reason, it is actually more power-efficient to use lower-frequency multi-core architectures (which increase your chip capacitance) than a single high-frequency core.
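As a rough sketch of that trade-off, assuming made-up capacitance and voltage numbers (not measurements of any real chip):

```python
# Sketch of the P ~ C * f * V^2 scaling argument. All numbers below are
# illustrative assumptions, not data from a real part.

def dynamic_power(capacitance, freq_ghz, voltage):
    """Dynamic power, up to a constant factor: P ~ C * f * V^2."""
    return capacitance * freq_ghz * voltage ** 2

# Hypothetical operating points: the 8 GHz part needs 1.4 V, while the
# 4 GHz parts can run at 1.0 V.
single_core = dynamic_power(capacitance=1.0, freq_ghz=8.0, voltage=1.4)
dual_core = 2 * dynamic_power(capacitance=1.0, freq_ghz=4.0, voltage=1.0)

print(single_core, dual_core)  # two 4 GHz cores burn less total power
```

Doubling the cores doubles the capacitance term, but dropping the voltage wins it back quadratically, which is the whole point.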
It's also better for performance, like Roland mentioned. The higher your frequency, the more dependent you are on your cache hit rate to maintain performance. I can write a program where an 8 GHz core will have the same performance as a 1 GHz core: for example, one constantly engaged in pointer chasing, where every load misses cache and depends on the previous load, so there is no memory-level parallelism to hide the latency. In that scenario, a chip with two 4 GHz cores, each running its own independent chase, will actually have 2x the throughput of the single 8 GHz core. These types of programs are not unusual. GPUs take this idea to the extreme, hiding latency through massive threading.
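A minimal sketch of the kind of pointer-chasing loop I mean (toy Python, not a benchmark; in a real test you'd size the working set well beyond the caches):

```python
import random

# Toy pointer chase: next_idx[i] holds the "address" of the next node.
# Each step's load address comes from the previous load, so the chain
# is fully serial -- the core has no memory-level parallelism to
# overlap the (in a real program, cache-missing) accesses.

def make_cycle(n, seed=0):
    """Build a random single-cycle permutation over n slots."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    next_idx = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        next_idx[a] = b
    return next_idx

def chase(next_idx, start, steps):
    i = start
    for _ in range(steps):
        i = next_idx[i]  # serially dependent: can't issue the next load early
    return i

cycle = make_cycle(1 << 16)
final = chase(cycle, 0, 1 << 16)  # one full lap returns to the start
```

Running one chase per core is what gives the two slower cores their 2x throughput: the chains are independent across cores even though each chain is serial.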
Also, Amdahl's law is not that big a deal when it comes to multicore performance, IMO, unlike what Shintai would have you believe. In important applications it will always lose out to Gustafson's law: as core counts grow, people scale the problem up rather than holding it fixed. In the real world, people care about throughput.
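For concreteness, here are the two speedup formulas side by side, with a made-up 95% parallel fraction:

```python
# p = parallel fraction of the work, n = core count.
# Amdahl fixes the problem size; Gustafson scales it with the cores.

def amdahl(p, n):
    """Speedup on a fixed-size problem: serial part caps the gain."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    """Scaled speedup when the problem grows with the core count."""
    return (1.0 - p) + p * n

# Illustrative numbers: 95% parallel work on 64 cores.
print(amdahl(0.95, 64))     # ~15.4x: capped by the serial 5%
print(gustafson(0.95, 64))  # ~60.9x: throughput on a scaled-up problem
```

Same machine, same code, wildly different headline number; the difference is just whether you let the workload grow with the hardware.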