I think the real way to boost performance in the future will be SIMD units.
Take the Cell, for example. It reaches 200 GFLOPS, whereas a 3.2 GHz P4 manages 25.6 GFLOPS. That's just for matrix multiplication; in Linpack the Cell reaches 156 GFLOPS, while the P4 is at 25.6 too.
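A rough way to see where peak figures like these come from: theoretical peak is just units × clock × floating-point ops per cycle, and wide SIMD raises the last factor. This is a sketch under stated assumptions, not official numbers. The 8 flops/cycle for the P4 is simply what the quoted 25.6 GFLOPS at 3.2 GHz implies. For the Cell I'm assuming 8 SPEs, each doing one 4-wide single-precision multiply-add per cycle, and ignoring the PPE. The function name `peak_gflops` is made up for illustration.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = units x clock (GHz) x floating-point ops per cycle per unit."""
    return units * clock_ghz * flops_per_cycle


# Assumption: one P4 core at 8 single-precision flops per cycle,
# which is what the quoted 25.6 GFLOPS at 3.2 GHz implies (25.6 / 3.2 = 8).
p4 = peak_gflops(units=1, clock_ghz=3.2, flops_per_cycle=8)

# Assumption: 8 SPEs, each issuing a 4-wide single-precision multiply-add
# per cycle (4 lanes x 2 flops = 8 flops/cycle). The PPE is ignored.
cell = peak_gflops(units=8, clock_ghz=3.2, flops_per_cycle=8)

linpack_cell = 156.0  # Linpack figure quoted in the post

print(f"P4 peak:   {p4:.1f} GFLOPS")
print(f"Cell peak: {cell:.1f} GFLOPS")
print(f"Cell Linpack / peak: {linpack_cell / cell:.0%}")
```

Under these assumptions the Cell's peak comes out at about 204.8 GFLOPS, close to the quoted 200, and 156 GFLOPS in Linpack would be roughly 76% of that peak.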
At...