There's no doubt that memory is much faster now. But let's look at a few things:
1) The original IBM PC ran at 4.77 MHz. A Core i5-3570K runs at 3.4 GHz. That's roughly a 713x increase in clock speed alone.
2) On the 8088, the fastest instruction was mov reg,reg at 2 cycles. Since many instructions (especially integer instructions) effectively execute in 1 cycle on Ivy Bridge, we're already at a ~1425x increase, and probably more.
3) Ivy Bridge is also superscalar and has speculative execution.
These three factors can absolutely give a 3000x increase in performance. Try disabling the L1, L2, and L3 caches and running again; the result should be interesting.
Don't be misled by the increase in bandwidth. The primary bottleneck today is latency (access time), and access time (measured in nanoseconds) has improved only 4-5 fold since the IBM PC days. That's the primary reason we got one level of cache, then a second, and now a third.
These days, a single main-memory access can stall the CPU for 200-400 cycles. In the IBM PC days, only poorly designed systems had wait states for main-memory access.
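Latency is easy to see with a pointer chase, where each load depends on the previous one, so the CPU can't overlap the misses. Here's a minimal sketch of the idea (my own illustration, not a rigorous benchmark; buffer size and hop count are arbitrary, and it assumes POSIX clock_gettime and a large RAND_MAX, as with glibc):

```c
/* chase.c - pointer-chasing latency sketch (illustrative, untuned).
 * Each load depends on the previous one, so out-of-order hardware
 * cannot overlap the misses: you measure raw access latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (1UL << 25)      /* 32M entries * 8 bytes = 256 MB, far past L3 */
#define HOPS 10000000UL       /* number of dependent loads to time */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Build a random single cycle (Sattolo's algorithm) so the hardware
     * prefetcher cannot guess the next address. Assumes RAND_MAX >= N. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j < i keeps it one cycle */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (unsigned long h = 0; h < HOPS; h++)
        p = next[p];                            /* serialized chain of loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per load (sink=%zu)\n", ns / HOPS, p);  /* print p so the loop isn't optimized away */
    return 0;
}
```

Shrink N until the buffer fits in L1 and the nanoseconds-per-load figure should drop from the 200-400 cycle region down to a few cycles.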
All the huge performance increases listed above share one important requirement: the data must be in the cache. Most benchmarks are so small that they fit into the cache hierarchy; if your application does as well, that's fine. But if your working set is bigger than the caches, computational performance drops up to 10-fold (and sometimes even more) due to memory stalls.
Simple example:
Try adding the elements of two 1 GB arrays into a third array.
With 2 x 1600 MHz memory channels (listed by Intel at 25.6 GB/s), the best real-world throughput is more in the 12-16 GB/s range. If you don't know how to tweak your access pattern, it is more like 8 GB/s.
To add two double-precision numbers, three 64-bit memory transfers are necessary (2 reads, 1 write). Dividing 8 GB/s by 24 bytes per addition, the maximum flop/s your highly optimized LGA-1155 CPU can achieve with this simple program will not exceed 333 Mflop/s.
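For concreteness, here's a sketch of that measurement (illustrative only: it assumes a 64-bit build with about 3 GB of free RAM, and uses a plain loop, so expect the untweaked ~8 GB/s figure):

```c
/* add.c - bandwidth-bound array add (illustrative; assumes a 64-bit
 * build and ~3 GB of free RAM). One FP add per 24 bytes of nominal
 * traffic: two 8-byte reads plus one 8-byte write. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1UL << 27)   /* 2^27 doubles * 8 bytes = 1 GB per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Touch every page up front so page faults don't land in the timed loop. */
    for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + b[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gbs = 3.0 * N * sizeof(double) / sec / 1e9;   /* 2 reads + 1 write */
    printf("%.2f GB/s, %.0f Mflop/s (sink=%f)\n", gbs, N / sec / 1e6, c[0]);
    return 0;
}
```

One reason the naive loop undershoots even the 24-byte accounting: without non-temporal (streaming) stores, writing c[i] usually also costs a read-for-ownership of that cache line, so the real traffic is 32 bytes per element. Avoiding that is part of the access-pattern tweaking mentioned above.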
Andy