I really hope AMD gets competitive soon, because we all know how lazy Intel can get when AMD isn't breathing down its neck.
Unfortunately, the more I look at the numbers, the less I see it happening, at least in the short term. Right now, K8 is about 20% slower clock-for-clock than Conroe. From the benches I've seen, Penryn looks to be another 10% faster than Conroe, which means Barcelona needs at least a 30% clock-for-clock advantage over K8 just to match Penryn clock-for-clock. Since those gains multiply rather than add, the real figure is more like 32-38%, depending on whether "20% slower" is measured from Conroe or from K8. Then, on top of that, Intel looks more than capable of producing chips clocked roughly twice as high as AMD's (3.33GHz vs. 1.6GHz so far). If Conroe is any indication, Penryn probably has a similar cushion of frequency headroom, so a decent number of Penryns will likely be capable of 4GHz, AKA 2x AMD's targeted (and so far unreached) launch speed of 2GHz. Maybe AMD will get to a 45nm die shrink sooner than we think, and that will let Barcelona hit really high speeds, but the results so far are not encouraging.
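To make the clock-for-clock arithmetic above concrete, here is a minimal sketch that normalizes per-clock performance to Conroe = 1.0 and computes how much Barcelona must gain over K8 to tie Penryn, plus the clock-speed ratios quoted. The 0.8 / 1.1 / 1.2 factors are my reading of the "20% slower" and "10% faster" figures, and the function names are just for illustration.

```python
# Per-clock performance normalized so that Conroe = 1.0.
# Assumption: "K8 is 20% slower than Conroe" -> K8 = 0.8 x Conroe,
# and "Penryn is 10% faster than Conroe" -> Penryn = 1.1 x Conroe.
CONROE_IPC = 1.0
K8_IPC = 0.8 * CONROE_IPC
PENRYN_IPC = 1.1 * CONROE_IPC


def required_uplift(target_ipc: float, base_ipc: float) -> float:
    """Fractional per-clock gain the base core needs to match the target."""
    return target_ipc / base_ipc - 1.0


def clock_ratio(fast_ghz: float, slow_ghz: float) -> float:
    """How many times higher the faster chip is clocked."""
    return fast_ghz / slow_ghz


# Reading 1: K8 is 20% slower than Conroe (K8 = 0.8 x Conroe).
uplift = required_uplift(PENRYN_IPC, K8_IPC)

# Reading 2: Conroe is 20% faster than K8 (Conroe = 1.2 x K8),
# so Penryn = 1.1 x 1.2 = 1.32 x K8.
alt_uplift = required_uplift(1.1 * 1.2, 1.0)

# Shipping clocks today vs. the projected 4GHz Penryn / 2GHz Barcelona.
shipping = clock_ratio(3.33, 1.6)
projected = clock_ratio(4.0, 2.0)

print(f"Barcelona needs {uplift:.1%} over K8 per clock")
print(f"Alternative reading: {alt_uplift:.1%}")
print(f"Clock gap today: {shipping:.2f}x, projected: {projected:.2f}x")
```

Either way the required gain lands above the naive 20% + 10% = 30%, because percentage gains compound.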
Then, looking at the Barcelona architecture makes me even more depressed. The caches are ridiculous. Did anybody else notice that they actually REDUCED the size of the L1 cache from 128K to 64K, AKA the same as Conroe's? The large on-die cache was one of AMD's biggest design advantages, and they just threw it out the window. On top of that, the L2 is still only 512K, 1/6 or 1/12 the size of Penryn's. As for the L3, it is both small (2MB) and slow (friggin level three!!!!!), so I don't see why they think a shared L3 is going to solve all their problems. Additionally, AMD seriously needs to step it up on memory bandwidth: Socket 939 used almost every bit of its DDR bandwidth, but AM2's DDR2 bandwidth efficiency was piss poor compared to any of Intel's chips. The only advantage I see is the native quad-core design, which doesn't rely on the front-side bus for communication between the two dies, but benchmarks of current Kentsfields don't show this to be a huge problem. The bus is fast enough to handle it, costing maybe 2-3% in performance. And that hit only shows up in heavily multithreaded apps; very few programs currently use more than 2 cores simultaneously.
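As a quick sanity check on the "1/6 or 1/12" L2 comparison, the sketch below computes the ratios exactly with Python's `fractions` module. It assumes the comparison is Barcelona's 512K per-core L2 against Penryn L2 sizes of 3MB and 6MB; those two Penryn sizes are my assumption about what the figures refer to.

```python
from fractions import Fraction

# L2 sizes in KB.
# Assumption: the "1/6 or 1/12" figures compare Barcelona's 512 KB
# per-core L2 against Penryn L2 configurations of 3 MB and 6 MB.
BARCELONA_L2_KB = 512
PENRYN_L2_KB = {"3MB": 3 * 1024, "6MB": 6 * 1024}


def size_fraction(small_kb: int, large_kb: int) -> Fraction:
    """Exact ratio of the smaller cache to the larger one."""
    return Fraction(small_kb, large_kb)


ratios = {label: size_fraction(BARCELONA_L2_KB, kb) for label, kb in PENRYN_L2_KB.items()}

for label, ratio in ratios.items():
    print(f"512K vs {label} Penryn L2: {ratio}")
```

Using `Fraction` keeps the result exact (1/6 and 1/12) instead of printing rounded decimals.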