According to AMD, if an application is CPU-vendor-agnostic and uses ISA extensions like SSE, AES, etc., then no recompile is needed and the BD architecture will stretch its legs.
If that were true, general performance numbers would be higher.
If this is true, programmers can take advantage of ISA extensions from both CPU vendors while writing a single code path for both AMD and Intel. That has been true from the K6 up through Stars, but not so much for BD, or at least for BD's initial public iteration.
It's not just a matter of ISA, but of how instructions and register accesses are selected and grouped. Since most of the Windows and Linux world uses canned binaries optimized for prior CPUs, running those binaries well is important (even 386-compatible binaries may very well be tuned for more modern, deeply pipelined CPUs). Intel's CPUs, starting with the Core 2 series, for example, happily run K8-optimized x86_64 binaries, and do so faster than the K8 itself most of the time (all of the time, for plain statically compiled code with no direct human optimizations). Even if the code has been hand-tuned or profile-guided for the K8, such that the Core 2, Nehalem, or SB would be at a disadvantage, it will still run well enough.
Prior AMD CPUs did this just fine, as well. A K6-2/3 could execute 386 and 486 code very fast (I honestly don't recall whether there were any substantial differences from the K6, aside from clock speeds). It could execute Pentium integer code fast (often faster than a Pentium II). FP was a bit of an issue, but it was good enough for a cheap CPU. The K7 could execute non-SSE PPro/PII/PIII code fast, Pentium code fast, 486 code fast, 386 code fast, and K6 code fast (I'm pretty sure common 286-and-older features, like BCD and looping instructions, were being deprecated by that point). The K8 could do the same, and did very well with P4 code up to SSE2 (or was it SSE3?).
In all cases, tweaking just for a given CPU, be it Intel's or AMD's, can give a major performance boost. But as long as the instructions were supported, and the executable didn't check for GenuineIntel before it ran the good code, any existing executable made for a prior generation would typically run faster than it did on the prior-generation CPU.
This has historically been a strength of the x86 platform, if not necessarily a planned one. In past times, you could expect clock-speed boosts, so slightly lower IPC when running code made for older CPUs was OK. The Pentium II, for instance, while not initially scaling as fast as Intel hoped, did scale up fast enough that the minor IPC hit from running code tuned for Pentiums and 486s was generally a non-issue.
Today, clock speeds are only inching up, so increasing IPC on existing code is a must. At the least, AMD needed to be significantly faster per clock, per thread, than Stars. Making a P4-like, MIPS-like, or Alpha-like CPU at pretty much any point past ~2003 should have been a known bad idea. Even after seeing some of the latencies, I had hoped that AMD had been smart enough not to do that, possibly sacrificing some performance for the sake of speed scaling, since power/speed targets have consistently been a problem (i.e., add in major IPC improvements, but give a little back to make reaching certain speeds within a certain power envelope easier, because that trade-off happens almost every time), but apparently they did not.