<< With IA-64 it's a new ISA so they can make the instruction stream efficient, and then execute in order making the processor efficient.
Everything is efficient, everything is fast. >>
I beg to differ. IA-64 integer performance may well improve with more functional units (a la McKinley) and with further advances in compiler technology, but I disagree that everything is "efficient, and fast." FP code doesn't need OoO quite as badly as integer code, because in general FP code has fewer branches and control statements. This is (in part) why the Itanium is able to perform so well in FP applications. Yet, if you've looked, its SPECint2000 scores are akin (at like clock speeds) to those of the UltraSPARC III, which turns in some of the lower scores among high-end RISC chips.
At one point, there was a joke going around about whether Merced's yields were going to be measured in dies per wafer, or good wafers per die. Intel has still not divulged the die size, but it's not too tough to estimate by comparison.
Take the Alpha, for instance. The Alpha 21264B at 833 MHz has a die size of ~115 mm^2, with 15.4 million transistors. Both the Merced and the Alpha have the same amount of on-die cache (128 KB), but the Merced has 25.4 million transistors. Assuming the caches take roughly the same number of transistors, that leaves ~10 million more transistors in hard logic, which should add quite a bit to the die size of the Merced.
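For anyone who wants to check the arithmetic, here is a quick sketch in Python. It uses only the transistor counts quoted above, and it assumes (my assumption, not something Intel or Compaq has published) that the equal-sized on-die caches cost about the same number of transistors, so the whole difference lands in logic.

```python
# Transistor counts quoted in the post, in millions.
ALPHA_21264B_TRANSISTORS_M = 15.4
MERCED_TRANSISTORS_M = 25.4


def extra_logic_transistors_m(merced_m: float, alpha_m: float) -> float:
    """Difference in total transistor count, in millions.

    Assumption: both chips spend about the same number of transistors
    on their 128 KB of on-die cache, so the difference is all logic.
    """
    return merced_m - alpha_m


extra = extra_logic_transistors_m(MERCED_TRANSISTORS_M, ALPHA_21264B_TRANSISTORS_M)
ratio = MERCED_TRANSISTORS_M / ALPHA_21264B_TRANSISTORS_M

print(f"Extra transistors on Merced: ~{extra:.1f} million")
print(f"Merced has ~{(ratio - 1) * 100:.0f}% more transistors than the 21264B")
```

This says nothing about die area directly, since logic and cache have very different densities; it only backs up the "~10 million more" figure.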
The 21264B uses commodity 128-bit DDR SRAMs (which run at < 300 MHz, for 8.5 GB/sec of bandwidth), rather than the much more expensive custom SRAMs Intel uses for the Merced (which run at 733 or 800 MHz, matching the clock speed of the chip they are paired with).
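The roughly 8.5 figure above can be sanity-checked with a rough peak-bandwidth calculation. This is only a sketch: the function name is made up for illustration, and the 266 MHz clock is my assumption (the post only says "< 300"), picked because it reproduces the quoted number.

```python
def peak_bandwidth_gb_per_s(bus_bits: int, clock_mhz: float,
                            transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).

    transfers_per_clock=2 models DDR (data moves on both clock edges).
    """
    bytes_per_transfer = bus_bits / 8
    bytes_per_second = bytes_per_transfer * transfers_per_clock * clock_mhz * 1e6
    return bytes_per_second / 1e9


# Assumed clock: 266 MHz (hypothetical value under the stated 300 MHz ceiling).
alpha_cache_bw = peak_bandwidth_gb_per_s(bus_bits=128, clock_mhz=266)
print(f"128-bit DDR SRAM at 266 MHz: ~{alpha_cache_bw:.1f} GB/s peak")
```

Peak numbers like this ignore latency and bus turnaround, so treat it as an upper bound rather than what the cache actually delivers.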
I'm not trying to start a war, but to say that with IA-64 everything is streamlined, more efficient, and fast just doesn't seem to ring true, at least not yet.
The Int performance of Alpha-based systems is dramatically higher than the Merced's, even given similar on-die cache resources and similarly equipped functional units, and with the Alpha having only 32 architectural registers (while the Merced has 128). FP performance of the two chips is somewhat comparable, now that the "Spike" tool has been used in the submission of SPEC scores (Spike is somewhat similar to profiling, as far as I can gather, but real time, and an application, not dissimilar to Dynamo... but if I'm wrong, someone please correct me). If all manufacturers took the time to use profiling, scores would certainly improve.
All I'm saying is that, right now, IA-64 is NOT efficient (for FP stuff, it is indeed fast). It takes far more die space and uses more expensive parts for far less Int performance and somewhat equivalent FP. (I'm not saying the Alpha parts sell for less; Merced chips are cheaper to buy, but the cost of manufacturing a Merced is likely higher than that of an Alpha. Intel can just amortize the R&D costs over more parts, so it can charge less.)
Given that IA-64 is basically a VLIW machine with no OoO, it appears (for now, at least) that the bells 'n whistles IA-64 uses to make up for NOT being OoO don't do enough to bring it up to the performance afforded by preexisting technologies.
I'm not saying that IA-64 will be bad or inefficient in the future, but it is definitely inefficient now. Simply put, compared to the Merced, the 21264B does more with a lot less.