What exactly defines the bitness of a CPU has been debated for a long time. Every definition has exceptions, and I have yet to hear one that everyone agrees on.
<< To put it more simply, IA64 is a relatively complex new CPU design, x86-64 is an improvement/extension of the x86 Athlon architecture. >>
x86 is one of the most complex ISAs I know of. Between the backwards compatibility, oddities like self-modifying code, the many addressing modes and other assorted quirks, I can't think of anything that comes close in terms of complexity. IA64 is an entirely new ISA designed to move complexity (like BTBs) off the chip and into the compiler. Although the two current implementations of the ISA are large, high-end designs, the ISA itself is fairly straightforward.
<< Expect x86-64 to run 32bit code faster b/c it can do it natively, while IA64 will have to emulate it (to the best of my knowledge at this point). >>
IA64 as an ISA has no knowledge of IA32. However, the two current microarchitectural implementations of IA64 include fully backwards-compatible hardware support for 32-bit applications, and can even boot 16-bit DOS if you so desire. Their IA32 execution is not exactly speedy compared with consumer-oriented IA32 CPUs, but that's a design choice, not a limitation of the ISA.
<< As for 64bit code, that will have to wait and see, but I'll go out on a limb here and speculate that x86-64 will be more scalable clockspeed-wise and will start at higher clockspeeds than IA64. >>
That's an interesting supposition, and I'm curious what you base it on. Clock frequency is almost entirely independent of the instruction set for every ISA I can think of. If you want a higher clock, you can get one simply by adding more latches to the pipeline, so that each stage does less work. In theory, IA64 should allow longer pipelines by reducing the cost of branches through predication (which removes many branches altogether) and branch hints. Since branch handling is one of the biggest problems facing long-pipeline designs, there is no reason IA64 couldn't scale to longer pipelines than IA32. Keeping the pipeline short to improve IPC is a deliberate choice in the current designs. In addition, instruction decoding in IA32 CPUs is quite complex and takes up a significant portion of the pipeline. IA64 decode is simpler and should require less circuitry, so some of those pipeline stages could be eliminated.
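To make the latch argument concrete, here is a common first-order textbook model, offered as a sketch rather than a description of any real IA32 or IA64 part. It assumes a fixed total logic delay T_logic split evenly across N stages plus a per-stage latch overhead T_latch, and that the branch misprediction penalty is roughly N cycles. The values T_logic = 8 ns and T_latch = 0.1 ns, and the symbols b (fraction of instructions that are branches) and m (misprediction rate), are hypothetical and chosen only for illustration.

```latex
\begin{aligned}
T_{\text{cycle}}(N) &= \frac{T_{\text{logic}}}{N} + T_{\text{latch}}, \qquad f(N) = \frac{1}{T_{\text{cycle}}(N)} \\
N = 10:\quad T_{\text{cycle}} &= \tfrac{8}{10} + 0.1 = 0.9\ \text{ns} \;\Rightarrow\; f \approx 1.11\ \text{GHz} \\
N = 20:\quad T_{\text{cycle}} &= \tfrac{8}{20} + 0.1 = 0.5\ \text{ns} \;\Rightarrow\; f = 2\ \text{GHz} \\
\text{CPI}_{\text{mispredict}} &\approx b \cdot m \cdot N
\end{aligned}
```

Doubling the depth nearly doubles the clock in this model, but the misprediction term also doubles, which is why anything that cuts b or m (such as predication) makes deeper pipelines more attractive. The fixed T_latch term also means the returns diminish as N grows.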
<< Reports suggest that IA64 has trouble reaching just 1GHz. x86-64 may do to IA64 what P4 has done to Athlon in the performance arena - beat it with clockspeed. >>
As I mentioned, clock frequency is a design decision based on many factors, including cache sizes, memory latency, cache hit rates, branch prediction accuracy, bus width and a host of other considerations. Current IPF designs have emphasized high IPC for various reasons, but there is no reason the designs couldn't scale higher if that were desired. Most current high-end CPUs run at approximately 1GHz: PA-RISC, UltraSPARC, Power4, etc. That tends to be the design target in the high-end arena.
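The trade-off between IPC and clock speed can be sketched with the standard performance equation, where execution time depends on instruction count (IC), IPC and frequency f together. The two designs and the instruction count of 10^9 below are hypothetical, and the example assumes both run the same number of instructions, which will not hold exactly across different ISAs.

```latex
\begin{aligned}
T_{\text{exec}} &= \frac{\text{IC}}{\text{IPC} \cdot f} \\
\text{A: } \text{IPC} = 2,\ f = 1\ \text{GHz} &\;\Rightarrow\; T_{\text{exec}} = \frac{10^9}{2 \cdot 10^9} = 0.5\ \text{s} \\
\text{B: } \text{IPC} = 1,\ f = 2\ \text{GHz} &\;\Rightarrow\; T_{\text{exec}} = \frac{10^9}{1 \cdot 2 \cdot 10^9} = 0.5\ \text{s}
\end{aligned}
```

Under these assumptions a high-IPC design at half the clock matches a low-IPC design at twice the clock, so clock speed alone says little about which one is faster.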
Patrick Mahoney
IPF Microprocessor Design
Intel Corp.