Originally posted by: imgod2u
Originally posted by: imgod2u
Erm, no, it doesn't. IA-64 does not support OoOE.
I'm sure you didn't mean it, but superscalar is not synonymous with OOOE...see the Pentium, UltraSPARC III, Alpha 21164, etc. Itanium does share a lot of features with superscalar designs that are not present in VLIW: dynamic allocation of instructions to execution units, register scoreboarding, and dynamic branch prediction.
As I understand it, superscalar processors all need to be able to support some form of reordering. If you execute 2 instructions in parallel where normally they would be sequential (in terms of the code, but not necessarily dependent), then those instructions aren't being executed in program order (they're supposed to go one after the other, even if they're independent). Surely this would require some form of tracking, even if it's not as complex as a re-order window.
OOOE universally refers to the "second-generation" superscalar processors, which can execute instructions ahead of earlier instructions in program order. The processors I mentioned, among others, are always called in-order.
In-order processors do not need to support re-ordering; they are blocking designs just like Itanium. If an instruction stalls due to a hazard, all subsequent instructions in the program flow stall even if they are independent. Depending on the design, instruction resources hardly even need to be tracked (if at all) while they are in the pipeline, such as with the Pentium. Going to an out-of-order design not only has huge implications on the design of instruction issue and retirement, but also changes the programming model if not handled correctly.
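A toy model may make the blocking behaviour concrete. This is a sketch under stated assumptions, not a model of the Pentium or any real chip: a hypothetical 2-wide in-order pipeline, only read-after-write dependencies tracked, every register written at most once, and no structural hazards or branches. The `Instr` tuple and the 10-cycle "cache miss" latency are made up for the example.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles until the result can be used

def in_order_issue(program, width=2):
    """Issue cycle of each instruction on a hypothetical blocking, in-order,
    `width`-wide pipeline. Only RAW hazards are modeled."""
    ready = {}                     # register -> first cycle its value is usable
    issue = [None] * len(program)
    pc, cycle = 0, 0
    while pc < len(program):
        issued = 0
        while pc < len(program) and issued < width:
            ins = program[pc]
            if any(ready.get(r, 0) > cycle for r in ins.srcs):
                break              # hazard: this instruction and all younger ones wait
            issue[pc] = cycle
            ready[ins.dest] = cycle + ins.latency
            pc += 1
            issued += 1
        cycle += 1
    return issue

program = [
    Instr("r1", ("r0",), 10),  # load that misses in the cache (assumed 10 cycles)
    Instr("r2", ("r1",), 1),   # depends on the load
    Instr("r3", ("r0",), 1),   # independent of the load
    Instr("r4", ("r3",), 1),   # independent of the load
]

print(in_order_issue(program))  # [0, 10, 10, 11]
```

The two independent instructions sit idle behind the stalled consumer until cycle 10 even though nothing prevents them from executing; no reorder tracking is needed because nothing ever passes the stalled instruction.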
Your reasoning for calling in-order superscalar "OOOE" is that it changes the programming model by allowing otherwise sequential, independent instructions to be executed in parallel. If you're going to use that reasoning, then you'll also have to call a pipelined, scalar design "OOOE", since its instruction-level parallelism technique also changes the architecture's programming model relative to a non-pipelined, scalar design. Strict program order means not only that instructions are sequential, but also that each one completes before the next one starts. Think about how a pipeline changes the programming model in the presence of program interrupts/exceptions and instructions with differing execution latencies.
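As a trivial illustration of that last point, here is a sketch assuming an idealized scalar pipeline where one independent instruction issues per cycle with no stalls, and each finishes `latency` cycles after it issues (the latencies are hypothetical):

```python
def completion_order(latencies):
    """Program indices in the order they finish on an idealized scalar pipeline.

    Assumes instruction i issues at cycle i (independent instructions, no stalls)
    and finishes at cycle i + latencies[i]; ties go to the older instruction.
    """
    finish = [i + lat for i, lat in enumerate(latencies)]
    return sorted(range(len(latencies)), key=lambda i: (finish[i], i))

# A 4-cycle multiply followed by two 1-cycle adds (latencies assumed).
print(completion_order([4, 1, 1]))  # [1, 2, 0]
```

A non-pipelined design would finish them as [0, 1, 2]. If the multiply raised an exception here, both adds would already have completed, which is why a pipelined scalar design already needs extra machinery to keep exceptions precise.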
Read "Architecture of the Pentium Microprocessor" (Donald Alpert and Dror Avnon, IEEE Micro, June 1993), "Tuning the Pentium Pro Microarchitecture" (David Papworth, IEEE Micro, April 1996) and "The MIPS R10000 Superscalar Microprocessor" (Ken Yeager, IEEE Micro, April 1996)...you can probably find them on Google. They give good descriptions of what goes into the implementations of in-order and out-of-order superscalar designs and how they keep the programming model consistent.
It absolutely cannot run instructions which would normally be sequential in parallel (which is what superscalar does).
I don't follow you here...OOOE superscalar cannot arbitrarily execute dependent instructions out-of-order any more than Itanium can. If an instruction stalls in an OOOE processor due to a long-latency event such as a cache miss, only independent instructions may continue to execute...all dependent instructions stall as well.
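Here is a toy model of the out-of-order case under the same kind of simplifying assumptions (all hypothetical): 2-wide issue, an unlimited window, only read-after-write dependencies, each register written at most once so no renaming is needed, and in-order retirement not modeled. Each cycle, the oldest instructions whose operands are ready issue, wherever they sit in program order.

```python
import math
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles until the result can be used

def out_of_order_issue(program, width=2):
    """Issue cycle of each instruction on a hypothetical out-of-order machine."""
    # A register produced by the program is unavailable until its producer issues;
    # registers never written by the program (e.g. r0) are ready from cycle 0.
    ready = {ins.dest: math.inf for ins in program}
    issue = [None] * len(program)
    cycle = 0
    while None in issue:
        issued = 0
        for i, ins in enumerate(program):   # oldest first
            if issued == width:
                break
            if issue[i] is not None:
                continue
            if all(ready.get(r, 0) <= cycle for r in ins.srcs):
                issue[i] = cycle
                ready[ins.dest] = cycle + ins.latency
                issued += 1
        cycle += 1
    return issue

program = [
    Instr("r1", ("r0",), 10),  # load that misses in the cache (assumed 10 cycles)
    Instr("r2", ("r1",), 1),   # depends on the load: must still wait
    Instr("r3", ("r0",), 1),   # independent: proceeds under the miss
    Instr("r4", ("r3",), 1),   # independent: proceeds under the miss
]

print(out_of_order_issue(program))  # [0, 10, 0, 1]
```

Only the independent instructions get past the miss; the consumer of the load still issues at cycle 10, exactly as it would on an in-order machine.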
Not dependent, but sequential. That is, in program order. Even if instructions are independent, they are still presented in normal assembly programs as a sequence of instructions one after the other. I was pointing out that Itanium doesn't have the ability to take sequential instruction streams and process those instructions in parallel (as it doesn't ever check for dependencies) and relies completely on the compiler to do such things. Thus, it has little, if anything, in common with superscalar designs.
That's simply not true....there's a lot more that goes into building an instruction schedule than just finding data dependencies. Like current superscalar designs, Itanium dynamically resolves resource hazards and control dependencies, something that VLIW designs typically don't do.
In addition, a VLIW design might be non-interlocking...since VLIW exposes the structure of the pipeline to the software, the schedule produced by the compiler needs to account for all data, control and structural hazards that might occur in the pipeline. For example, let's say that a VLIW processor has a 2-cycle latency for multiplies. If a multiply is followed by an instruction that uses its result, that instruction must be scheduled 2 cycles after the multiply. This presents obvious problems if a later implementation has a 3-cycle multiply latency: the consumer would read the register before the new result had been written.
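A rough sketch of that failure mode, assuming a hypothetical machine that issues one bundle per cycle, reads operands at issue, writes results back `latency` cycles later, and never checks whether an operand is still in flight. The `mul`/`add` ops, register names and values are made up for illustration.

```python
def run_no_interlock(bundles, regs, mul_latency):
    """Run a VLIW schedule on a hypothetical machine with no interlocks.

    bundles: one list of (op, dest, src_a, src_b) per cycle; op is "mul" or "add".
    Operands are read when a bundle issues; results become visible
    mul_latency cycles later for "mul" and 1 cycle later for "add".
    Nothing checks whether an operand is still in flight.
    """
    regs = dict(regs)
    pending = []  # (cycle the result becomes visible, dest, value)
    drain = [[]] * max(mul_latency, 1)  # empty cycles so in-flight results land
    for cycle, bundle in enumerate(bundles + drain):
        # Write back results that become visible this cycle.
        for item in [p for p in pending if p[0] <= cycle]:
            regs[item[1]] = item[2]
            pending.remove(item)
        # Issue the bundle, reading whatever the registers hold right now.
        for op, dest, a, b in bundle:
            if op == "mul":
                pending.append((cycle + mul_latency, dest, regs[a] * regs[b]))
            else:
                pending.append((cycle + 1, dest, regs[a] + regs[b]))
    return regs

# Compiled for a 2-cycle multiply: the add is placed exactly 2 cycles later.
schedule = [
    [("mul", "r3", "r1", "r2")],  # cycle 0: r3 = r1 * r2
    [],                            # cycle 1: no independent work to fill the slot
    [("add", "r4", "r3", "r3")],  # cycle 2: r4 = r3 + r3
]
init = {"r1": 3, "r2": 5, "r3": 0, "r4": 0}

print(run_no_interlock(schedule, init, mul_latency=2)["r4"])  # 30
print(run_no_interlock(schedule, init, mul_latency=3)["r4"])  # 0: stale r3 was read
```

The same binary gives a silently wrong answer on the slower implementation, so the schedule has to be recompiled for it.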
Like superscalar designs, Itanium is fully interlocking...no knowledge of the pipeline implementation is required to build a correct schedule. In the previous example, an Itanium compiler would optimally schedule the consuming instruction two groups after the multiply (with independent instructions in between). But if a future design had a multiply latency of 1 or 3 cycles, the software would still function...Itanium's scoreboarding would correctly issue the consuming instruction after the multiply had finished executing.
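A minimal sketch of how a scoreboard keeps that schedule correct. It assumes a hypothetical in-order machine that issues one bundle per cycle (this is not Itanium's actual bundle/template or instruction-group format) and stalls a bundle while any register it reads or writes still has a result in flight. The schedule is the 2-cycle-multiply example from the text, with made-up register values.

```python
def run_with_scoreboard(bundles, regs, mul_latency):
    """Run a schedule on a hypothetical interlocked machine.

    A scoreboard records registers with a result still in flight. A bundle that
    reads or writes such a register is held (a stall cycle) until the write lands,
    so the schedule stays correct for any multiply latency.
    Returns (final registers, number of stall cycles).
    """
    regs = dict(regs)
    in_flight = {}  # dest register -> (cycle the result becomes visible, value)
    cycle = stalls = i = 0
    while i < len(bundles) or in_flight:
        # Write back results that become visible this cycle.
        for reg, (ready, value) in list(in_flight.items()):
            if ready <= cycle:
                regs[reg] = value
                del in_flight[reg]
        if i < len(bundles):
            bundle = bundles[i]
            if any(r in in_flight for _, d, a, b in bundle for r in (d, a, b)):
                stalls += 1  # scoreboard hit: hold the whole bundle this cycle
            else:
                for op, dest, a, b in bundle:
                    if op == "mul":
                        in_flight[dest] = (cycle + mul_latency, regs[a] * regs[b])
                    else:
                        in_flight[dest] = (cycle + 1, regs[a] + regs[b])
                i += 1
        cycle += 1
    return regs, stalls

# Scheduled by the compiler for a 2-cycle multiply.
schedule = [
    [("mul", "r3", "r1", "r2")],  # r3 = r1 * r2
    [],                            # slot left empty (no independent work)
    [("add", "r4", "r3", "r3")],  # r4 = r3 + r3
]
init = {"r1": 3, "r2": 5, "r3": 0, "r4": 0}

for lat in (1, 2, 3):
    final, stalls = run_with_scoreboard(schedule, init, lat)
    print(lat, final["r4"], stalls)
# 1 30 0
# 2 30 0
# 3 30 1
```

The result is the same for every latency; a slower multiply only costs a stall cycle, and a faster one leaves the padding slot unused.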