Originally posted by: imgod2u
Originally posted by: Sohcan
The template isn't necessarily for decoding purposes, but for instruction dispersal, when instructions are assigned to issue ports. Allowing any instruction to issue to any issue port will increase the fan-out on most of the instruction buffer slots by quite a bit. Keep in mind that most integer ALU instructions can go in an M slot or an I slot...I really don't think allowing any instruction to occupy any slot will decrease the code size by any noticeable amount, as the current templates cover the possibilities quite well. 8 of the 32 possible templates are still reserved, so there is room for expansion if necessary.
Shouldn't the assignment already be known after decoding? I mean, an integer instruction is an integer instruction, a load/store is a load/store, and an FP instruction is an FP instruction. Why would you need templates to do this?
Well, strictly speaking, this isn't possible on Itanium. The operation an instruction performs is defined by both the 4-bit opcode in each instruction and the template...you can't look at the instruction alone to determine what it does. Even if that weren't the case, it's likely easier to look at the 5-bit template and know how to disperse the instructions to the appropriate issue slots than to decode each instruction's opcode and decide individually how to issue it. I'm not positive (I can find out), but I think the instruction opcodes are not examined in the instruction dispersal stage...the template alone should be enough.
The template also serves to indicate if/where the stops occur in the bundle, which is especially important in the few cases where the stop is not at the end of the bundle. I'm guessing that the template information is pretty important in enabling the three instructions to fit in a 128-bit bundle. If the 5-bit template were removed, and four bits were added to each instruction opcode (3 to encode the instruction type, 1 to indicate the presence of a stop), the bundle would be 135 bits.
Does the concept of SIMD contradict RISC principles?
Personally, I consider RISC/CISC to be orthogonal to SIMD/SISD and to VLIW/sequential architectures. The first VLIW architectures were obviously CISCy, since they predated RISC. Although Itanium's instruction set architecture packs in a lot of things, its instruction atoms are more RISCy than CISCy, and even take some RISC principles to an extreme: it has a large number of general-purpose registers, the instructions are relatively simple (all ALU and integer operations take one cycle to execute, all FP operations take four cycles), it has only one addressing mode, and the instructions are fixed-length with a relatively small number of regularly composed instruction formats.
SIMD/VLIW instructions are by nature 'complex', but does that mean that a true RISC chip shouldn't have SIMD at its disposal?
Some would probably say that the simple SIMD implementations we've seen aren't necessary on RISC processors, given their much better floating-point architectures compared to x87. Given Itanium's FP performance using its scalar architecture, and that its multimedia SIMD architecture was almost an afterthought, I kind of agree with this view...with a well-designed floating-point architecture, a simple SIMD extension shouldn't be necessary. Of course, a "true" vector architecture, à la Cray with long 64+ element vectors and a beefy memory system to support it, can do wonders for scientific computing and linear algebra routines...but its purpose is otherwise pretty limited.
On the other hand, all the major RISC architectures (PA-RISC, POWER, SPARC, MIPS, I don't know about ARM) have had SIMD extensions, some even before x86...but I wonder how much of the motivation was just a fad, and if the extensions are widely used.