IA64 is not like x86-64?

sprockkets

Senior member
Mar 29, 2001
448
0
0
I don't get this. I've been hearing that IA-64 dumps all legacy support for previous programs. Yet it also runs any program that we have today. In fact, it has an IA-32 mode for backwards compatibility with older stuff, just MUCH slower. So why are people yakking at AMD for supporting current 32-bit programs while Intel is doing it too?

Of course, IA-64 is nothing like IA-32 in terms of how the instructions work, but the legacy support is still there. Isn't that just like the 32-bit mode of the Hammer?
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
Because x86-64 is a straightforward 64-bit extension of 32-bit x86 - in much the same way that Intel made the 32-bit extension to the 16-bit x86.

AMD's basically going to (well, they had better if they want to continue as an independent company) give FAST 32-bit x86 support, AND 64-bit x86.

The 32-bit mode of the Hammer series is different from IA-64.

People say IA-64 dumps previous legacy support in the sense that no one in their right mind would bother using it, yet x86-64, at least initially, will probably be used much more in 32-bit mode than in 64-bit mode (where it should be faster, in part due to 16 GPRs). Of course, pointers will take up 8 bytes instead of 4, but that's not a huge deal, really. Basically, x86-64 is a much "smoother" transition to 64-bit computing.
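To make the pointer-size point concrete, here's a minimal C sketch (just an illustration; nothing in the source is x86-64-specific - only the data model the compiler targets changes):

#include <stdio.h>

/* Build the same file as 32-bit x86 and as x86-64 and compare the output:
   sizeof(void *) is 4 on the 32-bit target and 8 on x86-64.  The extra
   4 bytes per pointer is the cost mentioned above. */
int main(void)
{
    printf("pointer = %zu bytes, long = %zu bytes, int = %zu bytes\n",
           sizeof(void *), sizeof(long), sizeof(int));
    return 0;
}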
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
Yeah, think of x86-64 like the transition to 32-bit addressing with the 386...it could handle both 16-bit and 32-bit x86 code well. The Itanium, on the other hand, doesn't directly execute x86 instructions....I believe it attempts to translate x86 instructions into the explicitly parallel IA64 long instruction words. Since the Itanium is an in-order processor, it probably has trouble extracting instruction-level parallelism out of x86 code, so performance suffers.
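Here's a toy C example of why that matters (purely illustrative, not Itanium code): the loop below has two chains of work, one dependent on loads and one not. An out-of-order core overlaps them on its own when a load misses the cache; an in-order core only overlaps them if the instruction stream was already scheduled that way - which translated x86 code generally isn't.

/* Toy illustration only.  'sum' depends on each load of a[i]; 'count'
   does not.  OOO hardware can keep the count chain (and later loads)
   moving while a load for sum waits; a strict in-order core stalls
   unless the compiler interleaved the two chains ahead of time. */
long sum_and_count(const long *a, int n, long *count_out)
{
    long sum = 0;
    long count = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];   /* load-dependent chain */
        count += 2;    /* independent chain    */
    }
    *count_out = count;
    return sum;
}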
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
It's in-order? Any reason for that? Anything with a pipeline needs OOO execution for best performance.
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
It's in-order? Any reason for that? Anything with a pipeline needs OOO execution for best performance.
>>



There are two reasons for choosing in-order over out-of-order for a superscalar processor:
1. Preserving sequential consistency is MUCH HIGHER EFFORT for out-of-order than for in-order.
2. The use of shelving leaves very little motivation to use out-of-order issue: with shelving, the issue of instructions is rarely blocked, so out-of-order would only offer a marginal benefit.

As for the Itanium's case, hmm...isn't it a VLIW processor, meaning that the compiler is responsible for finding enough parallelism (so out-of-order doesn't make much sense, does it)?
 

Mday

Lifer
Oct 14, 1999
18,646
1
76
The reason I like x86-64 more than IA-64 for now is that x86-64 can handle 32-bit x86 fine. While IA-64 can do 32-bit x86 in emulation, the speeds are much slower.

Also, I like that IA-64 is abandoning x86, which is an outdated dinosaur. =/

<------ headache city
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81


<< While IA-64 can do 32-bit x86 in emulation, the speeds are much slower >>


The Itanium processor doesn't emulate x86 - or at least no more than Pentium IIIs and Athlons currently do. x86 mode is implemented seamlessly (i.e. you can switch back and forth in assembly code) in hardware. While the internal implementation might seem to be a hardware emulator, if that's the case then we've only been using x86 emulators for some time now.
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71


<< anything with a pipeline needs OOO execution for best performance >>

That's not necessarily true; that statement depends on the instruction set you are executing. You have to remember that in IA-64 the compiler is responsible for organizing opcodes so that the CPU stays busy. The compiler organizes the final code so it is inherently parallel/OOO. Thus, by relying on smart compilers, the IA-64 can quickly and efficiently execute ops as they come at it.

Later-generation x86 chips need OOO to make x86 code run quickly because it's compiled in order.

OOO execution is slower than in order on a processor level. But the x86 ISA is so inefficient that the optimizations to that instruction set that can be gained by OOO outweigh the processor-level penalties of OOO.

With IA-64 it's a new ISA, so they can make the instruction stream efficient, and then execute in order, making the processor efficient.

Everything is efficient, everything is fast.
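To show roughly what "the compiler organizes the final code" means, here's a hand-waving sketch at the C level (the real scheduling happens on the generated IA-64 instructions, not in the source, so treat this purely as an analogy):

/* As written: each multiply depends on the load immediately before it,
   so an in-order core waits on every load. */
void scale_naive(float *dst, const float *src, int n, float k)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-"scheduled": the next load is issued before the current multiply,
   so there is always independent work in flight (assumes n is even, just
   to keep the sketch short).  A static scheduler does this sort of thing,
   and much more, on the actual machine instructions. */
void scale_unrolled(float *dst, const float *src, int n, float k)
{
    for (int i = 0; i < n; i += 2) {
        float a = src[i];
        float b = src[i + 1];
        dst[i]     = a * k;
        dst[i + 1] = b * k;
    }
}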
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0


<< With IA-64 it's a new ISA, so they can make the instruction stream efficient, and then execute in order, making the processor efficient.

Everything is efficient, everything is fast.
>>



I beg to differ. It might become the case that IA-64 architectures will increase in integer performance with more functional units (a la McKinley), and due to more advances in compiler technology, but I tend to disagree about everything being "efficient, and fast." FP code doesn't need OoO quite as badly as integer code because, in general, FP code has fewer branches and control statements. This is (in part) why the Itanium is able to perform so well in FP applications. Yet, if you've looked, it has SPECint2k scores akin (at like clock speeds) to those of the UltraSPARC III, which turns in some of the lowest of the high-end RISC scores.

At one point, there was a joke going around about whether Merced's yields were going to be measured in dies per wafer or good wafers per die. Intel has still not divulged the die size, but it's not too tough to try to compare it.

Take the Alpha, for instance. The Alpha 21264B runs at 833MHz and has a die size of ~115mm2, with 15.4 million transistors. Both the Merced and the Alpha have the same amount of on-die cache (128KB). But the Merced has 25.4 million transistors. This means there are ~10 million more transistors in hard logic, which should add quite a bit to the die size of the Merced.

The 21264B uses commodity 128-bit DDR SRAMs (which run at < 300MHz, for 8.5GB/sec of bandwidth), rather than the much more expensive custom SRAMs Intel uses for the Merced (which run at 733 and 800MHz, matching the chips they are on).

I'm not trying to start a war, but to say that with iA-64 everything is streamlined and more efficient, and fast just doesn't seem to ring true, at least, not yet.

The Int performance of Alpha-based systems is dramatically higher than the Merced's, even given similar on-die cache resources, similarly equipped functional units, and with the Alpha having only 32 architectural registers (while the Merced has 128). FP performance of the two chips is somewhat comparable, now that the "Spike" tool has been used in the submission of SPEC scores (Spike is somewhat similar to profiling, as far as I can gather, but real-time, and an application, not dissimilar to Dynamo....but if I'm wrong, someone please correct me). If all manufacturers took the time to use profiling, scores would certainly improve.

All I'm saying is that, right now, IA-64 is NOT efficient (for FP stuff, it is indeed fast). It takes far more die space, and uses more expensive parts, for far less Int performance and somewhat equivalent FP (I didn't say the parts sell for less - Merced chips are cheaper - but the cost of manufacturing Merceds is likely higher than for Alpha parts. Intel can just amortize the R&D costs over more parts, so it's cheaper).

Given that IA-64 is basically a VLIW machine and has no OoO, it appears (for now, at least) that the bells 'n whistles IA-64 uses to make up for NOT being OoO don't do enough to bring it up to the performance afforded by preexisting technologies.

I'm not saying that IA-64 will be bad or inefficient in the future, but it is definitely inefficient now. Simply put, compared to the Merced, the 21264B does more with a lot less.
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
OOO execution is slower than in order on a processor level.
>>



Noriaki, I would be interested in papers or articles that show this.
According to some of the research papers I've read, OOO consistently outperformed in-order processors (especially those that don't employ shelving and register renaming) by 15-50% in their performance comparisons. The gap is even wider for multimedia applications.



<<
The Int performance of Alpha-based systems is dramatically higher than the Merced's, even given similar on-die cache resources, similarly equipped functional units, and with the Alpha having only 32 architectural registers (while the Merced has 128). FP performance of the two chips is somewhat comparable, ...
>>



Don't forget that the Alpha has MUCH greater system bus bandwidth and better compiler support.
Plus, I wouldn't call a 20% difference in FP performance comparable.
However, the 1 GHz Alpha outperforms the Itanium in both int and FP performance (are they still going to produce this 1 GHz chip?).

But I agree with most of your post: the Itanium is quite inefficient at the moment. You can just compare it with McKinley (which has higher clock rates with shorter pipelines than the Itanium).
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
Well, the FP performance of the Alpha jumped quite a bit just recently, to be above that (peak) of the Itanium....before the new submission (of the same configuration), the Alpha was a little bit lower....it just shows how much profiling helps, and once other vendors start doing it more, one of Intel's advantages will start to diminish.



<< Noriaki, I would be interested in papers or articles that show this.
According to some of the research papers I've read, OOO consistently outperformed in-order processors (especially those that don't employ shelving and register renaming) by 15-50% in their performance comparisons. The gap is even wider for multimedia applications.
>>



The figure I've heard is ~30%, for heavily integer-based applications. I think what he's referring to is the clock-rate issue....which is what Intel and HP were betting on (that the complexity of OoOE would be significantly detrimental to clock rate, and that the bells 'n whistles of EPIC would give the performance of OoOE without the gotchas...the only gotcha being that it doesn't appear to be working). Without OoOE, as you know, chips should be able to clock higher....but whether it allows the processor to scale in frequency enough to make up for the 30% performance loss...I don't think so (yet).
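Rough arithmetic, just to put a number on that: if performance scales roughly as IPC x clock, then giving up ~30% IPC by dropping OoOE means you need about 1/0.7 ≈ 1.43x the clock speed just to break even - before any of the EPIC features claw anything back.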




<< Don't forget that the Alpha has MUCH greater system bus bandwidth and better compiler support.
Plus, I wouldn't call a 20% difference in FP performance comparable.
However, the 1 GHz Alpha outperforms the Itanium in both int and FP performance (are they still going to produce this 1 GHz chip?).
>>



Well, yes, it has more main memory bandwidth (this is one of the gripes a lot of people have with the Merced), but with STREAM, the chipset is showing that it delivers ~67% of theoretical bandwidth, which is much, much higher than most other chipsets. Also, the Merced has significantly more (50% more) bandwidth from its L3 cache than the Alpha has from its L2 cache. Of course, large-scale server programs need memory bandwidth more than other systems do (due to larger program footprints).

<< But I agree with most of your post: the Itanium is quite inefficient at the moment. You can just compare it with McKinley (which has higher clock rates with shorter pipelines than the Itanium). >>

Well, I'm sure McKinley will do well (we have word from Wingnutz, or was it PM...that McKinley "is all that"). I see little doubt that McKinley will be a top performer (the 2x performance increase that was cited actually seems reasonable, given the mammoth main memory bandwidth improvement, decrease in mispredict penalty, ILP increases, and better memory hierarchy). The only thing I'm wondering is if the performance increase is worth the incredible die sizes associated with having 3 levels of cache on-die (I hope so!).
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81


<< Well, I'm sure McKinley will do well (we have word from Wingnutz, or was it PM...that McKinley "is all that"). >>

That would be Wingz. But he's right.

Patrick Mahoney
IPF (McKinley) Microprocessor Design
Intel Corp.
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
Hehehe, cool, PM

As for the 1GHz Alpha...it must be slated for release relatively soon, as SPEC scores were just submitted for it. They claim hardware availability since June, but that doesn't seem quite right....

Anyway, it doesn't appear they used Spike for the 1GHz Alpha submissions, because the peak FP is actually lower than the 833 submission, which does use Spike.

Just as an example: the inclusion of Spike (I'm too lazy to look up ALL the compiler changes) took the 833 from 685 peak up to 784 peak. The 1GHz Alpha is showing 756, but it too is using Spike....

Anyway, the hardware difference is that the 833 is on an ES40-based system, while the 1GHz chips are on the GS series.

Of course, the only Merced "Peak" scores are the same as the base scores, so there is still room for improvement there.
 

Noriaki

Lifer
Jun 3, 2000
13,640
1
71
Let me just clarify...
My comments were not meant to compare IA-64 to the Alpha, or even really to be about IA-64 as it is now.
What I meant was more that the instruction set design behind IA-64 is intended to have compilers produce an efficient stream of instructions that it can execute in order at high speed.
I was referring to the IA-64 ISA, not any particular hardware implementation of that ISA.

And yes, I was referring to the projected clock-speed increases for my statement about in-order being more efficient at the processor level.

Sorry about the confusion there.


The point I was trying to make is that in this case the compiler should handle the OOO for you, letting the CPU just crunch ops in the order they are received, and leaving it up to the compiler/software to deal with parallelism and ordering.
Rather than the x86 approach, where a dumb compiler produces an ordered stream of instructions and you then rely on clever hardware designs to make sure you don't waste processor time.
Now, is the first way better in current implementations? No, not really. But it's brand new. Most things aren't better than the old way when they are brand new.

Just a personal opinion: I think that relying on smart compilers to optimize is a better scheme, because the compiler has "knowledge" of the whole program. A CPU only has knowledge of what's in its instruction buffer. Yes, a "smart" compile will take longer, but you only have to compile once, so you spend more time on that single compilation to make the program run faster in the future.
Perhaps it won't work out that way, but I think that relying on smart compilers is a better long-term solution.
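A small C illustration of the "whole program" point (an analogy only, not IA-64 output): because the compiler sees both functions, it can inline threshold() and hoist the limit*limit work out of the loop entirely. Hardware scheduling at run time still executes whatever instructions it was given, every iteration; it only reorders the few that happen to sit in its window.

/* With whole-program knowledge, the compiler can inline threshold() and
   compute limit*limit once, before the loop.  A CPU's scheduler never
   sees that opportunity - it only sees the instructions in its buffer. */
static int threshold(int limit)
{
    return limit * limit;
}

int count_below(const int *v, int n, int limit)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < threshold(limit))
            count++;
    }
    return count;
}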
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0


<< Just a personal opinion: I think that relying on smart compilers to optimize is a better scheme, because the compiler has "knowledge" of the whole program. A CPU only has knowledge of what's in its instruction buffer. Yes, a "smart" compile will take longer, but you only have to compile once, so you spend more time on that single compilation to make the program run faster in the future.
Perhaps it won't work out that way, but I think that relying on smart compilers is a better long-term solution.
>>

I think in the long run it will pay off. I also think that existing architectures could make use of such technologies and exploit their performance benefits nearly as much as IA-64 chips. The difference is that with these other architectures, the hardware would simply be used less. I think with these other architectures, there would be a slow transition where the compiler slowly took over for the hardware. I think the Itanium, along with Transmeta's chips, is proof that the line between the hardware and software interface (not intending to quote H&P) is not only blurry, but I think the line is kind of wobbly too.
 

br0wn

Senior member
Jun 22, 2000
572
0
0


<<
<< Just a personal opinion: I think that relying on smart compilers to optimize is a better scheme, because the compiler has "knowledge" of the whole program. A CPU only has knowledge of what's in its instruction buffer. Yes, a "smart" compile will take longer, but you only have to compile once, so you spend more time on that single compilation to make the program run faster in the future.
Perhaps it won't work out that way, but I think that relying on smart compilers is a better long-term solution. >>

I think in the long run it will pay off. I also think that existing architectures could make use of such technologies and exploit their performance benefits nearly as much as IA-64 chips. The difference is that with these other architectures, the hardware would simply be used less. I think with these other architectures, there would be a slow transition where the compiler slowly took over for the hardware. I think the Itanium, along with Transmeta's chips, is proof that the line between the hardware and software interface (not intending to quote H&P) is not only blurry, but I think the line is kind of wobbly too.
>>



In this part, I beg to differ.
Basically, what you are saying is that the VLIW architecture (which needs compiler support) will overcome the superscalar architecture.

First, let me quote something I said in another thread (The future of microarchitecture) about the problems with the VLIW architecture.


<<
In a VLIW architecture, the compiler is responsible for the detection and removal of control, data and resource dependencies, so the compiler is much more complex. Not only that, but in order to schedule operations, the VLIW architecture has to be exposed to the compiler in considerable detail. This means that the compiler has to be aware of all the important characteristics of the processor and memory, such as the number and type of the available execution units, their latencies, and so on. The consequence is that a given compiler cannot be used for subsequent models of a VLIW line, even if these models are compatible in the conventional sense. The impact of this can be assessed by imagining how cumbersome it would be if each different x86 processor required the use of a different compiler. Thus, this sensitivity of VLIW compilers is possibly the most significant drawback of VLIWs, and may decide their future.
>>




Another thing to add about the ability of a VLIW compiler to schedule instructions: it CAN NEVER outperform a dynamic scheduling scheme (done by a superscalar processor).
Why?
The "knowledge" that a VLIW compiler has is limited to compile time. You can't resolve all dependencies until run time, as the data for some instructions isn't available until run time (some instructions rely on user inputs).
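A concrete little example of the run-time issue (ordinary C, just to illustrate): the compiler cannot prove that the store through dst and the load from src[idx[i]] never touch the same memory, because the idx values only exist at run time. A static scheduler therefore has to keep them in order; out-of-order hardware with memory disambiguation can compare the actual addresses and overlap them whenever they don't collide.

/* dst[i] and src[idx[i]] may alias - it depends entirely on run-time data.
   The compiler must schedule conservatively; dynamic scheduling does not
   have to. */
void gather_add(int *dst, const int *src, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[idx[i]];
}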

Another problem is that static scheduling (scheduling done by the compiler) has a disadvantage compared to dynamic scheduling (scheduling done by the superscalar processor) when it comes to function calls. At compile time, even assuming the compiler has the ability to perform interprocedural analysis (the analysis needed to perform inter-function optimization), it generates a lot of TRUE dependencies because of the function calls (frame and stack pointers). However, dynamic scheduling doesn't have this problem, as it just doesn't see any difference with function calls. All it sees is a stream of instructions. Thus, it is able to generate a better schedule.

Does this mean dynamic scheduling is SUPERIOR to static scheduling?
I'll let you decide for yourself.

Now, a lot of research is directed toward COOPERATION between the compiler and the hardware.


 