The future of CPU architecture: post-RISC/CISC, VLIW, or other?

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
With the recent acquisition of the Alpha team by Intel, I was thinking about where the future of high-performance CPUs will lie.

The RISC/CISC distinction has blurred a lot over the past 5 years, with x86 CPUs adopting many RISC philosophies. While x86 CPUs still lag in FP performance based on SPEC (though the difference is no longer that significant), integer performance is more or less identical, and their cost is a fraction of that of RISC CPUs. Will the x86 instruction set reach a barrier in performance (discounting clock speed increases)?

On to RISC...many of the design philosophies that defined the RISC revolution in the 80s seem to have been broken:
1. Keep the instruction set small, thereby using hardwired control instead of microcode. Current RISC instruction sets are much larger than the original MIPS R2000, especially with the addition of SIMD extensions.

2. Only add a function in hardware if it shows a substantial increase in performance without a substantial increase in transistor count. The MIPS R2000 (MIPS stands for Microprocessor without Interlocked Pipeline Stages) attempted to use software to insert pipeline bubbles whenever there was a data or branch hazard. This proved infeasible, and the R3000 had hardware interlocks. Also, early RISC designs tried to rely on compilers to schedule instructions in the pipeline, but compiler technology was not advanced enough and performance lagged. Current high-end RISC designs implement a number of traditionally software-side functions in hardware, at a huge expense in die area: out-of-order execution and retirement, large re-order buffers, register renaming, etc.

3. With most scheduling functions supported in software, die size and design time will be minimized. RISC die sizes are now huge, and most companies cannot keep up with the design and production costs. Alpha's EV68 and EV7 have experienced years of delay, and the team has recently been bought by Intel. SGI, HP, and IBM have dropped or reduced support for their RISC designs in favor of Itanium for their workstations and servers. IBM's Power4 is taking the only route left: the extreme high end, with ultra-scalability for mainframes and supercomputers. The only truly successful RISC platform has been Sun's UltraSPARC (despite its lackluster performance), since Sun has been able to market the entire hardware/software package.

With these highlights in mind, will RISC's niche market continue to shrink, or does it have a future?


As for VLIW, the only CPU with mass-market potential seems to be Itanium and future IA64 processors. Crusoe will likely remain a laptop CPU, and Sun's MAJC is aimed at high-end scalability. But a lot of Itanium's performance depends on its compilers. When (if ever) will the IA64 compilers mature to the point where its integer performance makes it feasible to be introduced to the desktop?

Discuss.
 

Demon-Xanth

Lifer
Feb 15, 2000
20,551
2
81
Mixed; there are no pure CISC or pure RISC CPUs in PCs. The future is probably going to be a lot like the past.
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
About VLIW...the current consensus has been that, for most integer code, VLIW isn't the medium of choice (though it works well in DSPs). Of course, McKinley could (should?) change that "consensus".

As for Sun and the MAJC - IMO, it's only a quasi-VLIW, as they have two bits dedicated to describing the length of the instruction word. The "VLIW" can range from one instruction all the way up through 4, so it's not like other implementations, which simply have to add in NOPs when the compiler can't find enough instructions. If the MAJC compiler can't find 4 instructions, it simply makes the instruction word shorter (having only something like 1 or 2 instructions instead of the full 4).
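
To make the NOP point concrete, here is a rough sketch in Python. It only counts instruction slots: fixed 4-wide bundles padded with NOPs versus variable-length bundles. The function names and the example instruction stream are made up for illustration, the cost of the length bits is ignored, and none of this is the real MAJC encoding.

```python
# Illustrative only: compares slot usage for fixed-width bundles padded with
# NOPs against variable-length bundles. This is NOT a real MAJC encoding.

BUNDLE_WIDTH = 4  # maximum instructions per bundle (figure from the post)


def fixed_width_slots(group_sizes):
    """Every bundle occupies BUNDLE_WIDTH slots; unused slots become NOPs."""
    return len(group_sizes) * BUNDLE_WIDTH


def variable_width_slots(group_sizes):
    """Each bundle occupies only as many slots as it has instructions."""
    return sum(group_sizes)


def nop_count(group_sizes):
    """NOP slots the fixed-width scheme needs for the same code."""
    return fixed_width_slots(group_sizes) - variable_width_slots(group_sizes)


# Hypothetical stream: how many independent instructions the compiler
# managed to find for each bundle (always between 1 and BUNDLE_WIDTH).
groups = [4, 1, 2, 3, 1]

print("fixed-width slots:   ", fixed_width_slots(groups))
print("variable-width slots:", variable_width_slots(groups))
print("NOPs avoided:        ", nop_count(groups))
```

The less parallelism the compiler finds, the bigger the gap between the two counts gets.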
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< If the MAJC architecture can't find 4 instructions, it simply makes the instruction word shorter (having only something like 1 or 2 instructions instead of the full 4) >>

Isn't that potentially more efficient, by not wasting memory with useless NOPs?

I'm interested in McKinley...Intel has said it will be "twice as fast" as Merced. Apparently it will be clocked 50% higher than Merced, despite having 7 pipeline stages vs. Merced's 10...that's quite an engineering feat, considering they're being manufactured on the same process.
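
A back-of-the-envelope way to see why that would be a feat, assuming (as a big simplification) that cycle time is just total logic delay divided by the number of stages, with latch and skew overhead ignored. The stage counts and the 50% figure are the ones quoted above; the model itself is my assumption, not anything Intel has stated.

```python
# Rough arithmetic under a simplified model: cycle time = total delay / stages.
from fractions import Fraction

merced_stages = 10
mckinley_stages = 7
clock_ratio = Fraction(3, 2)  # McKinley clock / Merced clock (50% higher)

# Normalize Merced's cycle time to 1 unit, so its pipeline spans 10 units.
merced_total = Fraction(merced_stages)

# McKinley's cycle is 1/1.5 of Merced's, and it has only 7 such cycles.
mckinley_cycle = 1 / clock_ratio
mckinley_total = mckinley_stages * mckinley_cycle

# Fraction of end-to-end pipeline delay that would have to disappear.
reduction = 1 - mckinley_total / merced_total

print(f"McKinley total pipeline delay: {float(mckinley_total):.2f} units")
print(f"Implied reduction vs. Merced:  {float(reduction):.0%}")
```

Under this model, the end-to-end pipeline delay would have to drop by roughly half on the same process, which is why the claim stands out.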
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0


<< Isn't that potentially more efficient, by not wasting memory with useless NOPs? >>

Well of course, my point being that it's not the same kind of VLIW that the Itanium uses. I don't know if it's really quite the same thing as VLIW, because the bundles can be of different lengths, so I was just thinking that MAJC doesn't quite fit the VLIW paradigm the same way that IA-64 does.


Last I heard was that shortly before tape-out, there was a "last minute" problem somehow with Itanium, and that they needed to add an additional stage. On the other hand, EV6x and EV7x have 7-stage pipelines (not quite fair to compare architectures, I know), and EV6x clocks higher than Itanium. I can't possibly claim to be well versed in this area, but I've heard opinions that the differences between McKinley and Merced are likely due either to more engineering talent working on McKinley or to simply having a cleaner design to start from (learning from Itanium mix-ups), but probably both.

The other things I find amazing are the fact that McKinley will have more execution units, more L2 cache, AND on-die L3 cache, a shorter pipeline, and will clock higher in the same process (of course, with a massive die size).
 

Mapidus

Senior member
Jun 9, 2001
457
0
0
Right now it seems like all of the current technologies in new forms are possible future performance platforms.

VLIW of course is the most dramatic departure from the current norm in computer architecture. Intel and HP are betting everything they have on EPIC, their implementation of VLIW. Merced was much delayed for both hardware and software reasons, and currently performance is not up to what was expected. Part of this is that compiler technology for VLIW has not progressed as far as was expected. If anybody is to pull off general-purpose VLIW, it would be HP, because they have much of the chip and compiler technology for VLIW. HP gained this expertise by buying out several of the companies that were the original pioneers in VLIW. Because of HP's experience, Intel decided to partner with HP. McKinley is mainly an HP-engineered chip (you'll see this if you ever visit Fort Collins), and rumors say it is much better than Merced (maybe because it has been worked on even longer). The hardware side of VLIW is pretty much solid, since many parts are similar to current architectures. Compiler technology for VLIW and EPIC, though, has a long way to go. Once the compilers get to an acceptable point, the architecture will be a very strong one.

Another approach, which is what IBM is doing with their POWER line, is to integrate multiple cores on a chip (2 in the POWER4) and to package multiple chips in a cartridge that has high-speed interconnects between the chips (currently up to 4 in a cartridge for the POWER4). This gives 8-way SMP per cartridge. IBM's reasoning is that they believe it is easier to find enough process- or thread-level parallelism to drive SMP systems than it is to find the instruction-level parallelism needed to drive a VLIW system to its fullest.

Compaq/DEC Alpha EV8 (maybe EV7 also, but this was so delayed that 8 was following after by less than a year) was experimenting with Simultaneous multi-threading (SMT). This is kind of like doing SMP on a single core and is another way to make use of thread-level parallelism. Who knows what will happen with these chips now that Intel has taken over. I'm not even sure if Intel has access to any of the DEC compiler researchers that were probably key to the success of SMT.
 

BurntKooshie

Diamond Member
Oct 9, 1999
4,204
0
0
It's not symmetric multithreading, it's Simultaneous Multithreading. Basically, it is like combining the horizontal-waste-removing capabilities of CMP with the vertical-waste-saving abilities of FMT, but without having to sacrifice single-threaded performance (unlike CMP).
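
For anyone unfamiliar with the terms, a small sketch of how the two kinds of waste are usually counted: vertical waste is a cycle where nothing issues at all, horizontal waste is the unused slots in a cycle where something did issue. The 4-wide issue width, the per-cycle issue counts, and the idealized "just add the two threads together" SMT model are all assumptions for illustration.

```python
# Illustrative accounting of wasted issue slots; all numbers are made up.
ISSUE_WIDTH = 4  # assumed issue slots per cycle


def waste(issued_per_cycle, width=ISSUE_WIDTH):
    """Return (horizontal, vertical) wasted slots for a list of issue counts."""
    # Vertical waste: whole cycles in which no instruction issued.
    vertical = sum(width for n in issued_per_cycle if n == 0)
    # Horizontal waste: empty slots in cycles that issued at least one.
    horizontal = sum(width - n for n in issued_per_cycle if n > 0)
    return horizontal, vertical


# Hypothetical instructions issued per cycle by two independent threads.
thread_a = [2, 0, 1, 0, 3, 0]
thread_b = [1, 2, 0, 2, 0, 1]

# Idealized SMT: both threads compete for the same slots every cycle.
smt = [min(ISSUE_WIDTH, a + b) for a, b in zip(thread_a, thread_b)]

print("thread A alone (horizontal, vertical):", waste(thread_a))
print("thread B alone (horizontal, vertical):", waste(thread_b))
print("SMT, A + B     (horizontal, vertical):", waste(smt))
```

In this toy run the second thread fills every cycle the first one stalls in, so the vertical waste disappears while the leftover horizontal waste is shared by both threads.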

Also, the POWER 4 has:
2 CPUs per die
4 dies per MCM
4 MCMs per board

That's 32-way NUMA multiprocessing, not SMP.

Also, EV7 is still coming out, but EV8 has been canned.
 

Mapidus

Senior member
Jun 9, 2001
457
0
0


<< It's not symmetric multithreading, it's Simultaneous Multithreading. Basically, it is like combining the horizontal-waste-removing capabilities of CMP with the vertical-waste-saving abilities of FMT, but without having to sacrifice single-threaded performance (unlike CMP).

Also, the POWER 4 has:
2 CPUs per die
4 dies per MCM
4 MCMs per board

That's 32-way NUMA multiprocessing, not SMP.

Also, EV7 is still coming out, but EV8 has been canned.
>>


Ok, fixed the Simultaneous part, thanks for catching it.

There is nothing about the POWER4 that dictates that it can only be used in a NUMA configuration, though. An 8-way SMP system with only 1 MCM is a configuration that IBM said was likely. Of course, if you go with 4 MCMs on a board you would want NUMA for scalability reasons; you could make it SMP, but you would not get the performance desired.
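
For reference, the processor counts being argued about fall straight out of the packaging numbers quoted above. A trivial sketch (the variable names are mine):

```python
# Packaging figures as quoted in the thread.
cores_per_die = 2
dies_per_mcm = 4
mcms_per_board = 4

cores_per_mcm = cores_per_die * dies_per_mcm      # the single-MCM SMP case
cores_per_board = cores_per_mcm * mcms_per_board  # the full-board case

print("cores per MCM:  ", cores_per_mcm)
print("cores per board:", cores_per_board)
```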
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81


<< McKinley is mainly an HP engineered chip (you'll see this if you ever visit Fort Collins) >>

I would dispute this statement: it was a joint design effort. I'd also dispute a couple of your other points, but I'm not sure how much detail I'd be allowed to go into.

Patrick Mahoney
IPF (McKinley) Microprocessor Design
Intel Corp.
 

unrequited

Member
Jun 13, 2000
103
0
0
I don't really see how Intel has conquered the entire processor market. The biggest draw of the x86 is cost/performance ratio. When compared to monster workstations and servers, there's just no way you can even think about x86 systems. It's often a lot smarter to spend $1500 or $2000 per x86 system in a cluster, rather than spending $50,000 for a killer RISC workstation. However, what do you do when you want performance an x86 system just can't give? Sometimes, you have to pay the big bucks and get an UltraSPARC or Alpha.

I have an old DEC Alpha. It's about the speed of a 90 MHz to 200 MHz Pentium, depending on whether you're doing floating point or integer operations. However, it's a true 64 bit CPU, something that not even the Pentium IV at 1.5 GHz can say. It cost me about $100 on an auction site, too. Runs NT4, Digital UNIX, VMS, BSD, and Linux, among other operating systems. It's a great machine, and I hope that the Alpha doesn't become lost technology. Compaq and Intel really haven't shown much enthusiasm for the Alpha architecture.
 

br0wn

Senior member
Jun 22, 2000
572
0
0
The following is my take on the current and future of microarchitecture. Currently, there are two competing architectures: superscalar and VLIW.

The battle between superscalar and VLIW is much like the battle between RISC and CISC. In fact, it is the same battle. The real advantage of a VLIW design over a comparable superscalar design is the higher possible clock rate because of its reduced complexity. This is the same advantage RISC processors have over their CISC counterparts: a RISC processor can be clocked higher because a load/store architecture needs less complexity than the register-memory architecture of a comparable CISC processor. However, does this mean the VLIW architecture is superior to the superscalar architecture? Don't go too fast.

In a VLIW architecture, the compiler is responsible for the detection and removal of control, data, and resource dependencies, and as a result the compiler is much more complex. Not only that, in order to schedule operations, the VLIW architecture has to be exposed to the compiler in considerable detail. This means that the compiler has to be aware of all the important characteristics of the processor and memory, such as the number and type of the available execution units, their latencies, and so on. The consequence is that a given compiler cannot be used for subsequent models of a VLIW line, even if these models are compatible in the conventional sense. The impact of this can be assessed by imagining how cumbersome it would be if each different x86 processor required the use of a different compiler. Thus, this sensitivity of VLIW compilers is possibly the most significant drawback of VLIWs, and may decide their future.
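
A minimal sketch of that sensitivity, assuming a toy greedy scheduler and two hypothetical machine models that differ only in their latencies. The operation names, latencies, and the `schedule` helper are invented for illustration and do not describe any real VLIW compiler or chip.

```python
# Toy static scheduler: the compiler, not the hardware, picks issue cycles.

def schedule(ops, latency, width):
    """Return {op_name: issue_cycle} for a greedy in-order static schedule.

    ops:     list of (name, kind, deps) in program order; deps name earlier ops
    latency: cycles until a result of each kind can be used
    width:   operations per long instruction word (issue slots per cycle)
    """
    issue = {}
    kind_of = {}
    used = {}  # cycle -> slots already filled in that instruction word
    for name, kind, deps in ops:
        kind_of[name] = kind
        # Earliest cycle at which every input is ready.
        cycle = max((issue[d] + latency[kind_of[d]] for d in deps), default=0)
        # Move to the first instruction word that still has a free slot.
        while used.get(cycle, 0) >= width:
            cycle += 1
        issue[name] = cycle
        used[cycle] = used.get(cycle, 0) + 1
    return issue


ops = [
    ("ld1", "load", []),
    ("ld2", "load", []),
    ("add", "alu", ["ld1", "ld2"]),
    ("mul", "mul", ["add"]),
    ("st", "store", ["mul"]),
]

# Two hypothetical models of the same VLIW line with different latencies.
model_a = {"load": 2, "alu": 1, "mul": 3, "store": 1}
model_b = {"load": 3, "alu": 1, "mul": 2, "store": 1}

sched_a = schedule(ops, model_a, width=2)
sched_b = schedule(ops, model_b, width=2)
print("model A:", sched_a)
print("model B:", sched_b)
```

The schedule built for model A issues `add` at cycle 2, but on model B the loads are not ready until cycle 3. Without hardware interlocks, the model A binary would read stale values on model B, which is why the code has to be rescheduled per model.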

On the other side, in order to achieve high performance, superscalar processors have introduced intricate instruction issue policies, involving advanced techniques such as shelving, register renaming, and speculative branch processing. Another task to be handled is parallel instruction execution, such as out-of-order execution. However, this introduces a problem, as sequential program execution needs to be preserved. And yet another problem arises: preserving the consistency of exception processing.
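
As one example of those techniques, here is a minimal sketch of register renaming. It assumes an unlimited pool of physical registers and a made-up three-operand instruction format; real hardware uses a finite free list and reclaims registers at retirement, which is not modeled here.

```python
# Toy register renamer: every write gets a fresh physical register, which
# removes WAR and WAW hazards and leaves only true (RAW) dependences.

def rename(instrs, num_arch_regs):
    """instrs: list of (dest, src1, src2) architectural register numbers.

    Returns the same instructions expressed in physical register numbers.
    """
    mapping = {r: r for r in range(num_arch_regs)}  # architectural -> physical
    next_phys = num_arch_regs
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources read the current mapping BEFORE the destination is remapped.
        p1, p2 = mapping[src1], mapping[src2]
        mapping[dest] = next_phys
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (4, 1, 5),  # r4 = r1 op r5  (true dependence on the first write of r1)
    (1, 6, 7),  # r1 = r6 op r7  (WAW with instr 0, WAR with instr 1)
    (8, 1, 4),  # r8 = r1 op r4
]

for before, after in zip(program, rename(program, num_arch_regs=16)):
    print(before, "->", after)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to read the old value of r1.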

If we step back a little bit, we will see that the above two architectures are classes of ILP architectures (which are a family of function-parallel architectures). Another class of function-parallel architectures is process-level architectures (MIMD). There is another kind of parallel architecture: the family of data-parallel architectures. This family includes vector architectures, neural architectures, SIMDs, and systolic architectures. Most of these architectures (other than ILP architectures) haven't found their way to the commercial market. However, many of their techniques have been adapted into ILP architectures. So stay back, as there are MANY more exciting architectures to be explored and marketed. What we have seen is just the beginning.
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< I don't really see how Intel has conquered the entire processor market...When compared to monster workstations and servers, there's just no way you can even think about x86 systems >>

But Intel has conquered much of the high-end market, not with x86 but with Itanium...ironically, before it has even seen mass production. Compaq, IBM, HP, and SGI have all adopted Itanium for most of their 64-bit workstation and server needs. Compaq sold Alpha to Intel...it only has a few years of life left with the EV68 and EV7; HP and SGI dropped their PA-RISC and MIPS, respectively, in favor of Itanium; IBM's upcoming Power4 is aimed at uber-expensive ultra-scalable machines. The only surviving RISC platform for workstations and servers is the UltraSPARC, but the US-III's performance is pretty dismal. Sun is successful because they are a marketing machine, and are able to promote the entire software/hardware package.



<< So stay back, as there are MANY more exciting
architectures to be explored and marketed.
What we have seen is just the beginning.
>>

/me twiddles thumbs patiently.
 

br0wn

Senior member
Jun 22, 2000
572
0
0
Sohcan, it seems that you have lost your hope for RISC. Don't forget that current superscalar CISC processors (like Pentium 4 and Athlon 4) are implemented using a superscalar RISC core. CISC instructions are first converted into RISC-like instructions (micro-ops) during decoding, then they are executed using a superscalar RISC core.
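
A toy sketch of what that decoding step looks like conceptually. The tuple format, the `tmp` register, and the micro-op names are invented for illustration; the real micro-op formats of these chips are not public and certainly differ.

```python
# Toy decoder: splits an x86-style instruction with a memory operand into
# load/store-style micro-ops. Purely illustrative, not a real uop format.

def is_mem(operand):
    """A memory operand is written in brackets, e.g. "[ebx]"."""
    return operand.startswith("[")


def decode(instr):
    """instr: (opcode, dest, src). Returns a list of micro-op tuples."""
    op, dest, src = instr
    if is_mem(dest) and is_mem(src):
        raise ValueError("at most one memory operand is supported")
    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary, then operate on registers.
        uops.append(("load", "tmp", src))
        uops.append((op, dest, "tmp"))
    elif is_mem(dest):
        # Read-modify-write on memory: load, operate, store back.
        uops.append(("load", "tmp", dest))
        uops.append((op, "tmp", src))
        uops.append(("store", dest, "tmp"))
    else:
        # Register-to-register instructions map to a single micro-op.
        uops.append((op, dest, src))
    return uops


for instr in [("add", "eax", "ebx"), ("add", "eax", "[ebx]"), ("add", "[ebx]", "eax")]:
    print(instr, "->", decode(instr))
```

Only the micro-ops on the right-hand side reach the execution core, and each one touches either memory or registers but not both, which is the load/store property the post refers to.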
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< Don't forget that current superscalar CISC processors (like
Pentium 4 and Athlon 4) are implemented using a superscalar RISC core
>>

Oh, I'm aware of that...it's just a little disappointing to see a lot of these ambitious 64-bit RISC designs go.
 