The Cell Processor by IBM and....

Holmecollie · Aug 11, 2003

Well, I read some archived post mentioning the Cell processor by IBM and Toshiba and read the article @ gamespot

http://www.gamespot.com/ps2/news/news_6073040.html

Let's say this processor is that good, will it reach the mainstream or end up powering supercomputers and $$$ servers?

BoberFett · Aug 11, 2003

Originally posted by: Holmecollie
Well, I read some archived post mentioning the Cell processor by IBM and Toshiba and read the article @ gamespot

http://www.gamespot.com/ps2/news/news_6073040.html

Let's say this processor is that good, will it reach the mainstream or end up powering supercomputers and $$$ servers?

Forgive my skepticism, but 100 times more power than a P4?

Do you really think the engineers at IBM are 100 times smarter than the engineers at Intel?

Robor · Aug 11, 2003

Originally posted by: Holmecollie
Well, I read some archived post mentioning the Cell processor by IBM and Toshiba and read the article @ gamespot

http://www.gamespot.com/ps2/news/news_6073040.html

Let's say this processor is that good, will it reach the mainstream or end up powering supercomputers and $$$ servers?

The real question is, how many of us need a CPU that fast? Plus, as the article said, it's going to take a lot of programming to get an OS and software together for it. But to answer your question, I think it would wind up where the $$$ is at.

DurocShark · Aug 11, 2003

Originally posted by: BoberFett

Originally posted by: Holmecollie
Well, I read some archived post mentioning the Cell processor by IBM and Toshiba and read the article @ gamespot

http://www.gamespot.com/ps2/news/news_6073040.html

Let's say this processor is that good, will it reach the mainstream or end up powering supercomputers and $$$ servers?

Forgive my skepticism, but 100 times more power than a P4?

Do you really think the engineers at IBM are 100 times smarter than the engineers at Intel?

No, but they're not hindered by needing x86 compatibility. I'm sure if the Intel engineers were allowed to do something totally unique and unsupported, they'd come up with something super powerful too.

cow123 · Aug 11, 2003

I read this thing a while back. Anyway, it's not 100x faster than the P4 generally... just in floating point performance. Also, I wonder what P4s were out when that article was published... maybe Willamettes?

edit: oh nevermind it said 2.5ghz

WarCon · Aug 11, 2003

I am just curious how they are going to power/cool a beast that has 16 processors on one die. Even if they manage a 50% power reduction per processor, you're still looking at 640 watts of power at full load (based on current processor power usage). Maybe they aren't going to have much onboard cache, which would further reduce power needs.

If this becomes real and affordable, it will make P4/P5/Opteron a thing of the past as even a poor OS emulator that only goes 1/4 the speed will still be putting out 250 gflops.
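For anyone who wants to check the arithmetic in this post, here is a rough back-of-the-envelope sketch. It assumes about 80 W per current processor at full load and a native peak of about 1000 gflops; both numbers are only implied by the 640-watt and 250-gflops figures above, not measured, and the function names are made up for illustration.

```python
def full_load_watts(cores, watts_per_core, reduction):
    """Total full-load power after a fractional per-processor power reduction."""
    return cores * watts_per_core * (1.0 - reduction)


def emulated_gflops(native_gflops, emulation_speed):
    """Throughput left over when an emulator runs at a fraction of native speed."""
    return native_gflops * emulation_speed


# Assumption: ~80 W per processor today, implied by the 640 W figure.
print(full_load_watts(16, 80.0, 0.5))   # 640.0

# Assumption: ~1000 gflops native peak, implied by "1/4 the speed" -> 250 gflops.
print(emulated_gflops(1000.0, 0.25))    # 250.0
```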

PlatinumGold · Aug 11, 2003

Originally posted by: DurocShark

Originally posted by: BoberFett

Originally posted by: Holmecollie
Well, I read some archived post mentioning the Cell processor by IBM and Toshiba and read the article @ gamespot

http://www.gamespot.com/ps2/news/news_6073040.html

Let's say this processor is that good, will it reach the mainstream or end up powering supercomputers and $$$ servers?

Forgive my skepticism, but 100 times more power than a P4?

Do you really think the engineers at IBM are 100 times smarter than the engineers at Intel?

No, but they're not hindered by needing x86 compatibility. I'm sure if the Intel engineers were allowed to do something totally unique and unsupported, they'd come up with something super powerful too.

They were with the Itanium processor. NO x86 compatibility, clean-sheet design. Still not 100 times more powerful than a P4.

cow123 · Aug 11, 2003

Itanium is still x86, it's just IA64 x86

Random Variable · Aug 11, 2003

The processors on the Wildcat VP can achieve 1.2 TeraOps and 200 Gflops.

buleyb · Aug 11, 2003

Originally posted by: cow123
Itanium is still x86, it's just IA64 x86

They support 32bit x86 through emulation.
**EDIT** meaning, like said below this post, IA64 is not x86

And making all these processors run together isn't tough. Making software that can use them for a reasonable cost, that's a hell of a lot tougher. Parallel systems are much harder to write code for, because timing and synchronization are such an overwhelming problem.

And making a 100x faster floating point engine isn't hard; it's called AltiVec, and now it's in parallel.

TerryMathews · Aug 11, 2003

Originally posted by: cow123
Itanium is still x86, it's just IA64 x86

Ummm... Say what? Itanium is not x86. It can't run x86 instructions. IA64 != x86. Opteron counts as x86 as it can run x86 apps.

Here's a perfect example. Every x86 machine should be able to run edit or edlin from MS-DOS. Good luck getting an Itanic to boot MS-DOS (without the assistance of Windows and emulation)

pm · Aug 11, 2003

Good luck getting an Itanic to boot MS-DOS (without the assistance of Windows and emulation)

You wouldn't need luck. An Itanium boots to MSDOS fine. I don't know why you would want to, but you could easily. Without Windows or emulation. There is an internal hardware translation engine on the core of the Itanium and the Itanium 2 that translates IA32 instructions on the fly into IA64. The system will boot MSDOS, or Windows 3.11 if you want.

As far as Cell goes, my only reply is to wait and see. I seem to remember similar levels of enthusiasm for Transmeta's products. What they are doing is not what I would call "revolutionary", and it's my expectation that, while it may be significantly faster at certain tasks, it won't be substantially faster for more general operations. It's noteworthy that IBM itself says that "elements of its design will be seen in future server chips from IBM". Note that they are not saying "we are replacing our entire product line with Cell." IBM's current high-end server processor is the Power4 and it's not "100 times faster than a 2.5GHz Pentium 4".

BD231 · Aug 11, 2003

Good stuff, it's about time for another big jump in processing power/technology. I really hope processor companies start thinking about heat output though... I'd get a 3+GHz P4 but the need for an air conditioner turns me off.

All-purpose gaming/24-7 server processors that require no cooling and have more power than anyone could ever possibly need.

TerryMathews · Aug 11, 2003

Originally posted by: pm
You wouldn't need luck. An Itanium boots to MSDOS fine. I don't know why you would want to, but you could easily. Without Windows or emulation. There is an internal hardware translation engine on the core of the Itanium and the Itanium 2 that translates IA32 instructions on the fly into IA64. The system will boot MSDOS, or Windows 3.11 if you want.

This is interesting. I'm going to have to bitch-slap my sources. I guess they got confused by Intel's emulation layer for Windows and assumed that the chip couldn't natively do it. Evidently, the Windows emulation layer is just better than the hardware support?

buleyb · Aug 11, 2003

Don't worry Terry, you'll be right in due time. Intel is removing/disabling the hardware emulation in future Itaniums in favor of a software emulation (as it performs better, who knew). So future Itaniums won't be able to boot MS-DOS natively

FearoftheNight · Aug 11, 2003

Can pm or someone here explain this to me? What exactly does x86 architecture mean? And what do "optimizations" such as sse/sse2/ do? thnx.

wetcat007 · Aug 11, 2003

Originally posted by: FearoftheNight
Can pm or someone here explain this to me? What exactly does x86 architecture mean? And what do "optimizations" such as sse/sse2/ do? thnx.

I'll put it as plainly as possible: x86 = Linux/Windows platform, whereas Mac uses its own platform; that's why you can't install Windows onto a Mac. x86 is a way to keep all parts compatible with the OSes without having many different platforms competing that need different hardware. Current-day CPUs are 686, and previous generations were 586, 486, 386, and so on.

Optimizations are when a program uses a set of special instructions built into the CPU, which can often speed the program up. In Intel's case, it's pushing software companies to use its instructions instead of AMD's optimizations, heh. Programs then get the boost of SSE2 without 3DNow! Professional support, which makes Intel's CPU generally faster in applications that take advantage of it, generally media applications.

As for IBM's CPUs, I don't really have any faith this will be all they claim, and the kind of heat it produces, as well as the insanely complex instructions needed, will result in having it slowed down anyway.

pm · Aug 11, 2003

This is interesting. I'm going to have to bitch-slap my sources. I guess they got confused by Intel's emulation layer for Windows and assumed that the chip couldn't natively do it. Evidently, the Windows emulation layer is just better than the hardware support?

To be honest I wasn't completely certain that it would work either. So I walked over and asked someone who works on the systems. He said that he'd booted Windows 95 in 32-bit mode, but had never tried MSDOS. But he said that he was certain that it would work. There is native hardware support for IA32 and IA16 on the Itanium and Itanium 2 (McKinley and Madison). There is also a binary translation mode that does a form of software emulation.

Can pm or someone here explain this to me? What exactly does x86 architecture mean? And what do "optimizations" such as sse/sse2/ do? thnx.

This will get pretty far off the original subject of the thread, so I apologize to the person who started the thread for somewhat hijacking it.

"x86" is a generic term that refers to binary compatibility with any x86 chip that Intel has produced (note that one might argue that this definition is very 'Intel-centric' but I think that it's honestly the correct definition). The x86 family includes, among others: the 8086, the 80286, the i386, the i486, the Pentium, etc. So a chip that is compatible with the "x86 architecture" should be able to run any program that was designed to run on any of these chips and any other previous microprocessor that is part of the "x86 family". Nowadays we take it for granted that a program written 14 years ago to run on an 80286 microprocessor will work fine (and much faster) on a modern microprocessor like the Pentium 4, but throughout the greater history of the computer, backwards compatibility with previous generations has been pretty rare. The term "x86" is IMO generic since it doesn't distinguish between 16-bit code and 32-bit code, and I personally prefer to use IA16 and IA32 instead to be more specific.

As the microprocessor has developed, new instructions have been added such as MMX, SSE, and SSE2 among others. The purpose of all three of these additional instruction sets is primarily to be able to process multiple chunks of data simultaneously with one instruction. This is usually referred to as SIMD - Single Instruction Multiple Data. Other examples of SIMD include Motorola's AltiVec and AMD's 3DNow! instruction sets.

An example of a SIMD instruction taken completely at random would be the SSE instruction ADDPS - which adds 4 single precision FP numbers in one register to 4 single precision FP numbers in another register all with one instruction. Where this would normally take 4 instructions - if not more - this one instruction processes several numbers in parallel. This kind of parallel functionality is very useful in multimedia, 3D rendering and encryption and can speed up performance in these applications substantially.
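To make the ADDPS example concrete, here is a small model of what the instruction does, written in Python purely for illustration. It is not real SIMD code: the lists stand in for 128-bit registers holding four lanes, Python floats are double precision rather than single precision, and the function names are made up.

```python
def addps(reg_a, reg_b):
    """Model of ADDPS: add four packed FP lanes as one operation."""
    assert len(reg_a) == 4 and len(reg_b) == 4
    return [a + b for a, b in zip(reg_a, reg_b)]


def scalar_adds(a, b):
    """The non-SIMD equivalent: four separate add instructions."""
    out = []
    for i in range(4):
        out.append(a[i] + b[i])  # each pass stands in for one scalar add
    return out


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(addps(a, b))        # [11.0, 22.0, 33.0, 44.0]
print(scalar_adds(a, b))  # same values, but four instructions instead of one
```

Both functions return the same result; the point is only that the SIMD version expresses the four additions as a single instruction.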

But, of course, you only see these performance gains if these new instructions are used. If the program you are running never tries to use any of the SIMD instructions - possibly because it was written/compiled before these new instructions existed - then it will never see their benefit. So that leads to the final part of the question, which is optimizing code for SIMD. In many cases, software code (such as code written in the C programming language) can simply be recompiled with the latest compiler, which knows about these instructions, to be able to use them. But to really get the full benefit from any SIMD set of instructions, in most cases the authors of the software need to hand-optimize their code (write it in the native language of the microprocessor rather than a higher-level language like C) to run best under a SIMD instruction set. For example, if they have a set of code that performs an operation on a matrix of numbers, in order to get the highest performance from their code they will probably try to hand-write a routine that implements this operation using SIMD. So optimizations for any of the SIMD instruction sets can take two forms: hand-optimized code, which can be a difficult task but can yield very high performance gains, or recompiled optimizations, where you just stuff your program into the latest compiler and tell the compiler to optimize for certain instruction sets.

For more information on the history of the microprocessor and the "x86 family" I highly recommend borrowing from your local library "The Microprocessor: A Biography" by Michael S. Malone. It's getting a little dated now, as it was written in 1995, but this actually allows it to focus more on the early history of the microprocessor. It's a great book - although a bit "fluffy" when it's not talking about history. It also has a good section on how microprocessors work.

For more information on SIMD and optimization, I would recommend typing expressions like "SSE optimize" and "MMX optimize" into Google.

zephyrprime · Aug 12, 2003

The thing has 16 cores, so to be 100 times faster than a P4 each core would have to be about 6x faster. But each core would also be much smaller than a P4 core. I just don't think it's possible. To do about 1 teraflop, each core would have to support something like a 32-word-long FP vector and 2GHz.

zephyrprime · Aug 14, 2003

You know, I've been thinking about it some more and I guess it could sorta be possible to attain something vaguely like the speed that is being claimed for the cell processor.

Alright, here's what I imagine the Cell processor will be like:

I reckon it will have about 150-200 million transistors and be made on a 0.09 micron process. So each core would have ~10M transistors. This is roughly equivalent to the number of transistors in an old P2 with off-chip L2 cache.

So if each core ran at 2GHz with 32 word long vector capabilities it could attain 1 teraflop. Or, if it ran at 4GHz with 16 word long vector capabilities, it could also attain 1 teraflop.
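The two configurations above can be checked with a quick calculation. This is only a sketch of the speculation in this post: it assumes 16 cores (the count mentioned earlier in the thread) and one floating point operation per vector word per clock cycle, and the function name is made up.

```python
def peak_gflops(cores, vector_words, clock_ghz):
    """Peak gflops, assuming one FP op per vector word per cycle on every core."""
    return cores * vector_words * clock_ghz


print(peak_gflops(16, 32, 2.0))  # 1024.0, i.e. about 1 teraflop
print(peak_gflops(16, 16, 4.0))  # 1024.0, same peak with half the width at twice the clock
```

Halving the vector width and doubling the clock gives the same peak, which is why both options land on roughly 1 teraflop.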

But the instruction set that such a core could support would be very limited due to the limited number of transistors available. Also, some of the vector instructions may not be able to operate on an entire vector in one cycle. For example, it should be fairly cheap in terms of transistors to implement 16-word-long add/sub/and/or/xor/shift/cmp/mov instructions, but a mul instruction may only be able to act on 4 words at a time. And the transcendental functions like cos/sin/log? Forget about it. It may be necessary to resort to good old software emulation to do these operations.

So in a way, such a processor I describe would be a sort of hybrid between a modern vector processor and an old school vector processor.

And such a processor would need humongous memory bandwidth.
This is just some speculating on my part so take it with a grain of salt.

anthrax · Aug 14, 2003

IBM already has processors with 600 million transistors on chip, such as the POWER4 series used on the high-end pSeries servers... The chip and the board, I believe, are housed in a book mounting which integrates a huge heat sink.

0roo0roo · Aug 14, 2003

Originally posted by: BD231
Good stuff, it's about time for another big jump in processing power/technology. I really hope processor companies start thinking about heat output though... I'd get a 3+GHz P4 but the need for an air conditioner turns me off.

All-purpose gaming/24-7 server processors that require no cooling and have more power than anyone could ever possibly need.

never heard of zalman? quiet cooler works well on p4.

NateSLC · Aug 14, 2003

Originally posted by: 0roo0roo

Originally posted by: BD231
Good stuff, it's about time for another big jump in processing power/technology. I really hope processor companies start thinking about heat output though... I'd get a 3+GHz P4 but the need for an air conditioner turns me off.

All-purpose gaming/24-7 server processors that require no cooling and have more power than anyone could ever possibly need.

never heard of zalman? quiet cooler works well on p4.

Zalman might help get that heat off the processor and into the room more efficiently, but he'll still need an A/C unit to get that heat out of the room.

0roo0roo · Aug 14, 2003

I forget how much a P4 3GHz puts out, but it can't be more than a 100-watt lightbulb. You live in hell or something?

zephyrprime · Aug 14, 2003

IBM already has processors with 600 million transistors on chip, such as the POWER4 series used on the high-end pSeries servers

The Power4+ only has 184 million transistors.
