Originally posted by: BladeVenom
Yes, but in PCs and consoles most of that is handled by the video and sound cards. What does that leave for the cells to process? Physics maybe? So far it's looking pretty useless for most things. Maybe someone will come up with some science or math programs for the cell processor, but how useful would that be for most PC and console users?
Cell is interesting not because it will be extremely useful or easily integrated into modern desktop or embedded platforms, but because it is designed around a programming model that is revolutionary from the perspective of commodity microprocessors (Intel, AMD, Sun, IBM, etc.). Specifically, Cell uses an explicitly parallel programming model that is exposed to the programmer at a very low level. It starts with a stripped-down PowerPC core (the PPE) that keeps conventional features you would recognize from modern x86 processors: multilevel caches, branch prediction, multiple instruction issue/retire, parallel functional units, and so on. This core is supplemented with eight RISC cores that are completely different from modern architectures in that they throw away hardware caching, all but basic branch prediction, out-of-order execution (a big one), register renaming, and the like. This is almost unheard of, since practically all commercial advances in computer architecture from the start of the '90s up to even Intel's new Core architecture have come from defining and refining exactly those techniques.
The engineers who built Cell justify these decisions in the following way: they argue that modern processors are not mainly made of components that do useful work. What is useful work? It is addition, multiplication, bitwise logical operations, comparisons, memory operations, and so on. Most processors devote something like 10-15% or even less of their chip area to logic that actually does useful work. The rest is devoted to taking instructions that were generated by compilers and programmers without much underlying knowledge of the architecture they would run on, and shuffling them around so that they can execute immediately without having to wait for a value to be loaded from slow off-chip memory or for the result of another instruction. On x86 they additionally need to be translated from variable-width CISC instructions to fixed-width RISC-like internal operations. Peter Hofstee (one of the chief architects of Cell) argues that, at least for the applications targeted by Cell, it is a better use of chip area and power to devote most of the chip to functional units that do useful work, and to give the programmer an interface for resolving dependencies between instructions and hiding memory latency. Whether it actually is better is not immediately clear, since it depends on how well people (and compilers) can do these kinds of optimizations when writing programs. What is clear is that if code were ideally optimized for both a modern Intel/AMD-type architecture and for Cell, Cell would be dramatically faster.
Aside from ditching caches and most of the prediction/reordering/control logic, Cell is also interesting in that it uses an explicitly parallel programming model. This is one proposed way of writing programs for multi-core environments, and it differs from the model you would find on a multiprocessor or a traditional multi-core machine, where applications are divided into threads that are essentially treated like separate programs, possibly with some shared memory or synchronization primitives between them. Instead, the interface for loading parts of a program onto the different cores (the SPEs) and transferring data between them is exposed directly to the programmer. Once again, this puts more control in the hands of the programmer, and in this case takes it away from the operating system. It is my opinion that multi-core systems will have to move to this programming model eventually as the number of cores increases, simply because it is extremely difficult to design a compiler, or to build support into the operating system, that does this partitioning efficiently: the optimization space grows exponentially with the number of cores in the system.
Anyways, I seem to have gotten off topic here, but I guess the main point is that experimental architectures like Cell's have the potential to be much more efficient and better-performing than traditional architectures, as long as programmers actually take the time to learn the requirements of the architecture and make full use of the interfaces the programming model provides. Conversely, if you just hack something together out of code that was meant to run on an Intel x86 and expect the operating system and underlying architecture to optimize it for you, then you are out of luck and will get much worse performance than you would on a traditional architecture.