IBM Cell as GPU

djhuber82

Member
May 22, 2004
51
0
0
I'm sure everyone here has heard of IBM's Cell processor (aka the PS3 processor) and its highly parallel architecture, but here's Anand's latest take on it just in case:
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2379

It sure seems like the Cell would make a decent GPU. It's got ridiculous memory bandwidth and 8 parallel DSP-like cores that run at 4+ GHz. Are there any GPU people out there who can tell me why this wouldn't work well? Is a modern GPU just way more parallel than this? Because if it is feasible, it seems like somebody could slap a Cell and some memory on a board, write some software and have a decent 3D video card. And since the GPU would be implemented in software on a well-documented processor, there is the potential for an open-source video card without the need to fabricate a custom chip. Any thoughts on why this would/wouldn't work?
 

djhuber82

Member
May 22, 2004
51
0
0
Before I get flamed as a Linux freak, let me say that the open-source thing was just an afterthought. What I'm really interested in is the technical feasibility of a Cell-based graphics card.
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
The reason they went with the Nvidia architecture was that the Cell was not performing as well as they wanted in the GPU role...
 

Todd33

Diamond Member
Oct 16, 2003
7,842
2
81
I've been to a few NDA meetings about the Cell, and nothing that I saw makes it a GPU at all. It's fast at math, each SPE has its own local memory for code and data, and they can all run in parallel.
 

Tenshodo

Member
Jul 8, 2005
116
0
0
I thought the reason that they didn't use Cell as the GPU was that it would drive up costs...
 

djhuber82

Member
May 22, 2004
51
0
0
I'm not that familiar with the hardware that's inside a GPU (I'm an analog guy). I guess I just want to know if Cell's SPEs could mimic that hardware well.

Originally posted by: Todd33
I've been to a few NDA meetings about the Cell, and nothing that I saw makes it a GPU at all. It's fast at math, each SPE has its own local memory for code and data, and they can all run in parallel.

Isn't that what a GPU does? Lots of fast math in parallel?

Thanks for the comments everybody.
 

Todd33

Diamond Member
Oct 16, 2003
7,842
2
81
GPUs have specialized hardware for special functions, pixel shaders for example. The Cell is in between a general CPU and something specialized like a GPU. There are some good overviews of the Cell, like:

http://www.blachford.info/computer/Cell/Cell0_v2.html

 

Bassyhead

Diamond Member
Nov 19, 2001
4,545
0
0
Exactly. General-purpose CPUs aren't well suited to specialized jobs like a GPU's, which is targeted at generating graphics. Plus, general-purpose CPUs take a lot more investment to develop, while GPUs have a shorter life cycle and are developed mainly for raw performance rather than optimization. The same goes for today's CPUs and GPUs, not counting the Cell processor at all.
 

djhuber82

Member
May 22, 2004
51
0
0
I understand that GPUs are mostly just DSP and can be developed quickly using logic synthesis tools (Synopsys, etc). And Cell is general-purpose DSP, so it will be slower than something designed for a specific purpose. But I don't think it should be THAT much slower. I acknowledge that there are probably some specialized functions that the SPEs could not handle well, but what are these? What specialized hardware do you need that cannot be done with some really fast floating point units and memory? Big lookup tables or fixed multipliers or some such thing? Do GPUs even use floating point, by the way? If not, then the SPEs might be a lot of overhead. From the link that Todd33 posted, there is a figure showing the SPEs linked together to form a digital TV decoder:
http://www.blachford.info/computer/Cell/Cell2_v2.html
Why couldn't you stream those SPEs together to form a GPU? 7 SPEs might not be enough, but apparently you could add more Cells. And since they're targeted at embedded applications, they should be cheap once they're being produced in volume.
 

Bassyhead

Diamond Member
Nov 19, 2001
4,545
0
0
A GPU is much more specialized than just a few functions. Compare a GPU pipeline to a CPU's. Typically, GPUs use much more parallelism, and the pipelines are much longer: on the GeForce3, for example, it typically takes 800 clocks to pass through a single pipeline, compared to 10-20 for CPUs of its generation.

GPUs do use floating point, and they're usually pretty good at it. In fact, I believe nVidia has software that allows the Quadro to perform floating point calculations, among other things, as a secondary processor in a system. However, there are cheaper, more effective specialized processors that could be used for this purpose in things like render farms.

Again, don't forget that a GPU is usually much cheaper than a CPU. A midrange GPU might cost $50-70 (the GPU itself, not a graphics card) while a midrange CPU will set you back $200 or so.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I'm going to try and keep this simple and will focus on one major element to give a general illustration. I'm not trying to be comprehensive, as you could pen a book on this subject.

To simplify things, we are going to say the Cell has eight SPEs (makes it simple), which is certainly a possible configuration (actually it is how they are fabbed, with one SPE disabled).

For the calculations a GPU handles, Cell is an absolute monster overall: it can handle them at a speed that would easily make it comparable to a current GPU. But in practice we run into a major issue with how the chip is laid out.

Looking at moving a single pixel through the pipeline, we have to factor in the number of samples that pixel is going to need for final rasterization. With basic bilinear filtering that number is four; trilinear is eight, with 2x AF upping it to 16, and so on until we hit 16x AF with 128 samples per trilinear filtered pixel.
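
The sample counts above can be sketched in a few lines of Python. The helper name and the rule "trilinear count times AF level" are assumptions picked only to reproduce the figures quoted here (4, 8, 16, 128); they are not a description of how any particular GPU counts its texture taps.

```python
def samples_per_pixel(trilinear: bool = False, aniso: int = 1) -> int:
    """Texture samples needed for one filtered pixel (hypothetical helper)."""
    if aniso < 1:
        raise ValueError("aniso must be >= 1")
    # Bilinear reads a 2x2 footprint; trilinear does that on two mip levels.
    base = 8 if trilinear else 4
    # Assumption: each anisotropic level multiplies the sample count.
    return base * aniso


for label, kwargs in [
    ("bilinear", {}),
    ("trilinear", {"trilinear": True}),
    ("trilinear + 2x AF", {"trilinear": True, "aniso": 2}),
    ("trilinear + 16x AF", {"trilinear": True, "aniso": 16}),
]:
    print(f"{label}: {samples_per_pixel(**kwargs)} samples")
```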

GPUs have hardware dedicated to grabbing these samples and blending them in the pipeline on the fly. They do this with hardly a bump in performance even when looking at 128 samples per pixel. There may be a ~10% overhead, give or take, but particularly at the lower sampling levels there is no performance hit at all.
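
To make that cost concrete, here is a minimal software sketch of a single bilinear lookup, the kind of per-sample work a chip without dedicated texture units would have to do in ordinary code. The function name, the single-channel list-of-rows texture and the clamp-to-edge addressing are all assumptions for illustration, not how any GPU or the Cell actually does it.

```python
import math


def bilinear_sample(texture, u, v):
    """Fetch the four nearest texels and blend them by distance.

    texture is a list of rows of floats (one channel); u and v are in
    texel units. Out-of-range coordinates clamp to the edge (assumed).
    """
    height, width = len(texture), len(texture[0])
    x0 = min(max(int(math.floor(u)), 0), width - 1)
    y0 = min(max(int(math.floor(v)), 0), height - 1)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    # Fractional position inside the 2x2 footprint, clamped to [0, 1].
    fx = min(max(u - x0, 0.0), 1.0)
    fy = min(max(v - y0, 0.0), 1.0)
    # Four fetches, then three linear blends.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0.0, 1.0],
       [2.0, 3.0]]
print(bilinear_sample(tex, 0.5, 0.5))
```

Trilinear would repeat this on a second mip level and blend the two results, which is where the eight samples come from.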

With Cell, each pixel would have to be handled once for every sample and then again for the blending phase. For basic trilinear filtering you would need nine passes through an SPE, or you would need to consume all eight SPEs for a single pixel and then hand it off to the general purpose unit for blending operations. For basic trilinear filtering the monster power of Cell would be reduced to a 3.2 GPixel part, nothing in comparison to today's 7 GPixel and higher offerings. But it gets worse: if you were to compare Cell to a modern GPU when running 16x AF, then Cell falls all the way down to a pathetic 200 MPixel part, placing it behind the TNT1 in terms of raw fill (although with 16x AF, which the TNT couldn't handle).
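
One back-of-the-envelope way to arrive at figures like these, sketched in Python. The 3.2 GHz clock and the "one sample per SPE per clock" throughput are assumptions that are not stated in the post; they are used only because they reproduce the quoted numbers. The extra blending pass is ignored.

```python
SPE_COUNT = 8        # from the post: eight SPEs, to keep it simple
CLOCK_HZ = 3.2e9     # assumption: 3.2 GHz clock, not stated in the post


def fill_rate(samples_per_pixel: int) -> float:
    """Pixels per second if each SPE retires one sample per clock (assumed)."""
    samples_per_second = SPE_COUNT * CLOCK_HZ
    return samples_per_second / samples_per_pixel


print(f"trilinear (8 samples):  {fill_rate(8) / 1e9:.1f} GPixel/s")
print(f"16x AF (128 samples):   {fill_rate(128) / 1e6:.0f} MPixel/s")
```

Under those assumptions the model gives 3.2 GPixel/s for trilinear and 200 MPixel/s at 16x AF, in line with the numbers above.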

So there you can see why Cell falls down flat as a rasterizer. They could rectify this by adding more transistors to the design to compensate, but that would be a complete waste of die space elsewhere. Also, these figures are based on Cell only handling basic rasterization, forgetting completely about all of the other functionality a GPU must handle. Pixel shaders bring about an entirely different nightmare performance-wise, and one that would be cumulative on top of the horrible performance hit you are taking from basic rasterization.

One area where Cell is particularly potent is in Vertex Shader ops. Here it can easily go toe to toe with any GPU we have seen, although whether this will be utilized or whether they are planning on including the VS units in the RSX isn't clear at this point in time.
 