Suggestions for a very good 3D rendering machine

Polishwonder74 · Jul 24, 2007

Howdy, fellas!

With the advent of 2 and 4-core CPUs, I am wondering: Would it be worth it to sell a kidney and go for a dual-quad-core (8 cores total) setup?

I've been giving a lot of thought to spending some serious cash on a new system so that I don't have to wait 30+ hours for some of my animations to render. I'm still running on an old AthlonXP 2500 for my workhorse, and I steal a little CPU time at work on a couple Core2duo machines from time to time, so I think it's probably time for an upgrade soon.

I figure a good place to start would be a Anandtech's monthly guides, but does anyone know for sure that rendering time goes down that significantly as you add more and more cores? Like most thinks, I figure there must be some level of diminished return as far as rendering power vs. money.

This thread over at Blenderartists got me thinking, so I checked with the good peoples at Dell, and an 8-core monstrosity over there goes for like $4,000. That leads me to believe that I should be able to put myself together a system pretty similar to it for well under $2,000.

Do you guys have any experience with grossly-overpowered rendering boxes like this, or seen any articles with some rendering benchmarks lately?

:beer:

undeclared · Jul 24, 2007

4-cores and 8 gigs of ram is very budgetable..

8 cores is a lot more expensive (2 grand for CPUs and a few hundred more for the mobo)
8 gigs of ram is about ~$500

I suggest you go the route of that, and a video card made for this stuff.

CPU: Q6600 (Various prices lead it to be around $290)
Mobo: Any PCI-Express X16 supporting IMO
Video: FireGL or Quadro FX series... Anywhere from $1,000 to $2,500 and SLI is possible
Ram: 8GB DDR2-800 http://www.newegg.com/Product/...x?Item=N82E16820227195 (x2) ($459.98)

Note: You could also get 8gb ddr2-1066 for more than double the price, and with a P35 chipset mobo, it might actually be able to handle the b/w assuming it helps in this kind of thing. I don't suggest getting DDR3 as the latency/price make it not worth the price yet.

Polishwonder74 · Jul 24, 2007

Wow, that's a hell of a graphics card. I've been getting by pretty well with a crusty, old, AGP Radeon 9800.

This is exciting stuff, it looks like this could be a surprisingly reasonably-priced venture!

undeclared · Jul 24, 2007

For CAD you really want preferably a CAD card..

Unless you want a gaming card =/

oynaz · Jul 24, 2007

The OP wants a machine for rendering 3D animations. Not CAD, not gaming, and definitely has no need for a FireGL or Quadro FX! He needs the power for final rendering, that is CPU power.

I agree, Blender does fine with a Radeon 9800, but you might as well upgrade a bit when getting your new rig. Something in the region of a 7600GT or X1950 should suit you fine.

I cannot really help you with the CPU part, as I do not know how well Blender utilizes multiple cores.

imported_Tick · Jul 24, 2007

Blender claims to be fully threaded, so 8 cores should work. If you really want to go all out, how bout:

X7DBE
4x 4GB FB-dimms
2x E5345

Also, does blender use a scratch disk? Do you have a dedicated disk for scratch, as sometimes, this can make a huge difference.

In the end, it's about how much your time is worth to you. The above is the best of the best for workstation use. I will say this, though. I've seen much better preformance/dollar out of renderfarms rather than super high end workstations. Especially if you can buy recycled pc's for the farm. Or first gen Xbox's. That may be a much better way to go.

jpeyton · Jul 24, 2007

Although you could make use of 8 cores, the cost goes way up if you want that luxury. The cost/performance ratio of a desktop Q6600 machine is a lot better, even without overclocking, but especially with a mild overclock like 3GHz.

Polishwonder74 · Jul 24, 2007

Originally posted by: oynaz
I cannot really help you with the CPU part, as I do not know how well Blender utilizes multiple cores.

That is kind of the crux of my question. I am trying to figure out if it's worth it to go for 8 cores, or just go for one quad core.

Fullmetal Chocobo · Jul 24, 2007

Originally posted by: oynaz
The OP wants a machine for rendering 3D animations. Not CAD, not gaming, and definitely has no need for a FireGL or Quadro FX! He needs the power for final rendering, that is CPU power.

I agree, Blender does fine with a Radeon 9800, but you might as well upgrade a bit when getting your new rig. Something in the region of a 7600GT or X1950 should suit you fine.

I cannot really help you with the CPU part, as I do not know how well Blender utilizes multiple cores.

Have you ever actually set up a scene in a 3d rendering program? Yes, the CPU power does benefit you in rendering, but the video card is what aids you in the setting up of the scene. And trust me, it helps a lot. Going from a Quadro 4 980 XGL to a 6800 severly limited how quickly I could setup a scene in 3D Studio Max 6.

I did scene rendering (as a hobby) on a P3 all the way through a dual Xeon system. If you are doing this professionally, and time is money, then the cost of an 8 core system very well could be justified. If this is for a hobby, save the money, let it take a bit longer, and go with quad core. But by no means discount getting a Quadro card.

Hmmm. Blender doesn't have any certified drivers on the Quadro. I may stand corrected if the program doesn't take advantage of the video hardware.

heyheybooboo · Jul 24, 2007

I'm with Tick . . .

Build a $500 X2/C2D rig and research a Blender render farm. Make sure your new mobo has dual gigalan. Network your current rig into the render farm. The longer you hold out the better the price/performance of your builds. Stay at least one year behind new releases.

As you add new rigs salvage what you can and cycle out the dinosaurs.

Think how cheap those dual/quad core rigs are going to be in 12-18 months!

Polishwonder74 · Jul 24, 2007

Maybe we can swindle Anand into doing an article that benchmarks a few different setups in Blender, Yafray, etc.

There is an un-official benchmark going that relies on the honesty of fanboys that I found here:

http://www.eofw.org/bench/

But I would love to see an article written by the good peoples of Anandtech. Now that this most recent crop of CPUs can give you a sub-$1,500 rendering workhorse, it would seem pretty relevant.

How efficient is the render-farm approach? I have a couple old dinosaurs laying around, would I need to do like a linux/Dr Queue sort of thing? I'm kind of skeptical of that approach after seeing how badly a Core2duo laptop destroys my Barton 2500 in both rendering and baking physics animations. I've also never tried it, so for all I know you're absolutely right.

:beer:

Polishwonder74 · Jul 25, 2007

I found another good thread to add to our confusion here:

http://blenderartists.org/forum/showthread.php?t=69557

An argument is made that increasing RAM is overall more influential than increasing CPU power.

oynaz · Jul 26, 2007

Originally posted by: Fullmetal Chocobo

Originally posted by: oynaz
The OP wants a machine for rendering 3D animations. Not CAD, not gaming, and definitely has no need for a FireGL or Quadro FX! He needs the power for final rendering, that is CPU power.

I agree, Blender does fine with a Radeon 9800, but you might as well upgrade a bit when getting your new rig. Something in the region of a 7600GT or X1950 should suit you fine.

I cannot really help you with the CPU part, as I do not know how well Blender utilizes multiple cores.

Click to expand...

Have you ever actually set up a scene in a 3d rendering program? Yes, the CPU power does benefit you in rendering, but the video card is what aids you in the setting up of the scene. And trust me, it helps a lot. Going from a Quadro 4 980 XGL to a 6800 severly limited how quickly I could setup a scene in 3D Studio Max 6.

I did scene rendering (as a hobby) on a P3 all the way through a dual Xeon system. If you are doing this professionally, and time is money, then the cost of an 8 core system very well could be justified. If this is for a hobby, save the money, let it take a bit longer, and go with quad core. But by no means discount getting a Quadro card.

Hmmm. Blender doesn't have any certified drivers on the Quadro. I may stand corrected if the program doesn't take advantage of the video hardware.

You stand corrected, then ;-)
I am nothing but a dabbler in 3D design, but Blender behaves quite differently from 3D studio Max, and does not require as much GPU power when setting up a scene.
If cost is not much of an issue, though, you are right. A very good GPU does make a small difference, and might be worthwhile, especially if the rig needs to be used professionally.

Tristor · Jul 26, 2007

I've played around doing povray and blender on high-end workstations and on renderfarms before. I have to say that in my experiences, the renderfarm approach works a LOT better and it much cheaper than doing a high-end workstation. For the $3.5k you are going to spend building a monster workstation yourself, you could probably go to a local asset recovery places that buys used systems from businesses and get as many SFF PCs or slim towers as you can, cluster them together using Dyne:Bolic.

What you do, is you setup a "master system" (I usually choose the biggest/baddest system I have available, needs 2GB of RAM to work well, no need to even have a HDD except for scratch storage) and boot from the Dyne:Bolic LiveCD. And then you go to each other computer system and boot it using netboot/PXE, where it will go across the network, grab a DHCP address from the "master system", and then pull across a disk image to store in memory (which requires at least 256MB per client to work right, fyi) and boot from that. Each system will be running a consoled version of Linux which OpenMOSIX running, which will coordinate everything for you. After all systems are booted, there is a utility on the "master system" to check to see how many CPUs and RAM is available across the cluster, and what the load is. Then you just start up Blender and get to work. The nicest thing about this is that the clients don't need HDDs, and if any of them are multicored it justs works out even better.

I was able to build a respectable render farm to play around with for $900, using the desktop I already have as a "master system". I spent $15 a pop for the clients, got them in slim desktop cases with P3 750s, 256MB of RAM, and onboard NIC with no HDD each. You can do this headlessly too, which is nice. Just stack them in the corner, hook them all up, and if you start the systems up slow you can even out the power load. I ran a 60 node cluster split between two 20amp circuits in my bedroom with no problem.

For what you are doing, you should look for early P4 systems, Northwoods and Willamettes, with 256MB of RAM each, look for cheapo compaq/hp desktops. Or if you can find them, P3 1GHz boxes, as they are even more power efficient and have a lower heat output. Your concern isn't raw proc power in the clients, as you are going to have the aggregate power of hundreds of them, you want to be concerned with their peak power usage and their heat output, so as to lower your TCO/cost of running the farm. The nicest thing about a system like this is that OpenMOSIX is extremely fault tolerant, so if a system drops out it's not the end of the world, it just passes its threads to another node. So as you are finishing up your rendering, you can start to slowly shut down each of the systems (just power off, they don't store anything important on them, as they are diskless), finally leaving your finished product and your "master system" as the last one running, and just save it, and you are done. You can use the "master system" without the rest of the cluster too, just fine, so there's no danger at all of problems occuring due to breakers popping or overheat shutdowns on the clients.

Really, you could build something that would spank Blender's monkey for about $2k, and make a dual 2.66 clovertown workstation look like a joke. Also, just doing exactly what I did has more than 8GB of aggregate RAM, and since the RAM is usable across the cluster as a shared resource, it's way way better than that high-end workstation idea. (256*60 = 15GB of RAM + 2GB in my "master system", for 17GB of RAM available to the cluster)
If you've got a large room with serious cooling and decent power (6 20amp circuits and at least 10k btus of cooling) you can probably set up cheapo shelving, use rackmounted UPS and power strips, and just go buy a pallet load of used business desktops from the local place and come out under $2k and still win on the deal. I'm going to call the place I bought my stuff from later today and see what they have in stock by the pallet load, I bet I can find systems that will come out to beat it and be under your budget.

EDIT2:

Oh, and as another option, you could build your bad-ass workstation now, and upgrade 4 years down the road by doing the cluster and buying up tons of cheap C2D desktops to make up your clients. I could see something like that really rocking, just doing 30 nodes would give you roughly 4x the computing power I had with my 60, twice as much RAM (most C2D systems don't have less than a gig in them), and lower power consumption and heat output.

EDIT3:

Also, just FYI, nodes in an OpenMOSIX cluster don't have to be identical like they do in traditional clustering. They don't even have to be of the same architecture. You could just as easily throw 8 iMacs, 2 SparcStations, 5 P3 750 desktops, 9 dual Xeon MP 1GHz workstations, 2 PS3s, an Xbox, a hacked PS2, a hacked Gamecube, a 16 proc AlphaServer, a VAX, 2 iPaq PocketPCs, and a PalmPilot together with OpenMOSIX and make a cluster. The only requirement OpenMOSIX has is that each system is able to get an IP via DHCP. Of course, it complicates things with netbooting if you are mixing archs, since you have to have differently compiled software for each architecture (and may not be able to netboot it at all), but it is workable. So, my point is, throw together as many cheap systems as you have the power and cooling for without regard for architecture, cluster them, and go for it. Personally, I think for Blender it might even be better to go with old iMacs hacked to disable the monitor, since PPC is absolutely wonderful for cpu intensive operations.

EDIT4:

The only architecture issue you will run into is that the fastest results will come from using a unified architecture across the cluster with optimized code. So, your best performance option is having clusters that all match CPU wise at least with your "master system" and using all optimizations possible for that arch (like with P4s you have access to MMX, SSE, SSE2, and SSE3, and with later ones EM64T, which is 64bit instruction set identical to AMD64). The default builds of Dyne:Bolic are i386 with no optimizations for extra instructions sets (AFAIK). Also, the newest version of Dyne:Bolic removed the OpenMOSIX system, but you can make a custom LiveCD using the Dyne system to add OpenMOSIX back in (I don't know why they removed it from the default build).

I don't really think you'll see major performance differences between using SSE2 optimized code in a mixed SSE2/SSE3 environment, vs doing SSE3 optimization, but it is something worth considering.

Oh, and you can use other OpenMOSIX based liveCDs to make the cluster, and only use an OpenMOSIX enabled Dyne:Bolic on your "master system" or do an install of linux on the "master system" with OpenMOSIX installed that is customized to your needs. It is not required to run the "master system" disklessly or off a LiveCD or even any of the nodes. It is possible (although more expensive) to run an install of Linux with OpenMOSIX on every node, instead of netbooting. When I did my cluster I actually used ClusterKnoppix to do the DHCP, and netbooting, and then ran Slackware w/ OpenMOSIX on my desktop.

MichaelD · Jul 26, 2007

I nominate Tristor's post for "Most Informative Post Of The Year." :thumbsup:

Nice info on render farms! I've always wondered how they did that.

The problem w/farms is the HEAT and power consumption. I had a small farm (7 boxes) running W2K doing Folding@Home. I did a lot of WU's per day, but the problem was the boxes heated the room up like a stove.

Tristor · Jul 26, 2007

Originally posted by: MichaelD
I nominate Tristor's post for "Most Informative Post Of The Year." :thumbsup:

Nice info on render farms! I've always wondered how they did that.

The problem w/farms is the HEAT and power consumption. I had a small farm (7 boxes) running W2K doing Folding@Home. I did a lot of WU's per day, but the problem was the boxes heated the room up like a stove.

Yeah, I actually think when I had it running I spent more power keeping the room cool than I did running the farm. A 60 node cluster puts off close to 8k BTUs of heat a day, which is quite a bit for having sitting in my bedroom. That's the main thing that made me stop running the cluster, because otherwise I'd probably have it doing F@H now.

wwswimming · Jul 26, 2007

thanks for the description of the render farm.

what are the details on power consumption, the electricity bill ?

Tristor · Jul 26, 2007

Originally posted by: wwswimming

thanks for the description of the render farm.

what are the details on power consumption, the electricity bill ?

I don't have anything available to show offhand, but it's a safe bet to figure wattage per system for this application the following way:

((CPU Wattage + 30W) * opposite of PSU Efficiency rating) + CPU Wattage + 30W) = system power draw

When I did it the procs in all my systems were SL3VN coded P3 750s which have a rated TDP of 19.5W each. The PSUs contained in the systems were 150W PSUs with a 72% efficiency rating. So I can calculate it as ((19.5 + 30) * .28) + 19.5 + 30 = system power draw, which is 63.36W, rounded up to 65W per system. I had 60 nodes, so that comes out as 3.9kW of power draw for the cluster.

That leaves some room for fluctuations, so my power draw usually was probably closer to 3.5kW for the cluster but may have gotten as high as 4kW, depending on load and ambient temperature in the room (which can greatly affect power efficiency in electronics on a mass scale).

Just fyi, 3.9kW at 120VAC comes down to 32.5 Amperes of current draw, so the split between two 20amp circuits was just adequate.

Polishwonder74 · Jul 26, 2007

Tristor. . . You are the man. The MAN!!!

And I am DEFINITELY going to resurrect all of my old pieces of crap and experiment with this. If it works, my next step will be finding a local place that sells bulk refurbished computers.

I have a friend in aerospace engineering that does a lot of CFD, this just might help him out, too!

Thanks for taking the time to put together such a well-written and complete article. That was INSANELY helpful!!

Tristor · Jul 27, 2007

Originally posted by: Polishwonder74
Tristor. . . You are the man. The MAN!!!

And I am DEFINITELY going to resurrect all of my old pieces of crap and experiment with this. If it works, my next step will be finding a local place that sells bulk refurbished computers.

I have a friend in aerospace engineering that does a lot of CFD, this just might help him out, too!

Thanks for taking the time to put together such a well-written and complete article. That was INSANELY helpful!!

When you get around to doing it, if you need any help just PM me. I may or may not be able to answer the questions you have, but I'll give it a shot.

Also, for the funny, if you want to go with all new systems and spend a bit more for SSE3 hardware you can get Acer Q6600 system for $750 at CompUSA right now. Getting 4 of those and a cheap 8 port gigabit switch to put everything on would run you about $3k and give you 16 cpus with 8GB of aggregate RAM. I don't really recommend this though since you can still do way better with cheapo systems, but it would be kinda funny. Especially since you could take the 500GB drive out of 3 of them and stick them in the one system, and use that as your "master system", install linux on it, and netboot the rest (assuming they netboot, which they might not considering they are cheapo acers with onboard ethernet)

Polishwonder74 · Jul 27, 2007

I'd say that's probably a pretty good option, though. Think of the savings in heat/electricity.

Also, I was on the toilet yesterday thinking about this: I figure since you're distributing threads to a large bank of slower computers, there must be a critical number of computers where the delay that comes from sending a thread from the memory of the master computer through the NIC, to the network switch, to the NIC of the slave and into its CPU (instead of from a normal computer's RAM to its CPU) breaks even and subsequently becomes faster than just a normal computer. Have you seen this? Am I just being ridiculous since the extra computing power is significantly higher enough that the extra latency becomes negligible?

While I'm at it, you mentioned that I'll need to dig up an old distro. Is it still in existence on the internets anywhere?

BTW, I posted a link to this thread over at Blenderartists to see what I can do to get the word out. Your work could be to the rendering world what Napster and Kazaa were to the music pirating industry, how about that?? Maybe sometime soon farming work to multiple older computers will be so easy that grandma could do it. That outta piss Al Gore off, and I'll be buying stock in power companies

:beer:

Polishwonder74 · Jul 27, 2007

Holy hell, I just read a quick n' dirty article about how to set up a real Frankenstein-style render farm. The Live CD method seems waaaaaaay better. You could pull a computer out of a dumpster, plug in power and ethernet, throw in the Live CD and have 1 more node.

Article here:
Build Your Own Render Farm

Tristor · Jul 28, 2007

Nice article, btw. Basically, as far as the latency issue goes, I can't really say for sure on the numbers, but there is a point of diminishing returns when you are dealing with SMP, even if you are doing multiple cores/procs in a supercomputer type situation. Whether you are talking about a cluster of Cray X1s or a cluster of G4s like the Big Mac, everything is determined mathematically using Amdahl's Law. At some point, the gains given by adding more cpus to the cluster is less than the costs in splitting a process into another thread and/or the latency of transferring data between systems on the cluster.

You are correct in that there is a latency issue caused in clustering, which is even more profound in grid and distributed computing, but the way that OpenMOSIX implements threading helps to reduce the latency issue on local networks. Obviously, the best circumstances call for all nodes on a cluster to share a LAN segment, be physically near each other, and be on a high-bandwidth, low-latency connection. The best you could possibly do is 256 nodes on a 10gbit fiber LAN. Of course, that's not likely, and the differences between doing that and the same 256 nodes with 1gbit ethernet are going to be slim.

For your purposes, latency will essentially be a non-issue. I'd be more concerned about the efficiency of Blender's multi-threaded code, and how much of its code is parallelized. If you can calculate that in a meaningful way, you can use Amdahl's Law to find the optimal number of cpus in a cluster, as well as the maximum you could put in a cluster and see a worthwhile return. In real terms, you will probably never see an instance where consumer code in the wild (anything used for rendering would qualify here) would ever make a worthwhile use of over 256 cpus (which is the maximum that any one system can have on it, or any one cluster can have nodes, unless you start clustering clusters, which gets to be ridiculous except in highly-parallelized applications, mainly scientific).

Also realize, that for the purposes of most people doing this, they should be more concerned about the real ROI of the system, versus it's mathematical returns in performance. Even if you would gain by adding more nodes, if the cost increase is large enough it won't be justifiable. The best you can possibly do cost wise is have total cost of acquisition that is essentially $0, that ends up with you acquiring low-power systems which won't have a large draw or put off much heat, which lowers your TCO enough that if you make any money at all through your work you will be making a profit.

So, I guess for the too long; didn't read crowd, the answer is latency doesn't matter, and you will have a real financial incentive to limit your number of nodes long before you hit a point of diminishing returns.

Oh, and on the other side of the curve, yes there is a point where you have to have X number of nodes before you will outperform say an dual 4-core higher speed system. Usually you can figure on being able to base things on clock speed and number of total cpus/cores as a comparison, but since the gains from SMP are not linear, and the gains from clustering are even less so, that is no guarantee. Assuming you are using 1GHz P3s, I'd say you'd need to have at least 20 nodes to match a dual Xeon x3220 system (which is 8 2.4GHz cores). The real advantage to the render farm approach is a reduction in your total cost of acquisition, even if it costs more the run (which it usually does somewhat). With 1GHz P3s, if you are lucky you can find a steady supply of these which are free, just by dumpster diving, or get them rock-bottom at asset recovery stores by the pallet-full.

EDIT:

By the way, I am going to be doing a new clustering project soon if I can get the fundage. I've gotten back into being interested in video encoding, especially for HD stuff, and I'm working towards building the storage system and htpc necessary to take advantage of HD. Eventually I'm going to be building a small cluster (96 nodes) to do my encoding on, since x264 is highly-parallelized and extremely compute intensive.

EDIT2:

I realized I never answered your question about the older distro. I was referring to the old version of dyne:bolic. Really, on second consideration I'd say it'd even be better to just install whatever distro you are comfortable with on your "master system", set up OpenMOSIX there, and use whatever the latest release of ClusterKnoppix is on the rest of the systems. As far as the LiveCD approach goes, you don't necessarily have to put a CD in every node, all OpenMOSIX using LiveCDs support DHCP/netboot with PXE, so you can just put the LiveCD in one system and boot the rest over the network off of it disklessly, so that you don't need any drives in them at all. Obviously, you can just boot off the CD on every node if they don't support netboot for some reason, but it may complicate matters with trying to netboot other nodes as more than one DHCPd will be responding to queries.

EDIT3:

On another note, I have to say that Wikipedia is an absolutely excellent resource on Parallel Computing and especially on HPCC (High-Performance Clustered Computing). If you have a chance, read through the entirety of all the articles in the Parallel Computing category, starting with the articles on Amdahl's Law and Scalability. It is well worth your time. Also, I hope you enjoy applying math to solving problems, because that's what it's all about and half the explanations are mathematical equations (but if you understand them they explain well).

EDIT4:

Also, I wanted to point out that PXE/netboot is not all rainbows and sunshines. In fact, it's a major PITA to get to work if you don't have decent 3com or Intel NICs in every client system/node. Before you get started on this journey, you should read up about PXE and look at the EtherBoot Project. You will invariably end up doing one of 5 things to get PXE to work:

1. Trying to pick up some cheap 3Com 3Com905B-TX or Intel PRO/100 NICs (when I did my project I found the 905B-TXs for $1 a pop at the dollar computer shop). Even then, they may not have a bootrom on them, which means you will need a floppy-assisted netboot using EtherBoot

2. Buying the cheapest Realtek RL8139 based NIC you can find in bulk and flashing your own ROMs (which requires a rom burner) with Etherboot

3. Praying to the network gods that the integrated NIC in the motherboard supports netboot/PXE and/or you can rebuild your motherboard's BIOS with EtherBoot inside it, and get it to successfully flash

4. Doing one of the above and then having to deal with the nightmare that is BootP if for some reason it does support PXE but not DHCP

5. Doing one of the above and ending up still having to use a floppy-assist, even with a bootrom because the bootrom sucks

I had two very different experiences when I did an OpenMOSIX cluster. When I did one at school for a class project to accompany a presentation I gave in class about HPCCs, I had the easiest experience I've ever had. It was literally as simple as booting the "master system" off the LiveCD and going around to each client to change netboot to the top of their boot order in the BIOS and saving the settings. When I did the 60 node cluster at home, I ended up doing a floppy-assisted boot on each system since I was able to pick up the floppy drives and NICs for $1 each. Not only having to deal with some troubles because of defective or finicky used hardware, I also had to deal with floppies, which angers me just by their very existence (they were horrible when they were mainstream, now using a floppy is insulting with the advent of cheap flash memory and usb)

I'm not attempting to dissuade by any means, but don't expect it to be a breeze. It will be definitely something that can be a good learning experience and also fun, but you /will/ invariably encounter problems if not using new hardware, and even then you will either encounter problems or have to go to greater initial expense.

Also, invest in GOOD switches: Intel, HP, Cisco, and Nortel all make very good switches you can get for cheap. I highly recommend HP managed switches, as they have an excellent featureset, have wonderful throughput, and are some of the most easily managed switches I've ever dealt with. If you are a Cisco person, Cisco switches are great. I am not, however, a Cisco person, so I prefer to stick to Intel and HP switches. Don't buy random cheap 24 porters off ebay or newegg or wherever, get something decent and take the time to research before you buy. If you have questions or want suggestions, please do PM me. Your switch will make a big difference in how well your cluster performs as crap switches will end up getting bad throughput increasing latency to a noticeable level.

EDIT5:

In relation to wanting to do another cluster myself, I went and looked for used hardware and cheap new hardware. It turns out that going for used P3 systems is probably the best bet, because they have a significantly lower TDP than even the cheapest new procs (AMD Semprons). Unfortunately, there is nothing in the proper price-range on ebay right now, it looks like people are generally asking too much for the Optiplex GX150s that are common on ebay. If you can pick them up for less than $20 each with no HDD, that's probably your best bet as they are 1GHz P3 systems in SFF cases. Check locally first. When buying used hardware the opposite of new is almost always true, it's cheaper to get them local than to buy online.

It may even be worth it for performance reasons to just trying to find shell boxes of them, with no RAM, proc, or hdd, and buying the procs used separately. It looks like the SL6QU revision is one of the last P3 1GHz produced, and has a TDP of 12.1W with 512kb of cache and a 133MHz bus (which is better than what I was using beforehand). The proc notes say it was intended for server usage at the time, but perhaps you can find some that will be compatible with the workstation boards used in things like the Optiplex series. Really it's all trial and error, but the end result is the lower the TDP of the procs you use are, the lower you daily cost for running the cluster is, and the lower the cost of acquiring the individual nodes, the more nodes you can put in the cluster within your budget.

Your best friends right now are Intel Processor Spec Finder and AMD Compare. It seems like P3s are winners on TDP but I'm trying to find more info about legacy AMD procs. AMDCompare doesn't list anything older than the socket 754 64bit stuff.

If you want to go the shell box route, look for ebay auctions like this: http://cgi.ebay.com/ws/eBayISA...Item&item=250125573254 In fact, you might msg that guy and see if he has any available still (that was sold July 16th). I only see him listing 3 lots right now, and they are slower P3s that are as-is with known to be pent or missing pins meant for gold recovery. I doubt you'd want to put in the work necessary to repair those, so ask about guaranteed working lots like that. Cheap way to get the procs.

Tristor · Jul 28, 2007

I found the thread on BlenderArtists and one of the posters there brought up a good point in that queue managers are usually more efficient than using something like OpenMOSIX because of the nature of rendering animations (as I said I just played around with Blender and povray on the cluster, I am not an accomplished artist and I built the cluster mostly for a learning experience). So, I would say you might consider following his advice and looking at things like Dr. Queue and maybe even something lightweight (which you will have to adapt to your purpose) like rubyqueue.

OpenMOSIX does do process migration as opposed to thread migration by implementing a customized way of forking a process across the cluster. From my experience, multi-threading apps will have their threads spread across the cluster as well, as logically each cpu in each node is treated as a single cpu in a single system (being the master system) by software on that system. The latest version of Blender is now threaded, while the version included with ClusterKnoppix and Dyne:Bolic is not. Using a newer version of Blender paired with OpenMOSIX on the "master system" may possibly work just as well as using a queue manager. When I tried it out I used the version of Blender and Povray included, which did not support threading. YMMV

My reason for initially suggesting OpenMOSIX is due to my success with using it, and the fact that is eases setup of a cluster versus having to have a disked install on every node with a queue manager distributing work since OpenMOSIX is a Single System Image system. If a queue manager is indeed more efficient (which it seems it perhaps is for rendering purposes), then you may have to go to slightly more work to setup the cluster, and it may make it difficult for you to run it as a diskless environment (but should still be possible with some extra work and adaption on your part).

EDIT:

Shinobu posted on extremetech about his experiences using renderfarms in a professional environment, which he says are less than usable if they aren't high end systems. Considering that I was just playing around, I didn't really hit any bottlenecks or have problems and most of my animations were small. It seems that according to him if you are doing something like this with animations that require lots of lighting effects and what-not, that you are better off using high-end systems as the nodes or using a high-end workstation. You should probably take this into account.

This guy runs a rendering farm business, and says that Blender using yafray gets better results with low-end farms than other applications do. So, perhaps you are fine. I'm not sure, since I wasn't doing any hardcore rendering when I was playing around. I'm an IT geek that enjoys the hardware and network aspects of clustering, but the unique requirements of rendering are a bit beyond me, I'm thinking, upon further reading. My experiences may have been atypical due to me not as strenuously using the system as most professional renders would.

EDIT2:

I've been reading through a ton of docs about Blender and apparently its multithreading support is limited. Blender only supports 8 cpus/cores. So, the only way to harness more is going to be using a queue manager. However, with that few supported, it would almost be better to go ahead and spend the money on a serious high-end workstation, although it will be a bit beyond your budget of $2k. It'd be trivial to do an 8 node cluster of higher-end systems though, something like s939 Athlon64 Venice 3400+s. You could even manage it easily within budget, I would think. For 8 nodes, I'd really go with new hardware. Let me see if I can come up with a configuration that's going to be workable using a queue manager for blendering.

EDIT3:

I realized what I said in edit2 might not have been clear. To clarify, Blender only supports 8 cores at most on the system it is on, but when you use a queue manager in your farm instead of OpenMOSIX, each node is running it's own system and own instance of Blender, so you can have up to 8-cores per node with a queue manager like Dr. Queue (which looks perfect for your task and even includes Blender support). But it looks like it might be more productive to a smaller cluster with higher-end equipment, since each node has to render a full single frame at least. I was thinking cheap core2duo boxes, or even cheaper Sempron SFF boxes as nodes.

It looks like you could build a cluster of 1.86GHz C2D nodes for $464 a pop, and Sempron64 1.6GHz nodes for $311 a pop. Really, for doing the queue manager idea, it might be worth it to go this route. for $2k you could do 4 C2D nodes (which totals 8 cores) and then use your current system to be the "master", set up scenes, etc.

What's outside of that budget though is whether you do the queue manager or a traditional style of computer cluster, you are going to need some sort of network storage for the nodes to use. I'd recommend you get something like a Buffalo Linkstation or a ReadyNAS.

lxskllr · Jul 28, 2007

Originally posted by: Tristor
I also had to deal with floppies, which angers me just by their very existence (they were horrible when they were mainstream, now using a floppy is insulting with the advent of cheap flash memory and usb)

:thumbsup: It's good to hear about other people who hate floppies as much as I do. Stupid slow, grinding, data losing POSs

Thanks for your detailed, informative posts :thumbsup: It isn't much use to me, but I find this very interesting.

Suggestions for a very good 3D rendering machine

Senior member

Senior member

Senior member

Senior member

Platinum Member

Diamond Member

Moderator in SFF, Notebooks, Pre-Built/Barebones

Senior member

Moderator<br>Distributed Computing

Diamond Member

Senior member

Senior member

Platinum Member

Senior member

Lifer

Senior member

Banned

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

No Lifer