Nice article, btw. Basically, as far as the latency issue goes, I can't say for sure on the numbers, but there is a point of diminishing returns when you are dealing with SMP, even if you are doing multiple cores/procs in a supercomputer-type situation. Whether you are talking about a cluster of Cray X1s or a cluster of G4s like the Big Mac, everything is governed mathematically by Amdahl's Law. At some point, the gains from adding more cpus to the cluster are less than the costs of splitting a process into another thread and/or the latency of transferring data between systems on the cluster.
You are correct that there is a latency issue caused by clustering, which is even more pronounced in grid and distributed computing, but the way that OpenMOSIX implements threading helps to reduce the latency issue on local networks. Obviously, the best circumstances call for all nodes in a cluster to share a LAN segment, be physically near each other, and be on a high-bandwidth, low-latency connection. The best you could possibly do is 256 nodes on a 10Gbit fiber LAN. Of course, that's not likely, and the difference between doing that and the same 256 nodes on 1Gbit Ethernet is going to be slim.
For your purposes, latency will essentially be a non-issue. I'd be more concerned about the efficiency of Blender's multi-threaded code, and how much of its code is parallelized. If you can calculate that in a meaningful way, you can use Amdahl's Law to find the optimal number of cpus in a cluster, as well as the maximum you could put in a cluster and still see a worthwhile return. In real terms, you will probably never see an instance where consumer code in the wild (anything used for rendering would qualify here) makes worthwhile use of over 256 cpus (which is the maximum that any one system can have, or that any one cluster can have as nodes, unless you start clustering clusters, which gets ridiculous except in highly-parallelized applications, mainly scientific).
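To make the Amdahl's Law point concrete, here's a quick back-of-the-envelope sketch in Python. The 0.95 parallel fraction is a number I made up for illustration, not anything measured from Blender:

    def amdahl_speedup(p, n):
        # theoretical speedup for parallel fraction p on n cpus
        return 1.0 / ((1.0 - p) + p / n)

    p = 0.95  # assumed fraction of the render work that parallelizes
    prev = 1.0
    for n in (2, 4, 8, 16, 32, 64, 128, 256):
        s = amdahl_speedup(p, n)
        print(f"{n:4d} cpus: {s:5.2f}x total, +{s - prev:.2f}x over the last step")
        prev = s

With those made-up numbers the whole thing tops out just under 20x (that's 1/(1-0.95)) no matter how many cpus you throw at it, and going from 128 to 256 cpus only buys you about another 1.2x, which is exactly the diminishing-returns point above.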
Also realize that most people doing this should be more concerned about the real ROI of the system than its mathematical returns in performance. Even if you would gain by adding more nodes, if the cost increase is large enough it won't be justifiable. The best you can possibly do cost-wise is a total cost of acquisition that is essentially $0, which usually means acquiring low-power systems that won't have a large draw or put off much heat. That lowers your TCO enough that if you make any money at all through your work, you will be making a profit.
So, I guess for the too-long-didn't-read crowd, the answer is: latency doesn't matter, and you will have a real financial incentive to limit your number of nodes long before you hit the point of diminishing returns.
Oh, and on the other side of the curve, yes, there is a point where you have to have X number of nodes before you will outperform, say, a dual quad-core, higher-clocked system. Usually you can figure on basing the comparison on clock speed and the total number of cpus/cores, but since the gains from SMP are not linear, and the gains from clustering are even less so, that is no guarantee. Assuming you are using 1GHz P3s, I'd say you'd need at least 20 nodes to match a dual Xeon X3220 system (which is 8 2.4GHz cores). The real advantage of the render farm approach is a reduction in your total cost of acquisition, even if it costs more to run (which it usually does, somewhat). With 1GHz P3s, if you are lucky you can find a steady supply of them for free just by dumpster diving, or get them rock-bottom at asset recovery stores by the pallet-full.
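For what it's worth, here's the rough clock-math behind that "at least 20 nodes" guess. The efficiency factors are fudge numbers I'm assuming, not benchmarks, and this ignores that a newer Xeon does far more per clock than a P3, so treat the result as a floor:

    p3_clock_ghz = 1.0
    xeon_cores, xeon_clock_ghz = 8, 2.4
    smp_efficiency = 0.90       # assumed SMP scaling losses on the Xeon box
    cluster_efficiency = 0.70   # assumed extra losses from clustering overhead

    target = xeon_cores * xeon_clock_ghz * smp_efficiency   # ~17.3 "effective GHz"
    nodes = 1
    while nodes * p3_clock_ghz * cluster_efficiency < target:
        nodes += 1
    print(f"Roughly {nodes} P3 nodes to match the Xeon box")  # ~25 with these fudge factors

Swap in your own efficiency numbers and the node count moves around a lot, which is why I say it's no guarantee.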
EDIT:
By the way, I am going to be doing a new clustering project soon if I can get the fundage. I've gotten interested in video encoding again, especially for HD stuff, and I'm working towards building the storage system and HTPC necessary to take advantage of HD. Eventually I'm going to build a small cluster (96 nodes) to do my encoding on, since x264 is highly parallelized and extremely compute-intensive.
EDIT2:
I realized I never answered your question about the older distro. I was referring to the old version of dyne:bolic. Really, on second consideration I'd say it'd be even better to just install whatever distro you are comfortable with on your "master system", set up OpenMOSIX there, and use whatever the latest release of ClusterKnoppix is on the rest of the systems. As far as the LiveCD approach goes, you don't necessarily have to put a CD in every node: the OpenMOSIX-based LiveCDs all support DHCP/netboot with PXE, so you can just put the LiveCD in one system and boot the rest over the network off of it disklessly, meaning they don't need any drives in them at all. Obviously, you can just boot off the CD on every node if they don't support netboot for some reason, but that may complicate trying to netboot other nodes, since more than one DHCPd will be responding to queries.
EDIT3:
On another note, I have to say that Wikipedia is an absolutely excellent resource on parallel computing and especially on HPCC (High-Performance Clustered Computing). If you have a chance, read through all the articles in the Parallel Computing category, starting with the ones on Amdahl's Law and Scalability. It is well worth your time. Also, I hope you enjoy applying math to problems, because that's what this is all about, and half the explanations are mathematical equations (but if you can follow them, they explain things well).
EDIT4:
Also, I wanted to point out that PXE/netboot is not all rainbows and sunshine. In fact, it's a major PITA to get working if you don't have decent 3Com or Intel NICs in every client system/node. Before you get started on this journey, you should read up about PXE and look at the EtherBoot Project. You will invariably end up doing one of five things to get PXE to work:
1. Trying to pick up some cheap 3Com 3C905B-TX or Intel PRO/100 NICs (when I did my project I found the 905B-TXs for $1 a pop at the dollar computer shop). Even then, they may not have a boot ROM on them, which means you will need a floppy-assisted netboot using EtherBoot.
2. Buying the cheapest Realtek RTL8139-based NIC you can find in bulk and flashing your own ROMs (which requires a ROM burner) with EtherBoot.
3. Praying to the network gods that the integrated NIC on the motherboard supports netboot/PXE, and/or rebuilding your motherboard's BIOS with EtherBoot inside it and getting it to flash successfully.
4. Doing one of the above and then having to deal with the nightmare that is BOOTP if for some reason it supports network booting but not DHCP.
5. Doing one of the above and still ending up having to use a floppy-assist, even with a boot ROM, because the boot ROM sucks.
I had two very different experiences when I did an OpenMOSIX cluster. When I did one at school for a class project to accompany a presentation I gave about HPCCs, I had the easiest experience I've ever had. It was literally as simple as booting the "master system" off the LiveCD and going around to each client to move netboot to the top of its boot order in the BIOS and save the settings. When I did the 60-node cluster at home, I ended up doing a floppy-assisted boot on each system, since I was able to pick up the floppy drives and NICs for $1 each. On top of dealing with troubles from defective or finicky used hardware, I also had to deal with floppies, which anger me by their very existence (they were horrible when they were mainstream, and now using a floppy is insulting given cheap flash memory and USB).
I'm not attempting to dissuade you by any means, but don't expect it to be a breeze. It can definitely be a good learning experience and a lot of fun, but you /will/ invariably encounter problems if you're not using new hardware, and even if you are, you will either encounter problems or have to go to greater initial expense.
Also, invest in GOOD switches: Intel, HP, Cisco, and Nortel all make very good switches you can get for cheap. I highly recommend HP managed switches, as they have an excellent featureset, have wonderful throughput, and are some of the most easily managed switches I've ever dealt with. If you are a Cisco person, Cisco switches are great. I am not, however, a Cisco person, so I prefer to stick to Intel and HP switches. Don't buy random cheap 24-port switches off eBay or Newegg or wherever; get something decent and take the time to research before you buy. If you have questions or want suggestions, please do PM me. Your switch will make a big difference in how well your cluster performs, as crap switches will end up with bad throughput and will push latency up to a noticeable level.
EDIT5:
In relation to wanting to do another cluster myself, I went and looked at used hardware and cheap new hardware. It turns out that going for used P3 systems is probably the best bet, because they have a significantly lower TDP than even the cheapest new procs (AMD Semprons). Unfortunately, there is nothing in the proper price range on eBay right now; it looks like people are generally asking too much for the Optiplex GX150s that are common there. If you can pick them up for less than $20 each with no HDD, that's probably your best bet, as they are 1GHz P3 systems in SFF cases. Check locally first. When buying used hardware, the opposite of buying new is almost always true: it's cheaper to get them locally than to buy online.
It may even be worth it for performance reasons to just try to find shell boxes of them, with no RAM, proc, or HDD, and buy the procs used separately. It looks like the SL6QU revision is one of the last 1GHz P3s produced, and it has a TDP of 12.1W with 512KB of cache and a 133MHz bus (which is better than what I was using beforehand). The proc notes say it was intended for server usage at the time, but perhaps you can find some that will be compatible with the workstation boards used in things like the Optiplex series. Really it's all trial and error, but the upshot is: the lower the TDP of the procs you use, the lower your daily cost for running the cluster, and the lower the cost of acquiring the individual nodes, the more nodes you can put in the cluster within your budget.
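To put a rough number on the running-cost side, here's the kind of math I mean. Every figure in it is an assumption (whole-system draw, hours of use, electricity rate), so plug in your own:

    nodes = 60                # number of nodes in the farm
    watts_per_node = 35.0     # assumed whole-system draw (12.1W proc plus board, RAM, NIC, PSU losses)
    hours_per_day = 8.0       # assumed hours of actual rendering/encoding per day
    rate_per_kwh = 0.10       # assumed electricity rate in $/kWh

    kwh_per_day = nodes * watts_per_node * hours_per_day / 1000.0
    print(f"~{kwh_per_day:.1f} kWh/day, about ${kwh_per_day * rate_per_kwh:.2f}/day "
          f"(${kwh_per_day * rate_per_kwh * 30:.0f}/month)")

With those made-up numbers a 60-node farm runs about $50 a month in power; double the per-node draw and that doubles too, which is why the proc TDP matters so much here.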
Your best friends right now are the Intel Processor Spec Finder and AMD Compare. It seems like P3s are winners on TDP, but I'm trying to find more info about legacy AMD procs. AMD Compare doesn't list anything older than the Socket 754 64-bit stuff.
If you want to go the shell box route, look for eBay auctions like this:
http://cgi.ebay.com/ws/eBayISA...Item&item=250125573254
In fact, you might message that guy and see if he still has any available (that one sold July 16th). I only see him listing 3 lots right now, and they are slower P3s sold as-is with known bent or missing pins, meant for gold recovery. I doubt you'd want to put in the work necessary to repair those, so ask about guaranteed working lots like that one. It's a cheap way to get the procs.