Thank you for your thoughts. I'm grateful to have input from others because it is tedious working alone. If there is a "Trial" and example software to test, you could post in this sub-forum. Users there have many different CPU and RAM configurations and would likely be willing to run it on various hardware.
Distributed Computing
Discuss SETI@Home, F@H, FaD, CPDN, SoB, E@H, BOINC and other distributed computing topics here.
forums.anandtech.com
Other options, besides regular commercial cloud CPUs, where you might be able to get run-time stats from specific CPU/RAM combinations include places like:
Charity Engine
www.charityengine.com
I wonder whether a large L3 cache would speed up this kind of task, a little or a lot? Either an AMD X3D part or an Intel CPU with a large unified cache? Might require few threads per task?
There is a well-defined difference between parallel and distributed computing -- at least among high-performance computing professionals. The application being discussed is designed for parallel, not distributed computing. Licensing issues are a major roadblock to the latter.
I've been looking at the cache issue and agree. Each time the program performs a regex search on a chromosome String of size ~20 MB, I watch latencies of 0.01 sec occur at regular intervals.
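For anyone who wants to reproduce this kind of measurement on their own hardware, here is a minimal sketch in Python. It is not the application's code. The names `make_sequence` and `time_searches`, the random A/C/G/T stand-in for a chromosome, and the `GAATTC` pattern are all assumptions for illustration. It times repeated full regex scans of a ~20 MB string so the per-search latency can be compared across CPU and cache configurations.

```python
import random
import re
import statistics
import time


def make_sequence(length, seed=0):
    """Build a synthetic stand-in for a chromosome: random A/C/G/T letters."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


def time_searches(pattern, sequence, repeats=20):
    """Run `repeats` full regex scans of `sequence`.

    Returns (match_count, latencies), with latencies in seconds per scan.
    """
    compiled = re.compile(pattern)
    latencies = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        # Consume the iterator so the whole string is actually scanned.
        count = sum(1 for _ in compiled.finditer(sequence))
        latencies.append(time.perf_counter() - start)
    return count, latencies


if __name__ == "__main__":
    # 20 million ASCII characters is roughly 20 MB in CPython (assumption:
    # one byte per base, which holds for pure-ASCII str objects).
    seq = make_sequence(20_000_000)
    count, lat = time_searches(r"GAATTC", seq, repeats=10)
    print(
        f"matches={count} "
        f"min={min(lat):.4f}s "
        f"median={statistics.median(lat):.4f}s "
        f"max={max(lat):.4f}s"
    )
```

Comparing the minimum and maximum latencies over many repeats gives a rough idea of whether the scan time is steady or shows periodic stalls. Absolute numbers from Python's `re` engine will not match another language's regex implementation, so only comparisons between machines running the same script are meaningful.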
I've been spoiled in the past by supercomputers with 16+ processors on the same memory backplane and consequently the same number of memory channels. At present I have 1 processor with 10 cores and 2 memory channels. Optimizing for shortest run time is a new challenge for me.