I am running RC5-72 via Moo! Wrapper on four hosts at the moment, and am seeing curious performance differences.
Moowrap has 3 CUDA application versions for Linux, and 1 for Windows (https://moowrap.net/apps.php).
Distributed.net Client v1.04 (cuda70) x86_64-pc-linux-gnu:
This application uses one GPU per task.
On a 1080Ti dialed down to 220 W board power, I get:
about 7.7 billion keys/second (according to the log of the latest valid task)
2,400 GFLOPS (measured from the last few hundred consecutive valid tasks)
1.19 M boinc-PPD (calculated from the last 20 valid tasks)
about 220 W GPU power usage (configured), about 96 % GPU core utilization
Distributed.net Client v1.04 (cuda60) x86_64-pc-linux-gnu:
This application uses one GPU per task.
On a 1080Ti dialed down to 220 W board power, I get:
about 7.7 billion keys/second (according to the log of the latest valid task)
2,500 GFLOPS (measured from the last few hundred consecutive valid tasks)
1.22 M boinc-PPD (calculated from the last 20 valid tasks)
about 220 W GPU power usage (configured), about 96 % GPU core utilization
Distributed.net Client v1.03 (cuda31) x86_64-pc-linux-gnu:
This application uses all GPUs in the system at once for a task.
On a dual-1080Ti PC I get:
5.4...5.7 billion keys/s (according to the log of the latest valid task; there is more fluctuation than with the cuda70 and cuda60 application versions)
1,600 GFLOPS (measured from the last few hundred consecutive valid tasks)
a pitiful 0.48 M boinc-PPD per dual-GPU host! (calculated from the last 20 valid tasks)
about 2x 75 W GPU power usage, about 22 % GPU core utilization
The real PPD must be a bit nearer to the v1.04 versions though, judging from the total points granted to the host which has these tasks.
Distributed.net Client v1.03 (cuda31) windows_intelx86:
Again, uses all GPUs at once for a task.
I have been running this for a bit longer now on a triple-1080Ti PC, and due to the poor GPU core utilization I configured it to run three such tasks in parallel. I get:
1.2...1.4 billion keys/s (according to the logs of the last few tasks). Huh?
1.86 M boinc-PPD per triple-GPU host (calculated from the last 20 valid tasks, taking into account that three jobs run in parallel)
3x 205 W GPU power usage, an average of 65 % GPU core utilization
(so that's 3x 68 W per task, and 22 % per task)
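The per-task figures in parentheses above are simply the per-GPU measurements divided by the number of tasks running in parallel. A minimal sketch of that arithmetic, assuming the three tasks share each GPU evenly (the variable and function names are just for illustration):

```python
# Figures from the triple-1080Ti Windows host above.
gpu_power_w = 205.0      # measured power draw per GPU, in watts
gpu_util_pct = 65.0      # average core utilization per GPU, in percent
tasks_in_parallel = 3    # three cuda31 tasks, each spanning all GPUs

def per_task_share(total, n_tasks):
    """Split a per-GPU figure evenly among the tasks (assumption: even split)."""
    return total / n_tasks

power_per_task = per_task_share(gpu_power_w, tasks_in_parallel)
util_per_task = per_task_share(gpu_util_pct, tasks_in_parallel)

print(f"{power_per_task:.0f} W per GPU per task")
print(f"{util_per_task:.0f} % utilization per GPU per task")
```

That lands close to the 2x 75 W and 22 % seen with a single cuda31 task on the dual-GPU Linux host.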
Long story short,
- The v1.03 cuda31 application version is very inefficient compared with the v1.04 cuda60/70 versions.
- That's very bad news for Windows users, who only get the cuda31 version.
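To put a number on "inefficient", here is a rough sketch that turns the figures above into PPD per GPU and PPD per watt. It assumes that the 1.19 M and 1.22 M figures are for a single GPU, that the cuda31 figures are per host as stated, and that the GPU power readings are the only power counted (no CPU, no wall power). The cuda31 Linux PPD is probably underestimated, as noted above.

```python
# (label, PPD, number of GPUs, watts per GPU), all taken from the figures above
hosts = [
    ("cuda70 linux",   1.19e6, 1, 220),
    ("cuda60 linux",   1.22e6, 1, 220),
    ("cuda31 linux",   0.48e6, 2, 75),
    ("cuda31 windows", 1.86e6, 3, 205),
]

def ppd_per_watt(ppd, gpus, watts_per_gpu):
    """Points per day per watt of GPU power (GPU power only)."""
    return ppd / (gpus * watts_per_gpu)

for label, ppd, gpus, watts in hosts:
    per_gpu = ppd / gpus / 1e6
    eff = ppd_per_watt(ppd, gpus, watts)
    print(f"{label:15s} {per_gpu:.2f} M PPD/GPU  {eff:,.0f} PPD/W")
```

Under these assumptions, cuda70 and cuda60 come out at roughly 5,400 and 5,500 PPD/W, while cuda31 manages only about 3,200 (Linux) and 3,000 (Windows).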
I added my Linux PCs to moowrap.net just 2 weeks ago, and have had them running it continuously for only a little over 2 days. Luckily, the server has already sent all three Linux CUDA application versions. But sadly, it sent only cuda70 to host 1, only cuda60 to host 2 (apart from a single cuda70 task among them), and only cuda31 tasks to host 3.
I am now anxiously waiting for cuda60/70 tasks to be sent to this 3rd host. I wonder how long the scheduler will take to do that. I believe I have seen other projects sending out different application versions a lot sooner.