RC5-72 - 1080Ti key rate

LANMAN · Dec 22, 2017

Just wanted to post that the key rate for a dual 1080ti setup is a whopping 8.1 billion keys per second in the RC5-72 challenge.

Also, compare the processing power now with when the project started (early 2003, give or take): I remember when most CPUs were downloading single blocks of work. Now the average work size is 64 blocks.
I would really like to know the setup of the top performers, who are hitting around 1 million blocks per day, but it's great to see distributed.net still has some people out there hammering away on this beast.
With all the crypto miners etc., I can only imagine how fast the keyspace would be chewed up with even a fraction of the GPUs being sold today.
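For a sense of scale, here is a rough back-of-the-envelope sketch of how long the full RC5-72 keyspace would take at a given key rate. The only inputs are the 72-bit key length and the 8.1 billion keys/s figure from the post above; the function name is made up for illustration, and the hypothetical fleet size in the second call is an assumption, not a measurement.

```python
# Rough estimate: time to sweep the whole RC5-72 keyspace at a fixed key rate.
KEYSPACE = 2 ** 72                      # number of possible 72-bit keys
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(keys_per_second, fraction=1.0):
    """Years needed to test `fraction` of the keyspace at `keys_per_second`."""
    return KEYSPACE * fraction / keys_per_second / SECONDS_PER_YEAR


dual_1080ti = 8.1e9                     # keys/s, from the post above
print(f"{years_to_exhaust(dual_1080ti):,.0f} years for one dual-1080Ti host")

# ASSUMPTION: a hypothetical fleet of 10,000 such hosts, purely for illustration.
print(f"{years_to_exhaust(dual_1080ti * 10_000):.2f} years for 10,000 such hosts")
```

A single dual-1080Ti host would need on the order of 18,000 years for the full keyspace, which is why the number of participating GPUs matters far more than any one rig.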

That being said.. I will finally have time today to try the F@H Linux Mint roll out on a 1070.

--LANMAN

Orange Kid · Dec 22, 2017

Good to see ya hanging around LANMAN. Now get those 1080ti's on folding.

Kiska · Dec 22, 2017

LANMAN said:
Just wanted to post that the key rate for a dual 1080ti setup is a whopping 8.1 billion keys per second in the RC5-72 challenge.

Hrm, dual r9 280x would net around 7.5-8 billion keys per second.

Now you are tempting me to get another r9 280x and really make my system dual r9 280x. Currently I have 1 r9 280x doing about 3.5-3.75 billion keys per second, provided it's winter.

LANMAN · Dec 22, 2017

Orange Kid said:
Good to see ya hanging around LANMAN. Now get those 1080ti's on folding.

LANMAN · Dec 22, 2017

I know it's a little early... (sales just started a bit ago) Anyone know someone with a Titan V?
Not investing anything that expensive until I see some results.

SETI, RC5 or F@H.. anything. I'm not picky.. lol

Ken g6 · Dec 22, 2017

LANMAN said:
I know it's a little early... (sales just started a bit ago) Anyone know someone with a Titan V?
Not investing anything that expensive until I see some results.

SETI, RC5 or F@H.. anything. I'm not picky.. lol

You could say I know someone: https://www.anandtech.com/show/12170/nvidia-titan-v-preview-titanomachy/5

I think you want to look at the single precision results for most projects.

StefanR5R · Feb 27, 2018

I am running RC5-72 via Moo! Wrapper on four hosts at the moment, and get curious performance differences.

Moowrap has 3 CUDA application versions for Linux, and 1 for Windows (https://moowrap.net/apps.php).

Distributed.net Client v1.04 (cuda70) x86_64-pc-linux-gnu:

This application uses one GPU per task.
On a 1080Ti dialed down to 220 W board power, I get:

about 7.7 billion keys/second (according to the log of the latest valid task)
2,400 GFLOPS (measured from the last few hundred consecutive valid tasks)
1.19 M boinc-PPD (calculated from the last 20 valid tasks)
about 220 W GPU power usage (configured), about 96 % GPU core utilization

Distributed.net Client v1.04 (cuda60) x86_64-pc-linux-gnu:

This application uses one GPU per task.
On a 1080Ti dialed down to 220 W board power, I get:

about 7.7 billion keys/second (according to the log of the latest valid task)
2,500 GFLOPS (measured from the last few hundred consecutive valid tasks)
1.22 M boinc-PPD (calculated from the last 20 valid tasks)
about 220 W GPU power usage (configured), about 96 % GPU core utilization

Distributed.net Client v1.03 (cuda31) x86_64-pc-linux-gnu:

This application uses all GPUs in the system at once for a task.
On a dual-1080Ti PC I get:

5.4...5.7 billion keys/s (according to the log of the latest valid task; there is more fluctuation than with the cuda70 and cuda60 application versions)
1,600 GFLOPS measured from the last few hundred consecutive valid tasks
a pitiful 0.48 M boinc-PPD per dual-GPU host! (calculated from the last 20 valid tasks)
about 2x 75 W GPU power usage, about 22 % GPU core utilization

The real PPD must be a bit nearer to that of the v1.04 versions though, judging from the total points granted to the host which has these tasks.

Distributed.net Client v1.03 (cuda31) windows_intelx86:

Again, uses all GPUs at once for a task.
I have been running this for a bit longer now on a triple-1080Ti PC, and due to the poor GPU core utilization I configured it to run three such tasks in parallel. I get:

1.2...1.4 billion keys/s, say the logs of the last few tasks. Huh?
1.86 M boinc-PPD per triple-GPU host (calculated from the last 20 valid tasks, taking into account that three jobs run in parallel)
3x 205 W GPU power usage, an average of 65 % GPU core utilization
(so that's 3x 68 W per task, and 22 % per task)

Long story short,

the v1.03 cuda31 application version is very inefficient, compared with the v1.04 cuda60/70 versions.
That's very bad news for Windows users.
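The comparison behind that conclusion is easier to see once the PPD figures are normalized per GPU. A minimal sketch using only the numbers quoted above; it assumes the v1.04 figures are per single 1080Ti (since those versions use one GPU per task) and that the v1.03 figures are per host, as stated. The dictionary layout and function name are just for illustration.

```python
# Normalize the reported BOINC points-per-day (in millions) to a per-GPU figure.
# Each entry: (reported PPD in millions, number of 1080Ti GPUs that figure covers).
measurements = {
    "v1.04 cuda70, Linux": (1.19, 1),                      # assumed per GPU
    "v1.04 cuda60, Linux": (1.22, 1),                      # assumed per GPU
    "v1.03 cuda31, Linux": (0.48, 2),                      # per dual-GPU host
    "v1.03 cuda31, Windows, 3 tasks at once": (1.86, 3),   # per triple-GPU host
}


def ppd_per_gpu(host_ppd, gpus):
    """PPD (millions) attributable to one GPU."""
    return host_ppd / gpus


for name, (ppd, gpus) in measurements.items():
    print(f"{name:42s} {ppd_per_gpu(ppd, gpus):.2f} M PPD per GPU")
```

Keep in mind the caveat from the post: the Linux cuda31 figure is probably somewhat pessimistic, so the gap to v1.04 may be smaller than this suggests.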

I added my Linux PCs to moowrap.net just 2 weeks ago, and have had them running it continuously for only a little over 2 days. Luckily, the server has sent all three Linux CUDA application versions already. But sadly, it sent only cuda70 to host 1, only cuda60 to host 2 (apart from a single cuda70 task among them), and only cuda31 tasks to host 3.

I am now anxiously waiting for cuda60/70 tasks to be sent to this 3rd host. I wonder how long the scheduler will take to do that. I believe I have seen other projects sending out different application versions a lot sooner.

[H]Coleslaw · Feb 27, 2018

have you tried the Windows OpenCL v1.04 version instead of the CUDA version? My teammates found it to be much better. https://hardforum.com/threads/moo.1829187/#post-1043279596

StefanR5R · Feb 27, 2018

For reference, here is an example app_config.xml which affects the cuda31 tasks, but not the cuda60/70 tasks:

Code:

<app_config>
    <app_version>
        <app_name>dnetc</app_name>
        <plan_class>cuda31</plan_class>
        <!-- default: 0.2 CPUs + all GPUs -->
        <!-- <avg_ncpus>0.2</avg_ncpus> -->
        <ngpus>0.5</ngpus>
    </app_version>
</app_config>

On a dual GPU system, this results in four tasks running in parallel in total. And since each task loads each GPU to 22 % on average, this results in a total GPU utilization of 88 % on average.
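The arithmetic above can be sketched as follows. This is a simplified model, not BOINC's actual scheduling code: it assumes the client starts as many tasks as fit into the GPU count given the `<ngpus>` reservation per task, and that per-task utilization simply adds up (capped at 100 %). The function names are hypothetical.

```python
from fractions import Fraction


def concurrent_tasks(gpus_in_host, ngpus_per_task):
    """How many tasks fit if each task reserves `ngpus_per_task` GPUs."""
    # Fractions avoid float rounding surprises with values such as 0.1.
    return int(Fraction(gpus_in_host) // Fraction(str(ngpus_per_task)))


def total_utilization(tasks, percent_per_task):
    """Summed utilization per GPU for tasks that each load every GPU (cuda31)."""
    return min(100, tasks * percent_per_task)


tasks = concurrent_tasks(gpus_in_host=2, ngpus_per_task=0.5)
print(tasks, "tasks in parallel")
print(total_utilization(tasks, 22), "% average GPU utilization")
```

With the same model, `<ngpus>1</ngpus>` on a dual GPU host gives two tasks at a time, and on a triple GPU host three.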

[H]Coleslaw said:
have you tried the Windows OpenCL v1.04 version instead of the CUDA version? My teammates found it to be much better. https://hardforum.com/threads/moo.1829187/#post-1043279596

Oh, thanks a lot for this. Haven't tried this yet.

A while ago I attempted to write my own app_info for another project (don't remember which one), and failed miserably.

From what I read in your thread, and given my Windows host, which has run Moo! Wrapper on several occasions now (added 11 months ago): am I to conclude that moowrap.net's scheduler is seriously buggy and never sends alternative application versions?

(Earlier today I also browsed the moowrap.net forum for a little bit because of this issue, but did not notice a discussion of this. I only went away with the impression that there is not a lot of technical help to get there.)

[H]Coleslaw · Feb 27, 2018

Without doing the app_info, I have never received the OpenCL application on any of my nVidia cards. So, I would say your assumption is correct.

[H]Coleslaw · Feb 27, 2018

I will also say that I have never seen their forums well supported since the beginning. Most of the real help comes from other users sharing what they have found. Some of the oddities that come up have also been present in the DNet clients in the past, like the issue of one card getting load while another card shows none. I found that in the DNet client a few years back. It also wasn't using the proper GPU that was assigned to it either. I had a test rig at work for a short period with an 8400 GS and a 210 in it. I kept doing various experiments but couldn't get it to fully behave. I forget whom I was working with on this without actually going back through emails. But I have no idea if they ever fixed any of the issues. I haven't really run their client much since.

StefanR5R · Feb 28, 2018

It's a pity that moowrap.net is down at the moment. I wonder if the TeAm would have become top producer today:
http://stats.free-dc.org/stats.php?page=teams&proj=moo&sort=yesterday

crashtech · Mar 1, 2018

Hey guys, is there any reason why Moo! wrapper won't work on an HD 7990? I've played with different driver revisions on its platform, Asus B150 Pro Gaming D3 with a Pentium G4560 and Win10x64 OS to no avail. BOINC downloads the WUs and pretends to run them, but the GPU(s) stay idle. Einstein and Milkyway crunch just fine, though.

StefanR5R · Mar 2, 2018

I see from your hosts list that you run moowrapper on a host with single Tahiti GPU just fine. Have you tried different Crossfire settings on the 7990?

Edit,
apparently AMD's control center lost the Crossfire related settings in the Crimson driver, and the older control center from Catalyst 15.7.1 WHQL may be needed to get to these settings:
https://community.amd.com/thread/202396

crashtech · Mar 2, 2018

Hmm, I didn't think Crossfire came into play, but perhaps an even older driver should be tried. The puzzling part to me is that all other GPU projects work fine on both GPUs, and they can be independently controlled when ULPS is turned off.

StefanR5R · Mar 2, 2018

First!
http://stats.free-dc.org/stats.php?page=teams&proj=moo&sort=today

crashtech · Mar 2, 2018

Wow, outdoing Gridcoin!

ultimatebob · Mar 2, 2018

Wait... they are STILL trying to crack RC5-72? I remember being part of the team that helped crack RC5-64 back in 2002.

It makes me wonder why I've been busy sunsetting "weak" RSA 128 bit ciphers on my servers for 256 bit ones, when it's taken over 15 years to crack RC5-72 with little success.

StefanR5R · Mar 3, 2018

Yes, there is still a prize up for grabs. But remember: the fact that we are still trying doesn't mean that "THEY" haven't succeeded already. ;-)

StefanR5R · May 10, 2018

Re Moo! Wrapper application versions:

[H]Coleslaw said:
have you tried the Windows OpenCL v1.04 version instead of the CUDA version? My teammates found it to be much better. https://hardforum.com/threads/moo.1829187/#post-1043279596

StefanR5R said:
Oh, thanks a lot for this. Haven't tried this yet.

A while ago I attempted to write my own app_info for another project (don't remember which one), and failed miserably.

From what I read in your thread, and given my Windows host, which has run Moo! Wrapper on several occasions now (added 11 months ago): am I to conclude that moowrap.net's scheduler is seriously buggy and never sends alternative application versions?

[H]Coleslaw said:
Without doing the app_info, I have never received the OpenCL application on any of my nVidia cards. So, I would say your assumption is correct.

There is news on this issue:

Teemu Mannermaa said:
BOINC Scheduler changes for multiple app version case
BOINC Scheduler has had problems sending different app versions to clients when there's multiple possible versions for a platform. For example, this happens when there's both OpenCL and Stream/CUDA or both 32-bit and 64-bit CPU app version available. To hopefully fix this our scheduler has been changed to send each app version until it has enough host specific speed samples. Only exception is when that version has been failing.

Please report any problems of getting work or having them fail more often in our forums. Thank you and happy crunching!
2 May 2018, 12:00:34 UTC · Discuss

So that sounds good.

However, there is one user in this thread reporting failures of the CUDA application on GTX 1080. It fails for me on GTX 1070 too. Need to investigate.

All tasks fail with "No OpenCL platforms available!". - edit: Fixed by reboot of the PC.

StefanR5R · May 12, 2018

The fix to moowrap.net's scheduler works for me so far.

On a GTX 1070 under Windows, it took probably less than 2 hours for client and server to try out the two possible applications (cuda31 and opencl_nvidia_101), after which it settled for the latter which utilizes the card's CUs and power budget fully with one task at a time.

On a GTX 1080Ti under Linux, it took maybe 3 hours to try the four Nvidia+Linux applications (cuda31, cuda60, cuda70, opencl_nvidia_101), after which it settled for the opencl one too.

BTW, CPU background load seems to impact the CUDA application versions more than the OpenCL application. But even without additional CPU load, performance of the OpenCL application is better than the CUDA ones.

--------
edit
Apparently the scheduler sends 16 tasks for each application version in order to determine which one performs best on a given host.

StefanR5R · Nov 25, 2018

StefanR5R said:
The fix to moowrap.net's scheduler works for me so far.

On a GTX 1070 under Windows, it took probably less than 2 hours for client and server to try out the two possible applications (cuda31 and opencl_nvidia_101), after which it settled for the latter which utilizes the card's CUs and power budget fully with one task at a time.

On a GTX 1080Ti under Linux, it took maybe 3 hours to try the four Nvidia+Linux applications (cuda31, cuda60, cuda70, opencl_nvidia_101), after which it settled for the opencl one too.

I attached new client instances yesterday, and the process is broken on multi-GPU systems:

On Linux, again all four versions are received (one after another, about 16 tasks of each), and the server records the performance of each. Real performance on Pascal is still cuda31 < cuda60 < cuda70 < opencl_nvidia_101. cuda60, cuda70, and opencl_nvidia_101 use one GPU per task. cuda31 uses all GPUs in a system at once, and therefore has the shortest run time on dual and triple GPU systems.

Now the problem: When the server determines relative performance of these application versions, it fails to take into account that only one cuda31 task can be performed at a time, but 2 or 3 cuda60/cuda70/opencl_nvidia_101 tasks are run at a time on a dual or triple GPU host. (Edit: And thus, the server ends up sending only cuda31 to multi GPU hosts after the probing period.)
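To illustrate the problem described above, here is a small sketch that ranks application versions once by per-task run time and once by host throughput (tasks finished per second across all GPUs). It is not the BOINC scheduler's actual logic, and the run times are invented placeholder values chosen only to reproduce the ordering reported in this post; the class and function names are hypothetical too.

```python
class AppVersion:
    def __init__(self, name, task_runtime_s, uses_all_gpus):
        self.name = name
        self.task_runtime_s = task_runtime_s   # HYPOTHETICAL run time per task
        self.uses_all_gpus = uses_all_gpus     # True for cuda31 only


def concurrent_tasks(version, gpus_in_host):
    """cuda31 occupies every GPU with one task; the others run one task per GPU."""
    return 1 if version.uses_all_gpus else gpus_in_host


def host_throughput(version, gpus_in_host):
    """Tasks completed per second by the whole host."""
    return concurrent_tasks(version, gpus_in_host) / version.task_runtime_s


def fastest_by_runtime(versions):
    return min(versions, key=lambda v: v.task_runtime_s)


def fastest_by_throughput(versions, gpus_in_host):
    return max(versions, key=lambda v: host_throughput(v, gpus_in_host))


# Placeholder numbers for a dual GPU host, NOT measurements.
versions = [
    AppVersion("cuda31", 600.0, True),
    AppVersion("cuda60", 900.0, False),
    AppVersion("cuda70", 880.0, False),
    AppVersion("opencl_nvidia_101", 800.0, False),
]

print("by run time:  ", fastest_by_runtime(versions).name)
print("by throughput:", fastest_by_throughput(versions, gpus_in_host=2).name)
```

Ranking by run time alone picks cuda31 on a multi-GPU host, while ranking by throughput picks opencl_nvidia_101. On a single GPU host the two rankings agree, which would explain why the probing worked there.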

I am now going to write an app_info for my Linux/Pascal hosts, based on the Windows app_info which phoenicis posted.

Edit:
My app_info.xml is OK by itself, but the server sends me short tasks with 1/12 the run time but only 1/24 the credit per task. I will remove the app_info.xml, and use an app_config.xml instead which runs multiple cuda31 tasks at a time for a level playing field with cuda60/cuda70/opencl_nvidia_101.

XML:

<app_config>
    <app_version>
        <app_name>dnetc</app_name>
        <plan_class>cuda31</plan_class>
        <!-- default: 0.2 CPUs + all GPUs -->
        <!-- <avg_ncpus>0.2</avg_ncpus> -->
        <ngpus>1</ngpus>
    </app_version>
</app_config>

I reset the project and applied the app_config.xml. I hope the server begins to test the four application versions again, but this time correctly identifies cuda31 as the slowest of them all.

Edit 2:
There is one flaw in my plan. When an opencl_nvidia_101 task is being run concurrently with a cuda31 task, the latter detrimentally affects the performance and FLOPS bookkeeping of the former. For this process to settle in, one should watch the hosts and manually ensure that cuda31 is not run at the same time as cuda60/cuda70/opencl_nvidia_101.

ultimatebob · Nov 25, 2018

You guys should start advertising this in the Video Card forum. There are more than a few frustrated cryptocurrency miners in there who are looking for new uses for their high end Nvidia gaming rigs.

lane42 · Nov 25, 2018

https://setiathome.berkeley.edu/result.php?resultid=7176161848 (1080Ti, blc06)

https://setiathome.berkeley.edu/result.php?resultid=7176135088 (Titan V, blc06)

lane42 · Nov 25, 2018

Links don't seem to work for me, don't know why, but it's about 52 seconds per blc06 workunit on SETI for the 1080Ti, and about 31 seconds per blc06 workunit for the Titan V.
