SETI@Home is a special case. The efficiency of its current mainline GPU application is better than that of the mainline CPU application, but only at about the same level as third-party optimized CPU applications.
In the SETI@Home Wow! Event 2017, I ran, among others, a host with 2 GTX 1080 Ti and 1 GTX 1080.* Most of the time during the event I ran 3 tasks in parallel on each card.** This host was measured at 300 GFLOPS during the event. As far as I understand, this number is per task, which would mean an average of 900 GFLOPS on each of my cards.
Since performance scales nearly linearly with the shader count of a card when the card is fully utilized by the application, that would mean about 990 GFLOPS on the 1080 Ti and 710 GFLOPS on the 1080 (again as the sum of the GFLOPS of the three tasks per card).
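The per-model split can be sanity-checked by apportioning the host total by shader count. A minimal sketch, assuming throughput is strictly proportional to CUDA cores (3584 on a 1080 Ti, 2560 on a 1080):

```python
# Apportion the measured host total by shader (CUDA core) count.
shaders = {"GTX 1080 Ti": 3584, "GTX 1080": 2560}
cards = ["GTX 1080 Ti", "GTX 1080 Ti", "GTX 1080"]

avg_per_card = 900.0               # 3 tasks x 300 GFLOPS each
total = avg_per_card * len(cards)  # 2700 GFLOPS for the whole host

per_shader = total / sum(shaders[c] for c in cards)
for model, count in shaders.items():
    print(model, round(count * per_shader))  # ~995 and ~711 GFLOPS
```

This reproduces the roughly 990 / 710 split quoted above to within rounding.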
I also ran a dual Xeon E5-2690v4 host, which was measured at 23.6 GFLOPS using the optimized CPU application. I presume this number means GFLOPS per logical CPU, hence 660 GFLOPS for this 14C/28T processor (and double that for the dual-socket host).
And a dual Xeon E5-2696v4: presumably 20 GFLOPS per logical CPU, or 880 GFLOPS for this 22C/44T processor.
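The per-socket totals above follow directly from multiplying the presumed per-logical-CPU figure by the thread count; a quick check:

```python
# Per-socket totals from the presumed per-logical-CPU figures.
cpus = {
    "E5-2690v4": (23.6, 28),  # GFLOPS per logical CPU, 14C/28T
    "E5-2696v4": (20.0, 44),  # 22C/44T
}
for model, (per_thread, threads) in cpus.items():
    print(model, round(per_thread * threads))  # -> 661 and 880
```

This matches the ~660 and 880 GFLOPS figures quoted above.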
The optimized SETI application makes use of AVX. It is therefore difficult to tell how Broadwell-EP performance translates to Threadripper performance, since their AVX units are built differently.
@Markfw had his Threadripper running Asteroids@Home recently, and I ran it on the E5-2696v4. Judging from the average task durations on both processors, the Threadripper 1950X comes very close in performance to the E5-2696v4. Asteroids@Home uses AVX too (on processors which support it, like the ones discussed here). Whether this also implies similar SETI@Home performance is not clear to me.
Caveat:
There may be grave mistakes in my calculations.
Not too many people are buying $700–800 graphics cards just to run SETI, though.
The performance per watt of the 1080 Ti and the 1080 is identical (considering only the power consumption of the card, not of the host). The choice between them therefore comes down to density, cooling, and other considerations.
------------
*) I need to clean this up; it's better to have only cards of the same model in a host.
**) One task per card would not saturate the cards; 2 tasks would have saturated them adequately, but 3 was slightly better still, and running 3×3 tasks was a convenient way for me to ride out Maintenance Tuesday. The application has command-line parameters which can be tweaked for particular GPU models; I presume these could be a way to saturate big cards with just one task at a time, but I don't know how to work with them.