Ethereum GPU mining?


hawtdawg

Golden Member
Jun 4, 2005
1,223
7
81
I mined some a few months ago, and a single 980 Ti was getting about 20 MH/s after overclocking, but now I can't get it to break 8. What gives?
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
Your firewall may be blocking the traffic?



There are some switches you can try.

First, make sure you add this. It will pick up new work faster.
--farm-recheck 100 default is 500 (in ms)

You can also try to play around with the global and local work values.
--cl-global-work 4096 default is 4096, try 8192 or 16384 also. I use 16384 personally.
--cl-local-work 64 default is 64. 128 and 256 can be tried. I read some dev chatter that 64 was the most efficient for AMD because their wavefront is 64 threads. Nvidia should use 32.

Please note, there are not big gains to be had like tweaking for BTC mining.

Awesome thanks for the advice. I just added my MSI 390 and it's hitting around 20.5 Mh/s at 1075Mhz core with a small undervolt. I still have an Asus mini Geforce 970 to add so hopefully with these 4 cards and your tweaks I'll be at 100Mh/S. If not I'll throw a few Kaveri APU's at it

I also have an old 2GB 5870 sitting around, can you still mine on these older pre GCN cards?
 

sandorski

No Lifer
Oct 10, 1999
70,231
5,806
126
I mined some a few months ago, and a single 980 Ti was getting about 20 MH/s after overclocking, but now I can't get it to break 8. What gives?

As these currencies age, the network difficulty rises, so they become harder to mine.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
How many Furys is that for?

Two Fury X's at stock clocks. They're on an open bench so running cool and quiet. It feels weird to mine coins with almost no noise lol. I'll try the tweaks mentioned to see if I can break 30 per card.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
That has absolutely nothing to do with hashrate

Where are you reading your speeds from? Maybe try DDU and reinstall the latest drivers? Or perhaps try an older CUDA driver?

I'll be setting up my 970 soon so maybe I'll be of more help then.
 

hawtdawg

Golden Member
Jun 4, 2005
1,223
7
81
Where are you reading your speeds from? Maybe try DDU and reinstall the latest drivers? Or perhaps try an older CUDA driver?

I'll be setting up my 970 soon so maybe I'll be of more help then.

I did that. I actually have 2 rigs, one with 780s and one with 980 Tis, and both are mining at about 1/3 of their potential speed. I think it must have something to do with Windows 10? I could install Linux and mine that way, and I doubt I'd have issues, but that's a bit of a hassle. I may go ahead and do that tomorrow night anyway.

edit: reading speeds from both ethminer and ethpool.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
I did that. I actually have 2 rigs, one with 780s and one with 980 Tis, and both are mining at about 1/3 of their potential speed. I think it must have something to do with Windows 10? I could install Linux and mine that way, and I doubt I'd have issues, but that's a bit of a hassle. I may go ahead and do that tomorrow night anyway.

edit: reading speeds from both ethminer and ethpool.

Weird. Probably a CUDA specific issue with Windows 10. FWIW my two rigs are running Windows 10 as well but OpenCL.

Edit: I'll update the thread if my 970 runs into speed issues with 10 as well.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
My R9 380 Nitro runs like a dream in Windows 10 at 20 MH/s with excellent desktop response. When playing games it falls to 15 MH/s. GCN just rules at OpenCL compute and multitasking.
 

beginner99

Diamond Member
Jun 2, 2009
5,231
1,605
136
2. If you're using my directions, then you're going to get paid via pool 2x/day. You can see how much you're making via: http://eth.nanopool.org/account/0xyour_address

How do you get paid out? Will it be visible in your account in Ethereum Wallet? Asking this because my balance is still 0.

EDIT:

Ignore that. It seems the wallet app has a bug: it was simply disconnected (0 peers). Restarting it solved the issue.

Note that before restarting I also synced my system time with a time server, because I read that this solves certain issues. So if this particular problem occurs, you may need to sync your clock and then restart the wallet.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
25MH/s on 290 running stock clocks and heavy undervolt. Drops to 20 MH/s when playing games.
 

zagitta

Member
Sep 11, 2012
27
0
0
Hmm, not sure if the miner is optimized for Fury X's, or perhaps the pool is being overloaded. Getting an average of 57.5 Mh/s at http://eth.nanopool.org. Stock clocks.

Some super rough napkin math suggests the theoretical limit for Fury is around 62 MH/s, based on memory accesses being the limiting factor.
The math is that one hash takes 2^16 bits (8 KB) of memory accesses in 128-byte chunks, so with 512 GB/s of memory bandwidth we arrive at around 62 MH/s theoretical max.
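For anyone who wants to redo the napkin math with their own card, here is a minimal sketch. It assumes 64 memory accesses of 128 bytes per hash (8 KB total) and 512 GB/s of peak bandwidth for a Fury X; the function name and constants are made up for illustration. With those inputs the bound comes out at 62.5 MH/s, and a different bandwidth or per-hash figure would shift it.

```python
# Napkin estimate of a memory-bound hashrate ceiling.
# Assumptions (not measured): 64 accesses per hash, 128 bytes each,
# and 512 GB/s of peak memory bandwidth.

ACCESSES_PER_HASH = 64
BYTES_PER_ACCESS = 128
BANDWIDTH_BYTES_PER_S = 512e9


def max_hashrate(bandwidth_bytes_per_s: float,
                 accesses: int = ACCESSES_PER_HASH,
                 bytes_per_access: int = BYTES_PER_ACCESS) -> float:
    """Upper bound in hashes/s if each hash must read accesses * bytes_per_access bytes."""
    bytes_per_hash = accesses * bytes_per_access  # 8192 bytes = 2**16 bits
    return bandwidth_bytes_per_s / bytes_per_hash


if __name__ == "__main__":
    bound = max_hashrate(BANDWIDTH_BYTES_PER_S)
    print(f"{bound / 1e6:.1f} MH/s")
```

This is only a ceiling: it ignores compute time, cache effects, and the fact that real cards rarely sustain peak bandwidth on random reads.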

Looking at the OpenCL kernels suggests the code is pretty poor so I'd expect pretty large optimizations being possible for Fury (X).
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
Awesome thanks for the advice. I just added my MSI 390 and it's hitting around 20.5 Mh/s at 1075Mhz core with a small undervolt. I still have an Asus mini Geforce 970 to add so hopefully with these 4 cards and your tweaks I'll be at 100Mh/S. If not I'll throw a few Kaveri APU's at it

I also have an old 2GB 5870 sitting around, can you still mine on these older pre GCN cards?

FYI I played around with the switches you provided. I wasn't able to hit your speeds but got close. I'm hitting around 60Mh/s with two Fury X's now.

The following gave me the best results so far.

"ethminer.exe -F http://eth1.nanopool.org:8888/"address"/miner1 -G -t 2 --cl-global-work 32768 --farm-recheck 100 --cl-local-work 32"

More interesting is my MSI 390 at -25mv and -10 board power running 1100Mhz core / default memory is giving me almost 30MH/s so approx the same speed as the Fury X's are. And this is before trying the above tweaks! I'm guessing this program was optimized for Hawaii and the Fury has a bunch of untapped potential left given how many more shaders and memory bandwidth it has at its disposal.
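Back on the switches: trying combinations by hand gets tedious, so here is a rough sketch that just prints one command line per combination of the global and local work values mentioned in this thread. It uses only the flags already quoted above. The helper names and the YOUR_ADDRESS placeholder are hypothetical, and it does not launch the miner or measure anything; you would still run each line yourself and compare the reported speeds.

```python
import itertools


def build_command(pool_url, global_work, local_work, recheck_ms=100, threads=2):
    """Assemble an ethminer argument list from the switches quoted in the thread."""
    return [
        "ethminer.exe", "-F", pool_url, "-G", "-t", str(threads),
        "--cl-global-work", str(global_work),
        "--farm-recheck", str(recheck_ms),
        "--cl-local-work", str(local_work),
    ]


def sweep(pool_url, global_values, local_values):
    """Return one command per (global, local) pair, in a fixed order."""
    return [
        build_command(pool_url, g, l)
        for g, l in itertools.product(global_values, local_values)
    ]


if __name__ == "__main__":
    # Placeholder address: substitute your own wallet address.
    url = "http://eth1.nanopool.org:8888/YOUR_ADDRESS/miner1"
    for cmd in sweep(url, [4096, 8192, 16384, 32768], [32, 64, 128, 256]):
        print(" ".join(cmd))
```

Let each setting run for a few minutes before comparing, since the reported hashrate fluctuates.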

I'm still puzzled how you're getting 32MH/s though. Would you mind sharing your command line arguments exactly?

FYI my 2GB 5870 fails to start due to "insufficient memory". Anyone else here with a 2GB card get theirs to work? This card used to kill it at BTC mining, so it would be a shame not to use it.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
Some super rough napkin math suggests the theoretical limit for Fury is around 62 MH/s, based on memory accesses being the limiting factor.
The math is that one hash takes 2^16 bits (8 KB) of memory accesses in 128-byte chunks, so with 512 GB/s of memory bandwidth we arrive at around 62 MH/s theoretical max.

Looking at the OpenCL kernels suggests the code is pretty poor so I'd expect pretty large optimizations being possible for Fury (X).

I haven't looked at the code yet (probably wouldn't understand it if I tried) but thanks for listing the theoretical limits of the Fury X. Something is holding back these cards, they're clearly faster at compute compared to Hawaii cards and I don't think it's the number of ROPS holding it back.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
FYI my 2GB 5870 fails to start due to "insufficient memory". Anyone else here with a 2GB card get theirs to work? This card used to kill it at BTC mining, so it would be a shame not to use it.
2GB Pitcairns and Tahiti LEs do work, so it may just be that a 5870 is too old.
 

Despoiler

Golden Member
Nov 10, 2007
1,967
772
136
FYI I played around with the switches you provided. I wasn't able to hit your speeds but got close. I'm hitting around 60Mh/s with two Fury X's now.

The following gave me the best results so far.

"ethminer.exe -F http://eth1.nanopool.org:8888/"address"/miner1 -G -t 2 --cl-global-work 32768 --farm-recheck 100 --cl-local-work 32"

More interesting is my MSI 390 at -25mv and -10 board power running 1100Mhz core / default memory is giving me almost 30MH/s so approx the same speed as the Fury X's are. And this is before trying the above tweaks! I'm guessing this program was optimized for Hawaii and the Fury has a bunch of untapped potential left given how many more shaders and memory bandwidth it has at its disposal.

I'm still puzzled how you're getting 32MH/s though. Would you mind sharing your command line arguments exactly?

FYI my 2GB 5870 fails to start due to "insufficient memory". Anyone else here with a 2GB card get theirs to work? This card used to kill it at BTC mining, so it would be a shame not to use it.

You are going to want a local work size of 64, 128, 192, or 256 for AMD. 64 is one wavefront, the smallest unit of work the hardware schedules onto a CU. A local size of 32 fills only half a wavefront, so the other half of the lanes sit idle. Nvidia should use 32 as a minimum because that is the size of their warps. See the link below.

The fundamental unit of work on AMD GPUs is called a wavefront. Each wavefront consists of 64 work-items; thus, the optimal local work size is an integer multiple of 64 (specifically 64, 128, 192, or 256) work-items per work-group.

http://developer.amd.com/tools-and-...ncl-optimization-guide/#50401334_pgfId-458820

After more experimentation I'm using
--cl-global-work 16384
--cl-local-work 256

The OpenCL Optimization Guide says to use the largest global possible. I don't think you want to get too crazy, but 4096 is small. The global size is the total size of the problem array. If you want to find blocks you want to have more data to work with. Local work you have to experiment with. I believe it's better to stack work up so that your CUs are always being utilized. 256 would be 4 wavefronts of work. I think this ties in with what Zagitta said about memory bandwidth.

http://haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf
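To make the wavefront arithmetic above concrete, here is a small sketch. It treats the two numbers as plain OpenCL global and local sizes in work-items, which is an assumption: the thread does not say exactly how ethminer maps its switches onto the kernel launch. The function name is made up for illustration.

```python
import math

WAVEFRONT_SIZE = 64  # AMD GCN wavefront; Nvidia warps are 32 wide


def describe(global_size, local_size, simd_width=WAVEFRONT_SIZE):
    """Return (work_groups, wavefronts_per_group, lane_utilization).

    lane_utilization is the fraction of SIMD lanes doing useful work,
    e.g. 0.5 when a 32-item group occupies one 64-wide wavefront.
    """
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size.
        raise ValueError("global size must be a multiple of local size")
    work_groups = global_size // local_size
    wavefronts_per_group = math.ceil(local_size / simd_width)
    lane_utilization = local_size / (wavefronts_per_group * simd_width)
    return work_groups, wavefronts_per_group, lane_utilization


if __name__ == "__main__":
    for local in (32, 64, 256):
        print(local, describe(16384, local))
```

With a 64-wide wavefront, a local size of 256 gives 4 full wavefronts per group, while 32 leaves half the lanes of its single wavefront unused. Whether that shows up as lost hashrate depends on the kernel, which may be why results in this thread differ between cards.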
 

metalliax

Member
Jan 20, 2014
119
2
81
You are going to want a local work size of 64, 128, 192, or 256 for AMD. 64 is one wavefront, the smallest unit of work the hardware schedules onto a CU. A local size of 32 fills only half a wavefront, so the other half of the lanes sit idle. Nvidia should use 32 as a minimum because that is the size of their warps. See the link below.



http://developer.amd.com/tools-and-...ncl-optimization-guide/#50401334_pgfId-458820

After more experimentation I'm using
--cl-global-work 16384
--cl-local-work 256

The OpenCL Optimization Guide says to use the largest global possible. I don't think you want to get too crazy, but 4096 is small. The global size is the total size of the problem array. If you want to find blocks you want to have more data to work with. Local work you have to experiment with. I believe it's better to stack work up so that your CUs are always being utilized. 256 would be 4 wavefronts of work. I think this ties in with what Zagitta said about memory bandwidth.

http://haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf

Thanks for this. I am now getting ~24 Mh/sec on my 290x, running at ~900Mhz and undervolted by 50mV. System power draw is ~215W, so I'll assume wall-power for 290x is close to 170W.
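Those figures also give a rough efficiency number. The sketch below uses only the values quoted above (24 MH/s, about 215 W at the wall for the system, about 170 W assumed for the card); the function names are made up, and the 170 W split is the poster's estimate, not a measurement.

```python
def rest_of_system_watts(system_watts, card_watts):
    """Power left for CPU, board, drives, and PSU losses under the assumed split."""
    return system_watts - card_watts


def mh_per_watt(hashrate_mhs, watts):
    """Hashrate per watt for whichever power figure is passed in."""
    return hashrate_mhs / watts


if __name__ == "__main__":
    system_w, card_w, rate = 215, 170, 24
    print(f"rest of system: {rest_of_system_watts(system_w, card_w)} W")
    print(f"card only:      {mh_per_watt(rate, card_w):.3f} MH/s per W")
    print(f"whole system:   {mh_per_watt(rate, system_w):.3f} MH/s per W")
```

Comparing the whole-system figure is usually the fairer number when deciding whether a rig pays for its electricity.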
 

Despoiler

Golden Member
Nov 10, 2007
1,967
772
136
Thanks for this. I am now getting ~24 Mh/sec on my 290x, running at ~900Mhz and undervolted by 50mV. System power draw is ~215W, so I'll assume wall-power for 290x is close to 170W.

With the defaults I was getting 26.4 Mh/s on my unlocked Fury. With my tweaks I am @ 35.8 and gaining for the last 6 hours.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
You are going to want a local work size of 64, 128, 192, or 256 for AMD. 64 is one wavefront, the smallest unit of work the hardware schedules onto a CU. A local size of 32 fills only half a wavefront, so the other half of the lanes sit idle. Nvidia should use 32 as a minimum because that is the size of their warps. See the link below.



http://developer.amd.com/tools-and-...ncl-optimization-guide/#50401334_pgfId-458820

After more experimentation I'm using
--cl-global-work 16384
--cl-local-work 256

The OpenCL Optimization Guide says to use the largest global possible. I don't think you want to get too crazy, but 4096 is small. The global size is the total size of the problem array. If you want to find blocks you want to have more data to work with. Local work you have to experiment with. I believe it's better to stack work up so that your CUs are always being utilized. 256 would be 4 wavefronts of work. I think this ties in with what Zagitta said about memory bandwidth.

http://haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf

Strange, if I set anything higher than 32 my MH/s go down! Hmmm..

Perhaps it's because I'm running two Fury X's on one system?

I tried enabling/disabling crossfire and Freesync. Both settings made no difference in speeds.

Are you overclocking your unlocked Fury? What OS are you running? I'm still a good 4-5Mh/s slower than you.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
With the defaults I was getting 26.4 Mh/s on my unlocked Fury. With my tweaks I am @ 35.8 and gaining for the last 6 hours.

How are you gaining over time? Are you looking locally or at the pool speeds? If your pool is PPLNS based you can't judge the speed of your cards from it.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
Thanks for this. I am now getting ~24 Mh/sec on my 290x, running at ~900Mhz and undervolted by 50mV. System power draw is ~215W, so I'll assume wall-power for 290x is close to 170W.

Hawaii is very power efficient when undervolted, especially the reference cards with Volterra VRM's. Good stuff!
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Thanks. What Ethereum app is working with 2GB Pitcairns and Tahitis?
I run Windows so I've been using the miner recommended by coinotron for Ethereum:

https://coinotron.com/app?action=help

I still have a bunch of Win 8 systems that were first set up for Litecoin and was happy to find that they could still mine Ethereum with Catalyst 14.4, which is the last release for Win 8. Other systems running Win 7 are using the Crimson drivers.
 