Oh great, more issues.


ch33zw1z

Lifer
Nov 4, 2004
38,096
18,569
146
Are you still running the OC? Maybe run at stock speeds...

What PSU do you have?

Yea, maybe work on that update issue

RAM brand and spec? turn off the box, pull a stick and look

or open a terminal and run: sudo lshw -html > name_of_file.html

depending on the system, this command may or may not return good results for RAM brand and specs, worth a shot.

I dunno if you need to spend a ton of money, I would at least start with running at stock speeds and installing the OS on something other than the Vertex.
 

lxskllr

No Lifer
Nov 30, 2004
57,867
8,119
126
I'm not familiar with Kubuntu, but if they have some kind of finder utility, you can type update and it should bring up Ubuntu's software manager. Alternatively, in the terminal
Code:
sudo apt-get update && sudo apt-get upgrade
will update the repos, and upgrade your packages. I like using Synaptic for package management. You can install that through apt-get if you want to try it.

Your system is pretty new. Have you looked for busted caps, or some other obvious signs of damage? Is BIOS up to date?
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Are you still running the OC? Maybe run at stock speeds...

What PSU do you have?

Yea, maybe work on that update issue

RAM brand and spec? turn off the box, pull a stick and look

or open a terminal and run: sudo lshw -html > name_of_file.html

depending on the system, this command may or may not return good results for RAM brand and specs, worth a shot.

I dunno if you need to spend a ton of money, I would at least start with running at stock speeds and installing the OS on something other than the Vertex.


Nope no OC, with all the issues I would not even want to bother with that.

The board is a X79-UD3 made by Gigabyte.

PSU is an OCZ 1000w.

Just found the update program, doing an update now. I guess should I try a bios update too? That always makes me nervous though because if something goes wrong it could brick the whole motherboard, but I suppose it's worth a shot considering I might end up replacing it anyway. I'll just run a long extension cord to the big UPS in the server room, just in case the power goes out. (my current UPS is only good for like 10-15 minutes)
 

lxskllr

No Lifer
Nov 30, 2004
57,867
8,119
126
I guess should I try a bios update too? That always makes me nervous though because if something goes wrong it could brick the whole motherboard, but I suppose it's worth a shot considering I might end up replacing it anyway. I'll just run a long extension cord to the big UPS in the server room, just in case the power goes out. (my current UPS is only good for like 10-15 minutes)

Hold tight on the system update. Maybe that'll fix your problems, though I doubt it. Read over the BIOS releases, and see if they might relate to your issues. I agree regarding BIOS updates in general, but having problems is a good reason to try.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
But that also means new ram, unless I can get lucky enough that the ram I have is on the HCL for the new motherboard, but the odds are quite slim especially with my luck.

RAM doesn't have to be on an HCL to work properly. Just get standard RAM that matches the specs of the motherboard. Crucial, Kingston, and pretty much any other memory reseller will have a memory chooser tool that will tell you exactly what memory goes with your board. I've never had any memory that I've specced this way not work.

Is there some kind of debug mode in Linux that would give me more details on what causes the freezes?

https://wiki.ubuntu.com/Kernel/Debugging

Linux also has the ability to use SysRq key combinations to interact with a system that is otherwise frozen. You can find more detail here:

http://en.wikipedia.org/wiki/Magic_SysRq_key

(On a related note, if the magic SysRq key works, then your problem is unlikely to involve the CPU, RAM, the core components of the mobo, or the PSU.)
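Before relying on SysRq during a freeze, it helps to confirm the kernel has it switched on. A minimal sketch, assuming the standard /proc/sys/kernel/sysrq interface; the function name sysrq_status is made up for illustration:

```bash
# sysrq_status [FILE]
# Reports whether the magic SysRq key is enabled, based on the kernel's
# sysrq setting (read from /proc/sys/kernel/sysrq unless FILE is given).
sysrq_status() {
    local value
    value=$(cat "${1:-/proc/sys/kernel/sysrq}") || return 1
    case $value in
        0) echo "SysRq is disabled" ;;
        1) echo "SysRq is fully enabled" ;;
        *) echo "SysRq is partially enabled (bitmask $value)" ;;
    esac
}

# To enable every SysRq function until the next reboot (needs root):
#   echo 1 | sudo tee /proc/sys/kernel/sysrq
```

Run sysrq_status with no arguments on the affected box. Many distros ship a bitmask that allows only some of the key combinations, so a "partially enabled" result is normal.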

What about CPU, can those go bad?

CPUs can go bad, but they are loaded with error-checking circuitry that will generate Machine Check Exceptions if the processor isn't working properly. Linux will typically log these, or at least write them to the console.
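One way to check for logged Machine Check Exceptions is to search the kernel messages. This is only a sketch: the search patterns are my guess at typical kernel wording, and find_mce is a made-up helper name, so an empty result does not prove the CPU is healthy.

```bash
# find_mce [FILE]
# Prints kernel log lines that mention machine check events.
# With no FILE, reads the kernel ring buffer via dmesg (may need root).
find_mce() {
    # Assumed patterns; adjust to whatever your kernel actually prints.
    local pattern='machine check|mce:|hardware error'
    if [ -n "${1:-}" ]; then
        grep -iE "$pattern" "$1"
    else
        dmesg | grep -iE "$pattern"
    fi
}
```

For example, find_mce /var/log/kern.log searches the saved kernel log on Ubuntu-style systems, which survives a reboot, while the ring buffer read by dmesg does not.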

At least it will be ECC ram and server grade so it should be more reliable than consumer crap.

ECC only corrects single-bit errors. Crashes caused by memory corruption are usually caused by chip failures, which basic ECC is not going to prevent (although it may be able to detect it as a RAM failure). More advanced ECC techniques like HP's Advanced ECC or IBM's ChipKill can sustain entire chip failures, but systems that can use them are not priced for purchase by mere mortals.

<logging shit>

Your log has a huge gap in the middle of it. That can be a hard lock, but this can also happen if your system disk is failing and syslog isn't configured to write anywhere else.
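Gaps like that are easy to miss by eye, so here is a rough sketch that flags them. It assumes classic "Mon DD HH:MM:SS" syslog timestamps and GNU date; find_log_gaps is a made-up name, and calling date once per line makes it slow on big logs.

```bash
# find_log_gaps FILE [MIN_GAP_SECONDS]
# Prints each pair of consecutive log lines whose timestamps are more than
# MIN_GAP_SECONDS apart (default 3600). The timestamps carry no year, so a
# log that spans New Year is not handled.
find_log_gaps() {
    local file=$1 min_gap=${2:-3600}
    local prev_epoch="" prev_stamp="" mon day clock rest stamp epoch
    while read -r mon day clock rest; do
        # Skip blank or short lines that cannot hold a timestamp.
        [ -n "$clock" ] || continue
        stamp="$mon $day $clock"
        # Skip lines whose first three fields are not a parseable date.
        epoch=$(date -u -d "$stamp" +%s 2>/dev/null) || continue
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$min_gap" ]; then
            echo "gap of $((epoch - prev_epoch))s between '$prev_stamp' and '$stamp'"
        fi
        prev_epoch=$epoch
        prev_stamp=$stamp
    done < "$file"
}
```

Something like find_log_gaps /var/log/syslog 3600 lists every stretch of more than an hour with no log entries. A quiet idle machine can produce harmless gaps, so treat the output as a list of times to look at, not proof of a lockup.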

ram: 12GB of ram (I forget exact brand/specs, is there a way to check?)

The 'dmidecode' command will tell you what type of RAM you have installed, among many other things.
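The full dmidecode output is long, so a small filter helps pick out the per-stick details. A sketch, assuming the usual field labels in the memory section; summarize_dimms is a made-up name, and some boards leave the manufacturer or part number blank.

```bash
# summarize_dimms: reads `dmidecode --type memory` output on stdin and keeps
# only the per-module fields that identify the RAM.
summarize_dimms() {
    grep -E '^[[:space:]]*(Locator|Size|Type|Speed|Manufacturer|Part Number):' |
        sed 's/^[[:space:]]*//'
}

# Typical use (dmidecode needs root to read the DMI tables):
#   sudo dmidecode --type memory | summarize_dimms
```

The part number is usually the most useful field, since it can be looked up on the memory vendor's site to get the exact specs.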

I guess should I try a bios update too? That always makes me nervous though because if something goes wrong it could brick the whole motherboard, but I suppose it's worth a shot considering I might end up replacing it anyway.

Any modern BIOS worth a shit will have some type of recovery mechanism in place in case you screw up your BIOS update. That being said, I've never once screwed up a BIOS update, and I've probably done several hundred of them.

I just want a computer that works period.

Maybe you should just get an iPad

Seriously, I'm leaning more toward a storage issue. Storage problems normally don't cause complete hard locks, but if you're running X, it can effectively lock the machine if X stops working and you can't switch to a text console.

I'd advise sending at least kernel logging messages to a remote syslog server, and either disabling your swap file or moving it somewhere else. While these steps won't necessarily prevent malfunctions, they can at least make them easier to identify.

You can also run drive tests using the 'smartctl' command, although I'm not sure how well that utility works with SSDs.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Actually, setting up an external syslog server is something I've always wanted to look into. I'll have to read up on that. Now, if I set everything up to point to this syslog server and there is a network issue, do machines normally revert to the local syslog file? I recall seeing an option in pfSense as well, so I may as well redirect that too.

When I get the chance (probably after Christmas) I will go ahead and see if there's a BIOS update, and also swap SSDs. I have Windows on the other SSD, which is a Crucial, while Linux is on the OCZ, which is known to have issues. That's something I overlooked until you mentioned storage. Though I have never seen a storage issue cause total lockups before. I have unplugged the OS drive by accident before and the system continues to respond. Obviously it would start to get IO errors and crash, but it would not really lock up. Though, that was in Windows, maybe Linux is different. I don't want to actually test it though, probably bad for corruption. At least these things won't cost money, so I suppose I can start with that.

Done the drive tests before and all checked out ok. I will try it again though. Same with memtest I will try to do one every night before I go to bed, and next night shifts I'll try to run it for the entire set of nights, which is 4 days usually.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
Actually, setting up an external syslog server is something I've always wanted to look into. I'll have to read up on that. Now, if I set everything up to point to this syslog server and there is a network issue, do machines normally revert to the local syslog file? I recall seeing an option in pfSense as well, so I may as well redirect that too.

You can have syslog write to the local file system and a remote syslog server simultaneously.
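As a sketch of how that looks in practice, assuming the box runs rsyslog (the imklog line in the posted log suggests it does): a forwarding rule dropped into /etc/rsyslog.d/ is added on top of the existing local file rules, so both keep working. The helper name, the host logserver.example, and the file name 60-remote.conf are all placeholders.

```bash
# remote_syslog_rule HOST [PORT] [SELECTOR]
# Prints one rsyslog forwarding rule. A single "@" means UDP; "@@" would be TCP.
remote_syslog_rule() {
    local host="$1" port="${2:-514}" selector="${3:-kern.*}"
    printf '%s @%s:%s\n' "$selector" "$host" "$port"
}

# Install it as a drop-in so the distro's local file rules stay in effect
# (file name is hypothetical; needs root):
#   remote_syslog_rule logserver.example | sudo tee /etc/rsyslog.d/60-remote.conf
#   sudo service rsyslog restart
```

The default selector kern.* forwards only kernel messages; pass '*.*' as the third argument to forward everything. The receiving server also has to be configured to accept remote messages on that port, which is not covered here.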

Though I have never seen a storage issue cause total lockups before. I have unplugged the OS drive by accident before and the system continues to respond. Obviously it would start to get IO errors and crash, but it would not really lock up. Though, that was in Windows, maybe Linux is different.

Linux can lock up if you've got a memory page in swap and your swap file goes away. X can also lock up if your DE fails.

I don't want to actually test it though, probably bad for corruption.

Modern filesystems are journaled, and can recover themselves to a consistent state very quickly. If you're just sitting at an empty desktop, you should be okay. If you want to be safe, run the 'sync' command and then wait a few seconds before unplugging it.
 

ch33zw1z

Lifer
Nov 4, 2004
38,096
18,569
146
Nope no OC, with all the issues I would not even want to bother with that.

The board is a X79-UD3 made by Gigabyte.

PSU is an OCZ 1000w.

Just found the update program, doing an update now. I guess should I try a bios update too? That always makes me nervous though because if something goes wrong it could brick the whole motherboard, but I suppose it's worth a shot considering I might end up replacing it anyway. I'll just run a long extension cord to the big UPS in the server room, just in case the power goes out. (my current UPS is only good for like 10-15 minutes)

One thing at a time. Do the OS updates and see how it goes. BIOS updates for home boards can be a little scary, but do the homework and you'll be fine. I strongly recommend you use the DOS updater or Q-Flash (Gigabyte's embedded utility). Do not use the Windows utility. I know, I know...you are running Linux, but I still gotta say this

http://www.gigabyte.com/webpage/20/HowToReflashBIOS.html

latest BIOS is from September 2013: http://www.gigabyte.com/products/product-page.aspx?pid=4050#bios

On the bright side, if you brick it....RMA it to Gigabyte. Cmon man, if I can flash a BIOS...you can

Yes, getting the box onto a UPS is a great idea during a BIOS update. Really, it's a great idea in general. Any machine you rely on should be adequately powered with a UPS that performs line filtering to some degree.
 

slashbinslashbash

Golden Member
Feb 29, 2004
1,945
8
81
Linux can lock up if you've got a memory page in swap and your swap file goes away. X can also lock up if your DE fails.

IMO swap on an SSD is a recipe for killing your drive; and when your drive starts to die, your PC will start crashing. Get a ton of RAM and disable swap entirely. Or get a separate cheap (32GB, 64GB) SSD to put your swap and /tmp on.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
I've always run without swap but some people swear that it can cause issues to have no swap, so I ended up adding one at some point so I can rule that out as being the cause of all my issues. But yeah I don't really like the idea of having swap on an SSD. Though Linux is smarter than Windows when it comes to swap, it does not use it if it does not need it. Mine is sitting at 0 used right now when I do top.
 

smakme7757

Golden Member
Nov 20, 2010
1,487
1
81
I've always run without swap but some people swear that it can cause issues to have no swap, so I ended up adding one at some point so I can rule that out as being the cause of all my issues. But yeah I don't really like the idea of having swap on an SSD. Though Linux is smarter than Windows when it comes to swap, it does not use it if it does not need it. Mine is sitting at 0 used right now when I do top.

It's almost always going to sit at 0% used. However, it's a smart idea to have a little swap for the times when your machine hits an error in memory and uses swap to compensate. You won't know it's happened unless you monitor your swap usage from millisecond to millisecond.
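For anyone who wants to watch swap usage over time instead of glancing at top, here is a minimal sketch that reads the SwapTotal and SwapFree fields from /proc/meminfo. The function name swap_used_kb is made up, and sampling once a minute is an arbitrary choice that will miss short spikes.

```bash
# swap_used_kb [FILE]
# Prints swap currently in use, in kB, from /proc/meminfo (or FILE).
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

# Log a timestamped sample every 60 seconds (Ctrl+C to stop):
#   while sleep 60; do echo "$(date '+%F %T') $(swap_used_kb) kB"; done
```

Redirecting that loop to a file on a different disk gives a record of swap use leading up to a freeze.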

I have at least 2GB swap on all my machines as a minimum. The performance benefit of no swap and the impact on modern SSDs of using swap are so small that they shouldn't even be a concern any more. Let's face it, if you are swapping to disk so often that it would actually shave off a single month of life on your SSD then you don't have enough physical memory.

It's worth mentioning that when Linux starts hitting a memory limit it will terminate processes to stay alive.
 

powerhouse65

Junior Member
Dec 29, 2013
24
0
0
Sounds weird. I had similar unexplainable symptoms with bad memory. To make sure that it's not memory, you need to run the memtest for a long time (72 hours). Still, memtest can miss things.
Install prime95 and memtester and run both. Make sure your CPU has good cooling as prime95 can really smoke it. While memtester (unlike memtest86+) cannot test all memory areas, you are able to stress test your machine this way.

If it's not the memory, my next guess would be the PSU. Even if your PSU seems to work OK and shows good voltages, have it checked by a lab using an oscilloscope!!! Forget the normal multi-tester checks, they are useless when it comes to checking a PSU.

Some of the better labs have hardware based stress test equipment. They can insert a PCIe card and have the entire machine checked with various loads and see what's wrong. You'd pay some money for these tests, but perhaps less than replacing expensive hardware when it turns out to be the wrong guess.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Those labs sound interesting... but by the time I ship out my PC over the border (I doubt Canada has any of these) and back it would cost more than the system itself by the time I pay shipping and customs both ways. Not to mention being stuck without a PC for a month or two. I wonder if I can rent an oscilloscope though... always wanted to play with one anyway, would be a fun project. I doubt that's something easy to get my hands on though.

I just swapped the two hard drives (I put windows on the hard drive that had Linux and Linux on the drive that had windows) so that gives me a different hard drive to run on. I got a memtest going on, when I get home from my night shift it will be 12 hours, I'll let it go and go to bed, and check after. I'll probably let it go for a couple days.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
24h and so far so good. I'll let it go. I have two night shifts left so may as well let it go throughout all of that.

I also had a 1TB HDD in that system that I unhooked, it had Linux Mint on it from a previous troubleshooting test where I ran it for a bit.

So right now I'm down to:
Linux installed on Crucial M4 SSD
Windows installed on OCZ vertex
12GB of ram - testing
1 video card (ATI)

Switching from my previous nvidia card to ATI did fix like 99% of the issues though. So here's hoping the Vertex was maybe part of other issues and now I'll be good. I don't boot into Windows often enough so even if I just moved the issues to Windows chances are I'll never see them.

I still have the bios update to try, I will wait it out and try that at a later time.

Any tests I can do at home for the PSU without needing expensive tools? Would an arduino board be able to sample fast enough to detect any voltage fluctuations? Maybe I can just make something where if it detects a fluctuation outside normal range it lights up a LED or something like that.
 

lakedude

Platinum Member
Mar 14, 2009
2,677
473
126
Sometimes just swapping parts around will fix problems that are caused by poor connections. Over time parts get loose and corrode.

Just recently my friend's weird problems were solved by reseating the CPU.

My old XP system will run for about 6 months and then it will act up. Take everything out, put it all back in and it works fine for another 6 months.

Got 2 hard drives swapped in a system at work because if I swap em back one system will not work properly. With em swapped both systems work fine?? Beats me, must be some sort of bad connection.

Cleaning the parts can help as well...
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Ram test checked out ok after 60 something hours. With the hard drives being swapped out I will wait it out and see if it does it again.
 

powerhouse65

Junior Member
Dec 29, 2013
24
0
0
Yeah, lakedude is right. Sometimes it's just taking it apart and reassembling it and everything is fine.
You got a pretty strong system so it doesn't make sense to switch it for a new one. I found that these quad channel RAMs for X79 boards can be tricky, that's why I suggested the RAM test. 60 hours and no problem should suffice.

One more thing: You could try running Windows for a while, give it something to do (prime95? - but don't blow your CPU), and see if Windows is stable. As a rule of thumb, if Windows is running fine, then Linux should run even better.

When I had the weird memory issues with Linux, I never thought of RAM (=hardware) issues. Only when I tried to install Windows and it quit on me right away did I start to suspect the hardware.

Also, reset your BIOS to default. Then check and enable XMP for your RAM, if it's supported. Perhaps you had changed some BIOS settings?
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Here we go again... guess it was not the hard drive. Guess the next step is trying a bios update.

Though I'm starting to wonder, could it be some kind of cron job that's causing this, here's where it froze:

Jan 18 06:43:20 falcon dhclient: DHCPREQUEST of 10.1.2.10 on eth0 to 10.1.1.1 port 67 (xid=0x65f64d3b)
Jan 18 06:43:20 falcon dhclient: DHCPACK of 10.1.2.10 from 10.1.1.1
Jan 18 06:43:20 falcon dhclient: bound to 10.1.2.10 -- renewal in 3192 seconds.
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> (eth0): DHCPv4 state changed renew -> renew
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> address 10.1.2.10
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> prefix 16 (255.255.0.0)
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> gateway 10.1.1.1
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> hostname 'falcon'
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> nameserver '10.1.1.10'
Jan 18 06:43:20 falcon NetworkManager[1397]: <info> domain name 'firewall.loc'
Jan 18 06:43:20 falcon dbus[1072]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Jan 18 06:43:20 falcon dbus[1072]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jan 18 07:05:01 falcon CRON[21993]: (root) CMD (ntpdate borg.loc)
Jan 18 07:05:07 falcon CRON[21992]: (CRON) info (No MTA installed, discarding output)
Jan 18 07:17:01 falcon CRON[22667]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 18 07:30:01 falcon CRON[23432]: (root) CMD (start -q anacron ||
Jan 18 07:30:01 falcon anacron[23435]: Anacron 2.3 started on 2014-01-18
Jan 18 07:30:01 falcon anacron[23435]: Will run job `cron.daily' in 5 min.
Jan 18 07:30:01 falcon anacron[23435]: Jobs will be executed sequentially
Jan 18 12:38:29 falcon kernel: imklog 5.8.11, log source = /proc/kmsg started.


Would be nice to know what these "jobs" are, because I don't have any set, this is not a server. Can I just disable crond completely to rule this out or is there lower level system jobs that are needed?
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
This just gets worse. Now my NFS mounts randomly stopped working. It's saying access denied for everything. What the hell? I don't see any NFS logs anywhere either, so there's not much to go by. Nothing changed as far as permissions go.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
All of the stuff in your latest log snippet looks pretty routine. It's unlikely that a cron job is crashing your PC --- you'd have to be doing something incredibly stupid, and even if you were, the problem would be happening on a well-defined schedule. Don't disable cron, as your distro likely uses it for routine system management stuff.
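To see what the cron.hourly and cron.daily entries in the log would actually run, the scripts can be listed directly. A sketch, assuming the usual Debian/Ubuntu directory layout; list_cron_scripts is a made-up name, and run-parts applies some extra file name rules that this does not replicate.

```bash
# list_cron_scripts [DIR...]
# Prints the executable files in each cron directory, which is roughly the
# set of scripts run-parts would execute. Defaults to the usual locations.
list_cron_scripts() {
    local dir f
    [ "$#" -gt 0 ] || set -- /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] && [ -x "$f" ] && echo "$f"
        done
    done
    return 0
}

# Per-user and system crontabs live elsewhere:
#   crontab -l; sudo crontab -l; cat /etc/crontab; ls /etc/cron.d
```

On a desktop install these are typically housekeeping scripts put there by packages, which is why the jobs show up even though none were set by hand.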

Without anything further to go off of, it's looking more and more like you might have a hardware fault of some sort. Unfortunately, commodity equipment rarely provides any sort of facility to log or otherwise detect these faults.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
I've been thinking of replacing this system with an Intel NUC, I might just go ahead and do that, and then use this system strictly for Windows. Seems like it's my best bet. It works fine when it's being taxed, like gaming etc. All the issues are always when it's idle. So if I never let it idle and I only turn it on to game I'll probably be fine.

Looks like my NFS issues might be the server because I can't seem to mount from my other machine either. What a pain in the ass, I can't seem to get a single system in this house to work properly.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
If your stability problems are strictly happening when you're idle, you may be having a problem with some type of power management functionality. Try shutting off the power management functionality in the BIOS (SpeedStep and C-states on Intel, disk poweroff, etc.), and then disable standby/hibernate in Linux. If the problem goes away, you can at least narrow down your troubleshooting.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
Default bios settings, and disabled power stuff. It's been happening in Windows too now, and right in the middle of gaming. Did it twice this week, maybe 10 hours or so of UT3, if that. Going to look at getting a new motherboard. Started a thread in Motherboards to see if I can get a suggestion, so I don't end up picking up another dud board (if it even is the board). Failing that, I'll have to try another CPU. At that point, I'll have tried everything. Failing that, I'll just get a Dell. Failing that, time to get a Mac. lol.
 

Scarpozzi

Lifer
Jun 13, 2000
26,389
1,778
126
I had a 2 year old system build go crazy 12 years ago. When it comes to mobo, processor, RAM, and power supply issues, they're not easy to diagnose...especially on Windows 2000.

When I compared the time and cost of troubleshooting an intermittent kernel crash, it made more sense to buy a small business class Dell server at the time.

I agree with going the BIOS and firmware route. Look at all devices in your system for firmware updates....including hard drive and video drivers. I've had Linux servers crash from processor/mobo firmware issues.
 

Red Squirrel

No Lifer
May 24, 2003
68,260
12,506
126
www.anyf.ca
And here we go with cursor related issues now that I moved back to dual monitors.... what a freaking pain in the ass. I'm rebooting this Linux system more often than a Windows 98 machine.

I still have to get another motherboard though, but this cursor thing only seems to happen with multi monitors as it's similar to issues I was getting before when I had attempted multi monitors, so I think no matter what I'll have to go back to one, and if that's really the case I might have to just say fuck it and go back to Windows. I simply cannot code or do anything productive with just one monitor. Just not doable. At least not until they start to make higher resolutions.

I had a setup using synergy that was ok, but some stuff was choppy like hovering over any web based stuff that had effects and it requires having another machine running, so that means more noise, heat, and power usage.
 