Spontaneous Reboot/Shutdown

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
I've been having a problem with a machine, where I'll come in to work and it will be shut off (it's a personal machine they let me keep on their network). My office is locked and only the VP and I have keys.

I checked the UPS, and disabled shutdown on no CPU fan or CPU overheating in the BIOS. No luck... still does it. I've unplugged everything in the box but the CPU fan and the drives. Still does it. Unhooked it from the KVM so that its only connections are power and ethernet. Still does it. Checked all the memory (it's fine) and ran a torture test (passed). I even put the power supply on a tester (hardware tester, not software) and everything came out good and stable. I'm at a dead end.

Sometimes it takes four or five tries to get it to reboot all the way... I've even had it reboot while in the BIOS, so something must be wrong with some part of the hardware, though I can't seem to identify anything.

A couple days ago I happened to be in an SSH terminal to the machine and suddenly up popped a message saying that system shutdown had been initiated by root and that everyone should get out. The machine proceeded to shut down. Hmmmm... I'd love to look at the logs that might have more info on who "root" was or what the command was that made it power down. I've sniffed around /var/log but can't seem to find anything of significance.

I'm sure it's hardware, but I thought that maybe the logs could at least give a clue. Any idea WHAT logs I might be able to look at?

Thanks,

Joe
 

cleverhandle

Diamond Member
Dec 17, 2001
3,566
3
81
/var/log/messages or /var/log/syslog are catch-all logs that will usually contain kernel messages. If it exists, /var/log/kern would have much the same thing. I don't think you're going to see much in the logs that's helpful, though - it sounds like the hardware is going belly-up and bringing down the system with it. You might get a one-liner in a log indicating some kind of error, but you might not even get that.
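If you want to skim those logs quickly, here is a rough Python sketch that pulls out shutdown-related lines plus the couple of lines just before each one. The keyword list is my own guess, not something taken from your logs, so adjust it to whatever your syslog actually prints.

```python
import re
import sys

# Assumed keywords (a guess, not from the thread); tune them for your syslog.
PATTERN = re.compile(
    r"shutdown|shutting down|halt|reboot|power button|acpi", re.IGNORECASE
)


def find_shutdown_lines(lines, before=2):
    """Return (line_number, text) for matching lines and `before` lines of context."""
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if PATTERN.search(line):
            # Keep the match itself and the lines leading up to it.
            keep.update(range(max(0, i - before), i + 1))
    return [(i + 1, lines[i].rstrip("\n")) for i in sorted(keep)]


# Only scans when log paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_shutdown_lines(handle):
                print(f"{path}:{number}: {text}")
```

Saved under a name of your choosing and run as root with /var/log/messages as the argument, it prints each hit with its line number. Comparing the timestamps of the hits shows whether the shutdowns follow any pattern.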
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
Thanks for the response. I have only /var/log/messages; none of the others you mentioned exist on my box. Just as you said, I don't get much info. Most entries only look like this:


Which really tells me nothing.

I'm really scratching my head. What sort of hardware crash causes an OS to shut down gracefully? Bizarre!

Joe
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
If it logs that it's shutting down, it's not crashing; you need to find out what's shutting it down. Is it always at 22:18?
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
Nope... random times. And as I said, sometimes it will just spontaneously reboot even while I'm in the BIOS. It can go days being fine and then I'll come in and find it off. Often when I try to restart it, it takes a few tries to get it to not go through a cycle of rebooting (before even getting to GRUB). There's a ghost in the damned thing!

Joe
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
It sure sounds like a hardware problem but just out of curiosity I'd probably try putting Windows on it and see if it lasts more than a few days.
 

nweaver

Diamond Member
Jan 21, 2001
6,813
1
0
I've seen spontaneous reboots from bad caps on a motherboard, but never a graceful shutdown like that...
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
I left an ssh session running on my Windows desktop. Below is what I came into this morning. One possibility that I hadn't thought of until now is that there are two things happening. I have my doubts, but I should check anyway. It occurred to me that I could have some hardware instability, but ALSO have some rootkit or something screwing me up. I'll scrutinize the machine for a rootkit just to be sure that isn't what's gracefully shutting it down.

If it is hardware... I know that pressing the power button once will cause the system to gracefully shut down (just like Windows), so I'm wondering also about the possibility of something on the motherboard having gone whacko, for instance the signals that tell the computer to shut down in case of no CPU fan or CPU overheating.

Does anyone know of a good way I could monitor (and log) CPU temp and fan speeds from inside Linux? It would be interesting to see if there's any correlation.

Joe
who wishes this BBS software would allow code to be inserted in a post and not just appended to the end of it!
 

TonyRic

Golden Member
Nov 4, 1999
1,972
0
71
Possibly a bad power supply. A change in the +12V rail that is not enough to cause a crash could cause a shutdown signal to be sent to the OS. OR a bad keyboard chip on the motherboard which is intermittently sending what is interpreted as a Ctrl-Alt-Del (C-A-D).
 

skyking

Lifer
Nov 21, 2001
22,386
5,360
146
This is one of the more frustrating faults, and tossing hardware at it is the only quick solution. I would try a different power supply, and if that doesn't help, conclude the mobo is probably at fault. You can test the memory and hard drive easily, but it's hard to diagnose a bad motherboard.
My desktop is doing that now; it will be off or rebooted every now and then.
 

Cr0nJ0b

Golden Member
Apr 13, 2004
1,141
29
91
meettomy.site
I think that you said it was rebooting in BIOS...right? If that's the case, it's not your OS it's your HW. I would look at power for sure...then heat...then memory...

When odd things like this start happening to me I usually take a deep breath and start taking apart the system. I take out all the cards, memory and everything. I usually mix around the cards and slots in case it's an IRQ thing and I talk very softly to the computer. That's the important part. Most people forget that a computer is a living, feeling entity and it might just be starved for attention. So make sure that you are talking to it while you do your troubleshooting.

Once everything is back in place boot up and leave it in BIOS for a while...if it reboots from there...I would swap the PS first...keep the case open and put a fan on it, to make sure that heat isn't the issue.

Pat the case on top...they like that.

If you can't reproduce the reboot in BIOS, then boot the system up and load up some stress tools like Pi or some other benchmark or burn-in tools. Run them for a while and see if you can get it to reboot in a more consistent fashion. Again, if it's still random, then I would say PS is the issue, but check memory to be safe.

hope that helps.
 

Brazen

Diamond Member
Jul 14, 2000
4,259
0
0
Check out lm-sensors. I can't remember how to use it, but you can google for it. I'm pretty sure lm-sensors is the Linux program that monitors heat and such, if your motherboard has sensors for it.
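For the logging side, here is a rough Python sketch, assuming a kernel that exposes sensor readings under /sys/class/hwmon (older kernels keep them elsewhere in sysfs, and some drivers put the files one level deeper, so the path is a parameter). That interface reports temperatures in millidegrees Celsius and fans in RPM. The function names and the one-minute interval are my own choices.

```python
import os
import time
from pathlib import Path


def read_sensors(root="/sys/class/hwmon"):
    """Return {"hwmon0/temp1_input": 42000, ...} for readable temp/fan inputs."""
    readings = {}
    for path in sorted(Path(root).glob("hwmon*/*_input")):
        if not path.name.startswith(("temp", "fan")):
            continue  # skip voltages and anything else
        try:
            value = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor not wired up or not readable
        readings[f"{path.parent.name}/{path.name}"] = value
    return readings


def format_line(readings, now=None):
    """One timestamped log line; temps are millidegrees C, fans are RPM."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{key}={value}" for key, value in sorted(readings.items()))
    return f"{stamp} {fields}"


def log_forever(logfile, interval=60, root="/sys/class/hwmon"):
    """Append a reading every `interval` seconds until the machine goes down."""
    with open(logfile, "a") as log:
        while True:
            log.write(format_line(read_sensors(root)) + "\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the last line survives a power-off
            time.sleep(interval)
```

Calling log_forever with a log path of your choice and leaving it running means the last few lines before each shutdown can be checked against the shutdown time. The fsync is there so the final reading is on disk even if the box drops out hard.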
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
All you guys are great!

When I got in Thursday morning, she wouldn't stay up for more than a minute or so, so I had to wait until lunch so that I could dedicate some time to her. She was crashing right in the BIOS screen or halfway into Linux or just after the desktop... or wherever.

During lunch, I switched out video cards, and at first I thought that did the trick, but it started again after a few minutes. So... I halfway gutted her. Took out all the hard drives, disconnected all the electrical stuff, loosened (but didn't remove) all the screws holding the motherboard... and then shook the crap out of her. Didn't hear anything rattle (was hoping a screw was lodged somewhere and shorting). Slowly put everything back together EXCEPT for the hard drives and booted. Ended up at a "no system" prompt, but it stayed at that prompt for a good 15 minutes. So then, I booted from a Damn Small Linux live CD. Still good for about 15 minutes. I then put the drives back in and booted to a Damn Small Linux desktop and left her there for about a half hour or so. She didn't crash, so I rebooted into FC4.

She's been up stable for about two days now, but that's not a record or anything, so I'll wait another few days/a week before I will accept that she is fixed. I'm hoping that something, somewhere, was just shorting... though I could find no evidence. If she starts crashing again, even though I've bench tested the power supply, I'll try another one. If that fails, I'll replace the mobo and CPU and just upgrade to one of the fantastic S-939 deals that are out there.

I tried to stress her some, doing CPU stress tests and copying about 150GB between the three drives while doing other stuff, but everything stayed 100% up and stable. Perhaps there was a small ghost that became afraid of me.... who knows? Whatever the case, I thank all of you for your input and will let you know the outcome in a few days.

Joe
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
So far 6 days and no reboots. Something, somewhere must have been shorting in a most bizarre way!

I'll post now and then on uptime, but although we really don't know what it was, I think it is nonetheless solved.

Joe
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
Well, it lasted 8 days.

I've replaced the RAM and graphics card. Bench tested the power supply. Did a rootkit check. This is just WEIRD because it's shutting down cleanly. It's now booting back up cleanly too... no more crashing... but something is still telling it to power down. Been through the logs... nothing... it happens at random times. I guess I'll replace the power supply next, but I'm out of obvious solutions!

Joe

 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
Hmm. Maybe log ACPI somehow? Something is getting a signal to shut down. Or replace your init command and use a proxy program to log calls to it?

I think the power button is being pressed virtually. Try disabling ACPI in the BIOS and see if it still happens? Not ideal, but it'll rule out the power button thing.
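Here is a rough idea of what I mean by a proxy, as a Python sketch. It assumes the clean shutdown goes through the shutdown binary, that you have moved the real one aside, and that you install this script under the original name. Both paths below are made up for illustration. It logs who called it (parent PID and parent command line from /proc) and then hands off to the real binary.

```python
import os
import sys
import time

# Hypothetical paths: where you moved the real binary, and where to log.
REAL_BINARY = "/sbin/shutdown.real"
LOG_FILE = "/var/log/shutdown-callers.log"


def describe_process(pid, proc_root="/proc"):
    """Return the command line of `pid` as read from /proc, or "<unknown>"."""
    try:
        with open(f"{proc_root}/{pid}/cmdline", "rb") as handle:
            raw = handle.read()
    except OSError:
        return "<unknown>"
    # /proc separates arguments with NUL bytes.
    text = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return text or "<unknown>"


def build_record(argv, ppid, parent_cmd, now=None):
    """Format one log line describing this invocation and its caller."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} argv={argv!r} ppid={ppid} parent={parent_cmd!r}"


def main():
    ppid = os.getppid()
    record = build_record(sys.argv, ppid, describe_process(ppid))
    with open(LOG_FILE, "a") as log:
        log.write(record + "\n")
        log.flush()
        os.fsync(log.fileno())  # get it on disk before the system goes down
    # Replace this process with the real binary, passing the same arguments.
    os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])


# Does nothing unless the real binary has actually been moved into place.
if __name__ == "__main__" and os.path.exists(REAL_BINARY):
    main()
```

If the parent turns out to be the ACPI daemon, that points at a power-button event; if it's init, a Ctrl-Alt-Del or runlevel change is more likely. Either way you'd at least know which path the shutdown is coming in on.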
 