Linux box reboots on random days but at fixed time

TheOtherMudit

Member
Dec 13, 2001
158
0
0
Hi All,

I have a Linux box which has been running fine for the last 2 years or so.
Motherboard: ECS K7S5A
CPU: AMD Athlon XP2000+
HSF: Zalman fin style and fan
Memory: PC133 512mb SDRAM

For the last year or so (I think since I upgraded to kernel 2.4.20), my system has rebooted every 6-14 days at or around 7:30am on a weekday. There is nothing in cron, so it's not a scheduled reboot. Also, after the system comes up, the /var/log/messages file has a message about ext3-jfs fixing the file system.

Why weekdays only? I'm running some programs which do data collection (stocks, options, futures etc.) from 6:30AM to 1:30PM Pacific time.

Yesterday, I upgraded my kernel to 2.4.29 and rebooted in the evening. This morning, at 7:29:xx, my system rebooted!

Now it may happen again on Monday. I'm wondering what I can do to get a little more information before the system reboots. I'm going to hook up a monitor and watch it like a hawk for any messages.

Thanks

Mudit
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
Looked at the cron log file. Nothing in there.
Also, if cron were rebooting it, it should be a _clean_ reboot. When the system starts, it complains about the file system not being cleanly unmounted.

So, it's not a cron job.
 

foxkm

Senior member
Dec 11, 2002
229
0
0
Have you looked at your motherboard to see if it has bad caps? It's possible that it is rebooting under some sort of load. ECS is one of the brands that is notorious for bad caps on their motherboards. I have seen a lot of K7S5A boards with this problem.
 

imported_kseskisator

Junior Member
Mar 18, 2005
5
0
0
If it always happens on weekdays at 07:30, it can only be a cronjob.

Maybe you don't have a shutdown -r line in your crontab, but maybe something runs at that time that is hardware intensive and causes the machine to reboot. Go through everything that cron runs at that time (have you checked /etc/cron.daily, /etc/cron.weekly etc.?), try to run it yourself and see what happens.

See what processes run on the machine, and check if you see something fishy. Are you sure your machine has not been compromised? Try to avoid running things as root unless needed.

If you find a kernel bug that only occurs at a specific time on weekdays, boy, that'll be a first.
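One way to go through the cron entries systematically is to test each line against the reboot time. The following is only a rough sketch, not a tool anyone in this thread used: the function names are made up for illustration, it handles numeric fields only (no month or day names, no @daily shortcuts), and the sample entries are hypothetical.

```python
from datetime import datetime


def field_matches(field, value, low, high):
    """Return True if one numeric crontab field (e.g. '*/15', '1-5', '7,30') matches value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def entry_fires_at(line, when):
    """Return True if a crontab line would run at the given minute."""
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 6:
        return False
    minute, hour, dom, month, dow = fields[:5]
    cron_dow = when.isoweekday() % 7  # cron counts 0 = Sunday ... 6 = Saturday
    try:
        time_ok = (field_matches(minute, when.minute, 0, 59)
                   and field_matches(hour, when.hour, 0, 23)
                   and field_matches(month, when.month, 1, 12))
        dom_ok = field_matches(dom, when.day, 1, 31)
        # cron also accepts 7 for Sunday
        dow_ok = (field_matches(dow, cron_dow, 0, 7)
                  or (cron_dow == 0 and field_matches(dow, 7, 0, 7)))
    except ValueError:
        return False  # environment lines, names, @shortcuts: not handled here
    # When both day fields are restricted, cron runs the job if either matches.
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok


if __name__ == "__main__":
    reboot_time = datetime(2005, 3, 21, 7, 30)  # a Monday, 7:30am
    try:
        with open("/etc/crontab") as handle:
            for entry in handle:
                if entry_fires_at(entry, reboot_time):
                    print(entry.rstrip())
    except OSError:
        print("could not read /etc/crontab")
```

The same check would need to be repeated for per-user crontabs and anything under /etc/cron.d, and it says nothing about jobs started by other schedulers such as at.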
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
I've gone through all the processes running from 7:25 right up to the time of the crash. Everything looks normal! No heavy I/O at that time. The normal load starts at 6:30am when the market opens.
At 7:30am, it's about 1 hour into processing, but nothing heavy. sar shows that the CPU has been idle about 80% of the time.

On Monday, I'm going to hook up a monitor and watch it, to see if I get a kernel oops or something.

I suspect the motherboard too! But it has been working for so long! Oh well!!!

Thanks

Mudit
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
Originally posted by: kseskisator
If it always happens on weekdays at 07:30, it can only be a cronjob.

Unfortunately, it doesn't happen every weekday!
It just happens on a random weekday at 7:30am,
so maybe it's a stress problem in some hardware,
as my processing starts at 6:30am.
Until 6:30am, the CPU is 99.99% idle. After that, sar reports 80% idle.
I've changed the sar time resolution to one minute.
Let's see if I find something abnormal during the 7:28-7:30 window.


 

drag

Elite Member
Jul 4, 2002
8,708
0
0
That is so queer.

Still, check your crontab stuff. /etc/crontab will show what time you run jobs and such. On my system (Debian) it runs the scripts in the following directories:
/etc/cron.hourly at 17 minutes past every hour
/etc/cron.daily every day at 6:25 am
/etc/cron.weekly every Sunday at 6:47 am
and /etc/cron.monthly on the 1st of every month at 6:52 am

all in the morning.

Although you probably already knew that.

But I suppose you need to keep your mind open to possibilities.

Could be something stupid like:
1. Every day at 7:30 the cleaning lady plugs her vacuum cleaner into the outlet adjacent to your computer.
2. Every day at 7:30 the power company flips a switch and causes a subtle brownout that nobody notices, but which causes your computer to reboot. (US power is supposed to be 120 volts, but there are parts of my town where you can get voltages down to about 75 volts at the plug, and it can cause weird things to happen to people's electronics.)
3. Some prankster thinks it's funny to hit the reset button every day when he knows nobody is watching.

You see, it seems odd to me that it would cause a reboot. If you have a hardware issue or kernel issue bad enough to screw up the computer, I figure it will cause a kernel panic, and that will cause the computer to sit there with a debugger until somebody comes by and resets it. Either that or it will just lock up.

If it's a problem with a program and it freaks out and consumes all the memory or processing power, then it will simply slow down to a near halt, or the OOM killer will come out and start killing off processes.

I'd probably crack open the case and make sure that all the fans are running, too. Once I had a problem with a system going down. It turned out the super-cool copper heatsink I bought for my 1.13GHz Thunderbird (it was a while ago) had its fins too fine. It acted like a filter and I ended up with a compressed mat of dust and fine cat fur smooshed in between the fan and the heatsink. It was remarkable: it looked exactly like a thick piece of gray wool sweater material that somebody had cut into a perfect little square. Also check the power supply, and the connections to the motherboard and all that happiness.

I could imagine your job kicking in at 6:30 and it taking half an hour at that level of processing to cause the CPU to overheat or the power supply to crap out.

If the caps are bulging and/or look like they are leaking, then it's time to replace the motherboard, like the others said, too.
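Whether a panic leaves the machine sitting there or reboots it depends on the kernel's panic timeout, which is exposed as /proc/sys/kernel/panic. A small sketch for reading it follows; the function names are made up, and the handling of negative values follows the documentation for newer kernels, so it may not apply to a 2.4 kernel.

```python
def describe_panic_setting(text):
    """Translate the contents of /proc/sys/kernel/panic into plain words."""
    value = int(text.strip())
    if value == 0:
        return "stay halted after a panic (no automatic reboot)"
    if value < 0:
        # Documented for newer kernels; an assumption for older ones.
        return "reboot immediately after a panic"
    return "reboot %d seconds after a panic" % value


def read_panic_setting(path="/proc/sys/kernel/panic"):
    """Read the live setting; raises OSError if the file is not available."""
    with open(path) as handle:
        return describe_panic_setting(handle.read())


if __name__ == "__main__":
    try:
        print(read_panic_setting())
    except OSError as error:
        print("could not read the panic setting:", error)
```

If this reports an automatic reboot, a panic followed by a reboot would look like a spontaneous restart with an unclean file system, which matches what was described earlier in the thread.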


 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
The computer is in one corner of the laundry room,
connected via a UPS, so supposedly immune to brownouts.

I will take out the CPU, memory, fan etc. and put everything back again,
and then hook up a monitor before Monday morning and watch like a hawk at 7:25am...

to see if any random messages come up on the console.
What I'm worried about is that if it's the software, then I'll have the same issues even with new hardware!

I'm just going crazy!!!
 

Halz

Senior member
Jun 25, 2000
335
0
0
It is very likely there is a 'watchdog' that is rebooting your machine. Google around for what a watchdog is.

Recompiling the kernel without CONFIG_WATCHDOG can remove the watchdog. However, there might also be a module that you can remove to disable it temporarily.

In my case, I had a watchdog rebooting the system when a fiber ethernet card would 'time out'. Of course, that drew attention to another problem..
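Before recompiling, it may be worth checking whether a watchdog is present at all. This is a hedged sketch: the substrings used to spot watchdog modules are a guess (softdog is the software watchdog module; the others are just common naming patterns), and the function names are made up. It only checks that /dev/watchdog exists and deliberately never opens it, since opening the device starts the timer.

```python
import os

# Heuristic name fragments; an assumption, not an exhaustive list of drivers.
WATCHDOG_HINTS = ("softdog", "wdt", "wdog", "watchdog")


def watchdog_modules(proc_modules_text):
    """Return names of loaded modules that look like watchdog drivers."""
    names = [line.split()[0]
             for line in proc_modules_text.splitlines() if line.strip()]
    return [name for name in names
            if any(hint in name.lower() for hint in WATCHDOG_HINTS)]


def watchdog_report(modules_path="/proc/modules", device="/dev/watchdog"):
    """Summarise what can be seen without touching the watchdog device."""
    try:
        with open(modules_path) as handle:
            loaded = watchdog_modules(handle.read())
    except OSError:
        loaded = []
    return {"device_present": os.path.exists(device), "modules": loaded}


if __name__ == "__main__":
    print(watchdog_report())
```

A driver compiled directly into the kernel will not show up in /proc/modules, so an empty module list does not rule a watchdog out. A userspace watchdog daemon, if one is installed, would show up in the process list instead.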
 

sciencewhiz

Diamond Member
Jun 30, 2000
5,885
8
81
change your 6:30 process to 5:15 or some other time, and see if the reboot time changes with it.
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
Halz, hmm.. a watchdog reboot.. that's interesting.. I'll investigate it.
sciencewhiz, unfortunately, no data/processing can occur before 6:30am (that's when the market opens in Pacific time).

FWIW, I opened the case, cleaned inside with canned air, cleaned the motherboard, moved the 512MB SDRAM to a different slot, plugged in another (Linksys) ethernet card and moved my network connection to the Linksys card instead of the onboard Realtek.

Let's see what happens on Monday (or Tuesday or Wednesday or ...)

Will keep you posted !

Thanks

Mudit
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
Well.. no reboot today. My guess is that it may be more network related than CPU related. Every Saturday I run some CPU-intensive tasks (backing up and bzipping the entire MySQL database), which take almost a day. The server has never crashed during or after the backup.

But it always crashed 1 hour after market open. The network traffic is probably highest at that time, with multiple connections. If the onboard network interface is flaky, then it can cause crashes. Since I'm now using a PCI card for the network, the onboard NIC is out of the picture.

Let's see if my theory holds.


 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
argghhh.. it happened today at 7:30am! New kernel and new card! And this time there was no heavy network traffic of any kind!
Still a reboot at 7:30am! There was minimal processing and traffic going on. But strangely, 2 minutes before the reboot, all activity suddenly died. Here is a 1-minute-resolution snapshot from the sar command:

07:22:00 AM all 0.03 2.33 1.32 96.32
07:23:00 AM all 0.03 2.15 1.42 96.40
07:24:00 AM all 0.07 2.22 1.53 96.19
07:25:00 AM all 0.17 0.45 0.70 98.68
07:26:00 AM all 0.00 0.00 0.00 100.00 <== all quiet
07:27:00 AM all 0.00 0.03 0.05 99.92 <== quiet
07:28:00 AM all 0.00 0.00 0.00 100.00 <== quiet
----------- here is the reboot ---------
07:30:48 AM all 100.94 100.89 100.92 0.00
07:32:01 AM all 4.09 0.94 2.76 92.21
07:33:00 AM all 0.02 1.30 0.81 97.87
07:34:00 AM all 0.00 1.38 0.90 97.72
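With one-minute samples, the reboot shows up as a hole in the timestamps, so it can be picked out automatically. A rough sketch follows, assuming the 12-hour timestamp format pasted above; the function name and the 30-second slack are arbitrary choices, and samples that cross midnight are not handled.

```python
from datetime import datetime

# A few of the rows pasted above, including the annotation line.
SAMPLE = """\
07:27:00 AM all 0.00 0.03 0.05 99.92 <== quiet
07:28:00 AM all 0.00 0.00 0.00 100.00 <== quiet
----------- here is the reboot ---------
07:30:48 AM all 100.94 100.89 100.92 0.00
07:32:01 AM all 4.09 0.94 2.76 92.21
"""


def find_gaps(text, expected_seconds=60, slack_seconds=30):
    """Return (last_sample, next_sample, seconds) for every oversized gap."""
    gaps = []
    previous = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] not in ("AM", "PM"):
            continue  # skip headers, separators and blank lines
        try:
            stamp = datetime.strptime(fields[0] + " " + fields[1], "%I:%M:%S %p")
        except ValueError:
            continue
        if previous is not None:
            delta = (stamp - previous).total_seconds()
            if delta > expected_seconds + slack_seconds:
                gaps.append((previous.strftime("%H:%M:%S"),
                             stamp.strftime("%H:%M:%S"),
                             int(delta)))
        previous = stamp
    return gaps


if __name__ == "__main__":
    for before, after, seconds in find_gaps(SAMPLE):
        print("no sample between %s and %s (%d s)" % (before, after, seconds))
```

On the rows above this flags the interval between the 07:28:00 and 07:30:48 samples, which narrows the time of the crash to that window.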

Here is the output from sar -n DEV | grep eth0 for the same period.
Again, almost no traffic after 7:26am:

07:19:00 AM eth0 32.12 30.90 38420.72 2244.61 0.00 0.00 0.00
07:20:00 AM eth0 35.97 34.76 43140.70 2525.52 0.00 0.00 0.00
07:21:00 AM eth0 29.67 28.54 35618.64 2073.91 0.00 0.00 0.00
07:22:00 AM eth0 35.70 34.31 43099.00 2488.52 0.00 0.00 0.00
07:23:00 AM eth0 31.82 30.77 38076.49 2240.74 0.00 0.00 0.00
07:24:00 AM eth0 37.25 35.90 44541.58 2603.58 0.00 0.00 0.00
07:25:00 AM eth0 19.04 18.43 22515.51 1348.92 0.00 0.00 0.00
07:26:00 AM eth0 0.15 0.13 12.20 13.40 0.00 0.00 0.00
07:27:00 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
07:28:00 AM eth0 0.13 0.12 10.66 11.40 0.00 0.00 0.00
07:30:48 AM eth0 0.00 0.00 0.01 0.00 0.00 0.00 0.00
07:32:01 AM eth0 16.15 15.53 19213.34 1126.67 0.00 0.00 0.00
07:33:00 AM eth0 24.08 23.26 28740.31 1690.39 0.00 0.00 0.00
07:34:00 AM eth0 25.03 24.21 29474.27 1768.32 0.00 0.00 0.00
07:35:00 AM eth0 32.39 31.29 38942.95 2284.77 0.00 0.00 0.00
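The point where traffic dies can be pulled out of this output the same way. This sketch assumes the column order of older sysstat versions (rxpck/s, txpck/s, rxbyt/s, txbyt/s, ...), which appears to match the numbers above but is not confirmed in this thread; the 100 bytes/s threshold and the function name are arbitrary.

```python
# A few of the eth0 rows pasted above.
SAMPLE = """\
07:24:00 AM eth0 37.25 35.90 44541.58 2603.58 0.00 0.00 0.00
07:25:00 AM eth0 19.04 18.43 22515.51 1348.92 0.00 0.00 0.00
07:26:00 AM eth0 0.15 0.13 12.20 13.40 0.00 0.00 0.00
07:27:00 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
07:32:01 AM eth0 16.15 15.53 19213.34 1126.67 0.00 0.00 0.00
"""


def quiet_samples(text, iface="eth0", max_rx_bytes=100.0):
    """Return timestamps of samples whose receive rate is below the threshold."""
    quiet = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[2] != iface:
            continue
        try:
            rx_bytes = float(fields[5])  # assumed to be the rxbyt/s column
        except ValueError:
            continue
        if rx_bytes < max_rx_bytes:
            quiet.append(fields[0] + " " + fields[1])
    return quiet


if __name__ == "__main__":
    for stamp in quiet_samples(SAMPLE):
        print("almost no traffic at", stamp)
```

Comparing the first quiet timestamp across several crashes would show whether the machine always goes silent the same number of minutes before the reboot.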

====================================================

How do I find out if my box is being compromised at that time? How can I stop the shutdown command in case some script is trying to run shutdown or reboot?

 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
I don't have a console hooked up.
At 7:26am, everything is suddenly quiet. Network traffic drops, the number of processes per second drops, even the disk activity stops. As if the system is dead or held hostage. Then right at 7:30 it reboots (or crashes).

Maybe I'll remove or rename the reboot and shutdown commands and put fake ones in their place to see if anyone is calling them.
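A fake command for this could be as simple as a script that records who called it and then exits without doing anything. The following is only a sketch: the log path is hypothetical, the function names are made up, and it relies on /proc/<pid>/cmdline being readable for the calling process.

```python
import os
import sys
import time


def describe_process(pid, proc_root="/proc"):
    """Return the command line of a process, or '<unknown>' if unreadable."""
    path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(path, "rb") as handle:
            raw = handle.read()
    except OSError:
        return "<unknown>"
    text = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
    return text or "<unknown>"


def log_call(log_path, argv, parent_pid, proc_root="/proc"):
    """Append one line describing the caller and return that line."""
    entry = "%s called as %r by pid %d (%s)\n" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        argv,
        parent_pid,
        describe_process(parent_pid, proc_root),
    )
    with open(log_path, "a") as handle:
        handle.write(entry)
    return entry


if __name__ == "__main__":
    # Hypothetical log location; pick somewhere that survives the crash.
    log_call("/var/log/fake-reboot.log", sys.argv, os.getppid())
    sys.exit(1)  # refuse to reboot
```

An empty log after the next crash would not prove much on its own, since a program can ask the kernel to reboot without going through these commands, and a hardware reset never touches them. Given that the file system comes up unclean each time, a clean shutdown via these commands already looks unlikely.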
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
Originally posted by: n0cmonkey
Hook up a serial cable.



Null modem serial cable.. Different serial cables have different pin-outs and there is no protection against short circuits. (I warned an old boss about this, she ignored me and blew out a 5000 dollar RIP device for a poster-size print printer.)
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Originally posted by: drag
Originally posted by: n0cmonkey
Hook up a serial cable.



Null modem serial cable.. Different serial cables have different pin-outs and there is no protection against short circuits. (I warned an old boss about this, she ignored me and blew out a 5000 dollar RIP device for a poster-size print printer.)

I figured he would do his own homework.

I'm a fan of the modular connectors myself.
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
I know, but we just aren't used to having 'unsafe' hardware lying around..

because I know there is going to be more than one person (not necessarily the person who started this thread, of course) with an old joystick extension cable lying around in some junk drawer going: "ah... that sounds like a good idea, what if I take this and plug this into my computer..."

Remember, once the magic blue smoke is released from your electrical components, it's very hard to put it back in.
 

SinNisTeR

Diamond Member
Jan 3, 2001
3,570
0
0
What version of Linux do you have running? Have you thought about re-installing? That would help you figure out if it's hardware or some weird software glitch.. just a thought.
 

TheOtherMudit

Member
Dec 13, 2001
158
0
0
I've been running RH 7.2 (I know it's old) since 2001. The kernel is upgraded to 2.4.29. I think my problems started after I upgraded the MySQL server last May. Everything else on the box has stayed the same.

It's so bizarre that it happens like clockwork! It happened again yesterday! Right at 7:30am. I was even running a script which was doing a ps -ef every second! Nothing out of the ordinary before the crash.

Unfortunately, it'll be hard to hook up a monitor where the box is located. It's in the laundry room, sitting next to the washer.
I'll change the MySQL version and see if it happens again within a couple of days.

This is my current MySQL version:
mysql Ver 14.3 Distrib 4.1.1-alpha, for pc-linux (i686)

 

bersl2

Golden Member
Aug 2, 2004
1,617
0
0
Could it be something funky with a hardware timer? Does changing the system time have any effect on the rebooting?

This is the most bizarre behavior I have seen in a while.
 