Home server keeps rebooting. Suggestions?

Geofram

Member
Jan 20, 2010
120
0
76
Here's where I'm at:

I've got a Gen 1 i7 that's currently my home server. It's running Windows Server 2012 Essentials. It has 8 drives (one is an SSD, for the OS), a generic old PCI video card (probably 10 years old), and 4 NICs (this was an experiment learning about NIC teaming). The power supply is a 550 W Antec - I don't remember the specific model.

Recently, in the last week, it's begun to reboot spontaneously. It usually will stay up for a while - at this point I'd say 8 hours is the longest it will go, but sometimes less. When it happens, the Windows logs do not report anything odd. Meaning, if I go look in the logs just before the reboot, there is nothing listed. No errors at all. Just the logs for it coming back up again.

This happened almost at the same time that I swapped from WHS 2011 to Windows 2012 Essentials. I first installed Essentials about a week before this started. I don't *believe* it's the OS.

Things I've checked:
1) Memory. This was my first thought, so I rebooted and ran memtest for a couple of hours. No errors reported.
2) Drivers. This was my second thought. I guessed that some driver might be giving Windows fits, and causing instability. However, no driver changes have affected uptime, and I kind of believe the Windows logs would report a driver error or something before the computer went down.

I'm now looking for suggestions. Power supply possibly? Or some other part going bad? It's not overheating, as I've never seen temperature spikes or any kind of thermal warnings, and I expect it would completely shut down if that was the problem. Any ideas?
 

Zap

Elite Member
Oct 13, 1999
22,377
2
81
Can you run it for a couple days with just the boot drive? This would put a lower load on the power supply. If it suddenly becomes reliable, then the PSU output may be degraded.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
There should be something in the event log, even if it is just a message stating that the previous shutdown was unexpected. Go to Event Viewer -> Custom Views -> Administrative Events to filter out everything except for Errors and Warnings.
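
If you'd rather pull those entries from a command line than click through Event Viewer, here is a minimal Python sketch that builds a `wevtutil` query for the System log. The three event IDs are the ones Windows commonly logs around an unclean restart; the helper names (`build_xpath`, `build_command`) are made up for this example, and the query only runs on Windows.

```python
import subprocess
import sys

# Event IDs commonly found in the System log around an unclean restart.
RESTART_EVENT_IDS = {
    41: "Kernel-Power: system rebooted without cleanly shutting down",
    1001: "BugCheck: computer has rebooted from a bugcheck",
    6008: "EventLog: previous system shutdown was unexpected",
}


def build_xpath(event_ids):
    """Build an XPath filter matching any of the given event IDs."""
    clause = " or ".join(f"EventID={i}" for i in sorted(event_ids))
    return f"*[System[({clause})]]"


def build_command(event_ids, count=20):
    """Build a wevtutil command returning the newest matching events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_xpath(event_ids)}",
        f"/c:{count}",   # limit the number of events returned
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(
        build_command(RESTART_EVENT_IDS), capture_output=True, text=True
    )
    print(result.stdout or result.stderr)
```

An entry with ID 1001 from the BugCheck source is the useful one, since it means a dump was written and can be analyzed.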
 

Geofram

Member
Jan 20, 2010
120
0
76
You're correct; it is noting that the shutdown was unexpected. After some more digging, I did find where it's doing a dump of the error as well. I haven't been able to decode it yet.

However, I did try to remove the NIC teaming, and it did help somewhat. The NICs are all Intel CT cards. Here's what I've seen.

1) A different power supply made no difference. I swapped in a spare 650 W unit I had lying around and the crash still happened.

2) Disabling teaming (set up in the Intel Drivers) made no difference. In this situation I turned off the teaming, and just left the NICs in the system, with only one configured. Crash still happened after a few hours.

3) "Disabling" the other NICs while they are still in the system (from the Windows Device Manager) seemed to help stability quite a bit. The system ran for 12 hours without a crash.

4) Turning them back on and setting up a team using the new Windows 2012 NIC Teaming option resulted in the crash coming back.

5) Taking out all but two of them causes the crash to happen as well; I tried two different cards in two different slots, to try to figure out if one of the NICs was simply going bad. Crash happened in both situations.

At this point, I'm running it with a single NIC, with all the other cards pulled from the system entirely. It's been running for about 24 hours.

So, my takeaway from this is that there is probably a driver problem with the Intel NICs in Server 2012. I find this very odd, since it isn't even specific to having teaming enabled - the crash still happened if I had multiple NICs in the computer that were not configured. But just running off of one seems to make it happy, even though the one NIC is the same model as all the other ones.

I'm just not sure what to make of this overall. Having the NIC team is totally unnecessary in my situation - it was mostly just about learning how to set them up, etc - but now it's got a bug in my ear and I want to figure out why it doesn't work. It worked flawlessly in Windows 2008R2, and seems to just cause crashes in 2012. I mean, I can see Teaming causing problems; I don't see why having them in the computer, not set up in a team, would cause the same.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Download WhoCrashed and let that analyze the dumps. At a minimum, it should tell you what module was running when the problem occurred. If it's the Intel NIC driver, you have your culprit. If it isn't the NIC driver, that doesn't necessarily mean the NIC isn't the culprit; it could be introducing some invalid state that isn't detected until later.

When you did your 3) above, did the system crash after 12 hours, or did you stop testing? If you stopped while it was still up, my theory is that having the devices enabled causes the driver to be loaded multiple times, thus triggering a bug in the driver. Server 2008 R2 loads network drivers differently than 2012 does (necessary because of the new MS teaming), so 2012 could trigger a bug that wasn't present before.

The dirty little secret about drivers for different versions of Windows is that they really aren't all that different. They share the same core code with little shims put in place to support the different interfaces that different Windows versions expect.
 

Geofram

Member
Jan 20, 2010
120
0
76
I stopped testing at that point, since it had been up longer than I had seen in a long time.

As for where things stand now - it did, eventually, crash again using just 1 NIC. It just took ~24 hours. My final move was to uninstall the driver from Intel, and let it fall back to the default driver built into Server 2012 (it does have one, thankfully). In fact, in desperation, I've uninstalled ALL drivers except the Intel Chipset driver. This means no Intel RST, and I ditched the driver for my JMicron SATA ports. I wanted a "clean slate" since everything basically does function correctly without the additional drivers.

Results? The crashes have stopped. At least, in the last 48 hours, I have yet to see it go down again. I'm still leaning towards the NIC driver being the culprit. If things continue to work fine I'll probably install drivers back one at a time (very, very slowly) to see if I can find where crashing gets introduced.
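
For keeping track of the one-at-a-time reinstall, here is a rough Python sketch of the bookkeeping. The 48-hour soak period is an assumption (twice the ~24 hours the single-NIC setup lasted before crashing), and the function name and trial format are made up for illustration.

```python
SOAK_HOURS = 48  # assumption: twice the longest uptime that still ended in a crash


def first_suspect(trials, soak_hours=SOAK_HOURS):
    """Return the first reinstalled driver whose trial ended in a crash.

    trials: list of (driver_name, hours_observed, crashed), in reinstall order.
    Returns None if every trial ran clean for at least soak_hours.
    Raises ValueError if a clean trial was cut short, since a short clean
    run does not show that the driver is safe.
    """
    for name, hours_observed, crashed in trials:
        if crashed:
            return name
        if hours_observed < soak_hours:
            raise ValueError(
                f"{name}: only {hours_observed} h observed, need {soak_hours} h"
            )
    return None
```

The point of the soak check is that a clean 12-hour run proves nothing when the crash has taken as long as 24 hours to show up.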
 

Geofram

Member
Jan 20, 2010
120
0
76
I just ran WhoCrashed and found the following from the most recent crashes:

On Tue 11/13/2012 8:01:21 PM GMT your computer crashed
crash dump file: C:\Windows\Minidump\111312-24429-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x7AD40)
Bugcheck code: 0x133 (0x0, 0x283, 0x282, 0x0)
Error: DPC_WATCHDOG_VIOLATION
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. This problem might be caused by a thermal issue.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Tue 11/13/2012 8:01:21 PM GMT your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: hal.dll (hal!HalSetTimeIncrement+0x3F54)
Bugcheck code: 0x133 (0x0, 0x283, 0x282, 0x0)
Error: DPC_WATCHDOG_VIOLATION
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem. This problem might be caused by a thermal issue.
The crash took place in a standard Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.

Looks like it can't tell what driver was causing the problems.
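
The four bugcheck parameters do say a little more, though. Here is a small Python sketch that decodes the "Bugcheck code" line from the report above, using the parameter meanings from Microsoft's bug check reference for 0x133. The function names are my own, and it only handles this one bug check.

```python
import re

# Meaning of parameter 1 for bug check 0x133 (DPC_WATCHDOG_VIOLATION).
DPC_WATCHDOG_PARAM1 = {
    0x0: "a single DPC or ISR exceeded its time allotment",
    0x1: "the system cumulatively spent too long at IRQL DISPATCH_LEVEL or above",
}

BUGCHECK_RE = re.compile(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_bugcheck(line):
    """Return (code, [params]) as integers from a WhoCrashed 'Bugcheck code' line."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        raise ValueError("no 'Bugcheck code' field found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(line):
    """Give a one-line reading of a 0x133 bugcheck; other codes are not decoded."""
    code, params = parse_bugcheck(line)
    if code != 0x133:
        return f"bug check {code:#x}: not decoded by this sketch"
    reason = DPC_WATCHDOG_PARAM1.get(params[0], "unknown parameter 1")
    if params[0] == 0x0:
        # For parameter 1 == 0, parameters 2 and 3 are the DPC time count
        # and the DPC time allotment, both in ticks.
        return f"{reason}: ran {params[1]} ticks against an allotment of {params[2]}"
    return reason


print(describe("Bugcheck code: 0x133 (0x0, 0x283, 0x282, 0x0)"))
```

So in both dumps a single DPC or ISR ran for 643 ticks against an allotment of 642. That fits a driver hogging the CPU, but it still doesn't name the driver.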
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
OK, so it looks like a driver is taking too long to service an interrupt. Intel's drivers are typically pretty high quality, so I wouldn't expect them to make an amateurish mistake like that. JMicron on the other hand...
 