technical support

Page 2 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
Feb 25, 2011
16,822
1,493
126
Strange. I can reproduce that PSOD with a standalone ESXi host. I am fairly certain that it is a host bug and has nothing to do with vCenter itself, and I am a bit doubtful that the upgrade order of vCenter vs. hosts would mask that issue. I mean, I wouldn't upgrade to vCenter 5.5 until I had put ESXi 5.1u2 on the hosts first via vCenter 5.1.

Dunno, exactly. I just know we were running a 5.1 cluster w/o issues for a long time. Then replaced vCenter with a new box running 5.5 and BOOM. 6 Host KPs in two days.

:shrug:

It said it was supported. Like I said. n00b.
 

saratoga172

Golden Member
Nov 10, 2009
1,564
1
81
Out of curiosity which virtual network driver are you using for the vcenter server? The vmxnet3 driver is supposed to help with network traffic causing CPU spikes.
 

ForumMaster

Diamond Member
Feb 24, 2005
7,797
1
0
So here's the breakdown.

Our vCenter environment is not functioning. It spikes to 100% CPU and just stops working. I've tried all the usual support things; nothing is wrong with the database server (and all the other databases on that server are fine). It's the vCenter process, vpxd.exe, that is at 100%. I should also point out that I've been using vSphere through versions 3-5 and I'm a current VCP holder (going to take my VCAP soon).

This is preventing all backups from running as they need vcenter to initiate the backups. I log a ticket with vmware in the morning two days ago. I get a call shortly after and the tech starts to do his thing. I tend to let techs try what they think is right. I've worked phone tech support and I remember those asshole admins who think they know everything, yet had to dial support. After a few hours he is no closer to the problem, but it's his time to go home, so he transfers me to a new tech in another timezone. This tech of course has to start over, because in IT the other guy is obviously an idiot. He suggests we reinstall vcenter. I allow it because I honestly don't know what the hell is wrong. Nothing is solved.

Fast forward another 2 hours and it's this guy's time to go home. I get transferred yet again, this time to India (I think) (around the world in one phone call?). This guy quickly realizes he needs help and involves a senior engineer. The senior engineer checks all the things I checked before calling (but I understand) and then suggests we reinstall vCenter. I'm hesitant because we have done this before. He explains that the last guy probably didn't uninstall it properly. I relent and we reinstall vCenter. This brings no relief. He continues working and eventually settles on the idea that the problem is a new ESXi host we added to the cluster a few days ago. We can't remove it, because we can't log in to vCenter. He gets another engineer and we go to the database and SQL out the ESXi host. At first CPU usage drops, but then the problem comes right back. At this point it's almost 3am my time and I've been on the call well over 15 hours. I suggest we shelve it until the morning; obviously they need time to work on the issue. They download some logs and dmp files from vCenter and we call it quits. I go get a burger at Steak 'n Shake; it was pretty gross, and it was the only food I'd had since 11am. Bedtime is around 4:30am.

I get into work about 9am the next day (yesterday), the day of my wedding anniversary. vCenter is still down and my boss is not terribly happy. I call VMware and they begin their work again around 10am. New tech, as the old tech can't be reached. He again explains that everyone else must be an idiot and begins the process anew. He decides it must also be a host issue and we go through a tedious process of rebooting all ESXi hosts. To my surprise (and I'm pretty sure it's unrelated) vCenter starts to run slightly better, even though the CPU usage stays pegged around 80-90%. He then suspects it's a storage issue and brings in a storage expert. They work on the system until about 5pm (my wife brought me lunch so I could eat!) and nothing was resolved. They downloaded the log bundle for vCenter and all hosts and left me so they could analyse the issue.

The result is I still cannot run backups, I still cannot manage my VMs through vCenter, and I still have no idea what is causing it. I suspect today will be more of the same.

To the support representatives' credit, they have been very polite and have tried very hard to fix my problem. I really appreciate their hard work. At this point I could have rebuilt the server. The reason I haven't is that I don't want this problem to pop up again; I want to know the root cause.

In any case, I made that post simply because I needed to vent to someone and my phone was handy.

damn man, sounds like hell. reminds me of way too many support tickets we had. weirdest one i recall with vmware was one involving a RHEL oracle db that was stuck. lvl 1 tech restarted the guest via the vc.

result? the data store became corrupt and we lost data! literally the server "erased" itself. apparently an uncommon, but known bug to the folks at red hat.
 

ViviTheMage

Lifer
Dec 12, 2002
36,190
85
91
madgenius.com
try installing it on another box, possibly with an older version of windows?

or installing it on a dedicated physical box, if it is currently on a virtual one?

did any windows updates get pushed out that may have included drivers?
 

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
I'm just a vCenter n00b, and I understand Linux guys hate reinstalling stuff, but if you'd just installed a new vCenter instance and pointed it at the old DB, wouldn't it have:

1) Probably fixed it (assuming you're right and the DB is fine)
2) Taken you a couple hours, tops.

?

We tried a reinstall multiple times (each tech wanted to do one). It didn't change a thing. More on that in a moment.

Have you upgraded to 5.5 yet? I think those support multiple vCenter servers in an HA thingy.
Yea, this is 5.5. I've done at least 16 upgrades to 5.5 for various clients in my last job.

If you haven't upgraded to 5.5, do your hosts first. Upgrading to vCenter 5.5 with a bunch of 5.1 hosts exposed this bug for us. Random hosts started KPing. It was a couple days of hell.

Hit this bug on the first client I ever upgraded. At the time VMware didn't acknowledge that it was a bug. I found a forum post that suggested changing the adapters to vmxnet3. A bit of PowerCLI and PowerShell and it was nice and fixed.

I gave up waiting for VMware to fix it. I stood up a fresh vCenter VM with a clean database. Used PowerShell to create standard port groups for each of the distributed port groups and migrate all VMs to use those standard port groups for networking. Ran another script to set up the VDS on the new vCenter and create all the clusters, folders, etc. Detached the hosts from the old vCenter and attached them to the new one. Finally ran another script to put all VMs back on their VDS port groups and in their appropriate folders.
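
The ordering in that migration is the part that matters: every VM adapter has to be off the distributed switch before the hosts leave the old vCenter, and only goes back once the VDS exists on the new one. A rough Python sketch of just that planning step follows. It is not the PowerShell that was actually run; the `VmNic` record, the `std-` prefix for the temporary port groups, and the sample inventory are all made up for illustration, and nothing here talks to vCenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmNic:
    """One VM network adapter and the distributed port group it uses (hypothetical record)."""
    vm: str
    adapter: str
    dvs_portgroup: str


def temp_portgroup(dvs_portgroup: str) -> str:
    # Assumed naming scheme for the temporary standard port groups.
    return f"std-{dvs_portgroup}"


def build_plan(nics):
    """Return the migration as an ordered list of (phase, detail) steps."""
    portgroups = sorted({n.dvs_portgroup for n in nics})
    ordered = sorted(nics, key=lambda n: (n.vm, n.adapter))
    plan = []
    # Phase 1: one standard port group per distributed port group in use.
    for pg in portgroups:
        plan.append(("create-standard", temp_portgroup(pg)))
    # Phase 2: move every adapter off the distributed switch.
    for n in ordered:
        plan.append(("move-to-standard",
                     f"{n.vm}/{n.adapter} -> {temp_portgroup(n.dvs_portgroup)}"))
    # Phase 3: rebuild the VDS on the new vCenter, then re-home the hosts.
    for pg in portgroups:
        plan.append(("recreate-distributed", pg))
    plan.append(("rehome-hosts", "detach from old vCenter, attach to new"))
    # Phase 4: put every adapter back on its original distributed port group.
    for n in ordered:
        plan.append(("move-to-distributed",
                     f"{n.vm}/{n.adapter} -> {n.dvs_portgroup}"))
    return plan


if __name__ == "__main__":
    inventory = [
        VmNic("web01", "Network adapter 1", "dv-prod"),
        VmNic("db01", "Network adapter 1", "dv-prod"),
        VmNic("db01", "Network adapter 2", "dv-backup"),
    ]
    for phase, detail in build_plan(inventory):
        print(f"{phase:22} {detail}")
```

Keeping the plan as plain data makes it easy to eyeball or log before any real script touches a host.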

CPU has stayed low for a few hours now. Everything is good. Time will tell if it stays that way. I still need to re-setup vshield and ops manager.

VMWare got back to me stating there is a UCS driver that needs upgrading for the fiber adapter. The thing is the version it says I need to upgrade to is the version I'm currently on....
 

TheInternet1980

Golden Member
Jan 9, 2006
1,651
1
76
Not a fan of vMWare.

My Hyper-V instances with System Center VMM as the management interface (30 across 5 servers + redundancy boxes) have been rock solid.

Feel your pain though. Have worked on seriously messed up vCenter environments before. Problems with vMWare generally mean a pain in the ass for a while.
 

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
Not a fan of vMWare.

My Hyper-V instances with System Center VMM as the management interface (30 across 5 servers + redundancy boxes) have been rock solid.

Feel your pain though. Have worked on seriously messed up vCenter environments before. Problems with vMWare generally mean a pain in the ass for a while.

To be fair, I've only ever needed to call vmware support twice in the last 10 years (Both because of 5.5 oddly). For the most part the environments I've worked in have been rock solid. My last employer was all hyper-v and it is also pretty nice to use.

Our current environment is 30 hosts and about 425 virtual machines (mostly Windows). It's the largest I've ever worked in. The last VMware environment I worked in was 8 hosts and 200 virtual machines (all Red Hat), and we went from 3.5 all the way to 5.1 without ever having a single instance of downtime or incident with VMware. I'm going to be helping my replacement there do their 5.5 upgrade soon. I don't anticipate any issues.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
To be fair, I've only ever needed to call vmware support twice in the last 10 years (Both because of 5.5 oddly). For the most part the environments I've worked in have been rock solid. My last employer was all hyper-v and it is also pretty nice to use.

Our current environment is 30 hosts and about 425 virtual machines (mostly Windows). It's the largest I've ever worked in. The last VMware environment I worked in was 8 hosts and 200 virtual machines (all Red Hat), and we went from 3.5 all the way to 5.1 without ever having a single instance of downtime or incident with VMware. I'm going to be helping my replacement there do their 5.5 upgrade soon. I don't anticipate any issues.

Same. The number of clients with Hyper-V issues far outweighs the VMware tickets. It is just that when VMware breaks that one time in 10 years, it breaks hard.
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
I would think running ProcMon and perhaps using procexp to suspend the hung task on the old server might at least lead you to whatever the vCenter box is doing to give you an idea of where the hangup is.

As to the comment about HyperV being rock solid and having troubles with VMWare... I'm in a completely different boat. That is not to say I don't like HyperV as a product, but just that I feel that VMWare has a far better layout in terms of networking and storage.

I set up one HyperV environment that was a simple two-host configuration, and when I last left it, it was perfectly fine. But I have dealt far more with VMware, and the configuration is much, much easier. (I believe I had to rebuild the HyperV boxes 2-3 times because the network configuration of the host kept getting borked and lost on reboot. I finally realized that configuring the vSwitch and then setting a static IP for accessing the host was the way to go.)

All in all, seldom do you see issues with VMWare, but they are not immune to a few bugs.
 

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
I would think running ProcMon and perhaps using procexp to suspend the hung task on the old server might at least lead you to whatever the vCenter box is doing to give you an idea of where the hangup is.

As to the comment about HyperV being rock solid and having troubles with VMWare... I'm in a completely different boat. That is not to say I don't like HyperV as a product, but just that I feel that VMWare has a far better layout in terms of networking and storage.

I set up one HyperV environment that was a simple two-host configuration, and when I last left it, it was perfectly fine. But I have dealt far more with VMware, and the configuration is much, much easier. (I believe I had to rebuild the HyperV boxes 2-3 times because the network configuration of the host kept getting borked and lost on reboot. I finally realized that configuring the vSwitch and then setting a static IP for accessing the host was the way to go.)

All in all, seldom do you see issues with VMWare, but they are not immune to a few bugs.

I did a little of that; it really is vpxd.exe using all the CPU. If you disable the vCenter service, the server runs at about 8-10%. The problem hasn't recreated itself yet on the new vCenter. So far we are running great. I'm starting to think it might have been database corruption.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
I did a little of that; it really is vpxd.exe using all the CPU. If you disable the vCenter service, the server runs at about 8-10%. The problem hasn't recreated itself yet on the new vCenter. So far we are running great. I'm starting to think it might have been database corruption.

Yeah, so as an FYI: if you plan to fully reinstall to start fresh, you can actually just stop vpxd.exe, back up your database, run vpxd.exe -b from the command line, wait for it to finish, and then start vpxd.exe again. The -b switch wipes the database and replaces it with a blank one. After that you need to run a single command to register vCenter with SSO again.
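
As a rough illustration of that stop / reinitialise / start order, here is a small Python dry-run sketch. The Windows service name `vpxd` and the install path of vpxd.exe are assumptions that do not come from this thread, so check both on your own server; the database backup and the SSO re-registration are left as manual steps. By default it only prints the commands and runs nothing.

```python
import subprocess

# Assumptions, not taken from the thread: the Windows service name and the
# install path of vpxd.exe. Verify both on your own server before use.
SERVICE_NAME = "vpxd"
VPXD_EXE = r"C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe"


def reset_commands(service=SERVICE_NAME, vpxd_exe=VPXD_EXE):
    """Build the stop / reinitialise / start commands in the order described."""
    return [
        ["net", "stop", service],   # stop the vCenter service
        [vpxd_exe, "-b"],           # -b replaces the database with a blank one
        ["net", "start", service],  # start the service again
    ]


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Back up the database first; that step and the SSO re-registration
    # are deliberately left manual in this sketch.
    run(reset_commands())  # dry run: prints the three commands only
```

Since -b is destructive, the dry-run default is deliberate: passing `dry_run=False` is the only way the commands actually execute.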
 