DIX, ethernet and 802.3

ktwebb

Platinum Member
Nov 20, 1999
2,488
1
0
Here is the situation. Had a blade chassis running 7 instances of ESX 3.0 panic and crash at the same time. Now I am a virtualization advocate. Been in the VMware world for the last couple of years.

So we sent dump logs, SAN logs, etc. to VMware to parse and tell us what happened. It took two weeks, but this was the reply: "The root cause of the crashes on systems Host4, 6, and 7 were due to a problem with the vmkernel that's caused by hardware checksumming and seems to only be hit when the guest is sending out or receiving 802.3 packets. What application is doing this? The workaround for this......"

Now, 802.3 to me is Ethernet. Basic wiki-type understanding of it: a family of standards ranging from 10BASE2 to GigE. I'm a Windows sysadmin, so I have no pride factor here. I just need to understand what happened, and the blurb above didn't do it for me. So I wrote back: could you elaborate? My understanding is that 802.3 is CSMA/CD, the Ethernet standard. Help me understand what you wrote, etc.

So the response: "Actually, the most common Ethernet protocol is DIX. 802.3 is more obscure."

This, of course, did not help me at all, and frankly I feel like the guy was tweaking me because he thought I was questioning him. I wasn't, even though it took him two weeks to answer. I was just trying to digest it.

Anyway, what I am looking for is some clarification, based on his emails and responses. The VMs are all Windows boxes. For reference, right before this event, a lot of print server VMs were created with IBM LLC2 installed as a protocol. They are all Server 2003 machines and I suspect they are the culprit, but I would like to get my mind around why they are, IF they are. Make sense?

Any help is greatly appreciated.
 

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
They're blowing smoke up your arse. Sounds like you came across a MAJOR bug.

typical application folks - blame the network.

Nail these farts to the wall. 802.3 encompasses just about all things ethernet. Have them quote the specific IEEE frame type. They are literally blowing smoke up your arse. Push back.

god I hate developers.

Re-read your OP. Definitely a severe bug. He's full of sh!t.
 

ktwebb

Well, I did find some literature that defines a difference in frame type between DIX (Ethernet II) and 802.3.

"How to differentiate between an 802.3 frame and an Ethernet II frame?
The value of the 'length' field in an 802.3 frame must be 1500 or less, and in an Ethernet II frame the value of the 'type' field must be 1536 (0x0600) or more. Since the 802.3 frame 'length' field and the Ethernet II frame 'type' field are at the same offset in the header, the frame can be differentiated by the value present."

As I said, I have no pride issue here. I am network challenged. No reason to deny it. So much so that I don't feel comfortable pushing back, as I am not in a position of knowledge to debate it openly. Plus, these blades panicked in a data center that I don't currently manage. I am in the middle of a merger; the other data center just got these BladeCenter chassis and added the hosts to our VirtualCenter server, and I've been helping them configure and so forth. But ultimately this isn't my fight. I was really just asking the guy to be more descriptive, which he clearly was not going to do.
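For anyone following along, the rule in that quote can be sketched in a few lines of Python. This is only an illustration, not anything from VMware: the function name `classify_frame` is made up, and it uses the commonly cited thresholds, where a value of 1500 or less in bytes 12-13 is an 802.3 length and a value of 1536 (0x0600) or more is an Ethernet II type.

```python
import struct

MAX_8023_LENGTH = 1500   # 0x05DC: largest valid 802.3 length value
MIN_ETHERTYPE = 0x0600   # 1536: smallest valid Ethernet II type value


def classify_frame(frame: bytes) -> str:
    """Classify a raw Ethernet frame by the 2-byte field after the MAC addresses.

    Hypothetical helper for illustration. Expects the frame to start at the
    destination MAC (no preamble) and to carry no VLAN tag.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than a 14-byte Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: type or length.
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length <= MAX_8023_LENGTH:
        return "802.3"          # field is a payload length
    if type_or_length >= MIN_ETHERTYPE:
        return "Ethernet II"    # field is an EtherType (0x0800 is IPv4)
    return "undefined"          # 1501-1535 is neither


if __name__ == "__main__":
    ipv4_frame = bytes(12) + struct.pack("!H", 0x0800) + bytes(46)
    llc_frame = bytes(12) + struct.pack("!H", 46) + bytes(46)
    print(classify_frame(ipv4_frame))
    print(classify_frame(llc_frame))
```

Both frame types can share the same wire, so anything that touches frames has to make this check per frame instead of assuming one format.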

So if I respond to him I should just ask for the specific frame type?

Oh, and thanks Spidey. I was hoping you'd be someone who'd respond.

Also, I logged on to one of the servers I suspect as a culprit. It's got an IBM Communications Server installed and looks to be using SNA on some level. I am ignorant as to what these servers do exactly, other than that they are print servers that communicate with the IBM mainframe. SNA over IP instead of DLC, from the looks of it. If that provides any "aha's".

 

spidey07

The L2 ethertype is part of the 802.3 standard. It's essentially a L2 description of the frame type so that receiving stations know what to do with it. It's crucial to communication.

You mentioned LLC2 - a layer 2 protocol whose frames put a length in that field, followed by an 802.2 LLC header, instead of an ethertype.

You have a bug, most likely a bug in the driver because some dumbass didn't want to follow 802.3 standards. Happens all the time. god I hate developers.

This is THEIR problem. Provide them captures and have them chew on it. If they push back, get your account exec involved. You have a bug.

-edit-
You mentioned LLC2, and that means SNA. Those frames don't carry an IP-style ethertype at all: the field holds a length and an LLC header follows. All the more pointing to the developer not understanding that there is more than one frame type. The ethertype is a small 2-byte field in the L2 header. Way back when we ran a half dozen different protocols on the same broadcast domain (IP, IPX, RIP, AppleTalk, etc.), framing was extremely important.

You have a bug. they should fix it. In the driver they're assuming the ethertype is constant. It is NOT constant and it is a flag they need to know how to handle.
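To make the point concrete, here is a rough Python sketch of what "knowing how to handle it" could look like for the LLC case. It is purely illustrative and is not the vmkernel or any real driver: `parse_llc` is a made-up name, the frame is assumed untagged, and only the first control byte is read even though LLC2 information and supervisory frames use a 2-byte control field. The SAP value 0x04 is the one commonly associated with SNA path control.

```python
import struct

SNA_PATH_CONTROL_SAP = 0x04  # SAP commonly used by SNA over LLC2


def parse_llc(frame: bytes):
    """Return (dsap, ssap, control) for an 802.3/LLC frame, else None.

    Hypothetical helper: returns None when bytes 12-13 hold an EtherType
    (Ethernet II), because such frames have no 802.2 LLC header to parse.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than a 14-byte Ethernet header")
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length > 1500:
        return None  # not a length, so no LLC header follows
    if len(frame) < 17:
        raise ValueError("802.3 frame too short to hold an LLC header")
    # The LLC header starts right after the length field.
    dsap, ssap, control = frame[14], frame[15], frame[16]
    return dsap, ssap, control


if __name__ == "__main__":
    sna_frame = bytes(12) + struct.pack("!H", 3) + bytes([0x04, 0x04, 0x03])
    header = parse_llc(sna_frame)
    print(header, header[0] == SNA_PATH_CONTROL_SAP)
```

The design point is the early `return None`: code that expects an IP header at byte 14 should only run when the field really is an IP ethertype, and everything else needs its own path.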

I hate developers and I SHOULD NOT have to tell them how to write code.
 

ktwebb

lol. I am more confused than I was. I appreciate you trying.

I think I'll leave it up to the guys that are responsible for the servers that had the panic/crash. If I stay with the company I'll eventually be up there and it'll be my problem, but my vacation starts today. I really need to see it to digest it fully anyway. All the literature I've read since I got the email assumes knowledge that I don't have, so it simply doesn't make much sense without filling in those blanks. And then the virtualization component, and how the host OS handles the guest network traffic, adds another layer of complexity: virtual switches with port groups that handle VM traffic, console traffic, VMotion, et al.

I am not going to ask you to explain it in the detail it would take for it to click. Thanks though. I do appreciate the effort.
 

spidey07

screw that. send some of my post back to them.

leave out the politically incorrect stuff.

They sent you a lame duck answer. Escalate. PM me the case number if you want me to call them directly. I know it seems like I act like a badarse on the forums, but I do have deep contacts within most of the major players, and if not I can get you in touch with the right folks.

you're reaching out, I am here to help.
 

ktwebb

And I appreciate it.

There are some factors in play that would take too long to get into in detail, but suffice it to say I don't want to own this problem. I want to assist where and when I can, and I will definitely give my input, backed up by my badarse guru at AnandTech. But if the situation were reversed and the problem were my responsibility, I wouldn't want someone else taking ownership of it away from me.

There was an attempt to dump it on me the night it happened. I put the kibosh on that. I embrace helping with setting up the VI3 environment: configuring, tuning, and even troubleshooting problems. But when push comes to shove, I have 35 ESX hosts and 180 or so VMs to maintain at my location. By taking the lead here I would in essence provide an opportunity for all problems to be directed solely to me. While that might be looked at as a plus in some scenarios, you just have to trust me that it wouldn't be in this one. If you've been through a big merger, and I suspect you have at some point, you can probably read between the lines about what I am talking about. Politically this is a very unpleasant experience. Until I am compensated properly there is only so much of a load I will accept.

All that may seem like TMI, but I didn't want you to think I was dismissing your aid. I honestly do appreciate it.

And like I said, I am off for a week starting tonight. If I did decide to take the lead, then I am in it until resolution, which means I would be going in, getting with our network guys, then probably a conference call with VMware. Probably also getting the TAM involved. Figures this email would come in at 5 pm on the night I go on call. I should have just deleted it. I was only cc'd anyway.

Anyway, thanks spidey. Just what I expected of you.
 