
Thursday, December 9, 2010

Nexus 1000V Interface Resetting When It's the Only Member of a Port-Channel

This should be the first of many posts on configuring VMware ESX hosts to use the Nexus 1000V. It is a work in progress. First and foremost, I am not a networking guy, and this is a very complicated configuration. The networking admin where I work, Chris Johnson, is very good and has been teaching me as we work through the problems. I'm learning this stuff along with you, so please post any help you can. I'll try to do the same.

One of the thousand problems I've been working on actually came from the host we use to migrate machines on and off the Nexus switches. Some of our hosts use the regular VMware Distributed Virtual Switch (DVS), and our test hosts use the Nexus 1000V distributed virtual switch.
  • All our VMware ESX hosts use dual-port 10 Gig FCoE Emulex cards set up as trunks.
  • All hosts also have two 100 Mb NICs:
    • One is for lights-out management (Dell Remote Management Cards and a really crappy IBM knock-off).
    • One is for the service console, which sits on a normal virtual switch rather than the distributed virtual switch. We keep the service console off the distributed virtual switch because we've had too many problems managing the hosts when the Emulex cards and/or the Nexus fail.
One host has one 10G NIC on the regular VMware distributed virtual switch and its other 10G NIC on the Nexus 1000V distributed virtual switch. This host is for migrating machines on and off the Nexus hosts. As a result, the Nexus 10G NIC interface was set up in a port-channel by itself, and the interface kept resetting every minute: it would stay up for about 30 seconds, then go offline for about 30 seconds.

Here is how this looked in the logs.

Nexus 1000 Command: show logging last 100
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:12:40 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:12:40 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:13:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:13:44 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:13:44 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:14:41 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:14:41 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
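To put numbers on that "30 seconds up, 30 seconds down" pattern instead of eyeballing timestamps, here's a rough Python sketch that pairs each IF_DOWN_INITIALIZING message with the following IF_UP message for one interface and reports how long each state lasted. It assumes you've saved the `show logging` output as plain text in the format shown above; the function name `flap_intervals` and the `SAMPLE` text are just for illustration.

```python
import re
from datetime import datetime

# NX-OS syslog line, e.g.
# 2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"%(?P<code>[A-Z_]+-\d-[A-Z_]+): (?P<msg>.*)$"
)


def flap_intervals(lines, interface="Ethernet6/3"):
    """Return (state, seconds) pairs: how long `interface` stayed down or up
    before its next transition, based on IF_DOWN_INITIALIZING / IF_UP messages."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or f"Interface {interface} " not in m.group("msg"):
            continue  # port-channel messages, other interfaces, etc.
        code = m.group("code")
        if code.endswith("-IF_DOWN_INITIALIZING"):
            state = "down"
        elif code.endswith("-IF_UP"):
            state = "up"
        else:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y %b %d %H:%M:%S")
        events.append((state, ts))
    # Each state lasts until the next recorded transition.
    return [
        (state, int((nxt - ts).total_seconds()))
        for (state, ts), (_, nxt) in zip(events, events[1:])
    ]


# A trimmed copy of the log above (flow-control lines left out).
SAMPLE = """\
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:12:40 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:12:40 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:13:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:13:44 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:14:41 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
"""

for state, secs in flap_intervals(SAMPLE.splitlines()):
    print(f"{state}: {secs} s")
```

On these transitions it reports down/up times of 33/27, 37/24 and then 33 seconds, so one full down-plus-up cycle takes roughly 60 seconds, which matches the "resets every minute" behavior.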

ESX Command: tail /var/log/vmkernel -n 30
Dec  7 11:43:02 nkuvmhost9 vmkernel:
Dec  7 11:43:30 nkuvmhost9 vmkernel: 3:21:34:09.172 cpu8:4531)Need to send MAC Move for Inband Port
Dec  7 11:43:30 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 60 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 70 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 200 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 268 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 274 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 275 from the ltl 18
Dec  7 11:44:02 nkuvmhost9 vmkernel:
Dec  7 11:44:31 nkuvmhost9 vmkernel: 3:21:35:10.172 cpu8:4104)Need to send MAC Move for Inband Port
Dec  7 11:44:31 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 60 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 70 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 200 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 268 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.377 cpu8:4319)Not removing sys vlan 274 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.377 cpu8:4319)Not removing sys vlan 275 from the ltl 18
Dec  7 11:45:02 nkuvmhost9 vmkernel:
Dec  7 11:45:32 nkuvmhost9 vmkernel: 3:21:36:11.172 cpu8:4531)Need to send MAC Move for Inband Port

It seems we can hide (work around) this issue by removing the interface from the port-channel.

Warning: Before you can disable vPC (virtual port channel), you must remove all but one NIC from the distributed switch in ESX.

An example of how to remove an interface from a port-channel:
  • > conf
  • > interface ethernet 6/3
  • > no channel-group 21 mode active
But why does it fail in the first place when it's the only member of the port-channel?

Post a comment if you have any idea why, or reply to my thread at http://communities.vmware.com/message/1661219#1661219.


2 comments:

  1. Hi, this is Chris J., the network guy. With Cisco's help, we discovered the problem.

    Cue the short version:
    If you are going to put the management VMNIC for the VEMs on an LACP port-channel, you must have at least 2 ports assigned to the 1000V from each host (for why, see the long version). If you fall below this, LACP will not come up, and the Nexus 1000V will not transmit the LACP packet even though it says it does.

    The Long Version:
    I am not 100% sure why this happens, but here is some of the behavior we see. If you capture outbound packets on the port in question, you never see the LACP packet transmitted. When we watch a VM host come up and the VM gets started, we see both links come up on the Nexus 5000s in individual mode. That is expected for a link in an LACP port-channel during the window after connectivity returns and the port goes from down to up, but before the port receives its first LACP packet (an LACPDU, if you want to be technically correct). Once a port receives its first LACP packet, it goes down and comes back up in port-channeled mode. With the 1000V this is staggered, so my supposition is that the 1000V uses the other port to get its configuration while the first port converts from non-VEM mode (using only system VLANs) to VEM mode (using the config from the 1000V).

    Our solution was to move the management VMNIC for the VEM (the one on the VLAN whose port profile has capability l3control set) to an additional NIC whose uplink port profile trunks only that VLAN. This is not a fully redundant config, because if that port fails, the Nexus 1000V fails on that one host. The only solution I have for that is to put in two 2-port 10 Gig cards, with 2 ports going to one Nexus 5000 and 2 going to the other, so that if you ever enter a failed state you always have 2 links up in your LACP port-channel.
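Chris's rule of thumb (at least 2 LACP members per host, even after a failure) is easy to check against a planned cabling layout. Here's a rough Python sketch that only counts links: it assumes each uplink is a (vmnic, upstream switch) pair and asks how many links survive if any single upstream switch dies. It does not model the staggered VEM bring-up itself. The function names, vmnic numbers and switch names (`n5k-a`, `n5k-b`) are all hypothetical.

```python
from collections import Counter

# From the comment above: with fewer than 2 LACP members from a host,
# the port-channel (and the VEM management traffic on it) never comes up.
MIN_LACP_LINKS = 2


def worst_case_links(uplinks):
    """uplinks: list of (vmnic, upstream_switch) pairs in the host's LACP
    port-channel. Returns the links left if any single upstream switch fails."""
    if not uplinks:
        return 0
    per_switch = Counter(switch for _, switch in uplinks)
    return len(uplinks) - max(per_switch.values())


def assess(uplinks):
    """Check the 2-link minimum both in normal operation and after losing a switch."""
    return {
        "steady_ok": len(uplinks) >= MIN_LACP_LINKS,
        "survives_switch_loss": worst_case_links(uplinks) >= MIN_LACP_LINKS,
    }


# Hypothetical layouts; names are made up for illustration.
LAYOUTS = {
    "single 10G NIC (migration host)": [("vmnic3", "n5k-a")],
    "one dual-port card": [("vmnic2", "n5k-a"), ("vmnic3", "n5k-b")],
    "two dual-port cards": [
        ("vmnic2", "n5k-a"), ("vmnic3", "n5k-b"),
        ("vmnic4", "n5k-a"), ("vmnic5", "n5k-b"),
    ],
}

for name, uplinks in LAYOUTS.items():
    print(name, assess(uplinks))
```

The single-NIC migration host fails even in normal operation; one dual-port card split across the two Nexus 5000s is fine until a switch fails (leaving 1 link); two dual-port cards, 2 ports to each 5000, still leave 2 links after a switch failure.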

  2. Our current working configs:
    Uplink for all normal VM traffic to the Nexus 5000s:

    1000V side:
    port-profile type ethernet Uplink-VM-Internal
      vmware port-group
      switchport mode trunk
      switchport trunk native vlan 1
      switchport trunk allowed vlan 1-59,61-3967,4048-4093
      channel-group auto mode active
      no shutdown
      system vlan 1,70,200,268,274-275
      state enabled

    Nexus 5000 side:
    Port config:
    interface Ethernet1/21
      description B11-1
      switchport mode trunk
      switchport trunk allowed vlan 1-59,61-3967,4048-4093
      spanning-tree port type edge trunk
      spanning-tree bpduguard enable
      spanning-tree bpdufilter enable
      channel-group 21 mode active

    Port-channel config:
    interface port-channel21
      switchport mode trunk
      vpc 21
      switchport trunk allowed vlan 1-59,61-3967,4048-4093
      spanning-tree port type edge trunk
      spanning-tree bpduguard enable
      spanning-tree bpdufilter enable
      speed 10000

    interface port-channel21
      switchport mode trunk
      vpc 21
      spanning-tree port type edge trunk
      spanning-tree bpduguard enable
      spanning-tree bpdufilter enable
      speed 10000

    All of the VLANs above are allowed on the trunk; the system VLANs are for our storage and the other VMNICs on the machine. This profile is designed to connect to an LACP port-channel on any other switch; in this case it connects to an LACP vPC on two Nexus 5000s. VLAN 60 is our L3Control VLAN, so it is left out.

    Now for our control ports:
    1000V:
    Uplink profile (in our case, connecting to a Cisco 3750):
    port-profile type ethernet Uplink-Nexus-1000V
      vmware port-group
      switchport mode trunk
      switchport trunk native vlan 1
      switchport trunk allowed vlan 60
      no shutdown
      system vlan 60
      state enabled

    Port profile for VLAN 60:
    port-profile type vethernet Networking
      capability l3control
      vmware port-group
      vmware max-ports 256
      switchport mode access
      switchport access vlan 60
      no shutdown
      system vlan 60
      state enabled

    3750 port config:
    interface FastEthernet1/0/14
      description B4-18
      switchport trunk encapsulation dot1q
      switchport trunk allowed vlan 60
      switchport mode trunk
      spanning-tree portfast trunk
      spanning-tree bpdufilter enable
      spanning-tree bpduguard enable
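With this many VLAN ranges spread across four configs, it's easy to let VLAN 60 creep back onto the VM uplink or let the two ends of a trunk drift apart. Here's a rough Python sketch that expands the NX-OS style VLAN lists copied from the configs above and checks them against the rules described in the comments: VLAN 60 (L3Control) stays off the VM uplink and is the only VLAN on the control uplink at both ends, and the 1000V and Nexus 5000 allowed lists match. The extra check that every system VLAN is also in the trunk's allowed list is my own assumption of a sensible sanity rule, not something quoted from Cisco's docs. `expand_vlans` and `check` are hypothetical helper names.

```python
def expand_vlans(spec):
    """Expand an NX-OS style VLAN list such as '1-59,61-3967' into a set of IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            vlans.update(range(start, end + 1))
        else:
            vlans.add(int(part))
    return vlans


# VLAN lists copied from the configs above.
L3CONTROL_VLAN = 60
VM_UPLINK_ALLOWED = expand_vlans("1-59,61-3967,4048-4093")  # Uplink-VM-Internal
VM_UPLINK_SYSTEM = expand_vlans("1,70,200,268,274-275")
N5K_TRUNK_ALLOWED = expand_vlans("1-59,61-3967,4048-4093")  # Ethernet1/21, port-channel21
CTRL_UPLINK_ALLOWED = expand_vlans("60")                    # Uplink-Nexus-1000V
CTRL_UPLINK_SYSTEM = expand_vlans("60")
SW3750_ALLOWED = expand_vlans("60")                         # FastEthernet1/0/14


def check():
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if L3CONTROL_VLAN in VM_UPLINK_ALLOWED:
        problems.append("L3Control VLAN is also carried on the VM uplink")
    if not VM_UPLINK_SYSTEM <= VM_UPLINK_ALLOWED:  # assumed sanity rule
        problems.append("VM uplink has system VLANs it does not trunk")
    if VM_UPLINK_ALLOWED != N5K_TRUNK_ALLOWED:
        problems.append("1000V and Nexus 5000 allowed-VLAN lists differ")
    if CTRL_UPLINK_ALLOWED != {L3CONTROL_VLAN} or SW3750_ALLOWED != CTRL_UPLINK_ALLOWED:
        problems.append("control uplink should carry only the L3Control VLAN on both ends")
    if L3CONTROL_VLAN not in CTRL_UPLINK_SYSTEM:
        problems.append("L3Control VLAN is not a system VLAN on the control uplink")
    return problems


print(check() or "VLAN lists are consistent")
```

For the configs as posted, every check passes. The VM uplink trunks 4,012 VLANs (59 + 3,907 + 46), with VLAN 60 excluded.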


Please leave a comment; someone, anyone!