One of the thousand problems I've been working on actually came from the host we use to migrate machines on and off the Nexus switches. We have some hosts that use the regular VMware Distributed Virtual Switch (DVS), and we have test hosts that use the Nexus 1000V distributed virtual switch.
- All our VMware ESX hosts use dual-port 10 Gig FCoE Emulex cards set up as trunks.
- All hosts also have two 100 Mb NICs.
- One is for the lights-out management (Dell Remote Management Cards and a really crappy IBM knock-off).
- One is for the service console, which sits on the normal virtual switch rather than the distributed virtual switch. The service console isn't on the distributed virtual switch because we've had too many problems managing the hosts when the Emulex cards and/or the Nexus fail.
One host has a single 10G NIC on the regular VMware distributed virtual switch and its other 10G NIC on the Nexus 1000V distributed virtual switch. This host is for migrating machines on and off the Nexus hosts. As a result, the Nexus 10G NIC interface was set up in a port-channel by itself. The result was that the interface kept resetting every minute: it would stay up for about 30 seconds, then go offline for about 30 seconds.
Here is how this looked in the logs.
Nexus 1000 Command: show logging last 100
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:12:40 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:12:40 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:13:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:13:08 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:13:44 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:13:44 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-SPEED: Interface port-channel4, operational speed changed to 10 Gbps
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DUPLEX: Interface port-channel4, operational duplex mode changed to Full
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel4, operational Receive Flow Contol state changed to on
2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel4, operational Transmit Flow Contol state changed to on
2010 Dec 7 12:14:41 ac02-ns1-01 %ETH_PORT_CHANNEL-4-PORT_INDIVIDUAL: port Ethernet6/3 is operationally individual
2010 Dec 7 12:14:41 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk
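To put numbers on the flapping, here is a minimal Python sketch that pulls the up/down events for one interface out of log lines like the ones above and measures the time between state changes. The function name flap_intervals and the regular expression are my own illustration, not a Cisco tool, and the pattern assumes the exact message layout shown in this log.

```python
import re
from datetime import datetime

# Matches only the two ETHPORT messages that mark a state change.
# Assumes the "YYYY Mon D HH:MM:SS host %FACILITY..." layout seen above.
LINE_RE = re.compile(
    r"^(\d{4} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"%ETHPORT-5-(IF_DOWN_INITIALIZING|IF_UP): Interface (\S+) "
)

# Up/down lines for Ethernet6/3 copied from the log excerpt above.
SAMPLE_LOG = [
    "2010 Dec 7 12:12:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)",
    "2010 Dec 7 12:12:40 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk",
    "2010 Dec 7 12:13:07 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)",
    "2010 Dec 7 12:13:44 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk",
    "2010 Dec 7 12:14:08 ac02-ns1-01 %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet6/3 is down (Initializing)",
    "2010 Dec 7 12:14:41 ac02-ns1-01 %ETHPORT-5-IF_UP: Interface Ethernet6/3 is up in mode trunk",
]


def flap_intervals(lines, interface):
    """Return (state, seconds) pairs: how long the interface stayed in each state."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == interface:
            ts = datetime.strptime(m.group(1), "%Y %b %d %H:%M:%S")
            events.append((ts, "up" if m.group(2) == "IF_UP" else "down"))
    intervals = []
    # Pair each event with the next one; the gap is the time spent in the first state.
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        if s0 != s1:
            intervals.append((s0, int((t1 - t0).total_seconds())))
    return intervals


for state, seconds in flap_intervals(SAMPLE_LOG, "Ethernet6/3"):
    print(f"{state:>4} for {seconds}s")
```

On the excerpt above this gives roughly 33 to 37 seconds down and 24 to 27 seconds up per cycle, which matches the "30 seconds up, 30 seconds down" behavior.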
ESX Command: tail /var/log/vmkernel -n 30
Dec 7 11:43:02 nkuvmhost9 vmkernel:
Dec 7 11:43:30 nkuvmhost9 vmkernel: 3:21:34:09.172 cpu8:4531)Need to send MAC Move for Inband Port
Dec 7 11:43:30 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 60 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 70 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 200 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 268 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 274 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:02 nkuvmhost9 vmkernel: 3:21:34:41.376 cpu3:4319)Not removing sys vlan 275 from the ltl 18
Dec 7 11:44:02 nkuvmhost9 vmkernel:
Dec 7 11:44:31 nkuvmhost9 vmkernel: 3:21:35:10.172 cpu8:4104)Need to send MAC Move for Inband Port
Dec 7 11:44:31 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 60 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 70 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 200 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.376 cpu8:4319)Not removing sys vlan 268 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.377 cpu8:4319)Not removing sys vlan 274 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:02 nkuvmhost9 vmkernel: 3:21:35:41.377 cpu8:4319)Not removing sys vlan 275 from the ltl 18
Dec 7 11:45:02 nkuvmhost9 vmkernel:
Dec 7 11:45:32 nkuvmhost9 vmkernel: 3:21:36:11.172 cpu8:4531)Need to send MAC Move for Inband Port
It seems we can hide this issue if we remove the interface from the port-channel.
Warning: Before you can disable vPC (virtual port channel), you must remove all but one NIC from the distributed switch in ESX.
An example of how to remove an interface from a port-channel:
- > conf
- > interface ethernet 6/3
- > no channel-group 21 mode active
But why does it fail in the first place when it's the single member of the port-channel?
Post a comment if you have any idea why, or reply to my post at http://communities.vmware.com/message/1661219#1661219.
Hi, this is Chris J., the network guy. With Cisco's help we discovered the problem.
Cue the short version:
If you are going to have the management VMNIC for the VEMs on an LACP port-channel, you need at least 2 ports assigned to the 1000V from a particular host (for why, see the long version). If you fall below this, LACP will not come up, and the Nexus 1000V will not transmit the LACP packet even though it says it does.
The Long Version:
I am not 100% sure why this happens, but here is some of the behavior we see. If you capture outbound packets on the port in question, you will never see the LACP packet transmitted. When I watch a VM host come up and the VM gets started, we see both links come up on the Nexus 5000s in independent mode. That is what a link in an LACP port-channel should do in the window after connectivity returns and the port goes from down to up, but before the port receives its first LACP packet (or LACPDU, if you want to be technically correct). Once a port receives its first LACP packet, it goes down and comes back up in port-channel mode. With the 1000V this is staggered, so my supposition is that the 1000V uses the other port to get its configuration while the first port is converting from the non-VEM mode, which uses only system VLANs, to the VEM mode, which uses the config from the 1000V.
Our solution was to move the management VMNIC for the VEM (the one on the VLAN with the capability l3control command set on its profile) to an additional NIC with only that VLAN on the trunk in the port profile. This is not a fully redundant config, because if that port fails, the Nexus 1000V fails on that one host. The only solution I have to that is to put in two 2-port 10 Gig cards, with 2 ports going to one Nexus 5000 and 2 going to the other, so that if you ever enter a failed state you always have 2 links up in your LACP port-channel.
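The redundancy argument can be checked with a few lines of Python. This is only a sketch of the reasoning, under stated assumptions: the 2-link minimum is the behavior we observed, not a documented Cisco limit, and the NIC and switch names (vmnic0, n5k-a and so on) are made up for illustration.

```python
# Minimum number of live LACP links per host. This is the threshold we
# observed in practice, not a documented Cisco limit.
MIN_LACP_LINKS = 2


def links_after_switch_failure(uplinks, failed_switch):
    """Return the NICs that still have a live upstream switch."""
    return [nic for nic, switch in uplinks.items() if switch != failed_switch]


def survives_any_switch_failure(uplinks):
    """True if losing any single upstream switch still leaves enough links."""
    switches = set(uplinks.values())
    return all(
        len(links_after_switch_failure(uplinks, sw)) >= MIN_LACP_LINKS
        for sw in switches
    )


# Hypothetical layouts: one dual-port card versus two dual-port cards.
one_card = {"vmnic0": "n5k-a", "vmnic1": "n5k-b"}
two_cards = {
    "vmnic0": "n5k-a",
    "vmnic1": "n5k-a",
    "vmnic2": "n5k-b",
    "vmnic3": "n5k-b",
}

print("one dual-port card: ", survives_any_switch_failure(one_card))
print("two dual-port cards:", survives_any_switch_failure(two_cards))
```

With one dual-port card, losing either Nexus 5000 leaves a single link, which is the failing case described above. With two cards split across both switches, two links always remain.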
Our Current Working Configs:
Uplink for all normal VM traffic to the Nexus 5000s:
1000V side:
port-profile type ethernet Uplink-VM-Internal
vmware port-group
switchport mode trunk
switchport trunk native vlan 1
switchport trunk allowed vlan 1-59,61-3967,4048-4093
channel-group auto mode active
no shutdown
system vlan 1,70,200,268,274-275
state enabled
Nexus 5000 side:
Port Config:
interface Ethernet1/21
description B11-1
switchport mode trunk
switchport trunk allowed vlan 1-59,61-3967,4048-4093
spanning-tree port type edge trunk
spanning-tree bpduguard enable
spanning-tree bpdufilter enable
channel-group 21 mode active
Port-Channel Config:
interface port-channel21
switchport mode trunk
vpc 21
switchport trunk allowed vlan 1-59,61-3967,4048-4093
spanning-tree port type edge trunk
spanning-tree bpduguard enable
spanning-tree bpdufilter enable
speed 10000
interface port-channel21
switchport mode trunk
vpc 21
spanning-tree port type edge trunk
spanning-tree bpduguard enable
spanning-tree bpdufilter enable
speed 10000
All of the above VLANs are available; the system VLANs are for our storage and the other VMNICs on the machine. This profile is designed to connect to any other LACP port-channel on any other switch. In this case it connects to an LACP vPC on two Nexus 5000s. VLAN 60 is our L3Control VLAN, so it is left out of the allowed list.
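A quick way to sanity-check VLAN lists like these is to expand them and compare the sets. The sketch below is plain Python, not a Cisco tool; expand_vlans is a name I made up, and the two strings are copied from the Uplink-VM-Internal profile above.

```python
def expand_vlans(spec):
    """Expand a Cisco-style VLAN list such as '1-59,61-3967' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


# Values copied from the Uplink-VM-Internal port-profile.
allowed = expand_vlans("1-59,61-3967,4048-4093")
system = expand_vlans("1,70,200,268,274-275")

# The L3Control VLAN must not ride on this uplink.
print("VLAN 60 allowed on uplink:", 60 in allowed)
# Every system VLAN should also be in the allowed list.
print("system VLANs missing from allowed list:", sorted(system - allowed))
```

The same check can be pointed at the control uplink, where the allowed list and the system VLAN list are both just 60.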
Now for our control ports:
1000V:
Uplink Profile Connecting in our case to a Cisco 3750:
port-profile type ethernet Uplink-Nexus-1000V
vmware port-group
switchport mode trunk
switchport trunk native vlan 1
switchport trunk allowed vlan 60
no shutdown
system vlan 60
state enabled
Port profile for VLAN 60:
port-profile type vethernet Networking
capability l3control
vmware port-group
vmware max-ports 256
switchport mode access
switchport access vlan 60
no shutdown
system vlan 60
state enabled
3750 Port Config:
interface FastEthernet1/0/14
description B4-18
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 60
switchport mode trunk
spanning-tree portfast trunk
spanning-tree bpdufilter enable
spanning-tree bpduguard enable