Cisco VSS: Failure scenarios
In the last article, I explained how to configure the Cisco 6500 in a VSS configuration, but how does the VSS react during a failure? There are three possible scenarios:
- Link failure within a multichassis Cisco EtherChannel link
- Active supervisor engine failure
- VSL failure
Scenario #1: Link failure within a multichassis Cisco EtherChannel link
Availability is not affected for data flows that do not use the failed link. For traffic flows that do use the failed link, the disruption lasts for the time it takes to detect the link failure and reprogram the indices within the system.
When all the links connected to one Cisco 6500 have failed (in this case there is only one link for each 6500), the port bundle is converted from a multichassis Cisco EtherChannel link to a standard Cisco EtherChannel link and is treated as a single-homed port.
Remember: The supervisor engine on the active virtual switch is also responsible for programming the hardware forwarding information onto all the distributed forwarding cards across the entire Cisco Virtual Switching System. It also programs the policy feature card on the standby virtual switch supervisor engine. For these reasons, both the active and the hot-standby supervisor engine PFCs are active, and are used to perform packet lookups for centralized lookups on each chassis.
As a result, a packet that reaches the standby virtual switch is handled in one of two ways:
- If a packet is software switched, the packet is sent to the active virtual switch through the VSL.
- If a packet is hardware switched, the packet is managed by the standby virtual switch.
Scenario #2: Active supervisor engine failure
The standby supervisor engine can detect the failure of the active supervisor engine using one of the following methods:
- VSL Protocol (VSLP)
- Cisco Generic Online Diagnostics (GOLD) failure event
- Full VSL link down
Upon detecting the failure of the active supervisor, the hot-standby supervisor engine performs an SSO switchover and assumes the role of the active supervisor.
During the transition, there is a disruption to the traffic that must transition away from the failed chassis. The duration of traffic disruption is determined by the time required to transition the role of the hot-standby supervisor engine to the active supervisor engine, and for the neighboring device to modify its path selection to the newly active chassis.
Scenario #3: VSL failure
The failure of a single VSL link is discovered by the active supervisor engine, either through a link-down event or through the failure of periodic VSLP messages sent across the link to check the VSL link state. Availability is not affected for those data flows that do not use the VSL.
The active supervisor engine discovers the failure of the entire VSL either through a link-down event or through the failure of the periodic VSLP messages sent across the member links to check the VSL link status. From the perspective of the active virtual switch chassis, the standby virtual switch is lost. The standby virtual switch chassis likewise sees the active virtual switch chassis as failed and transitions to the active virtual switch state through an SSO switchover. This is known as a dual-active scenario: both chassis are now active with the same configuration, which can have adverse effects on the network topology and traffic.
To avoid this disruptive scenario, you should configure one of these dual-active detection methods:
- Enhanced PAgP
- Layer 3 BFD
- Fast Hello
In this example, the Fast Hello method is used.
Upon detecting the dual-active condition, the original active chassis enters into recovery mode and brings down all of its interfaces except the VSL and nominated management interfaces. This effectively removes the device from the network.
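The recovery-mode rule described above (shut everything except the VSL and the nominated management interfaces) can be sketched in a few lines of Python. This is only an illustrative model, not how the supervisor implements it: the function name and the interface inventory are hypothetical, and only Te1/5/5 is taken from the logs in this article.

```python
def recovery_mode_shutdown(interfaces, vsl_members, excluded):
    """Return the interfaces the old active chassis would bring down.

    Illustrative model only: everything is shut except the VSL member
    links and the interfaces explicitly nominated to stay up.
    """
    keep = set(vsl_members) | set(excluded)
    return [name for name in interfaces if name not in keep]


# Hypothetical inventory for switch 1 (Te1/5/5 is the VSL link seen in the logs;
# Gi1/3/48 stands in for a nominated management interface).
interfaces = ["Te1/5/5", "Gi1/1/1", "Gi1/1/2", "Gi1/3/48"]
shut = recovery_mode_shutdown(
    interfaces,
    vsl_members=["Te1/5/5"],
    excluded=["Gi1/3/48"],
)
print(shut)  # ['Gi1/1/1', 'Gi1/1/2']
```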
You will see the following messages on the active virtual switch to indicate that a dual-active scenario has occurred:
CiscozineVSS#
Jan 23 11:57:37.647: %VSLP-SW1_SP-3-VSLP_LMP_FAIL_REASON: Te1/5/5: Link down
Jan 23 11:57:37.647: %VSLP-SW1_SP-2-VSL_DOWN: Last VSL interface Te1/5/5 went down
Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
Jan 23 11:57:37.799: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/5/5, changed state to down
Jan 23 11:57:37.803: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface Port-channel1, changed state to down
Jan 23 11:57:37.803: %LINK-SW1_SP-3-UPDOWN: Interface Port-channel1, changed state to down
Jan 23 11:57:37.807: %LINK-SW1_SP-3-UPDOWN: Interface TenGigabitEthernet1/5/5, changed state to down
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/2/1 detected dual-active condition
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-RECOVERY: Dual-active condition detected: Starting recovery-mode, all non-VSL and non-excluded interfaces have been shut down
CiscozineVSS(recovery-mode)#
The following messages on the standby virtual switch console indicate that a dual-active scenario has occurred:
CiscozineVSS-sdby#
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-3-VSLP_LMP_FAIL_REASON: Te2/5/5: Link down
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-2-VSL_DOWN: Last VSL interface Te2/5/5 went down
Jan 23 11:57:37.651: %VSLP-SW2_SPSTBY-2-VSL_DOWN: All VSL links went down while switch is in Standby role
Jan 23 11:57:37.651: %DUAL_ACTIVE-SW2_SPSTBY-1-VSL_DOWN: VSL is down - switchover, or possible dual-active situation has occurred
Jan 23 11:57:37.651: %PFREDUN-SW2_SPSTBY-6-ACTIVE: Initializing as Virtual Switch ACTIVE processor
Jan 23 11:57:39.559: %LINK-3-UPDOWN: Interface TenGigabitEthernet2/5/5, changed state to down
Jan 23 11:57:39.559: %LINEPROTO-SW2_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet2/5/5, changed state to down
Jan 23 11:57:40.579: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 1 - Module Type LINE_CARD removed
Jan 23 11:57:40.899: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 2 - Module Type LINE_CARD removed
Jan 23 11:57:40.991: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 3 - Module Type LINE_CARD removed
Jan 23 11:57:41.107: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 5 - Module Type LINE_CARD removed
Jan 23 11:58:00.335: %VSLP-SW2_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
CiscozineVSS#
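If you collect these console messages centrally, a small script can pick out the dual-active events. The Python sketch below assumes one message per line and assumes the message layout seen in this article (timestamp, then %FACILITY, an optional switch/processor tag such as SW1_SP, severity and mnemonic). The regular expression and the function names are my own, not something provided by Cisco.

```python
import re

# Layout assumed from the messages in this article:
#   <timestamp>: %FACILITY[-SWn_PROC]-SEVERITY-MNEMONIC: <text>
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ [\d:.]+): "
    r"%(?P<facility>[A-Z0-9_]+)-(?:(?P<origin>SW\d+_[A-Z]+)-)?"
    r"(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)$"
)


def parse(line):
    """Split one console message into its fields, or return None."""
    match = SYSLOG.match(line.strip())
    return match.groupdict() if match else None


def dual_active_events(lines):
    """Keep only the messages logged by the DUAL_ACTIVE facility."""
    parsed = (parse(line) for line in lines)
    return [msg for msg in parsed if msg and msg["facility"] == "DUAL_ACTIVE"]


sample = """\
Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/2/1 detected dual-active condition
Jan 23 11:57:39.559: %LINK-3-UPDOWN: Interface TenGigabitEthernet2/5/5, changed state to down
""".splitlines()

for event in dual_active_events(sample):
    print(event["timestamp"], event["origin"], event["mnemonic"])
```

Messages without a switch/processor tag (like the %LINK-3-UPDOWN line) still parse; their origin field is simply None.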
This is confirmed by the show command:
CiscozineVSS#show switch virtual redundancy
My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = active unit removed
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Switch 2 Slot 5 Processor Information :
-----------------------------------------------
Current Software state = ACTIVE
Uptime in current state = 0 minutes
Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY, RELEASE SOFTWARE (fc4)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Wed 04-Sep-13 13:05 by prod_rel_team
BOOT = bootdisk:s72033-adventerprisek9-mz.151-2.SY.bin,12;
Configuration register = 0x2102
Fabric State = ACTIVE
Control Plane State = ACTIVE
Peer information is not available because it is in 'DISABLED' state
CiscozineVSS#
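To check this state from a script rather than by eye, the "name = value" lines of the show output can be loaded into a dictionary. This is a minimal Python sketch that assumes the output has already been captured as text with one field per line; the function name is hypothetical and the sample is a shortened copy of the output above.

```python
def parse_redundancy(output):
    """Collect the 'name = value' lines of the show output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not 'name = value' pairs
            fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the output shown in this article.
sample = """\
My Switch Id = 2
Peer Switch Id = 1
Last switchover reason = active unit removed
Switch 2 Slot 5 Processor Information :
Current Software state = ACTIVE
Control Plane State = ACTIVE
"""

info = parse_redundancy(sample)
took_over = info["My Switch Id"] == "2" and info["Current Software state"] == "ACTIVE"
print(info["Last switchover reason"], took_over)
```

Here took_over is true because switch 2, the former standby, now reports itself as ACTIVE.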
When the VSL is restored, the following messages are displayed on the console and the switch in recovery mode (previous active virtual switch) reloads:
Jan 26 13:23:34.877: %DUALACTIVE-1-VSL_RECOVERED: VSL has recovered during dual-active situation: Reloading switch 1
Jan 26 13:23:34.909: %SYS-5-RELOAD: Reload requested Reload Reason: Reload Command.
After the reload, the VSS is recovered; the control plane remains active on the previous standby virtual switch. To force a switchover use the command: