Jan 29, 2015

Cisco VSS: Failure scenarios

In the last article, I explained how to configure the Cisco 6500 in a VSS configuration, but how does the VSS react during a failure? There are three possible scenarios:

  1. Link failure within a multichassis Cisco EtherChannel link
  2. Active supervisor engine failure
  3. VSL failure

Scenario #1: Link failure within a multichassis Cisco EtherChannel link

Availability is not affected for those data flows that do not use the failed link. For those traffic flows that use the failed link, the effect consists of the time it takes to detect the link failure and reprogram the indices within the system.

VSS-link-fail-1

When all links between the neighbor and one Cisco 6500 chassis fail (in this example there is only one link to each 6500), the port bundle is converted from a multichassis Cisco EtherChannel link to a standard Cisco EtherChannel link and is treated as a single-homed port.

VSS-link-fail-2

Remember: The supervisor engine on the active virtual switch is also responsible for programming the hardware forwarding information onto all the distributed forwarding cards across the entire Cisco Virtual Switching System. It also programs the policy feature card on the standby virtual switch supervisor engine. For these reasons, both the active and the hot-standby supervisor engine PFCs are active, and are used to perform packet lookups for centralized lookups on each chassis.

As a result, a packet that reaches the standby virtual switch is handled in one of two ways:

  • If a packet is software switched, the packet is sent to the active virtual switch through the VSL.
  • If a packet is hardware switched, the packet is managed by the standby virtual switch.
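The two cases above can be written as a small decision function. This is only an illustrative model of the behaviour described in this article, not Cisco code; the names `Chassis` and `handling_chassis` are invented for the example.

```python
from enum import Enum


class Chassis(Enum):
    ACTIVE = "active virtual switch"
    STANDBY = "standby virtual switch"


def handling_chassis(ingress: Chassis, software_switched: bool) -> Chassis:
    """Return the chassis that processes a packet (illustrative model only)."""
    # Software switching is done by the control plane, which runs only on
    # the active supervisor, so a packet received on the standby chassis
    # is sent across the VSL to the active virtual switch.
    if software_switched:
        return Chassis.ACTIVE
    # Hardware-switched packets are looked up and forwarded by the
    # chassis on which they arrived.
    return ingress
```

For example, `handling_chassis(Chassis.STANDBY, software_switched=False)` returns `Chassis.STANDBY`: the packet never has to cross the VSL just to be looked up.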

 

Scenario #2: Active supervisor engine failure

The standby supervisor engine can detect the failure of the active supervisor engine using one of the following methods:

  • VSL Protocol (VSLP)
  • Cisco Generic Online Diagnostics (GOLD) failure event
  • Full VSL link down


VSS-Supervisor-Failure-1

Upon detecting the failure of the active supervisor, the hot-standby supervisor engine performs an SSO switchover and assumes the role of the active supervisor.

VSS-Supervisor-Failure-2

During the transition, there is a disruption to the traffic that must transition away from the failed chassis. The duration of traffic disruption is determined by the time required to transition the role of the hot-standby supervisor engine to the active supervisor engine, and for the neighboring device to modify its path selection to the newly active chassis.

 

Scenario #3: VSL failure

The failure of a single VSL link is discovered by the active supervisor engine, either through a link-down event or through the failure of periodic VSLP messages sent across the link to check the VSL link state. Availability is not affected for those data flows that do not use the VSL.

VSS-1-fault

The active supervisor engine discovers the failure of the entire VSL either through a link-down event or through the failure of the periodic VSLP messages sent across the member links to check the VSL link status. From the perspective of the active virtual switch chassis, the standby virtual switch is lost. The standby virtual switch chassis also views the active virtual switch chassis as failed and transitions to the active virtual switch state through an SSO switchover. This is known as a dual-active scenario: both chassis are now active with the same configuration, which can have adverse effects on the network topology and traffic.

To avoid this disruptive scenario, you should configure one of these dual-active detection methods:

  • Enhanced PAgP
  • Layer 3 BFD
  • Fast Hello

In this example, the fast hello method is used.

VSS-2-fault

Upon detecting the dual-active condition, the original active chassis enters into recovery mode and brings down all of its interfaces except the VSL and nominated management interfaces. This effectively removes the device from the network.

VSS-recovery-1
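The role changes in this scenario can be summarised in a few lines of Python. This is a deliberately simplified sketch of the behaviour described above, not how the supervisors implement it: detection is treated as instantaneous, only the fast hello method is modelled, and the names `VirtualSwitch`, `vsl_down` and `fast_hello_up` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualSwitch:
    switch_id: int
    role: str  # "active", "standby" or "recovery"


def vsl_down(active: VirtualSwitch, standby: VirtualSwitch,
             fast_hello_up: bool) -> None:
    """Apply the role changes that follow a failure of the entire VSL."""
    # The standby can no longer see the active over the VSL, assumes it
    # has failed and takes over through an SSO switchover.
    standby.role = "active"
    # With no detection method both chassis would now stay active.
    # With a working fast hello link, the original active learns that
    # its peer is also active and enters recovery mode instead.
    if fast_hello_up:
        active.role = "recovery"
```

Calling `vsl_down` with `fast_hello_up=False` leaves both objects in the `"active"` role, which is exactly the dual-active condition the detection link is there to prevent.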

You will see the following messages on the active virtual switch to indicate that a dual-active scenario has occurred:

CiscozineVSS#
Jan 23 11:57:37.647: %VSLP-SW1_SP-3-VSLP_LMP_FAIL_REASON: Te1/5/5: Link down
Jan 23 11:57:37.647: %VSLP-SW1_SP-2-VSL_DOWN: Last VSL interface Te1/5/5 went down
Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role
Jan 23 11:57:37.799: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/5/5, changed state to down
Jan 23 11:57:37.803: %LINEPROTO-SW1_SP-5-UPDOWN: Line protocol on Interface Port-channel1, changed state to down
Jan 23 11:57:37.803: %LINK-SW1_SP-3-UPDOWN: Interface Port-channel1, changed state to down
Jan 23 11:57:37.807: %LINK-SW1_SP-3-UPDOWN: Interface TenGigabitEthernet1/5/5, changed state to down
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/2/1 detected dual-active condition
Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-RECOVERY: Dual-active condition detected: Starting recovery-mode, all non-VSL and non-excluded interfaces have been shut down
CiscozineVSS(recovery-mode)#
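If these messages are sent to a syslog server, the dual-active events can be picked out automatically. The sketch below assumes the message tags look like the ones shown above (facility, optional switch/processor field such as `SW1_SP`, severity, mnemonic); the regular expression and the function name `dual_active_events` are my own and have only been checked against the log lines in this article.

```python
import re

# Matches tags such as "%DUAL_ACTIVE-SW1_SP-1-DETECTION:" and also tags
# without the switch/processor field, such as "%LINK-3-UPDOWN:".
TAG = re.compile(
    r"%(?P<facility>[A-Z_]+)"
    r"(?:-(?P<sub>SW\d+_\w+))?"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z_]+):"
)


def dual_active_events(lines):
    """Yield (mnemonic, message text) for every DUAL_ACTIVE log line."""
    for line in lines:
        match = TAG.search(line)
        if match and match.group("facility") == "DUAL_ACTIVE":
            yield match.group("mnemonic"), line[match.end():].strip()


sample = [
    "Jan 23 11:57:37.735: %VSLP-SW1_SP-2-VSL_DOWN: All VSL links went down while switch is in ACTIVE role",
    "Jan 23 11:57:37.875: %DUAL_ACTIVE-SW1_SP-1-DETECTION: Fast-hello running on Gi1/2/1 detected dual-active condition",
]
events = list(dual_active_events(sample))
```

With the two sample lines, `events` contains a single entry whose mnemonic is `DETECTION`; the `VSL_DOWN` line is skipped because its facility is `VSLP`.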

 

The following messages on the standby virtual switch console indicate that a dual-active scenario has occurred:

CiscozineVSS-sdby#
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-3-VSLP_LMP_FAIL_REASON: Te2/5/5: Link down
Jan 23 11:57:37.647: %VSLP-SW2_SPSTBY-2-VSL_DOWN:   Last VSL interface Te2/5/5 went down
Jan 23 11:57:37.651: %VSLP-SW2_SPSTBY-2-VSL_DOWN:   All VSL links went down while switch is in Standby role
Jan 23 11:57:37.651: %DUAL_ACTIVE-SW2_SPSTBY-1-VSL_DOWN: VSL is down - switchover, or possible dual-active situation has occurred
Jan 23 11:57:37.651: %PFREDUN-SW2_SPSTBY-6-ACTIVE: Initializing as Virtual Switch ACTIVE processor
Jan 23 11:57:39.559: %LINK-3-UPDOWN: Interface TenGigabitEthernet2/5/5, changed state to down
Jan 23 11:57:39.559: %LINEPROTO-SW2_SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet2/5/5, changed state to down
Jan 23 11:57:40.579: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 1 - Module Type LINE_CARD  removed 
Jan 23 11:57:40.899: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 2 - Module Type LINE_CARD  removed 
Jan 23 11:57:40.991: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 3 - Module Type LINE_CARD  removed 
Jan 23 11:57:41.107: %OIR-SW2_SP-6-INSREM: Switch 1 Physical Slot 5 - Module Type LINE_CARD  removed 
Jan 23 11:58:00.335: %VSLP-SW2_SP-2-VSL_DOWN:   All VSL links went down while switch is in ACTIVE role
CiscozineVSS#

 

This is confirmed by the show command:

CiscozineVSS#show switch virtual redundancy 
                  My Switch Id = 2
                Peer Switch Id = 1
        Last switchover reason = active unit removed
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso

Switch 2 Slot 5 Processor Information :
-----------------------------------------------
        Current Software state = ACTIVE
       Uptime in current state = 0 minutes
                 Image Version = Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9-M), Version 15.1(2)SY, RELEASE SOFTWARE (fc4)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Wed 04-Sep-13 13:05 by prod_rel_team
                          BOOT = bootdisk:s72033-adventerprisek9-mz.151-2.SY.bin,12;
        Configuration register = 0x2102
                  Fabric State = ACTIVE
           Control Plane State = ACTIVE

Peer information is not available because 
it is in 'DISABLED' state
CiscozineVSS#

 

When the VSL is restored, the following messages are displayed on the console and the switch in recovery mode (previous active virtual switch) reloads:

Jan 26 13:23:34.877: %DUALACTIVE-1-VSL_RECOVERED: VSL has recovered during dual-active situation: Reloading switch 1
Jan 26 13:23:34.909: %SYS-5-RELOAD: Reload requested Reload Reason: Reload Command.

VSS-recovery-2

After the reload, the VSS is recovered; the control plane remains active on the previous standby virtual switch. To return the active role to the reloaded chassis, force a switchover with the command:

redundancy force-switchover

 

  • Ram

    Good information! Appreciated…