Leap Second 2015: a critical bug in NX-OS

On June 30, 2015 at 23:59:60 UTC, a leap second will be added, so that minute will have 61 seconds. Leap seconds have to be added every now and then because Earth’s rotation around its own axis is gradually slowing down, although very slowly.
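A 61st second can surprise software because many time libraries cannot represent 23:59:60 at all. The sketch below uses Python's standard `datetime` module (not anything running on a Nexus) to list the seconds of the extended minute and to show that the leap second timestamp is rejected. The helper names `leap_minute_labels` and `accepts_leap_second` are hypothetical, chosen for illustration.

```python
from datetime import datetime


def leap_minute_labels(hour: int, minute: int, leap: bool) -> list[str]:
    """Return the HH:MM:SS labels of one UTC minute.

    With leap=True the minute gets a 61st second, labelled :60.
    """
    last = 60 if leap else 59
    return [f"{hour:02d}:{minute:02d}:{s:02d}" for s in range(last + 1)]


def accepts_leap_second() -> bool:
    """Check whether datetime can hold 2015-06-30 23:59:60 UTC."""
    try:
        datetime(2015, 6, 30, 23, 59, 60)
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False
    return True


if __name__ == "__main__":
    labels = leap_minute_labels(23, 59, leap=True)
    print(len(labels), labels[-2:])  # 61 ['23:59:59', '23:59:60']
    print(accepts_leap_second())     # False
```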

This will be the 26th leap second adjustment since 1972, and it is an important consideration for providers of computing, networking, and software solutions.
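The count of 26 can be checked against the published leap second history. The list below is transcribed from the IERS leap second record (verify it against IERS Bulletin C before relying on it); the helper `tai_minus_utc_after` is hypothetical and simply adds one second to the initial 10 s offset of 1972 for every leap second inserted up to the end of a given day.

```python
# Days (UTC) at whose end a positive leap second was inserted, 1972-2015.
LEAP_SECOND_DATES = [
    "1972-06-30", "1972-12-31", "1973-12-31", "1974-12-31",
    "1975-12-31", "1976-12-31", "1977-12-31", "1978-12-31",
    "1979-12-31", "1981-06-30", "1982-06-30", "1983-06-30",
    "1985-06-30", "1987-12-31", "1989-12-31", "1990-12-31",
    "1992-06-30", "1993-06-30", "1994-06-30", "1995-12-31",
    "1997-06-30", "1998-12-31", "2005-12-31", "2008-12-31",
    "2012-06-30", "2015-06-30",
]

INITIAL_TAI_MINUS_UTC = 10  # seconds, in effect from 1972-01-01


def tai_minus_utc_after(day: str) -> int:
    """TAI - UTC (seconds) at the end of an ISO date such as '2015-06-30'.

    ISO 'YYYY-MM-DD' strings sort chronologically, so plain string
    comparison is enough here.
    """
    return INITIAL_TAI_MINUS_UTC + sum(1 for d in LEAP_SECOND_DATES if d <= day)


if __name__ == "__main__":
    print(len(LEAP_SECOND_DATES))              # 26
    print(tai_minus_utc_after("2015-06-30"))   # 36
```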

When the leap second update occurs, several unexpected behaviors can happen on Nexus devices:

N5K (CSCub38654)

NX-OS 5.0, 5.1 and 5.2 releases for the N5K run an affected version of the Linux kernel. When the leap second update occurs, a Nexus 5010/5500 running affected code could experience a lockup condition.
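To triage an inventory, a first pass could flag switches whose NX-OS version string starts with one of the release trains named above. This is a minimal sketch: the helper `in_affected_train` is hypothetical, it assumes N5K-style version strings such as `5.2(1)N1(4)`, and it only inspects the leading major.minor, so it does not know which releases inside a train contain the fix (check the bug record CSCub38654 for that).

```python
import re

# Release trains the post names as running the affected kernel.
AFFECTED_TRAINS = {(5, 0), (5, 1), (5, 2)}

# Capture the leading "major.minor" of a version such as "5.2(1)N1(4)".
_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)")


def in_affected_train(version: str) -> bool:
    """Return True if the version belongs to the 5.0, 5.1 or 5.2 train.

    Fixed releases within a train are NOT recognised.
    """
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(m.group(1)), int(m.group(2))) in AFFECTED_TRAINS


if __name__ == "__main__":
    for v in ("5.2(1)N1(4)", "5.0(3)N2(2)", "6.0(2)N2(1)"):
        print(v, in_affected_train(v))
```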

If the N5K is stuck in a hung state, neither mgmt0 nor in-band management will be possible. Messages such as the following will be seen on the console:

Setting: rc 5 BUG: spinlock lockup on CPU#0, adj_x86s.bin/5926, lock=c793a6e0 pc=c013c560

SMI: System watchdog timed out
NXOS_WATCHDOG: ffff [#1] SMP
Cpu: 0 Watchdog timer NMI occurred (0)
SMI: System watchdog timed out
SMI: System watchdog timed out

If the switch is hung in this state, it must be manually power cycled to recover.
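If console output is captured as text (for example by a terminal server), a simple substring check for the messages quoted above can help decide quickly whether a switch needs a power cycle. This is a heuristic sketch, not a diagnosis: the helper `hang_signatures_found` is hypothetical, and it only matches the exact strings shown in the console excerpt.

```python
# Console signatures quoted in the post for the N5K leap second lockup.
HANG_SIGNATURES = (
    "BUG: spinlock lockup",
    "SMI: System watchdog timed out",
    "Watchdog timer NMI occurred",
)


def hang_signatures_found(console_text: str) -> list[str]:
    """Return the known lockup signatures present in captured console text."""
    return [sig for sig in HANG_SIGNATURES if sig in console_text]


if __name__ == "__main__":
    sample = (
        "Setting: rc 5 BUG: spinlock lockup on CPU#0, adj_x86s.bin/5926\n"
        "SMI: System watchdog timed out\n"
        "Cpu: 0 Watchdog timer NMI occurred (0)\n"
    )
    print(hang_signatures_found(sample))
```

A non-empty result suggests the switch matches the lockup described above; an empty result does not prove the switch is healthy.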

 

N7K (CSCua77416)

When the leap second update occurs, the kernel on an N7K SUP1 could hit what is known as a “livelock” condition, and a supervisor reload or switchover may occur.

 

Nexus 1000 (CSCus80369)

When the leap second update occurs, the following could be observed:

  1. The active Nexus 1000 VSM resets. The standby VSM switches over and becomes active, and the previously active VSM comes back up as standby. All VEMs stay in the same state and there is no traffic disruption.
  2. If no standby VSM is present, the single active VSM resets and comes back up as active, and the VEMs flap. There is no traffic disruption during this time.
  3. No core files are generated for the failed VSM.
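The two cases above can be summarized as a small decision function. This is only a toy model of the behavior described in the post, not code that talks to a real VSM; the names `VsmOutcome` and `leap_second_outcome` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VsmOutcome:
    active_after: str             # which VSM is active once things settle
    standby_after: Optional[str]  # which VSM is standby, if any
    vems_flap: bool
    traffic_disrupted: bool


def leap_second_outcome(has_standby: bool) -> VsmOutcome:
    """Expected Nexus 1000 VSM behavior at the leap second, per the post."""
    if has_standby:
        # Active VSM resets, standby takes over, old active returns as standby.
        return VsmOutcome("former standby", "former active",
                          vems_flap=False, traffic_disrupted=False)
    # Single VSM resets and comes back as active; the VEMs flap.
    return VsmOutcome("same VSM (after reset)", None,
                      vems_flap=True, traffic_disrupted=False)


if __name__ == "__main__":
    print(leap_second_outcome(has_standby=True))
    print(leap_second_outcome(has_standby=False))
```

In both branches traffic is unaffected; the practical difference is whether the VEMs flap.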

 

To avoid these bugs:

  1. Remove the NTP/PTP configuration from the switch at least two days before the June 30, 2015 leap second event.
  2. Add the NTP/PTP configuration back to the switch after the leap second event (July 1, 2015).
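The workaround dates follow from simple date arithmetic, which helps when scheduling change windows. A minimal sketch, assuming the dates are interpreted in UTC (convert to your local maintenance time zone yourself); the helper `ntp_ptp_should_be_absent` is hypothetical.

```python
from datetime import date, timedelta

LEAP_EVENT = date(2015, 6, 30)               # leap second at 23:59:60 UTC
REMOVE_BY = LEAP_EVENT - timedelta(days=2)   # "at least two days" before
RESTORE_FROM = date(2015, 7, 1)              # after the event date


def ntp_ptp_should_be_absent(day: date) -> bool:
    """True on days when, per the workaround, NTP/PTP config should be removed."""
    return REMOVE_BY <= day < RESTORE_FROM


if __name__ == "__main__":
    print(REMOVE_BY)  # 2015-06-28
    print(ntp_ptp_should_be_absent(date(2015, 6, 30)))  # True
    print(ntp_ptp_should_be_absent(date(2015, 7, 1)))   # False
```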

 

Note: NX-OS is not the only software affected by this bug! GSS and other Cisco platforms are affected as well. Here is a complete list.

