Unicast flooding due to asymmetric routing

Asymmetric routing is not a problem by itself, but will cause problems when Network Address Translation (NAT) or firewalls are used in the routed path. For example, in firewalls, state information is built when the packets flow from a higher security domain to a lower security domain. The firewall will be an exit point from one security domain to the other. If the return path passes through another firewall, the packet will not be allowed to traverse the firewall from the lower to higher security domain because the firewall in the return path will not have any state information.

Another problem that can occur is unicast flooding, which happens when the destination MAC address of a packet is not in the L2 forwarding table of the switch.

To fully understand this process, remember that an L3 switch has three tables:

  • ARP: Maps an IP address to a MAC address in order to provide IP communication within a Layer 2 broadcast domain. For example, Host B wants to send information to Host A but does not have the MAC address of Host A in its ARP cache. Host B generates a broadcast message for all hosts within the broadcast domain to obtain the MAC address associated with the IP address of Host A. All hosts within the broadcast domain receive the ARP request, and only Host A responds with its MAC address.
  • CAM: As frames arrive on switch ports, the source MAC addresses are learned and recorded in the CAM table. The port of arrival and the VLAN are both recorded in the table, along with a timestamp. If a MAC address learned on one switch port has moved to a different port, the MAC address and timestamp are recorded for the most recent arrival port. Then, the previous entry is deleted. If a MAC address is found already present in the table for the correct arrival port, only its timestamp is updated.
  • Ternary Content Addressable Memory (TCAM) – Not required to understand the unicast flooding behavior.

Remember: In L3 Switches, the default ARP table aging time is 4 hours while the CAM holds the entries for only 5 minutes.
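
The learning and aging behavior of the CAM can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and method names (CamTable, learn, lookup) are hypothetical, time is passed in explicitly as seconds, and the two timer values are the defaults quoted above.

```python
from dataclasses import dataclass

CAM_AGING_SECONDS = 300      # 5 minutes, the CAM default quoted above
ARP_TIMEOUT_SECONDS = 14400  # 4 hours, the ARP default quoted above


@dataclass
class CamEntry:
    port: str
    last_seen: float


class CamTable:
    """Toy MAC address table keyed by (vlan, mac). Hypothetical model."""

    def __init__(self, aging=CAM_AGING_SECONDS):
        self.aging = aging
        self.entries = {}

    def learn(self, vlan, src_mac, port, now):
        # Learning uses the SOURCE MAC of a frame arriving on a port.
        # A new port replaces the old one; a known port refreshes the timestamp.
        self.entries[(vlan, src_mac)] = CamEntry(port, now)

    def lookup(self, vlan, dst_mac, now):
        # Return the egress port, or None when the frame has to be flooded.
        entry = self.entries.get((vlan, dst_mac))
        if entry is None or now - entry.last_seen >= self.aging:
            self.entries.pop((vlan, dst_mac), None)
            return None
        return entry.port
```

Note that lookup never refreshes the timestamp: only frames sourced by a host keep its entry alive, which is the detail the rest of the article relies on.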

 

How to understand unicast flooding due to asymmetric routing:

Suppose a client wants to download a file from an FTP server. The client has the IP address 10.0.0.100 and is connected to the Ciscozine1 L3 switch, which is the default gateway for Vlan 100 (10.0.0.1). At the other end, an FTP server (192.168.0.200) is connected to the Ciscozine2 L3 switch, which is the default gateway for Vlan 200 (192.168.0.1).

Other clients in Vlan 100 are connected to Ciscozine2, and another server in Vlan 200 is connected to Ciscozine1.

Ciscozine1 also has a Vlan 200 interface with the IP address 192.168.0.2, while Ciscozine2 has a Vlan 100 interface with the IP address 10.0.0.2.

 

 

Suppose that the FTP client is already connected to the FTP server; this means that the CAM and ARP tables of the L3 switches already contain the information (IP and MAC addresses) about the FTP client and the FTP server. So what happens if the FTP client wants to download a large file?

 

1. The FTP client sends the FTP request to its default gateway (Ciscozine1 – 10.0.0.1).

2-3. The Ciscozine1 checks the destination address (192.168.0.200) in the routing table and forwards the packet to the Vlan200 interface.

4-5. Ciscozine1 forwards the packet to Ciscozine2 which delivers the FTP request to the FTP Server.

 

6. The FTP server starts to send the file (requested by the FTP Client) to its default gateway (Ciscozine2 – 192.168.0.1).

7-8. The Ciscozine2 checks the destination address (10.0.0.100) in the routing table and forwards the packet to the Vlan100 interface.

9-10. Ciscozine2 forwards the packet to Ciscozine1 which delivers the packet to the FTP Client.

 

After 5 minutes, the two Layer 3 switches remove the FTP client and FTP server MAC address entries from the CAM. So what happens when the MAC address table has aged out and the FTP client wants to download other files?

 

11. The FTP client sends the FTP request to its default gateway (Ciscozine1 – 10.0.0.1).

12-13-14. Ciscozine1 checks the destination address (192.168.0.200) in the routing table and forwards the packet to the Vlan200 interface, but the CAM no longer contains the FTP server MAC address, so Ciscozine1 floods the packet to all Vlan 200 ports (for instance, to the other server connected to Ciscozine1)!

15. Ciscozine2 delivers the FTP request to the FTP Server.

16. The FTP server starts to send the files to its default gateway (Ciscozine2 – 192.168.0.1).

17-18-19. Ciscozine2 checks the destination address (10.0.0.100) in the routing table and forwards the packet to the Vlan100 interface, but the CAM no longer contains the FTP client MAC address, so Ciscozine2 floods the packet to all Vlan 100 ports (for instance, to the other clients connected to Ciscozine2)!

20. Ciscozine1 delivers the packet to the FTP Client.

Note: The result is that most packets of the data transfer between the FTP client and the FTP server are flooded to Vlan 100 on Ciscozine2 and to Vlan 200 on Ciscozine1. Every connected port in Vlan 100 on Ciscozine2 and in Vlan 200 on Ciscozine1 receives all packets of the conversation between 10.0.0.100 and 192.168.0.200. Depending on the traffic volume, this can slow the other hosts down considerably or even cause a complete connectivity outage.

This flooding is due to asymmetric routing and may stop when the ARP entry ages out or when the switch sends a broadcast packet (for instance, an ARP request). What happens when the ARP entries in the two L3 switches age out?
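
To see why the flooding persists instead of happening only once, here is a rough Python sketch of what Ciscozine1 does in Vlan 200 during steps 11 to 20. The function name count_flooded, the one-frame-per-second rate and the "SERVER" MAC label are assumptions made for illustration; only the 300-second aging time comes from the article.

```python
CAM_AGING = 300  # seconds, the default CAM aging time used in the article


def count_flooded(duration, return_path_same_vlan, interval=1):
    """Count frames Ciscozine1 floods in Vlan 200 while sending to the server."""
    server_mac = "SERVER"            # placeholder label, not a real MAC address
    cam = {(200, server_mac): 0}     # entry learned at t=0, e.g. from an ARP reply
    flooded = 0
    for now in range(interval, duration + 1, interval):
        last_seen = cam.get((200, server_mac))
        if last_seen is None or now - last_seen >= CAM_AGING:
            cam.pop((200, server_mac), None)
            flooded += 1             # unknown destination: flood to all Vlan 200 ports
        if return_path_same_vlan:
            # Symmetric case: the server's reply enters Ciscozine1 in Vlan 200
            # with the server's MAC as source, so the entry is refreshed.
            cam[(200, server_mac)] = now
        # Asymmetric case: the reply was routed by Ciscozine2 and arrives in
        # Vlan 100 with Ciscozine2's MAC as source, so nothing is learned here.
    return flooded


print(count_flooded(600, return_path_same_vlan=False))  # every frame from t=300 on
print(count_flooded(600, return_path_same_vlan=True))   # the entry never ages out
```

In the asymmetric case the server's MAC address is never seen as a source in Vlan 200 on Ciscozine1, so once the entry expires every later frame is flooded.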

 

21. The FTP client asks the FTP server for the file and sends the request to its default gateway (Ciscozine1 – 10.0.0.1).

22-23-24. Ciscozine1 checks the destination address (192.168.0.200) in the routing table; then it notices that there is no ARP entry for the host 192.168.0.200, so the L3 switch sends an ARP request (a broadcast message to the Vlan200 network) to ask “who has 192.168.0.200?”.

25. The FTP server receives the ARP request.

 

26-27. The FTP server sends an ARP reply to Ciscozine2 which forwards it to Ciscozine1.

28. Ciscozine1 updates its ARP and CAM tables, so the unicast flooding on Vlan 200 stops and the FTP client packet is sent to the FTP server (as in steps 3-4-5).

 

29. The FTP Server starts to send the file to the FTP Client using its default gateway (Ciscozine2 – 192.168.0.1).

30-31-32. Ciscozine2 checks the destination address (10.0.0.100) in the routing table; then it notices that there is no ARP entry for the host 10.0.0.100, so the L3 switch sends an ARP request (a broadcast message to the Vlan100 network) to ask “who has 10.0.0.100?”.

33. The FTP Client receives the ARP request.

 

34-35. The FTP client sends an ARP reply to Ciscozine1 which forwards it to Ciscozine2.

36. Ciscozine2 updates its ARP and CAM tables, so the unicast flooding on Vlan 100 stops and the FTP server packet is sent to the FTP client (as in steps 8-9-10).

 

There are different approaches to limit the unicast flooding due to asymmetric routing:

  • An easy approach is to bring the ARP timeout and the forwarding table aging time close to each other to limit the duration of unicast flooding. This causes ARP requests to be broadcast more often; relearning must occur before the L2 forwarding table entry ages out (not discussed in this article).
  • Another approach is to use the unicast blocking feature (not discussed in this article).
  • A better approach is to implement the network with HSRP.
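
As a back-of-the-envelope illustration of the first approach, the following Python sketch estimates how long flooding can last for a given pair of timers. It assumes that the ARP entry and the CAM entry were refreshed at the same moment and that nothing else relearns the MAC address in the meantime; the function name flooding_window and the 290-second ARP value are hypothetical.

```python
def flooding_window(arp_timeout, cam_aging):
    """Upper bound, in seconds, on how long frames can be flooded."""
    # Flooding starts when the CAM entry expires and ends when the ARP entry
    # expires and a new ARP exchange lets the switch relearn the MAC address.
    return max(0, arp_timeout - cam_aging)


defaults = flooding_window(arp_timeout=14400, cam_aging=300)
aligned = flooding_window(arp_timeout=290, cam_aging=300)

print(defaults, defaults / 60)  # seconds and minutes with the default timers
print(aligned)                  # ARP timeout just below the CAM aging time
```

With the default timers the window is 14100 seconds (235 minutes); with the ARP timeout just below the CAM aging time it shrinks to zero, at the cost of more frequent ARP broadcasts.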

 

Use HSRP to limit unicast flooding
In this case Ciscozine1 is the active router while Ciscozine2 is the standby router.

With this configuration Ciscozine1 acts as both router and switch, while Ciscozine2 acts only as a switch; the packets sent and received by the FTP client use the same path (1-2-3-4-5 and 6-7-8-9-10) because there is no asymmetric routing. During a file transfer the MAC address table entries therefore cannot age out, because the CAM timestamp is continuously updated. For this reason the unicast flooding cannot occur.
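
The difference between the two designs can be checked with a small Python sketch that works out which VLAN each direction of the conversation uses on the link between the two switches. It is a simplification that only covers this two-switch topology, with the client and the server on different switches; the helper names link_vlan and is_symmetric are hypothetical.

```python
def link_vlan(src_vlan, dst_vlan, src_switch, gateway_of):
    """VLAN a packet uses on the inter-switch link (two-switch topology only)."""
    router = gateway_of[src_vlan]  # the sender forwards to its VLAN's active gateway
    if router == src_switch:
        # Routed locally first, then switched across the link in the destination VLAN.
        return dst_vlan
    # Switched across the link in the source VLAN to reach the remote gateway.
    return src_vlan


def is_symmetric(gateway_of):
    # FTP client: Vlan 100 on Ciscozine1; FTP server: Vlan 200 on Ciscozine2.
    to_server = link_vlan(100, 200, "Ciscozine1", gateway_of)
    to_client = link_vlan(200, 100, "Ciscozine2", gateway_of)
    return to_server == to_client


without_hsrp = {100: "Ciscozine1", 200: "Ciscozine2"}
with_hsrp = {100: "Ciscozine1", 200: "Ciscozine1"}  # Ciscozine1 active for both VLANs

print(is_symmetric(without_hsrp))
print(is_symmetric(with_hsrp))
```

Without HSRP the request crosses the link in Vlan 200 and the reply in Vlan 100, so neither switch sees the remote host as a source in the VLAN it forwards into. With Ciscozine1 active for both VLANs, both directions cross the link in Vlan 200 and the CAM entries keep being refreshed.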

 

Note: You can also simulate this behavior using a router plus a switch instead of a layer3 switch.

References: http://www.cisco.com/…/products_tech_note09186a00801d0808.shtml

13 COMMENTS

  1. Hello,

    I’m confused about this part:

    17-18-19. The Ciscozine1 checks the destination address (192.168.0.200) in the routing table and forwards the packet to the Vlan200 interface, but the CAM does not contain the FTP Client MAC Address, so Ciscozine2 floods the packets to all Vlan 100 interface (for instance to the Client connected to Ciscozine2)!

    Probably this was only a typo. I suggest it should be:

    17-18-19. The Ciscozine2 checks the destination address (10.0.0.100) in the routing table and forwards the packet to the Vlan100 interface, but the CAM does not contain the FTP Client MAC Address, so Ciscozine2 floods the packets to all Vlan 100 interface (for instance to the Client connected to Ciscozine2)!

    That is the path from the server to the client – If I understand it correctly.

    Forwarding with the reading of your article – very good one.

    Thanks.

    Marek

  2. Hi Fabio,

    I don’t quite understand why there’s a unicast flooding. Please confirm some points below:

    1) I assume between step no.10 and 11, the connection is idle between FTP client and FTP server. Otherwise, the MAC address table would not age out for FTP client and FTP server MAC address. Is that right?

    2) Step 12-13-14. “The Ciscozine1 checks the destination address (192.168.0.200) in the routing table and forwards the packet to the Vlan200 interface, but the CAM does not contain the FTP Server MAC Address, so Ciscozine1 floods the packets to all Vlan 200 interface (for instance to the server connected to Ciscozine1)!”
    Ok, when Ciscozine1 floods the packets to all Vlan 200 interfaces, only the FTP server will reply. In my understanding, when the reply frame from the FTP server reaches Ciscozine1, Ciscozine1 will repopulate its MAC address table with the L2 address of the FTP server and the port on which it received the frame. So the flooding would only happen once, because the next time a frame needs to go from Ciscozine1 to the FTP server, the entry is already in the MAC address table.
    The principle should be the same for steps 17-18-19. Yes, there will be unicast flooding, but it should only happen once.
    The next unicast flooding will only happen if the connection between the FTP client and the FTP server becomes idle for 5 minutes and then they connect again.
    I know your article should be accurate since it refers to Cisco documentation. So please help me see what is wrong with my understanding.

    Thanks,
    Putra

  3. @Putra
    The problem is due to asymmetric routing and to the ARP timers being longer than the MAC timers.
    See points 11, 12, …, 20: what happens? When the MAC entry expires, all packets for the FTP server are flooded. The FTP server's response doesn’t stop the flooding and doesn’t update the Ciscozine1 CAM table, because the packets are forwarded by Ciscozine1 on Vlan 200 while the response is received on Vlan 100! This is why the CAM table is not updated. The flooding will stop only with an ARP request!

  4. Great article and good explanation. If I was to suspect unicast flooding issues on a network, what evidence would I see in a packet capture or otherwise?

  5. Hi,

    The FTP traffic is not the best for this example (in my opinion), due to TCP ack mechanism. Better example will be some UDP traffic (for example snmp traffic from agent to collector).

    And HSRP is not the best solution – what if ciscozine1 will be the active router for vlan100 and ciscozine2 will be the active router for vlan200 ;-) ?

    Thanks,

    MK

  6. Hi,
    the asymmetric routing occurs regardless of whether the protocol is reliable (TCP) or unreliable (UDP). The problem is caused by the network design.

    About your second question, my answer is ‘it depends on how the HSRP process is configured’: with the tracking feature, preempt and a good priority value, no problem will occur.

  7. Hi Fabio

    but in the real world, where there is a high probability of hosts talking to other hosts on the same VLAN but residing on different sides of the asymmetry (for instance in different DCs), couldn’t the whole problem be mitigated by the fact that the CAM tables are refreshed anyway (by intra-VLAN traffic)?

    What do you think?

    Thank you very much

    Stefano

  8. Hi Stelax,

    as I said in the article

    There are different approaches to limit the unicast flooding due to asymmetric routing:

    * An easy approach is normally to bring the ARP timeout and the forwarding table-aging time close to each other to limit the length of unicast flooding. This will cause the ARP packets to be broadcast. Relearning must occur before the L2 forwarding table entry ages out (not discussed in this article).
    * Another approach is to use the unicast blocking feature (not discussed in this article).
    * A better approach is to implement the network with HSRP.

    In my opinion the better approach to limit unicast flooding is:

    * designing the network so that the L3 domains are divided between the DCs.

    * defining the active router of all the HSRP instances on the same router.

    Fabio

  9. Normally, if you plug your sniffer into any switched port (I mean an “access port” configured with the VLAN ID you are investigating), you should only see broadcasts, multicasts and traffic related to your specific IP (as a source or a destination). When unicast flooding is occurring, you will also capture unicast traffic where source and destination IP are both unicast addresses but neither is your IP address. In my experience, the worst case has happened with syslog traffic (UDP), where the syslog server usually never “says a word” and the sources (the devices that send log information) and the destination (the syslog server) are on different subnets. In that situation the best action is probably to lower the ARP timeout on the Layer 3 device, setting it to a time near your switches’ CAM table timeout, for example five minutes.

  10. When a packet traverses a firewall, it is NAT’d using a globally unique IP address. Whether that packet is symmetrically or asymmetrically returned, it returns to a globally unique IP. Why would it return to another firewall? Where this becomes a problem is when enterprises deploy IPv6 with multi-homed firewalls. IPv6 implies not using NAT. So, if you egress one stateful firewall and return via another..you’re screwed. If you use NAT, that won’t be a problem.
