Tuesday, February 17, 2009

Internet Outage

The company I work for provides virtual server hosting services. So, when the Internet goes down, it's gonna be a bad day at the office.

Yesterday, our provider's Internet connectivity went down for about an hour and a half. An outage of this duration causes us to lose customers. So, it was a very bad Monday!

Our data center provider has not been very forthcoming with information on the cause of the outage. But I believe it was related to the BGP overflow caused by an ISP in the Czech Republic.

The unfortunate consequence is that the armchair quarterbacks, like myself, get to determine that the cause of this issue was absolutely avoidable. Looking only at Cisco IOS, two things could have prevented this catastrophic outage for us and our clients:
  1. Maintain patching levels by running a newer IOS on the routers.
  2. Implement the 'bgp maxas-limit' command. http://tinyurl.com/cygzbz
The offending information:

%BGP-6-ASPATH: Long AS path 3549 3257 29113 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 47868 47868 47868 47868
47868 47868 47868 47868 47868 47868 47868 received from xxx.xxx.xxx.xxx:
Has more than 255 AS
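
To make the idea behind an AS path limit concrete, here is a rough Python sketch of the check a limit like 'bgp maxas-limit' implies: count the entries in the AS path and refuse anything too long. The function names and the limit of 75 are my own illustrative assumptions, and the exact boundary behavior on a real router may differ from this sketch.

```python
def as_path_length(as_path: str) -> int:
    """Count the AS entries in a space-separated AS path, prepends included."""
    return len(as_path.split())


def accept_route(as_path: str, max_as: int) -> bool:
    """Hypothetical policy: accept only paths with at most max_as entries."""
    return as_path_length(as_path) <= max_as


# A normal-looking path and an absurdly prepended one (252 repeats of AS 47868).
normal_path = "3549 3257 29113 47868"
prepended_path = "3549 3257 29113 " + " ".join(["47868"] * 252)

print(accept_route(normal_path, max_as=75))     # short path passes
print(accept_route(prepended_path, max_as=75))  # long path is rejected
```

With a limit in place, a path like the one above is dropped at the edge instead of being passed along to the rest of the network.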


This issue was resolved in IOS release 12.1(3a)E1 (http://tinyurl.com/db6v2d). Cisco's description of the bug (CSCdr54230):

A Border Gateway Protocol (BGP) UPDATE contains Network Layer Reachability Information (NLRI) and attributes that describe the path to the destination. Each path attribute is a type, length, value (TLV) object.

The type is a two-octet field that includes the attribute flags and the type code. The fourth high-order bit (bit 3) of the attribute flags is the Extended Length bit. It defines whether the attribute length is one octet (if set to 0) or two octets (if set to 1). The extended length bit is used only if the length of the attribute value is greater than 255 octets.
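
As a rough illustration of the paragraph above, this Python sketch reads a path attribute header and picks a one-octet or two-octet length depending on the Extended Length bit. The function name and return layout are my own assumptions for the example; the 0x10 mask is simply bit 3 counted from the high-order end of the flags octet.

```python
import struct

EXTENDED_LENGTH = 0x10  # bit 3, counting from the high-order bit of the flags octet


def parse_attr_header(data: bytes):
    """Return (flags, type_code, value_length, header_size) for one path attribute."""
    flags, type_code = data[0], data[1]
    if flags & EXTENDED_LENGTH:
        # Extended Length set: the length field is two octets, network byte order.
        (length,) = struct.unpack("!H", data[2:4])
        header_size = 4
    else:
        # Extended Length clear: the length field is a single octet (0..255).
        length = data[2]
        header_size = 3
    return flags, type_code, length, header_size


# AS_PATH (type code 2) with a short value, then one needing the two-octet length.
print(parse_attr_header(bytes([0x40, 2, 6])))
print(parse_attr_header(bytes([0x50, 2, 0x01, 0x04])))
```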

The AS_PATH (type code 2) is represented by a series of TLVs (or path segments). The path segment type indicates whether the content is an AS_SET or AS_SEQUENCE. The path segment length indicates the number of autonomous systems in the segment. The path segment value contains the list of autonomous systems (each autonomous system is represented by two octets).

The total length of the attribute depends on the number of path segments and the number of autonomous systems in them. For example, if the AS_PATH contains only an AS_SEQUENCE, then the maximum number of autonomous systems (without having to use the extended length bit) is 126 [= (255-2)/2, rounded down]. If the UPDATE is propagated across an autonomous system boundary, then the local autonomous system number (ASN) must be appended and the extended length bit used.
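
The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation described above, assuming a single AS_SEQUENCE segment with a two-octet segment header (type and count) and two-octet AS numbers; the function names are made up for the example.

```python
ONE_OCTET_MAX = 255   # largest value a one-octet length field can hold
SEGMENT_HEADER = 2    # one octet for segment type, one for segment length
ASN_SIZE = 2          # each autonomous system number takes two octets


def max_asns_without_extended_length() -> int:
    """Largest AS count whose single AS_SEQUENCE still fits in 255 octets."""
    return (ONE_OCTET_MAX - SEGMENT_HEADER) // ASN_SIZE


def needs_extended_length(asn_count: int) -> bool:
    """True if the attribute value no longer fits a one-octet length."""
    return SEGMENT_HEADER + asn_count * ASN_SIZE > ONE_OCTET_MAX


print(max_asns_without_extended_length())  # 126
print(needs_extended_length(126))          # False: 2 + 252 = 254 octets
print(needs_extended_length(127))          # True: 2 + 254 = 256 octets
```

So a path sitting at 126 autonomous systems tips over the one-octet limit as soon as one more AS is added.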

This problem was caused by the mishandling of the operation during which the length of the attribute was truncated to only one octet. Because of the internal operation of the code, the receiving border router would not be affected, but its iBGP peers would detect the mismatch and issue a NOTIFICATION message (update malformed) to reset their session.
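
To see why truncating the length to one octet produces a malformed UPDATE, consider this small Python sketch. It is not Cisco's code, just an illustration under the same assumptions as the description (one AS_SEQUENCE segment, two-octet AS numbers): the real value is 256 octets long, but a one-octet field can only record the low eight bits.

```python
def attribute_value_length(asn_count: int) -> int:
    """Octets in an AS_PATH value holding one AS_SEQUENCE of two-octet ASNs."""
    return 2 + asn_count * 2  # two-octet segment header plus the AS numbers


def truncate_to_one_octet(length: int) -> int:
    """Keep only the low eight bits, as a one-octet length field would."""
    return length & 0xFF


actual = attribute_value_length(127)      # 256 octets really present
declared = truncate_to_one_octet(actual)  # 0 is what the one-octet field says

print(actual, declared)
print(actual != declared)  # a peer parsing the UPDATE sees the mismatch
```

A peer that trusts the declared length misreads everything after it, which fits the 'update malformed' NOTIFICATION and session reset described above.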

The average maximum AS_PATH length in the Internet is between 15 and 20 autonomous systems, so there is no need to use the extended length. The failure was discovered because of a malfunction in the BGP implementation of another vendor. There is no workaround. This problem is resolved in Release 12.1(3a)E1. (CSCdr54230)

Source information was obtained from the following locations:
Renesys – Reckless Driving on the Internet - http://tinyurl.com/cd2eq9
Merit Networks – North American Network Operators Group - http://tinyurl.com/dmexp2
Data Center Knowledge – Router Snafu: A ‘Global Internet Meltdown’ - http://tinyurl.com/atmo6h
Slashdot - One Broken Router Takes Out Half the Internet - http://tinyurl.com/bsqhk6
