February 16th Upstream Network Issue

Posted by Blue Box Group on Mon Feb 16 08:50:00 UTC 2009


One of our upstream network providers had a problem this morning that caused a failure in their routers. We immediately shut down our announcements through their network and are waiting on BGP to reconverge so that those routes disappear. This is only affecting customers who were previously connecting through that path.

This is unrelated to the February 12th outage. All of our internal systems are functioning normally.

Update 9:02am - Routes are now properly updated across the global routing table.

Update 9:13am - The root cause of this event, a misconfigured router in Europe sending bad data into the global BGP table, is being discussed on NANOG (North American Network Operators' Group). The event is apparently causing trouble for a number of transit providers this morning.

Update 9:28am - Here are some examples of the effects of this type of update. Our routers stayed operational, but upstream routers bounced traffic during this event. That BGP session bounce is what caused the BGP reconvergence and is why we shut down our connection with that provider.

Update 9:35am - More news is coming out on NANOG and through our communication channels with our upstream provider indicating that this issue affected some major backbone transit providers.

Update 6:42pm - The story has finally been picked up by Slashdot. You can read about it here: http://tech.slashdot.org/article.pl?sid=09/02/16/2233207

Update 7pm - For those of you who are curious, here’s what our bandwidth usage looked like during the period. You can see the highlighted sections where we shut down our BGP session with the broken upstream provider, and you can see the dips in traffic where other providers throughout the internet had problems with their routers.

Fortunately, our routers stayed online and were not brought down by the Cisco bug that caused most of the damage. They continued to route all traffic they received just fine.

Update 10:15pm - Two more interesting articles have been posted about this event. One was posted by Renesys and one by Arbor Networks.

  • Blue Box Tech Support