Lessons Learned from Recent Major Outages

Today’s more interconnected business world makes infrastructure and cloud outages all the more impactful. Here’s a recap of recent outages and their root causes.

Salvatore Salamone, Managing editor

August 1, 2022

In 1988, one broken power line kicked off a series of events that cut off phone service to over 50,000 Chicago-area businesses, hospitals, Chicago's O'Hare and Midway airports, and consumers for more than two weeks. At the time, that event, the Hinsdale Central Office Fire, was called the greatest telecommunications disaster ever.

Yet even the impact of the largest pre-Internet/cloud event ever does not compare to what happens on a regular basis these days with cloud outages.

The nature of today’s more interconnected business world makes cloud infrastructure and service disruptions more damaging. In the past, an outage was typically restricted to a small geographical area, and there were relatively easy ways to minimize the impact. For example, a cable cut would disrupt service to those on that one circuit. Many companies would routinely protect themselves by using services from two providers, such as a leased T1 line from one and an ISDN from another. If the primary line was down due to a cable cut, a site could still run core traffic over the lower speed link until service was restored.
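The two-provider arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link speeds are the nominal T1 (1,544 kbps) and ISDN BRI (128 kbps) rates, while the traffic classes, their bandwidth figures, and the function names are hypothetical and not taken from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    kbps: int          # nominal capacity of the line
    up: bool = True    # False once a cable cut or similar fault is detected


def pick_link(links):
    """Return the fastest link that is still up, or None if every link is down."""
    live = [link for link in links if link.up]
    return max(live, key=lambda link: link.kbps) if live else None


def admit_traffic(flows, capacity_kbps):
    """Admit flows in priority order (1 = most important) while they still fit."""
    admitted, used = [], 0
    for name, kbps, _priority in sorted(flows, key=lambda f: f[2]):
        if used + kbps <= capacity_kbps:
            admitted.append(name)
            used += kbps
    return admitted


# Hypothetical site: a leased T1 from one provider, an ISDN line from another.
links = [Link("T1", 1544), Link("ISDN", 128)]

# Hypothetical traffic classes: (name, bandwidth in kbps, priority).
flows = [("order entry", 64, 1), ("email", 32, 2), ("file transfer", 512, 3)]

active = pick_link(links)
normal_traffic = admit_traffic(flows, active.kbps)

# Simulate a cable cut on the primary line and fail over to the backup.
links[0].up = False
backup = pick_link(links)
degraded_traffic = admit_traffic(flows, backup.kbps)
```

With the T1 up, all three traffic classes fit. After the simulated cut, only the two highest-priority classes fit on the slower ISDN line, which is the "core traffic" behavior the paragraph describes.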

Putting an Outage’s Impact into Perspective

 

Cloudflare, June 2022

The provider suffered a roughly one-hour outage that affected many companies and sites, including Discord, Shopify, Fitbit, and Peloton. A change to the network configuration at 19 of Cloudflare’s sites disrupted traffic in those locations.

Microsoft Azure and M365 Online, June 2022

East Coast companies that accessed services via Microsoft’s Virginia data center suffered a 12-hour outage. The cause of the outage, according to Microsoft, was “an unplanned power oscillation in one of our data centers” … “Components of our redundant power system created unexpected electrical transients, which resulted in the Air Handling Units (AHUs) detecting a potential fault, and therefore shutting themselves down pending a manual reset.” Customers with always-available or zone-redundant services in that region were not impacted.

...

Read the full article on our sister site, InformationWeek.

About the Author

Salvatore Salamone

Managing editor, Network Computing

Salvatore Salamone is the managing editor of Network Computing. He has worked as a writer and editor covering business, technology and science; written three business technology books; and served as an editor at IT industry publications including Network World, Byte, Bio-IT World, Data Communications, LAN Times and InternetWeek.
