Technical Details of Facebook Outage

Facebook was offline for more than two hours today after a configuration change created a feedback loop that overwhelmed a database cluster. The only way to fix the problem was to take the web site offline.

Rich Miller

September 24, 2010


Facebook was down for more than two hours Thursday afternoon, marking its longest outage in about four years. The Facebook Engineering blog has posted a detailed explanation of what happened. "The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition," writes Facebook's Robert Johnson. "An automated system for verifying configuration values ended up causing much more damage than it fixed."

In short: a configuration change created a feedback loop that overwhelmed a database cluster. The only way to fix the problem was to take the whole cluster offline, which meant downtime for the web site. Read the Engineering blog for more details.
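
To see why a loop like this does not clear on its own, here is a toy model in Python. It is a sketch under stated assumptions, not Facebook's system: the function name, the parameters and the numbers are all hypothetical. It assumes that every client queries the database to repair a bad value, and that a failed query causes the cached value to be thrown away for everyone, which is one reading of the "unfortunate handling of an error condition" Johnson describes.

```python
def simulate(clients, db_capacity, rounds, delete_cache_on_error):
    """Return the number of queries hitting the database in each round.

    Toy model only; all names and numbers are illustrative assumptions.
    """
    pending = clients  # every client sees the bad value and queries the database
    load = []
    for _ in range(rounds):
        load.append(pending)
        served = min(pending, db_capacity)  # the cluster answers at most this many
        failed = pending - served
        if delete_cache_on_error and failed > 0:
            # Assumed error handling: a failure is read as "value still invalid",
            # the cached value is dropped, and every client must query again.
            pending = clients
        else:
            # Without the feedback, served clients keep their value and only
            # the failed queries are retried.
            pending = failed
    return load


if __name__ == "__main__":
    print(simulate(1000, 300, 5, delete_cache_on_error=True))
    print(simulate(1000, 300, 5, delete_cache_on_error=False))
```

With the feedback enabled, the load stays at 1000 queries per round indefinitely. With it disabled, the backlog drains as 1000, 700, 400, 100, 0. In this model the only way to break the first pattern is to stop the incoming requests, which matches the decision to take the site offline.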
