How The New York Times Handled Unprecedented Election-Night Traffic Spike
Election of 2012 brought the site to its knees; its current CTO wasn't going to let it happen again last November.
When he woke up the morning of October 21, 2016, Nick Rockwell did the same thing he had done first thing every morning since The New York Times hired him as CTO: he opened The Times’ app on his phone. Nothing loaded.
The app was down along with BBC, CNN, Fox News, The Guardian, and a long list of other web services, taken out by the largest DDoS attack in history of the internet. An army of infected IP cameras, DVRs, modems, and other connected devices – the Mirai botnet – had flooded servers of the DNS registrar Dyn in 17 data centers, halting a huge number of sites and mobile apps that depended on it for letting their users’ computers know how to find them online.
The outage had started only about five minutes before Rockwell saw the blank screen on his phone. His team kicked off a standard process that was in place for such outages, failing over to The Times’ internal DNS hosted in two of its four data centers in the US. The mobile app and the main site were back online about 45 minutes after they had gone down.
While going through the fairly routine recovery process, however, something was really worrying Rockwell. The thing was, he didn’t know whether the attack was directed at many targets or at the Times specifically. If it was the latter, the effect could be catastrophic; its internal DNS wouldn’t hold against a major DDoS for more than five seconds. “It would’ve been incredibly easy to DDoS our infrastructure,” he said in a phone interview with Data Center Knowledge.
See also: How to Survive a Cloud Meltdown
His team had been a few months deep into fixing the vulnerability, but they weren’t finished. “We were OK [in the end], but we were vulnerable during that time.” The process to fix it started as they were preparing for the 2016 presidential election. Election night is the biggest event for every major news outlet, and Rockwell was determined to avoid the 2012 election night fiasco, when the site went down, unable to handle the spike in traffic.
Handling Spikes at the Edge
One of the steps the team decided to take in preparation for November 2016 was to fully integrate a CDN (Content Delivery Network). CDN operators, such as Akamai, CloudFlare, or CDN services by cloud providers Amazon, Microsoft, and Google, store their clients’ most popular content in data centers close to where many of their end users are located – so-called edge data centers -- from where “last-mile” internet service providers deliver that content to its final destinations. A CDN essentially becomes a highly distributed extension of your network, adding compute, storage, and bandwidth capacity in many metros around the world.
See also: How Edge Data Center Providers are Changing the Internet's Geography
That a CDN had not been integrated into the organization’s infrastructure came as a big surprise to Rockwell, who joined in 2015, after 10 months as CTO at another big publisher, Condé Nast. While at Condé Nast, he switched the publisher from a major CDN provider to a lesser-known CDN by a company called Fastly. He has since become an unapologetically big fan of the San Francisco-based startup, which now also delivers content to people reading The New York Times online.
Being highly distributed by design puts CDNs in good position to help their customers handle big traffic spikes, be it legitimate traffic generated by a big news event or a malicious DDoS attack. (Rockwell said he did wonder, as the Dyn attack was unfolding, whether it was a rehearsal for election night.)
Fastly ensured that on the night Donald Trump beat Hillary Clinton, The Times rolled without incident through a traffic spike of unprecedented size for the publisher: an 8,371 percent increase in the number of people visiting the site simultaneously, according to the CTO. The CDN has also mostly absorbed the much higher levels of day-to-day traffic The Times has seen since the election as it covers the Trump administration.
A Fast, Transparent CDN
In addition to making it lightning-fast by using SSD storage only, the six-year-old startup, which this year crossed the $100 million annualized revenue run-rate threshold, designed its platform to give users a detailed picture of the way their traffic flows through its CDN, as well as lots of control of their edge infrastructure. Artur Bergman, Fastly’s founder and CEO, said the platform enables a user to treat the edge of their network the same way they treat their own data centers or cloud resources.
In your own data center you have full control of your tools for improving your network’s security and performance (things like firewalls and load balancers), Bergman explained in an interview with Data Center Knowledge. While you maintain that level of control in the public cloud, you don’t necessarily have it at the edge, he said. Traditionally, CDNs have offered customers little visibility into the inner workings of their networks, so even differentiating between a legitimate traffic spike and a DDoS attack has been hard to do quickly. Fastly gives users log access in real-time so they can see exactly what is happening to their edge nodes and make critical decisions on the fly.
The startup today unveiled an edge cloud platform, designed to enable developers to deploy code in edge data centers instantly, without having to worry about scaling their edge infrastructure as their applications grow. It also announced a collaboration with Google Cloud Platform, pairing its platform with the giant’s enterprise cloud infrastructure services around the world.
The Times is Migrating to the Cloud
GCP is one of two cloud providers The New York Times is using. The other one is Amazon Web Services. Today, the publisher’s infrastructure consists of three leased data centers in Newark, Boston, and Seattle, and one facility it owns and operates on its own, located in the New York Times building in Times Square, Rockwell said. The company uses a virtual private cloud by AWS and some of its public cloud services in addition to running some applications in the Google Cloud.
This setup is not staying for long, however. Rockwell’s team is working to shut down the three leased data centers, moving most of its workloads onto GCP and AWS, with Fastly managing content delivery at the edge. Google’s cloud is also going to play a much bigger role than it does today. The plan is to run apps that depend on Oracle databases in AWS, while everything else, save for a few exceptions (primarily packaged enterprise IT apps), will run in app containers on GCP, orchestrated by Kubernetes.
As he works to sort out what he in a conference presentation referred to as the “jumbled mess” that is The Times’ current infrastructure, Rockwell no longer worries about DDoS attacks. Luckily for his team, there was no major cyber attack on The Times between the day he came on board and the day Fastly started delivering the publisher’s content to its readers. Whether there was one after the CDN was integrated is irrelevant, as far as he's concerned. “It’s no longer something I have to think about.”
About the Author
You May Also Like