Netflix Explains Christmas 2012 Outage - Blames Amazon Cloud
On its tech blog, Netflix explains that "Netflix streaming was impacted on Christmas Eve 2012 by problems in the Amazon Web Services (AWS) Elastic Load Balancer (ELB) service that routes network traffic to the Netflix services supporting streaming." AWS has published its own postmortem report on the incident.
They go on to explain that "The problems at AWS caused a partial Netflix streaming outage that started at around 12:30 PM Pacific Time on December 24 and grew in scope later that afternoon. The outage primarily affected playback on TV-connected devices in the US, Canada and Latin America. Our service in the UK, Ireland and Nordic countries was not impacted.
Netflix uses hundreds of ELBs. Each one supports a distinct service or a different version of a service and provides a network address that your Web browser or streaming device calls. Netflix streaming has been implemented on over a thousand different streaming devices over the last few years, and groups of similar devices tend to depend on specific ELBs. Requests from devices are passed by the ELB to the individual servers that run the many parts of the Netflix application. Out of hundreds of ELBs in use by Netflix, a handful failed, losing their ability to pass requests to the servers behind them. None of the other AWS services failed, so our applications continued to respond normally whenever the requests were able to get through.
The Netflix Web site remained up throughout the incident, supporting sign-up of new customers and streaming to Macs and PCs, although at times with higher latency and a likelihood of needing to retry. Overall, streaming playback via Macs and PCs was only slightly reduced from normal levels. A few devices saw no impact at all, because their ELB configuration kept running throughout the incident, providing normal playback levels.
At 12:24 PM Pacific Time on December 24, network traffic stopped on a few ELBs used by a limited number of streaming devices. At around 3:30 PM on December 24, network traffic stopped on additional ELBs used by game consoles, mobile devices and various other devices to start up and load lists of TV shows and movies. These ELBs were patched back into service by AWS at around 10:30 PM on Christmas Eve, so game consoles and the other affected devices were impacted for about seven hours. Most customers were fully able to use the service again at this point. Some additional ELB cleanup work continued until around 8:00 AM on December 25, when AWS finished restoring service to all the ELBs in use by Netflix and all devices were streaming again.
Even though Netflix streaming for many devices was impacted, this wasn't an immediate blackout. Those devices that were already running Netflix when the ELB problems started were in many cases able to continue playing additional content.
Christmas Eve is traditionally a slow Netflix night, as many members celebrate with their families or spend the evening doing something other than watching TV shows or movies. We see significantly higher usage on Christmas Day, and increased streaming rates continue until customers go back to work or school. While we truly regret the inconvenience this outage caused our customers on Christmas Eve, we were also fortunate to have Netflix streaming fully restored before a much higher number of our customers would have been affected."
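The dependency structure Netflix describes, where groups of similar devices rely on specific ELBs and only a handful of the hundreds of ELBs failed, is why the outage was partial rather than total. The Python sketch below is a toy model of that idea, not Netflix's actual configuration: the ELB names, device groups, backend server names and round-robin routing policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Toy stand-in for one ELB: a name, a health flag, and its backend servers."""
    name: str
    backends: list[str]
    healthy: bool = True
    _next: int = field(default=0, repr=False)

    def route(self, request: str) -> str:
        # A failed ELB loses its ability to pass requests to the servers behind it.
        if not self.healthy:
            raise ConnectionError(f"{self.name} cannot pass {request!r} to its backends")
        # Simple round-robin choice of backend server (an illustrative policy).
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return f"{backend} handled {request}"


# Hypothetical mapping: each group of similar devices depends on a specific ELB.
DEVICE_GROUP_TO_ELB = {
    "game_console": "elb-console-api",
    "mobile": "elb-mobile-api",
    "pc_browser": "elb-website",
}


def affected_groups(elbs: dict[str, LoadBalancer]) -> list[str]:
    """Return the device groups whose ELB has failed; all other groups still work."""
    return sorted(
        group
        for group, elb_name in DEVICE_GROUP_TO_ELB.items()
        if not elbs[elb_name].healthy
    )


# One ELB per device group, each fronting three hypothetical servers.
elbs = {
    name: LoadBalancer(name, backends=[f"{name}-server-{i}" for i in range(3)])
    for name in DEVICE_GROUP_TO_ELB.values()
}

# Simulate "a handful failed": only the console and mobile ELBs go down.
elbs["elb-console-api"].healthy = False
elbs["elb-mobile-api"].healthy = False

print(affected_groups(elbs))                     # ['game_console', 'mobile']
print(elbs["elb-website"].route("GET /browse"))  # elb-website-server-0 handled GET /browse
```

Because each device group depends only on its own ELB, failing two ELBs takes down exactly two groups, while the Web site path keeps routing requests to healthy servers, mirroring the pattern in the post where PCs and Macs kept streaming.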
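The durations in the timeline can be checked with a little datetime arithmetic. This sketch assumes all times are Pacific Time and treats the approximate times given in the post as exact; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Times from the incident timeline, all Pacific Time, December 2012.
first_elbs_down = datetime(2012, 12, 24, 12, 24)    # 12:24 PM: first few ELBs stop
console_elbs_down = datetime(2012, 12, 24, 15, 30)  # ~3:30 PM: console/mobile ELBs stop
console_elbs_back = datetime(2012, 12, 24, 22, 30)  # ~10:30 PM: patched back by AWS
all_elbs_back = datetime(2012, 12, 25, 8, 0)        # ~8:00 AM Dec 25: all ELBs restored


def hours(delta: timedelta) -> float:
    """Convert a timedelta to a float number of hours."""
    return delta.total_seconds() / 3600


console_outage = console_elbs_back - console_elbs_down
full_window = all_elbs_back - first_elbs_down

print(f"Game consoles etc.: {hours(console_outage):.1f} h")              # 7.0 h
print(f"First failure to full restoration: {hours(full_window):.1f} h")  # 19.6 h
```

This confirms the "about seven hours" figure for game consoles and shows that, from the first ELB failure to the last restoration, the incident spanned roughly 19.6 hours, though most customers could use the service again well before the end of that window.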