3. Be resilient
Even with monitoring and alerts that cover the entirety of Netflix's operations, failures will still happen. That's why the company has built a platform for testing its services and fixing mistakes. The Simian Army is a series of tools that Netflix developed internally and released as open source to test the fault tolerance of the company's operations. Chaos Monkey randomly kills individual services to test failure at the application layer. Chaos Gorilla brings down an entire availability zone (AZ) to test high availability. Chaos Kong, still in development, is meant to eventually simulate the shutdown of an entire region. Tseitlin says that Netflix is so concerned with testing and monitoring that it jokingly refers to itself as a monitoring company that occasionally delivers movies.
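To make the Chaos Monkey idea concrete, here is a minimal in-memory sketch of random instance termination followed by an availability check. It is not Netflix's code and does not touch any cloud API: `Service`, `min_healthy`, `chaos_monkey` and `run_experiment` are hypothetical names chosen for illustration, and "available" is assumed to mean "at least `min_healthy` instances still running."

```python
import random
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical service: a name plus the IDs of its running instances."""
    name: str
    instances: list[str]
    min_healthy: int = 1  # assumed threshold below which the service is "down"

    def is_available(self) -> bool:
        return len(self.instances) >= self.min_healthy


def chaos_monkey(services: list[Service], rng: random.Random):
    """Kill one random instance of one random service; return (service, instance)."""
    candidates = [s for s in services if s.instances]
    if not candidates:
        return None  # nothing left to kill
    target = rng.choice(candidates)
    victim = rng.choice(target.instances)
    target.instances.remove(victim)
    return target.name, victim


def run_experiment(services: list[Service], rounds: int, seed: int = 0) -> list[str]:
    """Run several kill rounds and report which services lost availability."""
    rng = random.Random(seed)  # seeded so an experiment can be replayed
    outages: list[str] = []
    for _ in range(rounds):
        if chaos_monkey(services, rng) is None:
            break
        for s in services:
            if not s.is_available() and s.name not in outages:
                outages.append(s.name)
    return outages
```

For example, `run_experiment([Service("api", ["api-1", "api-2", "api-3"]), Service("billing", ["billing-1"])], rounds=2, seed=42)` returns whichever services dropped below their threshold. A service with a single instance is exposed by the first kill that hits it, which is exactly the kind of weakness random termination is meant to surface before a real failure does.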
Another aspect of resilience is the way the company distributes responsibility among its workers. The company relies heavily on developers to build out the Simian Army and its cloud services. Whenever developers build something, they are responsible for keeping it up. This may sound like a "devops" model, the idea of developers provisioning their own infrastructure resources, but Netflix instead embraces what Tseitlin calls a "distributed ops" model. Each developer is responsible for the entire life cycle of the code and applications they create: developers write the programs, run them and keep them up to date.
While Netflix has already moved almost all of its customer-facing services to the public cloud, it still has more work to do. On the road map is moving all of the company's in-house, back-end services to the cloud as well. That process has already started with a migration from Exchange to Google Apps for email. The company has also transitioned from Concur to Workday for expense management and from a traditional internal file-sharing system to Box, Tseitlin says.
Billing and payments still run mostly in Netflix-controlled data centers to comply with Payment Card Industry (PCI) standards. If all goes well, though, that may soon change. Netflix wants to be all in the cloud if it can be: "The goal is to not run data centers at all," Tseitlin says.