Netflix is increasingly turning to Python over Java to power certain aspects of its video-streaming service, such as generating and processing alerts, boosting resilience, securing transactions, producing deployable AMIs (Amazon Machine Images), and managing and automating Cassandra clusters.
Roy Rapoport, monitoring engineering manager at Netflix, has revealed that Python is giving Java a run for its money among developers at Netflix, citing the language's "rich batteries-included standard library, succinct and clean yet expressive syntax, large developer community, and wealth of third-party libraries."
Netflix has developed a RESTful Web application called CAG (Central Alert Gateway) that's capable of grabbing the hundreds of thousands of daily alerts generated by the company's telemetry system and intelligently disseminating or suppressing them on a case-by-case basis. Some alerts, for example, are automatically dispatched to the company's notification system to page on-call engineers. Some are suppressed if the proper individuals have already been notified. In some cases, CAG automatically performs remediation actions, such as rebooting or terminating potentially unhealthy AWS (Amazon Web Services) EC2 instances.
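To make the case-by-case handling concrete, here is a minimal Python sketch of the three outcomes described above: page, suppress, or remediate. It is not CAG's actual code. The `Alert` fields, the `Action` names, and the `route` function are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PAGE = "page"            # send to the notification system
    SUPPRESS = "suppress"    # owners already know; drop the alert
    REMEDIATE = "remediate"  # e.g. reboot or terminate the instance


@dataclass(frozen=True)
class Alert:
    source: str                       # hypothetical: name of the emitting service
    instance_unhealthy: bool = False  # hypothetical: flag set by telemetry


def route(alert, already_notified):
    """Pick an action for one alert.

    already_notified is the set of sources whose owners have been paged.
    The ordering of the checks is an assumption made for this sketch.
    """
    if alert.instance_unhealthy:
        return Action.REMEDIATE
    if alert.source in already_notified:
        return Action.SUPPRESS
    return Action.PAGE
```

In this sketch, a second alert from a source whose owners were already paged is suppressed, while an alert flagged as coming from an unhealthy instance goes straight to remediation.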
The company uses a tool called Chaos Gorilla -- a cousin to its open source Chaos Monkey -- to test resiliency at a large scale. Chaos Gorilla integrates with Asgard and Edda to simulate the loss of an entire availability zone in a given region. "This sort of failure mode -- an AZ (Amazon Availability Zone) either going down or simply becoming inaccessible to other AZs -- happens once in a blue moon, but it's a big enough problem that simulating it and making sure our entire ecosystem is resilient to that failure is very important to us," wrote Rapoport.
On the security front, Netflix employs Security Monkey and Howler Monkey. The former is designed to track configuration history and to generate alerts about changes to EC2 security-related policies. The latter's purpose is to discover and track SSL certificates in Netflix's environments and domain names and to alert the proper recipients as SSL certificate expiration dates draw near. According to Rapoport, the tool has helped to eliminate instances of production outages due to SSL expirations over the past 18 months.
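The expiration alerting that Howler Monkey is described as doing comes down to comparing each certificate's expiry date against a warning window. The Python sketch below illustrates that comparison only. The function name, the 30-day default, and the input format (a mapping of hostname to expiry date) are assumptions, and how the real tool discovers certificates is not covered.

```python
from datetime import datetime, timedelta


def expiring_soon(certs, now, window_days=30):
    """Return hostnames whose certificate expires within window_days of now.

    certs maps hostname -> expiry datetime (hypothetical input format).
    Certificates that have already expired are included as well.
    Results are ordered soonest expiry first.
    """
    cutoff = now + timedelta(days=window_days)
    due = [(expires, host) for host, expires in certs.items() if expires <= cutoff]
    return [host for _, host in sorted(due)]
```

A scheduled job could call this once a day and notify the owners of each hostname returned.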
The company employs Chronos to handle most of its change-control process. The tool integrates with Netflix's Simian Army (the aforementioned Monkeys) and Asgard to automatically track changes, including event types like deployments, security events, and other automated actions. "Chronos accepts events via a REST interface and allows humans and machines to ask questions like, 'What happened in the last hour?' or, 'What software did we deploy in the last day?'" according to Rapoport.
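The kind of question Rapoport describes, such as "What happened in the last hour?", amounts to a time-window filter over recorded events. The following Python sketch shows that idea with an in-memory list. It is not Chronos itself, which accepts events over REST. The `EventLog` class and its method names are hypothetical.

```python
from datetime import datetime, timedelta


class EventLog:
    """In-memory stand-in for a change-event store (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, timestamp, event_type, detail):
        # Each event is stored as a (timestamp, event_type, detail) tuple.
        self._events.append((timestamp, event_type, detail))

    def since(self, now, window, event_type=None):
        """Return events in [now - window, now], optionally of one type."""
        start = now - window
        return [
            event
            for event in self._events
            if start <= event[0] <= now
            and (event_type is None or event[1] == event_type)
        ]
```

With this sketch, "What software did we deploy in the last day?" becomes `log.since(now, timedelta(days=1), event_type="deployment")`, where `"deployment"` is an assumed event-type label.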
Netflix also employs Python to transform applications into deployable Amazon Machine Images with a tool called Aminator. The tool attaches a foundation image to a running EC2 instance, preps it, installs packages into the image, and turns the resultant image into a complete Netflix application. "Simple in concept and execution, but absolutely critical to our success," Rapoport wrote. "Pre-staging images and avoiding post-launch configuration really helps when launching hundreds or thousands of instances."
Netflix has created a slew of Python-based modules for managing and maintaining its Cassandra clusters. The modules use REST APIs to communicate with other Netflix tools for such tasks as managing instances within AWS as well as within Cassandra. Rapoport says "these activities include creating clusters using Asgard, tracking our inventory with Edda, monitoring Eureka to make sure clusters are visible to clients, managing Cassandra repairs and compactions, and doing software upgrades."
What's more, the company's Cassandra squad uses a Python package called JenkinsAPI to configure jobs and glean information about monitoring and maintenance jobs in Jenkins, Pycassa to access operational data stored in Cassandra, Boto to communicate with AWS services like S3 storage, and Paramiko to create SSH connections to instances without having to create subprocesses.
Netflix's list of Python tools doesn't end there. The company's data science and engineering teams use a RESTful Web service called Sting, which is designed to slice and dice large in-memory datasets and produce useful visualizations of the data. "Our data science teams use Sting to analyze and iterate against the results of Hive queries on our big data platform. While a Hive query may take hours to complete, once the initial dataset is loaded in Sting, additional iterations using OLAP-style operations enjoy sub-second response times," according to Rapoport.
This story, "Why Netflix is embracing Python over Java," was originally published at InfoWorld.com.