Three teams dive deep
Three Twitter teams are using Cascading in combination with programming languages: The revenue team uses Scala, the publisher analytics team uses Clojure, and the analytics team uses Jython.
The revenue team helps advertisers determine which ads are most effective by analyzing the contents of ads, Twitter topics, and so on to help increase conversion rates. They wrote Scalding, the open source Scala API for Cascading, so that developers can write in Scala and run on Hadoop.
The publisher analytics team helps webmasters understand how Twitter users are engaging with information on brands, websites, and online influencers. They built and open-sourced Cascalog, a Clojure-based language that uses Cascading as the job execution engine.
The analytics team's mission is to understand Twitter user activity. They needed a way to make it easier to perform detailed and complex analysis on users following other users or users who follow the same people. They created PyCascading, a Python wrapper for Cascading to control the full data processing workflow from Python.
In all these cases, Cascading shields developers from the underlying complexity of writing, optimizing, and executing MapReduce jobs. It allows each team to deliver highly complex information and functionality needed by the business quickly and efficiently. The end result of all this amounts to important insights Twitter can use to continuously improve its wildly popular service.
This article, "Twitter's programmers speed Hadoop development," was originally published at InfoWorld.com. Read more of Andrew Lampitt's Think Big Data blog, and keep up on the latest developments in big data at InfoWorld.com For the latest business technology news, follow InfoWorld.com on Twitter.