For Twitter, making sense of its mountains of user data was a big enough problem that it purchased another company just to help get the job done.
Twitter's success is dependent entirely on how well it exploits the data its users generate. And it has a lot of data to work with: It hosts more than 200 million accounts, which generate 230 million Twitter messages a day.
Last July, the social networking giant purchased BackType, a company with software called Storm that could parse live data streams, such as millions of Twitter feeds. After the acquisition, Twitter released the source code of Storm, having no interest in commercializing the product itself.
Storm is valuable to Twitter's own operations because it can identify emerging topics on the company's service as they unfold, in real time. For instance, Twitter uses the software to calculate, in real time, how widely Web addresses are shared across multiple Twitter users.
Such a job "is a really intense computation, which could involve thousands of database calls and millions of follower records," said Nathan Marz, Twitter lead engineer for Storm, who explained the technology in December at a New York conference held by Big Data software vendor DataStax.
Using a single machine, computing the reach of a Web address could take up to 10 minutes. But spread across 10 machines, Marz explained, it could execute in as little as a few seconds. For a company that makes money selling ads against emerging trends, the faster operation can be crucial.
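To give a rough sense of why the job parallelizes well, here is a minimal Python sketch of a reach calculation. It is not Storm's API or Twitter's code: the lookup tables, the account names, and the `followers_of` and `reach` functions are all hypothetical, and a thread pool stands in for the separate machines Marz described. Each worker gathers the followers for its own slice of the accounts that shared the address, and the partial results are merged so each follower is counted once.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the database lookups.
TWEETERS_BY_URL = {
    "http://example.com/story": ["alice", "bob", "carol"],
}
FOLLOWERS_BY_USER = {
    "alice": ["dave", "erin"],
    "bob": ["erin", "frank"],
    "carol": ["dave", "grace", "heidi"],
}


def followers_of(users):
    """Collect the distinct followers for one slice of users."""
    found = set()
    for user in users:
        found.update(FOLLOWERS_BY_USER.get(user, []))
    return found


def reach(url, workers=3):
    """Count the distinct accounts following anyone who shared the URL."""
    tweeters = TWEETERS_BY_URL.get(url, [])
    # Give each worker its own slice of the accounts that shared the URL.
    partitions = [tweeters[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(followers_of, partitions)
        # Merge the partial sets so a shared follower is counted only once.
        return len(set().union(*partials))


print(reach("http://example.com/story"))
```

In this toy data set, "erin" follows two of the sharing accounts and "dave" follows two as well, which is why the merge step uses sets rather than adding up follower counts.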
Like Twitter, organizations are finding that they have a great deal of data on hand, and that the data could potentially be used to maximize profits and improve efficiencies -- if they can organize and analyze it quickly enough. This pursuit, made possible by a number of new technologies that are mostly open source, is often referred to as big data.
"It absolutely gives us a competitive advantage if we can better understand what people care about and better use the data we have to create more relevant experiences," said Aaron Batalion, CTO for online shopping service LivingSocial, which uses technologies like the Apache Hadoop data processing platform to glean more information about what its users want.
"The days are over when you build a product once, and it just works," Batalion said. "You have to take ideas, test them, iterate them, use data and analytics to understand what works and what doesn't in order to be successful. And that's how we use our big data infrastructure."
Big data getting bigger
Last May, consulting firm McKinsey and Company issued a report that anticipated how organizations would be deluged with data in the years to come. It also predicted that a number of industries -- including health care, the public sector, retail, and manufacturing -- would benefit by analyzing their rapidly growing mounds of data.
Collecting and analyzing transactional data will give organizations more insight into their customers' preferences. The data can also be used to better inform the creation of products and services, and to help organizations remedy emerging problems more quickly.