Unlike traditional caching, which requires duplicating storage resources and flushing writes to the back-end storage, distributed storage tiering promises both higher application performance and lower storage costs. The server owns the data and the bulk of the I/O processing, reducing SAN performance requirements and stretching your SAN dollar.
The price of these benefits is, as usual, increased complexity. We'll learn more about the promise and challenges of distributed storage tiering as EMC's Project Lightning and other vendor initiatives come to light. --Doug Dineley
4. Apache Hadoop
Two years ago we picked MapReduce as the top emerging enterprise technology, mainly because it promised something entirely new: analysis of huge quantities of unstructured (or semi-structured) data such as log files and Web clickstreams using commodity hardware and/or public cloud services. Over the past two years, Apache Hadoop, the leading open source implementation of MapReduce, has found its way into products and services offered by Amazon, EMC, IBM, Informatica, Microsoft, NetApp, Oracle, and SAP -- not to mention scores of startups.
Hadoop breaks new ground by enabling businesses to deploy clusters of commodity servers that crunch through many terabytes of unstructured data simply to discover interesting patterns worth exploring, rather than starting from formal business intelligence objectives. But remember that Hadoop is essentially a software framework running on top of a distributed file system (HDFS). Developers must write programs to define Hadoop jobs, which means they need to understand Hadoop's structure, and data analysts face a learning curve in figuring out how to use Hadoop effectively.
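To give a rough sense of what such a program involves, here is a minimal single-machine sketch of the MapReduce model in plain Python. It does not use Hadoop's actual APIs (production Hadoop jobs are commonly written in Java against Hadoop's MapReduce classes). The clickstream log format, the sample data, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are all hypothetical, chosen only to illustrate the map, shuffle, and reduce steps.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical sample input: simplified Web clickstream lines of the form
# "<user_id> <url>". A real Hadoop job would read such data from HDFS.
LOG_LINES = [
    "u1 /home",
    "u2 /products",
    "u1 /products",
    "u3 /home",
    "u2 /checkout",
    "u3 /products",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit an intermediate (url, 1) pair for each page view."""
    _user, url = line.split()
    yield url, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group intermediate values by key (the framework's job in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three steps in one process; returns page-view counts per URL."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    for url, count in run_job(LOG_LINES).items():
        print(url, count)
```

On a real cluster, the framework would run many copies of the map function in parallel across nodes, move the intermediate pairs over the network during the shuffle, and distribute reduce work by key. This sketch collapses all of that into a single process, which is exactly the plumbing Hadoop exists to handle at scale.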
Early on, tools were developed to make exploiting Hadoop easier for developers. Apache Hive provides SQL programmers with a familiar SQL-like language called HiveQL for ad hoc queries and big data analysis. And Apache Pig offers a high-level language, Pig Latin, for creating data analysis programs that are parallel in nature, often a requirement for large processing jobs.
IBM was among the first to provide tools on top of Hadoop that let analysts extract value almost right away. Its InfoSphere BigInsights suite includes BigSheets, which enables users to explore data and build processing jobs without writing code, all through a spreadsheet-like interface.
And Hadoop solutions from startups are popping up everywhere. Cloudera, Hortonworks, and MapR combine their own Hadoop distros with enterprise-oriented management tools. Karmasphere Studio is a specialized IDE that allows developers to prototype, develop, debug, and monitor Hadoop jobs, while Karmasphere Analyst is a GUI tool that enables data analysts to generate SQL queries for Hadoop data sets and view the output in charts and graphs. Another startup, Datameer, offers Datameer Analytics Solution, which also sports a spreadsheet-style user interface.
Where will this all lead? As Hadoop solutions proliferate, businesses will gain unprecedented insight from unstructured data, insight they can use to predict the behavior of Web customers, optimize workflows, and, with the aid of data visualization tools, discover patterns in everything from medical histories to common search terms. The best thing about the new wave of Hadoop analytics is that we're only beginning to discover where it may lead. --Eric Knorr
3. Advanced synchronization
Apple and Microsoft may have wildly different strategies, but they agree on one thing: It's time to say good-bye to single-user environments, where each PC or other device is a separate island from the rest of the user's computing world. In fact, both companies are moving to a cloud-enabled fabric of user activities spread across devices and applications.
In October, Apple's iOS 5 debuted alongside iCloud, a cloud-based syncing service that keeps bookmarks, documents, photos, and "key value" data (such as state information) in sync across a user's iOS devices, Macs, and -- to a lesser extent -- Windows PCs. Microsoft's forthcoming Windows 8 takes the concept even further, keeping not just data but application state in sync across Windows 8 PCs and tablets, and probably Windows Phone smartphones. When you pick up one device, whatever you were working on with another is ready for you to continue.