IBM opened the doors to its Almaden Research Center this week to show what its scientists are working on, including some advanced technologies for storage and data analysis.
Located at the southern tip of Silicon Valley, Almaden claims to be the birthplace of the distributed relational database and the first data mining algorithms. Fiddling with bits and bytes to improve how they're stored and analyzed continues to be a focus, although the labs also work on areas like nanotechnology, spin physics, and human-computer interaction.
The projects on show this week included Panache, a file system for use across wide area networks; Sage, a tool for moving data to different storage tiers automatically; and Cobra, which helps companies figure out what people are saying about them in online forums.
Panache is a clustered file system that gives applications high-speed access to a large, central pool of data even when the applications run far away, for example in data centers in other parts of the country or on other continents.
"Customers are asking us to give them a way, when data is created at one site, to make it available in other geographically distributed locations, so that users at those locations can access the data as if it were local," said Bruce Hillsberg, director of the Storage Systems research group.
The file system uses advanced caching techniques to keep the data at each location consistent. It combines push and pull mechanisms to replicate changes efficiently across multiple nodes in a wide area network, so that conflicts don't arise when changes are made to the data caches at individual nodes.
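IBM did not detail Panache's protocol beyond its push and pull behavior. As a rough illustration only, the Python sketch below models one common way to combine the two: remote caches pull a file from a central home site on first access, the home site pushes later changes back out, and per-file version numbers let the home site reject a write made against a stale copy. The class names HomeSite and CacheSite, the version-number scheme and the queued (delayed) push delivery are all assumptions made for this example, not IBM's design.

```python
class HomeSite:
    """Hypothetical authoritative store holding the master copy of each file."""

    def __init__(self):
        self.files = {}   # path -> (version, data)
        self.caches = []  # remote caches that receive pushed updates

    def read(self, path):
        return self.files[path]

    def write(self, path, data, base_version):
        """Accept a write only if it was made against the latest version."""
        current, _ = self.files.get(path, (0, None))
        if base_version != current:
            raise ValueError(
                f"conflict on {path}: write based on v{base_version}, home has v{current}"
            )
        self.files[path] = (current + 1, data)
        # Push: queue the change for every cache; delivery is delayed, as on a WAN.
        for cache in self.caches:
            cache.inbox.append((path, current + 1, data))
        return current + 1


class CacheSite:
    """Hypothetical remote cache: pulls on a miss, applies queued pushes on sync."""

    def __init__(self, home):
        self.home = home
        self.entries = {}  # path -> (version, data)
        self.inbox = []    # pushed updates not yet applied
        home.caches.append(self)

    def sync(self):
        # Apply queued pushes, keeping only newer versions of files already cached.
        for path, version, data in self.inbox:
            if path in self.entries and self.entries[path][0] < version:
                self.entries[path] = (version, data)
        self.inbox.clear()

    def read(self, path):
        self.sync()
        # Pull: fetch from the home site only on a cache miss.
        if path not in self.entries:
            self.entries[path] = self.home.read(path)
        return self.entries[path][1]

    def write(self, path, data):
        # The locally cached version is the base; if a push has not arrived yet,
        # the home site detects the stale base and rejects the write.
        base, _ = self.entries.get(path, (0, None))
        version = self.home.write(path, data, base)
        self.entries[path] = (version, data)


if __name__ == "__main__":
    home = HomeSite()
    home.files["design.cad"] = (1, "rev A")
    london, tokyo = CacheSite(home), CacheSite(home)

    london.read("design.cad")            # pulled from the home site
    tokyo.read("design.cad")             # pulled from the home site
    london.write("design.cad", "rev B")  # home moves to v2; push queued for tokyo
    print(tokyo.read("design.cad"))      # prints "rev B" after the push is applied

    london.write("design.cad", "rev C")  # home moves to v3
    try:
        tokyo.write("design.cad", "rev D")  # tokyo still holds v2
    except ValueError as err:
        print(err)  # prints "conflict on design.cad: write based on v2, home has v3"
```

Rejecting stale writes at a single home site is only one way to avoid conflicts; a real wide-area file system also has to cope with network partitions and disconnected caches, which this sketch ignores.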
IBM says the technology could have several uses. Engineers working on a project in different countries, for example, can access the same set of data and make changes to it locally without worrying about the cached versions getting out of sync.
It could also reduce the time it takes to replicate virtual machines between data centers, researchers here said. Applications running inside a virtual machine access data from a virtual LUN, typically stored as a file in the data center. When a new virtual machine is configured or restarted from a failure, the OS image and its virtual LUN have to be transferred between sites, causing delays before the application is ready for use.
Panache can maintain a cache of the OS and its virtual LUN at the remote site, so it's there when needed. IBM researchers say this would greatly reduce the time and complexity of configuring new virtual machines and moving them across a wide area network. It could also help companies reduce data center costs: instead of hosting 20,000 virtual machines in one large data center, a company could use the faster migration to spread the VMs across 20 smaller data centers.