As enterprises seek to move into the big data world -- digitizing paper documents and saving email communications, Word docs, Excel files and all sorts of other unstructured data in the hope of mining it for actionable business intelligence -- they need to address a big problem up front: storage.
"Enterprises have suddenly accumulated petabytes of information," says Nick Kirsch, director of product management for EMC Isilon. "They're faced with a similar challenge: They've got all this information, how do they make use of it and how do they store it in a scalable architecture?"
One possibility is to scale vertically (scale up). The idea is to make your existing storage nodes larger, faster, and/or more powerful by replacing your existing storage devices with new, higher-capacity devices. Consolidating storage infrastructure in such a way is attractive, as it simplifies management and reduces the amount of floor space and power consumed. But it's not without problems: It can't span multiple locations easily, it doesn't have much inherent overall resiliency, and large, high-performance storage devices can get expensive in a hurry. And when dealing with the ever-increasing flood of information, the biggest problem is that today's storage devices can get only so big.
"You can build a bigger and bigger single unit controller," says Kirsch. "But at some point you can't build that system any bigger; you have to add a second system. You could end up with hundreds of separate units you need to manage."
Instead, Kirsch says scaling horizontally (scale out) with NAS is the way to go. A scale-out NAS architecture forgoes expensive, high-capacity storage devices for commodity storage components combined into an aggregate storage pool. Instead of making nodes bigger, you add nodes as necessary. The downside is that you can very quickly wind up with a much more complex management environment. But it can span multiple locations and it has a great deal of inherent resiliency. And, perhaps most important from the perspective of managing big data, you can add storage rapidly and cheaply.
"I think the biggest thing that we see, the biggest complaint when it comes to storage is that it's really easy to manage a single unit, but when you have two or more units, it becomes complicated," Kirsch says.
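To make the scale-out idea concrete, here is a minimal Python sketch of an aggregate storage pool that presents many small nodes as one system. Everything in it is an assumption made for illustration: the `Node` and `ScaleOutPool` names are hypothetical, and the placement scheme (rendezvous hashing) is simply one common way to spread files across nodes, not a description of how EMC Isilon or any other product works. It also ignores data protection and per-node capacity limits, which a real system would have to handle.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical commodity storage node holding files in memory."""
    name: str
    capacity_tb: int
    files: dict = field(default_factory=dict)


class ScaleOutPool:
    """Illustrative pool: callers see one namespace, not individual nodes."""

    def __init__(self):
        self.nodes = []

    @property
    def capacity_tb(self):
        # Raw pool capacity is the sum of node capacities (not enforced here).
        return sum(node.capacity_tb for node in self.nodes)

    def _owner(self, path):
        # Rendezvous hashing: the node with the highest hash score for
        # this path owns the file.
        if not self.nodes:
            raise RuntimeError("pool has no nodes")

        def score(node):
            return hashlib.sha256(f"{node.name}:{path}".encode()).digest()

        return max(self.nodes, key=score)

    def add_node(self, name, capacity_tb):
        # Scale out: add a node, then move only the files it now owns.
        new_node = Node(name, capacity_tb)
        old_nodes = list(self.nodes)
        self.nodes.append(new_node)
        for node in old_nodes:
            for path in list(node.files):
                if self._owner(path) is new_node:
                    new_node.files[path] = node.files.pop(path)

    def write(self, path, data):
        self._owner(path).files[path] = data

    def read(self, path):
        return self._owner(path).files[path]


pool = ScaleOutPool()
for i in range(3):
    pool.add_node(f"node-{i}", capacity_tb=4)
for i in range(50):
    pool.write(f"/docs/report-{i}.xlsx", f"contents {i}")

# Growing the pool is a single call; existing paths stay readable.
pool.add_node("node-3", capacity_tb=4)
```

The point of the sketch is that callers only ever use `write` and `read` on the pool. Adding a fourth node raises capacity and rebalances some files, but nothing about how the data is addressed changes, which is the "single system" behavior scale-out NAS aims for.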
For big data, NAS is preferable to SAN, Kirsch says, because SAN is not built for unstructured data and file sharing. To use a SAN with network file protocols such as NFS or CIFS/SMB, you would have to deploy file servers in front of the SAN, which adds management complexity and limits scalability.
The five tenets of scale-out NAS
Simplicity comes first in Kirsch's five tenets of what CIOs should look for in scale-out NAS architecture:
Simple to scale. "This next generation architecture that they're looking to move to needs to be simple to scale," Kirsch says. "If I have a 1TB drive, that's a volume that I can manage, I can protect, and I can replicate. Why can't I manage 15 petabytes with that same simplicity? It shouldn't be more complicated just because it's bigger." Scale-out NAS architectures can tackle this problem with software management and a virtualization/abstraction layer that makes the nodes behave like a single system.