The six levels of primary data storage

A 20,000-foot view of data storage, from networked desktop hard disks to monster SANs, reveals that the same issues emerge again and again

As our data continues to grow at a nearly exponential clip, storage vendors have responded with ever cheaper and more capable products. But the push for higher capacity and lower prices has muddied the waters.

Not long ago, if you were buying a multiterabyte storage device, it would almost certainly have been a highly reliable, high-performance, enterprise-class SAN. Today, you can jam that same amount of storage into a tower desktop for a tiny fraction of the cost. As a result, many storage products are being marketed as "SAN" storage when they aren't much better than a desktop in terms of performance and reliability.


More than ever, it's important to have a solid understanding of what forms primary storage can take and what differentiates them. In rough terms, the primary storage ladder can be broken down into six distinctive rungs. Who you are and what you do will determine your best option.

Primary data storage, rung 1: Peer to peer

Users: 2 to 10

Cost: Bupkus

Redundancy: None

The concept of peer-to-peer primary storage should be familiar to just about anyone who owns a computer. Essentially, each user's workstation stores his or her own data. In the event that data needs to be shared, technology built into the operating system allows others to see that data. It's cheap and incredibly simple.

For individuals and very small businesses, this is often the best option. Given that there are over 5 million businesses in the United States alone with fewer than 10 employees, peer-to-peer storage makes up a huge percentage of all data storage. But as a business grows, managing multiple, unreliable islands of storage becomes increasingly difficult. Most desktop operating systems also don't offer much in the way of unified security, so this model is difficult to support securely beyond a few users.

Primary data storage, rung 2: The file server

Users: 10 to hundreds

Cost: $2,000 to $5,000

Redundancy: Low

Examples: Microsoft Windows Server, Buffalo TeraStation III

The next logical step beyond decentralized, workstation-based primary storage is combining all of that shared data onto a single, dedicated server. By doing this, companies can standardize their data protection and security models across all of their mission-critical data. Centralizing the data also makes it cheaper to invest in redundancy -- whether redundant disk arrays or power supplies.

Most file servers are exactly that: an industry-standard server with a general-purpose server operating system and lots of direct-attached disk dedicated to sharing files. However, many low-end NAS devices fall into this category as well. As these NAS devices become increasingly prevalent in businesses of all sizes, it's important to note that they are essentially the same as a file server.

At a certain point, though, a business will outgrow a single file server or NAS device. The most common response is to add more file servers. As this practice continues, the same problems plaguing peer-to-peer storage emerge again. Instead of maintaining a single pool of storage, you're now tasked with managing many of them. Similarly, the exposure to data loss through hardware failure is multiplied as the number of devices increases.
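To put a rough number on how exposure grows with device count, here is a minimal sketch. It assumes each device fails independently with the same probability over some period, which is a simplification, and the 3 percent figure in the usage note is purely illustrative, not a measured failure rate. The function name is hypothetical.

```python
def prob_any_failure(per_device_failure_prob: float, device_count: int) -> float:
    """Chance that at least one of `device_count` devices fails.

    Assumes failures are independent and equally likely per device,
    which real hardware only approximates.
    """
    if not 0.0 <= per_device_failure_prob <= 1.0:
        raise ValueError("per_device_failure_prob must be between 0 and 1")
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    # P(at least one failure) = 1 - P(no device fails)
    return 1.0 - (1.0 - per_device_failure_prob) ** device_count


if __name__ == "__main__":
    # Illustrative only: 3% is an assumed per-device failure probability.
    for servers in (1, 5, 10):
        print(servers, round(prob_any_failure(0.03, servers), 4))
```

Under those assumptions, one server at 3 percent becomes about a 26 percent chance of at least one failure across ten servers, which is why adding file servers one by one tends to raise risk along with capacity.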

File servers and NAS devices are also poorly suited for storing block-level structured data such as databases and email. These applications are usually built on their own servers with their own direct-attached storage, which further compounds the storage management challenge.

Primary data storage, rung 3: Low-end SAN (a file server by any other name)

Users: 10 to hundreds

Cost: $2,000 to $20,000

Redundancy: Low

Examples: Microsoft Windows Storage Server derivatives, Overland SnapServer

In an effort to address the challenge of managing both structured and unstructured corporate data simultaneously, many storage vendors have come out with low-end SAN devices that allow both block- and file-level data to be stored on the same device. The benefit to using this kind of device is that all of a company's data -- file shares, databases, email, virtualization infrastructures, and so on -- can be combined into the same storage pool and managed and protected together.

But these devices, though technically SANs (most of them support iSCSI to allow remote, block-level storage access), are really nothing more than a standard server with different software in place to allow the device to serve iSCSI requests in addition to file serving. In general, they offer no more redundancy than a normal server, nor do they scale beyond a normal server in terms of performance.

In short, these devices may allow you to efficiently manage all of your storage needs, but they lack the performance, scalability, and reliability of enterprise-class SANs.

Primary data storage, rung 4: Enterprise-class SAN

Users: 50 to thousands

Cost: $20,000 to millions

Redundancy: High

Examples: EMC Clariion/Symmetrix, NetApp FAS, Dell EqualLogic, IBM DS, HP EVA/XP

Instead of using industry-standard server hardware and software, enterprise-class SANs employ highly redundant, dual-controller architectures, boasting such features as mirrored caches and redundant interconnect interfaces. Similarly, enterprise-class SANs are also highly scalable -- supporting a much higher level of capacity and far greater performance than their low-end brethren.

This field of devices includes not just the typical block-level SAN, but also higher-end, multicontroller NAS devices that are capable of serving both block- and file-level data with the same redundancy and performance. In addition, these devices allow storage admins to mix different capacities and speeds of physical storage media (both disks and SSDs), making it possible to present the right type of storage to each storage consumer while still maintaining a unified management architecture.

Only a few years ago, the entry level for this type of device was well above $50,000. That price tag has fallen precipitously. As a result, the number of enterprises that can afford to own a SAN has sharply increased.

Primary data storage, rung 5: Network-based storage virtualization

Users: Thousands to tens of thousands (and beyond)

Cost: Sky's the limit

Redundancy: Cadillac

Examples: EMC Invista, HP SVSP, NetApp V-series

As scalable and redundant as enterprise-class SANs are, the largest enterprises will eventually outgrow a single SAN platform and need to field multiple SANs to achieve the levels of performance and reliability they require. As this happens, the same inefficiencies -- in terms of both capacity and management -- rear their heads once more. To combat this problem, large enterprises often employ network-based storage virtualization to unify heterogeneous SAN storage platforms into a single logical infrastructure.

Essentially, storage virtualization involves the introduction of an abstraction layer between storage consumers (both individual users and servers of all shapes and sizes) and physical storage devices. This abstraction layer permits much greater freedom in managing very large storage infrastructures by allowing administrators to transparently replicate and migrate data without storage consumers being aware of it. Storage virtualization also provides nearly limitless capacity and performance scalability.
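The abstraction-layer idea can be sketched in a few lines. This is a toy model, not how any of the products named above work internally: the class names `Backend` and `VirtualizationLayer` and all method names are hypothetical, and in-memory dictionaries stand in for physical arrays. The point it illustrates is that consumers address a logical volume name while the layer decides, and can change, where the blocks physically live.

```python
class Backend:
    """Stand-in for one physical SAN; holds blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # (volume, block_no) -> bytes


class VirtualizationLayer:
    """Maps logical volume names to whichever backend currently holds them."""

    def __init__(self):
        self._placement = {}  # volume name -> Backend

    def create_volume(self, volume, backend):
        self._placement[volume] = backend

    def write(self, volume, block_no, data):
        self._placement[volume].blocks[(volume, block_no)] = data

    def read(self, volume, block_no):
        return self._placement[volume].blocks[(volume, block_no)]

    def location(self, volume):
        return self._placement[volume].name

    def migrate(self, volume, target):
        """Move a volume's blocks to another backend; the logical name is unchanged."""
        source = self._placement[volume]
        if source is target:
            return
        for key in [k for k in source.blocks if k[0] == volume]:
            target.blocks[key] = source.blocks.pop(key)
        self._placement[volume] = target


if __name__ == "__main__":
    old_san, new_san = Backend("san-a"), Backend("san-b")
    layer = VirtualizationLayer()
    layer.create_volume("mail", old_san)
    layer.write("mail", 0, b"inbox")
    layer.migrate("mail", new_san)
    # The consumer still reads "mail" block 0; only the placement changed.
    print(layer.location("mail"), layer.read("mail", 0))
```

A consumer that only ever calls `read` and `write` with the volume name never sees the migration, which is the transparency the paragraph above describes. Real implementations also have to handle writes that arrive during a migration, which this sketch ignores.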

Primary data storage, rung 6: Wild card -- the cloud

Users: Variable

Cost: Variable

Redundancy: Variable

Examples: Amazon S3, Mosso/Rackspace Cloud Files

The newest entrant to the primary storage field is not so much a new form of storage hardware or software, but an entirely different storage delivery model. Instead of buying a storage device that's suited to your organization's needs and then inevitably upgrading it in phases as you grow, the promise of cloud-based storage is that it allows you to pay for the storage you're using when you're using it and to elastically scale without limits.
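The difference between buying capacity in phases and paying only for what you use can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the monthly usage figures and the 4TB upgrade step are made-up inputs, and it deliberately leaves prices out, counting idle capacity in TB-months instead.

```python
import math


def provisioned_capacity(usage_tb, step_tb):
    """Capacity owned each month when hardware is bought in fixed steps.

    Purchased capacity never shrinks, so a dip in usage leaves it idle.
    """
    owned = 0
    capacity_by_month = []
    for used in usage_tb:
        needed = math.ceil(used / step_tb) * step_tb
        owned = max(owned, needed)
        capacity_by_month.append(owned)
    return capacity_by_month


def idle_tb_months(usage_tb, step_tb):
    """Total owned-but-unused capacity, summed over all months."""
    owned = provisioned_capacity(usage_tb, step_tb)
    return sum(o - u for o, u in zip(owned, usage_tb))


if __name__ == "__main__":
    monthly_usage_tb = [3, 5, 4, 9]  # assumed usage, not real data
    print(provisioned_capacity(monthly_usage_tb, step_tb=4))
    print(idle_tb_months(monthly_usage_tb, step_tb=4))
```

In a pure pay-per-use model the idle figure would be zero by definition, since billed capacity tracks usage. Whether that makes the cloud cheaper overall depends on per-terabyte pricing, which this sketch does not attempt to model.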

Though cloud-based storage is not widely used by enterprises, few doubt that it will mature and ultimately play a huge role in the future of storage. Current challenges include convincing customers that cloud-based alternatives are reliable enough to support the mission-critical needs of the enterprise -- service-level agreements tend to be less than reassuring -- and surmounting the security and regulatory hurdles that arise when sensitive data is stored with a third party.

This article, "The six levels of primary data storage," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog and follow the latest developments in network storage and information management at InfoWorld.com.
