One alternative to buying expensive storage-area networks or other hardware-based dedicated storage is to deploy open source storage software on existing server hardware. For this test, we evaluated three such open source storage products: GlusterFS 3.3, Ceph 0.72, and Apache Hadoop 2.2.0.
All three did a good job, but as you might expect, open source storage involves tradeoffs. This is a DIY project: the documentation may be less comprehensive than you'd like, installation can be tricky, GUI-based management tools may not be available, and if anything goes wrong, you're largely on your own.
We liked GlusterFS for its hashing algorithm, which lets any client compute where a file lives instead of asking a central server. For the most part, this eliminates the bottleneck and single point of failure associated with products that rely on a centralized metadata server. However, GlusterFS, which is now developed by Red Hat, lacks GUI-based management tools.
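To illustrate why hash-based placement removes the need for a central lookup, here is a minimal Python sketch under a simplified model: the 32-bit hash space is split into equal ranges, one per brick, and the hash of a file name selects the brick. The brick names are hypothetical, and zlib.crc32 stands in for the hash function GlusterFS actually uses; real GlusterFS layouts are more involved (for example, they are kept per directory and adjusted when the volume is rebalanced).

```python
import zlib

HASH_SPACE = 2**32  # size of the 32-bit hash space shared by all bricks


def assign_ranges(bricks):
    """Split the hash space into contiguous, equal-sized ranges, one per brick."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename, ranges):
    """Return the brick whose hash range contains the file name's hash.

    Every client can run this locally, so no metadata server is consulted.
    """
    # Stand-in hash (assumption): GlusterFS uses its own hash, not CRC32.
    h = zlib.crc32(filename.encode("utf-8"))
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise ValueError("hash space not fully covered")


# Hypothetical brick names for illustration only.
bricks = ["server1:/brick", "server2:/brick", "server3:/brick"]
ranges = assign_ranges(bricks)
print(locate("report.pdf", ranges))
```

Because the mapping depends only on the file name and the agreed-upon layout, two clients looking up the same file independently arrive at the same brick.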
Ceph also impressed us with its placement algorithm, CRUSH, which likewise lets clients calculate where data is stored rather than look it up. We also liked that Ceph provides object, block and file storage in a single system. However, while Ceph is an interesting product to keep an eye on, it is not yet ready for enterprise production deployments: the vendor does not yet recommend CephFS (the file system) for production environments.
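The following Python sketch shows the general idea of computed replica placement. It does not implement CRUSH; it swaps in rendezvous (highest-random-weight) hashing, a simpler technique that shares CRUSH's key property: given an object name and the list of storage devices, any client can deterministically compute where the replicas belong. CRUSH additionally accounts for device weights and a hierarchy of failure domains such as hosts and racks, which this sketch omits. The "osd.N" labels mimic Ceph's naming for object storage daemons; the object name and replica count are assumptions for illustration.

```python
import hashlib


def score(obj_name, osd):
    """Deterministic pseudo-random weight for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj_name}:{osd}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name, osds, replicas=3):
    """Pick `replicas` distinct devices: those with the highest scores for this object."""
    if replicas > len(osds):
        raise ValueError("not enough devices for the requested replica count")
    ranked = sorted(osds, key=lambda osd: score(obj_name, osd), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster of six devices and a hypothetical object name.
osds = [f"osd.{i}" for i in range(6)]
print(place("rbd_data.1234", osds))
```

A useful consequence of this design is that adding a device only moves objects for which the new device ranks among the top scores; every other object keeps its existing placement, so no central table has to be rewritten.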
Apache Hadoop is a popular, full-featured product with a nice web-based management console. Our concern with HDFS, Hadoop's file system, is the potential bottleneck and single point of failure posed by the centralized server that stores the file system metadata (the NameNode). Administrators can currently fail over to a standby metadata server manually, and the Hadoop developers are working to make failover automatic, but at the time of publication this feature was not yet available.