Bossie Awards 2014: The best open source data center and cloud software

InfoWorld's top picks of the year in open source platforms, infrastructure, and management software

Slowly but surely, the shape of the data center is changing -- and maybe not so slowly. The push is on to close the distance between development and production, and to bridge the gap between private data centers and the cloud. These open source projects are leading the way.

Nginx

A Web server engineered for high speed and low memory usage, Nginx now powers about 20 percent of all websites. Not only a fast Web server, it works as a reverse proxy, application accelerator, and media delivery system. Its success has spawned a commercial edition, Nginx Plus, which adds enterprise features such as load balancing, streaming media services, and high-end cache controls to the mix. The commercial version is available either through a direct license from Nginx or in the form of an AWS appliance at a potentially lower cost. But the open source core of Nginx remains free to use for all.

-- Serdar Yegulalp

OpenStack

OpenStack is the cloud platform that everyone loves to criticize. Its greatest strength -- and weakness -- is the support from large IT vendors like HP, IBM, VMware, and Cisco. With each company competing to contribute and to control, the outcome is messy and complex. At the same time, the support of the major vendors shows their desperation for a competitor to Amazon and Google. OpenStack will take some time to mature and stabilize, but its place as the dominant private cloud of tomorrow seems assured.

-- Greg Ferro

AppScale

One of the most striking developments in the cloud world has been the re-creation of proprietary cloud architectures -- the AWS APIs, for instance -- via open source. One of the latest projects, AppScale, reproduces Google App Engine using nothing but open source components. What's more, it runs on a wide variety of existing cloud hosts, ranging from Amazon EC2 to Microsoft Azure to, yes, Google Compute Engine. Apps written in Python, Go, PHP, and Java are all supported. AppScale (the company) has promoted AppScale (the project) as a way to back up apps from Google App Engine or to provide an easy exit strategy from Google's cloud. But it's clear the project has a lot to offer even outside of that.

-- Serdar Yegulalp

Eucalyptus

With Eucalyptus you can build, deploy, and manage a hybrid cloud infrastructure on your own servers and on AWS, and shift workloads seamlessly between your private cloud and Amazon for on-demand "cloud bursting." This year's Version 4.0 release brought smoother integration into both enterprise networks and AWS, among other improvements. Eucalyptus also finally scrapped the old admin interface in favor of a nice heads-up dashboard.

New separation and registration of user-facing services from the Eucalyptus cloud controller database brings improved scalability when reaching into the Amazon cloud. Multiple hosts can now run these front-end components concurrently, in an active/active configuration, to eliminate bottlenecks for Amazon services such as IAM and EC2.

The addition of AWS CloudFormation support brings a consistent, template-driven approach to deployment. And the new Edge Networking Mode lets you ditch the cluster controller for a stand-alone node controller, allowing Eucalyptus to fit more seamlessly into an existing network infrastructure. A new S3 object storage gateway brings support of distributed object stores such as Ceph, Swift, and RiakCS, along with S3 itself.

-- James R. Borck

Ansible

Ansible is a streamlined data center orchestration tool that does not rely on agents, but rather communicates with remote managed devices via SSH for Linux and Unix systems, and via PowerShell for Windows systems. This agentless approach eases installation and integration.

Built on Python, Ansible uses "playbooks" to manage configuration parameters, and offers a large assortment of pre-built playbooks for common configuration and management tasks. Custom modules can be written in just about any language, with standard JSON output used to interface with the Ansible framework.
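
As a rough illustration of the JSON convention described above, a custom module can be as small as a script that does its work and prints a single JSON object to standard output. This is a minimal sketch under stated assumptions, not code from the Ansible project: the `build_result` helper, the `exists` and `path` fields, and the command-line argument handling are all hypothetical, while `changed` is a key Ansible conventionally reads from module output.

```python
#!/usr/bin/env python
"""Sketch of a custom module: do some work, then print one JSON object."""
import json
import os
import sys


def build_result(path):
    """Report whether `path` exists; nothing is modified, so `changed` is False."""
    return {
        "changed": False,                # conventional key in module output
        "exists": os.path.exists(path),  # illustrative field, not a standard one
        "path": path,                    # illustrative field
    }


def main(argv):
    # Hypothetical argument handling: the path to check is the first argument.
    path = argv[1] if len(argv) > 1 else "/tmp"
    print(json.dumps(build_result(path)))


if __name__ == "__main__":
    main(sys.argv)
```

Because the contract is only "emit JSON," the same shape could be written in any language that can print to standard output.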

AnsibleWorks offers a subscription-based Web UI, the Ansible Tower, that can be used to manage portions of the solution.

-- Paul Venezia

Salt

Salt approaches data center automation and orchestration from a speed and scalability perspective. It's also quite extensible through a well-designed hierarchical configuration layout, built on "states" and "pillars," that allows for both wide and fine-grained control over managed systems.

Salt scales out extremely well, using multiple levels of masters and tiering arrangements that can encompass environments of just about any size. It can operate with or without an agent, assuming SSH access is available to managed systems, and it can manage Linux and Windows servers. Salt has a large number of existing configuration modules for common tasks, and custom modules can be written in PyDSL or Python.

-- Paul Venezia 

Cloud Foundry

Cloud Foundry was created by VMware to streamline deployment for application developers, application operators, and cloud operators. Cloud Foundry was open sourced in 2011 under the Apache 2.0 license, with the pitch to developers that they could code in the languages and Web frameworks of their choice without worrying about the IT environment. It's a promise the project has kept.

Overall, Cloud Foundry is a strong platform as a service in its open source form, and in both proprietary forms from Pivotal: online as Pivotal Web Services, and in the enterprise as Pivotal CF. In addition, you'll find good proprietary PaaS offerings based on Cloud Foundry that are available from other foundation members, including ActiveState.

-- Martin Heller

OpenShift

OpenShift was designed to provide rapid self-service deployment of common languages, databases, Web frameworks, and applications. One of its current differentiators is that continuous integration (using Jenkins) is a standard part of its workflow; another is that it automatically scales applications across nodes. It's useful for developers, devops, testing, and production deployment (see the InfoWorld review), and it can run in a public or private cloud, or on-premises.

Origin is the bleeding-edge, community-supported, free open source version. It has daily updates and runs on your hardware using Fedora as the underlying OS. It's not really intended for production environments, but can provide a good, fast, free development environment that runs on a laptop or desktop. OpenShift is also available in Online and Enterprise versions from Red Hat.

-- Martin Heller

Ceph

A unified yet modular storage system, Ceph serves up the full range of object, block, and POSIX file system storage while keeping things simple by allowing you to deploy only the components you need. Over the past year, Ceph integrated erasure coding -- algorithmic parity replication -- that cuts hardware durability demands nearly in half. Plus, new cache tiering combines fast front-end caching with that erasure-coded back end, delivering both cost-effectiveness and faster throughput.
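
To see where a saving of that size can come from, here is a toy sketch of parity-based erasure coding, assuming the simplest possible scheme: k data chunks plus one XOR parity chunk. It is not Ceph's implementation (Ceph's erasure-code plug-ins use more general codes that can be configured to survive more than one lost chunk), and the function names and the k=4, m=2 figures are chosen only for illustration.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal chunks (zero-padded) and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks, lost):
    """Rebuild the chunk at index `lost` by XOR-ing every surviving chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor_bytes, survivors)


def overhead(k, m):
    """Raw bytes stored per byte of user data with k data and m coding chunks."""
    return (k + m) / k


stored = encode(b"hello ceph erasure coding", k=4)
assert recover(stored, lost=2) == stored[2]  # a lost chunk is rebuilt from the rest
print(overhead(4, 2))  # 1.5, versus 3.0 for three full replicas
```

With four data chunks and two coding chunks, each byte of user data costs 1.5 bytes of raw storage instead of the 3 bytes that triple replication would need, which is the kind of arithmetic behind the "nearly in half" figure.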

Thanks to Red Hat's acquisition of Inktank this year, Inktank's Calamari -- a browser-based front end to Ceph, formerly available only with Ceph Enterprise -- is now free open source as well. Calamari brings dashboards for node and cluster health, along with real-time graphs for IOPS, available storage, and pool status. 

-- James R. Borck 

Elasticsearch

Elasticsearch is a text search engine based on the popular Apache Lucene, providing a distributed, near real-time query engine accessible via JSON/REST APIs. Most modern search solutions utilize index files to provide fast querying of the document corpus. Doing this in parallel has several challenges, such as keeping indices in sync and handling failure.

Elasticsearch neatly marries a MapReduce-style scatter-and-gather query model to a distributed search index broken up into shards. Because each shard is replicated, a query can read from more than one copy of it, so searches run in parallel, stay fast, and are resilient to failure. Although a relatively young project in the text search ecosystem, Elasticsearch has earned its stripes. Multitenancy features make it a great choice for large environments.
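
The sharding and replication idea can be sketched in a few lines. This is a toy model under loud assumptions, not Elasticsearch's code or API: the `Shard` and `ShardedIndex` classes, the CRC-based routing, and the `down` parameter are all invented for illustration, and the shards are queried one after another here rather than in parallel.

```python
import zlib
from collections import defaultdict


class Shard:
    """A tiny inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), ()))


class ShardedIndex:
    def __init__(self, num_shards=3, replicas=1):
        # copies[s] holds the primary copy of shard s plus its replicas.
        self.copies = [[Shard() for _ in range(replicas + 1)]
                       for _ in range(num_shards)]

    def _route(self, doc_id):
        # Hypothetical routing rule: hash the document id onto a shard.
        return zlib.crc32(doc_id.encode()) % len(self.copies)

    def index(self, doc_id, text):
        for copy in self.copies[self._route(doc_id)]:
            copy.index(doc_id, text)

    def search(self, term, down=()):
        """Ask one live copy of every shard, then merge the partial results.

        `down` is a collection of (shard, copy) pairs treated as unavailable.
        """
        hits = set()
        for s, copies in enumerate(self.copies):
            live = [c for i, c in enumerate(copies) if (s, i) not in down]
            if not live:
                raise RuntimeError("no live copy of shard %d" % s)
            hits |= live[0].search(term)
        return hits


idx = ShardedIndex()
idx.index("a", "open source search")
idx.index("b", "distributed search engine")
idx.index("c", "log files")
print(sorted(idx.search("search")))  # ['a', 'b']
# Every primary copy is marked down; the replicas still answer.
print(sorted(idx.search("search", down={(0, 0), (1, 0), (2, 0)})))  # ['a', 'b']
```

The second query shows the resilience argument: losing one copy of a shard costs nothing as long as another copy is still reachable.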

-- Steven Nunez

Logstash

Everyone has a problem with log files. Until the advent of big data tools, the solution was to keep them until you ran out of space, then toss them. These days log files are a valuable resource, in domains ranging from traditional systems and application monitoring to the SIEM (security information and event management) space. Combined with Elasticsearch, Logstash becomes an interesting alternative to Splunk or a commercial security forensics tool. There's a host of plug-ins to tap into common data sources. Throw Kibana into the mix, and you have a complete stack for solutions like fraud detection, intrusion detection, or monitoring the health of your Hadoop cluster.

-- Steven Nunez

Docker

Fast becoming the world's favorite devops tool, Docker passed the 1.0 milestone this year and welcomed the likes of Google, Red Hat, VMware, and even Microsoft (which supports Docker on Azure) to the bandwagon.

A command-line-driven toolkit for building, managing, and sharing Linux containers, Docker brings the Java-like "build once, run anywhere" promise to all Linux-based application stacks, allowing the same server images to be pushed from a developer's desktop to staging to production -- or to any Docker-compatible Linux server running anywhere. Likewise, operators can easily push standard OS and database images to developers. Unlike virtual machines, Linux containers share the underlying OS, eliminating redundant OS bloat and resource-intensive hypervisors to deliver lean, zippy containers that are not only more portable but dramatically more efficient. With Docker, applications can be containerized in minutes, containers can be spun up in an instant, and images can be distributed and shared via the Docker Hub.

-- James R. Borck

CoreOS

Where does Docker go from here? To become a key piece of the OS, if you ask CoreOS. CoreOS uses Docker to create a slim-and-trim version of Linux in which all of the applications -- and a good chunk of the OS's own infrastructure -- are managed as Docker containers, and where fleets of CoreOS deployments automatically work together as a cluster. Updates can be applied far more seamlessly, and nodes can update themselves across a cluster in a rolling fashion with little intervention. Since its release, CoreOS has become a featured distribution on Google Cloud Platform, Amazon EC2, Rackspace Cloud, and many other cloud platforms.

-- Serdar Yegulalp

Panamax

Docker containerization makes application deployment a snap -- as long as your application lives in only one or two containers. Deployment gets tricky with multicontainer scenarios. Enter Panamax from CenturyLink.

Panamax -- itself a container running CentOS in a VirtualBox VM -- provides browser-based tools for custom container construction. Panamax lets you search for Docker templates and images, graphically configure services, then park your container as a template in GitHub for rapid reuse. Panamax will even display inter-container connections in a nice little graph.

To be sure, Panamax is a beta product. It can't yet target multiple hosts, it lacks authentication, and not all Docker options are supported. But Panamax offers a welcome view into how simplified modeling will ease container deployment and management in the not-too-distant future.

-- James R. Borck

Kubernetes

In the art of container management, Google's experience is unsurpassed -- its services have run containerized for years. Kubernetes is bringing the wealth of that experience to Docker containers.

Originally focused on multicontainer applications in "micro services" architectures, Kubernetes traffics in tightly coupled groups of containers called "pods," using controllers to monitor those pods and meet availability requirements through container replication, restarting, and rescheduling.

A built-in query engine uses labels to quickly filter containers for easy monitoring, as well as to group them into templates for fast replication. Additional components support lifecycle management, load balancing, and a RESTful API.
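
The two ideas above, label-based selection and controllers that keep a set number of replicas alive, can be sketched as follows. This is a toy model, not the Kubernetes API: the pod dictionaries, the `healthy` flag, and the `matches`, `select`, and `reconcile` functions are all hypothetical names chosen for illustration.

```python
import itertools

_ids = itertools.count(1)  # source of names for replacement pods


def matches(labels, selector):
    """True when every key/value pair in `selector` appears in `labels`."""
    return all(labels.get(k) == v for k, v in selector.items())


def select(pods, selector):
    """Filter pods by label selector."""
    return [p for p in pods if matches(p["labels"], selector)]


def reconcile(pods, selector, desired):
    """Return a pod list with exactly `desired` healthy pods matching `selector`.

    Unhealthy matching pods are dropped, missing replicas are created from the
    selector's labels, and pods that do not match are left untouched.
    """
    healthy = [p for p in select(pods, selector) if p["healthy"]]
    unrelated = [p for p in pods if not matches(p["labels"], selector)]
    while len(healthy) < desired:
        healthy.append({
            "name": "replica-%d" % next(_ids),
            "labels": dict(selector),
            "healthy": True,
        })
    return unrelated + healthy[:desired]


pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}, "healthy": True},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}, "healthy": False},
    {"name": "db-1", "labels": {"app": "db"}, "healthy": True},
]
pods = reconcile(pods, {"app": "web"}, desired=3)
print(len(select(pods, {"app": "web"})))  # 3
```

Running `reconcile` again on its own output changes nothing, which is the point of the controller pattern: it converges on the declared state rather than executing a one-off command.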

-- James R. Borck

Mesos

Efficient resource management across distributed systems is no easy feat -- static partitioning is plain wasteful. The Apache Mesos cluster manager abstracts away the hard-edged details of the data center (CPU, memory, storage, and so on) and presents a unified pool of available resources. Policy-driven Mesos masters sit between schedulers and cluster slave nodes, determining how many resources to offer each application and orchestrating fault tolerance and optimization.
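
A rough sketch of the offer cycle described above: the master offers each node's free resources to the frameworks in turn, and a framework accepts an offer when its next task fits. Everything here is an assumption made for illustration, including the `fits` and `run_offers` names, the "agent" naming for cluster nodes, and the alphabetical round-robin that stands in for a real allocation policy. It is not the Mesos API.

```python
def fits(task, free):
    """True when the free resources cover every requirement of the task."""
    return all(free.get(r, 0) >= need for r, need in task.items())


def run_offers(agents, frameworks):
    """Place tasks by repeated offer rounds.

    agents:     {agent name: {resource: free amount}}
    frameworks: {framework name: [task, ...]}, each task being {resource: need}
    Returns a list of (framework, agent, task) placements.
    """
    free = {a: dict(r) for a, r in agents.items()}
    pending = {f: list(tasks) for f, tasks in frameworks.items()}
    placements = []
    progress = True
    while progress:
        progress = False
        for fw in sorted(pending):      # stand-in for a real fairness policy
            for agent in sorted(free):
                # The master offers this agent's free resources to the
                # framework, which accepts only if its next task fits.
                if pending[fw] and fits(pending[fw][0], free[agent]):
                    task = pending[fw].pop(0)
                    for r, need in task.items():
                        free[agent][r] -= need
                    placements.append((fw, agent, task))
                    progress = True
                    break               # one task per framework per round
    return placements


agents = {"a1": {"cpus": 4, "mem": 8}, "a2": {"cpus": 2, "mem": 4}}
frameworks = {
    "batch": [{"cpus": 2, "mem": 4}, {"cpus": 2, "mem": 4}],
    "web": [{"cpus": 1, "mem": 1}, {"cpus": 1, "mem": 1}, {"cpus": 1, "mem": 1}],
}
for placement in run_offers(agents, frameworks):
    print(placement)
```

Both frameworks draw on the same pool of nodes, so neither needs a statically reserved slice of the cluster. In this run the third `web` task stays pending because the pool is exhausted, not because a partition boundary got in the way.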

Mesos' dynamic resource sharing means doing more with less hardware, which can translate into huge cost savings in large environments. Mesos has been underpinning Twitter's scalability for years now, and it continues to improve. This year's updates brought ACL-based authentication, container-level network monitoring, and, my favorite, the ability to use Docker container images in tasks. In short, Mesos lets you automate production-grade scaling of containerized apps.

-- James R. Borck

MariaDB

With MySQL firmly under Oracle's thumb, it made sense to create an independent fork of the ubiquitous database. Hence MariaDB, a drop-in, binary-compatible replacement for MySQL that's designed to work as well as possible with existing MySQL databases while taking its own evolutionary path. Some of the new features added to MariaDB include a broader range of storage engines (such as the NoSQL Cassandra engine), a thread-pooling system akin to the one found only in the enterprise (for-pay) edition of MySQL, and security/compliance features also not available in the open source version of MySQL.

-- Serdar Yegulalp

PostgreSQL

PostgreSQL has been around for many years, but is gaining popularity now due to its stability and rich set of features. The latest stable release, PostgreSQL 9.3, supports JSON documents, allowing Postgres to be used in a "document oriented" way for the first time. PostgreSQL 9.4 improves on JSON with the new type "jsonb," which stores JSON as a data structure, allowing the column to be indexed fully. Another great new feature in PostgreSQL 9.3 is materialized views. Unlike views, materialized views are stored as tables, so they can be indexed and optimized.

-- Travis Wellman

Mule ESB

Mule ESB is still kicking it in application integration after all these years. A strong set of connectors reaches out to system, app, and data endpoints wherever they might live, be it in your network or in the cloud. The Anypoint Studio IDE simplifies the development of back-end logic with visual data mapping and drag-and-drop configuration. A large stock of third-party connectors and APIs helps Mule meet most modern integration challenges. And solid tools for syncing and scheduling, transformation mapping, and accessing modern data assets (including Hadoop, Cassandra, and MongoDB) make up for the somewhat light process management and data modeling capabilities. Although this year saw mostly bug fixes and productivity enhancements, new templates for third-party players opened the door to Salesforce.com and SAP integrations.

-- James R. Borck