Dominated by the RDBMS paradigm, databases were a pretty sleepy area of technology for many years. NoSQL changed all that. Although the technology behind most NoSQL databases isn't new, a wild variety of approaches and products has left many customers struggling to catch up (see Andrew Oliver's classic "Which freaking database should I choose" for a lively primer).
One of the key benefits touted by NoSQL is the ability to scale easily compared to relational databases. But how exactly does that work? For the answer, we turn this week to Rahim Yaseen, senior vice president of engineering at Couchbase, vendor of a popular, document-style NoSQL database. He also touches on the complexities of querying using a document database. -- Paul Venezia
Scaling out and querying large datasets with NoSQL
Today, we're at an inflection point where organizations are looking for ways to manage, store, and capitalize on a ballooning influx of data. With ultralarge data sets, organizations must determine the solution that best fits their needs to scale their technology and their business.
One key database decision is whether to scale out or to scale up -- but what does that mean, anyway?
In a scale-out architecture, associated with NoSQL databases, the basic building block is a distributed set of nodes known as a cluster. It provides highly elastic scaling capability, enabling you to add nodes to handle load on the fly. This is the opposite of the scale-up architecture associated with the RDBMS, which adds more resources to a single, larger machine.
A key concept in scaling out is "shared nothing." In an ideal scale-out design, all nodes are peers and no single shared resource serves as a bottleneck. In addition to the nodes being independent, the data must be evenly distributed, or partitioned, across them through a process called sharding. Sharding is an important step and can be done either manually or through an automated system.
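To make "evenly distributed across nodes" concrete, here is a minimal Python sketch of one common way to partition data: hashing each key and taking the result modulo the node count. The article does not say which scheme any particular product uses, so the function name `node_for`, the `user::N` key format, and the four-node cluster are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index by hashing it (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range [0, num_nodes).
    return int.from_bytes(digest[:8], "big") % num_nodes


# Hypothetical workload: 10,000 made-up document keys on a 4-node cluster.
NUM_NODES = 4
counts = Counter(node_for(f"user::{i}", NUM_NODES) for i in range(10_000))

for node in sorted(counts):
    print(f"node {node}: {counts[node]} keys")
```

Because the hash output is effectively uniform, each node should end up with roughly a quarter of the keys, and no node has to consult a shared lookup service to find where a key lives. One known drawback of plain modulo hashing is that changing the node count remaps most keys, which is why production systems generally layer a more stable mapping on top.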
Manual vs. autosharding
To understand the difference between manual and autosharding, consider the registration process at a typical conference. When you walk into the registration area, you may be asked to go to the registration booth that corresponds to the first initial of your last name to check in. For instance, A through D might check in at booth No. 1, E through H at booth No. 2, and so on. This is an example of scaling via manual sharding.
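The registration-booth analogy can be sketched in a few lines of Python. The article names only the first two booths (A through D, E through H); extending the same four-letter pattern across the rest of the alphabet, along with the names `build_booth_map` and `booth_for`, is an assumption made for illustration.

```python
import string


def build_booth_map(letters_per_booth: int = 4) -> dict:
    """Assign each initial to a booth: A-D -> 1, E-H -> 2, and so on.

    The four-letter grouping beyond booth No. 2 is an assumed
    continuation of the article's example.
    """
    return {
        letter: index // letters_per_booth + 1
        for index, letter in enumerate(string.ascii_uppercase)
    }


BOOTH_MAP = build_booth_map()


def booth_for(last_name: str) -> int:
    """Return the booth number for an attendee, keyed on the last-name initial."""
    initial = last_name.strip()[:1].upper()
    if initial not in BOOTH_MAP:
        raise ValueError(f"no booth assigned for last name {last_name!r}")
    return BOOTH_MAP[initial]


for name in ["Adams", "Hopper", "Ibsen", "Zhang"]:
    print(name, "-> booth", booth_for(name))
```

The fixed letter ranges play the role of a manually chosen shard key: someone decided the boundaries up front, and every lookup depends on that static table.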