If you have multiple nodes in your cluster, Couchbase spreads the documents over all of them. Say you have a three-node Couchbase cluster with one bucket that holds three documents; each node might have one document. The placement is determined by a hashing algorithm that's part of the Couchbase client. Each document can also have one or more replicas stored on other nodes. You specify the number of replicas when creating your bucket. After the bucket is created, you cannot change the number of replicas, so be sure to choose the number you really want.
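To make the idea of client-side hashing concrete, here is a minimal sketch in Java. It is not the Couchbase client's actual code: the class name, the partition count, the modulo-based node lookup, and the "next node" replica placement are all assumptions chosen for illustration. A real client consults a cluster map rather than computing node positions with modulo arithmetic.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative sketch of client-side key hashing; not the real Couchbase client. */
final class KeyDistributionSketch {
    // Hypothetical number of partitions used by this sketch.
    static final int NUM_PARTITIONS = 1024;

    /** Hash a document key to a partition number in [0, NUM_PARTITIONS). */
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_PARTITIONS);
    }

    /** Pick the node that holds the active copy (assumption: simple modulo mapping). */
    static int nodeFor(String key, int nodeCount) {
        return partitionFor(key) % nodeCount;
    }

    /** Pick a different node for one replica (assumption: the next node in order). */
    static int replicaNodeFor(String key, int nodeCount) {
        return (nodeFor(key, nodeCount) + 1) % nodeCount;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user::1", "user::2", "user::3"}) {
            System.out.println(key + " -> active on node " + nodeFor(key, 3)
                    + ", replica on node " + replicaNodeFor(key, 3));
        }
    }
}
```

Because the hash depends only on the key, every client computes the same location for a given document without asking a central coordinator.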
Another consideration for replicas is that the replica data is also stored in memory. With one replica, for example, a document occupies memory on two nodes: the one holding the active copy and the one holding the replica. This is not necessarily a bad thing -- in the event of a failure, the data is available almost instantaneously. The catch is that you have to turn on auto-failover and specify the interval after which a node is considered down; the data won't be available for reads until this auto-failover takes place. The default is 30 seconds, during which time your application has to deal with certain documents from a bucket being unavailable. In our configuration, we had a single testing server, so we had only one node.
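One way an application might cope with the failover window is to retry reads for a bounded time. The following is a hedged sketch, not a Couchbase API: the class, the method name, and the use of a Supplier to stand in for a bucket read are hypothetical, and the attempt count and delay are arbitrary.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Illustrative retry helper for reads that fail until failover completes. */
final class FailoverRetrySketch {
    /**
     * Call the read up to maxAttempts times, sleeping between attempts.
     * Returns the first value found, or empty if every attempt came back empty.
     */
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> read,
                                         int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = read.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated read: empty twice (node not yet failed over), then a document.
        Supplier<Optional<String>> read =
                () -> ++calls[0] < 3 ? Optional.empty() : Optional.of("{\"id\":1}");
        Optional<String> doc = readWithRetry(read, 5, 100L);
        System.out.println(doc.orElse("unavailable") + " after " + calls[0] + " attempts");
    }
}
```

Whether to retry, serve stale data, or fail fast during those seconds is an application-level decision; the sketch shows only the retry option.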
Setting up a multinode Couchbase cluster is fairly easy, and the cluster requires next to no maintenance. It doesn't require anything complicated like Zookeeper or extra configuration nodes. If a node fails, the Couchbase server can be configured to initiate a failover automatically, which means the failed node is removed from the cluster and read-write access remains available on the other nodes.
For performance reasons, Couchbase manages the application's working set in memory up to the amount of memory specified for the bucket. If the amount of data exceeds the amount of memory, the oldest documents are evicted from memory, though they're still on disk. This makes for a speedy system. It also means you don't have to deal with a separate caching layer -- the cache is built in.
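The behavior described above can be sketched with a memory-bounded map in front of a full store. This is a simplified model, not how Couchbase is implemented: the class name is hypothetical, a plain HashMap stands in for disk, the memory limit is counted in documents rather than bytes, and eviction follows insertion order to match the "oldest documents" description.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model of a built-in cache: bounded memory over a complete "disk" store. */
final class WorkingSetSketch {
    // Stand-in for disk: always holds every document.
    private final Map<String, String> disk = new HashMap<>();
    // Stand-in for bucket memory: holds at most maxInMemory documents.
    private final Map<String, String> memory;

    WorkingSetSketch(final int maxInMemory) {
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the oldest in-memory entry once the limit is exceeded.
                return size() > maxInMemory;
            }
        };
    }

    void put(String key, String json) {
        disk.put(key, json);
        memory.put(key, json);
    }

    /** Serve from memory when possible; otherwise load from disk and cache the result. */
    String get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return cached;
        }
        String stored = disk.get(key);
        if (stored != null) {
            memory.put(key, stored);
        }
        return stored;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    public static void main(String[] args) {
        WorkingSetSketch bucket = new WorkingSetSketch(2);
        bucket.put("a", "{\"n\":1}");
        bucket.put("b", "{\"n\":2}");
        bucket.put("c", "{\"n\":3}");
        System.out.println("a in memory? " + bucket.inMemory("a"));
        System.out.println("a still readable: " + bucket.get("a"));
    }
}
```

The application calls only put and get; whether a read was served from memory or disk is invisible to it, which is the sense in which no separate caching layer is needed.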
The data is stored in the bucket as key-value pairs. You can store entire sets of objects this way because of the flexibility of JSON. Everything we stored for the application was stored as JSON. This requires marshaling the application's objects to JSON and unmarshaling the JSON back into objects. We used the Jackson mapper for this; it is widely used and gave us more flexibility for circular references and the like.
The data is also schema-less. What this means is that if you want to add another field to a document, you just add it. You don't have to worry about all the documents that already exist. They are more than happy to exist without the new field, and changes to the schema are painless and quick.
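In application code, a schema-less store mostly means that readers must tolerate missing fields. The sketch below models documents as plain Java maps instead of real JSON; the class name, the "nickname" field, and the "(none)" default are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative handling of documents written before and after a field was added. */
final class SchemalessSketch {
    /** Read a field that older documents may lack, falling back to a default. */
    static String nickname(Map<String, Object> doc) {
        Object value = doc.get("nickname");
        return value instanceof String ? (String) value : "(none)";
    }

    public static void main(String[] args) {
        // An older document, written before the "nickname" field existed.
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("name", "Ada");

        // A newer document that includes the added field.
        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("name", "Grace");
        newDoc.put("nickname", "Amazing Grace");

        System.out.println(nickname(oldDoc));
        System.out.println(nickname(newDoc));
    }
}
```

No migration step touches the existing documents; the default is applied only when a document is read.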
Couchbase has additional features that set it apart from other document databases -- and other NoSQL databases, for that matter. It is a distributed key-value store, the data manager is written in C/C++, and the cluster manager is written in Erlang. By having a large number of built-in mapreduce functions, many simple operations become very easy to implement. This also provides a great reference for writing our own