When to use a CRDT-based database

Learn why, when, and how to use conflict-free replicated data types to bring strong eventual consistency to your geo-distributed applications

1 2 Page 2
Page 2 of 2
  1. Test cases when network connectivity is on and latency between the nodes is low: Your test cases must have more emphasis on simulating conflicts. Typically, you do this by updating the same data across different nodes many times. Incorporate steps to pause and verify the data across all nodes. Even though the database replicas synchronize continuously, testing the eventual consistency will necessitate pausing your test and checking the data. For validation, you will want to verify two things: That all of the database replicas have the same data, and that whenever a conflict occurred, the conflict resolution happened as per the design.
  2. Test cases for partitioned networks: Here you typically execute the same test cases as earlier, but in two steps. In the first step, you test the application with a partitioned network – i.e., a situation where the databases are unable to synchronize with each other. When the network is split, the database doesn’t merge all the data. Therefore, your test case must assume that you are reading only a local copy of the data. In the second step, you reconnect all the networks to test how the merge occurs. If you are following the same test cases as in the previous section, the eventual final data must be the same as in the previous set of steps.

How to use CRDTs in your applications

In this section we demonstrate how you can use CRDTs in your applications. Here we cover four sample implementations for CRDTs: counters, distributed caching, collaboration using shared session data, and multi-region data ingest.

CRDT use case: Counters (polling, likes, hearts, emoji counts)

Counters have many applications. You may have a geo-distributed application that is collecting the votes, measuring the number of “likes” on an article, or tracking the number of emoji reactions to a message. In this example, the application local to each geographical location connects to the nearest database cluster. It updates the counter and reads the counter with local latencies.

redis crdt figure 6 Redis Labs

Figure 6. Sample illustration of a counter data type.

Data type

PN-Counter

Pseudocode

// 1. Counter shared across all users
void countVote(String pollId){
     // CRDT Command: COUNTER_INCREMENT poll:[pollId]:counter
}
// 2. Read the global count
long getVoteCount(String pollId){
// CRDT Command: COUNTER_GET poll:[pollId]:counter
}

Test cases for connected network

Increment count at one region

  1. Run your app in all regions; increment the counter at one location
  2. Stop the counter
  3. Your app at all regions reflects the latest counter value

Increment count at multiple regions

  1. Run your app in all regions; increment the counter at all locations
  2. Stop the counter; note the individual counts by region
  3. Your app must be consistent across all regions, with all regions showing the same count

Test cases for partitioned network

Increment count at multiple regions

  1. Isolate CRDT database replicas
  2. Run your app in all regions; increment the counter at all locations
  3. Stop the counter; note the individual counts by region
  4. Your app shows only the local increments
  5. Reconnect the networks
  6. All locations show the updated count; your app adjusts to this behavior

CRDT use case: Distributed caching

The caching mechanism for a distributed cache is the same as the one used in local caching: Your application tries to fetch an object from the cache. If the object doesn’t exist, the app retrieves from the primary store and saves it in the cache with an appropriate expiration time. If you store your cached object in a CRDT-based database, the database automatically makes the cache available across all the regions. In this example, the poster image for each movie cached locally gets distributed to all the locations.

redis crdt figure 7 Redis Labs

Figure 7. Step 1: The poster image is stored in the cache locally.

redis crdt figure 8 Redis Labs

Figure 8. Step 2: The CRDT-based database synchronizes the cache across all the regions.

Data type

Register. Verify the conflict resolution semantic implemented by your CRDT-based database. If your database supports time-to-live (TTL) on the objects, ensure your objects are expired across all the replicas.

Pseudocode

// 1. Cache an object as a string
void cacheString(String objectId, String cacheData, int ttl){
     // CRDT command: REGISTER_SET object:[objectId] [cacheData] ex [ttl]
}

// 2. Get a cached item as a string
String getFromCache(String objectId){
     // CRDT command: REGISTER_GET object:[objectId]
}

Test cases for connected network

Run your app in all regions; set a new cache object in all regions. Verify that your application can pull cached objects from the local clusters, even if the objects were cached at other locations.

Test cases for partitioned network

Simulate a network partition, set values, and reconnect. Verify that your application is designed to handle the following scenarios:

  1. Updating the same key results in last writer wins
  2. The keys upon merge get the longest value of TTL

CRDT use case: Collaboration using shared session data

CRDTs were initially developed to support multi-user document editing. Shared sessions are used in gaming, e-commerce, social networking, chat, collaboration, emergency responders, and many more applications. In this example, we demonstrate how you can develop a simple wedding registry application. In this application, all the well-wishers of a newly married couple add their gifts to a shopping cart that is managed as a shared session.

The registry application is a geo-distributed application with each instance connecting to the local database. To begin a session, the owners of the registry invite their friends from across the world. Once the invitees accept the invitation, they all get access to the session object. Later, they shop and add items to the shopping cart.

redis crdt figure 9 Redis Labs

Figure 9. Step 1: The couple invites their international friends to their wedding registry.

redis crdt figure 10 Redis Labs

Figure 10. Step 2: Invitees add their items to the shopping cart.

redis crdt figure 11 Redis Labs

Figure 11. Step 3: The owners of the wedding registry now have all the items that they can check out.

Data types

2P-Set and a PN-counter to hold the items of a shopping cart, plus a 2P-Set to store active sessions.

Pseudocode

void joinSession(String sharedSessionID, sessionID){
     // CRDT command: SET_ADD sharedSession:[sharedSessionId] [sessionID]
}

void addToCart(String sharedSessionId, String productId, int count){
     // CRDT command:
     //          ZSET_ADD sharedSession:[sharedSessionId] productId count  
}

getCartItems(String sharedSessionId){
     // CRDT command:
     //          ZSET_RANGE sharedSession:sessionSessionId 0 -1
}

Test whether your app reflects the CRDT rules:

  1. Weights behave like a counter; adding the same object twice results in two objects in the sorted set
  2. Add wins over remove

Test cases for connected network

Run your app in all regions, add objects to the distributed Sorted Set data structure. Weights behave like a counter; adding the same object twice results in two objects in the Sorted Set. Verify whether your application conforms to these CRDT semantics.

Test cases for partitioned network

Simulate a network partition. Add and remove objects from the local cluster. Reconnect and verify whether your application is designed to handle the CRDT semantics of a Sorted Set.

CRDT use case: Multi-region data ingest

Lists or queues are used in many applications. In this example, we demonstrate how you can implement a distributed job order processing solution. As shown in Figure 12, the job order processing system maintains active jobs in a CRDT-based List data structure. The solution collects jobs at various locations. The distributed application at each location connects to the nearest database replica. This reduces the network latency for write operations, which in turn allows the application to support a high volume of job submissions. The jobs are popped out of the List data structure from one of the clusters. This assures that a job is processed only once.

redis crdt figure 12 Redis Labs

Figure 12. Sample illustration of multi-region data ingest.

Data types

CRDT-based list. The list data structure is used as a FIFO queue.

Pseudocode

pushJob(String jobQueueId, String job){
     // CRDT command: LIST_LEFT_PUSH job:[jobQueueId] [job]    
}

popJob(String jobQueueId){
     // CRDT command: LIST_RIGHT_POP job:[jobQueueId]
}

Test cases for connected network

Run your app in all regions, add objects to the List data structure. CRDT merges the objects added to the list. Verify whether your application conforms to the CRDT semantics of the List data structure.

Caution: If your application pops an object out of the list at one cluster, then your job processor is assured of processing one job. However, if you pop objects out of all the clusters, then two or more locations may end up processing the same job. If your application requires that you process a job only once, then you will need to implement your own locking solution at the application layer.

Test cases for partitioned network

Simulate a network partition. Add jobs to the local cluster. Reconnect and verify whether your application is designed to handle the CRDT semantics of a List.

CRDT-based databases enable you to build highly engaging geo-distributed applications. In this article, we covered four use cases that apply CRDTs: global counters, distributed caching, a shared session store, and multi-region data ingest. By leveraging a CRDT-based database in these and other scenarios, you can focus on your business logic and not worry about data synchronization between the regions. More than anything, CRDT-based databases deliver local latency for engaging applications, yet promise strong eventual consistency even when there is a network breakdown between data centers.

Roshan Kumar is a senior product manager at Redis Labs. He has extensive experience in software development and technology marketing. Roshan has worked at Hewlett-Packard and many successful Silicon Valley startups including ZillionTV, Salorix, Alopa, and ActiveVideo. As an enthusiastic programmer, he designed and developed mindzeal.com, an online platform hosting computer programming courses for young students. Roshan holds a bachelor’s degree in computer science and an MBA from Santa Clara University.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2018 IDG Communications, Inc.

1 2 Page 2
Page 2 of 2