Why Redis beats Memcached for caching

Memcached is sometimes more efficient, but Redis is almost always the better choice

Memcached or Redis? It's a question that nearly always arises in any discussion about squeezing more performance out of a modern, database-driven Web application. When performance needs to be improved, caching is often the first step taken, and Memcached or Redis are typically the first places to turn.

These renowned cache engines share a number of similarities, but they also have important differences. Redis, the newer and more versatile of the two, is almost always the superior choice.

The similarities

Let's start with the similarities. Both Memcached and Redis serve as in-memory, key-value data stores, although Redis is more accurately described as a data structure store. Both Memcached and Redis belong to the NoSQL family of data management solutions, and both are based on a key-value data model. They both keep all data in RAM, which of course makes them supremely useful as a caching layer. In terms of performance, the two data stores are also remarkably similar, exhibiting almost identical characteristics (and metrics) with respect to throughput and latency.

Both Memcached and Redis are mature and hugely popular open source projects. Memcached was originally developed by Brad Fitzpatrick in 2003 for the LiveJournal website. Since then, Memcached has been rewritten in C (the original implementation was in Perl) and put in the public domain, where it has become a cornerstone of modern Web applications. Current development of Memcached is focused on stability and optimizations rather than adding new features.

Redis was created by Salvatore Sanfilippo in 2009, and Sanfilippo remains the lead developer of the project today. Redis is sometimes described as "Memcached on steroids," which is hardly surprising considering that parts of Redis were built in response to lessons learned from using Memcached. Redis has more features than Memcached and is, thus, more powerful and flexible.

Used by many companies and in countless mission-critical production environments, both Memcached and Redis are supported by client libraries in every conceivable programming language, and it's included in a multitude of packages for developers. In fact, it's a rare Web stack that does not include built-in support for either Memcached or Redis.

Why are Memcached and Redis so popular? Not only are they extremely effective, they're also relatively simple. Getting started with either Memcached or Redis is considered easy work for a developer. It takes only a few minutes to set up and get them working with an application. Thus, a small investment of time and effort can have an immediate, dramatic impact on performance -- usually by orders of magnitude. A simple solution with a huge benefit; that's as close to magic as you can get.

When to use Memcached

Because Redis is newer and has more features than Memcached, Redis is almost always the better choice. However, Memcached could be preferable when caching relatively small and static data, such as HTML code fragments. Memcached's internal memory management, while not as sophisticated as that of Redis, is more efficient in the simplest use cases because it consumes comparatively less memory resources for metadata. Strings (the only data type supported by Memcached) are ideal for storing data that's only read, because strings require no further processing.

That said, Memcached's memory management efficiency diminishes quickly when data size is dynamic, at which point Memcached's memory can become fragmented. Also, large data sets often involve serialized data, which always requires more space to store. While Memcached is effectively limited to storing data in its serialized form, the data structures in Redis can store any aspect of the data natively, thus reducing serialization overhead.

The second scenario in which Memcached has an advantage over Redis is in scaling. Because Memcached is multithreaded, you can easily scale up by giving it more computational resources, but you will lose part or all of the cached data (depending on whether you use consistent hashing). Redis, which is mostly single-threaded, can scale horizontally via clustering without loss of data. Clustering is an effective scaling solution, but it is comparatively more complex to set up and operate.

When to use Redis

You'll almost always want to use Redis because of its data structures. With Redis as a cache, you gain a lot of power (such as the ability to fine-tune cache contents and durability) and greater efficiency overall. Once you use the data structures, the efficiency boost becomes tremendous for specific application scenarios.

Redis' superiority is evident in almost every aspect of cache management. Caches employ a mechanism called data eviction to make room for new data by deleting old data from memory. Memcached's data eviction mechanism employs a Least Recently Used algorithm and somewhat arbitrarily evicts data that's similar in size to the new data.

Redis, by contrast, allows for fine-grained control over eviction, letting you choose from six different eviction policies. Redis also employs more sophisticated approaches to memory management and eviction candidate selection. Redis supports both lazy and active eviction, where data is evicted only when more space is needed or proactively. Memcached, on the other hand, provides lazy eviction only.

Redis gives you much greater flexibility regarding the objects you can cache. While Memcached limits key names to 250 bytes and works with plain strings only, Redis allows key names and values to be as large as 512MB each, and they are binary safe. Plus, Redis has five primary data structures to choose from, opening up a world of possibilities to the application developer through intelligent caching and manipulation of cached data.

Beyond caching

Using Redis data structures can simplify and optimize several tasks -- not only while caching, but even when you want the data to be persistent and always available. For example, instead of storing objects as serialized strings, developers can use a Redis Hash to store an object's fields and values, and manage them using a single key. Redis Hash saves developers the need to fetch the entire string, deserialize it, update a value, reserialize the object, and replace the entire string in the cache with its new value for every trivial update -- that means lower resource consumption and increased performance.

Other data structures offered by Redis (such as lists, sets, sorted sets, hyperloglogs, bitmaps, and geospatial indexes) can be used to implement even more complex scenarios. Sorted sets for time-series data ingestion and analysis is another example of a Redis data structure that offers enormously reduced complexity and lower bandwidth consumption.

Another important advantage of Redis is that the data it stores isn't opaque, so the server can manipulate it directly. A considerable share of the 180-plus commands available in Redis are devoted to data processing operations and embedding logic in the data store itself via server-side Lua scripting. These built-in commands and user scripts give you the flexibility of handling data processing tasks directly in Redis without having to ship data across the network to another system for processing.

Redis offers optional and tunable data persistence designed to bootstrap the cache after a planned shutdown or an unplanned failure. While we tend to regard the data in caches as volatile and transient, persisting data to disk can be quite valuable in caching scenarios. Having the cache's data available for loading immediately after restart allows for much shorter cache warm-up and removes the load involved in repopulating and recalculating cache contents from the primary data store.

Data replication too

Redis can also replicate the data that it manages. Replication can be used for implementing a highly available cache setup that can withstand failures and provide uninterrupted service to the application. A cache failure falls only slightly short of application failure in terms of the impact on user experience and application performance, so having a proven solution that guarantees the cache's contents and service availability is a major advantage in most cases.

Last but not least, in terms of operational visibility, Redis provides a slew of metrics and a wealth of introspective commands with which to monitor and track usage and abnormal behavior. Real-time statistics about every aspect of the database, the display of all commands being executed, the listing and managing of client connections -- Redis has all that and more.

When developers realize the effectiveness of Redis’ persistence and in-memory replication capabilities, they often use it as a first-responder database, usually to analyze and process high-velocity data and provide responses to the user while a secondary (often slower) database maintains a historical record of what happened. When used in this manner, Redis can also be ideal for analytics use cases.

Redis for analytics

Three analytics scenarios come immediately to mind. In the first scenario, when using something like Apache Spark to iteratively process large data sets, you can use Redis as a serving layer for data previously calculated by Spark. In the second scenario, using Redis as your shared, in-memory, distributed data store can accelerate Spark processing speeds by a factor of 45 to 100. Finally, an all too common scenario is one in which reports and analytics need to be customizable by the user, but retrieving data from inherently batch data stores (like Hadoop or an RDBMS) takes too long. In this case, an in-memory data structure store such as Redis is the only practical way of getting submillisecond paging and response times.

When using extremely large operational data sets or analytics workloads, running everything in-memory might not be cost effective. To achieve submillisecond performance at lower cost, Redis Labs created a version of Redis that runs on a combination of RAM and flash, with the option to configure RAM-to-flash ratios. While this opens up several new avenues to accelerate workload processing, it also gives developers the option to simply run their “cache on flash.”

Open source software continues to provide some of the best technologies available today. When it comes to boosting application performance through caching, Redis and Memcached are the most established and production-proven candidates. However, given its richer functionality, more advanced design, many potential uses, and greater cost efficiency at scale, Redis should be your first choice in nearly every case.

Itamar Haber (@itamarhaber) is chief developer advocate at Redis Labs, which offers Memcached and Redis as fully managed cloud services for developers. His varied experience includes software product development and management and leadership roles at Xeround, Etagon, Amicada, and M.N.S Ltd. Itamar holds a Master of Business Administration from the joint Kellogg-Recanati program by Northwestern and Tel-Aviv Universities, as well as a Bachelor of Science in Computer Science.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

To comment on this article and other InfoWorld content, visit InfoWorld's LinkedIn page, Facebook page and Twitter stream.
From CIO: 8 Free Online Courses to Grow Your Tech Skills
Notice to our Readers
We're now using social media to take your comments and feedback. Learn more about this here.