What is distributed caching?
Distributed caching is a valuable tactic for improving the speed and performance of your applications. But what is distributed caching exactly, and how does distributed caching work?
What is the definition of distributed caching?
Caching is a basic concept in computing in which frequently accessed data is stored in a nearby location (called the "cache") so that it can be retrieved more quickly. Distributed caching is a caching technique in which the cache is distributed across multiple servers or machines.
Distributed caching has several important benefits, including:
- Scalability: A very large cache on a single machine quickly becomes slow and impractical to maintain. Storing data across multiple locations allows each cache to remain small and lightweight, so that it can be easily searched for the relevant item.
- Availability: Distributing the cache across multiple locations helps improve the cache's availability. Even if one machine goes down, most of the cache remains available on other machines. You can also create backups for each machine so that the entire cache will be continuously accessible.
How does distributed caching work?
The concept of distributed caching is easy enough to understand, but how does distributed caching work in practice?
One important question is how the data will be partitioned between caches in multiple locations. There are several viable cache partitioning strategies:
- One option is range partitioning: data entries are assigned to a cache based on a unique identifier that falls within a certain range. For example, a database of students can be partitioned by the first digit of each student's ID number.
- Another strategy is hash partitioning: a hash function transforms each key into a number, which is then assigned to a cache location using modular arithmetic. For example, suppose that we have three caches and the hash function returns the number 93024921. We calculate 93024921 modulo 3 = 0, which means that we should store this data in the zeroth (i.e. first) cache.
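The hash-based strategy above can be sketched in a few lines of Java. This is a minimal illustration, not a production partitioner; the helper name cacheIndexFor is an assumption for this example.

```java
// A minimal sketch of hash-based cache partitioning: hash the key, then
// use modular arithmetic to pick one of N caches.
public class HashPartitioning {
    static int cacheIndexFor(String key, int numCaches) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), numCaches);
    }

    public static void main(String[] args) {
        // Reproducing the arithmetic from the example above: 93024921 modulo 3 = 0
        System.out.println(Math.floorMod(93024921, 3)); // prints 0
        // Any string key maps to one of the three caches
        System.out.println(cacheIndexFor("student-42", 3));
    }
}
```

Note that this simple modulo scheme reassigns most keys when the number of caches changes; real systems often use consistent hashing to limit that reshuffling.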
Effectively implementing a distributed cache is more challenging than implementing a cache on a single machine. For one, you need to deal with the issue of cache invalidation: updating or deleting information in one machine's local cache also requires you to update the in-memory caches on all other parts of the distributed system. Otherwise, the cache loses coherence, i.e. requests to different parts of the cache return different results.
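To make the invalidation problem concrete, here is a toy in-process sketch: each node holds its own local map, and a write on one node removes the now-stale entry from every peer. The class name InvalidatingCache and the direct peer list are illustrative assumptions; a real distributed cache would broadcast invalidations over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of cache invalidation: a write on one node invalidates the
// stale copy on every peer so that reads stay coherent.
public class InvalidatingCache {
    final Map<String, String> local = new ConcurrentHashMap<>();
    final List<InvalidatingCache> peers = new ArrayList<>();

    void put(String key, String value) {
        local.put(key, value);
        // Broadcast the invalidation; real systems do this over the network
        for (InvalidatingCache peer : peers) {
            peer.local.remove(key);
        }
    }

    String get(String key) {
        return local.get(key);
    }

    public static void main(String[] args) {
        InvalidatingCache a = new InvalidatingCache();
        InvalidatingCache b = new InvalidatingCache();
        a.peers.add(b);
        b.local.put("user:1", "old-name");
        a.put("user:1", "new-name");        // invalidates b's stale copy
        System.out.println(b.get("user:1")); // prints null
    }
}
```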
Second, you need to address the question of latency and timeouts. The different components of a distributed cache need to communicate with each other quickly and efficiently. If the network is slow, you need to determine how long to wait for the machine to respond (i.e. how long before the request times out).
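One common way to bound how long you wait for a remote cache node in Java is CompletableFuture.orTimeout (available since Java 9). The sketch below simulates a slow node with a sleep; the 100 ms budget and the fallback message are illustrative assumptions, not recommendations.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: give a remote cache lookup a fixed time budget and fall back
// when the node is too slow to respond.
public class CacheTimeout {
    public static void main(String[] args) {
        CompletableFuture<String> remoteLookup = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(500); // simulate a slow cache node
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "cached-value";
        });

        try {
            // Wait at most 100 ms for the cache node to answer
            String value = remoteLookup.orTimeout(100, TimeUnit.MILLISECONDS).join();
            System.out.println(value);
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                System.out.println("cache node timed out; fall back to the database");
            }
        }
    }
}
```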
Distributed caching use cases
The use cases of distributed caching include:
- Website and application performance: Unexpected spikes in demand can cause websites and applications to slow down or even crash. Distributed caching helps mitigate or prevent these performance issues.
- Relational database access: Querying a very large relational database can quickly become inefficient, especially when performing complicated operations. Caching frequent queries and commonly accessed data can significantly improve the speed of database access.
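Caching frequent queries is often done with the cache-aside pattern: check the cache first and run the database query only on a miss. Below is a minimal in-process sketch; the class name QueryCache and the fake database function are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Minimal cache-aside sketch: serve repeated queries from an in-memory
// map and hit the (simulated) database only on a cache miss.
public class QueryCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String query(String sql, Function<String, String> database) {
        // computeIfAbsent runs the expensive database call only on a miss
        return cache.computeIfAbsent(sql, database);
    }

    public static void main(String[] args) {
        QueryCache qc = new QueryCache();
        AtomicInteger dbCalls = new AtomicInteger();
        Function<String, String> fakeDb = sql -> {
            dbCalls.incrementAndGet(); // stand-in for a real database round trip
            return "42 rows";
        };
        qc.query("SELECT * FROM students", fakeDb);
        qc.query("SELECT * FROM students", fakeDb);
        System.out.println(dbCalls.get()); // prints 1: the second query was a cache hit
    }
}
```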
Distributed caching in Redis
Redis is an open-source, in-memory data structure store that is used to implement NoSQL key-value databases, caches, and message brokers. Because Redis keeps data in memory, it is significantly faster than database solutions that load data from disk, which makes it a good choice for use cases where high performance is essential, such as distributed caches.
However, as mentioned above, implementing a distributed cache yourself is a significant investment of time, effort, and knowledge. What's more, Redis doesn't ship with native support for popular programming languages such as Java, which can pose a steep learning curve for developers who are new to the Redis platform. For this reason, many Java developers choose to install a third-party Redis Java client such as Redisson.
Redisson includes many familiar Java objects, constructs, and collections for Redis. The simplest way to implement a cache is with Java's Map data structure. Redisson provides the RMap interface, which has two relevant subinterfaces: RMapCache for general-purpose caches, and RLocalCachedMap, which adds support for local caching.
Below is a demonstration of how to use RMapCache in Redisson:
```java
// Obtain a named map-based cache from the Redisson client
RMapCache<String, SomeObject> map = redisson.getMapCache("anyMap");

// Store an entry with a time to live of 10 minutes
// and a maximum idle time of 10 seconds
map.put("key1", new SomeObject(), 10, TimeUnit.MINUTES, 10, TimeUnit.SECONDS);
```
Redisson also includes support for popular caching frameworks such as Spring Cache and JCache.