What is a Data Grid?

With more and more data at their fingertips, and more intensive data processing jobs to perform, organizations need to find technical solutions that can keep up with this complexity. Using a data grid can dramatically improve the efficiency and performance of highly demanding data processing jobs. But what is a data grid exactly, and how do data grids work?

What are data grids?

A data grid is a distributed computer system in which individual machines communicate and coordinate in order to process very large jobs more quickly and efficiently. The members of a data grid may be located in remote geographical locations, and may be clustered together at one or more sites.

Data grids are used when it's not viable to store large amounts of data on a single server and perform the data processing in that location. By enforcing common naming, authentication, and data management policies, data grids offer more structure and consistency, helping to coordinate the actions of every computer in the grid.

"Grid computing" is a term that refers to the use of data grids. Distributed computing and grid computing are similar but distinct concepts. In distributed computing, the computers in the system act together as a unified whole under the control of a centralized resource manager. In grid computing, by contrast, individual computers have more autonomy and run their own resource manager software. In practice, the two terms are sometimes used interchangeably.

How do data grids work?

The nodes in a data grid each run software that acts as a resource manager, directing them to perform various activities and computations. These activities are usually subtasks that, when combined, form part of a larger, more complicated task. In this way, the data grid takes advantage of distributed and parallel processing to dramatically speed up highly complex data processing jobs.
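The idea of splitting one large job into subtasks and combining the partial results can be sketched in plain Java. This is only an illustration under stated assumptions: local threads stand in for grid nodes, and the class name SubtaskSketch and the method sumRange are hypothetical names invented for this example, not part of any data grid product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: local threads stand in for the nodes of a grid.
final class SubtaskSketch {

    // Splits the range [1, n] into at most `nodes` chunks, sums each chunk
    // as a separate subtask, then combines the partial results.
    static long sumRange(long n, int nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            List<Callable<Long>> subtasks = new ArrayList<>();
            long chunk = (n + nodes - 1) / nodes; // ceiling division
            for (long start = 1; start <= n; start += chunk) {
                final long from = start;
                final long to = Math.min(n, start + chunk - 1);
                subtasks.add(() -> {
                    long partial = 0;
                    for (long i = from; i <= to; i++) {
                        partial += i;
                    }
                    return partial;
                });
            }
            // invokeAll blocks until every subtask has finished.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(subtasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subtasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a subtask failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumRange(1_000_000, 4)); // prints 500000500000
    }
}
```

In a real grid the subtasks would run on separate machines rather than threads, so the cost of moving data between nodes would also affect how the work is best divided.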

Data grids that operate in random access memory (RAM) are referred to as in-memory data grids (IMDGs). The separate computers in an IMDG pool their RAM together to enable very high-speed data processing. Unlike standard data grids, IMDGs typically consist of nodes that are located in the same data center, since network latency can substantially slow the system down.

With an IMDG, each computer in the grid maintains its own view of the data and data structures in its RAM. However, this view is also accessible to the other computers in the grid, so that the entire system enjoys complete visibility into the current contents of memory.
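A rough way to picture this arrangement is a set of per-node maps behind a single lookup interface. The sketch below is a toy model, not how any particular IMDG is implemented: each ConcurrentHashMap stands in for one node's RAM, routing keys by hash is an assumption made for illustration, and the names PartitionedGridSketch, ownerOf, and sizeOfNode are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each map stands in for the RAM of one grid node.
final class PartitionedGridSketch {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedGridSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ConcurrentHashMap<>());
        }
    }

    // Picks the node that stores a key (assumed hash-based routing);
    // floorMod keeps the index non-negative for negative hash codes.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    // Any caller can read any key, regardless of which node holds it.
    String get(String key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    // Number of entries held by a single node.
    int sizeOfNode(int index) {
        return nodes.get(index).size();
    }

    public static void main(String[] args) {
        PartitionedGridSketch grid = new PartitionedGridSketch(3);
        grid.put("user:1", "Alice");
        grid.put("user:2", "Bob");
        System.out.println(grid.get("user:1")); // prints Alice
        System.out.println("stored on node " + grid.ownerOf("user:1"));
    }
}
```

Each entry lives on exactly one node here, yet every lookup goes through the same get method, which is the sense in which the whole grid can see the contents of memory.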

Note that IMDGs are distinct from in-memory databases (IMDBs). IMDGs are best for high-activity use cases, where data may be accessed or modified at any moment. On the other hand, IMDBs are better for use cases where most data is at rest in storage, and only a small subset of data will be queried and/or processed at any given time.

Data grids in Redis

Redis is an open-source, in-memory data structure store used to implement NoSQL key-value databases, caches, and message brokers. Although Redis can support in-memory databases, it does not include data grid functionality out of the box. Nor can applications written in popular programming languages such as Java use Redis directly: they need a client library to connect to it.

To address this issue, many Java developers use libraries such as Redisson, a third-party Redis Java client with features of an in-memory data grid. Redisson includes implementations of many familiar Java objects, collections, and constructs for Redis. In particular, Redisson allows users to run Java task processing in parallel, using Redis-based distributed implementations of ExecutorService and ScheduledExecutorService.

You can build a data grid in Redis using RExecutorService, which is Redisson's implementation of the ExecutorService interface. RExecutorService is capable of executing Callables, Runnables, and lambda expressions. Below is a simple example of how to obtain an RExecutorService in Redisson and register workers for it:

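// Assumes a Redisson version that provides org.redisson.api.WorkerOptions; one worker is an illustrative choice
WorkerOptions options = WorkerOptions.defaults().workers(1);
RExecutorService executor = redisson.getExecutorService("myExecutor");
executor.registerWorkers(options);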

The options argument to the registerWorkers() method defines the configuration for this RExecutorService object. The possible options include the number of workers, the length of time until task timeout, and any listener methods that are invoked once the task has completed.

Redisson also supports scheduled task execution with the RScheduledExecutorService. Tasks submitted to this service are scheduled to run in the future, either once or repeatedly. Instantiating an RScheduledExecutorService object is almost identical to the above example:

RScheduledExecutorService executor = redisson.getExecutorService("myExecutor");
executor.registerWorkers(options);
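Since RScheduledExecutorService is described above as a distributed implementation of the standard ScheduledExecutorService interface, the scheduling calls themselves can be illustrated with the JDK alone. The sketch below runs on a single local thread rather than on Redis, so it shows only the "once" and "repeatedly" semantics, not distributed execution; the class name SchedulingSketch and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch using the local JDK scheduler, not Redisson.
final class SchedulingSketch {

    // Schedules a one-shot task after a delay and waits for its result.
    static String runOnce(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<String> future =
                    scheduler.schedule(() -> "done", delayMillis, TimeUnit.MILLISECONDS);
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scheduled task failed", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    // Schedules a repeating task and returns once it has run `times` times.
    static int runRepeatedly(int times, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch remaining = new CountDownLatch(times);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                // Ignore any extra run that fires before shutdown takes effect.
                if (remaining.getCount() > 0) {
                    runs.incrementAndGet();
                    remaining.countDown();
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            remaining.await();
            return runs.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(50));          // prints done
        System.out.println(runRepeatedly(3, 20)); // prints 3
    }
}
```

With Redisson, the same schedule and scheduleAtFixedRate calls would be made on the object returned by getExecutorService, and the tasks would need to be serializable so that registered workers can pick them up.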
