A data (or database) cache is a high-performance data storage layer that stores a subset of typically transient data so that future requests for that data are served faster than by going to the data's primary storage location. In the world of edge computing, the “primary data” resides on the public cloud, and the edge device acts as an intermediary for that data, sometimes providing decoupled data processing as well.
We already understand the use of edge devices as points of data processing that are closer to the producer of the data. The key advantage here is performance.
If the data does not have to be sent to back-end processing systems, such as those on public clouds, then it can be processed immediately on the edge device. This is helpful when performance is critical, such as shutting down a jet engine that is drastically overheating. You don’t want to wait on a round trip to a centralized cloud system to determine a course of action for that.
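To make the point concrete, here is a minimal Python sketch of that kind of local decision. The threshold, the engine object, and the function names are hypothetical, used only to illustrate keeping the critical action on the device:

```python
# A rough sketch of local decisioning on the edge (all names are hypothetical).

CRITICAL_TEMP_C = 950.0  # assumed shutdown threshold, for illustration only

def on_sensor_reading(temp_c: float, engine) -> None:
    if temp_c >= CRITICAL_TEMP_C:
        engine.shut_down()   # act immediately on the device; no cloud round trip
    # Readings can still be forwarded to the cloud asynchronously for analysis.
```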
Another approach to edge architecture comes from the notion that an edge device can serve as a remote data cache as well. This is a bit different from partitioning: A partition has its own independent database or data store, as well as decoupled processing occurring on that data. A data cache is simply intermediate storage for data that is normally stored centrally, and its single purpose is to provide better performance and reliability.
For example, say you have an edge device that controls a factory robot. It’s connected to a centralized data and processing engine hosted on a public cloud. In this case, the edge device relies on the centralized system for the production and consumption of data, as well as for the processing of that data.
Although the edge device controlling your factory robot does not have an independent database or data store, it does host a data cache. The most frequently accessed data is stored locally, where the edge device can read it with almost no latency.
This is helpful when the network in the factory is less than reliable, and it avoids any requirement for a full-blown database on each edge device for this particular use case.
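A minimal sketch of that idea follows, assuming a simple read-through cache with a time-to-live and a stale-value fallback. The class, parameters, and callable are illustrative, not any specific product’s API:

```python
import time

# A minimal read-through cache sketch (all names here are illustrative).
# The edge device keeps the most-accessed values locally and falls back to
# the last known value if the central store is unreachable.

class EdgeCache:
    def __init__(self, fetch_from_cloud, ttl_seconds=30.0):
        self._fetch = fetch_from_cloud   # callable that reads the centralized store
        self._ttl = ttl_seconds          # how long a local copy counts as fresh
        self._store = {}                 # key -> (value, fetched_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[1] < self._ttl:
            return entry[0]              # fresh local hit: near-zero latency
        try:
            value = self._fetch(key)     # read through to the centralized store
            self._store[key] = (value, time.time())
            return value
        except Exception:
            if entry is not None:
                return entry[0]          # network trouble: serve the stale copy
            raise                        # nothing cached to fall back on
```

A robot controller would call a cache like this in its control loop. On a healthy network it behaves like a plain cache; when the factory network drops, the device keeps working from the last known values until connectivity returns.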
The advantages here are lower operational costs and less edge storage. By not placing a decoupled database on the edge device, you don’t have to maintain that database or worry about synchronization issues with the centralized database. Moreover, the edge devices can be much smaller and cheaper, something to consider if you’re deploying thousands of them.
Security is much easier as well. Because the data is stored centrally, you can focus your security efforts there. This doesn’t mean the caching system can be left unprotected, but it’s much easier to secure than a complete database with more attack vectors.
The key idea here is optimization. Using edge devices differently, such as leveraging data caches on them, makes sense when you can save money and time and reduce risk. It’s not the right architecture every time, but it is another tool to make sure you’re doing your best to serve the business.