LRU Cache — A Cache Data Structure
Optimize performance by storing often-used data in memory and evicting the least recently used data
How many movies have you really watched in the past year? Imagine just how many people have gotten through this pandemic with the help of the multitude of streaming services available. While we all binge on movie after movie, show after show, our networks have been congested with this heavy load of data. This congestion leads to connection errors and slow load times, or what an article written by Bell Labs researchers back in 2000 refers to as ‘client start-up latency’².
“Caching (pronounced “cashing”) is the process of storing frequently used data in memory, file system or database which saves processing time and load if it would have to be accessed from original source location every time. In computing, a cache is a high-speed data storage layer which stores a subset of data, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data”¹. To sum it up: caches are meant for fast storage and quick, easy look-up and retrieval of data.
This blog will cover an LRU Cache — a Least Recently Used Cache. Because caches can only be so big, with the size limit determined at inception, an LRU Cache keeps track of the least recently used data. Once the storage limit is reached and a newly retrieved item isn’t already in the cache, the cache must evict the LRU element in order to load the new one. To build our LRU Cache, two data structures are best suited for the job: a Hash Map and a Doubly Linked List.
Let’s look at a visualization of what’s actually happening with the LRU Cache and movie retrieval.
In our example above, each piece of data has a letter as its key and a movie as its value. Our cache only has space to hold 6 movies, even though our example lists 7. We initially build our cache as a data structure with 6 slots in memory. As clients visit our site to stream movies, each movie enters the cache to be held in memory (which in reality usually happens in allotted space in RAM — random access memory). The first six movies watched enter our cache with no problem, and as more and more clients stream them, the movies move around in order within our cache. In our example above, A was watched more recently than E, which was watched more recently than D, and so on. If our cache looks something like this with 6 movies in it, A-E-D-F-C-G, what happens when a client streams B? We want to evict the movie that was streamed least recently, so we can load in B. It’s the only way: we’re sorry to see you leave, G, but you weren’t popular enough for us to remember for now.
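To make that eviction concrete, here is a minimal sketch using Python’s built-in OrderedDict (the stream helper and the movie values are my own illustration, not part of any library):

```python
from collections import OrderedDict

CAPACITY = 6
cache = OrderedDict()  # keys ordered from least to most recently used


def stream(key, movie):
    """Simulate a client streaming a movie; return the cached value."""
    if key in cache:
        cache.move_to_end(key)  # accessed again: now most recently used
        return cache[key]
    if len(cache) >= CAPACITY:
        evicted, _ = cache.popitem(last=False)  # drop the LRU entry
        print(f"evicting {evicted}")
    cache[key] = movie
    return movie


# Fill the cache so the recency order matches the example:
# G is least recently used, A is most recently used.
for key in ["G", "C", "F", "D", "E", "A"]:
    stream(key, f"movie {key}")

stream("B", "movie B")  # cache is full, so this prints "evicting G"
```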
If our data is in the form of a Hash Map, with key/value pairs, our cache needs to be built with the structure of a Doubly Linked List: the least recently streamed movie sits at the tail, more recently streamed movies sit closer to the head, and the movie at the head is THE most recently streamed movie. The Doubly Linked aspect of our list is important so that we can move movies that already exist in our cache back to the head as they are called on to stream, as the sketch below shows.
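Because each node knows both of its neighbors, a movie can be unlinked from the middle of the list and relinked at the head in O(1), without walking the list. Here is a minimal sketch of that idea (the Node class and unlink helper are my own illustration):

```python
class Node:
    """One movie in the cache, linked to its neighbors in both directions."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


def unlink(node):
    """Remove a node from anywhere in the list in O(1).

    This only works because the node already knows both neighbors;
    a singly linked list would have to walk from the head to find prev.
    """
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None
```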
According to Interview Cake: “An LRU cache is an efficient cache data structure that can be used to figure out what we should evict when the cache is full. The goal is to always have the least-recently used item accessible in O(1) time.”³
I want to say that this was NOT an easy thing for me to build. At this stage of my development, I would not have been able to build this on my own, and would like to credit user @junishwu on LeetCode for this beautiful code:
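That original snippet isn’t reproduced here, so what follows is a representative sketch of the same approach: a hash map for O(1) lookup plus a doubly linked list for O(1) reordering and eviction (reusing the Node class from the sketch above, with sentinel head and tail nodes), written against the get/put interface the LeetCode problem specifies:

```python
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                 # key -> Node, for O(1) lookup
        self.head = Node(None, None)  # sentinel on the most-recently-used side
        self.tail = Node(None, None)  # sentinel on the least-recently-used side
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_head(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)       # unlink from its current spot...
        self._add_to_head(node)  # ...and move it to the head: most recently used
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])  # replace the old entry
        elif len(self.map) == self.capacity:
            lru = self.tail.prev         # least recently used node
            self._remove(lru)
            del self.map[lru.key]        # evict it from the map too
        node = Node(key, value)
        self.map[key] = node
        self._add_to_head(node)
```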
In the code above, classes and methods are built for the doubly linked list’s nodes and for an LRUCache. The code keeps the most recently used item at the head of the list and the least recently used at the tail, which is the first to be evicted once the structure reaches capacity and more relevant data needs to be added.
This LRU Cache question can be found on LeetCode as problem 146, “LRU Cache.”
There is also a way to solve this using just a hash map that remembers insertion order (Python’s OrderedDict). Both approaches come down to the same idea: the hash map gives O(1) node access, the doubly linked list gives O(1) reordering and deletion, and the whole structure takes O(n) space for a cache of n items.
Hash Map Code:
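A minimal sketch of that hash-map-only approach, using Python’s OrderedDict and the same LeetCode get/put interface (this is my sketch, not any one library’s code):

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # keys ordered from LRU to MRU

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # accessed: now most recently used
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used key
```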
1. Cache Scope and Object Store in Mule 4 | Apisero | https://www.apisero.com/cache-scope-and-object-store-in-mule-4/
2. Design and Implementation of a Caching System for Streaming Media over the Internet | Ethendranath Bommaiah, Katherine Guo, Markus Hofmann, Sanjoy Paul | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.8680&rep=rep1&type=pdf
3. LRU Cache | Interview Cake | https://www.interviewcake.com/concept/java/lru-cache