Musings on Javascript Data Structures: Part 2 — Hash Tables
Data Structures: Hash Tables — from a Javascript perspective.
Hash Table is a data structure that stores data in an associative manner¹. This specific data structure uses hashing to implement these associative arrays (or mapping of {key: value} pairs)². Please note that in other languages, you may see these associative arrays referred to as dictionaries, maps, or hashes. In JavaScript, we interact with associative arrays all of the time working with data: they are quite clearly known as Objects. Maps take objects one step further, not only storing {key: value} pairs, but then also cataloging the original order of insertion of keys.
In this post, I will build a simple Hash function and then implement a HashTable class to construct a new instance of a HashTable. This class will have 3 functions, as well, to insert, search and remove items from the HashTable. This is, in a way, a high level overview showing what exactly happens under the hood of Javascript Objects on the whole. Beiatrix’ Youtube video on Hash Tables (see the link below), was the most straight forward, easy to understand breakdown of this concept: I will say right away, this entire blog is a reference to that vid and what I learned from her on this topic.
Hash Function
First things, first, we will build the Hash function that all {key: value} pairs will be sent through to be hashed. Two things to know about the Hash function: #1. Once you put anything through to be hashed, you can never put that hashed return value through the function to obtain your original value, hence why hashing is so incredible for security uses (reminds of Ruby gem: bcrypt and hashing passwords to get password_digests). Hashing is irreversible. #2. No matter how many times you put the same input into a Hash function, the function will return the same hashed value every single time. Hashing is deterministic.
Let’s go through lines 2–8 in the function above. Line 2: we are setting a variable to an anonymous function, which will accept the parameters: ‘key’ & ‘size’ (*Key will be of type string and Size is of type number). An important aspect of Hash Tables, is that they have placeholders called buckets or slots that hold the {key: value} pairs. We’ll want to predetermine the max capacity of slots. This is where ‘size’ factors in to our Hash function. On line 10, we determined that our hash table would have 5 buckets, but on line 25, we determined 100 buckets to hold our values. On line 3, we need this function to return a hashed key, so we initialize the variable to 0 and in our for…loop we update that value for every character in the string of our key. We want to iterate through each character in the string and break out of the for loop after as soon as we have operated on every character. The operation that we perform on each character is adding (+=) to the hashedKey’s value: key.charCodeAt(i). According to MDN docs: The charCodeAt()
method returns an integer between 0
and 65535
representing the UTF-16 code unit at the given index³. Each character has its own UTF-16 code: side-note, capitalized and lower-case case letters each have different numeric value different from the other, ie: “A” => 65, “a” => 97. When we have updated the value of hashedKey for each character in the string, we will return the hashedKey value %(modulo) size of the bucket. Reminder, modulo(%) in JS is different then /(divided). Modulo returns the remainder of the division operation, ie: 12/5 => 2.4, but 12%5 => 2.
Above, please don’t be confused I was merely showing the difference in the return value of changing the allowed size of the bucket.
Hash Table
The next step, after we have defined our Hash function is to build our Hash Table class. There is a way to build this functionality without constructing a class, and an example of this can be seen in Beau Carnes’ video here⁴.
We want to start off our HashTable class by building our constructor for creating and initializing a new HashTable object. Again, we need to predetermine the number of slots that our associative array will have. On line 12 we determine there will be 25 slots and on line 13, we instantiate a new Array of size 25 buckets. ES6 gave us Maps: The Map
object holds key-value pairs and remembers the original insertion order of the keys. Any value (both objects and primitive values) may be used as either a key or a value ⁵. Side note: If working with another primitive value in our hash above, you must first change that value to a string at the start of your Hash function. Using Maps in our for…lop of lines 15–17, we will ‘populate’² each bucket with a Map in the case of collisions, I’ll talk about collisions below. Now we want to create our insertion, searching and deletion functions directly in this class.
INSERT
We will first want to build our insert function so that after we instantiate a new HashTable, we can insert our {key: value} pairs into the table. Set the index to the return value of hashing our key with the size of our Array of buckets, set the Array of buckets at that index to the value of the {key: value} pair. .set() is a built in function of Map.
SEARCH
Our search function will take the parameter of key. Again, we will set the index to the return value of hashing our key with the size of our Array of buckets. Our return value will be finding the key at that index of the Array of buckets, and returning our value of that key.
REMOVE
The remove function starts out the same as our insert and search functions by setting our index. We then define variable of deleted, which is the value of the key at that index in the Array of buckets. We delete the {key: value} pair, and return the value that was deleted.
Here is the entire Hash Table implemented:
I put this implementation in REPL and gave it a set of {key: value} pairs of vegetables and their colors…cause hey, why not?!?! Hopefully this will show what is exactly going on from instantiating a new HashTable to inserting pairs, to searching for a key, and then removing a key.
COLLISIONS
Two different inputs CAN return the same hash result, and this is called a collision. This can be seen in the image above, where tomatoes and squash both live at index 1 in our Array. In our code, specifically, JavaScripts Map function takes care of these instances for us. It makes sure not to over-write the value of key of tomato, but instead add squash, so that two objects can live simultaneously at index 1. There are other ways of achieving this using other data structures (for more info feel free to look into Separate Chaining), but Map is pretty amazing for this.
TIME COMPLEXITY OF HASH TABLES
Hash Tables, are incredibly fast for adding, finding and deleting objects in our data, as you can see in the functions of insertion, searching and removing (above). All 3 of these functions give us an average Big O(1) for time complexity. Obviously, our space complexity will be O(n) because the more elements in our data, the more space we take up in a linear degree. On the other hand, Hash Tables aren’t the useful for searching when you don’t know the key and especially are not to be used for sorting.
___________________________________________________________________
- Tutorials Point | Samuel Sam | ‘https://www.tutorialspoint.com/Hash-Table-Data-Structure-in-Javascript’
- Hash Tables — Data Structures in JavaScript | beiatrix | ‘https://www.youtube.com/watch?v=QuFPIZj55hU’
- String.prototype.charCodeAt()| ‘https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt’
- Hash Tables — Beau teaches JavaScript | Beau Carnes — Free Code Camp | ‘https://www.youtube.com/watch?v=F95z5Wxd9ks’
- Map | ‘https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map’