Hash Tables
Overview
Hash Tables allow you to do very quick lookup (random access) for unordered datasets.
They go by several other names:
Maps/Hash Maps(C/C++)Dictionaries(Python)Associative Arrays(Bash)
Hash Functions
Hash functions are mathematical operations that take strings as input, and provide numbers as output.
- Insert a unique
stringand get a uniquenumber - Output is consistently linked to input (the function gives the same output every time)
- Different inputs shouldnt map to the same output (
collision)
A hash table simply uses a hash function to map strings into positions of an array.
Usecases
-
Need fast lookup for unordered (non-sequential) data
-
Preventing duplicate entries
-
Use as a cache (remember recently used data)
-
Model relationships between one object and another
Hash Collisions
We said that hash functions always map different keys to different slots.
In reality, it's really hard to make a hash function where every input gives a different output.
Dealing with Collisions
The simplest way to deal with a collision is to replace the entry at a location with a linked list. Each overlapping entry would be contained in the list.
Doing this too much can severly decrease the performance of the table, since you end up with a linked list, right?
Hash Table Performance
| Average | Worst | |
|---|---|---|
| Search | O(1) |
O(n) |
| Insert | O(1) |
O(n) |
| Delete | O(1) |
O(n) |
To improve performance, we need to limit collisions:
- Keep a low
Load Factor - Use a good
Hash Function(it should spread items around very evenly)
Load Factor
Load factor: (number_of_filled_slots / total_number_of_slots)
It's generally a good idea to keep the load factor under 0.7
When you have to increase the size of the array, you should double the size.
This will ammortize the resizing complexity