[This page is auto-generated; please prefer to read the article’s PDF Form.]
Every hashtable uses a hash-function to map an input key to an integer in {0,1,2,…,m − 1}, where m is the hashtable’s size. For the Division Method of hashing, which simply takes mod m of the input integer key, we find that it is generally recommended to use m which is a prime number. In this article, we discuss one reason behind this recommendation.
The original form of input keys can be any data-type including integer, char, string, struct/object containing multiple members, array of another type etc. But they are generally transformed into a non-negative integer for inputting into the hash-function. We will refer this integer form as key k ≥ 0. Note that during implementation, this key transformation process may be combined with the hash-function computation. But for this discussion, we will say that an integer key k is the input to the hash-function.
For any integer n ≥ 0, we will use ℤn to refer to the set of integers {0,1,2,…,n − 1}.
One of the most desired properties of a hash-function is that it should distribute the input keys among m slots as uniformly as possible. This helps in reducing the number of collisions among keys. But designing such function can be difficult, just because the input keys are not known in advance. In fact, for any choice of the hash-function, the actual input keys may only consist of all possible keys which map to a few slots. But still, we can try our best in selecting the function based on a very rough idea about the set of all possible keys.
There is no single recommendation about hash-functions, which can work best for all sets of keys. And as we will see below, the recommendation to use a prime m for Division Method too need not work the same in all situations.
[Next]
Copyright © 2021 Nitin Verma. All rights reserved.