[This page is auto-generated; please prefer to read the article’s PDF Form.]
Now we will understand how choosing m to be a prime number becomes important in such linear-pattern situations.
Consider m input keys in form b + ai, with i taking values of any m consecutive integers. The hash-values for these keys are:
| (1) |
We know that taking mod m of any m consecutive integers (here, i) will simply give: {0,1,2,…,m − 1}. So, the hash-values taken by these m keys are simply:
|
To know further about these values, we refer to an earlier article titled Multiples of an Integer Modulo Another Integer, specifically its Corollaries 3 and 4, which are:
Corollary 1. For any integers a,b,n with n > 0, a mod n≠0, and g = gcd(a,n), the set of integers A = {(ai+b) mod n : i ∈ ℤn} consists of values b,b + g,b + 2g,…,b + ((n∕g) − 1)g after taking their mod n.
Corollary 2. For any integers a,b,n with n > 0, a and n coprime (i.e. g = gcd(a,n) = 1), the set of integers A = {(ai+b) mod n : i ∈ ℤn} is same as ℤn = {0,1,2,…,n − 1}.
Say gcd(a,m) is g. So the hash-values of these m keys are the below m∕g distinct values, all taken mod m:
|
Now consider the different values of g:
1. g = 1: in this case, the hash-values of the m keys will be all distinct and due to corollary 2, consist of the set: {0,1,2,…,m − 1}. So this will be the most uniform distribution for these m keys. g = 1 means a and m are coprime.
Since the actual input keys of the hashtable may exhibit multiple such patterns with different a and it is not always possible to predict these patterns, knowing a may not be possible. So, to make a and m coprime, the simpler way is to pick m to be a prime. That would ensure that gcd(a,m) = 1 for all a, except when a is a multiple of m.
2. g = m: it means, a is a multiple of m. In this case, the hash-values of all m keys will be the same, b mod m.
Even if m is prime, there is one situation where it will not be effective. That is when a itself is a multiple of m, which makes g = m. So, while we choose a prime m, we should try to avoid a value which divides any possible a of the key patterns.
3. 1 < g < m: note that, for any g, the m input keys will only take m∕g distinct hash-values. That is, if g > 1, they will be distributed over only a portion (1∕g) of the total m slots. Such situation can arise when m is a composite (not prime) and a has a common factor with m, giving gcd(a,m) > 1.
In above analysis, we considered m input keys having form b + ai, with i among any m consecutive integers. These m keys were found to map to m∕g distinct slots, 1 ≤ g ≤ m. What can we say about any set S of keys in the same form (b + ai), but i taking arbitrary integer values? Due to equation (1), we know that h(k) values of these keys will depend upon (i mod m), not i. But, any (i mod m) value comes only from {0,1,2,…,m− 1}. So, the hash-values of these S keys will be nothing but some subset of the hash-values taken by keys with i in {0,1,2,…,m − 1}, which is what we discussed above. That means, the keys of S can only map within the m∕g slots found above, and so we should try to keep g = 1 as discussed.
■
Copyright © 2021 Nitin Verma. All rights reserved.