Extendible hashing notes

 

The main elements of the structure are a directory, which points to buckets that contain the keys and data. A bucket is a structure containing an array of records, a count of the number of records actually occupied, and a bucket depth, which indicates the number of significant bits used to address into this bucket.

 

The directory has size a power of 2. The depth of the directory is the number of bits being used currently to address into the directory. That is, 2directory_depth is the size of the directory.

 

The hash function is used as follows: directory_depth many bits are selected from hash(key), and these are used to index into the directory. Often these are the low order bits of hash(key), but with a good hash function, one could use the high order bits.

 

In the example shown below, the directory depth is 3. The depth of Bucket #0 is 1, the depth of Bucket #1 is 2, and the depth of Buckets #2 and #3 is 3. The search for a key proceeds as follows. If we find that Key #2 hashes to the bit string 101..., we index to directory entry 101, and follow the pointer to Bucket #1. The number of bits we use of the hash value is the directory depth.

 

 

If we insert Key #10, which hashes to 010..., we wind up in Bucket #0, which is full. Since the bucket depth is less than the directory depth, we do not need to add directory entries, but we do split the bucket. We create a new Bucket #4. Now Buckets #0 and #4 will have bucket depth 2. All keys that hash to 00... must be placed in Bucket #0, and all keys that hash to 01... must be placed in Bucket #4. The directory entries must be revised accordingly.

 

Thus, assume that Key #1 hashes to 001..., Key #4 hashes to 011..., Key #8 to 000..., and Key #10 to 011.... We move the keys that hash to 01... to Bucket #4. The changes are indicated in red below.

 

Now consider the following example, in which the directory depth is 2. If Key #6 hashes to 11.., then we must split Bucket #4.

Since the bucket depth equals the directory depth, we will have to double the directory. There will be a new Bucket #3. Keys that hash to 110... will be placed in Bucket #2, and keys that hash to 111... will be placed in Bucket #3. The other directory entries must be duplicated. For example, keys that hash to either 100... or 101... will still be in Bucket #1. The picture that follows shows the result.