Huwebes, Hunyo 28, 2012

An Introduction to Hashing

Hashing a convenient technique that is used to search for a key in an almost constant time. Hashing basically places the keys of a record to an array H[1...m-1] using a hash function. A hash function must be chosen well because it can affect the performance of the algorithm. A good hash function is something that distributes the keys almost evenly on the table, yet it is easy to compute since excessive computation can slow down the algorithm. The modulo operation is often used here.

    One of the problems of hashing is collisions. Collisions happen when two or more keys are hashed on the same cell. Collisions are inevitable when your m<n where m is the size of the table and n is the number of keys. This follows from the pigeonhole principle. Two ways of preventing collisions are the open hashing and the closed hashing.

    Open hashing makes linked list whenever a collision occur. Basically, especially when the hash function spreads the keys evenly, the hash table in open hashing are a collection of linked list, but this time each set has smaller numbers as opposed to the original collection if one is to make them into a linked list.

    Closed hashing doesn’t use linked lists, instead the keys are stored in the table itself, thus it is necessary for m to be at least the same size of n. The simplest form of closed hashing is the linear probing. The idea is if a collision occurs, the computer would check for the next cell, if it is free then the key is written there other wise it checks the next one and so on. The array in here is a circular one.

    A problem that may occur during linear probing is clustering. Hash tables that are nearly full are the most commonly vulnerable to this. A cluster is a sequence of contiguously occupied cells. This results in the computer spending more time finding for a free cell. One of the solution for this is double hashing, another form of closed hashing, where one use another hash function s(K) to determine a fixed increment for the probing sequence to be used after a collision at location l = h(K).

References
Levitin, Anany. The Design and Analysis of Algorithms. 2nd ed.

Walang komento:

Mag-post ng isang Komento