Lec3 BigO Search Hash
Lecture 3: Big-O, Searching and Hashing.
Objectives:
To introduce Big-O notation concepts.
To introduce searching algorithms.
To introduce hashing algorithms.
To ensure that students can apply the concepts in a program.
To ensure that students deepen their expertise in these topics.
Big-O Notation
With the speed of computers today, we are not
concerned with an exact measurement of an algorithm's
efficiency, only with its general order of magnitude.
    location = indeks;
    if (target == list[indeks])
        found = true;
    else
        found = false;
    return found;
}
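The lines above look like the tail of a sequential search function whose beginning is missing. A complete version might look like the sketch below. The function name `seqSearch`, the parameter order, and the loop are assumptions; only the last three statements mirror the fragment.

```cpp
// Sequential search over list[0..last], where `last` is the index of the
// final element (the list is assumed to be non-empty).
// On return, `location` holds the matching index, or `last` when the
// target is not in the list.
bool seqSearch(const int list[], int last, int target, int& location) {
    int indeks = 0;
    // Advance until the target is found or the last element is reached.
    while (indeks < last && target != list[indeks]) {
        ++indeks;
    }
    location = indeks;
    // Same final test as the fragment: compare the element we stopped on.
    bool found = (target == list[indeks]);
    return found;
}
```

In the worst case the loop visits every element, which is why sequential search is O(n).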
Sequential Searching
Other variants of the sequential search algorithm
include:
Sentinel searching
Probability searching
2. Set index = 0;
3. Loop (target data not equal to array[index])
       index++;
4. If index <= last_index    // target found
       found = true;
       location = index;
5. Else
       found = false;
       location = last_index;
6. Return found;
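The loop in the steps above has no bounds check, so it only terminates if a copy of the target sits just past the last real element. A minimal sketch, assuming the missing first step plants that sentinel at array[last_index + 1] and that the array has a spare slot for it (the name `sentinelSearch` is hypothetical):

```cpp
#include <vector>

// Sentinel search. Assumption: `array` has at least last_index + 2 slots,
// so array[last_index + 1] is free to hold the sentinel.
bool sentinelSearch(std::vector<int>& array, int last_index, int target,
                    int& location) {
    array[last_index + 1] = target;   // assumed first step: plant the sentinel
    int index = 0;
    while (target != array[index]) {  // no bounds check needed
        ++index;
    }
    if (index <= last_index) {        // stopped on a real element
        location = index;
        return true;
    }
    location = last_index;            // stopped on the sentinel
    return false;
}
```

The gain over the basic sequential search is one comparison per iteration instead of two.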
Probability Searching
The array is ordered with the most probable search
elements at the beginning and the least probable
at the end.
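One common way to keep such an ordering up to date is to swap each found element with its predecessor, so that frequently searched targets drift toward the front. The sketch below assumes that swap rule; the text above only states the ordering, and the name `probabilitySearch` is hypothetical.

```cpp
#include <utility>

// Probability search over list[0..last] (non-empty list assumed).
// A found element moves one position toward the front.
bool probabilitySearch(int list[], int last, int target, int& location) {
    int index = 0;
    while (index < last && target != list[index]) {
        ++index;
    }
    if (target != list[index]) {      // target is not in the list
        location = last;
        return false;
    }
    if (index > 0) {                  // promote the element by one slot
        std::swap(list[index], list[index - 1]);
        --index;
    }
    location = index;
    return true;
}
```

Searching for the same value repeatedly moves it one slot forward each time until it reaches index 0.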
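List size    Binary search    Sequential search    Sequential search
(n)          (iterations)     (average, n/2)       (worst case, n)
16           4                8                    16
50           6                25                   50
256          8                128                  256
1,000        10               500                  1,000
10,000       14               5,000                10,000
100,000      17               50,000               100,000
1,000,000    20               500,000              1,000,000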
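The figures above appear to follow three simple formulas for a list of size n: about ceil(log2 n) iterations for a binary search, n/2 comparisons on average for a sequential search, and n comparisons in its worst case. A small sketch that computes them (the function names are hypothetical):

```cpp
// Smallest k such that 2^k >= n, i.e. ceil(log2 n) for n >= 1.
int binaryIterations(long n) {
    int k = 0;
    long reach = 1;
    while (reach < n) {
        reach *= 2;
        ++k;
    }
    return k;
}

// Average and worst-case comparison counts for a sequential search.
long sequentialAverage(long n) { return n / 2; }
long sequentialWorst(long n) { return n; }
```

Multiplying the list size by 1,000 adds only about 10 iterations to the binary search, while the sequential counts grow by the full factor of 1,000.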
Hashing objectives:
To know and understand the concept of hashing.
To use the hash algorithm.
Introduction
The aim of searching with the hash technique is to
retrieve data in only one attempt.
A hash search passes the key to a hash function, and
the result is used to locate the data.
Hashing is the process of transforming a key into an
address, so that each key maps to an address in a list.
Data are stored in an array. The hash algorithm
transforms the key into an index, i.e. the address at
which the data is kept in the list.
The population of possible keys for a hash list is
generally greater than the size of the list that stores the data.
Hashing is the process of chopping up the key and mixing it up in various ways
in order to obtain an index which will be uniformly distributed over the
range of indices -- hence the 'hashing'.
Keywords
Synonyms – a set of keys that hash to the same
location.
Collision
Occurs when data is inserted into a list that already contains
a synonym of the new key.
The hashing algorithm produces an address for an insertion key
that is already occupied.
Home Address – the address produced by the hashing
algorithm.
Prime Area – the memory that contains the home addresses.
Probe – the process of calculating an address and
testing for success.
Example of a Hashing Process
Suppose there is an array of size 50 that stores
student information keyed by the last 4 digits of
their matric number.
The last 4 digits give 10,000 possible keys, so on average
200 keys map to each element in the list (10000/50).
As there are many keys for each index location
in the array, more than one student key may be
hashed to the same array location (synonyms).
This situation may cause a collision.
A collision is handled by moving the key
and its data to another available address.
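A minimal sketch of this example, assuming modulo division as the hash method (the example does not name one) and using made-up matric numbers:

```cpp
#include <string>

const int LIST_SIZE = 50;  // array size from the example

// Take the last four digits of the matric number as the key and map the
// key to an index in 0..49. Assumes the string has at least four
// characters and that the last four are digits.
int hashMatric(const std::string& matric) {
    int key = std::stoi(matric.substr(matric.size() - 4));
    return key % LIST_SIZE;
}
```

With this function the hypothetical matric numbers "A160123" and "B175173" have keys 123 and 5173, and both map to index 23, so they are synonyms and the second insertion would collide with the first.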
Diagram: Hashing Process
Diagram: Collision resolution
Keys are inserted in the order 1. hash(A), 2. hash(B), 3. hash(C).
B and A collide at 8; C and B collide at 16.
Final locations: C at [4], A at [8], B at [16].
Hashing Methods
Direct, Subtraction, Modulo Division, Digit Extraction,
Midsquare, Folding, Rotation, Pseudorandom Generation
Hashing Methods (cont.)
Direct Method
The key is the data address. No algorithmic manipulation is needed.
Subtraction Method
The key is transformed by subtracting a constant from it.
Modulo Division Method
Divides the key by the list size (preferably a prime number) and
adds 1 to the remainder to get the address.
Digit Extraction Method
Selected digits are extracted from the key and used as the address.
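The four methods above can be sketched as one-line functions. The function names, the six-digit key, and the choice of which digits to extract are assumptions made for illustration.

```cpp
// Direct: the key itself is the address.
int directHash(int key) { return key; }

// Subtraction: offset the key by a constant, e.g. keys 1001, 1002, ...
// become addresses 1, 2, ... with offset 1000.
int subtractionHash(int key, int offset) { return key - offset; }

// Modulo division: remainder of key / listSize, plus 1, giving an
// address in 1..listSize.
int moduloHash(int key, int listSize) { return key % listSize + 1; }

// Digit extraction: keep the 1st, 3rd and 4th digits of a six-digit key.
// Which digits to keep is an arbitrary choice for this sketch.
int digitExtractionHash(int key) {
    int d1 = key / 100000 % 10;
    int d3 = key / 1000 % 10;
    int d4 = key / 100 % 10;
    return d1 * 100 + d3 * 10 + d4;
}
```

For example, with a list size of 307 the key 121267 leaves remainder 2, so the modulo division address is 3, and digit extraction turns 379452 into 394.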
Hashing Methods (cont.)
Midsquare Method
The key is squared and the address is selected from the middle of the
squared number.
Folding Method
The key value is divided into three parts whose size matches the
size of the required address. Then the left and right parts are
added to the center part to get the address value.
Rotation Method
Rotates the last character to the front of the key to get the address.
Often used in combination with other hashing methods.
Pseudorandom Method
The key is used as the seed for a pseudorandom number generator. The
resulting random number is scaled into the possible address range.
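A sketch of these four methods under stated assumptions: a four-digit key for midsquare, three-digit parts for folding with any carry discarded, and a simple linear formula with illustrative coefficients 17 and 7 standing in for the pseudorandom number generator. None of these specifics come from the lecture, and the function names are hypothetical.

```cpp
#include <string>

// Midsquare: square a four-digit key (up to eight digits) and keep the
// middle four digits.
int midsquareHash(int key) {
    long long squared = static_cast<long long>(key) * key;
    return static_cast<int>((squared / 100) % 10000);
}

// Folding: split the key into three-digit parts, add them, and drop any
// carry beyond three digits.
int foldHash(long long key) {
    long long sum = 0;
    while (key > 0) {
        sum += key % 1000;
        key /= 1000;
    }
    return static_cast<int>(sum % 1000);
}

// Rotation: move the last character of the key to the front.
std::string rotateKey(const std::string& key) {
    if (key.size() < 2) return key;
    return key.back() + key.substr(0, key.size() - 1);
}

// Pseudorandom: y = (a * key + c) modulo listSize, plus 1.
int pseudorandomHash(long long key, int listSize) {
    return static_cast<int>((17 * key + 7) % listSize + 1);
}
```

For example, 9452 squared is 89340304, whose middle digits give 3403; folding 123456789 gives 123 + 456 + 789 = 1368, which is cut to 368; and rotating "600101" gives "160010".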
Hashing Algorithm
This algorithm transforms a key into an address.
/* Pre : key is the key to be hashed
         size is the number of characters in the key
         maxAddr is the upper bound on the address
   Post: addr contains the data address after hashing
*/
1.0 looper = 0
2.0 addr = 0
Hash key:
3.0 loop (looper < size)
    3.1 if (key[looper] not space)
        3.1.1 addr = addr + key[looper]
        3.1.2 rotate addr 12 bits right
    3.2 end if
    3.3 looper = looper + 1
4.0 end loop
Test for negative address:
5.0 if (addr < 0)
    5.1 addr = absolute(addr)
6.0 end if
7.0 addr = addr modulo maxAddr
8.0 end hash
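The pseudocode above could be written in C++ roughly as follows. This sketch assumes a 32-bit unsigned accumulator, which is a choice made here and not part of the pseudocode; with an unsigned value the negative-address test in steps 5.0 to 6.0 is no longer needed. The names `rotateRight12` and `hashKey` are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Rotate a 32-bit value right by 12 bits (step 3.1.2).
std::uint32_t rotateRight12(std::uint32_t value) {
    return (value >> 12) | (value << 20);
}

// Add each non-space character to the address, rotate after each addition,
// then reduce the result to the range 0..maxAddr-1 (step 7.0).
int hashKey(const std::string& key, int maxAddr) {
    std::uint32_t addr = 0;
    for (char c : key) {
        if (c != ' ') {
            addr += static_cast<unsigned char>(c);
            addr = rotateRight12(addr);
        }
    }
    return static_cast<int>(addr % static_cast<std::uint32_t>(maxAddr));
}
```

Because spaces are skipped, "AB C" and "ABC" hash to the same address.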
Collision Resolution
The direct and subtraction methods guarantee
no collisions.
All other methods can cause collisions.
Any collision resolution method can be combined
with any hashing method.
There are 3 collision resolution methods:
Open Addressing
Linked Lists
Bucket Hashing
Collision Resolution (cont.)
Open Addressing
4 methods: linear probe, quadratic probe, pseudorandom rehashing
and key-offset rehashing.
Linear probe stores the data in the next open address.
Quadratic probe uses the square of the collision probe number
as the increment to the next address.
Pseudorandom rehashing uses a pseudorandom number generator
to rehash the address.
Key-offset rehashing uses an offset computed from the key to rehash
the address.
Linked List
Uses a separate area to store collisions, and chains synonyms
together in a linked list.
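A minimal sketch of two of these approaches: open addressing with a linear probe, and linked-list resolution. It assumes non-negative integer keys, modulo division for the home address, and -1 as the marker for an empty slot. The chained version is simplified: every key goes into the chain for its home address, instead of the first key staying in the prime area. All names are hypothetical.

```cpp
#include <list>
#include <vector>

const int EMPTY = -1;  // marks an unused slot; keys are assumed non-negative

// Linear probe: on a collision, try the next address, wrapping around at
// the end of the table. Returns the address used, or -1 if the table is full.
int insertLinearProbe(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;                    // home address
    for (int probe = 0; probe < size; ++probe) {
        int addr = (home + probe) % size;
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}

// Linked list: synonyms are chained together at their home address.
void insertChained(std::vector<std::list<int>>& table, int key) {
    int size = static_cast<int>(table.size());
    table[key % size].push_back(key);
}
```

With a table of size 7, the keys 10, 17 and 24 all have home address 3. The linear probe places them at 3, 4 and 5, while the chained table keeps all three in the list at address 3.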
Collision Resolution (cont.)
Bucket Hashing
Keys are hashed to buckets; each bucket can hold several synonyms,
up to the size of the bucket.
E.g. if the bucket size is 5, collisions are postponed until the bucket is
full, i.e. until a 6th key hashes to the same bucket.
If the array size is 100 and the bucket size is 10, there will be 10
buckets.
This method has a weakness: it does not resolve the
collision problem completely.
Once a bucket is full, the next key that hashes to
the same address causes a collision.
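A minimal sketch of a bucket table, assuming non-negative integer keys and modulo division to pick the bucket. The struct name and its members are hypothetical. An insert that finds its home bucket full simply reports failure, which marks the point where some other resolution method would have to take over.

```cpp
#include <vector>

// Bucket hashing: each address holds up to bucketSize keys.
struct BucketTable {
    int bucketSize;
    std::vector<std::vector<int>> buckets;

    BucketTable(int bucketCount, int size)
        : bucketSize(size), buckets(bucketCount) {}

    // Returns true if the key was stored, false if its home bucket is full.
    bool insert(int key) {
        int count = static_cast<int>(buckets.size());
        std::vector<int>& bucket = buckets[key % count];
        if (static_cast<int>(bucket.size()) >= bucketSize) {
            return false;                 // bucket full: a true collision
        }
        bucket.push_back(key);
        return true;
    }
};
```

With 10 buckets of size 5, the keys 3, 13, 23, 33 and 43 all fit in bucket 3, and the 6th synonym, 53, is rejected.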