
DITP 2113 Struktur Data & Algoritma
Lecture 3: Big-O, Searching and Hashing.
Objective:
 To introduce Big-O Notation concepts.
 To introduce the searching algorithm.
 To introduce the hash algorithm.
 To ensure that students are able to apply the concepts in programs.
 To ensure that students are able to increase their expertise in these topics.
Big-O Notation
With the speed of computers today, we are not concerned with an exact measurement.

Say two algorithms are executed:
 First algorithm: 15 iterations
 Second algorithm: 25 iterations
 Both are so fast that we cannot see the difference.

But if the second algorithm takes 1,500 iterations, then we should consider using the first algorithm.
Big-O Notation
Since an exact measurement is not needed, Big-O notation is used to describe the result only.

Big-O is read as 'on the order of' and is written as O(n) – 'on the order of n'.

Thus, if an algorithm is quadratic, its efficiency is O(n²).
Big-O Notation
Big-O notation can be derived from f(n) using the following steps:
1. Set the coefficient of each term to 1.
2. Keep the largest term in the function and discard the others.
Terms are ranked from lowest to highest as below:
log₂n < n < n log₂n < n² < n³ < … < nᵏ < 2ⁿ < n!


Example 1
To calculate the Big-O notation for
f(n) = n(n + 1)/2
     = ½n² + ½n

Step 1: Remove all coefficients, which gives us:
n² + n

Step 2: Remove the smaller term, which gives us:
n²

So, the Big-O notation is stated as:
O(f(n)) = O(n²)
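As an illustration (not from the original slides), the nested loop below performs exactly n(n + 1)/2 iterations of its innermost statement, so its running time grows as O(n²); the variable names and the value n = 10 are chosen only for this example.

#include <iostream>

// Counts the iterations of a simple nested loop.
// The inner loop runs n, n-1, ..., 1 times, so the total is
// n(n + 1)/2 = ½n² + ½n, which is O(n²).
int main()
{
    int n = 10;
    long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)   // one unit of work per pair (i, j)
            count++;
    std::cout << "n = " << n << ", iterations = " << count << std::endl;   // prints 55
    return 0;
}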
Example 2
To calculate the Big-O notation for
f(n) = aⱼnᵏ + aⱼ₋₁nᵏ⁻¹ + … + a₂n² + a₁n + a₀

Step 1: Remove all coefficients, which gives us:
f(n) = nᵏ + nᵏ⁻¹ + … + n² + n + 1

Step 2: Remove the smaller terms, which gives us:
nᵏ

So, the Big-O notation is stated as:
O(f(n)) = O(nᵏ)
Searching
Topics: sequential searching and binary searching.
Objective:
 Understand the meaning of searching in data structures.
 Understand the searching algorithms.
 Understand the difference between the sequential and binary searching techniques.
 Understand the difference in efficiency of the searching techniques.
Introduction
The searching process is used in most computer operations.
This process is used to access important information for a specific application.
The algorithm used to search depends on the particular structure of the data.
Normally, the data are placed in an array.
There are 2 basic searches for an array:
 Sequential search
 Binary search
Other than an array, data can also be stored as elements of a binary tree, which uses the binary tree search method.
Sequential Searching
Usually used whenever the list of data is not ordered.
Frequently used for small lists or lists that are not searched often (e.g. fewer than 16 data items).
Start searching for the target at the beginning of the list and continue until the target is found.
This approach gives 2 possibilities: either we find the data, or the data is not in the list and we reach the end of the list.
Sequential Searching
a[0]  a[1]  a[2]  a[3]  a[4]
 4    36    21    14    91

For example, if we want to find the value 14, the search starts at index 0, then 1, then 2, before finding 14 in the fourth element (index 3).

If we want to find 66, the search starts at index 0 and goes up to index 4 (end of list); since the value 66 is not in the list, the search ends at the last index in the list, which is index 4.
Sequential Searching Algorithm
1. Set index = 0;
2. Loop (index less than size – 1 and target data not equal to array[index])
       index++;
3. Set location = index;
4. If target data equal to array[index]
       found = true;
5. Else
       found = false;
6. Return found;
E.g. Sequential Searching Program
bool seqSearch (int list[ ], int last, int target, int& location)
{
    int index = 0;
    bool found = false;

    // scan until the last index or until the target is met
    while (index < last && target != list[index])
        index++;

    location = index;            // report where the scan stopped
    if (target == list[index])
        found = true;
    else
        found = false;
    return found;
}
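A short test driver (not part of the slides) showing how seqSearch could be called on the sample array used earlier; the values and indices come from the example above.

#include <iostream>

bool seqSearch (int list[ ], int last, int target, int& location);   // defined above

int main()
{
    int a[5] = {4, 36, 21, 14, 91};
    int location;
    if (seqSearch(a, 4, 14, location))                    // last index is 4
        std::cout << "14 found at index " << location << std::endl;   // index 3
    if (!seqSearch(a, 4, 66, location))
        std::cout << "66 is not in the list" << std::endl;
    return 0;
}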
Sequential Searching
There are 3 other methods for the sequential searching algorithm:
 Sentinel searching
 Probability searching
 Ordered list searching

Sentinel Searching
Put the target data at the end of the array, placed as a sentinel.
This optimizes the loop of the sequential searching algorithm to only one condition:
Loop (target data not equal to array[index])
    index++;
Sentinel Searching Algorithm
1. array[last_index + 1] = target   // as sentinel data
2. Set index = 0;
3. Loop (target data not equal to array[index])
       index++;
4. If index <= last_index           // target found
       found = true;
       location = index;
5. Else
       found = false;
       location = last_index;
6. Return found;
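A minimal C++ sketch of the sentinel search (an illustrative implementation of the algorithm above; it assumes the array has one spare slot after the last element, and the function name sentinelSearch is chosen only for this example).

// Sentinel search: place the target after the last element so the
// loop needs only one condition. Assumes list has room at index last + 1.
bool sentinelSearch (int list[ ], int last, int target, int& location)
{
    list[last + 1] = target;          // sentinel guarantees the loop stops
    int index = 0;
    while (target != list[index])
        index++;

    if (index <= last)                // stopped inside the real data: found
    {
        location = index;
        return true;
    }
    location = last;                  // stopped at the sentinel: not found
    return false;
}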
Probability Searching
The array is ordered with the most probable search elements at the beginning of the array and the least probable at the end.
To keep this ordering correct over time, on each search we exchange the located element with the element immediately before it in the array.
It takes time to get an efficient result.
Probability Searching Algorithm
1. Set index = 0, temp = 0;
2. Loop (index less than size – 1 and target data not equal to array[index])
       index++;
3. If target data is equal to array[index]
       found = true;
       If index > 0
           temp = array[index – 1];
           array[index – 1] = array[index];
           array[index] = temp;
4. Else
       found = false;
5. Location = index;
6. Return found;
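A possible C++ version of the probability search (a sketch based on the pseudocode above; the name probabilitySearch is illustrative, and location is adjusted to the slot where the target ends up after the exchange).

// Probability (self-organizing) search: when the target is found, it is
// exchanged with the element before it, so frequently requested items
// migrate toward the front of the array over time.
bool probabilitySearch (int list[ ], int last, int target, int& location)
{
    int index = 0;
    while (index < last && target != list[index])
        index++;

    bool found = (target == list[index]);
    if (found && index > 0)
    {
        int temp = list[index - 1];   // exchange with the previous element
        list[index - 1] = list[index];
        list[index] = temp;
        index--;                      // the target now lives one slot earlier
    }
    location = index;
    return found;
}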
Ordered List Searching
For a small array with its values in ascending order.
Searching ends if the target value is less than or equal to the current value in the array.
Commonly used for a linked list.
Ordered List Searching Algorithm
1. int index;
2. If target data less than array[last_index]
       1. Set index = 0;
       2. Loop (target data is bigger than array[index])
              index++;
3. Else
       Set index = last_index;
4. If target data = array[index]
       found = true;
5. Else
       found = false;
6. Location = index;
7. Return found;
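An illustrative C++ sketch of the ordered list search following the pseudocode above (the name orderedSearch is chosen for this example).

// Ordered list search: in an ascending array the scan can stop as soon as
// the current element is greater than or equal to the target.
bool orderedSearch (int list[ ], int last, int target, int& location)
{
    int index;
    if (target < list[last])
    {
        index = 0;
        while (target > list[index])   // stops at the first element >= target
            index++;
    }
    else
    {
        index = last;                  // target is >= the largest element
    }
    location = index;
    return (target == list[index]);
}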
Binary Searching
The sequential searching process is very slow, especially when there is a large amount of data in the array.
For an array of 1000 elements, we must do 1000 comparisons in the worst case.
If the array is not sorted, sequential search is the only solution.
But if the array is sorted, we can use a more efficient algorithm called binary search.
Binary Searching
In general, binary searching divides the data in the array into 2 parts (Part 1 and Part 2).
Binary searching starts by testing the data in the element at the middle of the array to determine whether the target is in the first (left) or second (right) half of the list.
If the target data is less than the data in the middle of the array, the search continues in the first half of the list only (the second half is not checked).
If the target data is larger than the data in the middle of the array, the search continues in the second half only.
Binary Searching
To find the middle index, use the formula:
mid_index = (first_index + last_index) / 2
Say the ordered array below contains 6 data items; therefore
middle = (0 + 5) / 2 = 2

a[0]  a[1]  a[2]  a[3]  a[4]  a[5]
 4    14    21    36    91   256

So, the middle value is at index 2, which contains the value 21. (Reminder: the result of the middle formula is truncated by integer division, not rounded.)
Binary Searching
Say the target data is 36.
Since 21 < 36, the next comparison is done in the second half (right).
To test this second half, the start index must be set:
start_index = current_mid_index + 1
Therefore, start_index is now 2 + 1 = 3.
Binary Searching
Next, find the middle of the second half:
mid_index = (3 + 5) / 2 = 4
The data at index 4 is 91; since 36 < 91, the search continues in the first half (left) of this range.
Repeat the process to find the middle index, but for this case last_index needs to be changed:
last_index = current_mid_index – 1
So, last_index is now 4 – 1 = 3.
Binary Searching
Next, find the middle index:
mid_index = (3 + 3) / 2 = 3
Compare the data at index 3 with the target data:
a[3] = 36
Therefore, the search ends and 36 is found at index 3.
E.g: Part of Binary Searching Program
while (start <= end)
{
    middle = (start + end) / 2;
    if (target_data > array[middle])
        start = middle + 1;          // search the right half
    else if (target_data < array[middle])
        end = middle - 1;            // search the left half
    else
        start = end + 1;             // target found: force the loop to end
}
E.g: Binary Searching Program
bool binarySearch ( int list[ ], int end, int target, int&
locn )
{
int first, mid, last;
first = 0;
last = end;

while ( first <= last )


{
mid = (first + last) / 2;
if( target > list [mid] )
first = mid + 1;
else if( target < list [mid] )
last = mid - 1;
else
break;
}
locn = mid;
return ( target == list[mid] ); }
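A short driver (not from the slides) calling binarySearch on the sorted array used in the walkthrough above.

#include <iostream>

bool binarySearch (int list[ ], int end, int target, int& locn);   // defined above

int main()
{
    int a[6] = {4, 14, 21, 36, 91, 256};
    int locn;
    if (binarySearch(a, 5, 36, locn))                     // last index is 5
        std::cout << "36 found at index " << locn << std::endl;   // index 3
    if (!binarySearch(a, 5, 66, locn))
        std::cout << "66 is not in the list" << std::endl;
    return 0;
}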
Analyzing Searching Algorithms
Sequential searching is a linear algorithm. Therefore, the Big-O notation for sequential searching is O(n), since in the worst case every data item has to be compared.
Binary searching keeps dividing the list during the searching process, so it is a logarithmic algorithm. The Big-O notation for binary searching is O(log₂n).
This makes binary search the most efficient searching technique, and it is easy to use.
Case Comparisons

Size        Binary Searching   Sequential Searching (average)   Sequential Searching (worst case)
16          4                  8                                16
50          6                  25                               50
256         8                  128                              256
1,000       10                 500                              1,000
10,000      14                 5,000                            10,000
100,000     17                 50,000                           100,000
1,000,000   20                 500,000                          1,000,000
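The table values follow the usual formulas: roughly ⌈log₂n⌉ comparisons for binary search, about n/2 for an average sequential search, and n in the worst case. The small sketch below (not from the slides) reproduces the table.

#include <cmath>
#include <iostream>

int main()
{
    long sizes[] = {16, 50, 256, 1000, 10000, 100000, 1000000};
    std::cout << "Size\tBinary\tSeq (avg)\tSeq (worst)\n";
    for (long n : sizes)
        std::cout << n << '\t' << (long)std::ceil(std::log2((double)n))
                  << '\t' << n / 2 << '\t' << n << '\n';
    return 0;
}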
Hashing
 Knowing and understanding the concept of hashing.
 Using the hash algorithm.
Introduction
The aim of searching using the hash technique is to retrieve data with only one attempt.
A hash search takes the key as input to a hash function, whose output is used to locate the data.
Hashing is the process of transforming a key into an address, i.e. the key maps to an address in a list.
Data are stored in an array. The hash algorithm is used to transform the key into an index, i.e. the address where the data is kept in the list.
The population of possible keys for a hash list is greater than the size of the list that stores the data.
Hashing is the process of chopping up the key and mixing it up in various ways in order to obtain an index that is uniformly distributed over the range of indices – hence the 'hashing'.
Keywords
Synonyms – a set of keys that hash to the same location.
Collision
 Happens when data is inserted into a list that contains two or more synonyms.
 The hashing algorithm produces an address for an insertion key that is already occupied.
Home Address – the address produced by the hashing algorithm.
Prime Area – the memory containing the home addresses.
Probe – the process of calculating an address and testing for success.
Example of a Hashing Process
Suppose there is an array sized 50 that stores student information keyed by the last 4 digits of their matric number.
Hence, there are 10,000 / 50 = 200 possible keys for each element in the list.
As there are many keys for each index location in the array, more than one student key may be hashed into the same array location (synonyms).
This situation may cause a collision.
The way to handle a collision is by moving the key and its data to another available address.
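A minimal sketch of the idea (assumptions: modulo-division hashing on the last 4 digits of the matric number, a list of size 50, and 0-based array indices; the function name hashMatric and the sample keys are invented for this example).

// Hashes the last 4 digits of a matric number into an array of 50 slots.
// With 10,000 possible keys and 50 addresses, about 200 keys map to each
// address, so different students can collide at the same index.
int hashMatric (int matricLast4Digits)
{
    const int LIST_SIZE = 50;
    return matricLast4Digits % LIST_SIZE;   // address in the range 0..49
}

// Example: hashMatric(2113) == 13 and hashMatric(4563) == 13 -> a collision.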
Diagram: Hashing Process with Collision Resolution
hash(A) places A at address [8]; hash(B) collides with A at [8], so B is moved to [16]; hash(C) collides with B at [16], so C is moved to [4].
Hashing Methods
 Direct
 Subtraction
 Modulo Division
 Digit Extraction
 Midsquare
 Folding
 Rotation
 Pseudorandom Generation
Hashing Methods (con’t)
Direct Method
 The key is the data address. No algorithmic manipulation is needed.
Subtraction Method
 The key transformation is implemented by offsetting the key, i.e. subtracting a constant from it.
Modulo Division Method
 Divides the key by the list size (preferably a prime number), adds 1 to the remainder, and uses the result as the address.
Digit Extraction Method
 Selected digits are extracted from the key and used as the address.
Hashing Methods (con’t)
Midsquare Method
 The key is squared and the address is selected from the middle of the squared number.
Folding Method
 The key value is divided into three parts whose size matches the size of the required address. Then the left and right parts are added to the center part to get the address value.
Rotation Method
 Rotates the last character to the front of the key to get the address. Often used in combination with other hashing methods.
Pseudorandom Method
 The key is used as the seed for a pseudorandom number generator. The resulting random number is scaled into the possible address range.
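Illustrative C++ sketches of a few of these methods (the key values, digit positions, and function names below are assumptions made for the example, not definitions from the slides).

#include <string>

// Modulo division: remainder of the key divided by the (preferably prime) list size.
int moduloHash (long key, int listSize)   { return (int)(key % listSize); }

// Digit extraction: e.g. take the 1st and 3rd digits of a 6-digit key.
int digitExtractHash (long key)
{
    std::string s = std::to_string(key);       // e.g. 379452 -> "379452"
    return (s[0] - '0') * 10 + (s[2] - '0');   // -> 39
}

// Midsquare: square the key and take digits from the middle of the result.
int midsquareHash (long key)              { return (int)(key * key / 1000 % 10000); }

// Folding (fold shift): split a 9-digit key into three parts and add them.
int foldShiftHash (long key)              // e.g. 123456789 -> 123 + 456 + 789
{
    return (int)(key / 1000000 + key / 1000 % 1000 + key % 1000);
}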
Hashing Algorithm
 This algorithm transforms a key into an address.
/* Pre : key is the key to be hashed
         size is the number of characters in the key
   Post: addr contains the data address after hashing */
1.0 looper = 0
2.0 addr = 0
Hash the key:
3.0 loop (looper < size)
    3.1 if (key[looper] not space)
        3.1.1 addr = addr + key[looper]
        3.1.2 rotate addr 12 bits to the right
    3.2 end if
    3.3 looper = looper + 1
4.0 end loop
Test for a negative address:
5.0 if (addr < 0)
    5.1 addr = absolute(addr)
6.0 end if
7.0 addr = addr modulo maxAddr
8.0 end hash
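A possible C++ rendering of the algorithm above (a sketch; treating the accumulator as a 32-bit unsigned value for the 12-bit right rotation is an assumption not stated in the pseudocode, and using unsigned arithmetic removes the need for the negative-address test).

#include <cstring>

// Add-and-rotate hash: sum the key's characters, rotating the accumulator
// 12 bits to the right after each one, then reduce modulo maxAddr.
int hashKey (const char* key, int maxAddr)
{
    unsigned int addr = 0;                        // assumed 32-bit accumulator
    int size = (int)std::strlen(key);
    for (int looper = 0; looper < size; looper++)
    {
        if (key[looper] != ' ')                   // skip spaces, as in the pseudocode
        {
            addr = addr + (unsigned char)key[looper];
            addr = (addr >> 12) | (addr << 20);   // rotate 12 bits right (32-bit)
        }
    }
    return (int)(addr % (unsigned int)maxAddr);   // address in the range 0..maxAddr-1
}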
Collision Resolution
Using the Direct or Subtraction method guarantees that there are no collisions.
The other methods can cause collisions.
Collision resolution methods can be used together with any of the hashing methods.
There are 3 collision resolution methods:
 Open Addressing
 Linked Lists
 Bucket Hashing
Collision Resolution (con’t)
Open Addressing
 4 methods: linear probe, quadratic probe, pseudorandom rehashing and key-offset rehashing.
 The linear probe stores the data in the next open address (see the sketch after this slide).
 The quadratic probe offsets the address by the square of the probe number.
 Pseudorandom rehashing uses a pseudorandom generator function to rehash the address.
 Key-offset rehashing uses an offset calculated from the key to rehash the address.
Linked List
 Uses a separate area to store collisions and chains the synonyms together in a linked list.
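A minimal sketch of open addressing with a linear probe (the fixed table size, the EMPTY marker and the function name insertLinearProbe are assumptions for this example).

// Linear probe: if the home address is occupied, try the next address
// (wrapping around the table) until an empty slot is found.
const int TABLE_SIZE = 13;              // small prime table size (assumed)
const int EMPTY      = -1;

bool insertLinearProbe (int table[ ], int key)
{
    int home = key % TABLE_SIZE;        // home address from modulo division
    int addr = home;
    do {
        if (table[addr] == EMPTY)       // open address: store the key here
        {
            table[addr] = key;
            return true;
        }
        addr = (addr + 1) % TABLE_SIZE; // collision: probe the next address
    } while (addr != home);
    return false;                       // the table is full
}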
Collision Resolution (con’t)
Bucket Hashing
 A bucket is used to store keys whose hashing operations cause collisions, up to the size of the bucket.
 E.g. if the bucket size is 5, collisions are absorbed until the bucket is full, i.e. until a 6th key hashes to the same bucket.
 If the array size is 100 and the bucket size is 10, there will be 10 buckets.
 This method has some weaknesses because it does not resolve the entire collision problem.
 If a bucket is full, a collision will happen for the next key that hashes to the same address.
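An illustrative sketch of bucket hashing (the bucket size of 5, the number of buckets, and the function name insertIntoBucket are assumptions made for this example).

// Bucket hashing: each home address holds a small fixed-size bucket, so
// collisions are absorbed until the bucket itself is full.
const int NUM_BUCKETS = 10;
const int BUCKET_SIZE = 5;
const int EMPTY_SLOT  = -1;

bool insertIntoBucket (int table[NUM_BUCKETS][BUCKET_SIZE], int key)
{
    int bucket = key % NUM_BUCKETS;               // home bucket for this key
    for (int slot = 0; slot < BUCKET_SIZE; slot++)
    {
        if (table[bucket][slot] == EMPTY_SLOT)    // free slot in the bucket
        {
            table[bucket][slot] = key;
            return true;
        }
    }
    return false;   // bucket full: a further collision must be resolved elsewhere
}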

The End

Q&A

