
SEARCHING TECHNIQUES

Syllabus
Searching: Concept of Searching, Sequential Search, Indexed
Sequential Search, Binary Search. Concept of Hashing &
Collision Resolution Techniques used in Hashing.
What is Searching?
Searching in data structures refers to the process of
finding a required item in a collection of stored
elements.
A search is said to be successful or unsuccessful
depending on whether the element being searched for
is found or not.
Introduction to Searching Algorithms
Not a single day passes in which we do not have to search
for something in our day-to-day life: car keys, books, a pen,
a mobile charger and what not.
The life of a computer is the same. So much data is stored
in it that whenever a user asks for some data, the computer has
to search its memory for the data and make it
available to the user.
Searching takes up a large share of a computer's work; by one rough estimate, around 80% of CPU cycles are spent in searching.
To search for an element in a given list, there are three
popular algorithms available:
Linear Search
Binary Search
Indexed Sequential Search
Linear Search
Linear search is a very basic and simple search algorithm.
In linear search, we search for an element or value in a given
list by traversing the list from the start until the desired
element or value is found.
It compares the element to be searched with each of the
elements present in the list, and when the element
is matched successfully, it returns the index of the element
in the array; otherwise it returns -1.
(Figure: linear search for the element 33.)
Linear Search Example
- Suppose you are asked to find the name of the person whose phone number
is, say, “1234” using a telephone directory.
- Since the telephone directory is sorted by name, not by phone
number, we have to go through each and every number in the
directory.

Advantage:
- If the first number in the directory is the number you are
searching for, then lucky you!
- Since you found it on the very first page, it does not matter
how many pages the directory has.
Disadvantage:
- If the number you are searching for is the last number in the directory,
or is not in the directory at all, you have to search the whole directory.
Features of Linear Search Algorithm
It is used for small lists of unsorted, unordered
elements.
It has a time complexity of O(n), which means the time grows
linearly with the number of elements: acceptable, but not fast.
It has a very simple implementation.
Linear search is suitable when the list contains few elements.

https://www.cs.usfca.edu/~galles/visualization/Search.html
Linear Search ( Array A, Value x)
Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] == x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print “Element x Found at index i” and go to step 8
Step 7: Print “Element not found”
Step 8: Exit
Implementing Linear Search
#include <stdio.h>

/* Returns the 1-based position of val in a[0..n-1], or -1 if it is absent. */
int linearSearch(int a[], int n, int val) {
    for (int i = 0; i < n; i++) // Going through array sequentially
    {
        if (a[i] == val)
            return i + 1; // position = index + 1
    }
    return -1;
}

int main() {
    int a[] = {70, 40, 30, 11, 57, 41, 25, 14, 52}; // given array
    int val = 41; // value to be searched
    int n = sizeof(a) / sizeof(a[0]); // size of array
    int loc = linearSearch(a, n, val); // Store result
    if (loc == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", loc);
    return 0;
}
Time and Space Complexity
Best case complexity: O(1)
Worst case complexity: O(n)
Average complexity: O(n) (on average, about half of the list is examined)
Space complexity: O(1)
GATE CS 1996
The average number of key comparisons done in a
successful sequential search in a list of length n is:
log n
(n-1)/2
n/2
(n+1)/2
ISRO CS 2011
Number of comparisons required for an unsuccessful search
of an element in a sequential search, organized, fixed
length, symbol table of length L is:
(L+1)/2
L
L/2
L+1
Binary Search Algorithm
Binary Search is applied to a sorted array or list, and is most
useful when the list is large.
Its time complexity of O(log n) makes it very fast
compared to linear search.
The only limitation is that the array or list of elements must
be sorted for the binary search algorithm to work on it.
Implementing Binary Search Algorithm
Following are the steps of implementation of Binary Search:
Step-1: Find the middle index of the current list (initially the whole list).
Step-2: If the target value is equal to the middle element,
return the index of the middle element.
Step-3: If not, compare the middle element with the target value:
- If the target value is greater than the middle element,
pick the elements to the right of the middle index and start again
with Step-1.
- If the target value is less than the middle element, pick the
elements to the left of the middle index and start again with
Step-1.
Step-4: When a match is found, return the index of the element
matched.
Step-5: If the remaining list becomes empty without a match, return -1.
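Time Complexity
Best case complexity: O(1)
Worst case complexity: O(log n)
Average complexity: O(log n)
Space complexity: O(1) (iterative version)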
ISRO CS 2017 - May
Which of the following is correct recurrence for worst case
of Binary Search?
T(n) = 2T(n/2) + O(1) and T(1) = T(0) = O(1)
T(n) = T(n-1) + O(1) and T(1) = T(0) = O(1)
T(n) = T(n/2) + O(1) and T(1) = T(0) = O(1)
T(n) = T(n-2) + O(1) and T(1) = T(0) = O(1)
ISRO CS 2014
Suppose there are 11 items in sorted order in an array. How
many searches are required on the average, if binary
search is employed and all searches are successful in
finding the item?
3.46
3.00
3.21
2.81
UGC NET CS 2014 Dec - III
Suppose that we have numbers between 1 and 1000 in a
binary search tree and we want to search for the number
365. Which of the following sequences could not be the
sequence of nodes examined?
4, 254, 403, 400, 332, 346, 399, 365
926, 222, 913, 246, 900, 260, 364, 365
927, 204,913, 242, 914, 247, 365
4, 401, 389, 221, 268, 384, 383, 280, 365
Indexed Sequential Search
- Indexed search is a special kind of search used to
find a record in a file. Searching for a record means finding
the location loc in memory where the record is stored.
Indexed search looks up the record with a given key value
relative to a primary key field. This search method is
implemented using pointers.
- An index helps locate a particular record in less time. Indexed
sequential files use the principle of index creation.
- Familiar examples are a telephone directory and the index of a book. In this type of file
organization, the records are organized in a sequence and an
index table is used to speed up access to records without
searching the entire file. The records of the file may be stored in
random sequence, but the index table is kept sorted on the
key field. Thus the file can be processed randomly as well as
sequentially.
Advantage
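More efficient than a plain sequential search, because the index narrows the search to a small part of the file.
Less time is required to locate a record.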
ISRO CS 2016
A clustering index is defined on the fields which are of
type:
non-key and ordering
non-key and non-ordering
key and ordering
key and non-ordering
GATE-CS-2015 (Set 1)
A file is organized so that the ordering of data records is
the same as or close to the ordering of data entries in some
index. Then that index is called:
Dense
Sparse
Clustered
Unclustered
Hashing Techniques
In data structures, there are several searching techniques,
such as linear search, binary search and search trees.
In these techniques, the time taken to search for a particular
element depends on the total number of elements.
The main drawback of these techniques is that as the number
of elements increases, the time taken to perform the search
also increases.
This becomes a problem when the total number of elements
becomes very large.
What is Hashing?
Hashing is the process of mapping a large amount of data
items to a smaller table with the help of a hashing function.
The hashing function is also known as a Hashing
Algorithm or Message Digest Function.
It is a technique to convert a range of key values into a
range of indexes of an array.
It provides a faster searching method than linear or
binary search.
Hashing allows any data entry to be inserted, updated and
retrieved in constant time, O(1), on average.
Constant time O(1) means the time taken does not depend
on the number of stored items.
Hashing is used with databases to enable items to be
retrieved more quickly.
It is also used in digital signatures, where the hash (message
digest) of the data is what gets signed and verified.
Hash Function
A fixed process that converts a key to a hash value is known as
a Hash Function.
This function takes a key and maps it to a value of a
certain length, which is called a Hash value or Hash.
A hash function takes the data item as input and returns a
small integer value, the hash value, as output.
The hash value of the data item is then used as an index for
storing the item in the hash table.
Types of Hash Functions
Mid Square Hash Function
Division Hash Function
Folding Hash Function
Properties of Hash Function
The properties of a good hash function are-
It is efficiently computable.
It minimizes the number of collisions.
It distributes the keys uniformly over the table.
Collision Resolution Techniques
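A collision occurs when two different keys produce the same hash
value, i.e. they map to the same slot of the hash table.
Collision Resolution Techniques are the techniques used for
resolving or handling collisions. Collision resolution
techniques are classified as-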
Separate Chaining
Open Addressing
Separate Chaining
To handle a collision, this technique creates a linked list at the slot
where the collision occurs. The new key is then inserted into that linked
list. These linked lists hanging from the slots look like chains, which is
why this technique is called separate chaining.
Time Complexity
For Searching-
In the worst case, all the keys might map to the same bucket of
the hash table. In such a case, all the keys will be present in
a single linked list, and a sequential search has to be
performed on that linked list. So, the time
taken for searching in the worst case is O(n).
For Deletion-
To delete a key, it first has to be searched for and then
removed. Since searching takes O(n) time in the worst case,
the time taken for deletion in the worst case is also O(n).
Load Factor (α)
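Load factor (α) is defined as-
α = n / m
where n is the number of keys stored in the hash table and m is the number of slots in the table.
If the load factor α is bounded by a constant, then (assuming simple uniform hashing) the expected time complexity of
Insert, Search, Delete = Θ(1)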
Example
Using the hash function ‘key mod 7’, insert the
following sequence of keys into a hash table that resolves collisions by separate chaining: 50, 700, 76,
85, 92, 73 and 101
Open Addressing
In open addressing,
Unlike separate chaining, all the keys are stored inside the
hash table.
No key is stored outside the hash table.
Techniques used for open addressing are-
Linear Probing
Quadratic Probing
Double Hashing
Operations in Open Addressing
Insert Operation-
Compute the hash value of the key to be inserted. The hash value is used as an index to
store the key in the hash table. In case of a collision, probing is performed until an empty
bucket is found. Once an empty bucket is found, the key is inserted. Probing is performed
in accordance with the technique used for open addressing.
Search Operation-
To search for a particular key, compute its hash value. Using the hash value, the corresponding
bucket of the hash table is checked. If the required key is found there, the search succeeds. Otherwise,
the subsequent buckets are checked until the required key or an empty bucket is found.
An empty bucket indicates that the key is not present in the hash table.
Delete Operation-
The key is first searched for and then deleted. After deleting the key, that bucket is
marked as “deleted”.
NOTE-
- During insertion, buckets marked as “deleted” are treated like any other empty
bucket.
- During searching, the search does not terminate on encountering a bucket marked as
“deleted”.
- The search terminates only after the required key or an empty bucket is found.
Linear Probing
In linear probing,
When a collision occurs, we linearly probe for the next
bucket, i.e. in the i-th iteration we try bucket (hash(key) + i) % TABLE_SIZE.
We keep probing until an empty bucket is found.
Advantage-
It is easy to compute.
Disadvantage-
The main problem with linear probing is clustering.
Many consecutive elements form groups (clusters).
Then it takes longer to search for an element or to find an
empty bucket.
Time Complexity-
The worst-case time to search for an element in linear probing is
O(table size). Even if only one element is present and all the others
have been deleted, the “deleted” markers left in the hash table can
force the search to scan the entire table.
Quadratic Probing
When a collision occurs, in the i-th iteration we probe bucket
(hash(key) + i²) % TABLE_SIZE.
We keep probing until an empty bucket is found.
Double Hashing
Double hashing probes the bucket:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
where i is increased by 1 each time a collision occurs.
Comparison of Open Addressing Techniques
Separate Chaining Vs Open Addressing
GATE-CS-2014-(Set-3)
Consider a hash table with 100 slots. Collisions are
resolved using chaining. Assuming simple uniform
hashing, what is the probability that the first 3 slots are
unfilled after the first 3 insertions?
(97 × 97 × 97)/1003
(99 × 98 × 97)/1003
(97 × 96 × 95)/1003
(97 × 96 × 95)/(3! × 1003)
GATE-CS-2015 (Set 2)
Which one of the following hash functions on integers will
distribute keys most uniformly over 10 buckets
numbered 0 to 9 for i ranging from 0 to 2020?
h(i) = (12 ∗ i) mod 10
h(i) = (11 ∗ i²) mod 10
h(i) = i³ mod 10
h(i) = i² mod 10
GATE-CS-2015 (Set 3)
Given a hash table T with 25 slots that stores 2000
elements, the load factor α for T is:
800
80
8000
1.25
ISRO CS 2014
Consider a 13 element hash table for which f(key)=key mod
13 is used with integer keys. Assuming linear probing is
used for collision resolution, at which location would the
key 103 be inserted, if the keys 661, 182, 24 and 103 are
inserted in that order?
0
1
11
12
