Sorting and Searching: Objective and properties of different sorting algorithms: Selection
Sort, Insertion Sort, Quick Sort, Merge Sort, Heap Sort, Linear and Binary Search
algorithms, Hashing (linear probing, random probing, quadratic probing, rehashing, double
hashing), Dictionaries
Sorting is a process of arranging elements in a certain order. Numeric data may be sorted in
ascending or descending order. Alphabets or strings may be sorted in lexicographical order.
Sorting is a topic that has been extensively researched and investigated in the field of computer
science. Sorting is a fundamental need in many applications. Sorting data makes it easier to
search through a given data sample, efficiently and quickly.
A large class of sorting algorithms is available. Each has its limitations and suits a particular
application domain; no single algorithm fulfils every objective. The most commonly used
measure for selecting the optimal algorithm, however, is running time.
▪ In place: Requires only constant additional space to sort the input. Sometimes
O(log n) extra space is allowed.
▪ Stable: Does not alter the relative order of equal elements after sorting
▪ Online: Sorts the data as it arrives
▪ Adaptive: Performance improves when the input is already partially sorted
▪ Incremental: Builds the sorted sequence one element at a time
The complexity of a sorting algorithm measures the running time of a function with ‘n’ items
to sort. The decision of which sorting technique is appropriate for a problem is determined by
several dependency configurations for various problems. The following are the most important
considerations:
• The amount of time taken by a programmer in developing a certain sorting program.
• The amount of machine time required to run the program
• The amount of memory required to run the program
The complexity of a sorting algorithm is determined by how many comparisons it performs to
sort the given data. Sorting algorithms are sensitive to the arrangement of the input data, so
complexity is usually analysed for three cases:
• Best case
• Worst case
• Average case
In comparison-based sorting methods, data are sorted by comparing two data elements at a
time; a comparator function is defined to compare and order the data. Examples: Selection
sort, Bubble sort, Insertion sort.
Non-comparison-based sorting methods make no comparisons between elements and instead
exploit properties of the keys themselves during execution. Examples: Radix sort, Bucket sort,
Counting sort.
In data structures, in-place sorting algorithms rearrange the items within the original array
itself. Not-in-place sorting methods, on the other hand, sort the original array with the help of
an auxiliary data structure. Examples of in-place algorithms are Quick sort, Insertion sort and
Selection sort; an example of a not-in-place algorithm is Merge sort.
Selection sort is conceptually the simplest sorting algorithm. This algorithm will first find the
smallest element in the array and swap it with the element in the first position, then it will find
the second smallest element and swap it with the element in the second position, and it will
keep on doing this until the entire array is sorted. It is called selection sort because it repeatedly
selects the next-smallest element and swaps it into the right place.
Step 1: Select the first element of the list (i.e., Element at first position in the list).
Step 2: Compare the selected element with all the other elements in the list.
Step 3: In every comparison, if any element is found to be smaller than the selected element
(for ascending order), then the two are swapped.
Step 4: Repeat the same procedure with element in the next position in the list till the entire
list is sorted.
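Based on the description above, a minimal C sketch of selection sort might look as follows (the
function and variable names are assumptions for illustration; one swap is performed per pass,
which produces the same result as the step-wise swapping described in the steps above):
/* Sort list[0..n-1] in ascending order by repeatedly selecting the smallest element */
void selectionSort(int list[], int n)
{
    int i, j, min, temp;
    for (i = 0; i < n - 1; i++)
    {
        min = i;                        /* assume the element at position i is the smallest */
        for (j = i + 1; j < n; j++)
            if (list[j] < list[min])    /* compare with every remaining element */
                min = j;
        temp = list[i];                 /* swap the smallest element into position i */
        list[i] = list[min];
        list[min] = temp;
    }
}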
Example:
Complexity of the Selection Sort Algorithm
To sort an unsorted list with 'n' number of elements, we need to make ((n-1)+(n-2)+(n-
3)+......+1) = n(n-1)/2 comparisons in the worst case. Selection sort performs this same number
of comparisons even when the list is already sorted; only the number of swaps is reduced.
Insertion sort algorithm arranges a list of elements in a particular order. In insertion sort
algorithm, every iteration moves an element from unsorted portion to sorted portion until all
the elements are sorted in the list.
• It is efficient for smaller data sets, but very inefficient for larger lists.
• Insertion Sort is adaptive, that means it reduces its total number of steps if a partially
sorted array is provided as input, making it efficient.
• It is better than Selection Sort and Bubble Sort algorithms.
Step 1: Assume that first element in the list is in sorted portion and all the remaining elements
are in unsorted portion.
Step 2: Take first element from the unsorted portion and insert that element into the sorted
portion in the order specified.
Step 3: Repeat the above process until all the elements from the unsorted portion are moved
into the sorted portion.
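Based on these steps, a minimal C sketch of insertion sort might look as follows (the function
and variable names are assumptions for illustration):
/* Sort list[0..n-1] by growing a sorted portion at the front of the array */
void insertionSort(int list[], int n)
{
    int i, j, key;
    for (i = 1; i < n; i++)
    {
        key = list[i];                  /* first element of the unsorted portion */
        j = i - 1;
        while (j >= 0 && list[j] > key) /* shift larger sorted elements one place right */
        {
            list[j + 1] = list[j];
            j--;
        }
        list[j + 1] = key;              /* insert the element at its correct position */
    }
}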
Example
Complexity of the Insertion Sort Algorithm
To sort an unsorted list with 'n' number of elements, we need to make (1+2+3+......+(n-1)) =
n(n-1)/2 comparisons in the worst case. If the list is already sorted, then only (n-1) comparisons
are required.
Quick Sort is a sorting technique based on the concept of Divide and Conquer, just like merge
sort. In quick sort, however, all the heavy lifting (major work) is done while dividing the array
into subarrays, whereas in merge sort all the real work happens while merging the subarrays;
in quick sort, the combine step does absolutely nothing.
It is also called partition-exchange sort. This algorithm divides the list into three main parts:
the elements less than the pivot, the pivot element itself, and the elements greater than the pivot.
Pivot element can be any element from the array, it can be the first element, the last element
or any random element. In this tutorial, we will take the rightmost element or the last element
as pivot.
/* Completed sketch: partition around the last element as pivot, then sort both subarrays */
void quickSort(int list[], int first, int last)
{
    int i, j = first, temp, pivot = last;   /* take the last element as pivot */
    if (first >= last)
        return;
    for (i = first; i < last; i++)
        if (list[i] < list[pivot])          /* move smaller elements to the left part */
        {
            temp = list[i]; list[i] = list[j]; list[j] = temp;
            j++;
        }
    temp = list[pivot];                     /* place the pivot at its final position j */
    list[pivot] = list[j];
    list[j] = temp;
    quickSort(list,first,j-1);
    quickSort(list,j+1,last);
}
Example
Below, we have a pictorial representation of how quick sort will sort the given array.
In step 1, we select the last element as the pivot, which is 6 in this case, and call for partitioning,
hence re-arranging the array in such a way that 6 will be placed in its final position and to its
left will be all the elements less than it and to its right, we will have all the elements greater
than it.
Then we pick the subarray on the left and the subarray on the right and select a pivot for them,
in the above diagram, we chose 3 as pivot for the left subarray and 11 as pivot for the right
subarray.
To sort an unsorted list with 'n' number of elements, quick sort makes ((n-1)+(n-2)+(n-
3)+......+1) = n(n-1)/2 comparisons in the worst case. With the last element chosen as pivot,
this worst case actually occurs when the list is already sorted; on average, quick sort requires
O(n log n) comparisons.
In Merge Sort, the given unsorted array with n elements, is divided into n subarrays, each
having one element, because a single element is always sorted in itself. Then, it repeatedly
merges these subarrays, to produce new sorted subarrays, and in the end, one complete sorted
array is produced.
Step 1: Split the given list into two halves (roughly equal halves in case of a list with an odd
number of elements).
Step 2: Continue dividing the subarrays in the same manner until you are left with only single
element arrays.
Step 3: Starting with the single element arrays, merge the subarrays so that each merged
subarray is sorted.
Step 4: Repeat step 3 until you end up with a single sorted array.
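A C sketch of merge sort following these steps is shown below (function names and the use
of a temporary array are illustrative assumptions; the auxiliary array is the reason merge sort
is not an in-place algorithm):
/* merge the two sorted halves list[first..mid] and list[mid+1..last] */
void merge(int list[], int first, int mid, int last)
{
    int temp[last - first + 1];
    int i = first, j = mid + 1, k = 0;
    while (i <= mid && j <= last)                       /* take the smaller front element */
        temp[k++] = (list[i] <= list[j]) ? list[i++] : list[j++];
    while (i <= mid)
        temp[k++] = list[i++];                          /* copy what remains of the left half */
    while (j <= last)
        temp[k++] = list[j++];                          /* copy what remains of the right half */
    for (i = first, k = 0; i <= last; i++, k++)
        list[i] = temp[k];                              /* copy the merged result back */
}

void mergeSort(int list[], int first, int last)
{
    if (first < last)
    {
        int mid = (first + last) / 2;
        mergeSort(list, first, mid);                    /* sort the left half */
        mergeSort(list, mid + 1, last);                 /* sort the right half */
        merge(list, first, mid, last);                  /* merge the two sorted halves */
    }
}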
Example
A heap is a complete binary tree; a binary tree is a tree in which each node can have at most
two children. A complete binary tree is a binary tree in which all the levels except possibly the
last are completely filled, and the nodes of the last level are as far left as possible (left-justified).
Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate
the elements one by one from the heap part of the list, and then insert them into the sorted part
of the list.
In heap sort, there are basically two phases involved in sorting the elements:
Step 1: The first step includes the creation of a heap by adjusting the elements of the array.
Step 2: After the creation of the heap, repeatedly remove the root element of the heap by
shifting it to the end of the array, and then restore the heap structure for the remaining
elements.
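A C sketch of these two phases might look as follows (names are illustrative; heapify restores
the max-heap property for the subtree rooted at index i):
/* restore the max-heap property for the subtree rooted at index i (heap size n) */
void heapify(int list[], int n, int i)
{
    int largest = i, left = 2 * i + 1, right = 2 * i + 2, temp;
    if (left < n && list[left] > list[largest])
        largest = left;
    if (right < n && list[right] > list[largest])
        largest = right;
    if (largest != i)
    {
        temp = list[i]; list[i] = list[largest]; list[largest] = temp;
        heapify(list, n, largest);      /* continue sifting the swapped element down */
    }
}

void heapSort(int list[], int n)
{
    int i, temp;
    for (i = n / 2 - 1; i >= 0; i--)    /* Phase 1: build the max heap */
        heapify(list, n, i);
    for (i = n - 1; i > 0; i--)         /* Phase 2: move the root to the end, shrink the heap */
    {
        temp = list[0]; list[0] = list[i]; list[i] = temp;
        heapify(list, i, 0);            /* restore the heap for the remaining elements */
    }
}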
Example
Let's take an unsorted array and try to sort it using heap sort.
First, we have to construct a heap from the given array and convert it into max heap.
After converting the given heap into max heap, the array elements are -
Next, we have to delete the root element (89) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify
it to convert it into max heap.
After swapping the array element 89 with 11, and converting the heap into max-heap, the
elements of array are -
In the next step, again, we have to delete the root element (81) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (54). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 81 with 54 and converting the heap into max-heap, the
elements of array are -
In the next step, we have to delete the root element (76) from the max heap again. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 76 with 9 and converting the heap into max-heap, the
elements of array are -
In the next step, again we have to delete the root element (54) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (14). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 54 with 14 and converting the heap into max-heap, the
elements of array are -
In the next step, again we have to delete the root element (22) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 22 with 11 and converting the heap into max-heap, the
elements of array are -
In the next step, again we have to delete the root element (14) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 14 with 9 and converting the heap into max-heap, the
elements of array are -
In the next step, again we have to delete the root element (11) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 11 with 9, the elements of array are -
Now, heap has only one element left. After deleting it, heap will be empty.
To sort an unsorted list with 'n' number of elements, heap sort requires O(n log n) comparisons
in the best, average and worst cases (building the initial heap takes O(n) time).
To understand the working of linear search algorithm, let's take an unsorted array.
Let the elements of array are -
The value of K, i.e., 41, is not matched with the first element of the array. So, move to the next
element. And follow the same process until the respective element is found.
Now, the element to be searched is found. So, the algorithm will return the index of the
matched element.
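A minimal C sketch of linear search, under the assumption that the array, its size and the key
to be searched are passed in as parameters:
/* return the index of the first element equal to target, or -1 if it is not present */
int linearSearch(int values[], int n, int target)
{
    for (int i = 0; i < n; i++)
        if (values[i] == target)        /* compare target with each element in turn */
            return i;
    return -1;                          /* target not found */
}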
Time Complexity: In the best case the element is found at the first position, giving O(1); in the
worst and average cases linear search takes O(n) comparisons.
Binary search is the search technique that works efficiently on sorted lists. Hence, to search an
element into some list using the binary search technique, we must ensure that the list is
sorted.
Binary search follows the divide and conquer approach in which the list is divided into two
halves, and the item is compared with the middle element of the list. If the match is found then,
the location of the middle element is returned. Otherwise, we search into either of the halves
depending upon the result produced through the match.
#include <stdio.h>

/* Iterative binary search on a sorted array; returns the index of target or -1 */
int binarySearch(int values[], int n, int target)
{
    int min = 0, max = n - 1, step = 0;
    while (min <= max)
    {
        int guess = (min + max) / 2;    /* compare with the middle element */
        step++;
        if(values[guess] == target)
        {
            printf("Number of steps required for search: %d \n", step);
            return guess;
        }
        else if(values[guess] > target)
        {
            // target would be in the left half
            max = (guess - 1);
        }
        else
        {
            // target would be in the right half
            min = (guess + 1);
        }
    }
    // We reach here when element is not
    // present in array
    return -1;
}
Example:
Space Complexity: The iterative binary search above uses O(1) auxiliary space; a recursive
version would use O(log n) space for the call stack.
Hashing is a technique used for storing and retrieving information quickly. It helps to perform
searching in an optimal way. Hashing is used in databases, encryption, symbol tables, etc.
Hashing is needed to execute search, insert and delete operations in constant time on average.
In other data structures, such as arrays and linked lists, these operations take linear time, O(n).
A self-balancing tree such as an AVL tree improves this to O(log n), but hashing allows us to
perform the operations in constant time, O(1), on average.
Component of hashing:
• Hash table
• Hash functions
• Collisions
• Collision resolution techniques
The Hash table data structure stores elements in key-value pairs, where the key is used to
compute an index into the table and the value is the data stored against that key.
A hash function is used for mapping each element of a dataset to indexes in the table. Hash
functions convert a key into an index of the hash table (a location). Ideally a hash function
should generate unique locations, but that is difficult to achieve since the number of indexes is
much smaller than the number of keys, so an imperfect hash function often leads to collisions.
A good hash function may not prevent collisions completely, but it can reduce their number.
Here, we look at different methods of constructing a good hash function.
a. Division Method
If k is a key and m is the size of the hash table, the hash function h() is calculated as:
h(k) = k mod m
For example, if the size of a hash table is 10 and k = 112, then h(k) = 112 mod 10 = 2. The
value of m must not be a power of 2. This is because the powers of 2 in binary format are 10,
100, 1000, ....; when m = 2^p, computing k mod m always returns just the lower-order p bits of k:
if m = 2² = 4, k = 17, then h(k) = 17 mod 4 = 10001 mod 100 (binary) = 01
if m = 2³ = 8, k = 17, then h(k) = 17 mod 8 = 10001 mod 1000 (binary) = 001
if m = 2⁴ = 16, k = 17, then h(k) = 17 mod 16 = 10001 mod 10000 (binary) = 0001
In general, if m = 2^p, then h(k) is simply the p lower-order bits of k.
b. Multiplication Method
h(k) = ⌊m(kA mod 1)⌋
where,
• kA mod 1 gives the fractional part of kA,
• ⌊ ⌋ gives the floor value
• A is a constant with 0 < A < 1. An optimal choice suggested by Knuth is A ≈ (√5-1)/2 ≈ 0.618.
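As a small illustration, the division and multiplication methods could be coded in C as follows
(the function names are assumptions; m is the table size and A follows Knuth's suggestion):
#include <math.h>

/* Division method: h(k) = k mod m */
int hashDivision(int k, int m)
{
    return k % m;
}

/* Multiplication method: h(k) = floor(m * (kA mod 1)) */
int hashMultiplication(int k, int m)
{
    double A = (sqrt(5.0) - 1.0) / 2.0; /* Knuth's suggested constant, about 0.618 */
    double frac = k * A - floor(k * A); /* kA mod 1: the fractional part of kA */
    return (int) floor(m * frac);
}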
c. Universal Hashing
In Universal hashing, the hash function is chosen at random independent of keys.
5.9.3 Collisions
When two different keys hash to the same index, the problem that occurs between the two
values is known as a collision. In the above example, the value is stored at index 6. If the key
value is 26, then the index would be:
h(26) = 26%10 = 6
Therefore, two values are stored at the same index, i.e., 6, and this leads to the collision
problem. To resolve these collisions, we have some techniques known as collision resolution
techniques.
5.9.4 Collision resolution techniques
Open Hashing
In Open Hashing, one of the methods used to resolve the collision is known as a chaining
method.
The value 11 would be stored at the index 5. Now, we have two values (6, 11) stored at the
same index, i.e., 5. This leads to the collision problem, so we will use the chaining method to
avoid the collision. We will create one more list and add the value 11 to this list. After the
creation of the new list, the newly created list will be linked to the list having value 6.
The value 13 would be stored at index 9. Now, we have two values (3, 13) stored at the same
index, i.e., 9. This leads to the collision problem, so we will use the chaining method to avoid
the collision. We will create one more list and add the value 13 to this list. After the creation
of the new list, the newly created list will be linked to the list having value 3.
The value 7 would be stored at index 7. Now, we have two values (2, 7) stored at the same
index, i.e., 7. This leads to the collision problem, so we will use the chaining method to avoid
the collision. We will create one more list and add the value 7 to this list. After the creation of
the new list, the newly created list will be linked to the list having value 2.
According to the above calculation, the value 12 must be stored at index 7, but the value 2
exists at index 7. So, we will create a new list and add 12 to the list. The newly created list will
be linked to the list having a value 7.
The index calculated for each key value, using the hash function h(k) = (2k + 3) % 10, is shown
in the table below:
key Location (u)
3 ((2*3)+3)%10 = 9
2 ((2*2)+3)%10 = 7
9 ((2*9)+3)%10 = 1
6 ((2*6)+3)%10 = 5
11 ((2*11)+3)%10 = 5
13 ((2*13)+3)%10 = 9
7 ((2*7)+3)%10 = 7
12 ((2*12)+3)%10 = 7
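A minimal C sketch of chaining, using the same hash function h(k) = (2k + 3) % 10 as in the
table above (the table size M, the node structure and the function names are assumptions for
illustration):
#include <stdlib.h>

#define M 10

struct Node { int key; struct Node *next; };

struct Node *table[M];                  /* each slot holds the head of a chain (initially NULL) */

int hash(int k) { return (2 * k + 3) % M; }

/* insert key k at the front of the chain for its slot */
void insertChain(int k)
{
    struct Node *node = malloc(sizeof(struct Node));
    node->key = k;
    node->next = table[hash(k)];
    table[hash(k)] = node;
}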
Closed Hashing
1. Linear probing
2. Quadratic probing
3. Double Hashing technique
Linear Probing
Linear probing is one of the forms of open addressing. As we know that each cell in the hash
table contains a key-value pair, so when the collision occurs by mapping a new key to the cell
already occupied by another key, then linear probing technique searches for the closest free
locations and adds a new key to that empty cell. In this case, searching is performed
sequentially, starting from the position where the collision occurred, until an empty cell is
found.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The calculated index
value of 11 is 5 which is already occupied by another key value, i.e., 6. When linear probing is
applied, the nearest empty cell to the index 5 is 6; therefore, the value 11 will be added at the
index 6.
The next key value is 13. The index value associated with this key value is 9 when hash function
is applied. The cell is already filled at index 9. When linear probing is applied, the nearest
empty cell to the index 9 is 0; therefore, the value 13 will be added at the index 0.
The next key value is 7. The index value associated with the key value is 7 when hash function
is applied. The cell is already filled at index 7. When linear probing is applied, the nearest
empty cell to the index 7 is 8; therefore, the value 7 will be added at the index 8.
The next key value is 12. The index value associated with the key value is 7 when hash function
is applied. The cell is already filled at index 7. When linear probing is applied, the nearest
empty cell to the index 7 is 2; therefore, the value 12 will be added at the index 2.
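A C sketch of insertion with linear probing, again using h(k) = (2k + 3) % 10 (the table size,
the EMPTY marker and the function names are illustrative assumptions; the table is assumed
to be initialised with EMPTY in every cell):
#define M 10
#define EMPTY -1                        /* marker for an unoccupied cell */

int hash(int k) { return (2 * k + 3) % M; }

/* insert key k using linear probing; returns the index used, or -1 if the table is full */
int insertLinear(int table[], int k)
{
    int u = hash(k);
    for (int i = 0; i < M; i++)
    {
        int index = (u + i) % M;        /* probe the cells sequentially, wrapping around */
        if (table[index] == EMPTY)
        {
            table[index] = k;
            return index;
        }
    }
    return -1;                          /* no free cell found */
}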
Quadratic Probing
It can be defined as follows: insert the key at the first free location in the sequence (u + i²) % m,
where i = 0 to m-1 and u is the home location computed by the hash function.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to
apply the quadratic probing technique on these key values as there is no occurrence of the
collision.
The index value of 11 is 5, but this location is already occupied by the 6. So, we apply the
quadratic probing technique.
When i = 0
Index = (5+0²)%10 = 5
When i = 1
Index = (5+1²)%10 = 6
The next element is 13. When the hash function is applied on 13, then the index value comes
out to be 9, which we already discussed in the chaining method. At index 9, the cell is occupied
by another value, i.e., 3. So, we will apply the quadratic probing technique to calculate the free
location.
When i=0
Index = (9+0²)%10 = 9
When i=1
Index = (9+1²)%10 = 0
Since location 0 is empty, so the value 13 will be added at the index 0.
The next element is 7. When the hash function is applied on 7, then the index value comes out
to be 7, which we already discussed in the chaining method. At index 7, the cell is occupied by
another value, i.e., 7. So, we will apply the quadratic probing technique to calculate the free
location.
When i=0
Index = (7+0²)%10 = 7
When i=1
Index = (7+1²)%10 = 8
Since location 8 is empty, so the value 7 will be added at the index 8.
The next element is 12. When the hash function is applied on 12, then the index value comes
out to be 7. When we observe the hash table then we will get to know that the cell at index 7 is
already occupied by the value 2. So, we apply the Quadratic probing technique on 12 to
determine the free location.
When i=0
Index = (7+0²)%10 = 7
When i=1
Index = (7+1²)%10 = 8
When i=2
Index = (7+2²)%10 = 1
When i=3
Index = (7+3²)%10 = 6
When i=4
Index = (7+4²)%10 = 3
Since the location 3 is empty, so the value 12 would be stored at the index 3.
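The same insertion can be sketched in C for quadratic probing by replacing the linear step with
i² (reusing the hash(), M and EMPTY definitions from the linear-probing sketch above):
/* insert key k using quadratic probing: probe (u + i*i) % M for i = 0, 1, ..., M-1 */
int insertQuadratic(int table[], int k)
{
    int u = hash(k);                    /* home location of the key */
    for (int i = 0; i < M; i++)
    {
        int index = (u + i * i) % M;
        if (table[index] == EMPTY)
        {
            table[index] = k;
            return index;
        }
    }
    return -1;                          /* no free cell found within M probes */
}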
Double Hashing
Double hashing is an open addressing technique which is used to avoid the collisions. When
the collision occurs then this technique uses the secondary hash of the key. It uses one hash
value as an index to move forward until the empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is one of the hash functions,
used to calculate the home location, and h2(k) is another hash function. The rule is: insert the
key at the first free place in the sequence (u + v*i) % m, where i = 0 to m-1. In this case, u is
the location computed using the first hash function and v is equal to (h2(k) % m).
h1(k) = 2k+3
h2(k) = 3k+1
As we know that no collision would occur while inserting the keys (3, 2, 9, 6), so we will not
apply double hashing on these key values.
On inserting the key 11 in the hash table, a collision will occur because the calculated index
value of 11 is 5, which is already occupied by another value. Therefore, we will apply the
double hashing technique on key 11. When the key value is 11, the value of v is 4, since
h2(11) = 34 and 34 % 10 = 4.
The next element is 13. The calculated index value of 13 is 9, which is already occupied by
another key value. So, we will use the double hashing technique to find a free location. The
value of v is 0, since h2(13) = 40 and 40 % 10 = 0.
The next element is 7. The calculated index value of 7 is 7, which is already occupied by
another key value. So, we will use the double hashing technique to find a free location. The
value of v is 2, since h2(7) = 22 and 22 % 10 = 2.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7 + 2*0)%10 = 7
When i=1
Index = (7+2*1)%10 = 9
When i=2
Index = (7+2*2)%10 = 1
When i=3
Index = (7+2*3)%10 = 3
When i=4
Index = (7+2*4)%10 = 5
When i=5
Index = (7+2*5)%10 = 7
When i=6
Index = (7+2*6)%10 = 9
When i=7
Index = (7+2*7)%10 = 1
When i=8
Index = (7+2*8)%10 = 3
When i=9
Index = (7+2*9)%10 = 5
We have checked all the cases of i (from 0 to 9) but have not found a suitable place to insert 7.
Therefore, key 7 cannot be inserted into the hash table.
The next element is 12. The calculated index value of 12 is 7, which is already occupied by
another key value. So, we will use the double hashing technique to find a free location. The
value of v is 7, since h2(12) = 37 and 37 % 10 = 7.
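A C sketch of insertion with double hashing, using h1(k) = 2k + 3 and h2(k) = 3k + 1 as above
(M, hash() and EMPTY are reused from the earlier sketches; the function names are
illustrative):
int hash2(int k) { return (3 * k + 1) % M; }    /* v = h2(k) % m */

/* insert key k using double hashing: probe (u + v*i) % M for i = 0, 1, ..., M-1 */
int insertDouble(int table[], int k)
{
    int u = hash(k);                    /* u = h1(k) = (2k + 3) % M */
    int v = hash2(k);
    for (int i = 0; i < M; i++)
    {
        int index = (u + v * i) % M;
        if (table[index] == EMPTY)
        {
            table[index] = k;
            return index;
        }
    }
    return -1;                          /* as with key 7 above, insertion can fail */
}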
For example, the results of a classroom test could be represented as a dictionary with students'
names as keys and their scores as the values.
The various operations that are performed on a Dictionary or associative array are:
• Add or Insert: In the Add or Insert operation, a new pair of keys and values is added
in the Dictionary or associative array object.
• Replace or reassign: In the Replace or reassign operation, the already existing value
that is associated with a key is changed or modified. In other words, a new value is
mapped to an already existing key.
• Delete or remove: In the Delete or remove operation, the already present element is
unmapped from the Dictionary or associative array object.
• Find or Lookup: In the Find or Lookup operation, the value associated with a key is
searched by passing the key as a search argument.
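As an illustration of these operations, below is a minimal dictionary sketch in C that stores
name/score pairs in a fixed-size array with linear lookup (all names, sizes and functions are
illustrative assumptions; a real dictionary would typically be backed by a hash table or a
balanced tree):
#include <string.h>

#define MAXENTRIES 100

struct Entry { char name[32]; int score; };

struct Entry dict[MAXENTRIES];
int count = 0;

/* Find or Lookup: return the index of a key, or -1 if it is absent */
int find(const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(dict[i].name, name) == 0)
            return i;
    return -1;
}

/* Add/Insert a new pair, or Replace/reassign the value of an existing key */
void put(const char *name, int score)
{
    int i = find(name);
    if (i >= 0)
        dict[i].score = score;                  /* key exists: reassign its value */
    else if (count < MAXENTRIES)
    {
        strcpy(dict[count].name, name);         /* insert a new key-value pair */
        dict[count].score = score;
        count++;
    }
}

/* Delete or remove: unmap a key by overwriting it with the last entry */
void removeKey(const char *name)
{
    int i = find(name);
    if (i >= 0)
        dict[i] = dict[--count];
}
For example, put("Asha", 87) followed by find("Asha") would store and then locate that
student's score (the name is, of course, only illustrative).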
5.11 Graph
Graph is a non-linear data structure. Graph is a collection of nodes (or vertices) and edges (or
arcs) in which nodes are connected with edges. Generally, a graph G is represented as G = ( V
, E ), where V is set of vertices and E is set of edges.
Example
a) Vertex
An individual data element of a graph is called a Vertex. A vertex is also known as a node. In
the above example graph, A, B, C, D & E are known as vertices.
b) Edge
An edge is a connecting link between two vertices. Edge is also known as Arc. An edge is
represented as (startingVertex, endingVertex). For example, in above graph the link between
vertices A and B is represented as (A,B). In above example graph, there are 7 edges (i.e., (A,B),
(A,C), (A,D), (B,D), (B,E), (C,D), (D,E)).
Edges are three types.
1. Undirected Edge - An undirected edge is a bidirectional edge. If there is an undirected
edge between vertices A and B, then edge (A , B) is equal to edge (B , A).
2. Directed Edge - A directed edge is a unidirectional edge. If there is a directed edge
between vertices A and B, then edge (A , B) is not equal to edge (B , A).
3. Weighted Edge - A weighted edge is an edge with a value (cost) on it.
c) Undirected Graph
A graph with only undirected edges is said to be undirected graph.
d) Directed Graph
A graph with only directed edges is said to be directed graph.
e) Mixed Graph
A graph with both undirected and directed edges is said to be mixed graph.
f) Adjacent
If there is an edge between vertices A and B then both A and B are said to be adjacent. In other
words, vertices A and B are said to be adjacent if there is an edge between them.
g) Outgoing Edge
A directed edge is said to be outgoing edge on its origin vertex.
h) Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
i) Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
j) Indegree
Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
k) Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
l) Self-loop
Edge (undirected or directed) is a self-loop if its two endpoints coincide with each other.
m) Path
A path is a sequence of alternating vertices and edges that starts at one vertex and ends at
another vertex, such that each edge is incident to its predecessor and successor vertex.
1. Adjacency Matrix
2. Incidence Matrix
3. Adjacency List
Adjacency Matrix
In this representation, the graph is represented using a matrix of size total number of vertices
by a total number of vertices. That means a graph with V vertices is represented using a matrix
of size VxV. In this matrix, both rows and columns represent vertices. This matrix is filled with
either 1 or 0. Here, 1 represents that there is an edge from row vertex to column vertex and 0
represents that there is no edge from row vertex to column vertex.
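For instance, treating the example graph above (vertices A, B, C, D, E with edges (A,B), (A,C),
(A,D), (B,D), (B,E), (C,D), (D,E)) as undirected, its adjacency matrix could be declared in C as:
int adj[5][5] = {
 /*        A  B  C  D  E */
 /* A */ { 0, 1, 1, 1, 0 },
 /* B */ { 1, 0, 0, 1, 1 },
 /* C */ { 1, 0, 0, 1, 0 },
 /* D */ { 1, 1, 1, 0, 1 },
 /* E */ { 0, 1, 0, 1, 0 }
};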
Incidence Matrix
In this representation, the graph is represented using a matrix of size total number of vertices
by a total number of edges. That means graph with 4 vertices and 6 edges is represented using
a matrix of size 4X6. In this matrix, rows represent vertices and columns represent edges. The
matrix is filled with 0, 1 or -1. Here, 0 represents that the column edge is not incident to the
row vertex, 1 represents that the column edge is connected as an outgoing edge of the row
vertex, and -1 represents that the column edge is connected as an incoming edge of the row
vertex.
In this representation, every vertex of a graph contains list of its adjacent vertices.
For example, consider the following directed graph representation implemented using linked
list:
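A minimal C sketch of an adjacency-list representation (the structure and function names are
assumptions for illustration):
#include <stdlib.h>

/* Each vertex keeps a linked list of the vertices adjacent to it */
struct AdjNode { int vertex; struct AdjNode *next; };

struct AdjNode *adjList[5];             /* one list head per vertex A..E */

/* add a directed edge u -> v by inserting v at the front of u's list */
void addEdge(int u, int v)
{
    struct AdjNode *node = malloc(sizeof(struct AdjNode));
    node->vertex = v;
    node->next = adjList[u];
    adjList[u] = node;
}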
Graph traversal is a technique used for searching a vertex in a graph. Graph traversal is also
used to decide the order in which vertices are visited in the search process. A graph traversal
finds the edges to be used in the search process without creating loops; that means, using graph
traversal, we visit all the vertices of the graph without getting into a looping path.
There are two graph traversal techniques and they are as follows...
DFS (Depth First Search)
DFS traversal of a graph produces a spanning tree as the final result. A spanning tree is a graph
without loops. We use the Stack data structure, with a maximum size equal to the total number
of vertices in the graph, to implement DFS traversal.
Back tracking is coming back to the vertex from which we reached the current vertex.
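A C sketch of DFS using an explicit stack and the adjacency matrix declared earlier (vertex
labels A..E and <stdio.h> are assumed; each vertex is marked visited when it is pushed, so the
stack never holds more than the total number of vertices):
#define MAXV 5

void DFS(int adj[MAXV][MAXV], int start)
{
    int visited[MAXV] = {0};
    int stack[MAXV], top = -1;
    visited[start] = 1;
    stack[++top] = start;               /* push the starting vertex */
    while (top >= 0)
    {
        int v = stack[top--];           /* pop the most recently discovered vertex */
        printf("%c ", 'A' + v);         /* visit it */
        for (int w = MAXV - 1; w >= 0; w--)
            if (adj[v][w] && !visited[w])
            {
                visited[w] = 1;         /* mark on push so each vertex is pushed once */
                stack[++top] = w;       /* push unvisited adjacent vertices */
            }
    }
}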
Example
BFS (Breadth First Search)
BFS traversal of a graph produces a spanning tree as the final result. A spanning tree is a graph
without loops. We use the Queue data structure, with a maximum size equal to the total number
of vertices in the graph, to implement BFS traversal.
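A matching C sketch of BFS using a simple array-based queue (same assumptions as the DFS
sketch above):
void BFS(int adj[MAXV][MAXV], int start)
{
    int visited[MAXV] = {0};
    int queue[MAXV], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;              /* enqueue the starting vertex */
    while (front < rear)
    {
        int v = queue[front++];         /* dequeue the earliest discovered vertex */
        printf("%c ", 'A' + v);         /* visit it */
        for (int w = 0; w < MAXV; w++)
            if (adj[v][w] && !visited[w])
            {
                visited[w] = 1;         /* mark on enqueue so each vertex enters once */
                queue[rear++] = w;      /* enqueue unvisited adjacent vertices */
            }
    }
}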
Example
Important Questions