
Advanced Data Structures

Unit I
1.1 Definitions of an Algorithm
An algorithm is a step by step procedure, which defines a set of instructions to be
executed in a certain order to get the desired output.
(Or)
An algorithm is a finite set of instructions which, if followed accomplish a particular task.
1.2 Characteristics of an Algorithm
In addition, every algorithm must satisfy the following criteria:

Input: It takes zero or more inputs.

Output: It produces at least one output.

Definiteness: Each step of the algorithm must be clear; a good algorithm can be easily understood even by a non-programmer.

Finiteness: All the operations can be carried out in a finite number of steps.

Effectiveness: The algorithm should be efficient; a good algorithm occupies as little memory space as possible.

Unambiguous: The algorithm should be clear and unambiguous. Each of its steps (or phases), and their inputs/outputs, should be clear and must lead to only one meaning.

Algorithms are generally created independently of underlying languages, i.e. an algorithm can be implemented in more than one programming language.

1.3 Analysis of an Algorithm


There are two primary methods for analyzing algorithms formally:
i) Correctness: The primary method of validating an algorithm is a proof of correctness.
This formally verifies that the algorithm accomplishes its purpose and terminates in a finite number of steps.
ii) Complexity: An algorithm may also be analyzed in the context of its complexity.
Asymptotic analysis of an algorithm refers to defining the mathematical
foundation/framing of its run-time performance. Using asymptotic analysis, we can
conclude the best-case, average-case and worst-case behaviour of an algorithm.
This is typically done via measures of run time using big-O, big-Omega, or big-Theta
notation.

Complexity can be studied through three main types of analysis:

Average-Case Analysis: Average time required for program execution.

Best-Case Analysis: Minimum time required for program execution.

Worst-Case Analysis: Maximum time required for program execution.


1.4 Asymptotic Notations
The following asymptotic notations are commonly used in calculating the running-time
complexity of an algorithm.

Big-Oh Notation, O
O(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst-case time complexity, i.e. the longest amount of time an algorithm can
possibly take to complete. It is an asymptotic upper bound.

O(g(n)) = { f(n) : there exist positive constants c and n0 such that for all n >= n0, 0 <= f(n) <= c*g(n) }

Omega Notation, Ω
Ω(n) is the formal way to express the lower bound of an algorithm's running time. It measures
the best-case time complexity, i.e. the least amount of time an algorithm can possibly take to complete.

Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that for all n >= n0, 0 <= c*g(n) <= f(n) }

Theta Notation, Θ
Θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows:

Θ(g(n)) = { f(n) : there exist positive constants c1, c2 and n0 such that for all n >= n0, 0 <= c1*g(n) <= f(n) <= c2*g(n) }

Relations between Θ, Ω and O
Theorem: For any two functions g(n) and f(n),
f(n) = Θ(g(n)) iff

f(n) = O(g(n)) and f(n) = Ω(g(n)).

i.e. Θ(g(n)) = O(g(n)) ∩ Ω(g(n))
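These definitions can be checked numerically. The sketch below is illustrative only; the function f(n) = 3n² + 2n and the witness constants c and n0 are our own example, not taken from the text. It verifies that f(n) is both O(n²) and Ω(n²), and hence Θ(n²):

```python
def f(n):
    return 3 * n * n + 2 * n      # example running-time function f(n) = 3n^2 + 2n

def g(n):
    return n * n                  # candidate growth rate g(n) = n^2

# Witnesses: with c = 5 and n0 = 1, 0 <= f(n) <= c*g(n) for all n >= n0,
# so f(n) = O(n^2). With c = 3, c*g(n) <= f(n) for all n >= 1, so f(n) is
# also Omega(n^2), and therefore f(n) = Theta(n^2).
c, n0 = 5, 1
assert all(0 <= f(n) <= c * g(n) for n in range(n0, 10000))
assert all(3 * g(n) <= f(n) for n in range(n0, 10000))
```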

1.5 Definitions of Data structure


Data Structure is a way to organize data in such a way that it can be used efficiently.
That is, organized collection of data is called a data structure.
(Or)
The data structure is a specialized format for organizing and storing data.
Data is a set of elementary items. The possible ways in which the data items are logically
related is defined by the data structure.
Programs have to follow certain rules to access and process the structured data. And
so,
Data structure = Organized data + Allowed operations.
From the data structure point of view, the data in a data structure is processed by the
operations mentioned below:

Searching - Algorithm to search for an item in a data structure.

Sorting - Algorithm to sort items in a certain order.

Insertion - Algorithm to insert a new item in a data structure.

Updating - Algorithm to update an existing item in a data structure.

Deletion - Algorithm to delete an existing item from a data structure.

Traversing - Algorithm to visit each item in a data structure at least once.

Merging - Algorithm to merge one item with another in a data structure.

1.6 Classifications of Data structures

Classification of sorting
a. External sorting
b. Internal sorting
c. Stable Sorting

External sorting
External sorting is a process of sorting in which large blocks of data stored
on storage devices are moved to main memory and then sorted, i.e. a sort is
external if the records being sorted are in auxiliary storage.

Internal sorting
Internal sorting is a process of sorting the data in main memory, i.e. a sort
is internal if the records being sorted are in main memory.

Stable sort
A sorting technique is called stable if for all records i and j such that k[i] equals
k[j], if r[i] precedes r[j] in the original file, r[i] precedes r[j] in the sorted file. That is, a
stable sort keeps records with the same key in the same relative order that they were in
before the sort.

1.6.1 Insertion Sort (also called as Straight Insertion Sort)

One of the simplest sorting algorithms is the insertion sort.

If the first few objects are already sorted, an unsorted object can be inserted into the sorted
set in its proper place. This is called insertion sort. (Or) It is the sort that sorts a set of records
by inserting them into an existing sorted file.

With each pass of an insertion sort, one or more pieces of data are inserted into their correct
location in an ordered list.

Instead of inserting an element anywhere in the list and re-sorting the whole list, each time a
new element is encountered it is inserted in its correct position.

In this sorting, the list is divided into two parts: sorted and unsorted.

In each pass the first element of the unsorted sub-list is transferred to the sorted list by
inserting it at the appropriate place.

It will take at most n-1 passes to sort the data.

In pass p, move the pth element left until its correct place is found among the first p
elements.
Example: card players (as they pick up each card, they insert it into the proper sequence in
their hand).
Steps
Let A be the array of n numbers.

Our aim is to sort the numbers in ascending order.

Scan the array from A[1] to A[N-1]; for R = 1, 2, 3, ..., (N-1), take A[R] and
insert it into its proper position in the previously sorted sub-array A[1], A[2], ..., A[R-1].

If R=1, the sorted sub-array is empty, so A[1] is sorted itself.

If R=2, A[2] is inserted into the previously sorted sub-array A[1], i.e., A[2] is
inserted either before A[1] or after A[1].

If R=3, A[3] is inserted into the previously sorted sub-array A[1],A[2] i.e., A[3] is
inserted either before A[1] or after A[2] or in between A[1] and A[2].

We repeat the process (n-1) times, and finally we get the sorted array.
Fig.1.1: Insertion Sort concept (a wall separates the sorted part of the list from the unsorted part; "last" marks the end of the sorted part)

Figure 1 traces the insertion sort through a list of six numbers. Sorting these data requires
five sort passes. Each pass moves the wall one element to the right as an element is removed
from the unsorted sub-list and inserted into the sorted list.
Input (unsorted list):  34 |  8 64 51 32 21
After p = 2:             8 34 | 64 51 32 21
After p = 3:             8 34 64 | 51 32 21
After p = 4:             8 34 51 64 | 32 21
After p = 5:             8 32 34 51 64 | 21
After p = 6:             8 21 32 34 51 64

Algorithm for Insertion sort


Step 1 If it is the first element, it is already sorted. Return 1;
Step 2 Pick next element
Step 3 Compare with all elements in the sorted sub-list
Step 4 Shift all the elements in the sorted sub-list that is greater than the value to be sorted
Step 5 Insert the value
Step 6 Repeat until the list is sorted.
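The steps above can be sketched as a short routine. This is a minimal Python illustration of straight insertion sort; the language and the function name are our own choice, not part of the original algorithm statement:

```python
def insertion_sort(a):
    """Sort list a in ascending order by straight insertion."""
    for p in range(1, len(a)):          # pass p inserts a[p] into the sorted sub-list a[0..p-1]
        key = a[p]                      # pick the next element
        j = p - 1
        # Shift elements of the sorted sub-list that are greater than key
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                  # insert the value in its correct position
    return a
```

For the six-number trace above, `insertion_sort([34, 8, 64, 51, 32, 21])` produces `[8, 21, 32, 34, 51, 64]`.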
1.6.2 Selection Sort
One of the easiest ways to sort a table is by selection. Beginning with the first record in the
table, a search is performed to locate the element which has the smallest key. When this element is
found, it is interchanged with the first record in the table. This interchange places the record with
the smallest key in the first position of the table.
A search for second smallest key is then carried out. This is accomplished by examining
the keys from second element onward. The element which has the second smallest key is
interchanged with the element located in the second position of the table.
The process of searching for the record with the next smallest key and placing it in its
proper position (within the desired ordering) continues until all the records have been sorted in
ascending order.
A general algorithm for selection sort is

Repeat through step 5 a total of n-1 times.

Record the portion of the vector already sorted.

Repeat step 4 for the elements in the unsorted portion of the vector.

Record the location of the smallest element in the unsorted vector.

Exchange the first element in the unsorted vector with the smallest element.

The list is divided into two sublists, sorted and unsorted, which are divided by an
imaginary wall.
We find the smallest element from the unsorted sublist and swap it with the element at the
beginning of the unsorted data.
After each selection and swap, the imaginary wall between the two sublists moves one
element ahead, increasing the number of sorted elements and decreasing the number of unsorted
ones.
Each time we move one element from the unsorted sublist to the sorted sublist, we say
that we have completed a sort pass.
A list of n elements requires n-1 passes to completely rearrange the data.
Example 1:

Original list:  23 78 45  8 32 56
After pass 1:    8 78 45 23 32 56
After pass 2:    8 23 45 78 32 56
After pass 3:    8 23 32 78 45 56
After pass 4:    8 23 32 45 78 56
After pass 5:    8 23 32 45 56 78

Example 2:

Algorithm for Selection sort

Step 1 Set MIN to location 0


Step 2 Search the minimum element in the list
Step 3 Swap with value at location MIN
Step 4 Increment MIN to point to the next element
Step 5 Repeat until the list is sorted
The best case, the worst case, and the average case of the selection sort algorithm are the
same: all of them are O(n²).

This means that the behaviour of the selection sort algorithm does not depend on
the initial organization of the data.

Since O(n²) grows so rapidly, the selection sort algorithm is appropriate only for
small n.

Although the selection sort algorithm requires O(n²) key comparisons, it only
requires O(n) moves.

A selection sort could be a good choice if data moves are costly but key
comparisons are not costly (short keys, long records).
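The general algorithm above can be sketched in Python (a minimal illustration under the same n-1-pass scheme; the function name is ours):

```python
def selection_sort(a):
    """Sort list a in ascending order by repeated selection of the minimum."""
    n = len(a)
    for i in range(n - 1):                    # n-1 passes
        min_pos = i                            # location of the smallest element so far
        for j in range(i + 1, n):              # search the unsorted portion
            if a[j] < a[min_pos]:
                min_pos = j
        a[i], a[min_pos] = a[min_pos], a[i]    # one exchange per pass: O(n) moves total
    return a
```

Running it on the Example 1 data, `selection_sort([23, 78, 45, 8, 32, 56])` gives `[8, 23, 32, 45, 56, 78]`.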

1.6.3 Shell Sort

Shell sort is a highly efficient sorting algorithm based on the insertion sort algorithm.
It avoids the large shifts that occur in insertion sort when a small value is far to the right and
has to move far to the left.
This algorithm first uses insertion sort on widely spaced elements to sort them, and then
sorts the less widely spaced elements. This spacing is termed the interval. The interval is
calculated based on Knuth's formula as
h = h * 3 + 1
where h is the interval, with initial value 1.
This algorithm is quite efficient for medium-sized data sets; with Knuth's gap sequence its
worst-case complexity is about O(n^(3/2)), where n is the number of items.
Example:
We take the example below to get an idea of how shell sort works. We take the same
array used in our previous examples: {35, 33, 42, 10, 14, 19, 27, 44}. For ease of understanding
we take an interval of 4, and make a virtual sub-list of all values located at an interval of 4
positions. Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}.
We compare the values in each sub-list and swap them (if necessary) in the original
array. After this step, the array looks like this: {14, 19, 27, 10, 35, 33, 42, 44}.
Then we take an interval of 2, and this gap generates two sub-lists: {14, 27, 35, 42} and
{19, 10, 33, 44}.
We compare and swap the values, if required, in the original array. After this step,
the sub-lists look like this: {14, 27, 35, 42}, {10, 19, 33, 44}.
And finally, we sort the rest of the array using an interval of 1. Shell sort uses insertion
sort to sort the array into {10, 14, 19, 27, 33, 35, 42, 44}.
The step-by-step depiction is shown below.

We see that it required only four swaps to sort the rest of the array.
Example 2:

Consider a 16-element array.

Initial gap = length / 2 = 16 / 2 = 8
Initial sub-array indices:
{0, 8}, {1, 9}, {2, 10}, {3, 11}, {4, 12}, {5, 13}, {6, 14}, {7, 15}

next gap = 8 / 2 = 4
{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}

next gap = 4 / 2 = 2
{0, 2, 4, 6, 8, 10, 12, 14}, {1, 3, 5, 7, 9, 11, 13, 15}

final gap = 2 / 2 = 1
Algorithm for shell sort
We shall now see the algorithm for shell sort.
Step 1 Initialize the value of h.
Step 2 Divide the list into smaller sub-lists of equal interval h.
Step 3 Sort these sub-lists using insertion sort.
Step 4 Repeat until the complete list is sorted.
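The algorithm can be sketched in Python using Knuth's gap sequence h = 3h + 1 described above (an illustrative sketch; the function name is our own):

```python
def shell_sort(a):
    """Sort list a using Shell sort with Knuth's gap sequence 1, 4, 13, 40, ..."""
    n = len(a)
    h = 1
    while h * 3 + 1 < n:                 # build the largest useful interval
        h = h * 3 + 1
    while h >= 1:
        # Insertion sort on the virtual sub-lists of elements h apart
        for i in range(h, n):
            key = a[i]
            j = i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]          # shift within the sub-list
                j -= h
            a[j] = key
        h //= 3                          # shrink the interval, ending with h = 1
    return a
```

On the Example 1 data this reproduces the intermediate state from the text: after the first (gap 4) stage the array is `[14, 19, 27, 10, 35, 33, 42, 44]`, and the final result is `[10, 14, 19, 27, 33, 35, 42, 44]`.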
1.6.4 Bubble sort
Another well-known sorting method is bubble sort. It differs from selection sort in
that, instead of finding the smallest record and then performing an interchange, two records are
interchanged immediately upon discovering that they are out of order.
When this approach is used there are at most n-1 passes required. During the first pass
K1 and K2 are compared, and if they are out of order, then records R1 and R2 are interchanged;

this process is repeated for records R2 and R3, R3 and R4, and so on. This method causes records
with small keys to bubble up.
After the first pass the record with the largest key will be in the nth position. On each
successive pass, the record with the next largest key is placed in position n-1, n-2,
..., respectively, thereby resulting in a sorted table.
After each pass through the table, a check can be made to determine whether any
interchanges were made during that pass. If no interchanges occurred, the table must be
sorted and no further passes are required.
A general algorithm for bubble sort is

Repeat through step 4 a total of n-1 times.

Repeat step 3 for elements in unsorted portion of the vector.

If the current element in the vector > next element in the vector then exchange elements.

If no exchanges were made then return else reduce the size of the unsorted vector by one.

Example 1:
(Figure: one pass of bubble sort, and the array after completion of each pass.)

Example 2:
(Figure: the original list and the array after passes 1 to 4.)

Algorithm for bubble sort


We assume list is an array of n elements, and that a swap function swaps
the values of the given array elements.
begin BubbleSort(list)
   for pass = 1 to n-1
      for i = 0 to n-1-pass
         if list[i] > list[i+1]
            swap(list[i], list[i+1])
         end if
      end for
      if no exchanges were made in this pass
         break
      end if
   end for
   return list
end BubbleSort
Analysis of bubble sort
Number of comparisons (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 → O(n²)
Number of comparisons (best case):
n - 1 → O(n)
Number of exchanges (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 → O(n²)
Number of exchanges (best case):
0 → O(1)
Overall worst case: O(n²) + O(n²) = O(n²)
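The pseudocode above, including the early-exit check when a pass makes no interchanges, can be written in Python as a small sketch (function name ours):

```python
def bubble_sort(a):
    """Sort list a in ascending order; stop early if a pass makes no exchanges."""
    n = len(a)
    for i in range(n - 1):                     # at most n-1 passes
        swapped = False
        for j in range(n - 1 - i):             # the unsorted portion shrinks each pass
            if a[j] > a[j + 1]:                # out of order: interchange immediately
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                        # no interchanges: table already sorted
            break
    return a
```

On an already-sorted input the early exit gives the best-case behaviour of n-1 comparisons and 0 exchanges noted above.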
1.6.5 Quick sort
Like merge sort, Quick sort is also based on the divide-and-conquer paradigm. But it uses
this technique in a somewhat opposite manner, as all the hard work is done before the recursive
calls.
It works as follows:
1. First, it partitions an array into two parts,
2. Then, it sorts the parts independently,
3. Finally, it combines the sorted subsequences by a simple concatenation.

QUICK_SORT(K,LB,UB)
Given a table K of N record, this recursive procedure sorts the table, as previously
described, in ascending order.
A dummy record with key K[N+1] is assumed, where K[I] <= K[N+1] for all 1 <= I <= N.
The Integer parameters LB and UB denote the lower and upper bounds of the current sub
table being processed.
The indices I and J are used to select certain keys during the processing of each sub table.
KEY contains the key value which is being placed in its final position within the sorted sub table.
FLAG is a logical variable which indicates the end of the process that places a record in its final
position.
When FLAG becomes false, the input sub table has been partitioned into two disjoint
parts.
Variables used
K - Array to hold elements
LB, UB - Denote the lower and upper bounds of the current sub table
I, J - Used to select certain keys during processing
KEY - Holds the key value being placed in its final position
FLAG - Logical variable to indicate the end of the process
The quick-sort algorithm consists of the following three steps:
1. Divide: Partition the list.

To partition the list, we first choose some element from the list for which we hope
about half the elements will come before and half after. Call this element the
pivot.

Then we partition the elements so that all those with values less than the pivot
come in one sublist and all those with greater values come in another.

2. Recursion: Recursively sort the sublists separately.


3. Conquer: Put the sorted sublists together.

Partitioning places the pivot in its correct place position within the array.

Arranging the array elements around the pivot p generates two smaller sorting problems.

Sort the left section of the array, and sort the right section of the array.

When these two smaller sorting problems are solved recursively, our bigger
sorting problem is solved.

First, we have to select a pivot element among the elements of the given array, and we
put this pivot into the first location of the array before partitioning.
Which array item should be selected as pivot?

Somehow we have to select a pivot, and we hope that we will get a good
partitioning.

If the items in the array arranged randomly, we choose a pivot randomly.

We can choose the first or last element as a pivot (it may not give a good
partitioning).

We can use different techniques to select the pivot.

Partition Function
Invariant for the partition algorithm

Initial state of the array

Moving theArray[firstUnknown] into S1 by swapping it with theArray[lastS1+1] and by


incrementing both lastS1 and firstUnknown.

Moving theArray[firstUnknown] into S2 by incrementing firstUnknown

Partition in Quick sort

The pivot value divides the list in to two parts. And recursively we find pivot for each
sub-lists until all lists contains only one element.
Quick Sort Pivot Algorithm
Based on our understanding of partitioning in quick sort, we should now try to write an
algorithm for it here.
Step 1 Choose the highest index value as pivot
Step 2 Take two variables to point left and right of the list excluding pivot
Step 3 left points to the low index
Step 4 right points to the high index
Step 5 while value at left is less than pivot move right
Step 6 while value at right is greater than pivot move left
Step 7 if both step 5 and step 6 do not match, swap the values at left and right
Step 8 if left >= right, the point where they met is the new pivot position
QuickSort Algorithm
Using the pivot algorithm recursively, we end up with smaller and smaller partitions. Each
partition is then processed for quick sort. We define the recursive algorithm for quicksort as below.
Step 1 Make the right-most index value pivot
Step 2 partition the array using pivot value
Step 3 quicksort left partition recursively
Step 4 quicksort right partition recursively

Example for using this pivot function

A worst-case partitioning with quick sort

An average-case partitioning with quick sort

Quick sort analysis

Quicksort is O(n log2 n) in the best case and average case, but O(n²) in the worst case.

Quicksort is slow when the array is already sorted and we choose the first element as the pivot.

Although the worst-case behaviour is not so good, its average-case behaviour is much better
than its worst case.

So, quicksort is one of the best sorting algorithms that use key comparisons.
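The recursive scheme (right-most element as pivot, partition, then sort each side) can be sketched in Python. This is an illustrative sketch using the common Lomuto-style partition rather than the exact two-index scan described above; the function names are our own:

```python
def quick_sort(a, lb=0, ub=None):
    """Recursive quick sort; the right-most element of each range is the pivot."""
    if ub is None:
        ub = len(a) - 1
    if lb < ub:
        p = partition(a, lb, ub)        # pivot lands in its final position p
        quick_sort(a, lb, p - 1)        # sort the left partition recursively
        quick_sort(a, p + 1, ub)        # sort the right partition recursively
    return a

def partition(a, lb, ub):
    """Place the pivot a[ub] into its correct position and return that index."""
    pivot = a[ub]
    i = lb - 1                          # boundary of the "less than pivot" region
    for j in range(lb, ub):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[ub] = a[ub], a[i + 1]   # put the pivot between the two regions
    return i + 1
```

Note the combining step is implicit: once both partitions are sorted in place around the pivot, the whole array is sorted.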

1.6.6 Merge sort


The operation of sorting is closely related to the process of merging. This sort
formulates the sorting algorithm based on successive merges. The approach is used to give
two formulations of merge sort. The first is recursive, which is easier to write and analyze;
the second is iterative, which is more complex.
First let us see the merging of two ordered tables, which can be combined to produce a
single sorted table. This can be accomplished easily by successively selecting the
record with the smallest key occurring in either of the tables and placing this record in a new
table, thereby creating an ordered list.
Merge sort algorithm is one of two important divide-and-conquer sorting algorithms (the
other one is quick sort).
It is a recursive algorithm.

Divides the list into halves,

Sort each half separately, and

Then merge the sorted halves into one sorted array.

Examples 1-3:
(Figures: worked merge sort traces.)

Merge sort Analysis


Merging two sorted arrays of size k

Best case:
All the elements in the first array are smaller (or larger) than all the elements in
the second array.
The number of moves: 2k + 2k
The number of key comparisons: k

Worst case:
The number of moves: 2k + 2k
The number of key comparisons: 2k - 1

Levels of recursive calls to merge sort, given an array of eight items

Merge sort is an extremely efficient algorithm with respect to time.

Both the worst case and the average case are O(n log2 n).

But merge sort requires an extra array whose size equals the size of the original array.
If we use a linked list, we do not need an extra array.

But we need space for the links.

And it is difficult to divide the list in half (O(n)).
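The divide/merge scheme can be sketched in Python; this illustrative version returns a new list (which also makes the extra O(n) space requirement visible). Function names are ours:

```python
def merge_sort(a):
    """Sort list a by recursively splitting into halves and merging them."""
    if len(a) <= 1:                      # a list of 0 or 1 elements is sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # sort each half separately
    right = merge_sort(a[mid:])
    return merge(left, right)            # combine the sorted halves

def merge(left, right):
    """Merge two ordered lists by repeatedly taking the smaller front key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                 # at most one of these tails is non-empty
    out.extend(right[j:])
    return out
```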

1.6.7 Radix sort


The radix sort algorithm is different from the other sorting algorithms we have discussed:

it does not use key comparisons to sort an array.

The radix sort

Treats each data item as a character string.

First it groups data items according to their rightmost character, and put these
groups into order with respect to this rightmost character.

Then, combine these groups.

We repeat these grouping and combining operations for all other character
positions in the data items from the rightmost to the leftmost character position.

At the end, the sort operation will be completed.

Given a table of N records arranged as a linked list, where each node consists of a key
field and a link field, this procedure performs the sort.
The first node is pointed to by a pointer called FIRST. The vectors T and B are pointers that
store the addresses of the rear and front of each queue (pocket); in particular, T[I] and B[I]
denote the top and bottom of pocket I.
The pointer R is used to denote the current record. NEXT is the pointer to the next
record. The PREV pointer is used to combine the queues. D is used to examine the digit.

Variables Used
FIRST - Pointer to the first node in the table.
T - Denotes the top (rear) of a queue.
B - Denotes the bottom (front) of a queue.
J - Pass index.
P - Pocket index to point to the temporary table.
R - Pointer to store the address of the current record being handled.
NEXT - Pointer which has the address of the next record in the table.
PREV - Pointer to combine the pockets.
D - Current digit being handled in the current key field.
Example 1:

Original list:
mom, dad, god, fat, bad, cat, mad, pat, bar, him

Group strings by rightmost letter:
(dad, god, bad, mad) (mom, him) (bar) (fat, cat, pat)
Combine groups:
dad, god, bad, mad, mom, him, bar, fat, cat, pat

Group strings by middle letter:
(dad, bad, mad, bar, fat, cat, pat) (him) (god, mom)
Combine groups:
dad, bad, mad, bar, fat, cat, pat, him, god, mom

Group strings by first letter:
(bad, bar) (cat) (dad) (fat) (god) (him) (mad, mom) (pat)
Combine groups (SORTED):
bad, bar, cat, dad, fat, god, him, mad, mom, pat
Example 2:

Analysis of Radix sort


The radix sort algorithm requires 2*n*d moves to sort n strings of d characters each.
So, for a fixed string length d, radix sort is O(n).
Although radix sort is O(n), it is not appropriate as a general-purpose sorting
algorithm.

Its memory requirement is d * original size of data (because each group should
be big enough to hold the original data collection.)

For example, we need 27 groups to sort string of uppercase letters.

The radix sort is more appropriate for a linked list than an array. (we will not need
the huge memory in this case)
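The group-and-combine procedure from Example 1 can be sketched for equal-length strings. This is an illustrative Python version using a dictionary of "pockets" rather than the linked-list queues of the text; the function name is ours:

```python
def radix_sort_strings(words):
    """LSD radix sort for equal-length strings: group on each character
    position, rightmost first, then combine the groups in order."""
    if not words:
        return words
    length = len(words[0])
    for pos in range(length - 1, -1, -1):        # rightmost to leftmost position
        groups = {}                              # one "pocket" per character
        for w in words:
            groups.setdefault(w[pos], []).append(w)
        # Combine the pockets in character order; appending inside each
        # pocket preserves the previous order, which keeps the sort stable.
        words = [w for ch in sorted(groups) for w in groups[ch]]
    return words
```

Running it on the Example 1 list reproduces the final combined order bad, bar, cat, dad, fat, god, him, mad, mom, pat.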

1.7 External sorting


External sorting is a process of sorting in which large blocks of data stored on
storage devices are moved to main memory and then sorted, i.e. a sort is external if
the records being sorted are in auxiliary storage.
1.7.1 Multi way merge
Multiway merge is the problem of merging m sorted lists into a single sorted list. A 2-way merge, used in merge sort, is a special case of this problem.
Input: m sorted lists having n elements in total

Output: A sorted list containing all elements of the m lists


Multiple merging can be accomplished by performing a simple merge recursively. For
example, if 16 tables are to be sorted, we can first merge them in pairs using procedure
SIMPLE_MERGE.
The result of this step yields eight tables, which again are merged in pairs to give four
tables.
Finally a single merged table is obtained. In this example, four separate passes are
required to yield a single table. In general, K passes are required to sort 2^K tables.
This strategy can be applied to sorting. Given a table of n records, the table is initially
considered to be a set of n tables, each of which contains a single record.
This procedure is initially invoked as
CALL TWO_WAY_MERGE_SORT(K, 1, N)
where N denotes the size of the initial sub table to be sorted. Step 2 performs a return if the
size is <= 1. The third step finds the middle position. Steps 4 and 5 recursively sort the first and
second sub tables respectively. The last step merges these two tables.
Variables used:
K - Array to hold elements
START - Starting position of the array
FINISH - End position of the array
SIZE - Denotes the number of elements in the current sub table
MIDDLE - Denotes the position of the middle element
1. 2-pass multiway sort

Recall the 2-pass multiway sort.

Pass 1:
Divide the input file into chunks of M blocks each.

Sort each chunk individually using the M buffers.

Write the sorted chunks to disk.

Requirement
The number of chunks(K) <= M-1
Pass 2:
Divide the M buffers into:
- M-1 input buffers
- 1 output buffer
Use the M-1 input buffers to read the K sorted chunks (1 block at a time).
Merge sort the K sorted chunks together into a sorted file using 1 output buffer as
follows:
- Find the record with the smallest sort key among the K buffers.
- Move the record with the smallest sort key to the output buffer.
- When the output buffer is full, then write the output buffer to disk.
- When some input buffer is empty, then read another block from the sorted chunk
if there is more data.

We can use the M buffers to merge sort up to M-1 sorted chunks into one larger
(sorted) chunk.
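The merge step of pass 2 can be sketched in memory, with Python lists standing in for the K sorted chunks and a min-heap standing in for "find the record with the smallest sort key among the K buffers" (an illustrative sketch; it omits the block-at-a-time buffering, and the function name is ours):

```python
import heapq

def multiway_merge(runs):
    """Merge m sorted lists into one sorted list using a min-heap of
    (key, run index, position) entries."""
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        key, r, i = heapq.heappop(heap)    # smallest front key among the runs
        out.append(key)                    # "move it to the output buffer"
        if i + 1 < len(runs[r]):           # refill from the same run, if any data remains
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out
```

Each record is pushed and popped once, so merging n records from m runs costs O(n log m) comparisons.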

1.8 Searching
1.8.1 Linear/Sequential searching

The simplest search technique is the linear or sequential search. In this technique, we start
at the beginning of a list or table and search for the required record until the desired record is found
or the list is exhausted. This technique is suitable for a table, a linked list or an array, and it can
be applied to an unordered list, but the efficiency is increased if it is an ordered list.
For any search, the total work is reflected by the number of key comparisons that
the search makes. The number of comparisons depends on where the target key (the value to be
searched) appears.
If the desired target key is in the first position of the list, only one comparison is required.
If the record is in the second position, two comparisons are required. If it is in the last position
of the list, n comparisons are required.
If the search is unsuccessful, it makes n comparisons, as the target is compared with
all the entries of the list.
Variables used:
K - Array to hold elements
N - Total number of elements
X - Element to be searched
For example:
Let us take the array A = {10, 7, 3, 9, 5}. The target element is 9.

Comparison 1: check the target key with the array element A[0] = 10.
The two elements are not equal, so the target is checked against the next value.

Comparison 2: check the target key with the array element A[1] = 7.
The two elements are not equal, so the target is checked against the next value.

Comparison 3: check the target key with the array element A[2] = 3.
The two elements are not equal, so the target is checked against the next value.

Comparison 4: check the target key with the array element A[3] = 9.
The two elements are equal, so the search is finished and the index is returned.
Analysis of linear/sequential search:
Whether the sequential search is carried out on lists implemented as arrays, linked lists
or files, the criterion for performance is the comparison loop. The fewer the number of
comparisons, the sooner the algorithm terminates.
The fewest possible comparisons is 1, when the required item is the first in the list. The
maximum number of comparisons is N, when the required item is the last in the list. Thus if the
required item is in position I in the list, I comparisons are required.
Hence the average number of comparisons done by sequential search is
(1 + 2 + 3 + ... + I + ... + N) / N
= N(N+1) / (2N)
= (N+1)/2
Thus sequential search is easy to write and efficient for short lists. It does not require
sorted data.
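The technique can be sketched in a few lines of Python (an illustrative sketch; returning -1 on failure is our convention):

```python
def linear_search(a, target):
    """Return the index of target in a, or -1 if the list is exhausted."""
    for i, value in enumerate(a):
        if value == target:        # one key comparison per position
            return i
    return -1
```

For the worked example, `linear_search([10, 7, 3, 9, 5], 9)` finds the target at index 3 after four comparisons.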
1.8.2 Binary search
For searching lists with many values, linear search is inefficient; binary search helps in
searching larger (sorted) lists. To search for a particular item with value target, the approximate
middle entry of the table is located, and its key value is examined.
If the target value is higher than the middle value, the search continues with the elements
after the middle element. If the target value is smaller than the middle element, the search is
made with the elements before the middle value. This process continues until the required target
is found.
Variables Used
K - Vector to hold the elements
X - Target element to be searched
LOW - Points to the lower bound of the vector
HIGH - Points to the upper bound of the vector
MIDDLE - Points to the middle element of the vector
Example:
Let us apply the algorithm to an example. Suppose array A[ ] contains the following
elements:

A = { 9, 11, 17, 20, 25, 30, 33 }   (indices [0] to [6])

Let us search for the element 17.

Is low > high? NO. Mid = (0+6)/2 = 3.   (low = 0, mid = 3, high = 6)
Is 17 == A[3]? No. 17 < A[3], so repeat the steps with low = 0 and high = mid-1 = 2.

Is low > high? NO. Mid = (0+2)/2 = 1.   (low = 0, mid = 1, high = 2)
Is 17 == A[1]? No. 17 > A[1], so repeat the steps with low = mid+1 = 2 and high = 2.

Is low > high? NO. Mid = (2+2)/2 = 2.   (low = mid = high = 2)
Is 17 == A[2]? Yes. Return (2).

Let us search for an element that is not in the list, e.g. 10.

Is low > high? NO. Mid = (0+6)/2 = 3.
Is 10 == A[3]? No. 10 < A[3], so repeat the steps with low = 0 and high = mid-1 = 2.

Is low > high? NO. Mid = (0+2)/2 = 1.
Is 10 == A[1]? No. 10 < A[1], so repeat the steps with low = 0 and high = mid-1 = 0.

Is low > high? NO. Mid = (0+0)/2 = 0.
Is 10 == A[0]? No. 10 > A[0], so repeat the steps with low = mid+1 = 1 and high = 0.

Is low > high? Yes. Return (-1).
Analysis of binary search
The binary search method needs no more than ⌊log2 n⌋ + 1 comparisons. This implies that
for an array of a million entries, only about twenty comparisons will be needed. Contrast this with
sequential search, which on average needs (n+1)/2 comparisons.
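The worked example above follows this loop exactly (an illustrative Python sketch; returning -1 on failure is our convention):

```python
def binary_search(a, target):
    """Search a sorted list; return the index of target, or -1 if absent."""
    low, high = 0, len(a) - 1
    while low <= high:                 # stop when the range is empty
        mid = (low + high) // 2
        if a[mid] == target:
            return mid
        elif target < a[mid]:
            high = mid - 1             # continue with elements before the middle
        else:
            low = mid + 1              # continue with elements after the middle
    return -1
```

With A = {9, 11, 17, 20, 25, 30, 33}, searching for 17 returns 2 after three probes, and searching for 10 returns -1, matching the trace above.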
1.8.3 Ternary search

A ternary search tree is a special trie data structure where the child nodes of a standard
trie are ordered as a binary search tree.

Search: start at the root, and recursively:
- Compare the next character in the key with the character in the current node.
- If it is less, take the left link.
- If it is greater, take the right link.
- If it is equal, take the middle link and move on to the next character in the key.
The search misses if it encounters a null link, or reaches the end of the key before reaching an end-of-string marker.

Insert: start at the root.
- Search to find the location where the prefix diverges.
- Add new nodes for the characters not consumed by the search.

Representation of ternary search trees
Unlike the standard trie data structure, where each node contains 26 pointers for its children, each node in a ternary search tree contains only three pointers:
1. The left pointer points to the node whose value is less than the value in the current node.
2. The equal pointer points to the node whose value is equal to the value in the current node.
3. The right pointer points to the node whose value is greater than the value in the current node.
Apart from the above three pointers, each node has a field to store data (a character, in the case of a dictionary) and another field to mark the end of a string.
So, more or less, it is similar to a BST, which stores data based on some order. However, the data in a ternary search tree is distributed over the nodes; e.g. it needs 4 nodes to store the word "Geek".
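The node layout just described (three child pointers, a character field, and an end-of-string flag) can be sketched as follows. This is an illustrative implementation, not taken from the text; the class and function names are my own.

```python
class TSTNode:
    """One node of a ternary search tree."""
    def __init__(self, ch):
        self.ch = ch          # character stored in this node
        self.left = None      # subtree with smaller characters
        self.eq = None        # subtree for the next character of the key
        self.right = None     # subtree with larger characters
        self.is_end = False   # marks the end of a stored word

def insert(node, word, i=0):
    """Insert word[i:] below node; return the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.left = insert(node.left, word, i)
    elif ch > node.ch:
        node.right = insert(node.right, word, i)
    elif i < len(word) - 1:
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_end = True    # whole word consumed: mark end of string
    return node

def search(node, word, i=0):
    """Return True iff word was inserted into the tree rooted at node."""
    if node is None:
        return False          # null link: miss
    ch = word[i]
    if ch < node.ch:
        return search(node.left, word, i)
    if ch > node.ch:
        return search(node.right, word, i)
    if i == len(word) - 1:
        return node.is_end    # hit only if a word ends here
    return search(node.eq, word, i + 1)

root = None
for w in ["cat", "cats", "up", "bug"]:
    root = insert(root, w)
print(search(root, "cats"))   # True
print(search(root, "ca"))     # False: a prefix only, no end marker
```

Note how the search follows left/right links without consuming a character of the key, and consumes one character only when it follows an equal (middle) link.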

Example 1: a ternary search tree storing the words cat, cats, cup and cute.

Example 2: a ternary search tree storing another small set of words (figure not reproduced here).
Example 3: a larger ternary search tree storing several words, including "the" (figure not reproduced here). As an exercise, search for the word "the" in it.

A search or insertion in a full TST requires time proportional to the key length. The
number of links in a TST is at most three times the number of characters in all the keys.
Advantages
One of the advantages of using ternary search trees over tries is that ternary search trees are more space-efficient (they involve only three pointers per node, compared to 26 in standard tries). Further, ternary search trees can be used any time a hash table would be used to store strings.
Tries are suitable when there is a proper distribution of words over the alphabet, so that space is utilized most efficiently; otherwise ternary search trees are better. Ternary search trees are efficient to use (in terms of space) when the strings to be stored share a common prefix.
1. A TST can be made more space-efficient by:
- putting keys in leaves at the point where the prefix becomes unique, and
- eliminating one-way branching, as in Patricia tries.
2. Speed and space can be traded off by having a large branch at the root (R or R² children) while the rest of the trie is a regular TST. This works well if the first character(s) are well distributed, and is convenient for practical use.

Further properties
1. TSTs adapt to the non-uniformity often seen in real key sets.
2. Though the character set may be large, often only a few characters are actually used, or are used only after a particular prefix; a TST does not create links it does not need.
3. TSTs suit structured-format keys: many symbols may be used overall, but only a few at each part of the key. Search misses are really fast.
4. TSTs can be adapted for partial-match searches ("don't care" characters in the search key).
5. TSTs can be adapted for near-match searches (all but any one character match).
6. TSTs access bytes or larger symbols rather than bits (unlike Patricia tries), which is often better supported, more efficient, and more natural for the keys.
Applications of ternary search trees
1. Ternary search trees are efficient for queries like "Given a word, find the next word in the dictionary" (near-neighbor lookups), "Find all telephone numbers starting with 9342", or typing a few starting characters in a web browser to display all website names with that prefix (the auto-complete feature).
2. Used in spell checkers: ternary search trees can be used as a dictionary to store all the words. Once a word is typed in an editor, it can be searched in the ternary search tree to check for correct spelling.
These examples demonstrate three creative applications of the TST:
- An English dictionary that matches words as you type and checks spelling.
- A flexible array that can assume any size or dimension on the fly.
- A database that stores all information in the same place (regardless of which record or column the information belongs to), thereby decreasing access time and reducing storage requirements.
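The auto-complete application can be sketched concretely: find the node matching the last character of the prefix, then walk its middle subtree collecting every word that ends there. This is an illustrative sketch (the helper names `_node_for`, `_collect` and `keys_with_prefix` are my own, not from the text).

```python
class Node:
    def __init__(self, ch):
        self.ch, self.left, self.eq, self.right = ch, None, None, None
        self.is_end = False   # marks the end of a stored word

def insert(node, word, i=0):
    if node is None:
        node = Node(word[i])
    if word[i] < node.ch:
        node.left = insert(node.left, word, i)
    elif word[i] > node.ch:
        node.right = insert(node.right, word, i)
    elif i < len(word) - 1:
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node

def _node_for(node, prefix, i=0):
    # Walk down to the node matching the last character of the prefix.
    if node is None:
        return None
    if prefix[i] < node.ch:
        return _node_for(node.left, prefix, i)
    if prefix[i] > node.ch:
        return _node_for(node.right, prefix, i)
    if i == len(prefix) - 1:
        return node
    return _node_for(node.eq, prefix, i + 1)

def _collect(node, prefix, out):
    # In-order walk of a subtree, accumulating complete words.
    if node is None:
        return
    _collect(node.left, prefix, out)
    if node.is_end:
        out.append(prefix + node.ch)
    _collect(node.eq, prefix + node.ch, out)
    _collect(node.right, prefix, out)

def keys_with_prefix(root, prefix):
    """Return every stored word that starts with prefix."""
    node = _node_for(root, prefix)
    out = []
    if node is None:
        return out
    if node.is_end:
        out.append(prefix)        # the prefix itself is a stored word
    _collect(node.eq, prefix, out)
    return out

root = None
for w in ["the", "them", "theory", "this", "toy"]:
    root = insert(root, w)
print(sorted(keys_with_prefix(root, "the")))  # ['the', 'them', 'theory']
```

Only the middle subtree of the prefix node is traversed, so words such as "this" (which diverge within the prefix) are correctly excluded.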
Time Complexity: The time complexity of ternary search tree operations is similar to that of a binary search tree; that is, the insertion, deletion and search operations take time proportional to the height of the ternary search tree. The space is proportional to the total length of the strings to be stored.
