CHAPTER
1
DICTIONARIES
Syllabus:
Sets, Dictionaries, Hash Tables, Open Hashing, Closed Hashing(Rehashing
Methods), Hashing Functions(Division Method, Multiplication Method, Universal
Hashing), Analysis of Closed Hashing Result (Unsuccessful Search, Insertion,
Successful Search, Deletion), Hash Table Restructuring, Skip Lists, Analysis of Skip
Lists.
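[Figure: a set stored as a linked list: 10 -> 20 -> 40 -> 50 -> 60 -> NULL]
[Figure: A = 1 -> 8 -> 9 -> 6 -> 4 -> NULL and B = 5 -> 9 -> 6 -> 7 -> 3 -> NULL. Their union is 1 -> 8 -> 9 -> 6 -> 4 -> 5 -> 7 -> 3 -> NULL and their intersection is 9 -> 6 -> NULL.]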
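Set difference: - Two sets A and B are given in the form of linked lists. The difference A-B (the difference of B from A) contains the elements of A that are not present in B.
A = 1 -> 8 -> 9 -> 6 -> 4 -> NULL
B = 5 -> 9 -> 6 -> 7 -> 3 -> NULL
A-B = 1 -> 8 -> 4 -> NULL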
Set equality: - Two sets A and B are equal when they contain the same elements. The equality test first checks whether the number of elements in both sets is equal. If the counts are equal, we next test whether each element of one set is also present in the other set.
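The difference and equality tests described above can be sketched in C as follows. This is a minimal illustration, not the notes' own program: the names SetNode, set_push, set_contains, set_size, set_difference and set_equal are hypothetical, and the lists are assumed to hold no duplicate values.

```c
#include <assert.h>
#include <stdlib.h>

/* One element of a set stored as a singly linked list (hypothetical type). */
typedef struct SetNode {
    int data;
    struct SetNode *next;
} SetNode;

/* Push a value at the front of the list and return the new head. */
SetNode *set_push(SetNode *head, int value)
{
    SetNode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = value;
    n->next = head;
    return n;
}

/* Return 1 if value occurs in the list, 0 otherwise. */
int set_contains(const SetNode *head, int value)
{
    for (; head != NULL; head = head->next)
        if (head->data == value)
            return 1;
    return 0;
}

/* Number of elements in the list. */
int set_size(const SetNode *head)
{
    int count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Build A-B: every element of a that is not present in b.
   Elements come out in reverse order of a, which is harmless for a set. */
SetNode *set_difference(const SetNode *a, const SetNode *b)
{
    SetNode *result = NULL;
    for (; a != NULL; a = a->next)
        if (!set_contains(b, a->data))
            result = set_push(result, a->data);
    return result;
}

/* Equality: same number of elements, and every element of a is in b.
   Assumes neither list holds duplicates. */
int set_equal(const SetNode *a, const SetNode *b)
{
    if (set_size(a) != set_size(b))
        return 0;
    for (; a != NULL; a = a->next)
        if (!set_contains(b, a->data))
            return 0;
    return 1;
}
```

With A = {1, 8, 9, 6, 4} and B = {5, 9, 6, 7, 3}, set_difference(A, B) holds 1, 8 and 4, and set_equal(A, B) is 0. Each membership test walks the whole list, so both operations take time proportional to the product of the two list lengths.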
Each link of the sparser lists skips over many items of the full list in one step, hence the structure's name. The skip list is an efficient implementation of a dictionary using a sorted chain. This is because each node in a skip list can hold forward references to more than one node at a time.
3.1 Skip List Creation:
Step 1: First, all the data items are stored in sorted order in a normal chain with head and tail nodes.
Step 2: The middle element of the whole list is 40. A pointer to the middle element is added, which gives the skip list as follows.
For example, if we want to search for node 50 in the above chain, we will require comparatively less time. This search can be made even more efficient if we add a few more forward references.
Step 3: The Final Skip list is
1. Division Method
2. Mid square method
3. Multiplicative hash function
4. Digit folding
1. Division method:
The division method returns the remainder after division. The divisor is the table length.
H(key) = key % table length
For example, the elements 55, 67, 88 and 34 are to be placed in the hash table and the table size is 10.
H(55) = 55 % 10 = 5
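As a small sketch of the division method, assuming non-negative integer keys (the function name hash_division is illustrative only):

```c
#include <assert.h>

/* Division method: the home bucket is the remainder of key / table_size.
   Assumes a non-negative key and a positive table size. */
int hash_division(int key, int table_size)
{
    assert(key >= 0 && table_size > 0);
    return key % table_size;
}
```

For the example keys and a table of size 10, hash_division gives 5 for 55, 7 for 67, 8 for 88 and 4 for 34.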
Advantages of separate chaining: it allows an unlimited number of elements and an unlimited number of collisions. Disadvantage of separate chaining: the overhead of multiple linked lists.
2. Open Addressing-Linear Probing:
One of the simplest rehashing functions is linear probing: when a collision occurs, we simply find the next free cell and store the key there.
For example:
The keys are 121, 2, 4, 31, 41, 64, 7, 87, 9.
Consider the hash function
H(key) = key % T
where T is the size of the table. The hash table size (T) is 10.
The element 121 can be placed at
H(key) = 121 % 10 = 1
Index 1 will be the home bucket for 121. Continuing in this fashion we will place 2, 4 and 9.
The next key to be inserted is 87. According to the hash function
H(key) = 87 % 10
H(key) = 7.
But index 7 is already occupied by 7, i.e. a collision occurs. Hence we will apply quadratic probing to insert this record in the hash table.
H_i(key) = (Hash(key) + i^2) % k
Starting with i = 0:
(87 + 0^2) % 10 = 7
(87 + 1^2) % 10 = 8
(87 + 2^2) % 10 = 1
(87 + 3^2) % 10 = 6
The index position 6 is empty, hence we will place the element at index 6.
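The probe sequence above can be reproduced with a short sketch. It assumes, as an illustration only, that the keys 121, 2, 4, 31, 41, 64 and 7 were inserted with quadratic probing before 87; the names table_init, quad_insert, TABLE_SIZE and EMPTY are hypothetical.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)

/* Mark every slot of the table as free. */
void table_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using H_i(key) = (key % TABLE_SIZE + i*i) % TABLE_SIZE.
   Returns the index used, or -1 if no free slot is found within
   TABLE_SIZE probes (quadratic probing can miss free slots). */
int quad_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i * i) % TABLE_SIZE;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Under that insertion order, slots 7, 8 and 1 are already taken when 87 arrives, so quad_insert probes 7, 8, 1 and finally stores 87 at index 6.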
That means we have to insert the element 64 six places away from index 4, i.e. we have to take 6 jumps. Hence 64 will be placed at index 0.
5.Rehashing (Hash Restructuring):
Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled by creating a new table. It is preferable if the total size of the table is a prime number. There are situations in which rehashing is required:
When the table is completely full.
With quadratic probing, when the table is half full.
When insertions fail due to overflow.
Applications of Hashing:
1. In compilers to keep track of declared variables.
2. For online spelling checking the hashing functions are used.
3. Hashing helps in Game playing programs to store the moves made.
4. For browser program while caching the web pages, hashing is used.
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
typedef struct node
{
int data;
while(1)
{
printf("\nMENU:\n1.Create");
printf("\n2.Search for a value\n3.Delete a value");
printf("\nEnter your choice:");
scanf("%d",&ch);
switch(ch)
{
case 1:printf("\nEnter the number of elements to be inserted:");
scanf("%d",&n);
printf("\nEnter the elements to be inserted:");
for(i=0;i<n;i++)
{
scanf("%d",&num);
insert(num);
}
break;
case 2:printf("\nEnter the element to be searched:");
scanf("%d",&n);
search(n);
break;
case 3:printf("\nEnter the element to be deleted:");
CHAPTER
2
BALANCED
TREES
Syllabus:
AVL Trees: Maximum Height of an AVL Tree, Insertions and Deletions. 2-3 Trees :
Insertion, Deletion.
An AVL tree is a binary search tree in which the difference between the heights of the left and right sub trees of every node is never more than one. At the time of insertion or deletion of an element there is scope for violating this rule. In that case the tree must restore the AVL property by restructuring.
Restructuring requires rotation of the trees. These rotations are of two types: one is single rotation and the other is double rotation. Each type again comes with left rotations and right rotations.
The difference between an AVL tree and an ordinary binary search tree is that in an AVL tree, for every node in the tree, the heights of the left and right sub trees can differ by at most 1.
There are four different cases when rebalancing is required after insertion of a new node:
1. An insertion of a new node into the left subtree of the left child (LL)
[Figure: the four cases: Left-Left (LL rotation), Left-Right (LR rotation), Right-Right (RR rotation) and Right-Left (RL rotation)]
Single Rotation: - Consider A, B and C to be any three sub trees and p, q to be two nodes. The relation between p and q is always p<q.
In the above diagram, inserting an element into sub tree A or sub tree B may violate the AVL tree property at node q. In this case p is made the parent node and q is made the right child of p. Sub tree B is assigned as the left sub tree of node q. This is called single rotation. In these diagrams p and q need not be root nodes.
The following diagram shows this Single rotation applied for above diagram.
So, clearly a violation occurred at node 45. According to the single rotation principle, p is 41 and q is 45. Sub tree A is at 38 and there are no sub trees B and C. After applying the single rotation operation, the new AVL tree is shown in the following diagram. Here the partial sub tree rotation for 45 is shown in the first diagram and the complete tree is shown next.
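The single rotation described above can be sketched as below. The type AvlNode and the function rotate_right are hypothetical names chosen for illustration, and the sketch leaves out the height bookkeeping that a full AVL implementation would also update.

```c
#include <assert.h>
#include <stddef.h>

typedef struct AvlNode {
    int data;
    struct AvlNode *left, *right;
} AvlNode;

/* Single rotation for the case where q is too heavy on the left.
   Before: q has left child p; p has sub trees A (left) and B (right);
           q has right sub tree C.
   After:  p is the parent, q is the right child of p, and B becomes
           the left sub tree of q. Returns the new root p. */
AvlNode *rotate_right(AvlNode *q)
{
    assert(q != NULL && q->left != NULL);
    AvlNode *p = q->left;
    q->left = p->right;   /* B moves under q */
    p->right = q;         /* q becomes the right child of p */
    return p;
}
```

For the example with q = 45, p = 41 and A = 38, the call returns node 41 with 38 as its left child and 45 as its right child. The mirror-image rotation swaps every left and right.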
Double Rotation: - Unlike single rotation, double rotation has two cases. One is the right-left double rotation and the other is the left-right double rotation. Rotations are based on the relationship between the nodes, and the balancing depends on the type of rotation.
In the above diagram, inserting an element into a sub tree may violate the AVL tree property at a node. In this case q is made the parent node of r and p. Node r is added as the left child of q and node p is added as the right child of q. Sub trees A and B are assigned to node r, and C and D are assigned to node p. Here q need not be the root node.
The following diagram shows this Right-Left double rotation applied for above
diagram.
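A possible C sketch of the right-left double rotation follows. Since the diagram is not reproduced here, the starting shape is an assumption: r is the unbalanced node, p is the right child of r, and q is the left child of p, so that r < q < p. AvlNode and rotate_right_left are hypothetical names, and height updates are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct AvlNode {
    int data;
    struct AvlNode *left, *right;
} AvlNode;

/* Right-left double rotation.
   Before (assumed): r has left sub tree A and right child p;
                     p has left child q and right sub tree D;
                     q has sub trees B (left) and C (right).
   After:  q is the parent, r is its left child with sub trees A and B,
           p is its right child with sub trees C and D. */
AvlNode *rotate_right_left(AvlNode *r)
{
    assert(r != NULL && r->right != NULL && r->right->left != NULL);
    AvlNode *p = r->right;
    AvlNode *q = p->left;
    r->right = q->left;   /* B goes to r */
    p->left = q->right;   /* C goes to p */
    q->left = r;
    q->right = p;
    return q;
}
```

The left-right case is the mirror image, with every left and right exchanged.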
For example, observe the following diagrams, which show the insertion of an element without applying double rotation. This results in a violation of the AVL balance property (the inserted element is shown with dotted lines).
After removing element 17 the new diagram is as follows
Examples:
Construct an AVL tree for
20, 11, 5, 32, 40, 2, 4, 27, 23, 28, 50
if(height_diff > 1)
{
if(get_balance((*p)->left) > 0)
*p = rotate_LL(*p);
else
*p = rotate_LR(*p);
}
else if(height_diff < -1)
{
if(get_balance((*p)->right) < 0)
*p = rotate_RR(*p);
else
*p = rotate_RL(*p);
}
return *p;
}
s* avl_add(s **root,int key)
{
if(*root == NULL)
{
*root = (s*)malloc(sizeof(s));
(*root)->data = key;
(*root)->left = (*root)->right = NULL;
}
else if(key < (*root)->data)
{
(*root)->left = avl_add(&((*root)->left),key);
(*root) = balance_tree(root);
}
else if(key > (*root)->data)
{
(*root)->right = avl_add(&((*root)->right), key);
2. 2-3 Trees: A 2-3 tree is either an empty tree or a single node tree or a tree with
multiple nodes with following properties.
1. Each interior node has either two or three children
2. Each path from root to leaf has the same length
A 2-3 tree is always height balanced, that is, every path from the root to a leaf has the same length.
Elements
between Key 1
and Key 2
Insertion: To insert an item, find a leaf to put the item in and then split nodes, if
necessary. There are three cases for insertion of a node.
& ! :
i) Left branch
& '! % :
i) Left branch
)* % +,- .
44
20 60 70
11 12 30 50 65 90
/ $ +0
Locate leaf to insert 29
+ $ +1
- $ +2
Locate leaf to insert 27
There is a leaf that contains only 1 data value, insert the value 27
5 $ +5
2 $ +-
1 $ ++
Deletion: When we insert a node, we split nodes. But when we delete some node
we need to merge the nodes.
The deletion operation is carried out in two stages: either remove and merge, or remove and redistribute.
20 50 95
11 12 30 48 75 88 97
Delete 75: The node 75 is an internal node. Swap it with its inorder successor. The inorder successor will always be in a leaf node.
Delete 97: 97 is already in a leaf, so just remove it from the leaf.
Delete 88: Swap 88 with its inorder successor, then delete 88.
CHAPTER
3
PRIORITY
QUEUES
Syllabus:
Binary Heaps : Implementation of Insert and Delete min, Creating Heap.
Binomial Queues : Binomial Queue Operations, Binomial Amortized Analysis, Lazy
Binomial Queues
Step 1:
Step 2: Now we will construct the min heap structure, that is, a binary tree in which each parent node is less than its children.
We will scan the nodes in a bottom-up manner.
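The bottom-up construction can be sketched in C as follows, assuming the heap is stored in an array a[0..n-1] where the children of index i are at 2i+1 and 2i+2. The function names sift_down, build_min_heap and delete_min are illustrative and not taken from the notes.

```c
#include <assert.h>

/* Move the value at index i down until neither child is smaller. */
void sift_down(int a[], int n, int i)
{
    for (;;) {
        int smallest = i;
        int l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] < a[smallest]) smallest = l;
        if (r < n && a[r] < a[smallest]) smallest = r;
        if (smallest == i)
            return;
        int t = a[i]; a[i] = a[smallest]; a[smallest] = t;
        i = smallest;
    }
}

/* Build a min heap by scanning from the last parent up to the root. */
void build_min_heap(int a[], int n)
{
    assert(n >= 0);
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);
}

/* Delete-min: remove and return the root, move the last element to
   the root and restore heap order. The new size is stored in *n. */
int delete_min(int a[], int *n)
{
    assert(*n > 0);
    int min = a[0];
    a[0] = a[--(*n)];
    sift_down(a, *n, 0);
    return min;
}
```

Building the heap this way takes O(n) time overall, while each delete_min costs O(log n).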
Insertion:
Deletion:
Binomial Queues:
A binomial queue is a forest of heap-ordered trees.
Binomial heap is a collection of binomial trees. Hence let us define binomial trees
first.
Binomial tree: The i-th binomial tree B_i, with i >= 0, has a root with i children B_{i-1}, ..., B_0.
Four Cases:
Case 1: If degree[m] != degree[next_m] then move the pointers ahead.
The above situation resembles case 3. Hence we will apply transition of case. That
is node 20 will be the child of node 13. Hence we will get.
The above situation resembles case 2. Then according to case 2 we will simply
move the pointers ahead. Hence we will get
CHAPTER
4
GRAPHS
Syllabus:
Operations on Graphs: Vertex insertion, vertex deletion, find vertex, edge addition,
edge deletion, Graph Traversals- Depth First Search and Breadth First Search(Non
recursive) .Graph storage Representation- Adjacency matrix, adjacency lists.
[Figure: a graph with vertices V1 to V5 and edges E1 to E7]
If two nodes are connected by an edge, those two nodes are said to be adjacent or neighbors. An edge which has a direction is called a directed edge. An edge which has no specific direction is called an undirected edge.
If a node does not have any adjacent nodes, then it is said to be an isolated node. A graph which contains only isolated nodes is called a null graph.
Types of Graphs: Graphs are of two types
1. Directed graphs 2. Undirected graphs
A directed graph (or digraph) is a graph in which all edges are directed edges.
An undirected graph is a graph in which all edges are undirected edges.
If a graph contains both directed and undirected edges, it is said to be a mixed graph.
An edge in a graph which starts and ends on the same node is called a loop.
The degree of a node is the number of edges connected directly to that node, i.e. the number of edges incident on it.
In a directed graph, the indegree of a node is the number of edges terminating at that node. The outdegree of a node is the number of edges beginning from that node. The sum of the indegree and outdegree of a node is called its total degree. The concept of indegree and outdegree does not apply to an undirected graph. A node whose indegree is 0 is called a source node and a node whose outdegree is 0 is called a sink node. For isolated nodes, the degree is 0.
Weighted Graph: - A weighted graph is a graph which has weights along its edges.
[Figure: a weighted graph on the vertices A, B, C, D and E with edge weights 5, 10, 20, 30, 40 and 50]
Depth First Search Algorithm:
1. Initialize all nodes to the ready state and initialize the stack to empty.
2. Begin with any node which is in the ready state and push it onto the stack.
Mark the status of that node as waiting.
3. While the stack is not empty do
Begin
4. Pop the top node K of the stack and process it. Mark the status of that node as visited.
5. Push all the adjacent nodes of K which are in the ready state onto the stack and mark the status of those nodes as waiting.
End
6. If the graph still contains nodes which are in the ready state then
Go to step 2
7. Return
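The stack-based steps above can be sketched in C on an adjacency matrix. The names dfs, MAXV and the READY, WAITING and VISITED states are illustrative. The sketch visits only the vertices reachable from the start vertex; the restart of step 6 for other components is left to the caller.

```c
#include <assert.h>

#define MAXV 20
enum { READY, WAITING, VISITED };

/* Non-recursive depth first search on adjacency matrix g with n
   vertices, starting at vertex start. The visiting order is written to
   order[] and the number of visited vertices is returned. */
int dfs(int g[MAXV][MAXV], int n, int start, int order[MAXV])
{
    int status[MAXV], stack[MAXV], top = -1, count = 0;
    assert(n > 0 && n <= MAXV && start >= 0 && start < n);

    for (int v = 0; v < n; v++)
        status[v] = READY;                 /* step 1 */

    stack[++top] = start;                  /* step 2 */
    status[start] = WAITING;
    while (top >= 0) {                     /* step 3 */
        int k = stack[top--];              /* step 4 */
        status[k] = VISITED;
        order[count++] = k;
        for (int v = 0; v < n; v++) {      /* step 5 */
            if (g[k][v] && status[v] == READY) {
                stack[++top] = v;
                status[v] = WAITING;
            }
        }
    }
    return count;
}
```

Because a vertex is marked waiting as soon as it is pushed, it enters the stack at most once, so a stack of MAXV entries is sufficient.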
Breadth First Search Algorithm:-
1. Initialize all nodes to the ready state and initialize the queue to empty.
2. Begin with any node which is in the ready state and put it into the queue.
Mark the status of that node as waiting.
3. While the queue is not empty do
Begin
4. Delete the first node K from the queue and process it. Mark the status of that node as visited.
5. Add all the adjacent nodes of K which are in the ready state to the rear of the queue and mark the status of those nodes as waiting.
End
6. If the graph still contains nodes which are in the ready state then go to step 2
7. Return.
Program: Breadth First Search
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#define size 20
#define TRUE 1
#define FALSE 0
int g[size][size];
int visit[size];
int Q[size];
int front,rear;
int n;
void create();
void bfs(int);
void main()
{
int v1,v2;
char ans='y';
clrscr();
create();
getch();
do
{
for(v1=0;v1<n;v1++)
visit[v1]=FALSE;
clrscr();
printf("enter the vertex from which you want to traverse");
scanf("%d",&v1);
if(v1>=n)
printf("invalid vertex\n");
else
{
printf("the breadth first search of the graph is \n");
bfs(v1);
getch();
}
printf(" \n do you want to traverse from any other node?");
ans=getche();
}
while(ans=='y');
exit(0);
}
void bfs(int v1)
{
int v2;
visit[v1]=TRUE;
front=rear=-1;
Q[++rear]=v1;
while(front!=rear)
{
v1=Q[++front];
printf("%d\n ", v1);
for(v2=0;v2<n;v2++)
{
if(g[v1][v2]==TRUE && visit[v2]==FALSE)
{
Q[++rear]=v2;
visit[v2]=TRUE;
}
}
}
}
[Figure: a graph on the vertices A, B, C, D and E together with its adjacency list; the list heads 0 to 4 correspond to A to E]
[Figure: graph with vertices V0, V1, V2 and V3 and its adjacency matrix]
    0 1 2 3
0   0 1 1 0
1   1 0 0 1
2   1 0 0 1
3   0 1 1 0
[Figure: graph with vertices V0, V1, V2, V3 and V4 and its adjacency matrix]
    0 1 2 3 4
0   0 1 1 0 0
1   1 0 0 1 0
2   1 0 0 1 0
3   0 1 1 0 1
4   0 0 0 1 0
After addition of edge
[Figure: graph with vertices V0, V1, V2 and V3 and its adjacency matrix]
    0 1 2 3
0   0 1 1 1
1   1 0 1 0
2   1 1 0 1
3   1 0 1 0
If we want to delete an edge V2-V3 then
    0 1 2 3
0   0 1 1 1
1   1 0 1 0
2   1 1 0 0
3   1 0 0 0
[Figure: vertex V4 together with the vertices V1, V2 and V3]
If we search for vertex V4 then the vertices V1, V2 and V3 are neighbouring
vertices.
If the given vertex is not present in the graph then there will not be the
neighbouring vertices. Hence we can simply display a message-“vertex not found”.
void delete_vertex()
{
int i,v;
printf("\n Enter the vertex to be deleted");
scanf("%d",&v);
for(i=0;i<MAX;i++)
{
G[v][i]=0;
G[i][v]=0;
}
printf("\n The vertex is deleted");
}
void find_vertex()
{
int v,i;
int flag=1;
printf("\n Enter the vertex to be searched in the Graph");
scanf("%d",&v);
for(i=0;i<MAX;i++)
{
if(G[v][i]==1)
{
flag=0;
printf("\n Neighbouring vertex is %d",i);
}
}
if(flag==1)
printf("\n Vertex is not present in the Graph");
}
void insert_edge()
{
int v1,v2;
printf("\n Enter the edge to be inserted by v1 & v2");
scanf("%d%d",&v1,&v2);
G[v1][v2]=1;
G[v2][v1]=1;
}
CHAPTER
5
GRAPH
ALGORITHMS
Syllabus:
Minimum-Cost Spanning Trees- Prim's Algorithm, Kruskal's Algorithm Shortest
Path Algorithms: Dijkstra's Algorithm, All Pairs Shortest Paths Problem: Floyd's
Algorithm, Warshall's Algorithm
First create a forest for the above graph. A forest here means the nodes without any edges. The following diagram shows the forest and the respective initialized matrix.
Representation as follows:
[Figure: forest of the vertices a, b, c, d, e, f, g and h]
V (Vertex)   K (Known)   D (Distance)   P (Path)
a            0           0              0
b            0           ∞              0
c            0           ∞              0
d            0           ∞              0
e            0           ∞              0
f            0           ∞              0
g            0           ∞              0
h            0           ∞              0
Step 1: In the above diagram a is considered as the source vertex. The weights of the edges from a to b and c are filled in the matrix.
b            0           34             a
c            0           12             a
d            0           ∞              0
e            0           ∞              0
f            0           ∞              0
g            0           ∞              0
h            0           ∞              0
Step 2: The minimum of the edge values of a is 12, so draw the edge between a and c. Next, c is the processing node. The weights of the edges from c to d and e are filled in the matrix.
f 0 ∞ 0
g 0 ∞ 0
h 0 ∞ 0
Step 3: The minimum of the edge values of c is 15, so draw the edge between c and d. Next, d is the processing node. The weights of the edges from d to b, e and f are filled in the matrix. The current value of b (34) is greater than the weight of the edge between b and d (7). So, update the distance of b to 7 and its path to d.
f 0 10 d
g 0 ∞ 0
h 0 ∞ 0
b            1           7              d
c            1           12             a
d            1           15             c
e            0           13             d
f            0           10             d
g            0           44             b
h            0           ∞              0
Step 5: The next selected edge is the edge of weight 10 between d and f, so draw the edge between d and f. Now f is the processing node. The weights of the edges from f to g and h are entered into the matrix.
f 1 10 d
g 0 5 h
h 1 8 f
Step 7: Next minimum value is 5 so draw edge for h and g. For node g all edges are
processed.
f 1 10 d
g 1 5 h
h 1 8 f
Step 8: After drawing the edge between e and h, Prim's algorithm has generated the final minimum spanning tree, which is shown in the following diagram.
[Figure: final minimum spanning tree with edges of weight 7, 12, 5, 15, 10, 8 and 11]
V (Vertex)   K (Known)   D (Distance)   P (Path)
a            1           0              0
b            1           7              d
c            1           12             a
d            1           15             c
e            1           11             h
f            1           10             d
g            1           5              h
h            1           8              f
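The table-driven trace above can be sketched in C. This is an illustrative version, not the notes' own program: the vertices a to h are numbered 0 to 7, the edge weights are the ones listed in the Kruskal action table of this chapter, and the names prim, build_example and NV are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define NV 8   /* vertices a..h are numbered 0..7 */

/* Fill w with the example graph; w[u][v] == 0 means "no edge". */
void build_example(int w[NV][NV])
{
    static const int e[12][3] = {
        {0, 1, 34}, {0, 2, 12}, {1, 6, 44}, {2, 3, 15}, {2, 4, 20}, {3, 1, 7},
        {3, 5, 10}, {4, 3, 13}, {4, 7, 11}, {5, 6, 9},  {5, 7, 8},  {7, 6, 5}
    };
    for (int i = 0; i < NV; i++)
        for (int j = 0; j < NV; j++)
            w[i][j] = 0;
    for (int i = 0; i < 12; i++) {
        w[e[i][0]][e[i][1]] = e[i][2];
        w[e[i][1]][e[i][0]] = e[i][2];
    }
}

/* Prim's algorithm. known[], dist[] and parent[] play the roles of the
   K, D and P columns. Returns the total weight of the spanning tree.
   Assumes the graph is connected. */
int prim(int w[NV][NV], int source, int parent[NV])
{
    int known[NV], dist[NV], total = 0;
    assert(source >= 0 && source < NV);
    for (int v = 0; v < NV; v++) {
        known[v] = 0;
        dist[v] = INT_MAX;
        parent[v] = -1;
    }
    dist[source] = 0;

    for (int step = 0; step < NV; step++) {
        int u = -1;                       /* unknown vertex with least D */
        for (int v = 0; v < NV; v++)
            if (!known[v] && (u == -1 || dist[v] < dist[u]))
                u = v;
        known[u] = 1;
        total += dist[u];
        for (int v = 0; v < NV; v++)      /* update neighbours of u */
            if (w[u][v] && !known[v] && w[u][v] < dist[v]) {
                dist[v] = w[u][v];
                parent[v] = u;
            }
    }
    return total;
}
```

Starting from a (index 0), the tree found has total weight 12 + 15 + 7 + 10 + 8 + 5 + 11 = 68, and e is attached last through h with weight 11.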
Step 1: - First create a forest for the above graph. Forest means nodes without any
edges.
[Figure: forest of the vertices a to h with no edges]
Step 2: - Arrange all the weighted values into increasing order (sorted order)
5 7 8 9 10 11 12 13 15 20 34 44
Step 3: - Pick the smallest weight from the sorted weights that is 5, draw edge for g
and h.
5 7 8 9 10 11 12 13 15 20 34 44
[Figure: the forest after adding the edge g-h of weight 5]
Step 4: - Remove 5 from the Queue of sorted weights. Next minimum is 7 so draw
edge for b and d. (no cycle)
7 8 9 10 11 12 13 15 20 34 44
[Figure: the forest with the edges g-h (5) and b-d (7)]
Step 5: - Remove 7 from the Queue of sorted weights. Next minimum is 8, draw
edge between h and f(no cycle)
8 9 10 11 12 13 15 20 34 44
[Figure: the forest with the edges g-h (5), b-d (7) and f-h (8)]
Step 6: - Remove 8 from the queue of sorted weights. The next minimum is 9, so draw the edge between f and g (but this creates a cycle).
According to Kruskal's algorithm, there must be no cycle in the minimum spanning tree, so remove this edge from the diagram.
Remove 9 from the queue of sorted weights. The next minimum is 10, so draw the edge between d and f (no cycle).
[Figure: the edge of weight 9 is rejected because it creates a cycle; the edges d-f (10) and e-h (11) are added]
Step 8: - Remove 11 from Queue of sorted weights. Next minimum value is 12, so
draw edge between a and c (No cycle)
12 13 15 20 34 44
[Figure: the forest after adding the edge a-c of weight 12]
Step 9: - Remove 12 from Queue of sorted weights. Next minimum value is 13,
draw edge between d and e (But creates cycle)
[Figure: the edge d-e (13) is rejected because it creates a cycle; the edge c-d (15) is added]
All the vertices are connected, hence Kruskal's minimum spanning tree is complete.
The actions of Kruskal's algorithm are shown in the following table
Edge Weight Action
(a,b) 34 Rejected
(a,c) 12 Connected
(b,g) 44 Rejected
(c,d) 15 Connected
(c,e) 20 Rejected
(d,b) 7 Connected
(d,f) 10 Connected
(e,d) 13 Rejected
(e,h) 11 Connected
(f,g) 9 Rejected
(f,h) 8 Connected
(h,g) 5 Connected
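The accept and reject decisions in the table can be reproduced with a short sketch that detects cycles with a simple parent array (a basic union-find without balancing). The names Edge, find_root, by_weight and kruskal are hypothetical, and the vertices a to h are assumed to be numbered 0 to 7.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int u, v, weight; } Edge;

/* Follow parent links up to the root of the tree containing x. */
static int find_root(const int parent[], int x)
{
    while (parent[x] != x)
        x = parent[x];
    return x;
}

/* Comparison function for qsort: increasing weight. */
static int by_weight(const void *a, const void *b)
{
    return ((const Edge *)a)->weight - ((const Edge *)b)->weight;
}

/* Kruskal's algorithm: sort the edges, then connect an edge only when
   its end points lie in different trees of the forest. accepted[i] is
   set to 1 for connected edges (indexed after sorting). Returns the
   total weight of the spanning tree. */
int kruskal(Edge edges[], int n_edges, int n_vertices, int accepted[])
{
    int *parent = malloc(n_vertices * sizeof *parent);
    int total = 0;
    assert(parent != NULL);
    for (int v = 0; v < n_vertices; v++)
        parent[v] = v;                /* the initial forest */

    qsort(edges, n_edges, sizeof edges[0], by_weight);
    for (int i = 0; i < n_edges; i++) {
        int ru = find_root(parent, edges[i].u);
        int rv = find_root(parent, edges[i].v);
        accepted[i] = (ru != rv);     /* same root would mean a cycle */
        if (accepted[i]) {
            parent[ru] = rv;          /* merge the two trees */
            total += edges[i].weight;
        }
    }
    free(parent);
    return total;
}
```

On the twelve edges of the table, seven edges are connected (weights 5, 7, 8, 10, 11, 12 and 15) for a total weight of 68, and the edges of weight 9, 13, 20, 34 and 44 are rejected.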
[Figure: the weighted graph on the vertices a to h with edge weights 34, 44, 7, 12, 9, 10, 5, 15, 13, 8, 20 and 11]
Step 2: - Of the two available distances, 12 is the smallest value, so c is considered as the next node for Dijkstra's algorithm. The node c has two paths, c -> d and c -> e, with weights 15 and 20 respectively. Of these, 15 is the smaller value; it is added to the distance of the existing (selected) path, which is 12. As a result the matrix is updated with the value 27 in the distance column of row d.
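A compact sketch of Dijkstra's algorithm on an adjacency matrix is given below. It assumes the vertices a to h are numbered 0 to 7, that a weight of 0 in the matrix means there is no edge, and that all weights are positive; the names dijkstra and NV are illustrative.

```c
#include <assert.h>
#include <limits.h>

#define NV 8   /* vertices a..h are numbered 0..7 */

/* dist[v] receives the length of the shortest path from source to v,
   or INT_MAX if v cannot be reached. */
void dijkstra(int w[NV][NV], int source, int dist[NV])
{
    int known[NV];
    assert(source >= 0 && source < NV);
    for (int v = 0; v < NV; v++) {
        known[v] = 0;
        dist[v] = INT_MAX;
    }
    dist[source] = 0;

    for (int step = 0; step < NV; step++) {
        int u = -1;                 /* unknown vertex with least distance */
        for (int v = 0; v < NV; v++)
            if (!known[v] && (u == -1 || dist[v] < dist[u]))
                u = v;
        if (dist[u] == INT_MAX)
            break;                  /* remaining vertices are unreachable */
        known[u] = 1;
        for (int v = 0; v < NV; v++)
            if (w[u][v] && !known[v] && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];   /* shorter path through u */
    }
}
```

With the edge weights of the example graph and a as the source, the distance of c is 12 and the distance of d is 12 + 15 = 27, which agrees with Step 2.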
Step 3:
Step 4:
Step 5:
Step 6:
Step 7:
CHAPTER
6
SORTING
METHODS
Syllabus:
Order Statistics: Lower Bound on Complexity for Sorting Methods: Lower Bound on
Worst Case Complexity, Lower Bound on Average Case Complexity, Heap Sort,
Quick Sort, Radix Sorting, Merge Sort.
While the condition i < j is true, swap the elements at positions i and j.
When the condition i < j becomes false, terminate the loop and then swap the pivot element with the element at position j.
Example 1: Trace the Quick sort algorithm for the following list of numbers:
90, 77, 60, 99, 55, 88, 62
Example 2: Sort the following list of elements using quick sort
50, 30, 10, 90, 80, 20, 40, 70
Program: Quick Sort
#include<stdio.h>
void qsort(int[],int,int);
void main()
{
int n,i,a[10];
printf("enter n value ");
scanf("%d",&n);
printf("enter the elements ");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
qsort(a,0,n-1);
printf(" output sorted list is \n");
for(i=0;i<n;i++)
printf("%3d",a[i]);
}
void qsort(int a[10],int left,int right)
{
int p,i,j,t;
if(left<right)
{
p=a[left];
i=left;
j=right;
do
{
do
{
i++;
}while(i<=right&&a[i]<=p);
while(a[j]>p&&j>left)
The above diagram shows the overall process of merge sort, but the actual execution does not proceed exactly in that order, because recursive calls are made for each sub list. The following diagrams show the actual sequence of merge sort calls.
Advantages and Disadvantages: -
The merge sort is slightly faster than the heap sort for large sets. It requires twice
the memory of the heap sort because of the second array. This additional memory
requirement makes it unattractive for most purposes in those cases the quick sort
#include<stdio.h>
#include<conio.h>
int a[10],n;
void getdata();
void display();
void msort(int,int);
void merge(int,int);
void main()
{
clrscr();
getdata();
display();
msort(0,n-1);
display();
}
void getdata()
{
int i;
printf("enter the n value");
scanf("%d",&n);
printf("enter the array elements");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
}
void display()
{
int i;
printf("\n the array elements \n");
for(i=0;i<n;i++)
printf("%3d",a[i]);
}
void msort(int left,int right)
{
int mid;
if(left<right)
{
mid=(left+right)/2;
msort(left,mid);
msort(mid+1,right);
merge(left,right);
Time Complexity: -
The time complexity of merge sort is O(n log n). This follows from the recursive nature of merge sort: each level of recursion takes O(n) steps for merging, and there are log n levels before the final solution is reached, which gives O(n log n).
[Figure: buckets 0 to 9 in Pass-I, Pass-II and Pass-III of radix sort on the keys 70, 121, 432, 12, 683, 965 and 577; the final order is 12, 70, 121, 432, 577, 683, 965]
Program: Radix Sort
#include<stdio.h>
#include<conio.h>
int a[10],n,bucket[10][10],b[10];
void getdata();
void display();
void rsort();
void main()
{
clrscr();
getdata();
display();
rsort();
display();
}
void getdata()
{
int i;
printf("enter the n value");
scanf("%d",&n);
printf("enter the array elements");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
}
void display()
{
int i;
printf("\n the array elements \n");
Program: Heap Sort
#include<stdio.h>
void read(int*,int);
void display(int*,int);
void heapsort(int*,int);
void formheap(int*,int,int);
void swap(int*,int*);
void main()
{
int arr[10],n;
printf("enter the n value ");
scanf("%d",&n);
printf("enter the array elements \n");
read(arr,n-1);
heapsort(arr,n-1);
printf("the sorted array elements are \n");
display(arr,n-1);
}
void read(int *a,int n)
{
int i;
for(i=0;i<=n;i++)
CHAPTER
7
PATTERN
MATCHING &
TRIES
Syllabus:
Pattern matching algorithms- the Boyer –Moore algorithm, the Knuth-Morris-Pratt
algorithm Tries: Definitions and concepts of digital search tree, Binary trie, Patricia
, Multi-way trie
Step 2: ABABBABABABA
         ABABA
Step 3: ABABBABABABA
          ABABA
Step 4: ABABBABABABA
           ABABA
Step 5: ABABBABABABA
            ABABA
Step 6: ABABBABABABA
             ABABA
Pattern matched, so return the index
Boyer-Moore Algorithm:-
The Boyer-Moore algorithm was invented by Boyer and Moore. It is considered to be one of the most efficient string matching algorithms. One application that makes use of this algorithm is the Find and Replace command of text editors. The Boyer-Moore algorithm scans the characters of the pattern from right to left, beginning with the rightmost character. In case of a mismatch (or a complete match) it uses two precomputed functions to shift the search window to the right. These two shift functions are called the good-suffix shift and the bad-character shift. The good-suffix shift is also called the matching shift, and the bad-character shift is also called the occurrence shift.
The Boyer-Moore pattern matching algorithm is based on two heuristics. One is the looking-glass heuristic and the second is the character-jump heuristic.
Looking-Glass Heuristic: Compare Ptrn with a sub-sequence of String, moving backwards.
Character-Jump Heuristic: When a mismatch occurs at S[i] = c: if Ptrn contains c, shift Ptrn to align the last occurrence of c in Ptrn with S[i]; else shift Ptrn to align Ptrn[0] with S[i+1].
Shift Operation: -
There are two cases for the shift operation.
Shift Case 1: if 1+Last(c)<=j then shift the pattern j-Last(c) units
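The two heuristics can be combined into a short C sketch. This simplified version uses only the looking-glass and character-jump heuristics (no good-suffix shift), and the function name bm_match and the table last[] are illustrative names.

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the first occurrence of pattern in text,
   or -1 if there is none. */
int bm_match(const char *text, const char *pattern)
{
    int n = (int)strlen(text), m = (int)strlen(pattern);
    int last[256];
    assert(m > 0);

    /* last[c] = index of the last occurrence of c in pattern, else -1 */
    for (int c = 0; c < 256; c++)
        last[c] = -1;
    for (int k = 0; k < m; k++)
        last[(unsigned char)pattern[k]] = k;

    int i = m - 1, j = m - 1;        /* compare from the right end */
    while (i < n) {
        if (pattern[j] == text[i]) {
            if (j == 0)
                return i;            /* whole pattern matched */
            i--;                     /* looking-glass: move backwards */
            j--;
        } else {
            int l = last[(unsigned char)text[i]];
            int jump = (j < 1 + l) ? j : 1 + l;
            i += m - jump;           /* character-jump shift */
            j = m - 1;
        }
    }
    return -1;
}
```

When 1 + Last(c) <= j, the update of i moves the pattern j - Last(c) positions to the right, as in Shift Case 1; otherwise the pattern moves by one position. For the text ABABBABABABA and the pattern ABABA the function returns 5.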
Examples:
1. Consider a text T= XYXZXXYXTZXYXZXXYXXYXXYY
To match against the pattern P = XYXZXY
Trie: - A trie is a tree-based data structure for storing strings so that patterns can be searched quickly. A trie is an indexed search tree. Tries can be used to perform prefix queries for information retrieval. These queries search for the longest prefix of a given string that matches a prefix of some string in the trie. The name trie comes from the word retrieval: retrieving data stored in a trie structure is much faster, hence the name.
Consider the set of strings S = {a, an, and, ant, any, at}. These words are represented in a trie diagram as follows.
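A standard trie over lowercase letters can be sketched in C as below. The names TrieNode, trie_new, trie_insert and trie_search are hypothetical, and the sketch assumes every key consists only of the letters a to z.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET 26

typedef struct TrieNode {
    struct TrieNode *child[ALPHABET];
    int is_word;               /* 1 if a stored string ends here */
} TrieNode;

/* Allocate an empty node with no children. */
TrieNode *trie_new(void)
{
    TrieNode *n = malloc(sizeof *n);
    assert(n != NULL);
    for (int i = 0; i < ALPHABET; i++)
        n->child[i] = NULL;
    n->is_word = 0;
    return n;
}

/* Insert a lowercase word, creating one node per character. */
void trie_insert(TrieNode *root, const char *word)
{
    for (; *word; word++) {
        int c = *word - 'a';
        assert(c >= 0 && c < ALPHABET);
        if (root->child[c] == NULL)
            root->child[c] = trie_new();
        root = root->child[c];
    }
    root->is_word = 1;
}

/* Return 1 if word was inserted earlier, 0 otherwise. */
int trie_search(const TrieNode *root, const char *word)
{
    for (; *word; word++) {
        int c = *word - 'a';
        if (c < 0 || c >= ALPHABET || root->child[c] == NULL)
            return 0;
        root = root->child[c];
    }
    return root->is_word;
}
```

After inserting a, an, and, ant, any and at, the words share the nodes for their common prefixes, and a lookup takes time proportional to the length of the key rather than to the number of stored strings.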
Advantages of tries:
1. In tries the keys are searched using common prefixes. Hence it is faster. The
lookup of keys depends upon the height in case of binary search tree.
2. Tries take less space when they contain a large number of short strings. As
nodes are shared between the keys.
3. Tries help with longest prefix matching, when we want to find the key.
Types of Tries:
Standard Tries: Single character per node
Compressed Tries: Eliminating chains of nodes
Compact Tries: Stores indices into tree
Suffix Tries: Stores all suffixes of string into node
Standard Tries:-
A standard trie representing a set of strings S is an ordered tree that is based on the following rules.
1. All nodes except the root node are labeled with a character
2. The children of a node are ordered alphabetically
3. Combining the characters along the path from the root to an external node yields a string in the set of strings S.
4. All strings in the set S are encoded within the standard trie
Compressed Tries: - A compressed trie is like a standard trie, but each internal node has a degree of at least 2. Single child nodes are compressed into a single edge. This ensures that each internal node in the trie has at least two children. A compressed trie is obtained from a standard trie by compressing chains of redundant nodes. A critical node is a node v such that v is labeled with a string from S, v has at least 2 children, or v is the root.
Each internal node in a compressed trie has at least two children and each external node is associated with a string. The compression reduces the total space for the trie from O(m), where m is the sum of the lengths of the strings in set S, to O(n), where n is the number of strings in S.
Compact Tries: -
Compact tries are the compact representation of compressed tries. For an array of strings S, that is S[0], S[1], ..., S[s-1], each node stores a range of indices instead of a substring. Each node is represented as a triplet of integers (i,j,k) such that X = S[i][j..k].
First consider all the strings in S
S = {SELL, STOCK, BALL, BEARD, BELL, BULL, ROCK, STOP}
All the above strings are indexed as shown in the following table
SUFFIX TRIES: - A suffix trie is a tree of all the suffixes of a single string. Suffix tries are used in pattern matching, since a substring is the prefix of a suffix. This changes the linear search for the beginning of the pattern, as in KMP, into a tree search. The space used is O(n) instead of O(n^2) because characters only need to appear once.
Consider MINIMIZE as a string; it has eight suffixes as follows.
The remaining suffixes act as nodes. The following diagram represents the suffix trie for the above string.
CHAPTER
8
FILE
STRUCTURES
Syllabus:
Fundamental File Processing Operations-opening files, closing files, Reading and
Writing file contents, Special characters in files.
Fundamental File Structure Concepts- Field and record organization, Managing
fixed-length, fixed-field buffers.
[Figure: input sources (keyboard, mouse, network, memory) feed the program, which sends output to the monitor, printer, network or memory]
The source stream that provides data to the program is called the input stream, and the destination stream that receives output from the program is called the output stream. A program extracts bytes from an input stream and inserts bytes into an output stream. The data in the input stream can come from the keyboard or any other storage device. The data in the output stream can go to the screen or any other storage device. A stream acts as an interface between the program and the input/output device.
[Figure: input device -> input stream (>>) -> program -> output stream (<<) -> output device]
Streams are divided into 3 types: Console Streams, File Streams and String Streams. Console output streams are used for displaying data on the standard output device, that is the monitor. File input and output streams are used for handling input and output operations to and from files. String input and output streams are used for handling string operations. In C++, a stream is represented by an object.
clog is a standard buffered output stream that also sends error messages to the standard error device.
[Figure: class hierarchy in which iostream is derived from istream and ostream]
istream class: This class is the input stream and does formatted input. It contains input functions such as get(), getline() and read(). It also has an overloaded member function, the stream extraction operator >>. This extraction operator is used to read data from the standard input device into memory items.
ostream class: This class is the output stream and does formatted output. It contains important output functions such as put() and write(). The insertion operator << of this class is used to write data to the standard output device.
iostream class: It provides both input and output functions. It is derived from the multiple base classes istream and ostream, so all these functions are inherited from the istream and ostream classes.
istream_withassign: This class adds the assignment operators to the istream class.
8.2.1 istream class:- The istream class performs all the activities related to input. It is derived from the base class ios. The most important member function of this class is the overloaded operator >>. For example: cin>>name;
The istream class contains several member functions; the most important are get(), getline() and read().
get(str, size) : extracts characters into the str array until size-1 characters are extracted.
get(str, size, delim) : extracts characters into the str array until size-1 characters are extracted or the delim character is reached.
get() example:-
char ch;
cout<<"enter the character";
cin.get(ch);
cout<<ch;
enter the character ramesh
3.read(): This function reads a block of data of length specified by an argument 'n'.
This function reads data sequentially. So when the EOF is reached before the whole
block is read then the buffer will contain the elements read until EOF.
8.2.2 ostream class:- The ostream class handles all the activities related to output. The ostream class is derived from the ios class. The ostream class contains several member functions. The most important of them is the overloaded << operator function.
1.put(): The put() function inserts the character specified by 'ch' into the output stream if the stream is ready; otherwise the bad bit is set.
syntax: put(char ch);
The objects cin and cout (predefined in the iostream header) are used for taking input and displaying output for various data types. The operators >> (used for input) and << (used for output) are overloaded to recognize all the basic C++ types. The istream class overloads the operator >> and the ostream class overloads the operator <<.
syntax: cin>>v1>>v2.........>>vn;
cout<<i1<<i2<<......<<in;
The two functions get() and put() from the istream and ostream classes respectively handle single-character input/output operations. The get() function gets a character from the keyboard and the put() function outputs a character on the screen.
get():- There are two forms of the get() function
1. The get(char &) function gets a character, including blank space, tab and newline characters, and assigns it to its argument.
The two line-oriented I/O functions getline() and write() read and display a line
of text more efficiently.
getline():- This function reads a line of text that ends with a newline character. It
is called using the cin object as follows:
cin.getline(line,size);
where size is the maximum number of characters to be read. getline() reads the input
until it encounters '\n' or size-1 characters have been read. When '\n' is read it is
not saved; instead it is replaced by the null character.
A string can also be read using the >> operator, but the string should not contain any
white space, because >> stops at the first white space character. After reading a
string, cin automatically adds a null character at the end of the string.
write(): This function displays a line on the screen. It is called using the cout
object as follows:
cout.write(line,size);
where line is the string to be displayed and size is the number of characters to be
displayed.
If size is greater than the length of line (i.e., the text to be displayed), the
write() function does not stop at the null character; it continues displaying beyond
the bounds of line.
C++ supports a number of features to format the output of a program. These are
1. ios class functions and flags
2. manipulators
i. ios class functions and flags:- The ios class includes a number of functions that
are used to format the output of a program in several ways.
width():- This function specifies the required width (or field size) for displaying
an item. It is invoked using an object of the ios class as follows:
cout.width(w);
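for example
cout.width(5);
cout<<543<<12<<"\n";
Output (two leading blanks, then 543, then 12 with no padding):
  54312
The value 543 is printed right justified in the first five columns. The
specification width(5) is not retained for printing the number 12, so 12 follows
immediately. To print both values in fields of five columns, the width is set
before each item:
cout.width(5);
cout<<543;
cout.width(5);
cout<<12<<"\n";
Output:
  543   12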
If the width specified is smaller than the size of the value to be printed, C++
expands the field width to fit the value. The field width for each item should be
specified separately because after printing one item the width reverts to the
default value.
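precision():- This function sets the number of digits to be displayed after the
decimal point while printing floating point numbers. The default precision is six
digits. In standard C++ the count applies to digits after the decimal point only
when the ios::fixed or ios::scientific flag is set; otherwise it is the total number
of significant digits. This function is invoked as follows:
cout.precision(d);
e.g. cout.precision(2);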
fill():- This function is used to fill the unused positions in a field with any
desired character. When the field width is more than the value actually requires,
the unused positions are filled with white spaces by default. This function is
called as follows:
cout.fill(ch);
e.g. cout.fill('*');
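#include<iostream.h>
#include<math.h>
#include<conio.h>
void main()
{
clrscr();
int n;
cout<<"enter the n value ";
cin>>n;
cout.precision(2);
for(int i=0;i<n;i++)
{
cout.width(5);      // field of 5 columns for i
cout.fill('*');     // padded with '*'
cout<<i;
// the remaining statements are inferred from the output shown below
cout.width(8);      // field of 8 columns for the square root
cout.fill('#');     // padded with '#'
cout<<sqrt((double)i)<<"\n";
}
getch();
}
Output (rows for i = 2 to 5):
****2####1.41
****3####1.73
****4#######2
****5####2.24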
setf():- setf() (set flags) sets the format flags that control the output. setf() is
an overloaded function in the ios class, so it has two forms as shown below:
cout.setf(argument);
cout.setf(argument1,argument2);
where the first argument in both forms is the flag to be set, and argument2 in the
second form specifies the group (bit field) to which argument1 belongs.
Flags of the first type can be set or unset individually using setf() and unsetf()
respectively. These flags take the first form of the setf() function.
For example:
cout.setf(ios::showpos);
cout<<375;
prints +375.
The second type of formatting flags works in groups. These flags take the second
form of the setf() function. For example:
cout.setf(ios::left,ios::adjustfield);
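There are three groups to which a flag can belong: adjustfield, floatfield and
basefield. The table below shows the flags (argument1), bit fields (argument2)
and their format actions.
Format action                  Flag (argument1)     Bit field (argument2)
Left-justified output          ios::left            ios::adjustfield
Right-justified output         ios::right           ios::adjustfield
Padding after sign or base     ios::internal        ios::adjustfield
Scientific notation            ios::scientific      ios::floatfield
Fixed point notation           ios::fixed           ios::floatfield
Decimal base                   ios::dec             ios::basefield
Octal base                     ios::oct             ios::basefield
Hexadecimal base               ios::hex             ios::basefield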
unsetf(): This function is used to clear the specified flag. It is called as follows:
cout.unsetf(flag);
For example: cout.unsetf(ios::left);
Manipulators that don't take arguments are declared in the <iostream.h> header file
and those that take arguments are declared in the <iomanip.h> header file
(<iostream> and <iomanip> in standard C++).
Manipulator Meaning
cout<<setw(5)<<1350;
The output of this statement is right justified in a field width of 5. Setting the
ios::left flag, for example with setiosflags(ios::left), causes the output
to be left justified.
Opening a File: A file can be opened with the function open(). The
syntax of open is
open(filename,mode)
File Modes are
ios::in Open a file for input operations
ios::out Open a file for output operations
ios::binary Open a file in binary mode
Ex:
8.5 Positioning the Pointer in the File: While performing file operations, we must
be able to reach any desired position inside the file. For this purpose there are
two commonly used functions:
1. seek: The seek operation is performed using two functions, seekg and seekp.
seekg : moves the get pointer to a specific location for reading a record
seekp : moves the put pointer to a specific location for writing a record
syntax:
seekg(offset, reference-position);
seekp(offset, reference-position);
where offset is the number of bytes to move from the reference position.
reference-position is for specifying beginning, end or current position. It can
be specified as