
Handout: Data Structures with C

Version: DSC/Handout/0307/2.1 Date: 05-03-07

Cognizant 500 Glen Pointe Center West Teaneck, NJ 07666 Ph: 201-801-0233 www.cognizant.com

Data Structures with C

TABLE OF CONTENTS
Introduction
  About this Document
  Target Audience
  Objectives
  Pre-requisite
Session 1: Introduction to Data Structure
  Learning Objectives
  Overview
  Summary
  Test your Understanding
Session 2: Arrays
  Learning Objectives
  Overview
  Summary
  Test your Understanding
Session 4: Linked Lists
  Learning Objectives
  Linked lists
  Summary
  Test your Understanding
Session 6: Sorting and Searching
  Learning Objectives
  Sorting
  Summary
  Test your Understanding
Session 8: Trees
  Learning Objectives
  Overview
  Summary
  Test your Understanding

Page 2 Copyright 2007, Cognizant Technology Solutions, All Rights Reserved C3: Protected



Session 10: Balanced trees and hashing
  Learning Objectives
  Overview
  Hashing
  Summary
  Test your Understanding
Session 11: Graphs
  Learning Objectives
  Graphs
  Summary
  Test your Understanding
Glossary
References
  Websites
  Books
STUDENT NOTES


Introduction

About this Document


This module provides the participants with the basic knowledge to understand data structures and to measure the performance of various algorithms used in different problems.

Target Audience
In-Campus Trainees

Objectives
- Acquire basic knowledge of data structures
- Select the appropriate data structures for an application
- Analyze the complexity of an algorithm
- Apply data structures in programs

Pre-requisite
The participants must have basic knowledge of writing programs in C.


Session 1: Introduction to Data Structure

Learning Objectives
After completing this chapter, you will be able to:
- Define a data structure
- List the types of data structures
- Identify how to analyze and select a data structure for a particular application

Overview
The study of computer science involves the study of the organization, manipulation and utilization of data in a computer in order to improve the efficiency of the processor and memory.

Data type and data structure
Data is represented in memory in the form of binary digits. A binary digit is stored in the basic unit of data, called a bit. A bit can represent either a zero or a one.

Data type
A data type defines the specification of a set of data and the characteristics of that data. A data type is derived from the basic nature of the data stored for processing rather than from its implementation.

Data Structure
A data structure is the actual implementation of a data type and offers a way of storing data efficiently. A data structure is designed to organize data to suit a specific purpose, so that the data can be accessed and worked on in appropriate ways, both effectively and efficiently. In computer programming, a data structure may be selected or designed to store data so that various algorithms can work on it. The choice of a data structure begins with the choice of an abstract data type. Data structures are implemented using the data types, references and operations on them that are provided by a programming language. Example data structures include:
- Arrays
- Stacks
- Queues
- Linked Lists



Abstract Data Types (ADT)
An Abstract Data Type (ADT) defines data together with the operations on that data. An ADT is specified independently of any particular implementation: it depicts the basic nature or concept of the data structure rather than the implementation details. A stack or a queue is an example of an ADT; both stacks and queues can be implemented using either an array or a linked list.

Types of Data Structures
The different types of data structures include linear data structures, hash tables and non linear data structures. The structure of a data file defines how records, or rows of data, are related to fields, or columns of data.

Linear structures
A data structure is said to be linear if its elements form a sequence, or a linear list.

Some of the linear structures are:
- Array: Fixed-size
- Linked list: Variable-size
- Stack: Add to top and remove from top
- Queue: Add to back and remove from front
- Priority queue: Add anywhere, remove the highest priority

Possible operations on these linear structures include:
- Traversal: Travel through the data structure
- Search: Traversal through the data structure for a given element
- Insertion: Adding new elements to the data structure
- Deletion: Removing an element from the data structure
- Sorting: Arranging the elements in some type of order
- Merging: Combining two similar data structures into one

Hash table
A hash table, or hash map, is a data structure that associates keys with values. A function called a hash function is applied to the key to find the address of the record.

Non linear structures
A data structure is said to be non linear if its elements do not form a sequence. Instead of being arranged linearly, the elements form a branched structure. Some of the non linear structures are:
- Tree: Collection of nodes represented in a hierarchical fashion
- Graph: Collection of nodes connected together through edges



Selecting a Data Structure
A data structure that suits one application may not suit another. The choice of a data structure often begins with the choice of an abstract data structure. An abstract data structure is an abstract storage for data, defined in terms of two things, independent of how it is implemented as a concrete data structure:
- the set of operations to be performed on the data;
- the computational complexity of those operations.

Selecting the abstract data structure is crucial for designing efficient algorithms and estimating their computational complexity. Selecting the concrete data structure is important for implementing those algorithms efficiently. Many abstract data structures and abstract data types share their names with concrete data structures.

In the design of many types of programs, the choice of data structures is a primary design consideration. Experience in building large systems has shown that the difficulty of implementation, and the quality and performance of the final result, depend heavily on choosing the best data structure.

Performance Analysis and Measurements
Performance analysis is often made in terms of the best, worst and average cases of a given algorithm. These express the minimum, maximum and average resource usage respectively. The resource may be running time, memory or any other resource.

In real-time computing, the worst case execution time is often of particular concern. To guarantee that the algorithm always finishes on time, it must be known how much time might be needed in the worst case.

Average and worst case performance are the most widely used measures in algorithm analysis. Best case performance is used less often, usually to improve the accuracy of an overall worst case analysis. Computer scientists use probabilistic analysis techniques, especially expected value, to determine expected average running times.
Worst case and average case analysis have similarities, but in practice they usually require different tools and approaches.

Determining what an average input means is difficult. The result depends on how the inputs are distributed, and that distribution is often hard to characterize mathematically.

Worst case analysis has similar problems, because it is typically difficult to determine the exact worst case scenario. Instead, a scenario is considered that is at least as bad as the worst case. For example, when analyzing an algorithm, it may be possible to find the longest possible path through the algorithm.

It is always important to find the efficiency of an algorithm with respect to the following:
- CPU (time) usage
- memory usage
- disk usage
- network usage



Measurement of complexity

Big O notation or Big Oh notation
Big O notation (Big Oh notation) expresses how the time required by an algorithm grows as the size of its input grows. It is denoted by the symbol O. It is used in the analysis of the complexity of algorithms to characterize a function's behavior for large inputs in a simple way. The complexity for different scenarios is expressed as follows:
- For a method that executes in constant time, the complexity is O(1).
- For a method that executes in linear time, the complexity is O(N).
- For a method that executes in quadratic time, the complexity is O(N^2).

Determination of complexities
The complexity of an algorithm depends on the statements used in it. The complexity of each type of statement is given below.

Sequence of statements
    Statement 1;
    Statement 2;
    ...
    Statement n;
    // none of the statements are loops; all are independent statements
The total time is given by
    Total time = time(statement 1) + time(statement 2) + ... + time(statement n)
If each statement is simple, the time for each statement is constant, and hence the total time is also constant. This makes the complexity O(1).

Selection statement (if-then-else)
    if (condition)
        Sequence of statements 1;
    else
        Sequence of statements 2;



Here, either sequence of statements 1 or sequence of statements 2 is executed. The worst case complexity of the entire selection statement is therefore the larger of the two complexities. If sequence 1 has complexity O(1) and sequence 2 has complexity O(N), the worst case complexity is taken as O(N).

Looping statement (for)
    for (condition)
        Sequence of simple statements;
Here, if the loop executes N times, the complexity is N * O(1), which is equivalent to O(N).

Nested loops
    for (condition 1)
        for (condition 2)
            Sequence of simple statements;
Here, if the outer loop executes N times and the inner loop executes M times, the complexity is N * M * O(1), i.e. O(N*M).

Summary
- The study of data structures deals with the actual implementation of data types and offers ways of storing data efficiently.
- An Abstract Data Type (ADT) is a data type together with its operations, whose properties are specified independently of any particular implementation.
- The different types of data structures available are:
  o Linear
  o Hash table
  o Trees
  o Graphs
- A well-designed data structure allows a variety of critical operations to be performed using as few resources (both execution time and memory space) as possible.
- Big O notation can be used to analyze the complexity of algorithms.


Test your Understanding


1. The complexity of an algorithm that finds the sum of n numbers will be
a. O(n log n)
b. O(n^2)
c. O(n)
d. O(2^n)

2. A parent-child relationship can be considered a linear data structure.
a. True
b. False

Answers
1. c
2. b


Session 2: Arrays

Learning Objectives
After completing this chapter, you will be able to:
- Define arrays
- Use arrays as data structures

Overview
An array is a collection of individual values of the same data type stored in consecutive memory locations. The array index, i.e. the position in the array, usually starts from 0. Some languages let you specify the starting index, but in C it is always 0. Here is an array of integers, myArray:

Array positions/Index:   0   1   2   3   4
Array values:           13   5  12   3   6

Declaring an array in C:
    int CArray[10];    /* ten integers, CArray[0] to CArray[9] */

Referring to elements of the array
The position of an element in an array is given by its index. The name of the array, followed by the index in square brackets, is used to refer to a particular element:
    myArray[1] = 5;
The above statement assigns the value 5 to the element at position 1 (the second element) of the array myArray.

Using elements of an array
Elements of an array can be used in the same way as variables of the same data type. For example, an element of an array of integers can be used anywhere an integer variable can be used:
    printf("The fifth element of the array is %d\n", myArray[4]);



The above statement prints the fifth element of myArray, so the output is:
    The fifth element of the array is 6

Example: Assigning values to each element of the array

    int evens[5];
    int count;
    for (count = 0; count < 5; count++) {
        evens[count] = 2 * count;
    }
The above piece of code constructs the array evens as given below:

Array index:    0   1   2   3   4
Array values:   0   2   4   6   8

Multi Dimensional Arrays
These are arrays that have more than one dimension. For example, the following declaration in C creates a two-dimensional array of two rows and two columns:
    int myArray1[2][2];

The following declaration creates an array of three dimensions, 2, 2, and 3:
    int myArray2[2][2][3];

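Initialization
In C, an array can be initialized with a list of values in braces at the point where it is declared. The following code declares and initializes myArray1 and a 3 x 2 array, myArray3:
    int myArray1[2][2] = {{1, 2}, {3, 4}};
    int myArray3[3][2] = {{1, 2}, {3, 4}, {5, 6}};
In matrix form, the above arrays can be represented as below:

myArray1
    1 2
    3 4

myArray3
    1 2
    3 4
    5 6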



Memory Organization in an array
Array elements occupy contiguous locations in memory and are accessed using their index. A function is needed to translate an array index into the address of the indexed element. For a single dimensional array, the address can be calculated as below:
    Address = Base Address + (Index - Base Index) * Size
where:
- Base Index is the value of the first index in the array
- Size is the size of a single element in bytes

Advantages and disadvantages of an array

Advantages
- The array data structure is simple to use.
- Elements in an array are stored in contiguous memory locations, and hence each element can be accessed directly using its index.
- For arrays declared with a fixed size, allocation and de-allocation of memory is done automatically.

Disadvantages
- Elements in an array must be stored in contiguous memory locations, so an array cannot be created if the available free memory is not contiguous. If the size of the array is n bytes, there must be n contiguous bytes available in memory.
- The array size is fixed, so the size of the array cannot be reduced or increased at run time based on the requirement.

Stacks
A stack is a homogeneous collection of items of any one type, arranged linearly with access at one end only, known as the top. This means that data can be added or removed only at the top. Formally, this type of stack is called a Last In First Out (LIFO) stack. Data is added to the stack using the Push operation and removed using the Pop operation.

To clarify the idea of a stack, here is an example. Think of a pile of plates in a cafeteria. When the plates are stacked, each one is added on top of the previous one. It would make little sense to put each plate at the bottom of the pile, as that would be far more work. Similarly, when a plate is taken, it is usually taken from the top of the stack.

A stack consists of two parts:
- Storage space within the stack that contains the elements of the stack.
- The top of the stack, which refers to the most recently pushed element.


A stack can be implemented using either an array or a linked list.

Stack implementation using an array
Top is an integer value that contains the array index of the top of the stack. Each time data is pushed or popped, top is incremented or decremented accordingly, to keep track of the current top of the stack. By convention, an empty stack is indicated by setting top equal to -1.

Stacks implemented as arrays are useful if a fixed amount of data is to be used. However, if the amount of data is not fixed, or fluctuates widely during the stack's lifetime, an array is a poor choice for implementing a stack. The computer implements every recursive call with the help of a stack. The size of that stack cannot be predicted in recursion, so implementing the stack using an array is a poor choice in this case.

Algorithm to implement the operations using an array
In this code, stack[] holds the elements, total_no_elements is its capacity, and top starts at -1.

Push:
    if (top >= total_no_elements - 1)   /* stack is full */
        return (1);                     /* error code: overflow */
    else {
        printf("\n Enter the element \n");
        scanf("%d", &stack[++top]);     /* move top up, then store the element */
    }

Pop:
    if (top == -1) {
        printf("\n STACK EMPTY \n");
    } else {
        printf("\n\nPopped element = %d\n", stack[top]);
        top--;                          /* the element below becomes the top */
    }

Display:
    if (top == -1) {
        printf("\n STACK IS EMPTY \n");
    } else {
        printf("\n The elements inside the stack are :\n");
        for (j = top; j >= 0; j--)      /* from the top down to the bottom */
            printf("\n%d", stack[j]);
    }

Stack operations:

Operation | Description | Return type | Requirement
Push | This operation adds (pushes) another item onto the stack. | Data type | The number of items on the stack is less than n.
Pop | This operation removes an item from the stack. | Data type | The number of items on the stack must be greater than 0.
Top | This operation returns the value of the item at the top of the stack. Note: It does not remove that item. | Data type | The number of items on the stack must be greater than 0.
Is Empty | This operation returns true if the stack is empty and false if it is not. | Boolean |
Is Full | This operation returns true if the stack is full and false if it is not. | Boolean |



Queues
A queue is a data structure in which elements are accessed from two different ends, called Front and Rear. Elements are inserted into a queue at the Rear end and removed from the Front end. The principle used in a queue is "First In First Out" (FIFO).

There are two basic operations associated with a queue: enqueue and dequeue.
- Enqueue means adding a new item at the rear end of the queue. The rear end always points to the most recently added element.
- Dequeue means removing the item at the front end of the queue. The front end always points to the most recently removed element.

Theoretically, a queue does not have a specific capacity: regardless of how many elements it already contains, a new element can always be added. It can also be empty, in which case removing an element is impossible until a new element has been added again.

A practical implementation of a queue using an array does have a capacity limit. Any implementation is also ultimately limited by the memory of the executing computer. Queue overflow results from trying to add an element to a full queue, and queue underflow happens when trying to remove an element from an empty queue.

A queue uses two major variables, Front and Rear. Front refers to the first position of the queue and Rear refers to the last position of the queue.

Types of queues

Circular queue
In a circular queue, a new element is inserted at the very first location of the queue if the last location of the queue is full. In other words, the first element comes just after the last element.

A circular queue overcomes the problem of unutilized space in linear queues implemented as arrays. A circular queue also has a Front and a Rear to keep track of the elements to be deleted and inserted, and therefore to maintain the unique characteristic of the queue. The assumptions made are:
1. Front always points to the first element.
2. If Front = Rear, the queue is empty.
3. Each time a new element is inserted into the queue, Rear is incremented by one.
4. Each time an element is deleted from the queue, Front is incremented by one.



Example: Circular Queue (figure: a queue of five slots, Q[0] to Q[4], arranged in a ring)

Inserting and deleting elements
Insertion and deletion of elements in a circular queue work the same way as in a linear queue, with one difference. When Rear reaches the last location of the array, it wraps around to the first location. A position vacated by a deletion at the front can then be reused for a new insertion.

(Figure: Before insertion. The queue holds 10 and 20, with Front and Rear marked.)

(Figure: After inserting two elements 30 and 40. The queue holds 10, 20, 30 and 40 and is full.)

(Figure: Deletion in a circular queue.)
Now Q[0] will be available in the queue for another insertion.

Double Ended Queues
A double ended queue is a homogeneous list of elements in which insertion and deletion operations are performed at both ends. Double ended queues are also called deques. There are two types of deques: input-restricted deques and output-restricted deques.

The major operations involved are:
1. Insertion of an element at the Rear end of the queue
2. Deletion of an element from the Front end of the queue
3. Insertion of an element at the Front end of the queue
4. Deletion of an element from the Rear end of the queue

An input-restricted deque allows insertion at only one end, so all of the operations above except the third are valid. An output-restricted deque allows deletion at only one end, so all of the operations above except the fourth are valid.

Priority Queue
In a priority queue, each item added to the queue has a priority associated with it, which determines the order in which items leave the queue. Items with the highest priority are removed first. A priority queue is an abstract data type supporting the following three operations:
- add an element to the queue with an associated priority
- remove the element with the highest priority from the queue and return it
- (optionally) peek at the element with the highest priority without removing it

The simplest way to implement a priority queue data type is to keep an associative array mapping each priority to a list of elements with that priority.



Applications of queues
- The round robin technique for processor scheduling uses queues.
- A railway ticket reservation center is designed using queues to store customer information.
- Printer server routines are designed using queues.

Scheduling and buffering queues
A queue is a natural data structure for a system that serves incoming requests. Most process scheduling and disk scheduling algorithms in operating systems use queues. Computer hardware such as a processor or a network card also maintains buffers in the form of queues for incoming resource requests. A stack-like data structure would cause starvation of the earliest requests, so it is not applicable in such cases. A mailbox or port that saves messages for communication between two users or processes in a system is essentially a queue-like structure.

Search space exploration
Like stacks, queues can be used in traversal algorithms to remember the part of the search space that still needs to be explored. Breadth first search of a graph uses a queue to remember the nodes yet to be visited.

Implementation of queue using array
In this implementation, queue[] has max_no_of_elements slots, and Front points to the position just before the first element. The queue is empty when front == rear.

Inserting an element into a queue
    if (rear == max_no_of_elements - 1)   /* wrap around to the first slot */
        rear = 0;
    else
        rear = rear + 1;
    if (rear == front) {                  /* no free slot left */
        printf("QUEUE OVERFLOW \n");
        if (rear == 0)                    /* undo the increment */
            rear = max_no_of_elements - 1;
        else
            rear = rear - 1;
        break;                            /* fragment assumed to sit inside a switch or loop */
    } else {
        printf("\n Enter the element which you want to insert :\n");
        scanf("%d", &x);
        queue[rear] = x;
    }



Deletion of an element from a queue
    if (front == rear)
        printf(" QUEUE UNDERFLOW \n ");
    else {
        if (front == (max_no_of_elements - 1))   /* wrap around */
            front = 0;
        else
            front = front + 1;
        x = queue[front];                        /* element at the new front */
    }

In a stack, each new data item is stored at the top of the stack. Top points to the top of the stack in the figure. When new data is added, it is stored in the Top position and the Top pointer is increased.

Summary
- An array is a collection of individual values of the same data type stored in adjacent memory locations.
- A stack is a homogeneous collection of items of any one type, arranged linearly with access at one end only, known as the top.
- The two major operations available for a stack are push (adding an element) and pop (deleting an element).
- A queue is a collection of items in which only the earliest added item may be accessed. Its basic operations are add (to the tail), or enqueue, and delete (from the head), or dequeue.
- The major variations of queues are the double ended queue, the circular queue and the priority queue.

Test your Understanding


1. Elements inserted into a stack in the order A, B, C, D are removed in the order
a. ABCD
b. DCBA
c. ADCB
d. None of the above

2. The size of an array can be ---
a. Extended
b. Reduced
c. Either a or b
d. Neither a nor b

Answers
1. b
2. d


Session 4: Linked Lists

Learning Objectives
After completing this chapter, you will be able to:
- Define a linked list
- Implement linked list operations in your program

Linked lists
A linked list can be viewed as a group of items, each of which points to the item in its neighbourhood. An item in a linked list is known as a node. A node contains a data part and one or two pointer parts, which contain the addresses of the neighbouring nodes in the list. A linked list supports dynamic memory allocation, so nodes can be added or removed at run time. This solves the fixed-size problem of an array.

Types of linked lists
The different types of linked lists are: singly linked lists, circular linked lists and doubly linked lists.
Simple/Singly Linked Lists
In a singly linked list, each node contains a data part and an address part. The address part of the node points to the next node in the list.
Node structure of a linked list

Data part | Link part

An example of a singly linked list can be pictured as shown below. Note that each node is pictured as a box, while each pointer is drawn as an arrow. A NULL pointer is used to mark the end of the list.



The head pointer points to the first node in a linked list. If head is NULL, the linked list is empty.

A head pointer to a list

Possible operations on a singly linked list
Insertion: Elements are added at any position in a linked list by linking nodes.
Deletion: Elements are deleted from any position in a linked list by altering the links of the adjacent nodes.
Searching: The list is iterated through to find or display items.
To insert or delete an item at any position of the list, we need to traverse the list starting from its root until we reach the item we are looking for.

Implementation of a singly linked list
Creating a linked list
A node in a linked list is usually a structure in C and can be declared as:

    struct Node {
        int info;
        struct Node *next;
    }; /* end struct */

A node is dynamically allocated as follows (malloc is declared in <stdlib.h>):

    struct Node *p;
    p = malloc(sizeof(struct Node));

For creating the list, the following code can be used:

    scanf("%d", &input_value);
    while (input_value != -999) {
        Current_node = malloc(sizeof(struct Node));
        Current_node->info = input_value;
        Current_node->next = NULL;
        if (root_node == NULL)
            root_node = Current_node;          /* the first node in the list */
        else
            previous_node->next = Current_node;
        previous_node = Current_node;
        scanf("%d", &input_value);
    }



The above code creates the list by reading values until the user enters -999.
Inserting an element
After getting the position and the element to be inserted (stored in a new node, Current_node), the following code inserts the element into the list:

    if (position == 1 || root_node == NULL) {
        Current_node->next = root_node;        /* insert at the beginning */
        root_node = Current_node;
    } else {
        counter = 2;
        temp_node = root_node;
        /* move to the node after which the new node goes (at most the last node) */
        while ((counter < position) && (temp_node->next != NULL)) {
            counter++;
            temp_node = temp_node->next;
        }
        Current_node->next = temp_node->next;
        temp_node->next = Current_node;
    }

The following figure illustrates how a node is inserted at an intermediate position in the list.

To insert a node between two nodes

The following figure illustrates how a node is inserted at the beginning of the list.


To insert a node at the beginning of a linked list

Deleting an element
After getting the element to be removed, the following code removes the first node that contains it:

    temp_node = root_node;
    if (root_node == NULL)
        return;                                  /* empty list: nothing to delete */
    if (temp_node->info == input_element) {      /* the element is in the first node */
        root_node = root_node->next;
        free(temp_node);
        return;
    }
    while (temp_node->next != NULL && temp_node->next->info != input_element)
        temp_node = temp_node->next;
    if (temp_node->next != NULL) {               /* found: temp_node precedes it */
        delete_node = temp_node->next;
        temp_node->next = delete_node->next;
        free(delete_node);
    }

The following figures illustrate the deletion of an intermediate node and the deletion of the first node from the list.


Deleting an intermediate node from a linked list

Deleting the first node

To display the elements of the list:

    temp_node = root_node;
    while (temp_node != NULL) {
        printf("%d\t", temp_node->info);
        temp_node = temp_node->next;
    }

The following figure illustrates the above piece of code.

The effect of the assignment temp_node = temp_node->next



Efficiency and advantages of Linked Lists
Locating a position in a linked list takes the same number of comparisons as in an array, but the advantage lies in the fact that no items need to be moved after an insertion or deletion. Unlike a fixed-size array, a linked list uses only as much memory as it needs (one node per item, plus the space for the links). Individual nodes need not be contiguous in memory.
Doubly Linked List
A more sophisticated kind of linked list is a doubly linked list, or two-way linked list. In a doubly linked list, each node has two links: one pointing to the previous node and one pointing to the next node.
Node structure

Previous Link | Data | Next Link

[Figure: An example of a doubly linked list]

Implementation of a doubly linked list
Adding an element to the list
To add the first node:

    (*first_node)->next = NULL;
    (*first_node)->data = input_element;
    (*first_node)->prev = NULL;

To add a node at the position specified:

    temp_node = *first_node;
    for (counter = 0; counter < position - 1 && temp_node->next != NULL; counter++)
        temp_node = temp_node->next;
    /* link new_node in after temp_node */
    new_node->next = temp_node->next;
    new_node->prev = temp_node;
    if (temp_node->next != NULL)
        temp_node->next->prev = new_node;
    temp_node->next = new_node;



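Deleting a particular element from the list

    temp_node = *first_node;
    if (temp_node->data == input_element) {          /* deleting the first node */
        *first_node = temp_node->next;
        if (*first_node != NULL)
            (*first_node)->prev = NULL;
        free(temp_node);
    } else {
        while (temp_node->next != NULL && temp_node->next->data != input_element)
            temp_node = temp_node->next;
        if (temp_node->next != NULL) {               /* found: unlink it */
            delete_node = temp_node->next;
            temp_node->next = delete_node->next;
            if (delete_node->next != NULL)
                delete_node->next->prev = temp_node;
            free(delete_node);
        }
    }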

Circular Linked Lists
In a circularly linked list, the first and final nodes are linked together. In other words, a circularly linked list can be seen as having no beginning or end. To traverse a circular linked list, begin at any node and follow the links (in either direction, if the list is doubly linked) until you return to the original node. This type of list is most useful when, starting from any one object in the list, you want to visit all the other objects. The pointer that refers to the whole list is usually called the end pointer.
Singly-circularly-linked list
In a singly-circularly-linked list, each node has one link, similar to an ordinary singly-linked list, except that the link of the last node points back to the first node. As in a singly-linked list, new nodes can only be inserted efficiently after a node we already have a reference to. For this reason, it is usual to retain a reference only to the last element of a singly-circularly-linked list: this allows quick insertion at the beginning and also gives access to the first node through the last node's next pointer. The following figure shows a singly circularly linked list.

[Figure: a singly circularly linked list holding 10, 20, 30 and 40]

Doubly-circularly-linked list In a doubly-circularly-linked list, each node has two links, similar to a doubly-linked list, except that the previous link of the first node points to the last node and the next link of the last node points to the first node. As in doubly-linked lists, insertions and removals can be done at any point with access to any nearby node.



The following figure illustrates a doubly circularly linked list.

[Figure: a doubly circularly linked list holding 10, 20, 30 and 40]

Circularly-linked list vs. linearly-linked list
A circularly linked list can be traversed in its entirety starting from any node. In a linear linked list, the head pointer is needed to traverse the entire list; the list cannot be traversed completely from an intermediate pointer. Access to an element in a doubly circularly linked list is easier than in a linear linked list, since the element can be approached from two directions. For example, to access the element in the fourth node of a doubly circularly linked list with five elements, it is enough to start from the last node and traverse the list in the reverse direction to reach the fourth node.
Implementation of a circular linked list
Creating the list

    while (input_element != -999) {
        new_node = (struct node *) malloc(sizeof(struct node));
        new_node->info = input_element;
        if (root_node == NULL)
            root_node = new_node;
        else
            (*last_node)->next = new_node;
        *last_node = new_node;
        scanf("%d", &input_element);
    }
    if (root_node != NULL)
        (*last_node)->next = root_node;      /* close the circle */
    return root_node;



Inserting elements into the list
After getting the position and value to be inserted, the following code can be used:

    new_node = (struct node *) malloc(sizeof(struct node));
    new_node->info = input_element;
    if ((position == 1) || (*root_node == NULL)) {
        new_node->next = *root_node;
        *root_node = new_node;
        if (*last_node != NULL)
            (*last_node)->next = *root_node;     /* last node links to the new first node */
        else {
            *last_node = new_node;               /* the list was empty */
            new_node->next = new_node;
        }
    } else {
        temp_node = *root_node;
        counter = 2;
        while ((counter < position) && (temp_node->next != *root_node)) {
            temp_node = temp_node->next;
            ++counter;
        }
        if (temp_node->next == *root_node)
            *last_node = new_node;               /* inserting after the last node */
        new_node->next = temp_node->next;
        temp_node->next = new_node;
    }

Deleting an element from the list
The following code deletes the node at the front of the list (front_node and rear_node refer to the first and last nodes):

    if (*front_node != NULL) {
        printf("The item deleted is %d\n", (*front_node)->info);
        if (*front_node == *rear_node) {         /* only one node */
            free(*front_node);
            *front_node = *rear_node = NULL;
        } else {
            temp_node = *front_node;
            *front_node = (*front_node)->next;
            (*rear_node)->next = *front_node;    /* keep the list circular */
            free(temp_node);
        }
    }

Stacks and queues using pointers
One disadvantage of using an array to implement a stack or queue is wasted space: most of the time, most of the space in the array is unused. A more elegant and economical implementation of a stack or queue uses a linked list. Here is a sketch of a linked-list-based stack that holds 1, then 5, and then 20 at the bottom:

[Figure: Top -> 1 -> 5 -> 20 -> NULL]

The list consists of three cells, each of which holds a data object and a link to another cell. A variable, top, holds the address of the first cell in the list. An empty stack looks like this: Top -> NULL.
Implementing a stack as a linked list removes the fixed limit on the number of nodes: because a linked list is a dynamic data structure, the stack can grow or shrink as the program demands.
Algorithm to implement stack operations using pointers:
Push

    node = (struct stack *) malloc(sizeof(struct stack));
    printf("\n\n Enter the data ");
    scanf("%d", &node->data);
    node->link = top;
    top = node;

Pop

    if (top == NULL)
        return(1);                       /* error code: stack is empty */
    else {
        printf("\n\n Item deleted is %d ", top->data);
        temp = top;
        top = top->link;
        free(temp);                      /* release the removed cell */
    }



Display

    i = top;
    if (top == NULL)
        return(1);                       /* error code: stack is empty */
    else {
        printf(" \n\n ELEMENTS ARE : \n");
        while (i != NULL) {
            printf("%d\n\n", i->data);
            i = i->link;
        }
    }

Implementation of queues using lists is very similar to the implementation of stacks, except that items join the queue at the back and leave at the front. If the queue is represented by the list [5, 2], adding a new item 3 gives the list [5, 2, 3]. In other words, new items are added to the end of the list, and items are removed from the front.

A pictorial representation of a queue being implemented as a linked list is given below. The variable rear points to the last item in the queue.

[Figure: Front -> 5 -> 2 -> 3 -> NULL, with Rear pointing to the node holding 3]

Algorithm to represent queue operations using pointers
Inserting an element

    new_element->link = NULL;
    if (front == NULL)
        front = new_element;
    else
        rear->link = new_element;
    rear = new_element;

Deleting an element

    if (front != NULL) {
        temp = front;
        front = front->link;
        if (front == NULL)
            rear = NULL;             /* the queue is now empty */
        free(temp);
    }


Summary
A linked list is a collection of elements called nodes, each of which contains a data portion and a pointer to the node that follows it in the linear ordering of the list. A singly linked list is a dynamic data structure that can grow and shrink as operations are performed; each node has a single pointer to its successor. In a doubly linked list, each node has two links, one to its successor and one to its predecessor, which allows bi-directional traversal. A circular linked list has no end, i.e. the link field of the last node does not point to NULL but back to the beginning of the list. Stacks and queues can be implemented more economically with pointers than with arrays.

Test your Understanding


1. The last node of a linear linked list ______.
a. Has the value null
b. Has a next reference whose value is null
c. Has a next reference which references the first node of the list
d. Cannot store any data
2. To delete a node N from a linear linked list, you will need to ______.
a. Set the link in the node that precedes N to point to the node that follows N
b. Set the link in the node that precedes N to point to N
c. Set the link in the node that follows N to point to the node that precedes N
d. Set the link in N to point to the node that follows N
3. Write a function that removes all duplicate elements from a linear linked list.
4. Write a function to print the elements of a singly linked list in reverse order.
5. Write a function to find the largest element in a circular linked list.

Answers 1. b 2. a


Session 6: Sorting and Searching

Learning Objectives
After completing this chapter, you will be able to:
Explain the concepts of sorting and searching
List the advantages of each technique
List the limitations of each technique

Sorting
Sorting refers to ordering data in an increasing or decreasing fashion according to some linear relationship among the data items. Sorting can be done on names, numbers and records. Sorting reduces the time needed to search for an item. For example, it is relatively easy to look up a friend's phone number in a telephone directory because the names in the phone book have been sorted into alphabetical order. This example illustrates one of the main reasons that sorting large quantities of information is desirable: sorting greatly improves the efficiency of searching. If we were to open a phone book and find that the names were not presented in any logical order, it would take an incredibly long time to look up someone's phone number.
Sorting can be performed using several methods:
Selection Sort. In this method, the successive elements are selected in order and placed in their proper sorted positions.
Insertion Sort. In this method, sorting is done by inserting elements into an existing sorted list. Initially, the sorted list has only one element; the other elements are gradually added to the list in their proper positions.
Bubble Sort. In this method, the entire file is passed through several times. Each pass compares each element with its successor and puts the two in the proper order.
Merge Sort. In this method, the elements are divided into partitions until each partition is sorted. Then these partitions are merged, with the elements placed in their proper positions, to get a fully sorted list.



Quick Sort. In this method, an element called the pivot is identified and fixed in its place by moving all the elements less than it to its left and all the elements greater than it to its right.
Radix Sort. In this method, sorting is done based on the place values of the digits of the numbers, on the less significant digits first. When all the numbers are then sorted on a more significant digit, numbers that have the same digit in that position but different digits in a less significant position are already in order with respect to the less significant position.
Heap Sort. In this method, the file to be sorted is interpreted as a binary tree. An array, which is a sequential representation of a binary tree, is used to implement heap sort.
In this chapter, the focus is on bubble sort, quick sort and heap sort.
The basic premise behind sorting an array is that its elements start out in some random order and need to be arranged from lowest to highest. It is easy to see that the list 1, 5, 6, 19, 23, 45, 67, 98, 124, 401 is sorted, whereas the list 4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345 is not. The property that makes the second one "not sorted" is that there are adjacent elements that are out of order: the first item is greater than the second instead of less, the third is greater than the fourth, and so on. Once this observation is made, it is not very hard to devise a sort that examines adjacent elements to see whether they are in order and swaps them if they are not.
Bubble Sort
This sorting technique is so named because its logic resembles a bubble rising in water: a bubble is small at the bottom and grows bigger as it moves up, so the bubbles are in ascending order of size from the bottom to the top. The method scans through the elements one pair at a time, swapping any adjacent pair it finds to be out of order.



Example 6.1
Input sequence: 34 8 64 51 32 21

Iteration   Sequence after the iteration   Number of swaps
1           8 34 51 32 21 64               4
2           8 34 32 21 51 64               2
3           8 32 21 34 51 64               2
4           8 21 32 34 51 64               1
5           8 21 32 34 51 64               0
6           8 21 32 34 51 64               0

Each pass compares each element in the file with its successor (x[i] > x[i+1]) and swaps the two elements if they are not in the proper order. After pass i (counting passes from 0), the largest remaining element is in its proper position x[n-(i+1)] within the sorted array.
Bubble Sort - Algorithm

    #define TRUE  1
    #define FALSE 0

    void bubble(int x[], int n)
    {
        int hold, j, pass;
        int switched = TRUE;

        /* the loop stops if there is no swap in a pass */
        for (pass = 0; pass < n - 1 && switched == TRUE; pass++) {
            switched = FALSE;
            for (j = 0; j < n - pass - 1; j++)
                if (x[j] > x[j + 1]) {
                    switched = TRUE;
                    /* swap x[j], x[j+1] */
                    hold = x[j];
                    x[j] = x[j + 1];
                    x[j + 1] = hold;
                }
        }
    }

In the first pass, n-1 pairs have to be compared, and the largest item moves to the end. On the second pass, the second largest item moves to its correct position, and on the third pass (stopping at item n-3) the third largest item is in place. It is this gradual filtration, or bubbling, of the larger items to the top end that gives this sorting technique its name.



There are two ways in which the sort can terminate with everything in the right order. It could complete by reaching the (n-1)th pass, which places the second smallest item in its correct position. Alternatively, it could find on some earlier pass that nothing needs to be swapped, i.e. all adjacent pairs are already in the correct order. In this case there is no need to go on to subsequent passes, because the sort is already complete. If the list started in sorted order, this would happen on the very first pass; if it started in reverse order, it would not happen until the last one.
Quick Sort
In this sort, an element called the pivot is identified and fixed in its place by moving all the elements less than it to its left and all the elements greater than it to its right. Since it partitions the element sequence into a left part, the pivot and a right part, it is referred to as sorting by partitioning. Instead of moving a single element towards its place, a pair of elements is moved in a single swap, which makes the sort quick. After the partitioning, each of the sub-lists is sorted in the same way, which causes the entire array to be sorted.

    void quickSort(int first, int last)
    {
        int mid;
        if (first < last) {              /* if the part being sorted isn't empty */
            mid = quickPartition(first, last);
            if (mid - 1 > first)
                quickSort(first, mid - 1);
            if (mid + 1 < last)
                quickSort(mid + 1, last);
        }
    }

The hardest part of quick sort is the partitioning of elements. The algorithm looks at the first element of the array (called the "pivot"). It puts all of the elements that are less than the pivot in the lower portion of the array and the elements higher than the pivot in the upper portion. When that is complete, it can put the pivot between those two sections, and quick sort can then sort the two sections separately. The details of the partitioning algorithm depend on counters that move from the ends of the array toward the center.
Each will move until it finds a value which is in the wrong section of the array (larger than the pivot and in the lower portion or less than the pivot and in the upper portion). Those entries will be swapped to put them into their appropriate sections and the counters will continue searching for out of place values. When the two counters cross, partitioning is complete and the pivot can be swapped to its proper place between the two sections.



    int quickPartition(int first, int last)
    {
        int mid_val = data[first];       /* this is the pivot value */
        int i = first + 1;
        int j = last;

        while (i <= j) {
            while ((i < last) && (data[i] <= mid_val))
                i++;
            while ((j >= first) && (data[j] > mid_val))
                j--;
            if (i < j)
                swap(i, j);              /* exchange data[i] and data[j] */
            else
                i++;
        }
        if (j != first)
            swap(j, first);              /* place the pivot between the two sections */
        return j;
    }

Here data is the array being sorted, and swap(i, j) exchanges data[i] and data[j].

Example 6.2
Input sequence: 34, 8, 64, 51, 32, 21
Square brackets demarcate the sub-files yet to be sorted; m and n are the first and last positions of the sub-file taken up next.

R1    R2    R3    R4    R5    R6     m   n
[34   8     64    51    32    21]    1   6
[32   8     21]   34    [51   64]    1   3
[21   8]    32    34    [51   64]    1   2
[8]   21    32    34    [51   64]    1   1
8     21    32    34    [51   64]    5   6
8     21    32    34    51    [64]   6   6

Heap Sort
In heap sort, the file to be sorted is interpreted as a binary tree. The technique is implemented using an array, which is a sequential representation of a binary tree. The position of a node is given as follows: for a node at position i, the parent is at position i/2, the left child is at position 2i and the right child is at position 2i+1 (children exist only if 2i and 2i+1 are at most n). Heap sort is a two-stage method. In the first stage, the tree representing the input data is converted into a heap. A heap can be defined as a complete binary tree with the property that the value of each node is at least as large as the values of its children. This, in turn, places the largest key at the root of the tree. In the second stage, the output sequence is generated in decreasing order by outputting the root and restructuring the remaining tree into a heap.



Example 6.3
The list of numbers 34, 8, 64, 51, 32, 21 is initially arranged in an array, as in the Input File below. Here the value of n is 6, so the last parent is at index 6/2 = 3. The parent 64 (index 3) is compared with its largest child 21; since 64 > 21, it is retained in its position. The parent 8 (index 2) is compared with its largest child 51, and they are interchanged since 8 < 51. Now the root 34 (index 1) is compared with its largest child 64, and they are interchanged since 34 < 64, as shown in the Initial Heap.

[Figure: Input File and Initial Heap, each drawn as a binary tree]

In Fig. 6.3(a), the largest number, 64, which was brought to the root, is interchanged with the last element in the tree, 21 (index 6). For easy identification of the elements already arranged, the edge to their parent is removed. In Fig. 6.3(b), the same procedure brings 51 to the root, and it is interchanged with the element at index 5. The same step is followed in Fig. 6.3(c) and Fig. 6.3(d) to get the sorted file shown in Fig. 6.3(e).

[Figures 6.3 (a), 6.3 (b), 6.3 (c) and 6.3 (d): successive heap sort steps, each drawn as a binary tree]

[Figure 6.3 (e): Sorted File]

Algorithm 6.3.1: Heap Sort implementation
heap() sorts the given set of numbers using the heap sort technique, where n is the number of elements and a is the array representation (a[1] to a[n]) of the elements in the input binary tree. Algorithm 6.3.1 calls the adjust routine (Algorithm 6.3.2) each time heaping is needed.

    void heap(int a[], int n)
    {
        int i, t;
        for (i = n / 2; i >= 1; i--)      /* stage 1: build the initial heap */
            adjust(a, i, n);
        for (i = n; i >= 2; i--) {        /* stage 2 */
            t = a[i];                     /* exchange the root with a[i] */
            a[i] = a[1];
            a[1] = t;
            adjust(a, 1, i - 1);          /* restore the heap on a[1..i-1] */
        }
    }



Algorithm 6.3.2

    int adjust(int x[], int i, int n)
    {
        int item, j;
        j = 2 * i;
        item = x[i];
        while (j <= n) {
            if ((j < n) && (x[j] < x[j + 1]))
                j = j + 1;                /* j is now the larger child */
            if (item >= x[j])
                break;
            x[j / 2] = x[j];              /* move the child up one level */
            j = 2 * j;
        }
        x[j / 2] = item;
        return 0;
    }

Searching
Searching is the process of locating a particular element in a given set of elements. The element may be a record, a table, or a file. A search algorithm is an algorithm that accepts an argument a and tries to find an element whose value is a. The search for a particular element is unsuccessful if that element does not exist in the set. There are a number of searching techniques; linear search and binary search are discussed in this session.
Linear Search
In linear search, the list is searched sequentially: the position is returned if the key element is in the list; otherwise -1 is returned. The search starts at the beginning of an array and moves towards the end, testing for a match at each item. All the elements preceding the search element are examined before it, i.e. if the element to be searched is in position 10, all elements from 1 to 9 are checked before position 10.



Algorithm: Linear search implementation

    #include <stdbool.h>

    bool linear_search(int *list, int size, int key, int **rec)
    {
        /* Basic linear search */
        bool found = false;
        int i;
        for (i = 0; i < size; i++) {
            if (key == list[i])
                break;
        }
        if (i < size) {
            found = true;
            *rec = &list[i];              /* hand the matching element back to the caller */
        }
        return found;
    }

The code searches for the element with a loop that runs from 0 to size - 1. The loop can terminate in one of two ways. If the index variable i reaches the end of the list, the loop condition fails. If the current item in the list matches the key, the loop is terminated early with a break statement. The algorithm then tests whether the index variable is less than size (the loop was terminated early and the item was found) or not (the item was not found).
Example 6.4
Assume the element 45 is searched for in the sorted sequence 12, 18, 25, 36, 45, 48, 50. Linear search starts from the first element, 12. Since the value to be searched (45) is not 12, the next element, 18, is compared, which is also not 45. In this way all the elements before 45 are compared; when the index is 5, the element 45 is compared with the search value and they are equal, so the element is found at position 5.

i   List                       Result of comparison
1   12 18 25 36 45 48 50       12 <> 45 : false
2   12 18 25 36 45 48 50       18 <> 45 : false
3   12 18 25 36 45 48 50       25 <> 45 : false
4   12 18 25 36 45 48 50       36 <> 45 : false
5   12 18 25 36 45 48 50       45 = 45 : true



Binary Search
In a linear search, the entire list is searched even if the element to be searched is not available. Some improvements reduce the cost of traversing the whole data set, but those improvements only cover up what is really a limitation of the algorithm. By thinking of the data in a different way, we can make speed improvements that are much better than anything linear search can guarantee. Consider a list in sorted order. Searching from the beginning until the item is found or the end is reached would work, but it makes more sense to discard as much of the working data set as possible so that the item is found more quickly. If we start at the middle of the list, we can determine which half the item is in (because the list is sorted). This divides the working range in half with a single test, which reduces the time complexity to O(log n).
Algorithm:

    #include <stdbool.h>

    bool Binary_Search(int *list, int size, int key, int **rec)
    {
        bool found = false;
        int low = 0, high = size - 1;
        while (high >= low) {
            int mid = (low + high) / 2;
            if (key < list[mid])
                high = mid - 1;           /* continue in the left part */
            else if (key > list[mid])
                low = mid + 1;            /* continue in the right part */
            else {
                found = true;
                *rec = &list[mid];
                break;
            }
        }
        return found;
    }



Example 6.5
Binary search is applied to the data of Example 6.4 (12 18 25 36 45 48 50). The active part of the search runs from position i to position j (positions counted from 1).

i   j   mid   Result of comparison
1   7   4     45 > 36 : right part
5   7   6     45 < 48 : left part
5   5   5     45 = 45 : found

Method of search   Advantages                                        Disadvantages
Linear             Simple; elements need not be in order             Less efficient, since its time complexity, O(n), is higher than that of binary search
Binary             More efficient, since its time complexity,        Not as simple as linear search; elements must be in order
                   O(log n), is lower than that of linear search

Summary
Sorting is the process of arranging elements in either ascending or descending order; it makes searching faster. In bubble sort, each element is compared with its adjacent element, and the largest value is moved to the end on each pass. Quick sort is sorting by partitioning: instead of a single element, a pair of elements is rearranged in one swap. Heap sort sorts by arranging the elements in a heap (a tree); it has the same O(n log n) complexity in its worst, best and average cases. In linear search, all the elements preceding the search element must be examined. In binary search, the middle element is compared, and only the left or the right part is checked further instead of the whole list.


Test your Understanding


1. Which of the following sorts works with the same complexity in all cases?
a. Heap sort b. Quick sort c. Merge sort d. Bubble sort

2. Quick sort works better if the input elements are in
a. Sorted order b. Jumbled order c. Reverse order d. All the above
Answers 1. a 2. b


Session 8: Trees

Learning Objectives
After completing this chapter, you will be able to:
Describe a tree
Explain how a tree can be represented internally
Describe how a tree can be traversed

Overview:
The data structures discussed in the previous sessions, such as lists, stacks and queues, are all linear data structures. A tree is one of several types of non-linear data structure. A tree is a collection of nodes arranged in a hierarchical fashion, with a specially designated node called the root. Every node except the root has a parent at the next higher level of the hierarchy. The parent of a node is the node directly above it in the hierarchy. A node can have exactly one parent, i.e. a node can be attached to exactly one node in the level above it.
Example 8.1



The following table depicts some of the important terms related to a general tree structure.

Term               Description                                                    Example
Node               An item or single element represented in a tree                A, B, C, ..., H
Root               Node that does not have any ancestors (parent or grandparent)  A
Sub tree           Internal nodes in a tree, which have both an ancestor          B, C, D
                   (parent) and a descendant (child)
Leaf               External nodes that do not have any descendant (child)         E, F, G, H
Edge               The line depicting the connectivity between two nodes          (A-B), (A-C)
Path               Sequence of connected nodes                                    A-B-E for E from the root
Length             Number of nodes involved in the path                           2 for E from B
Height             Length of the longest path from the root                       3
Depth              Length of the path to that node from the root                  2 for D
Degree of a node   Number of children connected to that node                      3 for A, 1 for B and D, 2 for C, 0 for leaves
Degree of a tree   Degree of the node that has the maximum degree                 3 (since A has the maximum degree)

Some applications of trees are:
- representing family genealogy
- serving as the underlying structure in decision-making algorithms
- representing priority queues (a special kind of tree called a heap)
- providing fast access to information in a database (a special kind of tree called a b-tree)

Binary Tree
A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees, called the left and right sub-trees. In other words, it is a tree in which every node has a degree of at most 2, i.e. a node can have at most two children. A binary tree differs from a general tree in the following aspects:
- A tree must have at least one node, but a binary tree may be empty.
- A tree may have any number of sub-trees, but a binary tree can have at most two.



Example 8.2

Full Binary tree: A binary tree in which every internal node has exactly two children and all the leaf nodes are at the same level is called a full binary tree.

Example 8.3

Complete Binary tree: A binary tree whose array representation is contiguous, with no null (empty) positions in between, is a complete binary tree.



Example 8.4

Array representation of the above tree is:

Index: 0 1 2 3 4
Node:  A B C D E

In a binary tree, the maximum number of nodes at level i (the level of the root node is 1) is $2^{i-1}$, and the maximum number of nodes up to and including level i is $2^i - 1$.

Example 8.5: In example 8.2, the maximum number of nodes at level 2 is $2^{2-1} = 2$, the maximum number of nodes at level 3 is $2^{3-1} = 4$, and the maximum number of nodes up to level 2 is $2^2 - 1 = 3$.

Skewed binary tree: A binary tree is a skewed binary tree if all its internal nodes have only a left child (skewed left) or only a right child (skewed right).



Example 8.6: Skewed left and skewed right binary trees

Tree Representation

A binary tree can be represented in two ways:
1. Array representation
2. Linked list representation

Array representation
The binary tree can be stored in an array, as discussed in heap sort. With the root at index 0, as in Example 8.4, the children of the node at index i are at indices 2i + 1 and 2i + 2.

Linked list representation
Since a binary-tree node never has more than two children, a node can be represented with three fields: one field for the data in the node and the remaining two fields for the two child pointers.

Left child | Data | Right child

The programming representation of a node is as follows:

struct BinaryTreenode {
    struct BinaryTreenode *leftChild;
    char data;
    struct BinaryTreenode *rightChild;
};

Many algorithms pertaining to tree structures usually involve a process in which each node of the tree is visited, or processed, exactly once. Such a process is called a traversal.



Tree Traversals
A tree can be traversed in three different ways:
- Inorder traversal
- Preorder traversal
- Postorder traversal

In all the traversal types, the order of the left and right sub trees does not change, i.e. the left sub tree is always traversed before the right sub tree. The type of traversal is decided by when the data of the root is visited. In preorder traversal, the data is visited before its sub trees are traversed. In postorder traversal, the data is visited after its sub trees are traversed. In inorder traversal, the data is visited between its sub trees.

Simple steps in the traversals:

Preorder traversal
- Visit the root
- Traverse the left sub-tree in preorder
- Traverse the right sub-tree in preorder

Inorder traversal
- Traverse the left sub-tree in inorder
- Visit the root
- Traverse the right sub-tree in inorder

Postorder traversal
- Traverse the left sub-tree in postorder
- Traverse the right sub-tree in postorder
- Visit the root



Example 8.7

Inorder traversal   : D B E A I H J F C G
Preorder traversal  : A B D E C F H I J G
Postorder traversal : D E B I J H F G C A

Algorithms for the tree traversals (each node is a struct btreenode with the fields left, data, and right; the data is printed as a character, matching the letters of Example 8.7):

Inorder traversal
void inorder(struct btreenode *sr)
{
    if (sr != NULL) {
        inorder(sr->left);           /* left sub tree */
        printf("%c\n", sr->data);    /* root */
        inorder(sr->right);          /* right sub tree */
    }
}

Preorder traversal
void preorder(struct btreenode *sr)
{
    if (sr != NULL) {
        printf("%c\n", sr->data);    /* root */
        preorder(sr->left);          /* left sub tree */
        preorder(sr->right);         /* right sub tree */
    }
}

Postorder traversal
void postorder(struct btreenode *sr)
{
    if (sr != NULL) {
        postorder(sr->left);         /* left sub tree */
        postorder(sr->right);        /* right sub tree */
        printf("%c\n", sr->data);    /* root */
    }
}

Binary Search Tree (BST)
A BST is a binary tree that has the following properties:
- All elements stored in the left sub tree of a node whose value is K have values less than K.
- All elements stored in the right sub tree of a node whose value is K have values greater than or equal to K.
- The left and right sub trees of a node are also binary search trees.
That is, a node's left child must have a key less than its parent, and a node's right child must have a key greater than or equal to its parent.



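Example 8.8: A BST with root 63. The children of 63 are 47 and 71; 47 has the right child 54; 71 has the children 67 and 84; and 84 has the children 79 and 91.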

Operations that can be performed on a BST are:
- Creation
- Insertion
- Deletion
- Searching

Creation
The first element in the list is made the root of the tree. Each following element is placed in the left sub tree if it is less than the root and in the right sub tree otherwise, applying the same rule at every node on the way down. In other words, creation is a combination of search and insertion once the root node has been created.

Searching
The search always starts from the root node. If the value to be searched for is less than the node's value, the left sub tree is searched; if it is greater, the right sub tree is searched. The search continues until the node is found or until there is no branch left to follow.

Insertion
To insert a node, first search for the value to be inserted (even though it is not yet in the tree). If the search ends at a node X, insert the new node as the left child of X if the new value is less than X; otherwise, insert it as the right child.



Example 8.9: Inserting 15 in BST The dotted line represents the search and the dotted circle represents the newly added node.

(Figure: the tree of Example 8.8 with 15 added as the left child of 47.)

15 is less than 63 and less than 47, and 47 has no left child; hence 15 is joined as the left child of 47.

Deletion
The node to be deleted is first searched for, starting from the root, to find its position. Deletion is easiest when the node to be deleted is a leaf node: the link from its parent is disconnected. If the node is a non-leaf node, the deletion is carried out as follows:
- If the non-leaf node has a single sub tree, its child takes its place.
- If the non-leaf node has both a left and a right sub tree, it is replaced by either its in-order predecessor (the greatest node in its left sub tree) or its in-order successor (the smallest node in its right sub tree), and that node is removed from its old position.

Example 8.10: Deleting 71 from example 8.9. The dotted line represents the search and the dotted circle represents the node to be deleted.


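(Figure: the tree of Example 8.9 with node 71 marked for deletion.)

Node 71 has both sub trees, so it is replaced either by its left descendant 67 (its in-order predecessor) or by its right descendant 79 (its in-order successor):

Replaced by its left descendant: 63 has the children 47 and 67; 47 has the children 15 and 54; 67 has the right child 84; 84 has the children 79 and 91.

Replaced by its right descendant: 63 has the children 47 and 79; 47 has the children 15 and 54; 79 has the children 67 and 84; 84 has the right child 91.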

Advantage of a BST
Searching for a node in a BST is faster, since at each node only one of the left or right sub trees is searched, instead of comparing against all the nodes that precede it.

Disadvantage of a BST
The tree may become a skewed binary tree if the elements are inserted in ascending order (skewed right) or descending order (skewed left), which leads to more levels.


Summary
- A tree is a collection of nodes arranged in a hierarchical fashion.
- A binary tree is a tree with 2 as its maximum degree.
- A tree can be represented using either an array or a linked list.
- A tree can be traversed in 3 ways: inorder, preorder, and postorder.
- A binary search tree is a binary tree in which all the left descendants of a node are less than it and all its right descendants are greater than or equal to it.

Test your Understanding


1. A complete binary tree is a tree in which ----
   a. All the leaf nodes are in the same level
   b. All the parent nodes have exactly two children
   c. The representation is contiguous without any null branch in between
   d. None of the above
2. A binary search tree must be a ----
   a. Complete binary tree
   b. Full binary tree
   c. Either a or b
   d. Need not be a or b

Answers
1. c
2. d


Session 10: Balanced trees and hashing

Learning Objectives
After completing this chapter, you will be able to:
- Define a balanced tree
- Identify how a balanced tree can be constructed from a binary tree
- Define hashing
- List the advantages and disadvantages of hashing

Overview:
Balanced trees are classified into two categories:
- Height balanced trees
- Weight balanced trees

AVL Tree
An AVL tree is a height balanced binary search tree. In an ordinary BST, if the elements arrive almost in order, the tree has many null branches; this leads to more levels and therefore more comparisons per search. An AVL tree solves this problem by re-balancing the height whenever a node is inserted. Whether re-balancing is needed is decided from the balancing factor.

Balancing factor
The balancing factor of a node is the difference between the heights of its left and right sub trees:

Balancing factor of X = height of left sub tree of X - height of right sub tree of X

If the balancing factor of every node in the tree is -1, 0, or 1, the tree is already balanced; otherwise, balancing is needed.

AVL Tree Rotations
An AVL tree and the nodes it contains must meet strict balance requirements to maintain its O(log n) search capability. These balance restrictions are maintained using rotations. The four possible rotations that can be performed on an unbalanced AVL tree are illustrated below, each showing the tree before and after the rotation.



Example 10.1: LL Rotation

Example 10.2: RR Rotation


Example 10.3: LR Rotation

Example 10.4: RL Rotations



Inserting in an AVL Tree
Nodes are initially inserted into an AVL tree in the same manner as into an ordinary binary search tree (that is, they are always inserted as leaf nodes). After insertion, however, the AVL insertion algorithm travels back along the path it took to find the point of insertion and checks the balance at each node on that path. If a node is found to be unbalanced (that is, it has a balancing factor of -2 or +2), a rotation is performed based on the inserted node's position relative to the unbalanced node. NB: At most one rotation (single or double) is required after an insert operation.

Example 10.5: Constructing an AVL tree for the list of elements 50, 45, 30, 55, 63, 53. In the figures, the upper part of each node shows its balancing factor and the lower part shows its data.
- Insert 50, 45, 30: node 50 gets a balancing factor of 2, so an LL rotation is performed and 45 becomes the root, with children 30 and 50.
- Insert 55: 55 becomes the right child of 50; no rotation is needed.
- Insert 63: node 50 gets a balancing factor of -2, so an RR rotation is performed and 55 takes the place of 50, with children 50 and 63.
- Insert 53: 53 becomes the right child of 50, and the root 45 gets a balancing factor of -2. Since 53 was inserted into the left sub tree of 45's right child, an RL rotation is performed. The final tree has root 50 (balancing factor 0), with left child 45 (balancing factor 1, left child 30) and right child 55 (balancing factor 0, children 53 and 63).

Deletion in an AVL tree
The deletion algorithm for AVL trees is a little more complex, because several extra steps are involved in deleting a node. If the node is not a leaf node (that is, it has at least one child), it must be swapped with either its in-order successor or its in-order predecessor (based on availability). Once the node has been swapped, it can be deleted, and its parent picks up any child it may have (bear in mind that it will have at most one child). If the node to be deleted was originally a leaf node, it can simply be removed. Then, as in the insertion algorithm, we traverse back up the path to the root node, checking the balance of every node along the path. If we encounter an unbalanced node, we perform an appropriate rotation to balance it. NB: Unlike the insertion algorithm, more than one rotation may be required after a delete operation, so in some cases we have to continue back up the tree after a rotation.

Weight Balanced Trees
Tree structures support various basic dynamic set operations, including Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete, in time proportional to the height of the tree. Ideally, a tree is balanced and its height is about log n, where n is the number of nodes in the tree. To keep the height of the tree as small as possible, and therefore provide the best running time, a balanced tree structure such as a red-black tree, AVL tree, or b-tree must be used.

When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primary storage (RAM). Instead, a relatively small portion of the data structure is maintained in primary storage, and additional data is read from secondary storage as needed. Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly slower than RAM. In fact, the system often spends more time retrieving data than actually processing it.

B-trees are weight balanced trees that are optimized for situations in which part or all of the tree must be maintained in secondary storage, such as a magnetic disk. Since disk accesses are expensive (time-consuming) operations, a b-tree tries to minimize the number of disk accesses. For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion keys but requires at most two disk accesses to search for any node (assuming the root node is kept in memory).

B-Trees
The Structure of B-Trees
Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root of a subtree containing all keys greater than any key in the node.

A b-tree has a minimum number of allowable children for each node, known as the minimization factor (also called the minimum degree). If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.

Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree minimizes the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where not all data can reside in primary storage and accesses to secondary storage are comparatively expensive (time-consuming).

Height of B-Trees
For n ≥ 1, any n-key b-tree T of height h with minimum degree t ≥ 2 satisfies

$h \le \log_t \frac{n+1}{2}$

The worst-case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanced tree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search tends to be smaller than in other tree structures. Although this does not affect the asymptotic worst-case height, b-trees tend to have smaller heights than other trees with the same asymptotic height.

Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses, and the nodes are usually stored on disk, this single-pass approach reduces the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are also possible.

Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), every reference to a given node must be preceded by a read operation, denoted Disk-Read. Similarly, once a node is modified and is no longer needed, it must be written out to secondary storage with a write operation, denoted Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.

B-Tree-Search(x, k)
    i <- 1
    while i <= n[x] and k > key_i[x]
        do i <- i + 1
    if i <= n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return NIL
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)

The search operation on a b-tree is analogous to a search on a binary tree. Instead of choosing between a left and a right child, as in a binary tree, a b-tree search must make an n-way choice. The correct child is chosen by performing a linear search of the keys in the node. The scan stops at the first key that is greater than or equal to the desired key: if that key is equal, the search succeeds; otherwise, the child pointer immediately to the left of that key is followed. If all keys are less than the desired key, the rightmost child pointer is followed. Since the number of nodes visited depends on the height of the tree, B-Tree-Search accesses O(log_t n) nodes.

B-Tree-Create(T)
    x <- Allocate-Node()
    leaf[x] <- TRUE
    n[x] <- 0
    Disk-Write(x)
    root[T] <- x

The B-Tree-Create operation creates an empty b-tree by allocating a new root node that has no keys and is a leaf node. Only the root node is permitted to have these properties; all other nodes must meet the criteria outlined previously. The B-Tree-Create operation runs in O(1) time.

B-Tree-Split-Child(x, i, y)
    z <- Allocate-Node()
    leaf[z] <- leaf[y]
    n[z] <- t - 1
    for j <- 1 to t - 1
        do key_j[z] <- key_{j+t}[y]
    if not leaf[y]
        then for j <- 1 to t
                 do c_j[z] <- c_{j+t}[y]
    n[y] <- t - 1
    for j <- n[x] + 1 downto i + 1
        do c_{j+1}[x] <- c_j[x]
    c_{i+1}[x] <- z
    for j <- n[x] downto i
        do key_{j+1}[x] <- key_j[x]
    key_i[x] <- key_t[y]
    n[x] <- n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)

If a node becomes "too full", it is necessary to perform a split operation. The split operation moves the median key of node y into its parent x, where y is the i-th child of x. A new node, z, is allocated, and all keys in y to the right of the median key are moved to z. The keys to the left of the median key remain in the original node y. The new node z becomes the child immediately to the right of the median key that was moved into the parent x, and the original node y becomes the child immediately to the left of that median key. The split operation transforms a full node with 2t - 1 keys into two nodes with t - 1 keys each; the remaining key moves into the parent node. B-Tree-Split-Child runs in O(t) time.

B-Tree-Insert(T, k)
    r <- root[T]
    if n[r] = 2t - 1
        then s <- Allocate-Node()
             root[T] <- s
             leaf[s] <- FALSE
             n[s] <- 0
             c_1[s] <- r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)

B-Tree-Insert-Nonfull(x, k)
    i <- n[x]
    if leaf[x]
        then while i >= 1 and k < key_i[x]
                 do key_{i+1}[x] <- key_i[x]
                    i <- i - 1
             key_{i+1}[x] <- k
             n[x] <- n[x] + 1
             Disk-Write(x)
        else while i >= 1 and k < key_i[x]
                 do i <- i - 1
             i <- i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i <- i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)

To perform an insertion on a b-tree, the appropriate node for the key must be located using an algorithm similar to B-Tree-Search. Next, the key must be inserted into the node. If the node is not full before the insertion, no special action is required; however, if the node is full, it must be split to make room for the new key. Since splitting the node moves one key to the parent node, the parent node must not be full, or another split operation is required. This process may repeat all the way up to the root and may require splitting the root node.

This approach requires two passes. The first pass locates the node where the key should be inserted; the second pass performs any required splits on the ancestor nodes. Since each access to a node may correspond to a costly disk access, it is desirable to avoid the second pass by ensuring that the parent node is never full. To accomplish this, the algorithm presented above splits any full node encountered while descending the tree. Although this approach may result in unnecessary split operations, it guarantees that the parent never needs to be split and eliminates the need for a second pass up the tree. Since a split runs in O(t) time, it has little effect on the O(t log_t n) running time of B-Tree-Insert. Splitting the root node is handled as a special case, since a new root must be created to contain the median key of the old root. Observe that a b-tree grows from the top.

B-Tree-Delete
Deletion of a key from a b-tree is possible; however, special care must be taken to ensure that the properties of a b-tree are maintained. Several cases must be considered. If the deletion reduces the number of keys in a node below the minimum of t - 1 keys, this violation must be corrected by combining nodes (or borrowing a key from a sibling) and possibly reducing the height of the tree. If the deleted key has children, the children must be rearranged.



B-Tree Insertion: the elements 10, 17, 25, 9, 13, 16, 8, 5, 15, 22 are inserted in this order (in the figure, newly added elements are underlined).

(Figure: the b-tree after each insertion. After 15 and 22 have been inserted, the right sub tree of the root 10 is the node (15 17) with the children (13), (16), and (22 25).)

After deleting 16 from the above B-Tree, the right sub tree of the root 10 is the node (15 22) with the children (13), (17), and (25).

Hashing


Hashing is a technique that improves the speed of search by calculating the address of the search element directly with a mathematical formula instead of searching for it.

Symbol Table
A symbol table is a dictionary ADT used in a program: a set of names and their attributes. The attributes stored for a name vary depending on the application.
Name: identifier
Attributes: initial value, list of lines that use the identifier, etc.

The possible operations on a symbol table are:
- Search whether a particular name is in the table
- Retrieve the attributes of that name
- Modify the name and attributes
- Insert a new name and its attributes
- Delete a name and its attributes

Hashing techniques are used to search, insert, and delete these items (names and attributes). Instead of comparing identifiers to perform a search, hashing uses a formula called the hash function, h(x). The arithmetic function h(x) gives the address of x in the table; this address is called the hash address or home address.

Hashing techniques can be classified into two types:
- Static Hashing: the identifiers are stored in a fixed-size table called the hash table. The table size cannot be altered.
- Dynamic Hashing: the identifiers are stored in a dynamically sized hash table. The table size can be altered.

Overflow: An overflow occurs when a new key k1 is hashed into a bucket that is already full, so k1 cannot be inserted there.

Hash Collision: A collision occurs when two different keys produce the same address under the hash function. Suppose that two keys k1 and k2 are such that h(k1) equals h(k2). When a record with key k1 is entered into the table, it is inserted at position h(k1). But when k2 is hashed, because its hash address is the same as that of k1, an attempt may be made to insert the record into the same position where the record with key k1 is stored. Clearly, two records cannot occupy the same position. Such a situation is called a hash collision or hash clash. Collisions cannot be avoided entirely; they are resolved through techniques such as rehashing and double hashing.



There are several kinds of hash functions; four of them are:
- Mid-Square Method
- Division
- Folding
- Digit Analysis (Radix)

Mid-Square Method
In this method, the key value of the identifier x is squared, and the digits (bits) from the middle of the square are used as the address. Since the square depends on all the digits of the key, the addresses are usually different even when some digits of the keys are the same. For example, with A = 67:

$A^2 = 4489_{10} = 10611_8$, and the middle three octal digits, 061, are taken as the address.

In practice, the middle binary bits are used for the address.

Division
The key value is divided by a number M, and the remainder is taken as the address:

$h_D(x) = x \bmod M$

The function returns a bucket address from 0 through M - 1, so the hash table must have at least b = M buckets. If M is a power of 2, then h_D(x) depends only on the least significant bits of x; since programmers tend to give variables names with the same suffix, this results in many collisions. If M is divisible by 2, odd keys are mapped to odd buckets and even keys to even buckets, which biases the hash table and increases collisions. These difficulties can be avoided by choosing M to be a prime number, whose only factors are 1 and M.

Folding
The key x is divided into several parts, which are added together to get the final hash value. Two types of folding are available:
- Shift folding: the parts are simply added together.
  Example: 74568392 gives 74 + 56 + 83 + 92 = 305
- Folding at the boundaries (reverse folding): the parts in even positions are reversed, and then the values are added together.
  Example: 74 + 56 + 83 + 92 becomes 74 + 65 + 83 + 29 = 251



Digit Analysis
This type of hashing is useful for a static file, i.e. when all the identifiers in the table are known in advance. Each identifier x is interpreted as a number using some radix r; the same radix is used for all identifiers in the table. Using this radix, the digits of each identifier are examined, and the digits with the most skewed distributions are deleted. Enough digits are deleted so that the number formed by the remaining digits is small enough to give an address in the range of the hash table.

To Manage Overflow
- The table size can be doubled, but this is wasteful.
- A new page can be added at the end and the identifiers of one page divided between the original page and the new page. However, this complicates the family of hash functions.
- The new identifier can be stored as an overflow, a new page created at the end, and the identifiers of the first page rehashed. However, sometimes no identifier from the first page moves to the new page, which results in a non-uniform distribution.
- The pages can be rehashed in order starting from page 1: if n new pages are added, then n pages starting from page 1 are rehashed. The rehashed pages and the new pages are addressed using r + 1 bits, and the remaining pages keep using r bits.

Summary
- A balanced tree is a tree in which the number of levels is minimized by balancing the height or the weight.
- An AVL tree is a height balanced tree; balancing is done through four possible rotations.
- A B-tree is a weight balanced tree; balancing is done by maintaining the number of elements and sub trees in each node.
- Hashing is the process of calculating the address of an item using a mathematical formula instead of searching for it.

Test your Understanding


1) In an AVL tree, if the balance factors are -2 and 2, the tree has to be rotated using
   a) Right Left
   b) Right Right
   c) Left Right
   d) Left Left
2) Which of the following is not a hashing method?
   a) Mid-Square
   b) Radix
   c) Folding
   d) None of the above

Answers
1) a
2) d


Session 11: Graphs

Learning Objectives
After completing this chapter, you will be able to:
- Represent a graph using an array and a linked list
- Traverse a graph
- Calculate the minimum cost spanning tree
- Calculate the shortest route from a source to all other nodes

Graphs
Introduction
A graph is a collection of nodes, or vertices, connected together through edges, or arcs. Graphs are used to model electrical circuits, chemical compounds, highway maps, and so on. They are also used in the analysis of electrical circuits, finding the shortest route, project planning, linguistics, genetics, social science, and so forth.

Graph Definitions and Notations
A graph G is a pair, G = (V, E), where V is a finite nonempty set, called the set of vertices of G, and E is called the set of edges. Let V(G) denote the set of vertices and E(G) denote the set of edges of a graph G. If the elements of E(G) are ordered pairs, G is called a directed graph or digraph; otherwise, G is called an undirected graph. In an undirected graph, the pairs (u, v) and (v, u) represent the same edge.

Let G be a graph. A graph H is called a sub-graph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G); that is, every vertex of H is a vertex of G, and every edge in H is an edge in G.

A graph can be shown pictorially. The vertices are drawn as circles, and a label inside each circle identifies the vertex. In an undirected graph, the edges are drawn using lines. In a directed graph, the edges are drawn using arrows.



Example 11.1: (a) an undirected graph and (b) a directed graph, each on the vertices A, B, C, D and E [figure not reproduced]

Let G be an undirected graph. Let u and v be two vertices in G. Then u and v are called adjacent if there is an edge from one to the other; that is, (u, v) ∈ E. Let e = (u, v) be an edge in G. We then say that edge e is incident on the vertices u and v. An edge incident on a single vertex is called a loop. If two edges, e1 and e2, are associated with the same pair of vertices, then e1 and e2 are called parallel edges. A graph is called a simple graph if it has no loops and no parallel edges.

There is a path from u to v if there is a sequence of vertices u1, u2, ..., un such that u = u1, un = v, and (ui, ui+1) is an edge for all i = 1, 2, ..., n - 1. Vertices u and v are called connected if there is a path from u to v. A simple path is a path in which all the vertices, except possibly the first and last vertices, are distinct. A cycle in G is a simple path in which the first and last vertices are the same. G is called connected if there is a path from any vertex to any other vertex. A maximal subset of connected vertices is called a component of G.

Let G be a directed graph, and let u and v be two vertices in G. If there is an edge from u to v, that is, (u, v) ∈ E, then we say that u is adjacent to v and v is adjacent from u. The definitions of paths and cycles in G are similar to those for undirected graphs. G is called strongly connected if, for any two vertices u and v in G, there is a path from u to v and a path from v to u.

Graph Representation
A graph can be represented in several ways. Two common ways are adjacency matrices and adjacency lists.

Adjacency Matrix
Let G be a graph with n vertices, where n > 0. Let V(G) = {v1, v2, ..., vn}. The adjacency matrix AG is an n x n matrix such that the (i, j)th entry of AG is 1 if there is an edge from vi to vj; otherwise, the (i, j)th entry is zero. For an undirected graph the matrix is symmetric.

Example 11.2: Adjacency matrices for the graphs in Example 11.1 (row = from, column = to)

(a) Undirected graph
     A  B  C  D  E
A    0  1  1  1  0
B    1  0  0  0  1
C    1  0  0  0  1
D    1  0  0  0  1
E    0  1  1  1  0

(b) Directed graph
     A  B  C  D  E
A    0  1  1  0  0
B    0  0  1  0  0
C    0  0  0  0  1
D    1  0  0  0  1
E    0  1  0  0  0
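As an illustration, the adjacency matrix can be held in a fixed-size two-dimensional C array. The sketch below is not taken from the handout: the type MatrixGraph, the functions matrix_init, matrix_add_edge, matrix_out_degree, matrix_in_degree and matrix_print, and the capacity MAX_VERTICES are hypothetical names, and vertices A to E are assumed to be numbered 0 to 4. It also shows that the out-degree of a vertex is the sum of its row and the in-degree is the sum of its column.

```c
#include <stdio.h>

#define MAX_VERTICES 5   /* assumed capacity: vertices A..E are indices 0..4 */

/* adj[i][j] == 1 means there is an edge from vertex i to vertex j */
typedef struct {
    int n;                                   /* number of vertices in use */
    int adj[MAX_VERTICES][MAX_VERTICES];
} MatrixGraph;

/* Create a graph with n vertices (n <= MAX_VERTICES) and no edges. */
void matrix_init(MatrixGraph *g, int n)
{
    int i, j;
    g->n = n;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            g->adj[i][j] = 0;
}

/* Record the edge u -> v; an undirected edge also records v -> u. */
void matrix_add_edge(MatrixGraph *g, int u, int v, int directed)
{
    g->adj[u][v] = 1;
    if (!directed)
        g->adj[v][u] = 1;
}

/* Out-degree of v: the number of 1s in row v. */
int matrix_out_degree(const MatrixGraph *g, int v)
{
    int j, degree = 0;
    for (j = 0; j < g->n; j++)
        degree += g->adj[v][j];
    return degree;
}

/* In-degree of v: the number of 1s in column v. */
int matrix_in_degree(const MatrixGraph *g, int v)
{
    int i, degree = 0;
    for (i = 0; i < g->n; i++)
        degree += g->adj[i][v];
    return degree;
}

/* Print the matrix with the vertex labels A, B, C, ... */
void matrix_print(const MatrixGraph *g)
{
    int i, j;
    printf(" ");
    for (j = 0; j < g->n; j++)
        printf(" %c", 'A' + j);
    printf("\n");
    for (i = 0; i < g->n; i++) {
        printf("%c", 'A' + i);
        for (j = 0; j < g->n; j++)
            printf(" %d", g->adj[i][j]);
        printf("\n");
    }
}
```

For graph (a), calling matrix_add_edge with directed = 0 for the pairs A-B, A-C, A-D, B-E, C-E and D-E reproduces matrix (a) above.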



Adjacency Lists
Let G be a graph with n vertices, where n > 0. Let V(G) = {v1, v2, ..., vn}. In the adjacency list representation, corresponding to each vertex v there is a linked list such that each node of the linked list contains a vertex u for which (v, u) ∈ E(G). Because there are n vertices, we use an array, A, of size n, such that A[i] is a pointer to the first node of the linked list containing the vertices to which vi is adjacent. Each node has two components, say vertex and link. The component vertex contains the index of a vertex adjacent to vertex i, and link points to the next node in the list.

Example 11.3: Adjacency lists of the graphs in Example 11.1

(a) Undirected graph      (b) Directed graph
A: B -> C -> D            A: B -> C
B: A -> E                 B: C
C: A -> E                 C: E
D: A -> E                 D: A -> E
E: B -> C -> D            E: B
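The array-of-lists layout just described can be sketched in C as below. This is an illustrative sketch, not code from the handout: ListNode, ListGraph, list_init, list_add_arc and list_degree are hypothetical names, MAX_VERTICES is an assumed capacity, and nodes are appended at the tail so each list keeps the order in which edges were added. An undirected edge is stored as two arcs, one in each vertex's list.

```c
#include <stdlib.h>

#define MAX_VERTICES 5   /* assumed capacity: vertices A..E are indices 0..4 */

/* One node of an adjacency list: the components vertex and link. */
typedef struct ListNode {
    int vertex;               /* index of a vertex adjacent to the list's owner */
    struct ListNode *link;    /* next node, or NULL at the end of the list */
} ListNode;

/* A[i] points to the first node of the list of vertices adjacent to vertex i. */
typedef struct {
    int n;
    ListNode *A[MAX_VERTICES];
} ListGraph;

/* Start with n vertices (n <= MAX_VERTICES) and empty lists. */
void list_init(ListGraph *g, int n)
{
    int i;
    g->n = n;
    for (i = 0; i < MAX_VERTICES; i++)
        g->A[i] = NULL;
}

/* Append v to the end of u's list, recording the arc (u, v).
   Returns 0 on success or -1 if malloc fails. */
int list_add_arc(ListGraph *g, int u, int v)
{
    ListNode **tail = &g->A[u];
    ListNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->vertex = v;
    node->link = NULL;
    while (*tail != NULL)          /* walk to the last link field */
        tail = &(*tail)->link;
    *tail = node;
    return 0;
}

/* Number of nodes in u's list: its degree (out-degree in a digraph). */
int list_degree(const ListGraph *g, int u)
{
    int count = 0;
    const ListNode *p;
    for (p = g->A[u]; p != NULL; p = p->link)
        count++;
    return count;
}
```

Compared with the matrix, this uses memory proportional to the number of edges, which is why it is usually preferred for graphs with few edges.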

Operations on Graphs
The operations commonly performed on a graph are as follows:
- Create the graph; that is, store the graph in computer memory using a particular graph representation.
- Clear the graph. This operation makes the graph empty.
- Determine whether the graph is empty.
- Traverse the graph.
- Print the graph.

How a graph is represented in computer memory depends on the specific application. For illustration purposes, we use the adjacency list (linked list) representation of graphs. Therefore, for each vertex v, the vertices adjacent to v (in a directed graph, also called the immediate successors) are stored in the linked list associated with v.
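A sketch of the create, clear, empty and print operations on the adjacency list representation follows, assuming a fixed maximum number of vertices. The names Graph, graph_create, graph_add_arc, graph_is_empty, graph_clear and graph_print are hypothetical, and treating "empty" as "has no vertices" is an assumption; the handout does not define it more precisely. New arcs are inserted at the head of a list, which takes constant time but means neighbours are printed most recently added first.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_VERTICES 10   /* assumed capacity */

typedef struct ListNode {
    int vertex;                /* index of an adjacent vertex */
    struct ListNode *link;     /* next node in the same list */
} ListNode;

typedef struct {
    int n;                     /* number of vertices; 0 means the graph is empty */
    ListNode *A[MAX_VERTICES]; /* A[i]: first node of vertex i's list */
} Graph;

/* Create: a graph with n vertices (n <= MAX_VERTICES) and no edges. */
void graph_create(Graph *g, int n)
{
    int i;
    g->n = n;
    for (i = 0; i < MAX_VERTICES; i++)
        g->A[i] = NULL;
}

/* Store the arc u -> v by inserting v at the head of u's list.
   Returns 0 on success, -1 if memory runs out. */
int graph_add_arc(Graph *g, int u, int v)
{
    ListNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->vertex = v;
    node->link = g->A[u];
    g->A[u] = node;
    return 0;
}

/* Empty: true when the graph has no vertices. */
int graph_is_empty(const Graph *g)
{
    return g->n == 0;
}

/* Clear: free every list node and leave an empty graph. */
void graph_clear(Graph *g)
{
    int i;
    for (i = 0; i < g->n; i++) {
        ListNode *p = g->A[i];
        while (p != NULL) {
            ListNode *next = p->link;
            free(p);
            p = next;
        }
        g->A[i] = NULL;
    }
    g->n = 0;
}

/* Print: one line per vertex, listing its adjacent vertices by label. */
void graph_print(const Graph *g)
{
    int i;
    const ListNode *p;
    for (i = 0; i < g->n; i++) {
        printf("%c:", 'A' + i);
        for (p = g->A[i]; p != NULL; p = p->link)
            printf(" %c", 'A' + p->vertex);
        printf("\n");
    }
}
```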



Graph Traversals
Processing a graph requires the ability to traverse the graph. Traversing a graph is similar to traversing a binary tree, except that it is a bit more complicated. Recall that a binary tree has no cycles, and that starting at the root node we can traverse the entire tree. A graph, on the other hand, might have cycles, and we might not be able to traverse the entire graph from a single vertex (for example, if the graph is not connected). Therefore, we must keep track of the vertices that have been visited, and we must also start a traversal from each vertex that has not yet been visited. This ensures that the entire graph is traversed.

The two most common graph traversal algorithms are the depth first traversal and the breadth first traversal, which are described next. For simplicity, we assume that when a vertex is visited, its index is output, and that each vertex is visited only once. We use the Boolean array visited to keep track of the visited vertices.

Depth First Traversal
The depth first traversal is similar to the preorder traversal of a binary tree. An initial, or source, vertex is chosen to start the traversal. From the most recently visited vertex, the traversal moves to one unvisited vertex adjacent to it and continues from there; it backs up to an earlier vertex only when the current vertex has no unvisited adjacent vertices.

The general algorithm is:

for each vertex v in the graph
    if v is not visited
        start the depth first traversal at v

The general algorithm to do a depth first traversal at a given node v is:
1. Mark node v as visited.
2. Visit the node.
3. For each vertex u adjacent to v:
   if u is not visited, start the depth first traversal at u.

Clearly, this is a recursive algorithm.

Breadth First Traversal
The breadth first traversal of a graph is similar to traversing a binary tree level by level (the nodes at each level are visited from left to right). All the nodes at any level, i, are visited before visiting the nodes at level i + 1. As in the case of the depth first traversal, because it might not be possible to traverse the entire graph from a single vertex, the breadth first traversal also traverses the graph from each vertex that is not visited. Starting at the first vertex, the graph is traversed as much as possible; we then go to the next vertex that has not been visited. In other words, all the vertices adjacent to the current vertex are traversed before the traversal moves further away. To implement the breadth first traversal algorithm, we use a queue.
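Returning to the depth first traversal, its recursive algorithm can be written in C roughly as follows. For brevity this sketch works on an adjacency matrix rather than on adjacency lists, and the names depth_first_traversal, dfs_from and MAX_VERTICES are illustrative assumptions. Instead of printing, each visited vertex index is stored in order[] so the result can be checked; adjacent vertices are examined in increasing index order.

```c
#define MAX_VERTICES 5   /* assumed capacity: vertices A..E are indices 0..4 */

/* Depth first traversal starting at v (steps 1-3 of the algorithm). */
static void dfs_from(int adj[][MAX_VERTICES], int n, int v,
                     int visited[], int order[], int *count)
{
    int u;
    visited[v] = 1;              /* 1. mark v as visited */
    order[(*count)++] = v;       /* 2. visit v (record its index) */
    for (u = 0; u < n; u++)      /* 3. for each vertex u adjacent to v */
        if (adj[v][u] && !visited[u])
            dfs_from(adj, n, u, visited, order, count);
}

/* Traverse the whole graph: start a traversal at every unvisited vertex.
   Fills order[] with the visiting order and returns the number visited. */
int depth_first_traversal(int adj[][MAX_VERTICES], int n, int order[])
{
    int visited[MAX_VERTICES] = {0};
    int v, count = 0;
    for (v = 0; v < n; v++)
        if (!visited[v])
            dfs_from(adj, n, v, visited, order, &count);
    return count;
}
```

With matrix (a) of Example 11.2 as input, this yields 0, 1, 4, 2, 3, that is, A, B, E, C, D.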



The general algorithm is:

a. for each vertex v in the graph
      if v is not visited
         add v to the queue    // start the breadth first search at v
         b. mark v as visited
         c. while the queue is not empty
               c.1. remove vertex u from the queue and visit it
               c.2. retrieve the vertices adjacent to u
               c.3. for each vertex w that is adjacent to u
                       if w is not visited
                          c.3.1. add w to the queue
                          c.3.2. mark w as visited

Example 11.4
The depth first traversal of the undirected graph in Example 11.1 (a) is: A, B, E, C, D
The breadth first traversal of the undirected graph in Example 11.1 (a) is: A, B, C, D, E
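The queue-based algorithm above can be sketched in C with a plain array as the queue. This is a minimal illustration under assumptions, not the handout's code: the function name breadth_first_traversal and MAX_VERTICES are hypothetical, the graph is given as an adjacency matrix, and each vertex is recorded in order[] when it is removed from the queue. Because a vertex is marked as soon as it is enqueued, it enters the queue at most once, so an array of n slots is enough.

```c
#define MAX_VERTICES 5   /* assumed capacity: vertices A..E are indices 0..4 */

/* Breadth first traversal of the whole graph.
   Fills order[] with the visiting order and returns the number visited. */
int breadth_first_traversal(int adj[][MAX_VERTICES], int n, int order[])
{
    int visited[MAX_VERTICES] = {0};
    int queue[MAX_VERTICES];     /* each vertex is enqueued at most once */
    int head = 0, tail = 0, count = 0;
    int v, u, w;

    for (v = 0; v < n; v++) {            /* a. for each vertex v */
        if (visited[v])
            continue;
        queue[tail++] = v;               /* add v to the queue */
        visited[v] = 1;                  /* b. mark v as visited */
        while (head < tail) {            /* c. while the queue is not empty */
            u = queue[head++];           /* c.1 remove u from the queue */
            order[count++] = u;          /*     and visit it */
            for (w = 0; w < n; w++)      /* c.2, c.3 each w adjacent to u */
                if (adj[u][w] && !visited[w]) {
                    queue[tail++] = w;   /* c.3.1 add w to the queue */
                    visited[w] = 1;      /* c.3.2 mark w as visited */
                }
        }
    }
    return count;
}
```

Applied to matrix (a) of Example 11.2, it produces A, B, C, D, E, the breadth first order given in Example 11.4.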

Shortest Path Algorithm
The shortest path algorithm described here works on weighted graphs. Each edge connecting two vertices is assigned a nonnegative real number, called the weight of the edge. A graph with such weighted edges is called a weighted graph. Let G be a weighted graph, let u and v be two vertices in G, and let P be a path in G from u to v. The weight of the path P is the sum of the weights of all the edges on P, which is also called the weight of v from u via P.

Let G be a weighted graph representing a highway structure, and suppose that the weight of an edge represents the travel time. For example, to plan monthly business trips, a salesperson wants to find the shortest path (that is, the path with the smallest weight) from her or his city to every other city in the graph. Many such problems exist in which we want to find the shortest path from a given vertex, called the source, to every other vertex in the graph. This section describes the shortest path algorithm developed by Dijkstra, which is a greedy algorithm.

Shortest Path
Given a source vertex, named vertex in the steps below, the general algorithm is:
1. Initialize the array smallestWeight so that smallestWeight[u] = weights[vertex, u], where weights[vertex, u] is infinity if there is no edge from vertex to u.
2. Set smallestWeight[vertex] = 0.
3. Find the vertex, v, that is closest to vertex among the vertices for which the shortest path has not been determined.
4. Mark v as the (next) vertex for which the smallest weight is found.



5. For each vertex w in G such that the shortest path from vertex to w has not been determined and an edge (v, w) exists, if the weight of the path to w via v is smaller than its current weight, update the weight of w to the weight of v plus the weight of the edge (v, w).

Because there are n vertices, repeat Steps 3 through 5 n - 1 times.
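The five steps can be sketched in C as follows, using an adjacency matrix of weights. The names weights, smallestWeight and vertex follow the text, but the function shortest_path, the found[] array, the NO_EDGE sentinel (INT_MAX standing in for infinity) and the fixed MAX_VERTICES bound are assumptions of this sketch. It finds the closest vertex with a simple O(n^2) scan rather than a priority queue, and it assumes weights are small enough that their sums do not overflow an int.

```c
#include <limits.h>

#define MAX_VERTICES 10      /* assumed capacity */
#define NO_EDGE INT_MAX      /* stands in for "infinity": no edge / unreachable */

/* Dijkstra's algorithm following steps 1-5.
   weights[u][w]: nonnegative weight of edge (u, w), or NO_EDGE if absent.
   On return, smallestWeight[u] is the weight of the shortest path from
   vertex to u, or NO_EDGE if u cannot be reached. */
void shortest_path(int weights[][MAX_VERTICES], int n, int vertex,
                   int smallestWeight[])
{
    int found[MAX_VERTICES];   /* found[u] != 0 once u's shortest path is known */
    int i, u, v, w, minWeight;

    for (u = 0; u < n; u++) {              /* step 1 */
        smallestWeight[u] = weights[vertex][u];
        found[u] = 0;
    }
    smallestWeight[vertex] = 0;            /* step 2 */
    found[vertex] = 1;

    for (i = 0; i < n - 1; i++) {          /* repeat steps 3-5, n - 1 times */
        v = -1;
        minWeight = NO_EDGE;
        for (u = 0; u < n; u++)            /* step 3: closest undetermined vertex */
            if (!found[u] && smallestWeight[u] < minWeight) {
                v = u;
                minWeight = smallestWeight[u];
            }
        if (v == -1)                       /* the rest are unreachable */
            break;
        found[v] = 1;                      /* step 4 */
        for (w = 0; w < n; w++)            /* step 5: try to improve via v */
            if (!found[w] && weights[v][w] != NO_EDGE
                && smallestWeight[v] + weights[v][w] < smallestWeight[w])
                smallestWeight[w] = smallestWeight[v] + weights[v][w];
    }
}
```

For the graph of Example 11.5, assuming the undirected edges A-B = 1, A-C = 2, A-D = 5 and B-D = 2 (A to D numbered 0 to 3), it gives smallestWeight = {0, 1, 2, 3}, so D is reached via A-B-D with cost 3.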

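Example 11.5: Shortest Path

[Figure: weighted graph with source vertex A; edge weights A-B = 1, A-C = 2 and A-D = 5, with B-D = 2 implied by the path cost A-B-D = 3]

Direct costs from source A:
Vertex  Cost  Path
B       1     A-B
C       2     A-C
D       5     A-D

Select A-B (costs adjusted from B):
Vertex  Cost  Path
B       1     A-B
C       2     A-C
D       3     A-B-D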



Therefore A-B-D (3) < A-D (5).

Select A-C:
Vertex  Cost  Path
B       1     A-B
C       2     A-C
D       3     A-B-D

A-B-D (3) is still less than A-D (5), so the cost of D stays 3.

Minimum Spanning Trees
A spanning tree of a connected graph G is a set of |V| - 1 edges that connect all the vertices of the graph.

Suppose we have a group of islands that we wish to link with bridges so that it is possible to travel from one island to any other in the group. Further suppose that (as usual) our government wishes to spend the absolute minimum amount on this project (because other factors, like the cost of using and maintaining these bridges, will probably be the responsibility of some future government). The engineers are able to produce a cost for a bridge linking each possible pair of islands. The set of bridges that enables one to travel from any island to any other at minimum capital cost to the government is the minimum spanning tree.

In general, it is possible to construct multiple spanning trees for a graph G. If a cost, cij, is associated with each edge, eij = (vi, vj), then the minimum spanning tree is the set of edges, Espan, forming a spanning tree, such that C = sum( cij | all eij in Espan ) is a minimum.

Kruskal's Algorithm
This algorithm creates a forest of trees. Initially the forest consists of n single-node trees (and no edges). At each step, we add the cheapest edge that joins two different trees together. An edge whose end points are already in the same tree would form a cycle: it would simply link two nodes that are already connected, so it is not needed.

The steps are:
1. Construct a forest, with each node in a separate tree.
2. Place the edges in a priority queue.
3. Until n - 1 edges have been added:
   i. Keep extracting the cheapest edge from the queue until one is found that does not form a cycle.
   ii. Add it to the forest. Adding it joins two trees together.

Every step joins two trees in the forest together, so that, at the end, only one tree remains in the forest.
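A compact C sketch of these steps follows. It is an illustration under stated assumptions, not the handout's code: sorting the edge array with the standard library qsort stands in for the priority queue, and a simple parent array (a basic union-find structure, not described in the handout) is used to test whether an edge would form a cycle. The names Edge, kruskal, find_root and compare_cost are hypothetical.

```c
#include <stdlib.h>

#define MAX_VERTICES 10   /* assumed capacity */

typedef struct {
    int u, v;     /* end points of the edge */
    int cost;     /* c_ij */
} Edge;

/* Follow parent links to the root that identifies x's tree in the forest. */
static int find_root(int parent[], int x)
{
    while (parent[x] != x)
        x = parent[x];
    return x;
}

/* qsort comparator: cheaper edges first. */
static int compare_cost(const void *a, const void *b)
{
    const Edge *ea = a, *eb = b;
    return (ea->cost > eb->cost) - (ea->cost < eb->cost);
}

/* Sorts edges[] in place, copies the chosen edges into tree[], and returns
   how many were chosen (n - 1 when the graph is connected). */
int kruskal(Edge edges[], int m, int n, Edge tree[])
{
    int parent[MAX_VERTICES];
    int i, chosen = 0;

    for (i = 0; i < n; i++)                  /* step 1: n single-node trees */
        parent[i] = i;
    qsort(edges, (size_t)m, sizeof edges[0], compare_cost);   /* step 2 */

    for (i = 0; i < m && chosen < n - 1; i++) {               /* step 3 */
        int ru = find_root(parent, edges[i].u);
        int rv = find_root(parent, edges[i].v);
        if (ru != rv) {                      /* different trees: no cycle */
            tree[chosen++] = edges[i];       /* add it to the forest */
            parent[ru] = rv;                 /* join the two trees */
        }
        /* same tree: the edge would form a cycle, so it is skipped */
    }
    return chosen;
}
```

As a hypothetical input, the edges A-B = 4, A-C = 1, B-E = 1, A-D = 2 and C-E = 2 (an edge list assumed from the figure of Example 11.6, with A to E numbered 0 to 4) give 4 tree edges with a total cost of 6.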



The following sequence of diagrams illustrates Kruskal's algorithm in operation.

Example 11.6: Kruskal's Algorithm

[Figure: weighted graph on the vertices A, B, C, D and E with edge weights 4, 1, 1, 2 and 2]

First, edge A-C (weight 1) is selected.

Second, edge B-E (weight 1) is selected.

Third, edge A-D (weight 2) is selected.


Fourth, edge C-E (weight 2) is selected. The forest is now a single tree with n - 1 = 4 edges, so the algorithm stops.

Summary
A graph is a collection of nodes connected together by edges. A graph can be traversed using depth first search (DFS) or breadth first search (BFS). The shortest paths from a vertex to all other vertices can be calculated using Dijkstra's algorithm. A spanning tree is an acyclic subgraph that connects all the vertices of a graph. A minimum cost spanning tree can be derived using Kruskal's algorithm.

Test your Understanding


1. In an undirected graph of n nodes, if the number of edges is -----, the graph is a complete graph.
a. 4n*(n-1)
b. n*(n-1)/2
c. n
d. 2n

2. The drawback of using the array (adjacency matrix) representation is -----
a. Memory is wasted if the number of edges is small
b. The in-degree and out-degree of a node cannot be found
c. Both a and b
d. Neither a nor b

Answers: 1. b 2. a


Glossary
Abstract data type: A formal, language-independent description of data elements, the relationships among them, and the operations that act upon them.
Acyclic graph: A graph without any cycles.
Adjacency list: A method that uses linked lists to represent the edges of a graph or network.
Adjacency matrix: A method that uses a matrix to represent the edges of a graph or network.
Adjacent: Two nodes of a graph are adjacent if they are connected by an edge.
Arc: An edge of a graph that establishes a directional orientation between its end points.
AVL tree: A binary search tree in which, for each node, the difference between the height of its left subtree and the height of its right subtree is at most one.
Balance factor: For a node in a binary tree, the difference between the height of its left subtree and the height of its right subtree.
Big-O analysis: A technique in which the time and space requirements of an algorithm are estimated in order-of-magnitude terms.
Binary search: The process of examining the middle value of a sorted array to see which half contains the value in question, and continuing to halve the search range until the value is located.
Binary search tree: A binary tree with the ordering property.
Binary tree: A tree in which each node has at most two subtrees.
Breadth first search: A visiting of all the nodes in a graph that proceeds from each node by first visiting all the nodes adjacent to that node.
B-tree: An efficient, flexible index structure often used in DBMSs on random access files.
Bubble sort: A sort that repeatedly compares adjacent elements of an array and swaps them if they are out of order, until the array is in either ascending or descending order.
Bucket: In bucket hashing, a contiguous region of storage locations.
Children: The nodes pointed to by a node in a tree.
Circular linked list: A linked list in which the last node of the list points to the first node in the list.
Clustering: Occurs when a collision resolution strategy causes keys that have a collision at an initial hashing position to be relocated to the same region within the storage space.
Collision: The condition in which more than one key hashes to the same position with a given hashing function.
Cycle: A path of a graph that originates and terminates at the same node.
Data structure: An abstraction of the elementary data types provided by a language.
Degree: The number of edges of a graph or tree for which the node is an end point.
Depth first search: A visiting of all the nodes in a graph that proceeds from each node by first visiting one node adjacent to that node.
Digraph/Directed graph: A graph in which each edge establishes a directional orientation between its end points.
Dijkstra's algorithm: An algorithm for finding the shortest paths from a source node to the other nodes in a weighted graph.
Directed path: A sequence of directed edges from one node of a graph to another. Each pair of successive edges in the path shares an end point.
Double hashing: A collision processing method in which a second hashing function is used to determine a sequence of storage locations to examine until an available spot is found.
Double linked list: A linked list in which each node has two pointers instead of one. One pointer points to the node preceding that node and the other points to the node following that node in the list.
Edge: Establishes a link between two nodes.
Folding: A method of constructing a hashing function in cases where the key is not an integer value. The non-numeric characters are removed and the remaining digits are combined to produce an integer value.
Full binary tree: A binary tree in which all the leaf nodes are at the same level.
Graph: A structure composed of two sets of objects: a set of nodes and a set of edges.
Hashing: A density-dependent search technique in which the key for a given data item is transformed using a hash function to produce the address.
Heap sort: A sort in which the array is treated like the array implementation of a binary tree and the items are repeatedly manipulated to create a heap, from which the root is removed and added to the sorted portion of the array.
Height balancing: A technique for ensuring that an ordered binary tree remains as full as possible in form.
In-degree of a node: The number of directed edges that terminate at the node.
In-order traversal: A binary tree traversal in which the order of traversal is left subtree, node, and right subtree.
Kruskal's algorithm: An algorithm for finding the minimum spanning tree of a graph.
Leaf: In a tree, a node that has no children.
Length: The number of edges in a path.
Level: All the nodes in a tree whose paths from the root node have the same length.
Linear search: The process of examining the first element in a list and proceeding to examine the elements in order until a match is found.
Link: A pointer from one node to another.
Linked list: A collection of nodes connected through pointers in a linear fashion. Each node is divided into two parts, data and link.
Minimum spanning tree: A collection of edges connecting all the nodes of a graph without any cycle, with the minimum possible total weight.
Node: A structure storing a data item in a linked list, tree, or graph.
Out-degree of a node: The number of directed edges that originate at the node.
Overflow: In linked collision processing, the area in which keys that cause collisions are placed.
Parent: In a tree, the node that points to its children.
Partition: In quick sort, the process of moving the pivot to the location where it belongs in the sorted array and arranging the remaining data items to the left of the pivot if they are less than or equal to the pivot and to the right if they are greater than or equal to the pivot.
Path: A sequence of edges that connects two nodes in a graph.
Pivot: The item used to direct the partitioning in quick sort.
Pointer: A memory location containing the location of another data item.
Post-order traversal: A binary tree traversal in which the nodes are traversed in the order left subtree, right subtree, node.
Preorder traversal: A binary tree traversal in which the nodes are traversed in the order node, left subtree, right subtree.
Priority queue: A queue in which deletion is done according to priority.
Queue: A data structure in which elements are added at one end and removed from the other end; referred to as FIFO.
Quick sort: A relatively fast sorting technique that uses recursion and a partition algorithm.
Rehashing: A method of handling a collision in which a sequence of new hashing functions is applied to the key that caused the collision until an available location for the key is found.
Sibling: Children of the same node.
Stack: A data structure in which the elements are accessed from one end; referred to as LIFO.
Tree: A collection of nodes arranged in a hierarchical fashion.
Weight: A numeric value associated with an edge in a graph.
Weight balancing: Maintaining the number of elements that can be handled in a single node of a tree.


