Professional Documents
Culture Documents
Trees
Data Structure and Algorithms
Overview
Trees (7hrs)
1.Concept and definitions
2.Basic operation in binary tree
3.Binary search tree and insertion /deletions
4.Binary tree traversals (pre-order, post-order and in-order )
5.tree height level and depth
6.Balanced trees
7.AVL balanced trees
8.Balancing algorithm
9.The Huffman algorithm
10.Game tree
11.B- Tree.
R1 R2
T0 T1
Tree- Terminologies
1.Root:
• In a tree, every individual element is called a node.
• The first node (the top most node in the hierarchy) is called
as Root Node.
• Root of a tree is a special and first node from which the tree
originates and it therefore doesn’t have parent.
• A tree can have only one root.
Tree- Terminologies
2.Edge:
• A connecting link between any two nodes is called as edge.
• The arrowheads on the edges indicate the direction of the
connection
• In a tree with 'N' number of nodes there will be a maximum of
'N-1' number of edges.
Tree- Terminologies
3.Parent:
• The node which is a predecessor of any node is called as PARENT
NODE.
• Parent node can also be defined as "The node which has child /
children".
Tree- Terminologies
4.Child:
• The node which is descendant (successor or offspring) of any
node is called as CHILD Node.
• Each node can have arbitrary number of children (or subtrees).
• In a tree, all the nodes except root are child nodes.
Tree- Terminologies
5.Siblings:
• The nodes which belong to same Parent are called as SIBLINGS.
Tree- Terminologies
6.Leaf:
• The nodes with no children are called leaves nodes or external
nodes.
Tree- Terminologies
7.Internal nodes:
• The node which has at least one child is called as INTERNAL
Node
Tree- Terminologies
8.Degree:
• The total number of children of a node is called as DEGREE of
that Node.
• The highest degree of a node among all the nodes in a tree is
called as 'Degree of Tree'
Tree- Terminologies
9.Level:
• In a tree each step from top to bottom is called as a Level and
the Level count starts with '0' and incremented by one at each
level (Step).
• The root node is said to be at Level 0 and the children of root
node are at Level 1 and the children of the nodes which are at
Level 1 will be at Level 2 and so on.
Tree- Terminologies
10.Height:
• The total number of edges from leaf node to a particular node in
the longest path is called as HEIGHT of that Node.
• height of the root node is said to be height of the tree
• height of all leaf nodes is '0'
Tree- Terminologies
11.Depth:
• The total number of egdes from root node to a particular node is
called as DEPTH of that Node
• the total number of edges from root node to a leaf node in the
longest path is said to be Depth of the tree
• depth of the root node is '0'
Tree- Terminologies
12.Path:
• The sequence of Nodes and Edges from one node to another
node is called as PATH between those two Nodes
• Length of a Path is total number of nodes in that path
• the path A - B - E - J has length 4.
Tree- Terminologies
13.Sub tree:
• Each child from a node forms a subtree recursively.
• A child of a parent node is a subtree
• Every node has zero or one or more subtree
General Tree
• A general tree is a tree where each node may have zero or more
children
• General trees are used to model applications such as file systems,
web page structure etc.
General Tree
• Application: File System
General Tree
• Application: Representing an Organization
• Many organizations are hierarchical in nature, such as the military and most
businesses. Consider a company with a president and some number of vice
presidents who report to the president. Each vice president has some number
of direct subordinates, and so on. If we wanted to model this company with a
data structure, it would be natural to think of the president in the root node
of a tree, the vice presidents at level 1, and their subordinates at lower
levels in the tree as we go down the organizational hierarchy.
• Because the number of vice presidents is likely to be more than two, this
company's organization cannot easily be represented by a binary tree. We
need instead to use a tree whose nodes have an arbitrary number of children.
Unfortunately, when we permit trees to have nodes with an arbitrary number
of children, they become much harder to implement than binary trees. To
distinguish them from binary trees, we use the term general tree.
General Tree
• Application: Representing an Organization
• Administrative Organization: as In Figure 10.1 which illustrates
(part) of the administrative structure governing the University of
Liverpool
General Tree
• Application: The HTML tags used to create the web page
• The HTML source code and the tree accompanying the source illustrate
another hierarchy. Notice that each level of the tree corresponds to a level
of nesting inside the HTML tags. The first tag in the source is <html> and
the last is </html> All the rest of the tags in the page are inside the pair.
29
General Tree
• Application: The HTML tags used to create the web page
Binary Tree
• A binary tree is a finite set of elements that is either empty or is
partitioned into three disjoint subsets (Disjoint means that they have
no nodes in common):
1. The first subset contains a single element called the root of the
tree.
2. The other two subsets are themselves binary trees, called left
subtree and right subtree.
3. A left or right subtree can be empty.
Binary Tree
• A Binary Tree, T is recursively defined as follows:
• T is empty (i.e. contains no nodes) OR
• T consists of a root node which is connected to two distinct
binary trees TLeft and TRight
(`distinct' in the sense that TLeft and TRight have no nodes in common).
Binary Tree
Binary Tree
• Tree structures enable efficient access and efficient update to large
collections of data.
• Binary trees in particular are widely used and relatively easy to
implement.
• Binary trees are useful for many things besides searching.
• Just a few examples of applications that trees can speed up include
• prioritizing jobs,
• describing mathematical expressions and
• the syntactic elements of computer programs, or
• organizing the information needed to drive data compression
algorithms.
A
A
A B C
B C
D E F G
A d=0
Depth (d)
B C d=1
d=2
D E F G
No of nodes = 2d + 1 – 1 nodes. Depth d # nodes at depth d # of child nodes
--------------------------------------------------------------
Depth = log(n + 1) – 1 0 1 = 20 2
1 2 = 21 4
Leaf Nodes = 2d 2 4 = 20 8
A A
B C B C
D E F D
Skewed Binary Tree Left Skewed Binary Tree Right Skewed Binary Tree
Dummy Node
9 5 4 1 3 2 -
0 1 2 3 4 5 6
⌊ 2 ⌋
i−1
A B C D E F G • Its parent is at
0 1 2 3 4 5 6
Array representation of the above tree.
TREE
Array representation of the above tree.
A B C D E F G
A B C D E - - A B C - - F G
0 1 2 3 4 5 6 0 1 2 3 4 5 6
• This node can be recursively used to represent a binary tree. If a node doesn’t
have a left child, it will be assigned a null value. If the node doesn’t have a
right child, it will be assigned a null value.
struct node
{
int data; Link to left node Link to right node
node *left;
node *right;
};
Inorder Traversal
• In this traversal method, the left subtree is visited first, then the
root and later the right sub-tree.
• If a binary search tree is traversed in-order, the output will produce
sorted values in an ascending order.
ti
Inorder Traversal
Pre-order Traversal
• In this traversal method, the root is visited first, then the left
subtree is visited and the right sub-tree is visited at last.
ti
Pre-order Traversal
Post-order Traversal
• In this traversal method, the left subtree is visited at first, then the
right sub-tree is visited and the root is visited at last.
ti
Post-order Traversal
Figure: Two Binary Search Trees for a collec on of values. Tree (a) results if values
are inserted in the order 37, 24, 42, 7, 2, 42, 40, 32, 120. Tree (b) results if the same
values are inserted in the order 120, 42, 42, 7, 2, 32, 37, 24, 40.
Recursive Definition:
Insert(N, T) = N if T is empty
= insert(N, T.left) if N<T
= insert(N, T.right)if N>T
• In-order traversal:
• 30->40->50->60->70->80
• Where 40 = in-order predecessor
• And 60 = in-order successor
• We can replace 50 either by 40 (in-order
predecessor) or by 60 (in-order successor)
• Finally delete in-order successor if you
have replaced 50 by in-order successor.
Recursive Definition:
Search(N, T) = false if T is empty
= true if T = N
= search(N, T.left) if N < T
= search(N, T.right)if N > T
• •
AVL Trees
• AVL tree is a height-balanced binary search tree.
• The AVL tree is named after its two Soviet inventors, Georgy
Adelson-Velsky and Evgenii Landis
• AVL is a binary search tree with the following additional property:
• For every node x, the heights of its left and right subtrees differ
by at most 1. (It is called balance factor)
AVL Trees
• In this tree, the balance factor associated with its each node is in
between -1 and +1. therefore, it is an AVL tree.
• A node X with
• BF(X)<0 is called “left-heavy"
• BF(X)>0 is called "right-heavy",
• BF(X)=0 is simply called "balanced".
AVL Trees
• Examples:
AVL Trees
• Complexity:
Algorithm Average case Worst case
AVL Trees
• An insert(65) operation that violates the AVL tree balance property.
insert(65)
• Prior to the insert operation, all nodes of the tree are balanced
• After inserting the node 65, the node 100 is no longer balanced.
AVL Trees
• An insert(5) operation that violates the AVL tree balance property.
insert(5)
• Prior to the insert operation, all nodes of the tree are balanced
• After inserting the node 5, the node 24 and 7 are no longer balanced.
AVL Trees
• An delete(25) operation that violates the AVL tree balance property.
delete(25)
• Prior to the delete operation, all nodes of the tree were balanced
• After deleting the node 25, the node 50 is no longer balanced.
Balancing Algorithm
Balancing Algorithm
• In AVL tree, after performing operations like insertion and deletion
we need to check the balance factor of every node in the tree.
• If every node satisfies the balance factor condition then we conclude
the operation otherwise we must make it balanced.
• Whenever the tree becomes imbalanced due to any operation we
use rotation operations to make the tree again balanced.
• Rotation is the process of moving nodes either to left or to right to
make the tree balanced.
• AVL tree permits difference (balance factor) to be only 1.
Balancing Algorithm
• If the balance factor is more than 1, then the tree is rebalanced using
some rotation techniques.
• There are four cases due to which AVL tree may become unbalanced:
Case 1:
A node is inserted in the left subtree of left subtree of a node.
Case 2:
A node is inserted in the right subtree of right subtree of a node.
Case 3:
A node is inserted in the right subtree of left subtree of a node.
Case 4:
A node is inserted in the left subtree of right subtree of a node.
Balancing Algorithm
• To deal with the four cases that can violate the AVL property, there
are four different rotations to rebalance the AVL tree.
• AVL Rotations:
1. Left rotation
2. Right rotation
3. Left-Right rotation
4. Right-Left rotation
• The first two rotations are single rotations and the next two
rotations are double rotations.
• To have an unbalanced tree, we at least need a tree of height 2.
Balancing Algorithm
Case Case description that causes Case Name Required
Diagram to violate AVL property Rotation
Balancing Algorithm
• In above example, node C has balance factor 2 because a node A is inserted in the
left subtree of left subtree of C. As depicted, the unbalanced node becomes the
right child of its left child B after performing a right rotation.
RR Case LL Case
Left Right
Rotation Rotation
Right Left
Rotation Rotation
Right
Rotation
Huffman Algorithm
• Fixed-length codes: ASCII coding
• Standard ASCII coding scheme assigns a unique eight-bit value
to each character.
• Capable to represent 256 unique characters
• ASCII coding takes ⌈log 128⌉ or seven bits to provide the 128
unique codes needed to represent the 128 symbols of the ASCII
character set.
Huffman Algorithm
• Below Table shows the relative frequencies
of the letters of the alphabet. From this
table we can see that the letter 'E' appears
about 60 times more often than the letter
'Z'.
• If some characters are used more frequently
than others, is it possible to take advantage
of this fact and somehow assign them
shorter codes? The price could be that other
characters require longer codes, but this
might be worthwhile if such characters
appear rarely enough.
• variable-length codes (Example Huffman
coding)
Huffman Algorithm
• Huffman coding assigns codes to characters such that the length of the
code depends on the relative frequency or weight of the corresponding
character. Thus, it is a variable-length code.
• The Huffman code for each letter is derived from a full binary tree
called the Huffman coding tree, or simply the Huffman tree.
• Each leaf of the Huffman tree corresponds to a letter, and we define the
weight of the leaf node to be the weight (frequency) of its associated
letter.
• The goal is to build a tree with the minimum external path weight.
• Define the weighted path length of a leaf to be its weight times its
depth.
• A letter with high weight should have low depth, so that it will count the
least against the total path length. As a result, another letter might be
pushed deeper in the tree if it has less weight.
Huffman Algorithm
• Developed by David Hu man in 1951, this technique is the basis for
all data compression and encoding schemes
• It is a famous algorithm used for lossless data encoding
• It follows a Greedy approach, since it deals with generating
minimum length prefix-free binary codes
• It uses variable-length encoding scheme for assigning binary codes
to characters depending on how frequently they occur in the given
text. The character that occurs most frequently is assigned the
smallest code and the one that occurs least frequently gets the
largest code
• Uses a binary tree to determine the encoding of each character, to decode
an encoded le – i.e., to decompress a compressed le, pu ng it back into
ASCII
ff
fi
tti
Huffman Algorithm
• Example of a Hu man tree (for a text with only six chars):
Huffman Algorithm
• The major steps involved in Huffman coding:
2. Step II - Assigning the binary codes to each symbol by traversing
Huffman tree
• Generally, bit ‘0’ represents the left child and bit ‘1’
represents the right child
Huffman Algorithm
• Building the Huffman tree for n letter:
1. Create a collection of n initial Huffman trees, each of which is
a single leaf node containing one of the letters.
2. Put the n partial trees onto a priority queue organized by
weight (frequency).
3. Remove the first two trees (the ones with lowest weight) from
the priority queue. Join these two trees together to create a
new tree whose root has the two trees as children, and whose
weight is the sum of the weights of the two trees. Put this new
tree back into the priority queue.
4. This process (step 3) is repeated until all of the partial Huffman
trees have been combined into one.
Huffman Algorithm
A.Crea ng the Hu man Tree
1.Create a leaf node for each character and build a min heap using all the
nodes (The frequency value is used to compare two nodes in min heap)
2.Repeat Steps 3 to 5 while heap has more than one node
3.Extract two nodes, say x and y, with minimum frequency from the heap
4.Create a new internal node z with x as its le child and y as its right child.
Also frequency(z)= frequency(x)+frequency(y)
5.Add z to min heap
6.Last node in the heap is the root of Hu man tree
ff
ft
Huffman Algorithm
B.Traversing the Hu man Tree (Encoding processes)
1.Create an auxiliary array
2.Traverse the tree star ng from root node
3.Assign 0 to array while traversing the le child and assign 1 to array while
traversing the right child
4.Print the array elements whenever a leaf node is found
ft
Characters Frequencies
a 10
e 15
i 12
o 3
u 4
s 13
t 1
ft
(a)
(b)
Dr. Udaya R. Dhungana, Pokhara University, Nepal. udaya@pu.edu.np 141
ff
ff
ft
ff
ft
Student Practice Huffman Algorithm: Example
• Build a Huffman tree and Huffman coding for the following
characters with given frequencies:
Game Tree
• A game tree is a type of recursive search function that examines all
possible moves of a strategy game, and their results, in an attempt
to ascertain the optimal move.
• They are very useful for Artificial Intelligence
• The most commonly-cited example is chess
• A game tree is a directed graph whose nodes are positions in a game
and whose edges are moves.
• The complete game tree for a game is the game tree starting at the
initial position and containing all possible moves from each position;
the complete tree is the same tree as that obtained from the
extensive-form game representation.
Game Tree
• Game trees are generally used in board games to determine the best
possible move. For example: Tic-Tac-Toe.
• The idea is to start at the current board position, and check all the
possible moves the computer can make.
• Then, from each of those possible moves, to look at what moves the
opponent may make. Then to look back at the computers.
• Ideally, the computer will flip back and forth, making moves for
itself and its opponent, until the game's completion.
• It will do this for every possible outcome, effectively playing
thousands (often more) of games.
• From the winners and losers of these games, it tries to determine
the outcome that gives it the best chance of success.
Game Tree
• This can be likened to how people think during a board game - for
example "if I make this move, they could make this one, and I could
counter with this" etc.
• Game trees do just that. They examine every possible move and every
opponent move, and then try to see if they are winning after as many
moves as they can think through.
• As one could imagine, there are an extremely large number of possible
games.
• In Tic-Tac-Toe, there are 9 possible first moves (ignoring rotation
optimization). There are 8 on the second turn, then 7, 6, etc. In total,
there are 255,168 possible games.
• The goal of a game tree is to look at them all (on the first move) and
try to choose a move that will, at worst, make losing impossible, and,
at best, win the game.
Game Tree
Figure: Game tree of Tic-Tac-Toe with the possible combinations of the rst two moves
B Tree
• In search trees like binary search tree, AVL Tree, Red-Black tree,
etc., every node contains only one value (key) and a maximum of
two children.
• But there is a special type of search tree called B-Tree in which a
node contains more than one value (key) and more than two
children.
• B-Tree was developed in the year 1972 by Bayer and McCreight with
the name Height Balanced m-way Search Tree. Later it was named
as B-Tree.
B Tree
• Definition:
• B-Tree is a self-balanced search tree in which every node contains
more than one key and has more than two children.
• It is also known as a height-balanced m-way tree.
Properties of B-tree
• The number of keys in a node and number of children for a node
depends on the order of B-Tree. Every B-Tree has an order.
• B-Tree of Order m has the following properties:
1. All leaf nodes must be at same level.
2. All nodes except root must have at least [m/2]-1 keys and maximum
of m-1 keys.
3. All non leaf nodes except root (i.e. all internal nodes) must have at
least m/2 children (and at most m children).
4. If the root node is a non leaf node, then it must have at least 2 children
and contains a minimum of 1 key..
5. A non leaf node with n-1 keys must have n number of children.
6. All the key values in a node must be in Ascending Order.
B-tree: Example
• B Tree of order 4
1 Search O(log n)
2 Insert O(log n)
3 Delete O(log n)
fi
fi
Step 8 Step 9
Dr. Udaya R. Dhungana, Pokhara University, Nepal. udaya@pu.edu.np 175
Insertion Operation in B-Tree
Step 8 Step 9
Step 11 Step 10
Step 12 Step 13
• Look at https://www.programiz.com/dsa/deletion-from-a-b-tree
B Tree B+ Tree
Data is stored in leaf nodes as well as Data is stored only in leaf nodes.
internal nodes.
Searching is a bit slower as data is stored Searching is faster as the data is stored only
in internal as well as leaf nodes. in the leaf nodes.
No redundant search keys are present. Redundant search keys may be present.
Leaf nodes cannot be linked together. Leaf nodes are linked together to form a
linked list.
Graph is a Tree If …
• An n-vertex graph G(V,E) is a tree if:
• There is a path of edges between any two vertices and there are
exactly n-1 edges.
• There is a path of edges between any two vertices and there are
no cycles in G.
• There is a unique path between any two vertices.
THANK YOU
180