
Graduate Studies Program

Term: Fall 2022/2023

Computing

Lecture 2
Linear Dynamic Data Structures (Draft)
These lecture notes have been compiled from different resources;
I would like to thank the authors who made them available for use.

1
Lecture Outline

✓Dynamic Data Structures


➢Linked Lists
➢Stacks
➢Queues
✓Hash Tables
✓Hashing and Searching Techniques

2
Dynamic Data Structures
✓Dynamic data structures
➢Grow and shrink at execution time.
✓Linear Data Structures
➢Linked Lists:
✓“Lined up in a row”
✓Insertions and removals can occur anywhere in the list
➢Stacks: Insertions and removals only at top
✓Push and pop
➢Queues: Insertions made at back, removals from front
✓Non-Linear Data Structures
➢Trees, including binary trees:
✓Facilitate high-speed searching and sorting of data
✓Efficient elimination of duplicate items

3
Structures
◼ Data of various types or multiple data items of the same type
can be attributed to a single object.
◼ Structured programming languages provide a construct that
allows us to give a collective name to the object that we create.
◼ In C/C++ this construct is known as a structure and is
introduced with the keyword struct.
◼ For example, a person can possess both a name and an age,
which would be stored in different kinds of variables -- a char
array for the name and probably an int for the age.
◼ In mathematics, a point on a graph would be represented by
numbers for the coordinates, both of which could be integers or
floats.
◼ For example, we could use xcoord and ycoord for the
coordinates of a point.

4
Structures
◼ The definition below groups four characteristics of a rectangle
into a single struct called rectangle.

struct rectangle {
float L; /* length */
float W; /* width */
float A; /* Area */
float P; /* Perimeter */
};

◼ This definition creates a new programmer-defined data type; the
definition itself does not reserve any memory.
◼ Consequently, no data can be stored in rectangle itself.
◼ To get memory allocated, we need to declare a variable of
this type, such as: struct rectangle rect;

5
Dynamic Memory Allocation
✓Self-Referential Class
➢Contains a reference member to an object of the same class type
✓E.g.:
class Node
{
    private int data;  // data stored in this node
    private Node next; // self-reference to the next node
}
✓Reference can be used to link objects of the same type together
✓Dynamic data structures require dynamic memory allocation
➢Ability to obtain memory when needed
➢Release memory when not needed any more
➢Uses new operator
✓Ex: Node nodeToAdd = new Node(10);

[Figure: two self-referential class objects (nodes), with data values 15 and 10, linked together through their next references.]
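The linked nodes in the figure above can be sketched in Java; the Node class follows the slide, while the buildChain helper is an assumption added for illustration:

```java
// A self-referential class: each Node holds data and a reference
// to another object of the same type.
class Node {
    int data;
    Node next; // self-reference: links this node to the next one

    Node(int data) {
        this.data = data;
        this.next = null; // no next node yet
    }
}

class NodeDemo {
    // Build the two-node chain from the figure: 15 -> 10
    static Node buildChain() {
        Node first = new Node(15);
        Node second = new Node(10);
        first.next = second; // link the first node to the second
        return first;
    }
}
```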


6
Linked Lists
✓ Linked List:
➢ Linear collection of self-referential nodes connected by links
✓Nodes: class objects of linked-lists
✓Programs access linked lists through a reference to first node
✓Subsequent nodes accessed by link-reference members
✓Last node’s link set to null to indicate end of list
✓Nodes can hold data of any type
✓Nodes created dynamically
[Figure: a linked list storing the characters 'H', 'e', …, 'o'; firstNode is the reference to the first node, and the last node's link is null.]

7
Linked Lists
✓Linked lists - similar to arrays, however:
➢Arrays have a fixed size
➢Linked lists have no limit to size
✓More nodes can be added as program executes
✓Insert At Front
[Figure: (a) list 7 → 11, with a new ListNode containing 12; (b) the new node's link is aimed at the old first node, and firstNode now references the new node. Result: 12 → 7 → 11.]

8
Linked Lists
◼ Insert At Back

[Figure: (a) list 12 → 7 → 11, with a new ListNode containing 5; (b) the old last node's link is aimed at the new node, and lastNode now references it. Result: 12 → 7 → 11 → 5.]
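The two insert operations above can be sketched with a minimal singly linked list (class and method names are assumptions for this sketch):

```java
// Minimal singly linked list supporting insertAtFront and insertAtBack,
// mirroring the two figures above.
class ListNode {
    int data;
    ListNode next;
    ListNode(int data) { this.data = data; }
}

class LinkedListSketch {
    ListNode firstNode; // reference to first node
    ListNode lastNode;  // reference to last node

    // Insert at front: aim the new node at the old first node,
    // then make firstNode reference the new node.
    void insertAtFront(int value) {
        ListNode node = new ListNode(value);
        if (firstNode == null) {        // empty list
            firstNode = lastNode = node;
        } else {
            node.next = firstNode;
            firstNode = node;
        }
    }

    // Insert at back: aim the old last node at the new node,
    // then make lastNode reference the new node.
    void insertAtBack(int value) {
        ListNode node = new ListNode(value);
        if (lastNode == null) {         // empty list
            firstNode = lastNode = node;
        } else {
            lastNode.next = node;
            lastNode = node;
        }
    }

    // Collect values front-to-back for inspection.
    String toStringChain() {
        StringBuilder sb = new StringBuilder();
        for (ListNode n = firstNode; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(n.data);
        }
        return sb.toString();
    }
}
```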

9
Linked Lists
◼ Remove From Front

[Figure: (a) list 12 → 7 → 11 → 5; (b) firstNode is advanced to the second node (7), and the removed node (12) is returned as removeItem. Result: 7 → 11 → 5.]

10
Linked Lists
◼ Remove From Back

[Figure: (a) list 12 → 7 → 11 → 5; (b) lastNode is moved back to the next-to-last node (11), whose link is set to null; the removed node (5) is returned as removeItem. Result: 12 → 7 → 11.]
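The two remove operations above can be sketched as follows (names are assumptions; note that removing from the back of a singly linked list requires walking to the next-to-last node, so it is O(n)):

```java
// Minimal singly linked list supporting removal from both ends,
// mirroring the two figures above.
class SinglyLinkedList {
    static class ListNode {
        int data;
        ListNode next;
        ListNode(int data) { this.data = data; }
    }

    ListNode firstNode, lastNode;

    void insertAtBack(int value) {
        ListNode node = new ListNode(value);
        if (lastNode == null) {
            firstNode = lastNode = node;
        } else {
            lastNode.next = node;
            lastNode = node;
        }
    }

    // Remove from front: advance firstNode to the second node
    // and return the removed value.
    int removeFromFront() {
        ListNode removeItem = firstNode;
        firstNode = firstNode.next;
        if (firstNode == null) lastNode = null; // list became empty
        return removeItem.data;
    }

    // Remove from back: walk to the next-to-last node, null out its
    // link, and return the removed value.
    int removeFromBack() {
        ListNode removeItem = lastNode;
        if (firstNode == lastNode) {            // single node
            firstNode = lastNode = null;
        } else {
            ListNode n = firstNode;
            while (n.next != lastNode) n = n.next;
            n.next = null;
            lastNode = n;
        }
        return removeItem.data;
    }
}
```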

11
Stacks
◼ A stack is an Abstract Data Type (ADT), commonly used in most
programming languages.
◼ Stack – a special version of a linked list:
◼ Last-in, first-out (LIFO) data structure:
◼ Takes and releases new nodes only at top
◼ Stack ADT allows all data operations at one end only. At any
given time, we can only access the top element of a stack.
◼ A stack supports two primary operations
◼ Push: adds new entry (node) to top of stack
◼ Pop: removes top entry (node) from stack
◼ Can be used for:
◼ Storing return addresses
◼ Storing local variables
◼ …

12
PUSH Operation
◼ Push operation involves a series of steps −
◼ Step 1 − Check whether the stack is full.
◼ Step 2 − If the stack is full, produce an overflow error and exit.
◼ Step 3 − If the stack is not full, increment top to point to the next
empty space.
◼ Step 4 − Add the data element to the stack location where top is
pointing.
◼ Step 5 − Return success.
◼ A simple algorithm for Push operation can be derived as follows:
begin procedure push: stack, data
   if stack is full
      return overflow
   endif
   top ← top + 1
   stack[top] ← data
end procedure

13
Pop operation
◼ A Pop operation may involve the following steps:
◼ Step 1 − Check whether the stack is empty.
◼ Step 2 − If the stack is empty, produce an underflow error and exit.
◼ Step 3 − If the stack is not empty, access the data element at
which top is pointing.
◼ Step 4 − Decrease the value of top by 1.
◼ Step 5 − Return success.

begin procedure pop: stack
   if stack is empty
      return underflow
   endif
   data ← stack[top]
   top ← top - 1
   return data
end procedure
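The push/pop pseudocode above maps directly onto an array-based stack; a minimal Java sketch (class name and fixed capacity are assumptions):

```java
// Array-based stack following the push/pop pseudocode above.
// top == -1 means the stack is empty.
class ArrayStack {
    private final int[] stack;
    private int top = -1;

    ArrayStack(int capacity) { stack = new int[capacity]; }

    boolean isFull()  { return top == stack.length - 1; }
    boolean isEmpty() { return top == -1; }

    // Push: check for overflow, increment top, store the element.
    boolean push(int data) {
        if (isFull()) return false;   // overflow
        top = top + 1;
        stack[top] = data;
        return true;
    }

    // Pop: check for underflow, read the top element, decrement top.
    Integer pop() {
        if (isEmpty()) return null;   // underflow
        int data = stack[top];
        top = top - 1;
        return data;
    }
}
```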

14
Queues
◼ Queue is an abstract data structure, somewhat similar to
Stacks.
◼ Unlike stacks, a queue is open at both its ends. One end is
always used to insert data (enqueue) and the other is used to
remove data (dequeue).
◼ Queue follows First-In-First-Out methodology, i.e., the data
item stored first will be accessed first.
◼ Queue: First-in, first-out (FIFO) data structure
◼ Nodes added to tail, removed from head

◼ Many computer applications:


◼ Printer spooling
◼ Information packets on networks
◼ …
15
Basic Operations
◼ enqueue() − add (store) an item to the queue.
◼ dequeue() − remove (access) an item from the queue.
◼ The following steps should be taken to enqueue (insert) data into a
queue:
◼ Step 1 − Check if the queue is full.
◼ Step 2 − If the queue is full, produce overflow error and exit.
◼ Step 3 − If the queue is not full, increment rear pointer to point to
the next empty space.
◼ Step 4 − Add data element to the queue location where the rear is
pointing.
◼ Step 5 − Return success.

16
Algorithm for enqueue operation
procedure enqueue(data)
   if queue is full
      return overflow
   endif
   rear ← rear + 1
   queue[rear] ← data
   return true
end procedure

◼ The following steps are taken to perform the dequeue operation:
◼ Step 1 − Check if the queue is empty.
◼ Step 2 − If the queue is empty, produce underflow error and exit.
◼ Step 3 − If the queue is not empty, access the data where front is pointing.
◼ Step 4 − Increment front pointer to point to the next available data
element.
◼ Step 5 − Return success.

17
Algorithm for dequeue operation
procedure dequeue
   if queue is empty
      return underflow
   endif
   data ← queue[front]
   front ← front + 1
   return data
end procedure
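A runnable sketch of the enqueue/dequeue pseudocode above (names are assumptions; a circular buffer is used so freed slots can be reused, a detail the pseudocode omits):

```java
// Array-based queue following the enqueue/dequeue pseudocode above.
class ArrayQueue {
    private final int[] queue;
    private int front = 0; // index of the next element to dequeue
    private int rear = -1; // index of the most recently enqueued element
    private int count = 0;

    ArrayQueue(int capacity) { queue = new int[capacity]; }

    boolean isFull()  { return count == queue.length; }
    boolean isEmpty() { return count == 0; }

    // Enqueue: check for overflow, advance rear, store at rear.
    boolean enqueue(int data) {
        if (isFull()) return false;            // overflow
        rear = (rear + 1) % queue.length;      // wrap around the array
        queue[rear] = data;
        count++;
        return true;
    }

    // Dequeue: check for underflow, read at front, advance front.
    Integer dequeue() {
        if (isEmpty()) return null;            // underflow
        int data = queue[front];
        front = (front + 1) % queue.length;    // wrap around the array
        count--;
        return data;
    }
}
```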

18
Hash Tables and Dictionaries

A dictionary consists of key/element pairs in which the key is used to look up the element.

Example                     Key              Element
English Dictionary          Word             Definition
Student Records             Student Number   Rest of record: Name, …
Symbol Table in Compiler    Variable Name    Variable's Address in Memory

Ordered Dictionary: elements stored in sorted order by key
Unordered Dictionary: elements not stored in sorted order
19
Dictionary as a Function
Given a key, return an element:
Key (domain: the type of the keys) → Element (range: the type of the elements)

20
Hashing
◼ Hashing is a technique that converts a range of key values into a
range of indexes of an array (the hash table).
◼ We use the modulo operator to compute the index: index = key % (table size).
◼ Consider a hash table of size 20, into which the following items
are to be stored. Items are in the (key, value) format:
(1,20), (2,70), (42,80), (4,25), (12,44), (14,32), (17,11), (13,78), (37,98)

Sr. No.   Key   Hash           Array Index
1         1     1 % 20 = 1     1
2         2     2 % 20 = 2     2
3         42    42 % 20 = 2    2
4         4     4 % 20 = 4     4
5         12    12 % 20 = 12   12
6         14    14 % 20 = 14   14
7         17    17 % 20 = 17   17
8         13    13 % 20 = 13   13
9         37    37 % 20 = 17   17
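The hash computation in the table above can be checked with a small sketch (class name is an assumption):

```java
// Division-method hash function: index = key % tableSize,
// as used in the table above (tableSize = 20).
class HashIndexDemo {
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }
}
```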
21
Linear Probing
◼ Two different values collide when they produce the same hash
index.
◼ Handling collisions involves storing the new value elsewhere in
the hash table.
◼ In such a case, we can search the next empty location in the
array by looking into the next cell until we find an empty cell.
This technique is called linear probing.
◼ If h(k) is full, examine (h(k) + 1) % N, then (h(k) + 2) % N, then
..., (h(k) + N - 1) % N.

◼ Linear probing leads to clustering.


◼ Quadratic probing spreads out successive probes.
◼ (h(k) + i²) % N for 0 ≤ i < N.

22
Linear Probing

S.N   Key   Hash           Array Index   Array Index After Linear Probing
1     1     1 % 20 = 1     1             1
2     2     2 % 20 = 2     2             2
3     42    42 % 20 = 2    2             3
4     4     4 % 20 = 4     4             4
5     12    12 % 20 = 12   12            12
6     14    14 % 20 = 14   14            14
7     17    17 % 20 = 17   17            17
8     13    13 % 20 = 13   13            13
9     37    37 % 20 = 17   17            18
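The placements in the "after linear probing" column above can be reproduced with a small sketch (class name is an assumption):

```java
// Linear probing insert: start at key % size and scan forward,
// with wraparound, until an empty slot is found.
class LinearProbingTable {
    private final Integer[] slots; // null means the slot is empty

    LinearProbingTable(int size) { slots = new Integer[size]; }

    // Returns the index where the key was stored, or -1 if the table is full.
    int insert(int key) {
        int h = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (h + i) % slots.length; // probe with wraparound
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1; // table full
    }
}
```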

23
Hash Table with Collision

h( k ) = k mod m
where k is the key and m is the size of the table

24
Collisions and their Resolution
◼ A collision occurs when two different keys hash to the
same value
◼ E.g. For TableSize = 17, the keys 18 and 35 hash to the same
value
◼ 18 mod 17 = 1 and 35 mod 17 = 1
◼ Cannot store both data records in the same slot in
array!
◼ Two different methods for collision resolution:
◼ Open Hashing (Separate Chaining): Use a dictionary
data structure (such as a linked list) to store multiple
items that hash to the same slot. Separate chaining =
Open hashing.
◼ Closed Hashing (or probing): search for empty slots
using a second function and store item in first empty slot
that is found. Closed hashing = Open addressing.
25
Collision Resolution Schemes: Chaining
The hash table is an array of linked lists.

Insert keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 using h(k) = k mod m with m = 10.

[Figure: slots 0–9; 0 → {0}, 1 → {81, 1}, 4 → {64, 4}, 5 → {25}, 6 → {36, 16}, 9 → {49, 9}; slots 2, 3, 7, 8 empty.]

Notes:
◼ As before, elements would be associated with the keys
◼ We're using the hash function h(k) = k mod m
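The chained table in the figure above can be sketched as an array of linked lists (class and method names are assumptions):

```java
import java.util.LinkedList;

// Separate chaining: the hash table is an array of linked lists.
// Colliding keys are simply appended to the chain at their slot.
class ChainedHashTable {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    void insert(int key) {
        table[key % table.length].addFirst(key); // new keys go to the chain head
    }

    boolean contains(int key) {
        return table[key % table.length].contains(key);
    }

    int chainLength(int slot) { return table[slot].size(); }
}
```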

26
Collision Resolution Strategies: Open Addressing
All elements stored in the hash table itself (the array). If a
collision occurs, try alternate cells until empty cell is found.

Three Resolution Strategies:
◼ Linear Probing
◼ Quadratic Probing
◼ Double Hashing

All of these try cells h(k,0), h(k,1), h(k,2), …, h(k, m-1),
where h(k,i) = ( h(k) + f(i) ) mod m, with f(0) = 0.

The function f is the collision resolution strategy and the
function h is the original hash function.

27
Linear Probing

Function f is linear. Typically, f(i) = i.
So, h( k, i ) = ( h(k) + i ) mod m.
Offsets: 0, 1, 2, …, m-1.
With H = h( k ), we try the following cells with wraparound:
H, H + 1, H + 2, H + 3, …

[Figure: an empty hash table with slots 0–9.]

28
Rehashing
Problem with both chaining & probing:
When the table gets too full, the average search
time degrades from O(1) toward O(n).

Solution: Create a larger table and then rehash all
the elements into the new table.
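A minimal rehashing sketch, assuming a doubling growth policy and a 0.5 load-factor threshold (both choices are assumptions, in line with the probing guideline later in these notes):

```java
// Rehashing: when the load factor would exceed the threshold,
// allocate a larger table and re-insert every element into it.
class RehashingTable {
    private Integer[] slots = new Integer[5];
    private int count = 0;

    int capacity() { return slots.length; }

    void insert(int key) {
        // Rehash before the insert would push the load factor above 0.5.
        if ((double) (count + 1) / slots.length > 0.5) rehash();
        place(slots, key);
        count++;
    }

    // Linear-probing placement into a given array.
    private static void place(Integer[] arr, int key) {
        int h = key % arr.length;
        while (arr[h] != null) h = (h + 1) % arr.length;
        arr[h] = key;
    }

    // Create a larger table and rehash all elements into it; every
    // key's slot must be recomputed because the table size changed.
    private void rehash() {
        Integer[] bigger = new Integer[slots.length * 2];
        for (Integer key : slots) {
            if (key != null) place(bigger, key);
        }
        slots = bigger;
    }
}
```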

29
Choosing Hash Functions

A good hash function must be O(1) and must
distribute keys evenly.
Division Method Hash Function for Integer Keys:
h(k) = k mod m
Hash Function for String Keys?

30
Requirement: Prime Table Size for Division
Method Hash Functions
If the table size is not prime, the number of alternative locations can
be severely reduced, since the hash position is a value mod
the table size.

Example: Table Size 16, with Quadratic Probing

Offsets i² mod 16:
1² mod 16 = 1
2² mod 16 = 4
3² mod 16 = 9
4² mod 16 = 0
5² mod 16 = 9
6² mod 16 = 4
7² mod 16 = 1

Only four distinct offsets (0, 1, 4, 9) occur, so the probe
sequence reaches very few alternative locations.

31
Important Factors for Designing Hash Tables

To Minimize Collisions:
◼ Make the table size, m, a prime number not near
a power of two if using a division method hash
function
◼ Use a load factor, λ = n / m, that’s appropriate
for the implementation.
◼ 1.0 or less for chaining ( i.e., n ≤ m ).
◼ 0.5 or less for linear or quadratic probing or
double hashing ( i.e., n ≤ m / 2 )

32
Collision Resolution Comparison
Let n = number of elements in hash table
Let m = hash table size
Let λ = n / m (the load factor, i.e., the average number of
elements stored in a chain)

Recommended Load Factor:
Chaining: λ ≤ 1.0
Linear or Quadratic Probing: λ ≤ 0.5 (half full)
Double Hashing: λ ≤ 0.5 (half full)

Note: If a table using quadratic probing is more than half full, it is not
guaranteed that an empty cell will be found

33
Linear Probing (insert 12)

12 mod 11 = 1 (12 = 1 × 11 + 1), but slot 1 already holds key 1;
slots 2 (24) and 3 (14) are also occupied, so 12 is stored in the
first empty slot, slot 4.

[Figure: table of size 11 before – 0: 42, 1: 1, 2: 24, 3: 14, 5: 16, 6: 28, 7: 7, 9: 31, 10: 9; after – 12 stored in slot 4.]

34
Search with linear probing (Search 15)

15 mod 11 = 4 (15 = 1 × 11 + 4); slot 4 holds 12, so probing
continues through slots 5 (16), 6 (28), and 7 (7) until the empty
slot 8 is reached, at which point the search stops: NOT FOUND!

[Figure: table of size 11 – 0: 42, 1: 1, 2: 24, 3: 14, 4: 12, 5: 16, 6: 28, 7: 7, 8: empty, 9: 31, 10: 9.]

35
Collision Resolution by Closed Hashing

◼ Given an item X, try cells h0(X), h1(X), h2(X), …, hi(X)
◼ hi(X) = (Hash(X) + F(i)) mod TableSize
◼ Define F(0) = 0

◼ F is the collision resolution function. Some possibilities:
◼ Linear: F(i) = i
◼ Quadratic: F(i) = i²
◼ Double Hashing: F(i) = i · Hash2(X)

36
Closed Hashing I: Linear Probing

◼ Main Idea: When a collision occurs, scan down the
array one cell at a time looking for an empty cell
◼ hi(X) = (Hash(X) + i) mod TableSize (i = 0, 1, 2, …)
◼ Compute the hash value and increment it until a free
cell is found

37
Linear Probing Example
insert(14): 14 % 7 = 0 → slot 0 empty → store in slot 0 (1 probe)
insert(8):  8 % 7 = 1  → slot 1 empty → store in slot 1 (1 probe)
insert(21): 21 % 7 = 0 → slots 0 and 1 full → store in slot 2 (3 probes)
insert(2):  2 % 7 = 2  → slot 2 full → store in slot 3 (2 probes)

[Figure: table of size 7 after the four inserts – 0: 14, 1: 8, 2: 21, 3: 2, slots 4–6 empty.]
38
Drawbacks of Linear Probing

◼ Works until the array is full, but as the number of items N
approaches TableSize (λ → 1), access time
approaches O(N)
◼ Very prone to cluster formation (as in our
example)
◼ If a key hashes anywhere into a cluster, finding a
free cell involves going through the entire cluster –
and making it grow!
◼ Primary clustering – clusters grow when keys hash
to values close to each other
◼ Can have cases where the table is empty except for a
few clusters
◼ Does not satisfy the good hash function criterion of
distributing keys uniformly

39
Closed Hashing II: Quadratic Probing

◼ Main Idea: Spread out the search for an empty slot by
incrementing by i² instead of i

◼ hi(X) = (Hash(X) + i²) % TableSize

h0(X) = Hash(X) % TableSize
h1(X) = (Hash(X) + 1) % TableSize
h2(X) = (Hash(X) + 4) % TableSize
h3(X) = (Hash(X) + 9) % TableSize

40
Quadratic Probing Example
insert(14): 14 % 7 = 0 → slot 0 empty → store in slot 0 (1 probe)
insert(8):  8 % 7 = 1  → slot 1 empty → store in slot 1 (1 probe)
insert(21): 21 % 7 = 0 → slot 0 full, (0+1) % 7 = 1 full, (0+4) % 7 = 4 empty → store in slot 4 (3 probes)
insert(2):  2 % 7 = 2  → slot 2 empty → store in slot 2 (1 probe)

[Figure: table of size 7 after the four inserts – 0: 14, 1: 8, 2: 2, 4: 21, slots 3, 5, 6 empty.]
41
Problem With Quadratic Probing
After the four inserts above (0: 14, 1: 8, 2: 2, 4: 21), consider insert(7):

insert(7): 7 % 7 = 0 → slot 0 full, (0+1) % 7 = 1 full, (0+4) % 7 = 4 full,
(0+9) % 7 = 2 full, (0+16) % 7 = 2 full, (0+25) % 7 = 4 full, (0+36) % 7 = 1 full, …

The offsets i² mod 7 only ever produce 0, 1, 2, and 4, so the probe
sequence keeps revisiting the same occupied slots and insert(7) never
finds an empty cell, even though slots 3, 5, and 6 are free.

probes: 1 1 3 1 ??
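The failure above can be demonstrated directly; this sketch (class name is an assumption) gives up after TableSize probes and reports -1:

```java
// Quadratic probing: probe (h + i*i) % size for i = 0, 1, 2, …
// Unlike linear probing, the sequence is not guaranteed to visit
// every slot, so insertion can fail even in a non-full table.
class QuadraticProbingTable {
    private final Integer[] slots; // null means the slot is empty

    QuadraticProbingTable(int size) { slots = new Integer[size]; }

    // Returns the slot used, or -1 if no empty slot was found
    // after TableSize probes.
    int insert(int key) {
        int h = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int index = (h + i * i) % slots.length;
            if (slots[index] == null) {
                slots[index] = key;
                return index;
            }
        }
        return -1; // probe sequence exhausted without finding a free cell
    }
}
```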
42
