
Algorithms

Sorting & Searching

Partha Pratim Das

Department of Computer Science and Engineering


Indian Institute of Technology, Kharagpur
ppd@cse.iitkgp.ernet.in

T10KT Coordinators’ Workshop

March 18, 2015

Partha Pratim Das (IIT, Kharagpur) Sorting & Searching March 18, 2015 1 / 34
Problem

Arrangement of Points
You are given a planar geometric structure comprising vertices, edges
connecting vertices, and faces bounded by edges. The expected operations
are:
Walk around the boundary of a given face in CCW order
Access a face from an adjacent one
Visit all the edges around a given vertex

Doubly Connected Edge List (DCEL)

•  DCEL is one of the most commonly used representations for planar subdivisions such as Voronoi diagrams.
•  It is an edge-based structure which links together the three sets of records:
   –  Vertex
   –  Edge
   –  Face
•  It facilitates traversing the faces of a planar subdivision and visiting all the edges around a given vertex.
Doubly Connected Edge List (DCEL)

[Figure: a planar subdivision with faces f1, f2, f3, f4, f5]

•  Main ideas:
   –  Edges are oriented counterclockwise inside each face
   –  Since an edge borders two faces, each edge is replaced by two half-edges, one for each face
Doubly Connected Edge List (DCEL)

[Figure: a face bounded by half-edges e1 to e6, with labels origin(e1), next(e1), previous(e1), IncidentFace(e1), e3.twin, e4.twin]

•  The vertex record of a vertex v stores the coordinates of v. It also stores a pointer IncidentEdge(v) to an arbitrary half-edge that has v as its origin
•  The face record of a face f stores a pointer to some half-edge on its boundary, which can be used as a starting point to traverse f in counterclockwise order
•  The half-edge record of a half-edge e stores pointers to:
   •  Origin(e)
   •  Twin of e, written e.twin or twin(e)
   •  The face to its left, IncidentFace(e)
   •  Next(e): next half-edge on the boundary of IncidentFace(e)
   •  Previous(e): previous half-edge

Doubly Connected Edge List (DCEL)

[Figure: example subdivision with vertices v1 to v6, faces f1 to f5, and half-edges e1,1 to e9,2]

Vertex   Coordinates   IncidentEdge
v1       (x1, y1)      e2,1
v2       (x2, y2)      e4,1
v3       (x3, y3)      e3,2
v4       (x4, y4)      e6,1
v5       (x5, y5)      e9,1
v6       (x6, y6)      e7,1

Doubly Connected Edge List (DCEL)

[Figure: the example subdivision with vertices v1 to v6, faces f1 to f5, and half-edges e1,1 to e9,2]

Face   Edge
f1     e1,1
f2     e5,1
f3     e5,2
f4     e8,1
f5     e9,2

Doubly Connected Edge List (DCEL)

[Figure: the example subdivision with vertices v1 to v6, faces f1 to f5, and half-edges e1,1 to e9,2]

Half-edge   Origin   Twin   IncidentFace   Next   Previous
e3,1        v2       e3,2   f1             e1,1   e2,1
e3,2        v3       e3,1   f2             e4,1   e5,1
e4,1        v2       e4,2   f2             e5,1   e3,2
e4,2        v4       e4,1   f5             e2,2   e8,2
…           …        …      …              …      …

Problem

Searching multiple items in a list


Find a set of elements in an unordered set or list of elements. That is,
let L be a set of n distinct integers which is not ordered in any
fashion. Let K be another list of m distinct integers. We are required
to search for the existence of elements of K in L.
Algorithm
Recurrence
Complexity
Thoughts for improvement · · ·
m is constant
m is O(n)
Should you sort K ?
Discuss with TAs for complete solution
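One possible line of attack on the questions above, sketched in Python. The function names are illustrative assumptions. The first version scans L once per key, for O(mn) time; the second sorts L and binary-searches each key, for O((n + m) log n); the third sorts K instead and binary-searches each element of L in it, for O((n + m) log m), which is the cheaper choice when m is much smaller than n.

```python
from bisect import bisect_left
from typing import Dict, List


def search_by_scanning(L: List[int], K: List[int]) -> Dict[int, bool]:
    """Linear scan of L for every key: O(m * n)."""
    return {k: any(k == x for x in L) for k in K}


def search_by_sorting_L(L: List[int], K: List[int]) -> Dict[int, bool]:
    """Sort L once, then binary search each key: O((n + m) log n)."""
    s = sorted(L)

    def present(k: int) -> bool:
        i = bisect_left(s, k)
        return i < len(s) and s[i] == k

    return {k: present(k) for k in K}


def search_by_sorting_K(L: List[int], K: List[int]) -> Dict[int, bool]:
    """Sort K once, then look up each element of L in it: O((n + m) log m)."""
    s = sorted(K)
    found = {k: False for k in K}
    for x in L:
        i = bisect_left(s, x)
        if i < len(s) and s[i] == x:
            found[x] = True
    return found
```

When m is a constant, the plain scan is already O(n) and sorting L would only add cost.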

Problem

Searching an almost ordered list


Let L be a list of distinct integers that is almost sorted in ascending
order. By almost sorted we mean that every element is either in its
correct position in the sorted order or at most one place away from its
correct position. For example, the list 12, 1, 23, 55, 44, 60, 72, 91, 83
is such an almost sorted list.
Algorithm
Recurrence
Complexity
What is the impact if the notion of almost sorted is extended by k
positions?
Discuss with TAs for complete solution
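A sketch of one way to search such a list in O(log n) time, offered as an illustration rather than the complete solution. It is a binary search that also inspects the two neighbours of the middle position. If the key is not at mid - 1, mid, or mid + 1 and is smaller than L[mid], it can only lie at position mid - 2 or earlier, so the search range still halves; the symmetric argument holds for larger keys. The function name is an assumption.

```python
from typing import List


def search_almost_sorted(a: List[int], key: int) -> int:
    """Return the index of key in a list where every element is at most
    one place away from its sorted position, or -1 if key is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if mid > lo and a[mid - 1] == key:
            return mid - 1
        if mid < hi and a[mid + 1] == key:
            return mid + 1
        if key < a[mid]:
            hi = mid - 2      # positions mid-1, mid, mid+1 are ruled out
        else:
            lo = mid + 2
    return -1
```

If elements may be up to k places away, the same idea would inspect a window of 2k + 1 positions per step, which suggests O(k log n) time under that assumption.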

Problem

Searching a two dimensional partially ordered set


Given an n × n array A of distinct integers having the property
A[i, j] ≥ A[p, q] if and only if i ≥ p and j ≥ q, write an algorithm to
find some element k in A.
Algorithm
Recurrence
Complexity
Can you improve the initial solution? Extend for higher dimensions?
Discuss with TAs for complete solution
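One well-known approach is the staircase search, sketched below in Python. The sketch assumes the property is read as "every row and every column is in ascending order"; the function name is an assumption. Starting at the top-right corner, each comparison discards either a full row or a full column, so at most 2n elements are examined, compared with n^2 for a full scan.

```python
from typing import List, Optional, Tuple


def staircase_search(A: List[List[int]], k: int) -> Optional[Tuple[int, int]]:
    """Find k in a matrix whose rows and columns are ascending.
    Returns (row, column) or None. Runs in O(n) steps for an n x n matrix."""
    if not A or not A[0]:
        return None
    r, c = 0, len(A[0]) - 1          # start at the top-right corner
    while r < len(A) and c >= 0:
        if A[r][c] == k:
            return (r, c)
        if A[r][c] > k:
            c -= 1                   # everything below in this column is larger
        else:
            r += 1                   # everything to the left in this row is smaller
    return None
```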

Problem

Partial Sorting
Given a set L of n distinct integers, we need to partially sort the n
numbers to find the top k largest elements in sequence.
Which is a better approach to use, Mergesort or Quicksort or
anything else?
Algorithm
Recurrence
Complexity
Trivial solution would be to sort L and take the k largest elements.
This is O(n log n). Can you improve the initial solution? What if k is
a constant?
Bubblesort?
Mergesort?
Quicksort?
Discuss with TAs for complete solution
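Two candidate answers to the questions above, sketched in Python with illustrative function names. The first runs only k passes of Bubblesort, which costs O(nk) and is linear when k is a constant. The second is an "anything else" option: it builds a heap in O(n) and extracts the maximum k times, for O(n + k log n) in total. Python's `heapq` is a min-heap, so the values are negated.

```python
import heapq
from typing import List


def top_k_by_bubble(L: List[int], k: int) -> List[int]:
    """k passes of Bubblesort: each pass moves the next largest element
    to the end. O(n * k) time. Returns the k largest, largest first."""
    a = list(L)
    n = len(a)
    k = min(k, n)
    for p in range(k):
        for i in range(n - 1 - p):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a[n - k:][::-1]


def top_k_by_heap(L: List[int], k: int) -> List[int]:
    """Heapify in O(n), then pop the maximum k times: O(n + k log n)."""
    heap = [-x for x in L]           # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]
```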

Problem

Count Inversions in an array


Inversion Count for an array indicates how far (or close) the array is
from being sorted. If the array is already sorted, the inversion count is 0.
If the array is sorted in reverse order, the inversion count is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if
a[i] > a[j] and i < j.
Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).
Algorithm
Recurrence
Complexity
For each element, count number of elements which are on right side
of it and are smaller than it.
Naive algorithm: O(n^2).
Modified Mergesort: O(n log n).
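A sketch of the modified Mergesort in Python; the function names are assumptions. During the merge step, whenever an element of the right half is placed before the remaining elements of the left half, it forms an inversion with each of them, so the count grows by the number of elements still waiting in the left half. The recurrence is T(n) = 2T(n/2) + O(n), which gives O(n log n).

```python
from typing import List, Tuple


def count_inversions(a: List[int]) -> int:
    """Count pairs (i, j) with i < j and a[i] > a[j] in O(n log n)."""

    def sort_count(xs: List[int]) -> Tuple[List[int], int]:
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_left = sort_count(xs[:mid])
        right, inv_right = sort_count(xs[mid:])
        merged, i, j, cross = [], 0, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                cross += len(left) - i   # right[j] precedes all remaining left items
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv_left + inv_right + cross

    return sort_count(list(a))[1]
```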
Search Data Structure

Operations:
insert: List good, Array bad
delete: List good, Array bad
find: Array good, List bad
Use BST – Needs balancing:
Guaranteed Bound: AVL, 2-3-4 Tree, Red-Black Tree, B+-Tree
Probabilistic Bound: Randomized BST, Skip List
Amortised Bound: Splay
Applications:
Associative containers like set, map
Indexing

Skip List

A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, ..., Sh such that
Each list Si contains the special keys +∞ and −∞
List S0 contains the keys of S in non-decreasing order
Each list is a subsequence of the previous one:

S0 ⊇ S1 ⊇ ... ⊇ Sh

List Sh contains only the two special keys
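A compact Python sketch of a randomized skip list that follows this definition. The representation is an assumption made for brevity: each key is one node holding a tower of forward pointers, where `forward[i]` is the node's successor in list Si; a head node plays the role of −∞ and `None` plays the role of +∞. A key is promoted to the next list with probability 1/2, and `MAX_LEVEL` is an arbitrary cap.

```python
import random
from typing import List, Optional


class SkipNode:
    def __init__(self, key: float, level: int):
        self.key = key
        self.forward: List[Optional["SkipNode"]] = [None] * level


class SkipList:
    MAX_LEVEL = 16                    # assumed cap on the number of lists

    def __init__(self, seed: Optional[int] = None):
        self.head = SkipNode(float("-inf"), self.MAX_LEVEL)
        self.level = 1                # number of lists currently in use
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < 0.5:
            lvl += 1
        return lvl

    def _predecessors(self, key: float) -> List[SkipNode]:
        """For every list, the last node whose key is smaller than key.
        The search starts in the top list and drops down one list at a time."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def contains(self, key: float) -> bool:
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def insert(self, key: float) -> bool:
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return False              # keys are distinct
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = SkipNode(key, lvl)
        for i in range(lvl):          # splice the node into lists S0 .. S(lvl-1)
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        return True

    def keys(self) -> List[float]:
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

The bottom list always holds every key in sorted order, so correctness does not depend on the random choices; only the expected O(log n) search time does.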

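Skip List

Searching in a Skip-List

Search cost over n keys as the number of linked lists grows:
2 linked lists: 2 · n^(1/2)
3 linked lists: 3 · n^(1/3)
k linked lists: k · n^(1/k)
log n linked lists: log n · n^(1/log n) = Θ(log n), since n^(1/log n) is a constant (2 for base-2 logarithms)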
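The effect of adding lists can be checked numerically. The short Python sketch below evaluates k · n^(1/k) for a few values of k; the function name and the choice n = 2^16 are illustrative assumptions.

```python
import math


def search_cost(n: int, k: int) -> float:
    """Idealised search cost k * n^(1/k) with k evenly spaced linked lists."""
    return k * n ** (1.0 / k)


n = 2 ** 16
# Costs for 2, 4 and log2(n) = 16 lists over n = 65536 keys.
costs = {k: search_cost(n, k) for k in (2, 4, int(math.log2(n)))}
```

With log2(n) lists, each factor n^(1/k) shrinks to 2, so the cost is 2 · log2(n), which matches the Θ(log n) bound.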

Problem

Searching in Higher Dimensional Space


Databases

Databases store records or objects

Personnel database: Each employee has a name, id code, date of birth, function, salary, start date of employment, . . .

Fields are textual or numerical

Computational Geometry Lecture 7: Kd-trees and range trees



Database queries
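
A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2

[Figure: employees plotted as points, with date of birth (19,500,000 to 19,559,999) on the horizontal axis and salary on the vertical axis; example point: G. Ometer, born Aug 16, 1954, salary $3,500]
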

Database queries

When we see numerical fields of objects as coordinates, a database stores a point set in higher dimensions

Exact match query: Asks for the objects whose coordinates match query coordinates exactly
Partial match query: Same but not all coordinates are specified
Range query: Asks for the objects whose coordinates lie in a specified query range (interval)


Database queries

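Example of a 3-dimensional (orthogonal) range query: children in [2, 4], salary in [3000, 4000], date of birth in [19,500,000, 19,559,999]

[Figure: the query range drawn as a box, with axis ticks 2 and 4 (children), 3,000 and 4,000 (salary), 19,500,000 and 19,559,999 (date of birth)]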


1D range query problem

1D range query problem: Preprocess a set of n points on the real line such that the ones inside a 1D query range (interval) can be reported fast

The points p1, ..., pn are known beforehand, the query [x, x'] only later

A solution to a query problem is a data structure, a query algorithm, and a construction algorithm

Question: What are the most important factors for the efficiency of a solution?


Balanced binary search trees

A balanced binary search tree with the points in the leaves

49

23 80
10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97


Balanced binary search trees

The search path for 25

49

23 80
10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97


Balanced binary search trees

The search paths for 25 and for 90

49

23 80
10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97


Example 1D range query

A 1-dimensional range query with [25, 90]

49

23 80
10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97


Node types for a query

Three types of nodes for a given query:


White nodes: never visited by the query
Grey nodes: visited by the query, unclear if they lead to
output
Black nodes: visited by the query, whole subtree is
output

Question: What query time do we hope for?


Node types for a query

The query algorithm comes down to what we do at each type of node

Grey nodes: use query range to decide how to proceed: to not visit a subtree (pruning), to report a complete subtree, or just continue

Black nodes: traverse and enumerate all points in the leaves


Example 1D range query

A 1-dimensional range query with [61, 90]

49
23 80 (split node: 80)
10 37 62 89

3 19 30 49 59 70 89 93

3 10 19 23 30 37 59 62 70 80 93 97


Query time analysis

The efficiency analysis is based on counting the numbers of nodes visited for each type
White nodes: never visited by the query; no time spent
Grey nodes: visited by the query, unclear if they lead to
output; time determines dependency on n
Black nodes: visited by the query, whole subtree is
output; time determines dependency on k, the output size


Query time analysis

Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(log n)

Black nodes: a (sub)tree with m leaves has m − 1 internal nodes; traversal visits O(m) nodes and finds m points for the output

The time spent at each node is O(1) ⇒ O(log n + k) query time


Storage requirement and preprocessing

A (balanced) binary search tree storing n points uses O(n) storage

A balanced binary search tree storing n points can be built in O(n) time after sorting


Result

Theorem: A set of n points on the real line can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 1D range query can be answered in O(log n + k) time, where k is the number of answers reported
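The structure behind this result can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: points live in the leaves, each internal node stores the largest point of its left subtree, and the names `Node`, `build`, `report` and `range_query` are invented here. The query first finds the split node, then walks the two boundary paths (grey nodes) and reports whole subtrees hanging off them (black nodes).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                        # leaf: the point; internal: max of left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build(pts: List[float], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build the tree over sorted, non-empty pts[lo..hi] in O(n) time."""
    if hi is None:
        hi = len(pts) - 1
    if lo == hi:
        return Node(pts[lo])
    mid = (lo + hi) // 2
    return Node(pts[mid], build(pts, lo, mid), build(pts, mid + 1, hi))


def report(node: Node, out: List[float]) -> None:
    """Black node: output every point stored in the leaves of this subtree."""
    if node.is_leaf():
        out.append(node.key)
    else:
        report(node.left, out)
        report(node.right, out)


def range_query(root: Node, x: float, x2: float) -> List[float]:
    """Report all points in [x, x2], assuming x <= x2. Output is not sorted."""
    out: List[float] = []
    v = root
    # Descend while both search paths agree; v ends at the split node.
    while not v.is_leaf() and (x2 <= v.key or x > v.key):
        v = v.left if x2 <= v.key else v.right
    if v.is_leaf():
        if x <= v.key <= x2:
            out.append(v.key)
        return out
    u = v.left                        # follow the search path of x
    while not u.is_leaf():
        if x <= u.key:
            report(u.right, out)      # right subtree lies entirely in range
            u = u.left
        else:
            u = u.right
    if x <= u.key <= x2:
        out.append(u.key)
    u = v.right                       # follow the search path of x2
    while not u.is_leaf():
        if x2 >= u.key:
            report(u.left, out)       # left subtree lies entirely in range
            u = u.right
        else:
            u = u.left
    if x <= u.key <= x2:
        out.append(u.key)
    return out
```

Sorting the points dominates preprocessing at O(n log n); `build` itself is linear because it passes indices instead of copying sublists.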



Problem

Searching Points in a Plane


You are given a set of points on a plane. The expected operations are:
Insert: Add a new point
Delete: Remove an existing point
Find points within the rectangle given by [Xlow ... Xhigh] × [Ylow ... Yhigh]
Nearest Neighbour Search: Find the point that is nearest to a given
input point
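One candidate structure for these operations is a 2D kd-tree, which alternates between splitting on x and on y at successive depths. The Python sketch below covers only insertion and the rectangle query; deletion and nearest-neighbour search are left out, the tree is not rebalanced, and all names are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Point
    left: Optional["KdNode"] = None    # points with a smaller split coordinate
    right: Optional["KdNode"] = None   # points with an equal or larger one


def insert(node: Optional[KdNode], p: Point, depth: int = 0) -> KdNode:
    """Insert p and return the (possibly new) subtree root."""
    if node is None:
        return KdNode(p)
    axis = depth % 2                   # 0: split on x, 1: split on y
    if p[axis] < node.point[axis]:
        node.left = insert(node.left, p, depth + 1)
    else:
        node.right = insert(node.right, p, depth + 1)
    return node


def range_search(node: Optional[KdNode], xlow: float, xhigh: float,
                 ylow: float, yhigh: float, depth: int = 0,
                 out: Optional[List[Point]] = None) -> List[Point]:
    """Report all points inside [xlow, xhigh] x [ylow, yhigh]."""
    if out is None:
        out = []
    if node is None:
        return out
    x, y = node.point
    if xlow <= x <= xhigh and ylow <= y <= yhigh:
        out.append(node.point)
    axis = depth % 2
    lo, hi = (xlow, xhigh) if axis == 0 else (ylow, yhigh)
    if lo < node.point[axis]:          # the range reaches into the left side
        range_search(node.left, xlow, xhigh, ylow, yhigh, depth + 1, out)
    if hi >= node.point[axis]:         # the range reaches into the right side
        range_search(node.right, xlow, xhigh, ylow, yhigh, depth + 1, out)
    return out
```

A subtree is skipped whenever the query rectangle lies entirely on the other side of the node's splitting line, which is the same pruning idea as in the 1D range query.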
