You are on page 1of 43

15-150 Fall 2014

Lecture 7
Stephen Brookes

Announcements
Read the notes!
Homework 3 due...

so far
sorting for integer lists

specifications and correctness

but what about efficiency?
work?
span?

Work
work is number of evaluation steps,
assuming a sequential processor

For a recursive function we can derive a


recurrence for its applicative work

Count basic arithmetic steps, etc



For dependent sub-expressions, add the work
dependent: one must be evaluated before the other
e1 + e 2

if e0 then e1 elsee2

Span
span is the number of evaluation steps,
assuming unlimited parallelism

For a recursive function we can derive a


recurrence for its applicative span

Count basic arithmetic steps, etc



For dependent sub-expressions, add the span

For independent sub-expressions, max the span
independent: neither depends on the other
so can be evaluated in parallel

mergesort
msort : int list -> int list

fun msort [ ] = [ ]

| msort [x] = [x]

| msort L = let

val (A, B) = split L

in

merge(msort A, msort B)

end

work to split
fun split [ ] = ([ ], [ ])
| split [x] = ([x], [ ])
| split (x::y::L) =
let val (A, B) = split L in (x::A, y::B) end
Let Wsplit(n) = work of split(L) when length(L)=n
Wsplit(n) = c0
for n=0, 1

Wsplit(n) = c1 + Wsplit(n-2)
for n>1
for some constants c0, c1
Closed form?

Wsplit(n) is O(n)

work to merge
fun merge (A, [ ]) = A

| merge ([ ], B) = B
| merge (x::A, y::B) = case compare(x, y) of

LESS => x :: merge(A, y::B)
| EQUAL => x::y::merge(A, B)
| GREATER => y :: merge(x::A, B)

Let Wmerge(n) = work of merge(A,B)



when length(A)+length(B)=n
Wmerge(n) is O(n)

work to sort
fun msort [ ] = [ ] | msort [x] = [x]
| msort L = let val (A, B) = split L in
merge (msort A, msort B) end
When length(L)=n>1, A and B have length n div 2

Let Wmsort(n) = work of msort(L) when length(L)=n


Wmsort(n) = c0

for n=0,1

Wmsort(n) = Wsplit(n) + 2Wmsort(n div 2) + Wmerge(n) + c1


= O(n) + 2Wmsort(n div 2)
for n>1
Wmsort(n) is O(n log n)

assessment
msort(L) does O(n log n) work,
where n is the length of L

List operations are inherently sequential


e ::e evaluates e first, then e

split and merge are not easily parallelizable

We could use parallel evaluation in msort(L)
1

for the recursive calls to msort A and msort B

How would this affect runtime?

msort L = let val (A, B) = split L in


merge (msort A, msort B) end
independent
work
For dependent sub-expressions, add the work

Wmsort(n) = Wsplit(n) + 2Wmsort(n div 2) + Wmerge(n)

span
For dependent sub-expressions, add the span

For independent sub-expressions, max the span
Smsort(n) = Ssplit(n) + Smsort(n div 2) + Smerge(n)

really?
Actually in SML tuples are sequential

(e , e ) evaluates e first, then e
1

To exploit parallel evaluation


we should really use sequences,

a data structure with parallel ops

e1, e2 allows parallel evaluation


(coming soon)

work and span


Wmsort(n) = O(n) + 2Wmsort(n div 2)

work

Wmsort(n) is O(n log n)


Smsort(n) = O(n) + Smsort(n div 2)
Smsort(n) is O(n)

span

O(n) O(n log n)


in principle, we can speed up msort

by a factor of log n

using parallel processing
To do even better, must use a different data structure

facts
span, span, work, eggs, bacon and span
work is the number of evaluation steps,
assuming sequential processing

span is the number of evaluation steps,
assuming unlimited parallelism

. span is always work


. For sequential code, span = work

next
Trees offer better support

for parallel evaluation

Sorting a tree

Specifications and proofs

Asymptotic analysis
Insertion
Parallel Mergesort

trees
datatype tree = Empty | Node of tree * int * tree

A user-defined type named tree



With constructors Empty and Node
Empty : tree

Node : tree * int * tree -> tree

tree values

An inductive definition

Every tree value is either Empty


or Node(t1, x, t2), where t1 and t2 are tree values
and x is an integer.
Contrast with integer lists:
Every integer list value is either nil
or x::L, where L is an integer list value
and x is an integer.

tree patterns
Empty
Node(p1, p, p2)
pattern

what it matches

Empty

empty tree

Node(_, _, _)

non-empty tree

Node(Empty, _, Empty)

tree with one node


Node(_, 42, _)

tree with 42 at root

tree patterns
Empty matches t iff t is Empty
Node(p1, p, p2) matches t iff

t is Node(t1, v, t2) such that

p1 matches t1, p matches v, p2 matches t2
and combines all the bindings

Node(A, x, B) matches 3
4

and binds x to 3,

A to Node(Empty,4,Empty)

B to Node(Empty,2,Empty)

structural induction for trees


To prove: For all tree values t, P(t) holds
by
structural
induction
on
t
!

Base case: Prove P(Empty).



Inductive case:

Assume Induction Hypothesis: P(t1) and P(t2).


Prove that, for all integer values x,
P(Node(t1, x, t2)) holds.

Thats enough! Why?


Contrast with

structural induction for lists

size
fun size Empty = 0

| size (Node(t1, _, t2)) = size t1 + size t2 + 1
Uses tree patterns
Recursion is structural

size(Node(t1, v, t2))

calls size(t1) and size(t2)

Easy to prove by structural induction


that for all tree values t,

size(t) = a non-negative integer
size(t) = number of nodes

size matters
For all trees t, size(t)0.

If t = Node(t , x, t ),
1

size(t1)<size(t) and size(t2)<size(t).


Many recursive functions on trees make

recursive calls on trees with smaller size.


Can use induction on size to prove


properties or analyze efficiency.

sorted trees
An inductive definition

Empty is sorted

Node(t1, x, t2) is sorted iff
say
t1 x t2

every integer in t1 is x,
every integer in t2 is x,
t1 and t2 are sorted
42
.

42
.

42
81
. .

14
81
. .

3. 14
57
99
. . .

3. 42
57
99
. . .

sorting a tree
If the tree is Empty, do nothing

Otherwise

(the tree has a root integer and two child subtrees)

(recursively) sort the two children, then


merge the sorted children, then
insert the root value

Well design helpers to insert and merge


merge will also need a helper

to split a tree in two

tree insertion
Ins : int * tree -> tree
REQUIRES t is a sorted tree
ENSURES Ins(x,t) is a sorted tree
consisting of x and all of t
fun Ins (x, Empty) = Node(Empty, x, Empty)

| Ins (x, Node(t1, y, t2)) =

case compare(x, y) of

GREATER => Node(t1, y, Ins(x, t2))

|
_
=> Node(Ins(x, t1), y, t2)
(contrast with list insertion)

value
equations
!
Ins(x, Empty) = Node(Empty, x, Empty)

!

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy

These equations hold,



for all integer values x, y

and all tree values t1, t2
By definition of Ins

value
equations
!
Ins(x, Empty) = Node(Empty, x, Empty)

!

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.

6
2

5
. .

.
.

These equations hold,



for all integer values x, y

and all tree values t1, t2
By definition of Ins

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.
.

6
2

5
. .

.
.

Ins(4,

2
. . .

6)

.
.

By definition of Ins

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.
.

6
2

5
. .

.
.

Ins(4,

2
. . .

6)

1
.

2 Ins(4 ,5 ) .
. . . .

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.
.

6
2

5
. .

.
.

Ins(4,

2
. . .

6)

.
.

By definition of Ins

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.
.

6
2

5
. .

.
.

Ins(4,

2
. . .

=
6)

1
.

6
2

5
. 4 .
. .

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.

6
2

5
. .

1
.

6
2

5
. 4 .
. .

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.

6
2

5
. .

1
.

6
2

5
. 4 .
. .

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.

6
2

5
. .

1
.

6
2

5
. 4 .
. .

value
equations
!

These equations hold,



for all integer values x, y

and all tree values t1, t2

Ins(x, Empty) = Node(Empty, x, Empty)



!

By definition of Ins

Ins (x, Node(t1, y, t2)) = Node(t1, y, Ins(x, t2)) if x>y



= Node(Ins(x, t1), y, t2) if xy
Ins(4 ,

1
.

6
2

5
. .

T sorted

1
.

6
2

. 4 .
. .
Ins(4, T) sorted

tree merge
Merge : tree * tree -> tree

REQUIRES t1 and t2 are sorted trees


ENSURES Merge(t1, t2) = a sorted tree

consisting of the items of t1 and t2
Merge (Node(l1,x,r1), t2) = ???!

We could split t2 into two subtrees (l2, r2),



then do Node(Merge(l1, l2), x, Merge(r1, r2))
But we need to stay sorted and not lose any data
So split should use the value of x

and build (l2, r2) so that l2 x r2 and

tree split
SplitAt : int * tree -> tree * tree

REQUIRES t is a sorted tree


ENSURES SplitAt(x, t) =

a pair of sorted trees (u1, u2) such that

t consists of the items in u1, u2

and u1 x u2
Not completely specific,
but thats OKAY!

Plan
Define SplitAt(t) using structural recursion

SplitAt(x, Node(t , y, t )) should


compare x and y

call SplitAt(x, -) on a subtree

build the result
1

SplitAt
SplitAt : int * tree -> tree * tree
REQUIRES t is a sorted tree
ENSURES SplitAt(x, t) = a pair of sorted trees (u1, u2)


such that u1 x u2 and t consists of the items in u1, u2
fun SplitAt(x, Empty) = (Empty, Empty)
|

SplitAt(x, Node(t1, y, t2)) =



case compare(y, x) of
GREATER =>
let val (l1, r1) = SplitAt(x, t1) in (l1, Node(r1, y, t2)) end
| _ =>
let val (l2, r2) = SplitAt(x, t2) in (Node(t1, y, l2), r2) end

Merge
Merge : tree * tree -> tree

REQUIRES t1 and t2 are sorted trees


ENSURES Merge(t1, t2) = a sorted tree

consisting of the items of t1 and t2
fun Merge (Empty, t2) = t2
| Merge (Node(l1,x,r1), t2) = !
let!
val (l2, r2) = SplitAt(x, t2)!
in
Node(Merge(l1, l2), x, Merge(r1, r2))
end

(as we promised!)

Msort
Msort : tree -> tree
REQUIRES true


ENSURES Msort(t) = a sorted tree


consisting of the items of t

fun Msort Empty = Empty


|

Msort (Node(t1, x, t2)) =


Ins (x, Merge(Msort t1, Msort t2))

(as we promised)

Correct?
Q: How to prove that Msort is correct?
A: Use structural induction.

First prove that the helper functions


Merge, SplitAt, Ins are correct.
Again use structural induction.

See
lecture
notes

The helper specs were carefully chosen to


make the proof of Msort straightforward.

(An easy structural induction, using the proven facts about helpers.)

Notes
Ins(x,t) is inductive on (structure of) t

SplitAt(x, t) is inductive on t

Merge(t , t ) is inductive on t

Msort(t) is inductive on t
1

look back at the code



and see why this is true