
Resources:

20-June >> Start Coursera

Big-O

Measuring Algorithms Performance - 1 (Arabic)


https://www.youtube.com/watch?index=2&edufilter=NULL&list=PLPt2dINI2MIayAafeRHZPVhIoL7yZTyB9&v=EQzmtn4PzYQ

The Ultimate Big O Notation Tutorial (Time & Space Complexity For Algorithms)
https://www.youtube.com/watch?edufilter=NULL&v=waPQP2TDOGE

1.8.1 Asymptotic Notations Big Oh - Omega - Theta #1


https://www.youtube.com/watch?edufilter=NULL&v=A03oI0znAoc

Data Structures Tutorial #2 - Big O notation explained | Measuring time complexity of an algorithm
https://www.youtube.com/watch?edufilter=NULL&v=IR_S8BC8KI0

#01 [Data Structures] - Complexity


https://www.youtube.com/watch?edufilter=NULL&v=sHhVsGQz9MI

#2.1- Time Complexity Analysis: Frequency Count (in Arabic)


https://www.youtube.com/watch?edufilter=NULL&v=Day3_mw1F-Y

Examples of how to compute Big O complexity (in Arabic)


https://www.youtube.com/watch?edufilter=NULL&v=Jv0_HVdVdJw

LECTURE 1 BIG Oh
https://www.youtube.com/watch?edufilter=NULL&v=sblr6SXgyLA
Introduction to Big O Notation and Time Complexity (Data
Structures & Algorithms #7)
https://www.youtube.com/watch?edufilter=NULL&v=D6xkbGLQesk

Welcome!

Welcome

Video: Lecture - Welcome!

3 min

Reading: Companion MOOCBook

10 min

Reading: What background knowledge is necessary?

10 min

Programming Assignment 1: Programming Challenges

Video: Lecture - Solving the Sum of Two Digits Programming Challenge (screencast)

6 min

Programming Assignment: Programming Assignment 1: Sum of Two Digits

1h

(OPTIONAL) Solving The Maximum Pairwise Product Programming Challenge in C++

Reading: Optional Videos and Screencasts

10 min

Video: Lecture - Solving the Maximum Pairwise Product Programming Challenge: Improving the Naive Solution, Testing, Debugging

13 min

Video: Lecture - Stress Test - Implementation

8 min

Video: Lecture - Stress Test - Find the Test and Debug

7 min

Video: Lecture - Stress Test - More Testing, Submit and Pass!


8 min

Reading: Alternative testing guide in Python

10 min

Maximum Pairwise Product Programming Challenge

Reading: Maximum Pairwise Product Programming Challenge

10 min

Practice Quiz: Solving Programming Challenges

5 questions


Programming Assignment: Programming Assignment 1: Maximum Pairwise Product

2h

Using PyCharm to solve programming challenges (optional experimental feature)

Reading: Using PyCharm to solve programming challenges

10 min
Acknowledgements (Optional)

Reading: Acknowledgements

2 min

Algorithms are everywhere. Whether you are writing software, analyzing a genome, predicting traffic
jams, producing automatic movie recommendations, or just surfing the Internet, you're dealing with
algorithms. Every single branch of computer science uses algorithms, so a course on algorithms and
data structures is an essential part of any CS curriculum. >> It's important that the algorithms we use
are efficient, as users want to see the search results in the blink of an eye even if they search through
trillions of web pages. A poorly thought out algorithm could take literally centuries to process all the
webpages indexed by a search engine or all the Facebook posts. And thus, algorithmic improvements
are necessary to make these systems practical.
That's why tech companies always ask lots of algorithmic questions at the interviews. >> In data
science problems, like ranking internet search results, predicting road accidents, and recommending
movies to users, advanced algorithms are used to achieve excellent search quality, high prediction
accuracy, and to make relevant recommendations. However, even for a simple machine learning algorithm like linear regression, being able to process big data is usually a challenge. When advanced
algorithms such as deep neural networks are applied to huge data sets they make extremely accurate
predictions, recently even starting to outperform humans in some areas of vision and speech recognition. But getting those algorithms to work in hours instead of years on a large dataset is hard.
And performing experiments quickly is crucial in data science. >> Algorithms are everywhere. Each of
trillions of cells in your body executes a complex and still poorly understood algorithm.
And algorithms are the key to solving important biomedical problems, such as which mutations differentiate you from me and how they relate to diseases. In this specialization you will learn the theory behind the algorithms, implement the algorithms in the programming language of your choice, and apply them to solving practical problems such as assembling the genome from millions of tiny fragments, the largest jigsaw puzzle ever assembled by humans. >> To conclude, algorithms are
everywhere. And it is important to design your algorithms and to implement them. To turn you into a
pro in algorithm design we will give you nearly 100 programming assignments in this class. Your
solutions will be checked automatically, and you will learn how to implement, test, and debug fast
algorithms solving large and difficult problems in seconds. We look forward to seeing you in this class.
We know it will make you a better programmer.
>> Algorithms are everywhere. In fact, you just saw five algorithms solving the fundamental sorting problem in computer science, and they all have different running times. While four of them are about to finish, one will take a much longer time. In this specialization you'll be able to implement all
of these algorithms, and master the skill of answering both algorithmic and programming questions at
your next interview.

Companion MOOCBook

We invite you to use the following companion book for the specialization:

Alexander Kulikov and Pavel Pevzner. Learning Algorithms through Programming and Puzzle
Solving. 2018.

The book includes:

 some theory on algorithm design techniques;


 links to interactive puzzles that provide you with a fun way to invent the key algorithmic
concepts yourself;
 detailed descriptions of all programming challenges in “Algorithmic Toolbox”;
 description of good programming practices that will help you to avoid many frequently
made mistakes when implementing algorithms;
 detailed solutions, with Python code, of the following problems in the “Algorithmic
Toolbox”: Last Digit of the Sum of Fibonacci Numbers, Collecting Signatures, Organizing a
Lottery, and Maximum Amount of Gold;
 hints for many other problems.

Order the book through Amazon (printed, kindle), Leanpub (pdf, mobile friendly pdf), or MyBookOrders (printed). Browse sample pages (including table of contents): printed version, kindle version, leanpub pdf version, leanpub mobile friendly pdf version.
What background knowledge is necessary?

1. Basic knowledge of at least one programming language: Python, C++, Java, C#,
Javascript, C, Haskell, Ruby, Rust, Scala.

We expect you to be able to implement programs that: 1) read data from the standard input (in
most cases, the input is a sequence of integers); 2) compute the result (in most cases, a few loops
are enough for this); 3) print the result to the standard output. For each programming challenge in
this course, we provide starter solutions in C++, Java, and Python. The best way to check whether
your programming skills are enough to go through problems in this course is to solve two
problems from the first week. If you are able to pass them (after reading our tutorials), then you
will definitely be able to pass the course.
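For example, a solution to the first challenge, Sum of Two Digits, follows exactly this read-compute-print pattern. Here is a minimal C++ sketch (the provided starter files may differ in details):

```cpp
#include <iostream>

int main() {
    int a = 0, b = 0;
    std::cin >> a >> b;           // 1) read data from the standard input
    int result = a + b;           // 2) compute the result
    std::cout << result << "\n";  // 3) print the result to the standard output
    return 0;
}
```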

2. Basic knowledge of discrete mathematics: proof by induction, proof by contradiction.

Knowledge of discrete mathematics is necessary for analyzing algorithms (proving correctness,


estimating running time) and for algorithmic thinking in general. If you want to refresh your discrete
mathematics skills, we encourage you to go through our partner specialization — Introduction to
Discrete Mathematics for Computer Science. It teaches the basics of discrete mathematics in a try-this-before-we-explain-everything approach: you will be solving many interactive puzzles that were
carefully designed to allow you to invent many of the important ideas and concepts yourself.
The previous video ended on an unsuccessful attempt to submit our solution to the max pairwise
product problem. Even worse, we don't know which test our program fails on, because the
system doesn't show it to us. This is actually a pretty standard situation when solving algorithmic
programming assignments. In this, and the next few videos, we will learn how to overcome this
situation and to even avoid it in the first place. I will explain to you and show you in action a powerful
technique called stress testing. By the end, you will be able to implement a stress test for your
solution of an algorithmic problem, use it to find a small and easy test on which your program fails,
debug your program, and make sure it works correctly afterwards with high confidence. Also, you will
be able to list and apply the set of standard testing techniques, which should be always applied when
solving algorithmic programming assignments. So what is stress testing? In general, it is a program
that generates random tests in an infinite loop, and for each test, it launches your solution on this test
and an alternative solution on the same test and compares the results. This alternative solution you
also have to invent and implement yourself, but it is usually easy, because it can be any simple, slow,
brute force solution, or just any other solution that you can come up with. The only requirement is
that it should be significantly different from your main solution.
Then you just wait until you find a test on which your solutions differ. If one of them is correct and
another is wrong, then it is guaranteed to happen, because there is some test for which your wrong
solution gives the wrong answer, and your correct solution gives correct answer, and so they differ. If,
however, both of your solutions are wrong, which also happens often, they are still almost
guaranteed to have some test on which one of them gives wrong answer and another one gives
correct answer because they're probably wrong in different places. When you find a test on which
your solutions' answers differ, you can determine which one of them returns wrong answer and
debug it, fix it, and then repeat the stress testing. Now let's look at the practical implementation. I've
already implemented the stress test for this problem. It is in the file called stress_test.cpp. Let's look
into that. So it is almost the same as the solution that we've sent in the previous video, but I've added
some things. First, we add #include <cstdlib>. This include just allows us to use the part of the standard library that generates random numbers. And we will use it to generate some random
tests automatically. Then we have the same code of two functions, MaxPairwiseProduct and
MaxPairwiseProductFast, which we used in our last solution which was submitted in the system. But
now in the main function, we have a whole additional while loop.
Here it is, and this is where the stress test itself is. So what we do, in principle, is generate some
random tests, then we launch both solutions, MaxPairwiseProduct and MaxPairwiseProductFast on
this random test, and we compare the results. And the idea is if you have a correct solution and
another correct solution and the correct answer for your problem is the only correct answer, then any
two correct solutions will give the same answers for any test. And if you have some wrong solution
and some correct solution, then on some tests, their answers will differ. And also if you have two
wrong solutions, then probably they're wrong in a different way, and then there will also be some
test, hopefully, on which their answers differ. If you generate a lot of tests, then with some
probability, at some point, you will generate a test for which the answers of the solutions differ. You
can detect that situation, and then look at the input test and at the answers.
And you can determine which of the algorithms was right, and which was wrong, maybe both were
wrong. But anyway, you will find at least one of the algorithms which are wrong because if their
answers are different, then at least one of them gives wrong answer. And then you will be able to
debug that algorithm, fix the bug, and then run the stress test again. And either, you will again find
some difference, or you won't find any difference anymore, and then hopefully, you fixed all the bugs
in all the solutions, and you can submit it. So here is how it works in practice. First, we need to generate the
test for our problem. We'll start with generating number n, the number of numbers. And our problem
states that n should be at least 2. So we first generate a big random number using function rand. Then
we take it modulo 10, and it gives us some random number between 0 and 9, and then we add 2. And
so we get a random number between 2 and 11. Why is it so small? Well, we first want both our
solutions to work fast enough. And also if we create a big random test, it will be hard for us to debug
it, so we start with some relatively small value of n. We immediately output it on the screen, so that if
we find some tests for which our solution is wrong, we immediately see it on the screen.
After generating n, we should generate the array of numbers, a, itself. So we iterate n times, and we
add random numbers from 0 to 99,999 to the end of array a. So these are the numbers in the range
which is allowed.
And then we also output all these numbers in one line, separated by spaces and a newline character.
So by this time, we've already output the whole input test on the screen. Now what we need to do is
actually need to launch both our solutions on this input test and get two results, the result of the
main solution and the result of the fast solution. After that, we compare those two results. If they are
different, it means that at least one of the solutions was wrong. So we output words Wrong answer
on the screen, and we also output the first result, a space, and the second result and a newline
character. After that, we break our loop. We can notice that our while loop is a so-called infinite loop,
but we actually end it as soon as we find some test for which our solutions differ. If we don't find such a test, we just output the word OK on the screen, to denote that both solutions have computed their answers and the answers are the same. And then we continue. So we
continue our infinite loop, in the search of the test that will save us, that we can use to debug our
solution. And we wrote that code just in front of all the code to read the numbers from the input, to
compute the answer, and to output it to the screen. So we basically inserted this code in our regular
program. But instead of reading numbers from the input, it will first try to find the test for which our
two solutions differ. And in the next video, we will launch this stress test to find a test in which our
solutions differ, debug our fast solution, and fix it.
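To make the structure concrete, here is a minimal C++ sketch of the stress test just described. The two function bodies are illustrative reimplementations, not the course's exact stress_test.cpp:

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

// Slow but obviously correct solution: check every pair.
long long MaxPairwiseProduct(const std::vector<int>& a) {
    long long best = 0;
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = i + 1; j < a.size(); ++j)
            best = std::max(best, (long long)a[i] * a[j]);
    return best;
}

// Fast solution: multiply the two largest elements.
long long MaxPairwiseProductFast(const std::vector<int>& a) {
    size_t first = 0;
    for (size_t i = 1; i < a.size(); ++i)
        if (a[i] > a[first]) first = i;
    size_t second = (first == 0) ? 1 : 0;
    for (size_t i = 0; i < a.size(); ++i)
        if (i != first && a[i] > a[second]) second = i;
    return (long long)a[first] * a[second];
}

int main() {
    while (true) {
        int n = rand() % 10 + 2;           // random n between 2 and 11
        std::cout << n << "\n";            // print the test before running it
        std::vector<int> a;
        for (int i = 0; i < n; ++i)
            a.push_back(rand() % 100000);  // random values between 0 and 99,999
        for (int x : a)
            std::cout << x << " ";
        std::cout << "\n";
        long long r1 = MaxPairwiseProduct(a);
        long long r2 = MaxPairwiseProductFast(a);
        if (r1 != r2) {                    // solutions disagree: we found our test
            std::cout << "Wrong answer: " << r1 << " " << r2 << "\n";
            break;
        }
        std::cout << "OK\n";               // answers match; keep searching
    }
    return 0;
}
```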
Alternative testing guide in Python

Test_your_solutions.pdf

Maximum Pairwise Product Programming Challenge

This section contains the Maximum Pairwise Product Programming Challenge. You can go ahead
and solve this programming challenge. However, if you encounter any problems with it, please go
through the previous optional section with videos describing how to solve it, test and debug your
solutions in C++. It also contains several very useful techniques for testing and debugging your
solutions to programming challenges in general. The next quiz will help you determine whether you're ready to proceed with solving programming challenges, or whether it is better to go through the previous optional section beforehand.
Solving Programming Challenges
TOTAL POINTS 5
1.Question 1
What will you typically need to implement yourself in the programming assignments if you program
in C++, Java or Python?

1 point

Reading input, writing output and the solution to the problem.

Just writing the output.

Just reading the input.

Just the solution of the problem.

2.Question 2
Your program in C, C++ or Java thinks that the product of
numbers 50000 and 50000 is equal to -1794967296. What is the
most probable reason?

1 point

Compiler error.

The problem statement is wrong.

Integer overflow.

The input data is incorrect.

3.Question 3
Which tests should you perform before submitting a solution to the programming assignment?
1 point

Just submit the program and see if it passes the assignment.

Test on the examples from the problem statement. Then make a few other small tests, solve them
manually and check that your program outputs the correct answer. Generate a big input and
launch your program to check that it works fast enough and doesn't consume too much memory.
Test for corner cases: smallest allowed values and largest allowed values of all input parameters,
equal numbers in the input, very long strings, etc. Then make a stress test. After all these tests
passed, submit the solution.

Just check that the answers for the examples from the problem statement are correct.

Test on the examples from the problem statement. Then make a few other small tests, solve them
manually and check that your program outputs the correct answer. After all these tests passed,
submit the solution.

4.Question 4
Where does the input data come from when you implement a stress test?

1 point

You generate valid input data as a part of the stress test implementation.

You enter the input data manually.

You download and use the tests we've prepared to check your solution to the problem.

5.Question 5
If you submit a solution of a programming assignment, but it does not pass some of the tests,
what feedback will you get from the system?

1 point

If it is one of the first few tests, you will see the input data, the answer of your program and the
correct answer. Otherwise, you will only see either that the answer of your program is wrong or
that your program is too slow or that your program uses too much memory.

You will only get the feedback that your program either passed or did not pass.

You will see the input data, the answer of your program, the correct answer, how long did your
program work and how much memory did it use for each of the tests.

Using PyCharm to solve programming challenges

If your primary programming language is Python, we encourage you to install the PyCharm Edu
IDE and try it (after installing, select Learn -> Start Coursera Assignment -> Algorithmic Toolbox).
The beautiful PyCharm IDE will allow you to code like a pro:

 implement a solution, implement unit tests and stress tests for it, and run the tests in the
IDE;
 use visual debugging tools;
 use various smart features of the IDE: code inspections, autocompletion, refactoring;
 when you are happy with your implementation, submit it to Coursera.

We hope that PyCharm Edu will make your learning process smooth and enjoyable! Please use
this forum thread to leave feedback, suggest improvements, and report bugs.
Week 2
Algorithmic Warm-up

In this module you will learn that programs based on efficient algorithms can solve the same
problem billions of times faster than programs based on naïve algorithms. You will learn how to
estimate the running time and memory of an algorithm without even implementing it. Armed with
this knowledge, you will be able to compare various algorithms, select the most efficient ones, and
finally implement them in our programming challenges!

Key Concepts
 Estimate the running time of an algorithm
 Practice implementing efficient solutions
 Practice solving programming challenges
 Implement programs that are several orders of magnitude faster than straightforward
programs
Why Study Algorithms?

Video: Lecture - Why Study Algorithms?

7 min


Video: Lecture - Coming Up

3 min

Fibonacci Numbers

Video: Lecture - Problem Overview

3 min

Video: Lecture - Naive Algorithm

5 min

Video: Lecture - Efficient Algorithm
3 min

Reading: Resources

2 min

Greatest Common Divisor

Video: Lecture - Problem Overview and Naive Algorithm

4 min

Video: Lecture - Efficient Algorithm

5 min

Reading: Resources

2 min

Big-O Notation

Video: Lecture - Computing Runtimes
10 min

Video: Lecture - Asymptotic Notation

6 min

Video: Lecture - Big-O Notation

6 min

Video: Lecture - Using Big-O

10 min

Notebook: Big-O Notation: Plots

1h

Reading: Resources

2 min


Practice Quiz: Logarithms

6 questions

Practice Quiz: Big-O

7 questions

Practice Quiz: Growth rate

2 questions

Course Overview

Video: Lecture - Course Overview

10 min

Programming Assignment 2


Programming Assignment: Programming Assignment 2: Algorithmic Warm-up

2h 30m



Hello everybody. I'm Daniel Kane. Welcome to the data structures and algorithms specialization. For
this very first lecture, we're going to start at the very beginning and talk about why do you need to
study algorithms in the first place. So the basic goal in this lecture is to sort of talk about what are the
sorts of problems that we're going to be discussing in this algorithms class and why they're important.
And in the context of doing this, we're also going to discuss some problems that you might run into
when writing computer programs that might not actually require sophisticated techniques that we'll
be discussing in this course. And on the other hand, we'll discuss some other sorts of problems that
you might want to solve that go beyond the sort of material that we will be talking about here.
So, to begin with, suppose that you're writing a computer program. There are a lot of tasks that you
might want to perform that you don't really need to think about very hard. These are things like
displaying a given text on the screen, or copying a file from one location to another, or searching a file
for a given word. Each of these algorithms is essentially a linear scan: you go through every word in the file, one at a time, and do the appropriate thing. And for each of these problems, that linear scan is something
that you really can't do much better than. In order to do whatever task it is you're doing, you have to
go through all the data one at a time and process it appropriately. And so when you do more or less
the obvious thing, you have a program that works. It solves the problem that you need. And it does so
approximately as efficiently as you could expect.
So for these sorts of problems you might not have to think very hard about what algorithm you are
using.
On the other hand, there are some other problems, actual algorithms problems, where it's not so
clear what it is you need to do.
For example, you might be given a map and need to find the shortest path between two locations on
this map. Or you might be trying to find the best pairing between students and
dorm rooms given some sort of list of preferences, or you might be trying to measure the similarity of
two different documents.
Now, for these problems it's a lot more complicated, it's not immediately clear how to solve these
problems. And even when you do come up with solutions, often the simple solutions to these
problems are going to be far too slow. For example, you could come up with a simple algorithm that tries all possible pairings between people and dorm rooms and returns the one that optimizes some function
that you're trying to deal with. On the other hand, if you did that, it would probably take a very, very,
very long time. And you might not have enough time to wait, and so you might need to do something
better.
And then even once you have a reasonably efficient algorithm for these problems, there's often a lot
of room for further optimization. Improve things so that things run in an hour rather than a day. Or a
minute rather than an hour. Or a second rather than a minute. And all of these improvements will
have a large effect on how useful this program you've written is.
Now, on the other hand, there are some things that you might want to try and do with your computer
that go a little bit beyond the sort of things we're discussing in this course. We might want to call
these Artificial Intelligence Problems. And these are problems where it's sort of hard to clearly state
what it is that you're trying to do.
An example of this might be, to try and write a computer program to understand natural language.
That is, write a program where I can type something in, some English sentence, asking it, you know,
what's the price of milk at the local food store today? And you want the computer to then take this
sentence that I wrote, and figure out what it means, figure out some way to parse it. And then do an
appropriate lookup and return a useful answer to me. And the problem with doing this isn't so much
that anything involved here is actually difficult to perform, but the problem is that fundamentally we
don't really understand what it means to interpret an English sentence. Now, I mean we can all speak
English, hopefully, if you're listening to this lecture, but we don't sort of really fundamentally
understand what it means. It's hard to put it into precise enough language that you can actually write
a computer program to do that.
Now you have similar problems, like if you want to identify objects in a photograph. You've got a
picture, with maybe with a dog and tree and a cloud and you want the computer to identify what's
what. Then once again, this is a thing that our brains have gotten very good at doing, and we
understand what the question is. However, it's hard to really put into words how you identify that this
thing's a dog and this thing's a tree. And this sort of business makes it very difficult to teach a
computer to do the same thing.
Another thing that you might want to do is teach a computer to play games well like play chess
effectively. And, once again, this is a thing where we can sort of identify what it means to do this. But,
actually how you want to do it, there's a lot of sort of very vague, intuitive things that go on there. It's
not a clearly defined problem that you're trying to solve.
And so, for all of these problems sort of the difficulty is not so much that it's hard to do things quickly.
But it's hard to even state what it is that you're trying to do and figure out how to approach it. Now,
these are problems that we're not going to really cover in this class, we're going to focus on
algorithms, how to do things quickly and efficiently. But if you do want to get into AI and want to try
and solve these problems, it will be very important that you have a solid grounding in algorithms, so
that once you have some idea of what it means to identify trees in pictures, you will have an idea
of what sort of algorithms can actually support these ideas, which sort of ideas you can actually
implement in a reasonable amount of time.
And so, what we're going to focus on in this course are the algorithms problems. So, we want
problems that are cleanly formulated, like clear mathematical problems. And some of the things we
looked at maybe aren't immediately clear, like if you want to find the shortest route between two
points on a map, that's not immediately a math problem. But, pretty quickly you can interpret it as
such. You can say, well, I want some sequence of intersections, I'm traveling between such that each
pair is connected by a road, and the sum of the lengths of the roads is as small as possible. And so,
pretty quickly this just becomes a problem where we can very clearly state what it is that we're trying
to do but, for which it is still nontrivial to solve it. And so, that's the sort of thing we're going to be
talking about in this class. And, hopefully, by the end of it you will have a good idea of how to solve
these problems, how to write programs that will solve them very quickly and very efficiently. And
that's what we'll be talking about. I hope you enjoy the rest of the class.
Hello everybody! Welcome back. Today we're going to start talking about Fibonacci numbers and
algorithms to compute them. In particular, in this lecture we're just going to introduce the sequence
of the Fibonacci numbers and talk a little bit about their properties. So to begin with the Fibonacci
numbers are a fairly classically studied sequence of natural numbers. The 0th element of the sequence is 0. The first element is 1. And from there on, each element is the sum of the previous two. So 0 + 1 is 1, 1 + 1 is 2, 1 + 2 is 3, 2 + 3 is 5. And the sequence continues: 8, 13, 21, 34, and so on.
So it's a nice sequence of numbers defined by some pretty simple recursive rule and it's interesting for
a number of reasons. It has some interesting number theoretic properties, but originally this
sequence was developed by an Italian mathematician as a mathematical model. And it's a little bit
weird. You might try and wonder what sorts of things this could be a model for.
Well, it turns out that, originally, this was used as sort of a mathematical model for rabbit
populations. There was some idea that if you had a pair of rabbits, it would take them one generation
to mature and every generation thereafter, they'd produce a pair of offspring. And if you work out
what this means, then you find out the Fibonacci numbers, tell you how many pairs of rabbits you
have after n generations.
Now, because rabbits are known for reproducing rather quickly, you might assume that the sequence
therefore grows quickly, and in fact it does. It's not hard to show that the nth Fibonacci number is at
least 2^(n/2) for all n at least 6. The proof is by induction. You prove this directly for n = 6 and n = 7 just by computing the numbers and showing that they're big enough. After that point, Fn is the sum of Fn-1 and Fn-2; by the inductive hypothesis you bound both terms below, do a little bit of arithmetic, and conclude that the sum is bounded below by 2^(n/2). That completes the proof.
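Written out, the inductive step being sketched is, for n at least 8:

```latex
F_n = F_{n-1} + F_{n-2} \ge 2^{(n-1)/2} + 2^{(n-2)/2} \ge 2 \cdot 2^{(n-2)/2} = 2^{n/2}
```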
In fact, with a little bit more work you can actually get a formula for the nth Fibonacci number: it is roughly ((1 + sqrt(5))/2)^n. These things grow exponentially quickly.
And to sort of drive that home a little bit more, we can look at some examples. The 20th Fibonacci
number is 6765. The 50th Fibonacci number is approximately 12 billion. The 100th Fibonacci number
is much, much bigger than that. And the 500th Fibonacci number is this monster with something like 100 digits to it. So these numbers do get rather large quite quickly.
So the problem that we're going to be looking into for the next couple of lectures is: how do you compute Fibonacci numbers, whether you want to use them to model rabbit populations or out of some number-theoretic interest? We'd like an algorithm that takes as input a non-negative integer n and returns the nth Fibonacci number. And we're going to talk about how you go about doing this. So,
come back next lecture and we'll talk about that.
Hello everybody. Welcome back. Today we'll talk a little bit more about how to compute Fibonacci
numbers. And, in particular, today what we're going to do is we're going to show you how to produce
a very simple algorithm that computes these things correctly. On the other hand, we're going to show
that this algorithm is actually very slow, and talk a little bit about how to analyze that.
So let's take a look at the definition again. The zero'th Fibonacci number is 0. The first Fibonacci
number is 1. And from there on, each Fibonacci number is the sum of the previous two.
Now these grow pretty rapidly, and what we would like to do is have an algorithm to compute them.
So let's take a look at how we might do this. Well, there's a pretty easy way to go about it, given the
definition. So if n is 0, we're supposed to return 0. And if n is 1, we're supposed to return 1. So we
could just start with a case that says if n is at most 1, we're going to return n.
Otherwise what are we supposed to do? We're supposed to return the sum of the (n-1)st and (n-2)nd Fibonacci numbers. So we can just compute those two recursively, add them together, and
return them. So, this gives us a very simple algorithm four lines long that basically took the definition
of our problem and turned it into an algorithm that correctly computes the thing it's supposed to.
Good for us. We have an algorithm and it works. However, in this course, we care a lot more than
just, does our algorithm work? We also want to know if it's efficient, so we'd like to know how long
this algorithm takes to run, and there's sort of a rough approximation to this. We're going to let T(n)
denote the number of lines of code that are executed by this algorithm on input n.
So to count this is actually not very hard. So if n is at most one, the algorithm checks the if case, goes
to the return statement, and that's two lines of code.
Not so bad.
If n is at least two, we go to the if case. We go to the else condition, and then run a return statement.
That's three lines of code.
However, in this case we also need to recursively compute the (n-1)st and (n-2)nd Fibonacci numbers. So we
need to add to that however many lines of code those recursive calls take.
So all in all though, we have a nice recursive formula for T(n). It's two as long as n is at most one.
And otherwise, it's equal to T(n-1) + T(n-2) + 3. So a nice recursive formula.
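Written as a recurrence, that is:

```latex
T(n) = \begin{cases} 2 & \text{if } n \le 1, \\ T(n-1) + T(n-2) + 3 & \text{if } n \ge 2. \end{cases}
```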
Now, if you look at this formula for a little bit, you'll notice that it looks very similar to the original
formula that we used to define the Fibonacci numbers. Each guy was more or less the sum of the
previous two.
And in fact, from this you can show pretty easily that T(n) is at least the nth Fibonacci number for all n.
And this should be ringing some warning bells because we know that the Fibonacci numbers get very,
very, very large, so T(n) must as well. In fact, T(100) is already 1.77 times 10 to the 21. 1.77 sextillion.
This is a huge number. Now, suppose we were running this program on a computer that executed a
billion lines of code a second. It ran it at a gigahertz. It would still take us about 56,000 years to
complete this computation.
Now, I don't have 56,000 years to wait for my computer to finish. You probably don't either, so this
really is somehow not acceptable, if we want to compute Fibonacci numbers of any reasonable size.
So what we'd really like is we'd like a better algorithm. And we'll get to that next lecture. But first we
should talk a little bit about why this algorithm is so slow.
And to see that, maybe the clearest way to demonstrate it is to look at all of the recursive calls this
algorithm needs in order to compute its answer.
So, if we want to compute the nth Fibonacci number, we need to make recursive calls to compute the (n-1)st and (n-2)nd Fibonacci numbers. To compute the (n-1)st, we need the (n-2)nd and (n-3)rd. To compute the (n-2)nd, we need the (n-3)rd and (n-4)th, and it just keeps going on and on. From there we get this big tree of recursive calls.
Now if you'll look at this tree a little bit closer, it looks like we're doing something a little bit silly.
We're computing Fn-3, three separate times in this tree.
And the way our algorithm works, every time we're asked to compute it, since this is a new
recursive call, we compute the whole thing from scratch. We recompute Fn-4, and Fn-5, and then,
add them together and get our answer. And it's this computing the same thing over and over again
that's really slowing us down. And to make it even more extreme, let's blow up the tree a little bit
more. Fn-4 actually gets computed five separate times by the algorithm. And as you keep going down, more and more times you're just computing the same thing over and over again.
And this is really the problem with this particular algorithm, but it's not clear immediately that we can
do better. So, come back next lecture and we'll talk about how to get around this difficulty, and
actually get a fairly efficient algorithm.
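For reference, the four-line naive algorithm from this lecture, transcribed into C++ (a sketch, with a small driver added):

```cpp
#include <iostream>

// Naive recursive Fibonacci, taken straight from the definition.
// The recursion tree recomputes the same values over and over,
// so the number of operations grows at least as fast as F(n) itself.
long long Fibonacci(int n) {
    if (n <= 1)
        return n;  // F(0) = 0, F(1) = 1
    return Fibonacci(n - 1) + Fibonacci(n - 2);
}

int main() {
    std::cout << Fibonacci(20) << "\n";  // 6765; Fibonacci(100) would take millennia
    return 0;
}
```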
Hello, everybody, welcome back. We're still talking about algorithms to compute Fibonacci numbers.
And in this lecture, we're going to see how to actually compute them reasonably efficiently.
So, as you'll recall, the Fibonacci numbers were the sequence: zero, then one, then a bunch of elements, each of which is the sum of the previous two.
We had a very nice algorithm for them last time, which unfortunately was very, very slow, even to
compute the 100th Fibonacci number, say. So we'd like to do better, and we need some idea for this new algorithm. And one way to think about it is: what do you do when you compute them by
hand. And in particular, suppose we want to write down a list of all the Fibonacci numbers.
Well, there's sort of an obvious way to do this. You start off by writing zero and one because those are
the first two. The next one going to be zero plus one, which is one. The next one is one plus one which
is two, and one plus two, which is three, and two plus three, which is five. And at each step, all I need
to do is look at the last two elements of the list and add them together. So, three and five are the last
two, I add them together, and I get eight. And, this way, since I have all of the previous numbers
written down, I don't need to do these recursive calls that I was making in the last lecture, that were
really slowing us down. So, let's see how this algorithm works. What I need to do is I need to create an
array in order to store all the numbers in this list that I'm writing down. The zeroth element of the
array gets set to zero, the first element gets set to one; that sets our initial conditions. Then, as i runs from two to n, we set the ith element to be the sum of the (i-1)st and (i-2)nd elements. That correctly computes the ith Fibonacci number.
Then, at the end of the day, once I've filled out the entire list, I'm going to return the last element
after that.
So, now we can say, this is another algorithm, and it should work just as well, but how fast is it? Well, how many lines of code does it use? There are three lines of code at the beginning, and there's a return statement at the end, so that's four lines of code.
Next up we have this for loop that we run through n - 1 times, and each time we have to execute two lines of code. So adding everything together we find out that T(n) is something like 2n + 2. So if we wanted to run this program on input n = 100, it would take us about 202 lines
of code to run it. And 202 is actually a pretty small number even on a very modest computer these
days. So essentially, this thing is going to be trivial to compute the 100th or the 1,000th or the
10,000th Fibonacci number on any reasonable computer. And this is much better than the results that
we were seeing in the last lecture.
So in summary, what we've done in this last few lectures, we've talked about the Fibonacci numbers,
we've introduced them. We've come up with this naive algorithm, this very simple algorithm that
goes directly from the definition, that unfortunately takes thousands of years, even on very small
examples to finish.
On the other hand, the algorithm we just saw is much better, it's incredibly fast even on fairly large
inputs and it works quite well in practice.
And so, the moral of this story, the thing to really keep in mind is that in this case and in many, many
others the right algorithm makes all the difference in the world. It's the difference between an
algorithm that will never finish in your entire lifetime and one that finishes in the blink of an eye.
And so, that's the story with Fibonacci numbers. Next lecture we're going to talk about a very similar
story that comes with computing greatest common divisors. So I hope you come back for that. Until
then, farewell.
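Here is the list-based algorithm from this lecture as a C++ sketch. Note that 64-bit integers overflow past F(92), which is why the course's challenges ask for quantities like the last digit instead:

```cpp
#include <iostream>
#include <vector>

// Iterative Fibonacci: write down the whole list, as in the lecture.
// Each entry is the sum of the previous two, so nothing is recomputed;
// roughly 2n + 2 lines of code are executed in total.
long long FibonacciFast(int n) {
    std::vector<long long> f(n + 2);  // n + 2 so f[1] exists even when n = 0
    f[0] = 0;
    f[1] = 1;
    for (int i = 2; i <= n; ++i)
        f[i] = f[i - 1] + f[i - 2];
    return f[n];
}

int main() {
    std::cout << FibonacciFast(50) << "\n";  // 12586269025, "approximately 12 billion"
    return 0;
}
```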
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Computing Fibonacci numbers: Section 0.2 of [DPV08]

If you find this lesson difficult to follow


If you need to refresh your knowledge of recursion: Section on recursion at Algorithms class by
Tom Cormen and Devin Balkcom at Khan Academy

Visualizations
Computing Fibonacci numbers by David Galles

To better appreciate the difference between polynomial time and exponential time algorithms, try
computing F_20 using this visualization. For this, enter "20" into the field and press
"Fibonacci Recursive". This calls a recursive algorithm that makes an endless number of recursive
calls. This call will never end even if you increase the visualization speed to maximum. Stop this
call by pressing "Skip Forward" and press "Fibonacci Table". This will call an iterative algorithm
that uses an array to compute Fibonacci numbers efficiently. The third button calls a recursive
algorithm with memoization. We will cover such algorithms in the Dynamic Programming module
later in this class.

(Note that the visualization uses a slightly different definition of Fibonacci numbers:
there, F_0 = F_1 = 1, and in the lecture, F_0 = 0, F_1 = 1. This, of course, has
no influence on the running time.)

Advanced Reading
Properties of Fibonacci numbers: Exercises 0.2–0.4 in [DPV08]

References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.

Greatest Common Divisor

Problem Overview and Naive Algorithm


Hello everybody. Welcome back. Today, we're going to be talking about computing greatest common
divisors. So, in particular, what we'd like to do in this lecture is define the greatest
common divisor problem. And, we're going to talk about an inefficient way to compute them. And,
next lecture we'll talk about how to do better. So, okay. What are GCDs?
So, suppose that you have a fraction, a over b. And, you want to put it in simplest form. Now, the
standard way of doing this is to divide the numerator and denominator both by some d, to get the equivalent fraction (a/d) / (b/d). Fair enough. Now, what d do we want to use for this? Well, it
needs to satisfy two properties.
Firstly, d had better divide both a and b, since the new numerator and denominator are both integers.
But, subject to that, we would like this d to be as large as possible. So, that we can reduce the fraction
as much as we possibly can.
So, turning this into a definition, we say that for two integers, a and b, their greatest common divisor,
or GCD, is the largest integer d that divides both a and b. Okay, so this is a thing that you use to
reduce fractions. However, it turns out that GCDs are a critically important concept in the field of
number theory. The study of prime numbers, and factorization, and things like that. And, because it's
so important to number theory, it turns out that being able to compute GCDs is actually very
important in cryptography. And, the fact that you can perform secure online banking is, in part, due to
the fact that we can efficiently compute GCDs of numbers in order for our cryptographic algorithms to
work.
So, because of this importance, we're going to want to be able to compute GCDs. So, we'd like an
algorithm that, given two integers, a and b, at say, at least 0, we can compute the GCD of a and b.
And, just to be clear as to what kinds of inputs we care about, we'd actually like to be able to run this
on very large numbers. We don't just want something that works for GCD of 5 and 12, or 11 and 73.
We'd like to be able to do things like the GCD of 3,918,848 with 1,653,264. In fact, we'd also like to be
able to compute much bigger numbers, 20, 50, 100, 1000 digits. We'd still like to be able to get GCDs
of numbers of those sizes pretty quickly.
Well, let's get started. Let's start by just finding an algorithm that works. What we'd like is the largest
number that divides both a and b.
So, one thing we can do, is we can just check all of the numbers that are candidates for this, figure out
which ones divide a and b, and return the largest one.
So, there's an easy implementation for this. We create a variable called best, and set it to 0. This just
remembers the biggest thing we've seen so far.
We then let d run from 1 to a + b, since this is the range of numbers that are valid. Now, if d divides a,
and d divides b, well, since d is increasing, this has to be the new best that we've seen. So, we set best
equal to d, and then, at the end the of the day, we return back the best thing we've seen.
So, that's a perfectly good algorithm. It works. Unfortunately, it's a little bit slow, because we need to
run through this for loop a + b many times.
And, this means that, even once a and b are, say, 20 digit numbers, it's already going to be taking us
at least thousands of years in order to run this computation.
And, so that's not sufficient for the sorts of applications that we care about. We're going to need a
better algorithm. So, come back next lecture, and we'll talk about how to find a better algorithm for
this problem, and what goes into that.
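The naive algorithm just described, as a C++ sketch:

```cpp
#include <iostream>

// Naive GCD: try every candidate d from 1 to a + b and remember the
// largest one that divides both. Correct, but the loop runs a + b
// times, which is hopeless once a and b have even 20 digits.
long long NaiveGCD(long long a, long long b) {
    long long best = 0;
    for (long long d = 1; d <= a + b; ++d)
        if (a % d == 0 && b % d == 0)
            best = d;
    return best;
}

int main() {
    std::cout << NaiveGCD(12, 8) << "\n";  // 4
    return 0;
}
```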
Hello everybody, welcome back. Today we're going to be talking a little bit more about computing
greatest common divisors. In particular today, we're going to be talking about a much more efficient
algorithm than last time. This is known as the Euclidean Algorithm; we'll talk about that and we'll talk a
little bit about how its runtime works.
Just to recall: for integers a and b, their greatest common divisor is the biggest integer d that divides
both of them.
What we'd like to do is we'd like to be able to compute this, given two integers we want to compute
their GCD.
We found a bad algorithm for this and we'd like a better one. It turns out that in order to find a better
algorithm, you need to know something interesting. There's this Key Lemma that we have,
where, if we let a' be the remainder when a is divided by b, then gcd(a,b) is actually the same as gcd(a',b), and also the same as gcd(b,a'). The proof of this, once you know what to prove, is actually not very difficult. The idea is that because a' is the remainder, a is equal to a' plus some multiple of b, that is, a = a' + b*q for some q.
From that you can show that if d divides both a and b, that happens if, and only if, it divides both a'
and b. Because, for example, if d divides a' and b, it divides a' plus bq, which is a.
From this statement, we know that the common divisors of a and b are exactly the same as the
common divisors of a' and b. Therefore, the greatest common divisor of a and b is the greatest
common divisor of a' and b.
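In symbols, the Key Lemma says that if a = q · b + a' with 0 ≤ a' < b, then:

```latex
\gcd(a, b) = \gcd(a', b) = \gcd(b, a')
```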
This is the idea for the algorithm.
Basically, we have the gcd(a,b) is the same as the gcd(b,a'), but a' is generally smaller than a. If we
compute that new GCD recursively, hopefully that will be an easier problem.
Now, we do need a base case for this, so we're going to start off by saying if b is equal to zero,
everything divides zero, so we just need the biggest thing that divides a. We're going to return a in
that case.
Otherwise, we're going to let a' be the remainder when a is divided by b, and we're going to return
the gcd(b,a'), computed recursively.
By the Lemma that we just gave, if this ever returns an answer, it will always give the correct answer.
At the moment, we don't even know that it will necessarily terminate, much less do so in any
reasonable amount of time.
Let's look at an example. Suppose that we want to compute the gcd(3918848,1653264). So b here is
not zero, we divide a by b, we get a remainder that's something like 612000, and now we have a new
GCD problem to solve.
Once again, b is not zero, we divide a by b, and we get a new remainder of 428,000-some. We repeat this process, which gives us remainders of 183,000-some and then 61,000-some. Dividing again, we get a remainder of zero.
And now b is 0, so we return the answer, 61232, and this is the right answer. You'll note though, this
thing took us six steps to get to the right answer. Whereas, if we'd used the algorithm from last time,
we would've had to check something like 5 million different possible common divisors
to find the best one.
This, it turns out, is a lot better. To get a feel for how well this thing works, or why it works well: every time we take one of these remainders with division, we reduce the size of the numbers involved by a factor of about 2.
And if every step were reducing things by a factor of two, after about log(ab) many steps, our
numbers are now tiny or zero, and so, basically after log(ab) many steps, this algorithm is going to
terminate. This means that if we want to compute GCDs of 100-digit numbers, it is only going to take us about 600 steps. Each of the steps that we've used here is a single division
with remainder, 600 divisions with remainder is something you can do trivially, on any reasonable
computer.
This algorithm will compute quite large GCDs very quickly.
In summary, once again, we had this computational problem. There was a naive algorithm, one that
was very simple, came right from the definition, but it was far too slow for practical purposes. There's
a correct algorithm which is much, much better, very usable. Once again, finding the right algorithm
makes all the difference in the world. But here there was this interesting thing that we found. In order
to get the correct algorithm, it required that we actually know something interesting about the
problem. We needed this Key Lemma that we saw today.
This is actually a theme that you'll see throughout this course, and throughout your study of
algorithms. Very often, in order to find a better algorithm for a problem, you need to understand
something interesting about the structure of the solution, and that will allow you to simplify things a
lot.
In any case, that's all for today, come back next lecture, we'll start talking about how to actually
compute runtimes in a little bit more detail. Until then good bye.
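The Euclidean algorithm from this lecture, as a C++ sketch:

```cpp
#include <iostream>

// Euclidean algorithm: gcd(a, 0) = a, and otherwise
// gcd(a, b) = gcd(b, a mod b) by the Key Lemma. Each division
// roughly halves the numbers, so only about log(ab) steps are needed.
long long EuclidGCD(long long a, long long b) {
    if (b == 0)
        return a;
    return EuclidGCD(b, a % b);
}

int main() {
    std::cout << EuclidGCD(3918848, 1653264) << "\n";  // 61232, in six steps
    return 0;
}
```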
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the corresponding lesson.

Reading
Greatest common divisor: Section 1.2.3 of [DPV08], Section 31.2 of [CLRS]

If you find this lesson difficult to follow
An elementary introduction to the greatest common divisor at Khan Academy

References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition). McGraw-Hill Higher Education. 2008.

[CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill. 2009.

Big-O Notation
Hello, everybody. Welcome back. Today, we're going to be talking about Big-O notation, which is the
specific, sort of asymptotic notation that we will be using most frequently here. So, the idea here is
we're going to introduce the meaning of Big-O notation and describe some of its advantages and
disadvantages. So to start with the definition. The idea is that we want something that cares about what happens when the inputs get very large and only cares about things up to constants. So we are going to come up with a definition. If you've got two functions, f and g, we say f(n) is Big-O of g(n) if there are two constants, capital N and little c, such that for all n at least N, f(n) is at most c*g(n). And what this means is that, at least for sufficiently large inputs, f is bounded above by some constant multiple of g.
Play video starting at 58 seconds and follow transcript0:58
Which is really sort of this idea that we had from before. Now, for example, 3n squared plus 5n plus 2
is O of n squared,
Play video starting at 1 minute 8 seconds and follow transcript1:08
because if we take any n at least 1, 3n squared plus 5n plus 2 is at most 3n squared plus 5n squared
plus 2n squared, which is 10n squared.
Play video starting at 1 minute 19 seconds and follow transcript1:19
Some multiple of n squared. And in particular, if you look at these two functions, they really in some sense do have the same growth rate. If you look at the ratio between them, sure it's large at first, it's 10 at n equals 1, but as n gets large it actually drops down to about 3. And once you're putting in large inputs, at n equals 1000, n squared is a million and 3n squared + 5n + 2 is a little bit more than 3 million.
Play video starting at 1 minute 45 seconds and follow transcript1:45
So, they're not the same function. One of them is distinctly larger than the other, but it's not larger by
much, not by more than a factor of about three.
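A quick numerical illustration of that ratio (my own check, not from the lecture):

for n in [1, 10, 100, 1000]:
    print(n, (3 * n * n + 5 * n + 2) / (n * n))
# prints 10.0, 3.52, 3.0502, 3.005002: the ratio starts at 10 and falls toward 3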
Play video starting at 1 minute 57 seconds and follow transcript1:57
Throughout this course, we're going to be using big-O notation to report, basically, all of our
algorithms' runtimes. And, this has a bunch of advantages for us.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
The first thing that it does for us is it clarifies growth rate. As I've said before, often what we care
about is how our runtime scales with the input size. And this is an artifact of the fact that
we often really care about what happens when we put really, really, really big inputs to our algorithm.
How big can we deal with, before it starts breaking down?
Play video starting at 2 minutes 29 seconds and follow transcript2:29
And, if you gave me some sort of complicated expression in terms of the input, with lots of terms,
then it might be hard given two algorithms to really compare them. I mean, which one's bigger would
depend on exactly which inputs I'm using. It requires some sort of annoying computation to
determine where exactly one's better than the other. But, if you look at things asymptotically what
happens as n gets large? It often becomes much more clear that, once n is very, very large, algorithm
a is better than algorithm b.
Play video starting at 3 minutes 1 second and follow transcript3:01
The second thing it does for us is that it cleans up notation. We can write O(n²), instead of 3n² + 5n +
2. And that's a lot cleaner and much easier to work with. We can write O(n) instead of n + log₂(n) +
sin(n). We can write O(n log(n)) instead of 4n log₂(n) + 7. And note, that in the big O, we don't actually
need to specify the base of the logarithm that we use. Because log₂(n), and log₃(n), and log₁₀(n), and
log₇(n), They only differ by constant multiples. And up to the constant multiples, this big O that we
have really doesn't care.
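A one-line sanity check of that base-change claim (my own illustration): the ratio of logs in two different bases is a constant, independent of n.

import math
for n in [10, 1000, 10**9]:
    print(math.log2(n) / math.log(n, 10))  # always log₂(10) ≈ 3.32, no matter what n is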
Play video starting at 3 minutes 44 seconds and follow transcript3:44
Another consequence of this is that because our notation is cleaner, because we have fewer lower
order terms to deal with, this actually makes the algebra that we have to do easier. It makes it easier
to manipulate big O expressions because they're not as messy.
Play video starting at 4 minutes 0 seconds and follow transcript4:00
And the final thing this does is that this big O notation really does solve these problems we were
talking about a couple of lectures ago. In order to compute runtimes in terms of big O, we really don't
need to know things like how fast the computer is, or what the memory hierarchy looks like, or what
compiler we used, because, by and large, although these things will have a big impact on your final
runtime, that impact will generally only be a constant multiple. And if two things are only off by a
constant multiple, they've got the same big O.
Play video starting at 4 minutes 32 seconds and follow transcript4:32
That's all there is.
Play video starting at 4 minutes 34 seconds and follow transcript4:34
Now, I should say that there's a warning. Big-O is incredibly useful, we are going to be using it for
basically everything in this course, but it does lose a lot of information about your runtime. It forgets
about any constant multiples. So, if you have two algorithms, and one of them's a hundred times
faster, they have the same Big-O.
Play video starting at 4 minutes 54 seconds and follow transcript4:54
But, in practice, if you want to make things fast, a factor of 100 is a big deal. Even a factor of two is a
big deal. And so, if you really want to make things fast, once you have a good asymptotic runtime, you
then want to look into the nitty-gritty details. Can I save a factor of two here? Can I rearrange things
to make things run a little bit smoother? Can I make it interact better with the memory hierarchy? Can
I do x, y and z to make it faster by these constant factors that we didn't see beforehand? The second
thing that you should note along these lines is that big O is only asymptotic. In some sense, all it tells
you about are what happens when you put really, really, really, really, really big inputs into the
algorithm.
Play video starting at 5 minutes 41 seconds and follow transcript5:41
And, well, if you actually want to run your algorithm on a specific input, Big O doesn't tell you anything about how long it takes, in some sense. I mean, usually the constants hidden by the big O are
moderately small and therefore you have something useful. But sometimes they're big. Sometimes an
algorithm with a worse big O runtime, one that's worse asymptotically on very large inputs, is for all practical sizes beaten by some other algorithm. And there are cases of this where you find
two algorithms where a works better than b on really, really, really big inputs. But sometimes really,
really, really big means more than you could ever store in your computer in the first place. And so, for
any practical input you want to use algorithm b.
Play video starting at 6 minutes 32 seconds and follow transcript6:32
In any case, though, despite these warnings, big O is incredibly useful. We're going to be using it
throughout this course. And so, next lecture, we're going to be talking a little bit about how to deal
with big O expressions, how to manipulate them, how to use them to compute runtimes, but once
you have that we'll really be sort of ready to do some algorithms.
Play video starting at 6 minutes 53 seconds and follow transcript6:53
In any case, that's all for this lecture, come back next time and we'll talk about that.
Big-O notation in detail

DS:: Arabic topics >> Udacity >> Coursera
>> Books: Cracking >> Grokking >> problem solving >> Roberto (reinforcement + creativity + project) >> Packt (basant)
Narasimha Karumanchi
Kulkov >> Khan Academy

Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the corresponding lesson.

Reading
Big-O notation and growth rate: Section 0.3 of [DPV08]
Big-O notation at Khan Academy

If you find this lesson difficult to follow
If you need to refresh your knowledge of logarithms: an elementary introduction to logarithms at Khan Academy

References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition). McGraw-Hill Higher Education. 2008.

Computing Runtimes

Hello everybody, welcome back to the Data Structures and Algorithms specialization.


Today, we're going to be talking about what really goes into computing runtimes 
and really understanding how long it takes a program to work. 
So in particular, today we're really going to dive in. 
Up to this point we're using this sort of rough 
number of lines of code executed count. 
And today we're going to talk about how accurate this is and 
what sorts of complications come in. 
And in particular we'll see that if we actually want something that's sort of 
fundamentally an accurate measure of runtime, it's going to be a huge mess. 
We're going to have to bring in all sorts of extra data that aren't 
really convenient for us. 
And so, we're really sort of
Play video starting at 49 seconds and follow transcript0:49
talking about the problem that comes in with computing runtimes in algorithms. 
Something that we're not going to resolve really until the next lecture. 
So to start with, let's look at this algorithm that we had for 
computing Fibonacci numbers. 
Remember we created an array. 
We assigned the 0th element to 0. 
The first element to 1. 
Then, we have this big for loop where we set the i'th element to the sum of the i 
minus first and i minus second elements and 
then at the end of the day we return the nth element. 
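For reference, here is a minimal Python sketch of the algorithm being described (the names are mine; it assumes n ≥ 1):

def fibonacci(n):
    fib = [0] * (n + 1)   # create the array
    fib[0] = 0            # 0th element
    fib[1] = 1            # first element
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]   # i'th element = sum of the previous two
    return fib[n]         # return the nth element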
So we determined that when we ran this program we executed about 2n + 2 lines 
of code. 
But we should really ask ourselves, is this number of lines of code executed 
really sort of an accurate description of the runtime of the algorithm? 
And, I mean, somehow, implicitly, this measure of lines of code 
assumes that, sort of, any two lines of code are sort of comparable to each other. 
They're sort of one basic operation. 
And so, let's actually look at this program in some detail and 
see what goes into some of these lines of code and see how valid this assumption is.
Play video starting at 2 minutes 1 second and follow transcript2:01
So to start with, we create this array. 
And what happens when you try to initialize an array?
Play video starting at 2 minutes 9 seconds and follow transcript2:09
Well, this depends a lot on your memory management system. 
Fundamentally, all you have to do is find some space in memory and 
then get to pointer to the first location.
Play video starting at 2 minutes 19 seconds and follow transcript2:19
On the other hand, how exactly you find this, maybe you need to shuffle some other 
things around to make room for it, maybe after you allocate the array, you then 
want to zero out all of the entries so that you don't have junk sitting in there. 
And so, it's not entirely clear. 
It depends a little bit on how exactly your program is being interpreted. 
But it could be pretty fast. 
It could actually take a while, depending on circumstances. 
Let's look at the next line. 
This is just a simple assignment, we set the 0 entry to 0. 
However, if you really look at this, 
at the machine level, you're doing a bit more work, you need to load up the pointer 
to the 0th element to the array, you maybe then have to do some pointer arithmetic, 
you then need to store this literal 0 into that spot in memory. 
It could actually be not one operation but a few operations.
Play video starting at 3 minutes 13 seconds and follow transcript3:13
Similarly when we set the first element to one, 
you have to do this very similar set of things.
Play video starting at 3 minutes 21 seconds and follow transcript3:21
Next there's this for loop and with the for loop, again every time you 
you have to do a few things, you need to increment the value of i. 
You then need to compare i to n to see if you need to break out of the loop and 
if it is, you need to branch, 
you need to move to another instruction in your program after the for loop. 
Next up there's this addition and here we have to do some things, we have to 
look up two values in the array, we have to write to a third value in the array. 
All of this involves the same sort of pointer arithmetic, 
and memory lookups, and writes that we were talking about before, but 
we also have to do this addition.
Play video starting at 4 minutes 3 seconds and follow transcript4:03
And if it were just a normal addition, maybe it wouldn't be such a big deal.
Play video starting at 4 minutes 8 seconds and follow transcript4:08
However, this is addition of two Fibonacci numbers, and 
if you'll recall from a couple of lectures ago, we found that Fibonacci numbers were 
pretty big, in fact, so big, they probably don't fit in a single machine word. 
So adding two of them together actually takes a non-trivial amount of time.
Play video starting at 4 minutes 28 seconds and follow transcript4:28
So somehow, not only do you have to do all of these, 
array arithmetic things but, the actual addition of the Fibonacci 
numbers is actually a pretty non-trivial operation. 
And then we do this return stuff where we have to do an array lookup which involves 
all the sorts of things we talked about already and then have to do a return which 
sort of is going to operate with the program stack and 
pop it up a level and return an answer. 
So in conclusion, this program has six lines of code to it but 
the amount of work being done in various lines of code is very, very different. 
Exactly what goes into each line of code is not sort of at all the same thing. 
Maybe we want to reconsider the fact that this count, 
that the number of lines of code, is sort of our runtime. 
Maybe we need to measure something else.
Play video starting at 5 minutes 21 seconds and follow transcript5:21
So what else should we do? 
Well, if you want to be totally correct about what we actually care about, 
what you need to say is, well, we're going to take this program, 
we're going to run it on some real life computer. 
And we'd like to know how much actual time it will take for 
this program to finish. That is fundamentally what we want to know. 
Unfortunately, in order to figure that out we need to know all kinds of messy 
details. 
We need to know things like the speed of the computer that we're running it on. 
If you run it on a big supercomputer, 
it'll take a lot less time than if you run it on your cell phone.
Play video starting at 6 minutes 5 seconds and follow transcript6:05
The system architecture of the computer will matter. 
Exactly what operations your CPU supports and exactly how long they take 
relative to one another, those are all going to have an effect on your runtime.
Play video starting at 6 minutes 18 seconds and follow transcript6:18
The compiler being used is also going to make a difference. 
In practice, what you'll do is, you'll write this program in some high-level 
language, in C or Java or Python or 
something, and then you'll run it through a compiler to turn it into machine code. 
And then the compiler, 
though, isn't just sort of doing something completely obvious. 
It's performing all kinds of interesting optimizations to your code. 
And which optimizations it performs, and 
how they interact with exactly what you've written. 
That's all going to have an impact on the final runtime.
Play video starting at 6 minutes 54 seconds and follow transcript6:54
Finally, you're going to care about details of the memory hierarchy. 
If your entire computation fits into cache, 
it will probably run pretty quickly. 
However, if you have to start doing lookups into RAM, 
things will be a lot slower. 
RAM lookups actually take a fair bit of time. 
If, on the other hand, you run out of memory in RAM and have to start writing some of these memory operations to disk, things are going to go a lot slower.
Lookups to hard disk can take milliseconds which are forever in computer time. 
And so, exactly how much memory is stored in these various levels of the hierarchy, 
and exactly how long the lookups take, and how good the algorithms that predict what you're going to look up in the future are.
Those are all going to affect your runtime.
Play video starting at 7 minutes 45 seconds and follow transcript7:45
And so, putting it all together, we found basically a problem.
Play video starting at 7 minutes 50 seconds and follow transcript7:50
Figuring out accurate runtimes is a huge mess. 
You need to know all of these details and 
you need to figure out how everything interacts. 
And we're going to be talking about a lot of algorithms in the class. 
And we're going to need to tell you about runtimes for all of them and 
we don't want to have to do this huge mess
Play video starting at 8 minutes 9 seconds and follow transcript8:09
every single time we have a new algorithm that we want to analyze.
Play video starting at 8 minutes 15 seconds and follow transcript8:15
And this is an issue. 
And another issue it was just that, 
in practice I mean, this is all assuming that you did know these details. 
In practice, you don't know a lot of these details, because you're writing a program, 
it's going to be run on somebody else's computer, and 
you've got no idea what their system architecture looks like on that computer, 
because you don't know what the computer is. 
You don't know who's running it. 
In fact, there'll be several people running it on different computers with 
different architectures and different speeds, and it'll be a mess. 
And you really don't want to compute the runtime separately for 
every different client.
Play video starting at 8 minutes 53 seconds and follow transcript8:53
So we've got a big problem here and we're not going to solve it today but 
next lecture we're going to talk about how we get around this. 
And what we really want is we want a new way to measure runtime that allows 
us to get some reasonable answer without knowing these sorts of details.
Play video starting at 9 minutes 14 seconds and follow transcript9:14
And one of the key tricks that you should be looking at, 
that we'll be using to solve this problem, is we're going to be getting things that 
really give us results for very large inputs.
Play video starting at 9 minutes 23 seconds and follow transcript9:23
They tell us, 
not necessarily how long it actually takes in terms of real seconds, 
minutes, and hours, but tell us sort of how our runtime scales with input size. 
And in practice this is a very important thing, because oftentimes, 
we really do care what happens when we have huge inputs, 
millions of data points that we need to analyze, how long does it take? 
And so, come back next lecture, 
we'll talk about how we resolve some of these issues, and talk about some very 
useful notation that we will be using throughout the rest of this sequence. 
So I hope you come back then, and I'll see you there.

Asymptotic Notation

Hello, everybody. Welcome back. Today we're going to start talking about
asymptotic notation.

Play video starting at 10 seconds and follow transcript0:10

So here we're going to sort of just introduce this whole idea of asymptotic
notation and describe some of the advantages of using it.
Play video starting at 19 seconds and follow transcript0:19

So last time we ran into this really interesting problem that computing
runtimes is hard, in that if you really, really want to know how long a
particular program will take to run on a particular computer, it's a huge
mess. It depends on knowing all kinds of fine details about how the program
works. And all kinds of fine details about how the computer works, how fast
it is, what kind of system architecture it is. It's a huge mess. And we don't
want to go through this huge mess every single time we try to analyze an
algorithm. So, we need something that's maybe a little bit less precise but
much easier to work with, and we're going to talk about the basic idea
behind that.

Play video starting at 1 minute 1 second and follow transcript1:01

And the basic idea is the following. That, there are lots of factors that have
an effect on the final runtime but, most of them will only change the runtimes
by a constant. If you're running on a computer that's a hundred times faster,
it will take one one-hundredth of the time, a constant multiple. If your system
architecture has multiplications that take three times as long as additions,
then if your program is heavy on multiplications instead of additions, it might
take three times as long, but it's only a factor of three. If your memory
hierarchy is arranged in a different way, you might have to do disk lookups
instead of RAM lookups. And those will be a lot slower, but only by a
constant multiple.

Play video starting at 1 minute 47 seconds and follow transcript1:47

So the key idea is if we come up with a measure of runtime complexity that


ignores all of these constant multiples, where running in time n and in
running in time 100 times n are sort of considered to be the same thing, then
we don't have to worry about all of these little, bitty details that affect
runtime.

Play video starting at 2 minutes 8 seconds and follow transcript2:08

Of course there's a problem with this idea, if you look at it sort of by itself,
that if you have runtimes of one second or one hour or one year, these only
differ by constant multiples. A year is just something like 30 million seconds.
And so, if you don't care about factors of 30 million, you can't tell the
difference between a runtime of a second and a runtime of a year. How do
we get around this problem?

Play video starting at 2 minutes 35 seconds and follow transcript2:35

Well, there's a sort of weird solution to this. We're not going to actually
consider the runtimes of our programs on any particular input. We're going
to look at what are known as asymptotic runtimes. These ask, how does the
runtime scale with input size? As the input size n gets larger, does the
runtime scale proportionally to n, maybe proportionally to n squared? Is it
exponential in n? All these things are different. And in fact they're sort of so
different that as long as n is sufficiently large, the difference between n
runtime and n squared runtime is going to be worse than any constant
multiple.

Play video starting at 3 minutes 18 seconds and follow transcript3:18

If you've got a constant multiple of 1000, 1000n might be pretty bad with that
big number in front. But, when n becomes big, it's still better than n squared.

Play video starting at 3 minutes 31 seconds and follow transcript3:31

And so, by sort of only caring about what happens in this sort of long scale
behavior, we will be able to do this without seeing these constants, without
having to care about these details.

Play video starting at 3 minutes 42 seconds and follow transcript3:42

And in fact, this sort of asymptotic, large scale behavior is actually what you
care about a lot of the time, because you really want to know: what happens
when I run my program on very large inputs?

Play video starting at 3 minutes 54 seconds and follow transcript3:54

And these different sorts of scalings do make a very large difference on that.
So suppose that we have an algorithm whose runtime is roughly
proportional to n and we want it to run it on a machine that runs at about a
gigahertz. How large an input can we handle such that we'll finish the
computation in a second?
Play video starting at 4 minutes 15 seconds and follow transcript4:15

Well, if the runtime is about n, you can handle inputs of size about a billion before it takes more than a second.

Play video starting at 4 minutes 24 seconds and follow transcript4:24

If instead of n it's n log n, it's a little bit slower: you can only handle inputs of size about 30 million. If it runs like n squared, it's a lot worse. You can
only handle inputs of size about 30,000 before it starts taking more than a
second.

Play video starting at 4 minutes 38 seconds and follow transcript4:38

If the runtime is 2 to the n, it's incredibly bad: you can only handle inputs of size about 30 in a second. Inputs of size 50 already take two
weeks, inputs of size 100 you'll never ever finish. And so the difference
between n and n squared and 2 to the n is actually really, really significant.
It's often more significant than these factors of 5 or 100 that you're seeing
from other things.
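These thresholds are easy to re-derive; here is a rough back-of-the-envelope sketch (mine, assuming about 10^9 simple operations per second and ignoring all constant factors):

import math

ops = 10**9                             # roughly how many simple operations run per second
print("n      :", ops)                  # about a billion
print("n^2    :", int(ops ** 0.5))      # about 30,000
print("2^n    :", int(math.log2(ops)))  # about 30
n = 1                                   # for n log n, find the largest power of two that still fits
while (2 * n) * math.log2(2 * n) <= ops:
    n *= 2
print("n log n:", n)                    # about 30 million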

Play video starting at 5 minutes 7 seconds and follow transcript5:07

Now just to give you another feel of sort of how these sort of different types
of runtimes behave, let's look at some sort of common times that you might
see. There's log n, which is much smaller than root n, which is much smaller
than n, which is much smaller than n log n, which is much smaller than n
squared, which is much smaller than 2 to the n. So, if we graph all of these,
you can see that these graphs sort of separate out from each other. If you
just look at them at small inputs, it's maybe a little bit hard to tell which ones
are bigger, there's a bit of jostling around between each other. But if we
extend the graph outwards a bit, it becomes much more clear. 2 to the n, once n is past about 4, really starts taking off; it shoots up thereafter, and by the time n is 20 or 30 it has left everyone else in the dust. n squared keeps a pretty sizable advantage against everyone else too.
N log n and n also are pretty well separated from the others. In this graph,
root n and log n seem to be roughly equal to each other, but if you kept
extending, if you let n get larger and larger, they'd very quickly differentiate
themselves. Square root of 1 million is about 1,000. Log of 1 million is about
20. And so really as you keep going out, very quickly the further out you go
the further separated these things become from each other, and that's really
the key idea behind sort of asymptotics. We don't care so much about the
constants, we care about what happens as your inputs get very large, how
do they scale.

Play video starting at 6 minutes 43 seconds and follow transcript6:43

So that's it for today. Come back next lecture. We'll talk in some detail about what this actually means and how to actually get it to work. So until
next time.

Using Big-O

Hello everybody. Welcome back. Today we're going to be talking about using Big-O
notation. So the basic idea here, we're going to be talking about how to manipulate
expressions involving Big-O and other asymptotic notations. And, in particular, we're
going to talk about how to use Big-O to compute algorithm runtimes in terms of this
notation.

Play video starting at 22 seconds and follow transcript0:22

So recall, we said that f(n) was Big-O of g(n) if, for all sufficiently large inputs, f(n) was bounded above by some fixed constant times g(n). Which really says that f is
bounded above by some constant times g.

Play video starting at 39 seconds and follow transcript0:39

Now we'd like to manipulate expressions: given an expression, we'd like to write it in terms of Big-O in the simplest possible manner. So there are some common rules you need to know.

Play video starting at 48 seconds and follow transcript0:48

The first rule is that multiplicative constants can be omitted. 7n cubed is O of n cubed.
n squared over 3 is O of n squared. The basic premise that we had when building this
idea was that we wanted to have something that ignores multiplicative constants.

Play video starting at 1 minute 8 seconds and follow transcript1:08

The second thing to note is that if you have two powers of n, the one with the larger exponent grows faster, so n is O of n squared. Root n grows slower than n, so root n is O of n.

Play video starting at 1 minute 25 seconds and follow transcript1:25

Hopefully this isn't too bad.

Play video starting at 1 minute 27 seconds and follow transcript1:27

What's more surprising is that if you have any polynomial and any exponential, the
exponential always grows faster. So n to the fifth is O of root two to the n. n to the 100
is O of 1.1 to the n. And this latter thing is something that should surprise you a little bit. Because n to the 100 is a terrible runtime: even at n equals 2, it's 2 to the 100, which is already so big that you really can't expect to finish it, ever. On the other hand, 1.1 to the n grows pretty modestly. 1.1 to the 100 is a pretty reasonable-sized number.

Play video starting at 2 minutes 4 seconds and follow transcript2:04

On the other hand, what this really says, is that once n gets large, maybe 100
thousand or so, 1.1 eventually takes over, and starts beating n to the 100. And it does
so by, in fact, quite a bit. But it doesn't really happen until n gets pretty huge.

Play video starting at 2 minutes 21 seconds and follow transcript2:21

In a similar vein, any power of log n grows slower than any power of n. So (log n) cubed is O of root n, and n log n is O of n squared.

Play video starting at 2 minutes 32 seconds and follow transcript2:32

Finally, if you have a sum of terms, the smaller terms in the sum can be omitted. Take n squared plus n: n has the smaller rate of growth.

Play video starting at 2 minutes 43 seconds and follow transcript2:43

So this is O of n squared. 2 to the n + n to the 9th. n to the 9th has a smaller rate of
growth, so this is O(2 to the n). So, these are common rules for manipulating these
expressions. Basically these are the only ones that you'll need most of the time to
write anything in terms of Big-O that you need.
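To see the polynomial-versus-exponential rule numerically without overflowing anything (an illustration of mine, not from the lecture), compare n to the 100 and 1.1 to the n through their logarithms:

import math

for n in [100, 1000, 10**4, 10**5]:
    poly = 100 * math.log(n)   # log of n^100
    expo = n * math.log(1.1)   # log of 1.1^n
    print(n, "n^100 is bigger" if poly > expo else "1.1^n is bigger")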

Play video starting at 3 minutes 5 seconds and follow transcript3:05

Okay, so let's see how this works in practice. If we actually want to compute runtimes
using Big-O notation. So let's look at this one algorithm again. So we created an
array. We set the 0th element to 0 and the first element to 1. We then went through
this loop, where we set each element to the sum of the previous two. And then
returned the last element of the array. Let's try computing this runtime in terms of Big-
O notation.

Play video starting at 3 minutes 31 seconds and follow transcript3:31

So, we're just going to run through it operation by operation and ask how long it takes.

Play video starting at 3 minutes 37 seconds and follow transcript3:37

First operation is we created an array, and let's for the moment ignore the memory
management issues and assume that it's not too hard to allocate the memory. But let's suppose that what your compiler does is zero out all of these cells
in memory and that's going to take us a little bit of work. Because for every cell,
basically what we have to do, is we need to zero out that cell, we then need to
increment some counter to tell us which cell we're working on next and then maybe
we need to do a check to make sure that we're not at the end. If we are at the end, to
go to the next line. Now for every cell we have to do some amount of work. We have
to do something like do a write, and the comparison, and an increment. And it's not
entirely clear how many machine operations this is. But it's a constant amount of work
per cell in the array. If there are n plus 1 cells, this is O of n time, some constant times n. Next we set the zeroth element of the array to zero. And this might just be a
simple assignment. We might have to load a few things into registers or do some
pointer arithmetic, but no matter whether this is one machine operation or five or
seven, that's still going to be a constant number of machine operations, O(1).

Play video starting at 4 minutes 56 seconds and follow transcript4:56

Similar is setting the first element to one again, O(1) time.

Play video starting at 5 minutes 0 seconds and follow transcript5:00

Next we run through this loop, for i running from two to n, we run through it n minus
one times, that's O(n) times.

Play video starting at 5 minutes 8 seconds and follow transcript5:08

The main thing we do in the loop is we set the ith element of the array to the sum of
the i minus first and i minus second. Now the lookups and the store, those are all of
the sorts of things we had looked at, those should be O of 1. But the addition is a bit
worse. And normally additions are constant time. But these are large numbers.
Remember, the nth Fibonacci number has about n over 5 digits to it, they're very big,
and they often won't fit in the machine word.
Play video starting at 5 minutes 39 seconds and follow transcript5:39

Now if you think about what happens if you add two very big numbers together, how
long does that take? Well, you sort of add the tens digit and you carry, and you add
the hundreds digit and you carry, and add the thousands digit, you carry and so on
and so forth. And you sort of have to do work for each digits place.

Play video starting at 5 minutes 56 seconds and follow transcript5:56

And so the amount of work that you do should be proportional to the number of digits.
And in this case, the number of digits is proportional to n, so this should take O(n)
time to run that line of code.

Play video starting at 6 minutes 9 seconds and follow transcript6:09

Finally, we have a return step, which is a pointer arithmetic and array lookup and
maybe popping up the program stack. And it's not quite clear how much work that is,
but it's pretty clear that it's a constant amount of work, it doesn't become worse as n
gets larger. So, that's O of one time.

Play video starting at 6 minutes 26 seconds and follow transcript6:26

So, now we just have to add this all together: O(n), plus O(1), plus O(1), plus O(n) times through the loop at O(n) work per time through the loop, plus O(1). Adding it all up, the dominant term, which is the only one we need, is the O(n) times O(n). That's O(n²). So this algorithm runs in time O(n²).
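As a rough empirical cross-check (my own, reusing the list-based fibonacci sketch from earlier in these notes), doubling n should roughly quadruple the running time if the algorithm really is O(n²):

import time

for n in [10000, 20000, 40000]:
    start = time.time()
    fibonacci(n)   # the sketch defined earlier in these notes
    print(n, round(time.time() - start, 3), "seconds")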

Play video starting at 6 minutes 51 seconds and follow transcript6:51

Now, we don't know exactly what the constants are, but O of n squared means that if
we want to finish this in a second, you can probably handle inputs of size maybe
30,000. Now, depending on the computer that you had and the compiler and all of
these messy details, maybe you can only handle inputs of size 1,000 in a second.
Maybe you can handle inputs the size of million in a second. It's probably not going to
be as low as ten or as high as a billion but, I mean, 30,000's a good guess and well, it
takes work to get anything better than that.

Play video starting at 7 minutes 27 seconds and follow transcript7:27

And so, this doesn't give us an exact answer but it's pretty good.

Play video starting at 7 minutes 31 seconds and follow transcript7:31


Okay, so that's how you use Big-O notation. It turns out that occasionally you want to
say a few other things. Big O really just says that my runtime is sort of bounded above
by some multiple of this thing. Sometimes you want to say the reverse. Sometimes
you want to say that I'm bounded below. And so there's different notation for that.

Play video starting at 7 minutes 52 seconds and follow transcript7:52

If you want to say that f is bounded below by g, that it grows no slower than g, you
say that f(n) is Omega of g(n). And that says that for some constant c, f(n) is at least c
times g(n), for all large n.

Play video starting at 8 minutes 8 seconds and follow transcript8:08

Now instead of saying bounded above or bounded below, sometimes that you
actually want to say that they grow at the same rate.

Play video starting at 8 minutes 14 seconds and follow transcript8:14

And for that you say f(n) is Big-Theta of g(n), which means that f is both Big-O of g and Big-Omega of g. That says, up to constants, that f and g grow at the same rate.

Play video starting at 8 minutes 28 seconds and follow transcript8:28

Finally, sometimes instead of saying that f grows no faster than g, you actually have
to say that it grows strictly slower than g, and for that you say f(n) is Little-o of g(n).

Play video starting at 8 minutes 41 seconds and follow transcript8:41

And that says that, not only is the ratio between f(n) and g(n) bounded above by some
constant, but actually this constant can be made as small as you like. In particular this
means that the ratio f(n) over g(n) goes to zero as n goes to infinity.
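To summarize the four notations in symbols (my own compact restatement of the definitions above):

f(n) = O(g(n))  if there exist constants c and N such that f(n) ≤ c·g(n) for all n ≥ N
f(n) = Ω(g(n))  if there exists a constant c > 0 such that f(n) ≥ c·g(n) for all large n
f(n) = Θ(g(n))  if f(n) = O(g(n)) and f(n) = Ω(g(n))
f(n) = o(g(n))  if f(n)/g(n) → 0 as n → ∞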

Play video starting at 8 minutes 58 seconds and follow transcript8:58

So, these are some other notations that you'll see now and then. You should keep
them in mind. They're useful. Big-O is the one that usually shows up, because we
actually want to bound our runtimes above. It's sort of the big, important thing, but
these guys are also useful.

Play video starting at 9 minutes 14 seconds and follow transcript9:14

So, to summarize the stuff on asymptotic notation. What it lets us do is ignore these
messy details in the runtime analysis that we saw before.

Play video starting at 9 minutes 23 seconds and follow transcript9:23

It produces very clean answers that tell us a lot about the asymptotic runtime of
things.

Play video starting at 9 minutes 29 seconds and follow transcript9:29

And these together make it very useful. It means we're going to be using it extensively
throughout the course. So you really ought to get used to it.

Play video starting at 9 minutes 36 seconds and follow transcript9:36

But, it does throw away a lot of practical useful information. So if you really want to
make your program fast, you need to look at more than just the Big-O runtime.

Play video starting at 9 minutes 46 seconds and follow transcript9:46

But, beyond that, we're going to use it.

Play video starting at 9 minutes 50 seconds and follow transcript9:50

With this lecture, we basically finished the sort of introductory material that we need.
Next lecture I'll talk to you a little bit about sort of an overview of the rest of the course
and some our philosophy for it. But after that, we'll really get into the meat of the
subject. We'll start talking about key important ways to design algorithms. So, I hope
you enjoy it.

YouTube Channels
CS Dojo
New Baghdad
Big-O Notation: Plots
The purpose of this notebook is to visualize the order of growth of some functions used frequently in algorithm analysis. Note that this is an interactive notebook, meaning that besides just running all the code below you may also play around with it. Try to plug in your favorite functions and/or change the ranges below and see what happens. Proceed by repeatedly clicking the Run button. To start over, select Kernel -> Restart and Clear Output.

Definitions
We start by reminding the definitions. Consider two functions f(n) and g(n) that are defined for all positive integers and take on non-negative real values. (Some frequently used functions in algorithm design: log n, √n, n log n, n³, 2ⁿ.) We say that f grows slower than g and write f ≺ g, if f(n)/g(n) goes to 0 as n grows. We say that f grows no faster than g and write f ⪯ g, if there exists a constant c such that f(n) ≤ c·g(n) for all n.

Three important remarks.

1. f ≺ g is the same as f = o(g) (small-o) and f ⪯ g is the same as f = O(g) (big-O). In this notebook, we've decided to stick to the ⪯ notation, since many learners find this notation more intuitive. One source of confusion is the following: many learners are confused by a statement like "5n² = O(n³)". When seeing such a statement, they claim: "But this is wrong! In fact, 5n² = O(n²)!" At the same time, both these statements are true: 5n² = O(n³) and also 5n² = O(n²). They both just say that 5n² grows no faster than both n² and n³. In fact, 5n² grows no faster than n² and grows slower than n³. In ⪯ notation, this is expressed as follows: 5n² ⪯ n² and 5n² ⪯ n³. This resembles comparing integers: if x = 2, then both statements x ≤ 2 and x ≤ 3 are correct.
2. Note that if f ≺ g, then also f ⪯ g. In plain English: if f grows slower than g, then f certainly grows no faster than g.
3. Note that we need to use a fancy ⪯ symbol instead of the standard less-or-equal sign ≤, since the latter one is typically used as follows: f ≤ g if f(n) ≤ g(n) for all n. Hence, for example, 5n² ≰ n², but 5n² ⪯ n².

Plotting: two simple examples


We start by loading two modules responsible for plotting.
In [1]:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
Now, plotting a function is as easy as the following three lines of code. It shows the plot of the function 7n² + 6n + 5 in the range 1 ≤ n ≤ 100. Note that the scale of the y-axis adjusts nicely.
In [2]:

n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5)
plt.show()

Now, let us add the function 20n to the previous example to visualize that 20n grows slower than 7n² + 6n + 5.
In [3]:

n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 20 * n, label="20n")
plt.legend(loc='upper left')
plt.show()

Common rules
Before proceeding with visualizations, let's review the common rules of comparing the order of
growth of functions arising frequently in algorithm analysis.

1. Multiplicative constants can be omitted: c·f ⪯ f.
Examples: 5n² ⪯ n², n²/3 ⪯ n².
2. Out of two polynomials, the one with larger degree grows faster: nᵃ ⪯ nᵇ for 0 ≤ a ≤ b.
Examples: n ≺ n², √n ≺ n^(2/3), n² ≺ n³, n⁰ ≺ √n.
3. Any polynomial grows slower than any exponential: nᵃ ≺ bⁿ for a ≥ 0, b > 1.
Examples: n³ ≺ 2ⁿ, n¹⁰ ≺ 1.1ⁿ.
4. Any polylogarithm grows slower than any polynomial: (log n)ᵃ ≺ nᵇ for a, b > 0.
Examples: (log n)³ ≺ √n, n log n ≺ n².
5. Smaller terms can be omitted: if f ≺ g, then f + g ⪯ g.
Examples: n² + n ⪯ n², 2ⁿ + n⁹ ⪯ 2ⁿ.

Rule 5: Smaller terms can be omitted


Consider 7n² + 6n + 5 again. Both 6n and 5 grow slower than 7n². For this reason, they can be omitted. To visualize this, let's first plot the functions 7n² + 6n + 5 and 7n² for 1 ≤ n ≤ 5.
In [4]:

n = np.linspace(1, 5)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 7 * n * n, label="7n^2")
plt.legend(loc='upper left')
plt.show()

As expected, 7n² + 6n + 5 is always larger than 7n² (as n is positive). Next, we plot the same two functions but for 1 ≤ n ≤ 100.
In [5]:

n = np.linspace(1, 100)
plt.plot(n, 7 * n * n + 6 * n + 5, label="7n^2+6n+5")
plt.plot(n, 7 * n * n, label="7n^2")
plt.legend(loc='upper left')
plt.show()

We see that as n grows, the contribution of 6n + 5 becomes more and more negligible. Another way of justifying this is to plot the function (7n² + 6n + 5) / (7n²).
In [6]:

n = np.linspace(1, 100)
plt.plot(n, (7 * n * n + 6 * n + 5)/(7 * n * n))
plt.show()

As we see, as n grows, the fraction approaches 1.

Rule 1: Multiplicative constants can be omitted


In terms of big-O notation, 7n² + 6n + 5 = O(n²), i.e., 7n² + 6n + 5 grows no faster than n². This again can be visualized by plotting their fraction. As we see, this fraction is always at most 18 and approaches 7. In other words, 7n² + 6n + 5 ≤ 18n² for all n ≥ 1.
In [7]:

n = np.linspace(1, 100)
plt.plot(n, (7 * n * n + 6 * n + 5)/(n * n))
plt.show()

Rule 2: Out of two polynomials, the one with larger degree grows
faster
For constants a > b > 0, nᵃ grows faster than nᵇ. This, in particular, means that nᵇ = O(nᵃ). To visualize it, let's plot n, n², and n³.
In [8]:

n = np.linspace(1, 10)
plt.plot(n, n, label="n")
plt.plot(n, n * n, label="n^2")
plt.plot(n, n * n * n, label="n^3")
plt.legend(loc='upper left')
plt.show()

Let's now see what happens on a bigger scale: instead of the range 1 ≤ n ≤ 10, consider the range 1 ≤ n ≤ 100.
In [9]:

n = np.linspace(1, 100)
plt.plot(n, n, label="n")
plt.plot(n, n * n, label="n^2")
plt.plot(n, n * n * n, label="n^3")
plt.legend(loc='upper left')
plt.show()

Rule 3: Any polynomial grows slower than any exponential


Let's plot n⁴ and 2ⁿ in the range 1 ≤ n ≤ 10.
In [10]:

n = np.linspace(1, 10)
plt.plot(n, n ** 4, label="n^4")
plt.plot(n, 2 ** n, label="2^n")
plt.legend(loc='upper left')
plt.show()

The plot reveals that in this range n⁴ is always greater than 2ⁿ. This however does not mean that n⁴ grows faster than 2ⁿ! To check what actually happens, let's take a look at a larger range 1 ≤ n ≤ 20.
In [11]:

n = np.linspace(1, 20)
plt.plot(n, n ** 4, label="n^4")
plt.plot(n, 2 ** n, label="2^n")
plt.legend(loc='upper left')
plt.show()

Rule 4: Any polylogarithm grows slower than any polynomial


To visualize this rule, we start by plotting the two most standard representatives: log n and n. The following plot shows that log n indeed grows slower than n.
In [12]:

n = np.linspace(1, 20)
plt.plot(n, n, label="n")
plt.plot(n, np.log(n), label="log n")
plt.legend(loc='upper left')
plt.show()

Now consider a more exotic example: (log n)³ versus √n (recall that √n is a polynomial function since √n = n^0.5).
In [13]:

n = np.linspace(1, 100)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()

This looks strange: it seems that (log n)³ grows faster than √n. Let's do the standard trick: increase the range from [1, 100] to, say, [1, 1000000].
In [14]:

n = np.linspace(1, 10 ** 6)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()

Surprisingly, the logarithmic function is still above the polynomial one! This shows that it is in fact dangerous to decide which function grows faster just by looking at how they behave for some not so large values of n. The rule "any polynomial grows faster than any polylogarithm" means that eventually the polynomial function will become larger and larger than the polylogarithmic one. But the rule does not specify for what value of n this happens for the first time. To finally ensure that √n outperforms (log n)³ eventually, let's increase the range to 10⁸.
In [15]:

n = np.linspace(1, 10 ** 8)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()

Also, let's consider an even larger interval to make sure that these two functions don't switch back.
In [16]:

n = np.linspace(1, 10 ** 15)
plt.plot(n, n ** .5, label="n^.5")
plt.plot(n, np.log(n) ** 3, label="(log n)^3")
plt.legend(loc='upper left')
plt.show()

Exercise
As the final exercise, try to find the value of n where n^0.1 becomes larger than (log n)⁵.
In [17]:

n = np.linspace(1, 100)
plt.plot(n, n ** .1, label="n^.1")
plt.plot(n, np.log(n) ** 5, label="(log n)^5")
plt.legend(loc='upper left')
plt.show()
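One way to attack this exercise (a sketch of mine, not part of the original notebook): the crossover point is far too large to find by plotting directly, so compare the two functions through their logarithms instead.

t = np.linspace(2, 400, 100000)      # search over t = ln(n)
diff = 0.1 * t - 5 * np.log(t)       # log(n^0.1) - log((log n)^5); positive once n^0.1 wins
t0 = t[np.argmax(diff > 0)]
print("n^0.1 overtakes (log n)^5 around n = e^%.0f, i.e. roughly 10^%.0f" % (t0, t0 / np.log(10)))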
Course Overview

Hello everybody, welcome back to the Data Structures and Algorithms specialization and the
Algorithmic Toolbox course within it. This is the last lecture in the introductory unit and here we're
going to give sort of an overview of the course. And in particular, what we're going to do is we're
going to talk about sort of the philosophy of the course, and how it fits into what we're going to be teaching you in the rest of this course. So, there's a problem. Algorithm design is hard, and in
particular it's hard to teach. And by this I actually mean something pretty specific. Now, algorithms
solve many, many different problems. You can use them to find paths between locations on a map, or
find good matchings with some property, or identify images in a photograph. Many, many different
sort of unrelated sounding problems can all be solved by algorithms.
Play video starting at 56 seconds and follow transcript0:56
And because the sorts of things that an algorithm problem might ask you to do are so varied, there's
no unified technique that will allow you to solve all of them.
Play video starting at 1 minute 6 seconds and follow transcript1:06
And this is different from what you see in a lot of classes, when you're learning linear algebra they talk
about how do you solve systems of linear equations. And they teach you some technique, like row
reduction, and then you're sort of done. You just sort of need to practice it, and you can solve any
system of linear equations. They give you a system of linear equations, you turn the crank on this row
reduction technology and out pops an answer.
For algorithms there isn't that sort of thing. There's no general procedure where I give you an
algorithms problem and you sort of plug it into this machine and turn a crank and out pops a good
algorithm for it. And this makes it hard to teach. If there was such a thing, we could just teach you,
here's this thing that you do. You do this, and you'll have a good algorithm for any problem you might
run into.
And it's harder than that. I mean, sometimes, in order to find a good algorithm, it requires that you
have a unique insight. You're working on some problem that no one's ever looked at before. In order
to find a good algorithm for it, you need to come up with some clever idea that no one else has ever
come up with before. This is why sort of algorithms are so well studied, why they're such an active
field of research. There are still so many different new things yet to be discovered there. And we
certainly can't teach you things that haven't been discovered yet. And we also can't teach you things
custom tailored to the problems that you are going to run into in your life as a programmer.
So since we can't teach you everything you need to know about how to solve all of your algorithm
problems, what can we teach you?
Well, there are sort of two things. One thing that we can definitely give you is practice designing
algorithms. We're going to have lots of homework problems with lots of things for you to work on,
and this will give you practice, how do you, given a problem you haven't seen before, come up with a
good algorithm for it? Once you have the algorithm, how do you implement it and make sure
everything works and runs reasonably well? That's something you can practice. And it turns out that
for these sorts of problems, which are very general and can be many different things, it's possible
to solve a lot of them, and one of the ways to become able to solve them is practice.
But we're also going to do more. We're not just going to throw you in the deep end and say, try to
swim, try to program all of these algorithms. There is something useful.
We can't teach you a generic procedure that will solve any algorithms problem for you. But what we
can do is we can give you some common tools. Some very useful tools for algorithm design. And
especially in this first course in our specialization we're really going to focus on helping to build up
your algorithmic toolbox.
And in particular, this course is going to focus on three of the most common and most generally
applicable algorithmic design techniques.
The first of these is greedy algorithms. This is something where you're trying to construct some big
object, and the way you do it is you sort of make one decision in the most greedy, locally optimal way
you can.
And once you've made that decision you make another decision in the most greedy, locally optimal
way you can. And you just keep making these decisions one at a time until you have an answer. And
surprisingly somehow making these locally optimal decisions gives you a globally optimal solution.
And when this happens it gives you very clean algorithms and it's great.
That's the first thing we'll talk about. Next, we'll talk about divide and conquer, which is a technique
where you've got some big problem you're trying to solve. What you do is you break it into a bunch of
little pieces, you solve all the pieces, and then you put their answers together to solve the original
thing. Finally we'll talk about dynamic programming. This is a little bit more subtle of a technique. This
is what you get when you've got some sort of large problem, that has sort of a lot of, not sub-
problems, but sort of related problems to it. And this sort of whole family of related problems, their
solutions sort of depend on one another in a particular type of way.
And when you have it there's this great trick that you have, where you sort of start at the small
problems at the bottom of the pile. And you solve all of them. And you sort of keep track of all of your
answers. And you use the answers to the small problems, to build up to obtain answers to the larger
and larger problems.
So these are what we're going to talk about. Each of the techniques we're going to talk about, how
you recognize when it applies, how do you analyze it when it applies, and some practical techniques
about how to implement, how to use them. All that good stuff.
So there's one other thing before we let you go into the fun world of greedy algorithms that you
should keep in mind throughout this course, and that's that there are these, maybe, different levels of
algorithm design. There's sort of different levels of sophistication that go into it.
At sort of the very lowest level, or top of this slide, I guess, there is the naive algorithm. This is sort of
a thing where you take the definition of a problem and you turn it into an algorithm, and we saw this
for Fibonacci numbers and greatest common divisors. You sort of interpreted the definition of the
thing you wanted to compute as an algorithm, and you were done. Now, these things are often very
slow, as we saw. Often they look like in order to find the best way of doing something, we enumerate
all ways to do it, and then figure out which one's the best. On the other hand, these are slow, but it's
often a good idea to first come up with a naive algorithm, just make sure you have some algorithm
that works.
Sometimes this works well and often you can just be done with it. Other times, it's too slow, but at
least you made sure that you understood what problem you were working on and have something
that runs.
But after that, the next thing that you want to do, if this naive algorithm is too slow, is you try and
look at your tool box. You say, are there any standard techniques that I know that apply here? Maybe
there's a greedy algorithm that solves this problem, or maybe I have to use a dynamic program. But if
you can find one of these standard techniques that work, often that doesn't involve too much effort
on your part, and gives you something that works pretty well.
Now once you have something that works, you often want to optimize it. And there are lots of ways
to improve an existing algorithm. Reduce the runtime from n-cubed to n-squared or n-squared to n.
And to do this, there are just a whole bunch of things. Maybe sometimes you could just sort of
rearrange the order in which you do the operations to cut out some of the work that you do.
Sometimes you have to introduce a data structure to speed things up. There are a bunch of ways to
do this. We'll talk a little bit about how this works. And these three levels are things that you should
be comfortable with and able to apply pretty well by the end of this course.
However, sometimes these three are not enough.
Sometimes a naive algorithm is just too slow, the standard tools don't apply, there's nothing that you
can really optimize to improve things. Sometimes in order to get a workable algorithm, what you
need is magic. You need some unique insight that no one else has ever had before.
You need some sort of clever new idea and these, there's only so much we can do to teach you how
to produce magic. We will show you some examples of things that really did have clever ideas that
maybe you can't reproduce the thought process like, how do you come up with this crazy idea, that
just happens to make this work? You should at least be able to appreciate the sort of thought that
goes into this sort of thing. In any case, it's something to keep in mind when thinking about our problems and what sorts of things are expected of you.
In any case, that is basically it for the introductory segment. We've talked a lot about sort of why
algorithms are important and given you some examples. We've talked about asymptotic notation, but
now it's time to let you go to the rest of the course. The rest of the course will keep giving you
exercises to hone your skills, and each unit of this course will cover one of these major techniques.
After I leave you with the end of the introduction, Michael will pick up and talk to you about greedy
algorithms. Next off, Neil will talk to you about divide and conquer. Finally, Pavel will have a unit on
dynamic programming. Each of these, they will talk to you about where the technique applies, how to
analyze it, how to implement it, all that good stuff.
But this is where I leave you, I hope you enjoyed the introduction, and I will put you in Michael's very
capable hands to start learning about greedy algorithms starting in the next lecture. So, until then,
farewell.
Week 3
Algorithmic Toolbox

Greedy Algorithms

In this module you will learn about a seemingly naïve yet powerful class of algorithms called greedy
algorithms. After you learn the key idea behind greedy algorithms, you may feel that they
represent the algorithmic Swiss army knife that can be applied to solve nearly all programming
challenges in this course. But be warned: with a few exceptions that we will cover, this intuitive
idea rarely works in practice! For this reason, it is important to prove that a greedy algorithm
always produces an optimal solution before using it. At the end of this module, we will
test your intuition and taste for greedy algorithms by offering several programming challenges.
Key Concepts
Practice implementing greedy solutions
Build greedy algorithms
Create a program for changing money optimally
Create a program for maximizing the value of a loot
Create a program for maximizing the number of prize places in a competition


Introduction

Ungraded External Tool: Interactive Puzzle: Largest Number

1h


Video: LectureLargest Number

2 min

Ungraded External Tool: Interactive Puzzle: Car Fueling

10 min

Video: LectureCar Fueling

7 min

Video: LectureCar Fueling - Implementation and Analysis

9 min


Video: LectureMain Ingredients of Greedy Algorithms

2 min

Practice Quiz: Greedy Algorithms

3 questions

Grouping Children

Video: LectureCelebration Party Problem

6 min

Video: LectureEfficient Algorithm for Grouping Children

5 min

Video: LectureAnalysis and Implementation of the Efficient Algorithm

5 min

Fractional Knapsack


Video: LectureLong Hike

6 min

Video: LectureFractional Knapsack - Implementation, Analysis and Optimization

6 min

Video: LectureReview of Greedy Algorithms

2 min

Reading: Resources

2 min

Practice Quiz: Fractional Knapsack

3 questions

Programming Assignment 3

Ungraded External Tool: Interactive Puzzle: Balls in Boxes


10 min

Ungraded External Tool: Interactive Puzzle: Activity Selection

10 min

Ungraded External Tool: Interactive Puzzle: Touch All Segments

1h

Programming Assignment:  Programming Assignment 3: Greedy Algorithms

3h

Due Jun 21, 11:59 PM PDT

Survey

10 min
http://dm.compsciclub.ru/app/list

Interactive Puzzles
These interactive puzzles will help you develop your problem
solving skills. Try them before attempting to solve the coding
challenges described in the Learning Algorithms Through
Programming and Puzzle Solving textbook that powers our
online specialization Data Structures and Algorithms at
Coursera and MicroMasters at edX. Some of these puzzles will
also help you to solve problems in our Introduction to Discrete
Mathematics for Computer Science specialization at Coursera.
Interactive Puzzle: Car Fueling
A car can travel at most 3 miles on a full tank. You want to make as few refills
as possible while getting from A to B. Select the gas stations where you would like
to refill and press the "Start journey" button.

Discuss this puzzle at the forum thread.

(These interactive puzzles are taken from the Introduction to Discrete Mathematics for Computer
Science specialization. They are optional, but we strongly encourage you to solve them: they will
help you to "invent" the key algorithmic ideas on your own and will help you to solve the
programming challenges. Even if you fail to solve some puzzles, the time will not be lost as you
will better appreciate the beauty and power of the underlying ideas.)



Hi.
In this video, we will consider the problem of finding the minimum number of
refills during a long journey by car.
You will see the similarities between this problem and
the largest number problem from the previous video.
By the end, you will be able to describe how greedy algorithms work in general and
define what a safe move and a subproblem are.
Consider the following problem. 
You have a car such that, if you fill it up to a full tank,
you can travel up to 400 kilometers without refilling it.
And you need to get from point A to point B, and 
the distance between them is 950 kilometers. 
Of course, you need to refill on your way, and 
luckily, there are a few gas stations on your way from A to B. 
These are denoted by blue circles, and the numbers above them mean the distance from 
A to the corresponding gas station along the way from A to B. 
And you need to find the minimum number of refills to get from A to B.
One example of such a route is to get from point A to the first gas station,
200 kilometers; then from the first gas station to the third gas station,
a distance of 350 kilometers; then from the third gas station to the fourth gas station,
200 kilometers; and then from the fourth gas station to B, 200 kilometers.
But that's not optimal. 
We can do better. 
Here is another route, which only uses two refills. 
We get from A to the second gas station, less than 400 kilometers, then we get from 
the second gas station to the fourth gas station, again less than 400 kilometers. 
And then, from the fourth gas station to B, only 200 kilometers.
And this route uses only 2 refills, and 
it turns out that in this problem, the minimum number of refills is exactly 2. 
More formally, we have the following problem.
As the input, we have a car which can travel at most L kilometers on a full tank,
where L is a parameter. We have a source and a destination, A and B,
and we have n gas stations at distances x1 through xn, in kilometers,
from A along the path from A to B.
And we need to output the minimum number of refills to get from A to B,
not counting the initial refill at A.
We want to solve this problem using a greedy strategy, and
a greedy strategy in general is very simple.
You first make some greedy choice, then you reduce your problem to
a smaller subproblem, and then you iterate until there are no problems left.
There are a few different ways to make a greedy choice in this particular problem. 
For example, you can always refill at the closest gas station to you. 
Another way is to refill at the farthest reachable gas station, and by reachable, 
I mean that you can get from your current position to this gas station 
without refills. 
Another way is, for example, to drive until there is no fuel left and
then just hope that there will be a gas station there.
So what do you think is the correct strategy in this problem?
And of course, the third option is obviously wrong. 
The first option is also wrong, if you think about it, but 
the second option is actually correct. 
It will give you the optimal number of refills. 
We will prove it later.
For now, let's describe the whole greedy algorithm.
So we start at A and we need to get to B with the minimum number of refills. 
We go from A to the farthest reachable gas station G so 
that we can get from A to G with full tank without any refills in the middle. 
And now, we try to reduce this problem to a similar problem. 
We make G the new A, and now our problem is to get from the new A to B, 
again with the minimum number of refills.
And by definition, a subproblem is a similar problem of smaller size. 
One example of a subproblem comes from the previous video.
When we need to construct the largest number out of a list of digits, 
we first put the largest digits in front, and then we reduce our problem 
to the problem of building the largest number out of the digits which are left. 
In this problem, to find the minimum number of refills on the way from A to B,
we first refill at the farthest reachable gas station G,
and then solve a similar problem, which is a subproblem:
to get from G to B with the minimum number of refills.
Another important term is safe move.
We call a greedy choice a safe move if it is consistent with some optimal solution.
In other words, if there exists some optimal solution in which the first move
is this greedy choice, then this greedy choice is called a safe move.
And we will prove a lemma that refilling at the farthest reachable gas station
is a safe move.
Let us first prove it visually. 
Let's consider some optimal route from A to B,
and let the first refill stop on this route be at point G1.
And let G be the farthest gas station reachable from A. 
If G1 and G coincide, then our lemma is proved already. 
Otherwise, G1 has to be closer to A than G, 
because G is the farthest reachable from A, and G1 is reachable from A. 
Now, let's consider the next stop on the optimal route, and that would be G2.
And the first case is that G is closer to A than G2;
then the route can look like this.
In this case, we can actually refill at G instead of G1,
and then we will have another optimal route,
because it has the same number of refills and G is reachable from A.
And G2 is reachable from G,
because it was reachable from G1, and G is closer to G2 than G1 is.
So this is a correct route, and in this case, our lemma is proved.
And the second case is when G2 is actually closer to A than G, and 
then the route can look like this. 
But in this case we can avoid refilling at G1 at all and 
refill at G2 or even refill at G in the first place. 
And then we will reduce the number of refills of our optimal route, 
which is impossible. 
So the second case actually contradicts our statement 
that we are looking at an optimal route, and we've proved our lemma. 
To recap, we consider the optimal route R with the minimum number of refills.
We denote by G1 the position of the first refill in R, and by G2
the next stop on R, which is either a refill or the destination B.
And by G we denote the farthest gas station reachable from A, and
we consider two cases.
In the first case, if G is closer than G2 to A,
we can refill at G instead of G1, and it means that refilling at G is a safe move.
Otherwise, we can avoid the refill at G1 altogether.
So this case contradicts the assumption that the route R
is the route with the minimum number of refills.
Hence there is no such case, and we have proved our lemma.
And in the next lecture, we will implement this algorithm in pseudocode and 
analyze its running time.
What is the maximum possible value of the variable numRefills in the end?
n

Correct 
We cannot refill outside of gas stations, we don't need to refill twice at the same gas station and
we don't consider the initial refill at A, so there can be at most n refills. In some cases, we will
need to refill at every gas station: for example, if the distance between A and the first gas station
is L and the distance between any two neighboring gas stations is also L.

Hi, in this video you will learn to implement the greedy algorithm from the previous video, and
analyze its running time. Here we have the pseudocode for this algorithm, and the procedure is called
MinRefills. The main input to this procedure is the array x. From the problem statement, we know
that the positions of the gas stations are given by numbers x1 through xn, measured as distances in
kilometers from A to the corresponding gas station along the path from A to B.
For convenience, we add to the array x the position of point A, which is x0 and is the smallest value
in the array, and the position of point B, which is x n plus 1 and is the largest value in the array x.
Along our route from A to B, we will visit some points. Of course, we will start from point A. Then
we'll probably go to some gas station and refill there, then go to another gas station, and another,
and at some point we will get to point B, that is, point x n plus 1.
So we see that we only need to store the positions in the array x; we don't need to consider any
positions between the elements of the array x. And so, we will store in the variable currentRefill,
the position in the array x where we're currently standing. And we will initialize it with 0, because we
start from point A, which is the same as x0 and has index 0 in the array x.
And later currentRefill will store the index in the array x, where we're currently standing.
We'll also store the answer to our problem in the variable numRefills.
At each point in the execution of the algorithm, it will contain the number of refills we have already
made. We initialize it with zero because the problem statement asks us to count the minimum
number of refills we need to make, not counting the initial refill at point A. So when we are standing at
point A, we consider that we haven't made any refills yet.
Then comes the main external while loop.
And it goes on while we're still to the left of point B, because then we need to keep going right to reach our
destination B. We check this condition with the inequality that currentRefill is at most n. This
means that the index in the array x is at most n, and so we're currently to the left of point B.
In this case, we still need to go to the right, and first we save our current position in the
array x in the variable lastRefill. This means that we made our last refill at the position currentRefill.
And now we need to go to the right from there, and either get to destination B or get to the rightmost
reachable gas station and refill there. And the next internal while loop does exactly that.
It gradually increases our currentRefill position in the array x until it reaches the rightmost point in the
array x which is reachable from the lastRefill position.
So first we check that the currentRefill position is at most n, because if it is already n plus 1, it means that
we have reached our destination B and there is no point in increasing it further. If we're still to the left of B,
then we look at the next position to the right, x currentRefill plus 1.
We need to check whether it is reachable from the lastRefill position or not. First we compute the
distance from the lastRefill position to the currentRefill plus one position by subtracting the values of
the array x. If this distance is at most L, it means that we can travel this distance on a full
tank, without any refills; and of course, at the lastRefill position, we could fill our tank up to full
capacity and then be able to travel for L kilometers. So, this inequality checks whether position
currentRefill plus 1 is actually reachable from the lastRefill position. If it is, we increase the value of
currentRefill and go on with our internal while loop. When we exit this internal while loop, we are
either already at point B, or at the farthest reachable gas station.
Now we compare it with our lastRefill position. If it turns out to be the same, it means that we
couldn't move to the right at all: we don't have enough fuel even to get to the next gas station. Then we
cannot return the minimum number of refills needed on the way from A to B, because it is
impossible to get from A to B at all, and so we return the result IMPOSSIBLE. Otherwise, we have moved at
least a bit to the right, and then we need to check: if we are already at point B, we don't need to do
anything else. Otherwise, we are at some gas station and need to refuel, which we check with the
inequality that says we're to the left of point B. If it is true, we increase
the numRefills variable by one, and then we return to the start of our external while loop. There
we again check: if we're to the left of point B, we need another iteration, and if currentRefill is
already n plus 1, then we've reached point B and we exit the external while loop. In that
case, we just return the answer, which is the number of refills we've made so far.
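For reference, here is a Python rendering of the MinRefills procedure described above (a sketch: the course presents the algorithm as pseudocode, and the concrete station positions in the usage example are assumptions for illustration, since the lecture only shows them on a picture).

def min_refills(x, L):
    # x[0] is the position of A, x[1..n] are the gas stations in increasing
    # order of distance from A, x[n + 1] is the position of B;
    # L is the full-tank range in kilometers.
    n = len(x) - 2
    current_refill = 0            # index in x where we are currently standing
    num_refills = 0               # refills made so far, not counting A
    while current_refill <= n:    # while we are still to the left of B
        last_refill = current_refill
        # advance to the farthest position reachable on one full tank
        while (current_refill <= n and
               x[current_refill + 1] - x[last_refill] <= L):
            current_refill += 1
        if current_refill == last_refill:
            return -1             # next point is out of reach: IMPOSSIBLE
        if current_refill <= n:   # we stopped at a gas station, not at B
            num_refills += 1
    return num_refills

# The lecture's example: A at 0, B at 950, range 400 km; stations at
# 200, 375, 550 and 750 (the second station's exact position is assumed).
print(min_refills([0, 200, 375, 550, 750, 950], 400))  # -> 2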
We've implemented the greedy algorithm from the previous lecture. Now let's analyze its running
time. At first glance it can seem that it works in n squared time, because the external
while loop can make n iterations and the internal loop can make n iterations, and n times n is n
squared. But actually we will show that the whole algorithm makes only O(n) operations. To prove
that, let's first look at the currentRefill variable.
We see that it only changes one by one here.
And it starts from zero. And what is the largest value that it can attain?
Of course, the largest value is n plus 1, because this is the largest index in the array x, and currentRefill
is an index into the array x. So, the variable currentRefill starts from zero, changes one by one, and the largest
value it can have is n plus 1. It means that it is increased at most n plus 1 times, which is O(n). But
that's not all we do: we also increase the variable numRefills here.
But we also increase it one by one, and it also starts from zero. And what is the largest value
that this variable can attain?
Well, of course, it is n, because we have n gas stations and there is no point in refueling twice at the
same gas station, so we can refuel at most n times. So the variable numRefills goes from 0 to n and
changes one by one, so it is changed at most n times.
And so, it is also linear in terms of n. So, we have at most n plus 1 iterations of the external while
loop, and everything but the internal while loop takes constant time there: this assignment, this if, and
this if with assignment.
And the external and internal loops combined also spend at most a linear number of iterations,
because they only increase the variable currentRefill, and it changes at most a linear number of times.
So, all in all, our algorithm works in O(n) time.
Let's go through this proof once again. The lemma says that the running time of the whole algorithm
is O(n). We prove this by first noticing that the currentRefill variable changes only from zero
to at most n plus 1, and the change is always one by one; and that the numRefills variable changes from
zero to at most n, also one by one. So, both these variables are changed O(n)
times. Everything else takes constant time per iteration of the external loop, and
there are at most n plus 1 iterations of the external loop. Thus, our algorithm works in linear time. In
the next video, we will review what we've learned about greedy algorithms in general.
Hi, in this video we will briefly review the main ingredients of greedy algorithms, and the first of them
is reduction to a subproblem. Basically, when you have some problem, you make some first move and
thus reduce your problem to a similar but smaller problem. For example, you have fewer
digits left, or fewer gas stations left in front of you; this similar, smaller problem is called a
subproblem. Another key ingredient is a safe move, and a move is called safe if it is consistent with
some optimal solution. In other words, if there exists some optimal solution in which the first move is
the same as your move, then your move is called a safe move, and not all first moves are actually safe.
For example, to drive until there is no fuel is not a safe move in the car fueling problem. And often,
greedy moves are also not safe: for example, to get to the closest gas station and refuel at it is not a
safe move, while to get to the farthest reachable gas station and refuel there is a safe move. Now the general
strategy for solving a problem goes like this. First, you analyze the problem and you come up with
some greedy choice, and then the key thing is to prove that this greedy choice is a safe move, and you
really have to prove it. Because, otherwise, you can come up with some greedy choice, then come
up with a greedy algorithm, then implement it, test it, and try to submit it to the system,
only to learn that the algorithm is incorrect, because the first move is actually not a safe move and
there are cases in which this first move is not consistent with any optimal solution. In that case,
you will have to invent a new solution and implement it from scratch, and all the work you've done before
will be useless. So please prove your algorithms, and prove that the first move is a safe move. When
you prove that, you reduce the problem to a subproblem, and hopefully that is a similar problem,
a problem of the same kind. Then you start solving this subproblem the same way: you make your
greedy choice and reduce it to a subproblem, and you iterate until there are no problems left, or
until your problem is so simple that you can just solve it right away. In the next lessons, we will
apply greedy algorithms to solve more difficult problems.
Celebration Party Problem
Hi, in this lesson we will discuss the problem of
organizing children into groups.
And you will learn that if you use a naive algorithm to solve this problem,
it will work very,
very slowly, because the running time of this algorithm is exponential.
But later, in the next lesson, we will be able to improve the running time
significantly by coming up with a polynomial-time algorithm.
Let's consider the following situation.
You've invited a lot of children to a celebration party, and
you want to entertain them and also teach them something in the process.
You are going to hire a few teachers,
divide the children into groups, and assign a teacher to each of the groups;
this teacher will work with this group through the whole party.
But you know that for a teacher to work with a group of children efficiently,
the children of that group should be of relatively the same age.
More specifically, the ages of any two children in the same group
should differ by at most one year.
Also, you want to minimize the number of groups,
because you want to hire fewer teachers and
spend the money on presents and other kinds of entertainment for the children.
So, you need to divide the children into the minimum possible number of groups,
such that the ages of any two children in any group differ by at most one year.
Now, let's look at the pseudocode for the naive algorithm that solves this problem.
Basically, this algorithm will consider every possible partition 
of the children into groups and find the partition which both 
satisfies the property that the ages of the children 
in any group should differ by at most one and contains the minimum number of groups.
We start with assigning the initial value of the number of 
groups to the answer m and this initial value is just the number of children. 
Because we can always divide all the children into groups of one, and 
then of course each group has only one child so the condition is satisfied.
Then we consider every possible partition of all children into groups. 
The number of groups can be variable, so this is denoted by a number k, 
and we have groups G1, G2 and up to Gk.
And when we have a partition,
we first need to check whether it's a good partition or not.
We have a variable good, which we assign to true initially,
because we think that maybe this partition will be good.
But then we need to check for each group whether it satisfies our condition or not.
So, we loop with the index i of the group going from 1 to k,
we consider the particular group Gi, and
we need to determine whether all the children in this group differ in age by at most
1 year, or there are two children that differ by more.
To check that, it is sufficient to compare the youngest child with the oldest child.
If their ages differ by more than one, then the group is bad.
Otherwise, every two children differ by at most one year, so the group is good.
And so we go through all the groups in a for loop. 
If at least one of the groups is bad, 
then our variable good will contain value false by the end. 
Otherwise, all the groups are good, and the variable good will contain value true. 
So, after this for loop, we check the value of the variable good, and
if it's true, then we try to improve our answer:
we update m with the minimum of its current value and
the number of groups in the current partition.
And so, by the end of the outer for loop, which goes through all the partitions,
our variable m will contain the minimum possible number of groups in a partition
that satisfies all the conditions.
It is obvious that this algorithm works correctly 
because it basically considers all the possible variants and 
selects the best one from all the variants which satisfy our condition on the groups.
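For concreteness, here is a sketch of this naive algorithm in Python (an assumed implementation following the pseudocode described above). It enumerates every way of assigning each child to a group and keeps the smallest number of groups over all good partitions:

def min_groups_naive(ages):
    best = len(ages)                       # groups of one child always work

    def extend(i, groups):
        nonlocal best
        if i == len(ages):
            # check that every group is good: max age - min age <= 1
            if all(max(g) - min(g) <= 1 for g in groups):
                best = min(best, len(groups))
            return
        for g in groups:                   # put child i into an existing group
            g.append(ages[i])
            extend(i + 1, groups)
            g.pop()
        groups.append([ages[i]])           # or open a new group for child i
        extend(i + 1, groups)
        groups.pop()

    extend(0, [])
    return best

print(min_groups_naive([3.5, 6.0, 4.2, 5.5]))  # -> 2, e.g. {3.5, 4.2} and {6.0, 5.5}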
Now, let us estimate the running time of this algorithm.
And I state that the number of operations that this algorithm makes
is at least 2 to the power of n, where n is the number of children in C.
Actually, this algorithm is much slower and
makes many more operations than 2 to the power of n, but
we will just prove this lower bound to show that the algorithm is very slow.
To prove it, let's consider just the partitions of the children into two groups.
Of course, there are many more partitions than that:
we can divide the children into two, three, four, and more groups.
But let's consider only the partitions into two groups and
prove that even the number of such partitions is already
at least 2 to the power of n.
Indeed, if C is a union of two groups,
G1 and G2, then we can make such a partition for
any G1 which is a subset of the set C of all children:
for any G1, just make
the group G2 contain all the children which are not in the first group,
and then all the children are divided into these two groups.
So, now the size of the set of all children is n.
And if we want to compute the number of possible groups G1, we should
note that each element of the set, that is,
each child, can be either included in or excluded from the group G1.
So, there can be 2 to the power of n different
groups G1, and hence there are at least 2 to the power of n
partitions of the set of all children into two groups.
It means that our algorithm will do at
least 2 to the power of n operations, because it considers every partition.
Among all the partitions, there are all the partitions into two groups.
So, how long will it actually take?
We see that the naive algorithm works in time Omega(2^n),
so it makes at least 2 to the power of n operations.
For example, for just 50 children this is at least 2 to the power of 50,
the large number shown on the slide.
If this were exactly the number of operations needed, I estimate that
on a regular computer the computation would take at least two weeks.
So, it works really, really slowly.
But in the next lesson, we will improve this significantly.
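A quick sanity check of that two-weeks estimate (a sketch; the rate of 10^9 operations per second is an assumption about a regular computer):

ops = 2 ** 50
print(ops)                    # 1125899906842624
print(ops / 10 ** 9 / 86400)  # about 13 days of pure computation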

Efficient Algorithm for Grouping Children

Hi, in this lesson you will learn how to solve the problem of organizing children into groups more
efficiently. More specifically, we will come up with a polynomial-time algorithm for this problem, as
opposed to the exponential-time algorithm from the previous lesson. But in order to do this, we first
need to do a very important thing that you should probably do every time before solving an
algorithmic problem: reformulate it in mathematical terms. For example, in this problem
we will consider points on a line instead of children. For example, if we have a child of age three
and a half years, we will instead consider a point on the line with coordinate 3.5, and if we have
another child of age 6, we will instead consider a point with coordinate 6 on the line.
Now, what do groups of children correspond to? If we have a group of children, it consists of several
children, and several points on the line correspond to this group. The fact that the ages of any two
children in the group differ by at most one means that there exists a segment of length one on this
line that contains all those points.
Now the goal becomes to select the minimum possible number of segments of length one, such that
those segments cover all the points. Then, if we have such segments, we can just put all the points
from the same segment into the same group, and any two children in the same group will differ in age
by at most one year.
Now let's look at an example.
We have a line with a few points on it, and we want to cover all the points with segments of length
one. Here is one example of such a covering. All the segments in the picture are of the same length,
and we take this common length to be one.
This is not an optimal solution: below it there is another covering which uses
only three segments, and they still cover all the points.
Now we want a way to find the minimum possible number of segments that cover all the points,
for any configuration of points.
We want to do that using a greedy algorithm, and you probably remember from the previous lessons
that to come up with a greedy algorithm, you need to make a greedy choice and prove that this
greedy choice is a safe move.
I claim that in this problem, a safe move is to cover the leftmost point with a segment of length one
whose left end is at this point.
To prove that this is really a safe move, we need to show that there exists an optimal solution, with
the minimum possible number of unit-length segments, in which one of the segments has its left end
at the leftmost point.
Let's prove that.
To do that, let's consider any optimal solution of the given problem with the given points.
Let's consider the leftmost point colored in green.
It is covered by some segment, colored in red.
Now, let's move this red segment to the right until its left end is at this leftmost point. I claim that we
don't miss any of the points in the process: the green point is the leftmost point, so there are
no points to the left of it, and while we are moving the segment to the right, we don't uncover any of
the points.
It means that what we have now is still a correct covering, because all of the points are still covered,
and the number of segments in this covering is the same as in the optimal solution from which we
started; that means that it is also an optimal covering. So we have just found an optimal solution
in which there is a segment which starts at the leftmost point, and thus we have proved that covering the
leftmost point with a segment which starts at it is a safe move.
Now that we have a safe move, let's consider what happens after it. We have the leftmost point
covered, and maybe some other points covered as well, so we don't need to consider these points
anymore. We need to cover all the remaining points with the minimum
possible number of unit-length segments. This is the same kind of problem we started with,
so it is a subproblem.
Basically, it means that we have a greedy algorithm. First, make a safe move: add to the
solution a segment with its left end at the leftmost point. Then remove from the set all the points which are already
covered by this segment, and if there are still points left, repeat the process
until there are no points left in the set.
Analysis and Implementation of the Efficient Algorithm

What does it mean when we say that index i points to some point P?

It means that the point P has coordinate x_i.

It means that the coordinate of P is equal to i.

This should not be selected
No, we use the index i interchangeably with the notion of a pointer which points to position i in the array x; it means that the coordinate of point P is equal to x_i. We use the terms "index" and "pointer" interchangeably in this case, because it is common to call indices "pointers" when describing algorithms where we scan something, for example an array, from left to right or from right to left. For example, a "two pointers algorithm" often means that we use two indices i and j which scan an array in the same direction as each other or go in opposite directions and meet each other.

It means that the address of the point P in memory is equal to i.

If the input for PointsCoverSorted consists of 5 points x_1 = 5, x_2 = 5.5, x_3 = 5.8, x_4 = 6, x_5 = 7, what will be the value of i after the first iteration of the external while loop?

5

Correct
l = x_1 = 5, r = x_1 + 1 = 6. i will start from 1, then it will be incremented in the line before the inner while loop, and then it will be incremented in the inner while loop until x_i becomes greater than r = 6, which happens for i = 5, because x_4 = 6 ≤ 6 and x_5 = 7 > 6.

6
Now let us consider the pseudocode that implements this algorithm. For the sake of simplicity, we
assume that the points in the input are given in sorted order, from smallest to largest. We start with
an empty set of segments denoted by R, and with the index i pointing at the first point, which is
the leftmost one because the points are sorted.
Now we go through the points and find the leftmost one. Currently i is pointing to the leftmost
point in the set, and at the start of the while loop i will always point to the leftmost point which is still
in the set. Now we cover it with the segment from l to r, which has unit length and its left end at the
point xi, so this is the segment from xi to xi + 1. We add this segment to the solution set, and then we
need to remove from the set all the points which are already covered. Instead of removing the points, we
will just move the pointer to the right and forget about the points which are to the left of the
pointer. The next while loop does exactly that. We know that for any index larger than the
current i, the point is to the right of the left end of the segment, because the points are sorted. So if xi is
also to the left of r, then it is covered by the segment.
So we just move the pointer i to the right, and while xi is less than or equal to r, we
know that the point is covered. As soon as we find some xi which is bigger than r, it means that
this point is not covered, and all the points further in the array are also not covered, so we stop. Then
we do another iteration of the outer while loop; or maybe our pointer i is already out of
the array of input points, and then we stop and return R, which is the set of segments that we've
built in the process. Now let's prove that this algorithm works in linear time.
Indeed, index i changes just from 1 to n. And we always increase it by one. For each value of i, we add
at most one segment to the solution. So overall, we increase i at most n times and add at most n
segments to the solution.
And this leads to a solution which works in Big-O of n time. Now, we had an assumption that the
points in the input are already sorted. What if we drop this assumption? Then we will have to sort the
points first, and then apply our algorithm PointsCoverSorted.
Later in this module, you will learn how to sort points in time n log n.
Combining that with our procedure PointsCoverSorted will give you total running time of n log n.
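Here is a Python sketch of PointsCoverSorted together with the sorting step (the course presents it as pseudocode; note that this sketch uses 0-based indices, while the lecture counts from 1):

def points_cover_sorted(x):
    # x is a list of point coordinates sorted in increasing order
    R = []                       # the resulting set of unit segments
    i = 0                        # i points to the leftmost uncovered point
    while i < len(x):
        l, r = x[i], x[i] + 1    # safe move: segment with left end at x[i]
        R.append((l, r))
        i += 1
        while i < len(x) and x[i] <= r:
            i += 1               # skip all points already covered
    return R

def points_cover(x):
    return points_cover_sorted(sorted(x))   # sorting makes it O(n log n) overall

print(points_cover([5, 5.5, 5.8, 6, 7]))    # -> [(5, 6), (7, 8)]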
Now let's look at our improvement.
We first implemented a straightforward solution, which worked in time at least 2 to the power of n,
and it worked very, very slowly: even for 50 children, we would have to spend at least two
weeks of computation to group them.
Our new algorithm, however, works in n log n time, and that means that even if we had 10 million
children coming to a party, it would spend only a few seconds grouping them optimally into several
groups. So that's a huge improvement.
Now let's see how we arrived at this result. First, we invented a naive solution, which worked in
exponential time. It was too slow for us, so we wanted to improve it. But to improve it, the very first
important step was to reformulate the problem in mathematical terms.
And then we had the idea to solve it with a greedy algorithm. So, we had to find some greedy choice
and prove that it would be a safe move. In this case, the safe move turns out to be to add to the solution
a segment with its left end at the leftmost point. And then we proved that this is really a safe move. It is
very important to prove your solutions before even trying to implement them. Because otherwise, it
could turn out that you implemented the solution, tried to submit it, got wrong answer or some other
result, different from accepted. And then, you've made a few more changes, but it still didn't work.
And then, you understand that your solution was wrong, completely from the start. And then you
need a new solution, and you will have to implement it from scratch. And that means that you've
wasted all the time on implementation on the first wrong solution. To avoid that, you should always
prove your solution first.
So after we've proved the safe move, we basically got our greedy solution, which, in combination
with a sorting algorithm, works in time n log n. This is not only polynomial, but very close to linear
time, and it works really fast in practice.
Fractional Knapsack
Long Hike
In the initial "Long Hike" problem, we were given food items, their total weights and energy values,
and we wanted to maximize the total energy value of fractions of food items that fit into the
knapsack of capacity W. In the new mathematical formulation, we are again given a knapsack of
capacity W and some items. We know the weights and values of the items, and we want to
maximize the total value of fractions of items that fit into the knapsack. What do the weights and
values in this mathematical formulation correspond to in the initial Long Hike problem?

Weights correspond to the weights of the food items and values correspond to the energy values
(calories).

Correct 
Indeed, we maximize the total value, which is total energy value in this case, with restriction on the
total weight.


Weights correspond to the energy values (calories) and values correspond to the weights of the
food items.
You are given a knapsack of capacity 7 kg and 3 items. First item has value $20 and weight 4 kg,
second item has value $18 and weight 3 kg, third item has value $14 and weight 2 kg. What is the
maximum total value of the fractions of items that fit into the knapsack in this case?

38

40

42

Correct 
Turns out that $42 is the optimal value! In this video you will learn an algorithm that solves the
problem optimally. If you apply that algorithm to this case, you will get total value $42.


43
Hello. In this lesson, you will learn an algorithm to determine which food items, and in what amounts,
you should take with you on a really long hike, so as to maximize their total energy value.
So, you're planning a long hike. It will take a few days or maybe a few weeks, but you don't know
exactly how long it will take. So, to be safe, you need to take enough food with you. And you have a
knapsack which can fit up to 15 kilograms of food. You've already bought some cheese, some
ham, some nuts, and maybe some other food items. You want to fit them all in the knapsack so as to
maximize the amount of calories that you can get from them. Of course, you can cut the cheese, you
can cut the ham, you can take only some of the nuts, and then fit all of that into your knapsack.
Play video starting at 49 seconds and follow transcript0:49
To solve this maximization problem, we again need to first reformulate it in mathematical terms. And
then it becomes an instance of a classical fractional knapsack problem, which goes like this. We have
n items with weights w1 through wn and values v1 through vn.
Play video starting at 1 minute 9 seconds and follow transcript1:09
And a bag of capacity big W. And we want to maximize the total value of fractions of items that fit
into this bag.
Play video starting at 1 minute 20 seconds and follow transcript1:20
In this case, the weights are also weights in real life, and the values are the energy values of the food items that you've bought.
Play video starting at 1 minute 31 seconds and follow transcript1:31
So, here's an example, and we will denote by dollars the value of the item, and the weight just by
numbers. So, for example, the first item has value $20 and has weight 4, the second item has value
$18 and weight 3, and the third item has value $14 and weight 2. And we have a knapsack of capacity
7. There are a few ways in which we can fill this knapsack. For example, one of them is to put the whole
first item and the whole second item in the knapsack. Then the total value is the sum of the values of
the first item and the second, which is $38. We can improve on that. For example, take the whole first
item, the whole third item, and only one third of the second item for a total value of $40. We can do
even better by taking the whole third item, the whole second item, and only half of the first item, and
that will give us $42. And actually it turns out that this is the optimal thing to do.
Play video starting at 2 minutes 44 seconds and follow transcript2:44
So now we want to create a greedy algorithm that will solve this maximization problem, and we need
to get some greedy choice and make a safe move. And to do that, we have to look at the value per
unit of weight. So, for example for the first item, value per unit of weight is $5, for the second item,
it's $6 per unit, and for the third one it's $7 per unit. So although the first item is most valuable, the
third item has the maximum value per unit. And of course there is an intuition that we should
probably fit first the items with the maximum value per unit.
Play video starting at 3 minutes 32 seconds and follow transcript3:32
And really, the safe move is to first try to fit the item with the maximum value per unit. And there's a
lemma that says that there always exists some optimal solution to our problem that uses as much as
possible of an item with the maximum value per unit of weight. And what do we mean by as much as
possible? Well, either use the whole item, if it fits into the knapsack, or, if the capacity of the
knapsack is less than how much we have of this item, then just fill the whole knapsack only with this
item.
Play video starting at 4 minutes 9 seconds and follow transcript4:09
Let's prove that this is really a safe move.
Play video starting at 4 minutes 13 seconds and follow transcript4:13
We will prove looking at this example. So, first let's suppose we have some optimal solution,
Play video starting at 4 minutes 21 seconds and follow transcript4:21
and let's suppose that in this optimal solution, we don't use as much as possible of the best item with
the highest value per unit of weight. Then take some item which we used in this solution and separate
its usage into two parts, one part of the same size of how much we have of the best item, and the
second part is everything else. Then we can substitute the first part with the best item. So, for
example, in this case, we substitute half of the first item with second item.
Play video starting at 5 minutes 3 seconds and follow transcript5:03
Of course, in this part, the total value will increase, because the value per unit of weight is better for
the best item than for the item currently used. And in the general case, this will also work. So, either
we will be able to replace some part of the item already used by the whole best item, or we can
replace the whole item that is already used by some part of the best item. And in any case, if we can
make such a substitution, of course the total value will increase, because the best item just has better
value per unit of weight, so for each unit of weight, we will have more value. So this gives us a greedy
algorithm to solve our problem.
Play video starting at 5 minutes 56 seconds and follow transcript5:56
What we'll do is: while the knapsack is still not full, we make a greedy choice. We choose the item number i which has the maximum value of vi over wi, the value per unit of weight. Then, if this item fits into the knapsack fully, we take all of it. Otherwise, if there is only a little space left in the knapsack, we take just enough of this item to fill the knapsack completely. In the end, we return the total value of the items we took and how much we took of each item.

Fractional Knapsack - Implementation, Analysis and Optimization


If we initially have 4 items with v1 = 2, w1 = 1, v2 = 3, w2 = 2, v3 = 4, w3 = 3, v4 = 5, w4 = 4, what will be the new order of these items after sorting them in decreasing order by value per unit of weight? For example, if the second item has the largest value per unit, then goes the third, then the fourth and then the first, this gives order "2,3,4,1".

1,3,2,4

This should not be selected. For the first item, value per unit of weight is 2/1 = 2, for the second it is 3/2 = 1.5, for the third it is 4/3 = 1.333…, and for the fourth it is 5/4 = 1.25. Since 2 > 1.5 > 1.333… > 1.25, the correct order is "1,2,3,4".

1,2,3,4

Correct. For the first item, value per unit of weight is 2/1 = 2, for the second it is 3/2 = 1.5, for the third it is 4/3 = 1.333…, and for the fourth it is 5/4 = 1.25. Since 2 > 1.5 > 1.333… > 1.25, the correct order is "1,2,3,4".
Hi. 
In this lesson you will learn how to implement the greedy algorithm for the Fractional Knapsack, how to estimate its running time, and how to improve its asymptotics.
Here is the description of the greedy algorithm from the previous lesson. 
While the knapsack is still not full, we select the best item left: the one with the highest value per unit of weight. We either fit all of this item into the knapsack or, if there is only a little space left, cut this item and fit as much as we can into the remaining space, and then repeat this process until the knapsack is full.
In the end return the total value of the items taken and the amounts taken. 
We've proven that the selection of best item is a safe move. 
Then, after we've selected the best item, what we've got left is a knapsack with a smaller capacity, but the problem is the same: you have some items and
you have a knapsack of some capacity and you should fill it optimally so 
as to maximize the total value of the items that fit. 
So this greedy algorithm really works. 
Now let's implement it.
Play video starting at 1 minute 15 seconds and follow transcript1:15
Here we have a procedure called Knapsack. 
It starts by filling the array A of amounts taken with zeros, and
Play video starting at 1 minute 25 seconds and follow transcript1:25
the total value V is also initialized to 0. Then, as we said on the slide, we repeat the following iteration n times. If the knapsack is already full, the variable W will be 0: at the start, W holds the total capacity of the knapsack, and each time we put something into the knapsack, we decrease W by the weight of what we put in. So in the end, when the knapsack is full, W will be 0. If W became 0, we should just return the total value and the amounts of the items that we took.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
Otherwise, we should select the best item. The best item is an item which is still left (wi is more than 0) and, out of such items, the one with the maximum value per unit of weight, so the one which maximizes vi over wi.
Play video starting at 2 minutes 25 seconds and follow transcript2:25
When we've selected the item i, we determine the amount a we will take: either the whole wi, if the whole item fits into the knapsack, or, if the remaining capacity of the knapsack is smaller, just enough to fill the knapsack to the end. So a is the minimum of wi and W.
Play video starting at 2 minutes 45 seconds and follow transcript2:45
After we select the amount, we update all the variables. We decrease wi by a, because we already took a of this item. We also increase A[i], the amount taken of item i, by a. And we decrease the remaining capacity W by a, because we've just put a units of item i into the knapsack.
Play video starting at 3 minutes 13 seconds and follow transcript3:13
Also, we increase the value V by the formula a multiplied by vi and divided by wi. Why is that? Because we took a units of item number i, and one unit brings us value equal to vi over wi. So if we take a units, the total value brought by them is a multiplied by vi and divided by wi.
Play video starting at 3 minutes 42 seconds and follow transcript3:42
After we do n iterations, or maybe fewer if the knapsack becomes full before all n iterations are done, we return the total value and the amounts in the array.
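
Here is a minimal Python sketch of the procedure just described. The function and variable names (fractional_knapsack_quadratic, values, weights, capacity) are mine, not the lecture's; the pseudocode on the slide uses A, V, wi, and W:

```python
def fractional_knapsack_quadratic(values, weights, capacity):
    """Greedy fractional knapsack as described above; O(n^2) overall."""
    n = len(values)
    amounts = [0.0] * n          # A: amount taken of each item
    total_value = 0.0            # V: total value accumulated
    left = list(weights)         # how much of each item is still available
    for _ in range(n):
        if capacity == 0:        # knapsack is full
            break
        candidates = [i for i in range(n) if left[i] > 0]
        if not candidates:
            break
        # greedy choice: remaining item with maximum value per unit of weight
        i = max(candidates, key=lambda j: values[j] / weights[j])
        a = min(left[i], capacity)             # take as much of it as fits
        total_value += a * values[i] / weights[i]
        left[i] -= a
        amounts[i] += a
        capacity -= a
    return total_value, amounts
```

On the lecture's example, fractional_knapsack_quadratic([20, 18, 14], [4, 3, 2], 7) returns (42.0, [2.0, 3.0, 2.0]).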
Play video starting at 3 minutes 55 seconds and follow transcript3:55
Now the running time of this algorithm is Big-O of n squared. 
Why is that? 
Well, first we have the inner selection of best item, which works in linear time. 
Because basically, we have to go through all the items to select the best one. 
And we have the main for loop, which is executed at most n times, maybe fewer.
So in each iteration we do some linear time computation and 
we do this at most n times. 
That means that the total running time is Big-O of n squared.
Play video starting at 4 minutes 34 seconds and follow transcript4:34
Now we can improve on that: if we sort the items by decreasing value of vi over wi, then it becomes easier to select the best item which is still left. Let's look at this pseudocode. Let's assume that we've already sorted the input items, such that v1 over w1 is more than or equal to v2 over w2, which is greater than or equal to the fraction for the next item, and so on up to vn over wn.
Play video starting at 5 minutes 9 seconds and follow transcript5:09
And we can start with the same array of amounts and 
the same total value filled with zeroes. 
But then we make a for loop for i going from 1 to n. 
And on each iteration, i will be the best item which is still left. So at the start of the iteration we check whether we still have some capacity
in the knapsack. 
If it is already filled we just return the answer. 
Otherwise we know that i is the best item 
because we didn't consider it previously and it is the item with the maximum 
value per unit out of those which we didn't consider before. 
So we determine the amount of this item with the same formula and 
we update the weights, the amounts, the capacity, and 
the total value in the same way as we did in the previous pseudocode. 
The only change is that we change the order in which we consider the items. 
And this allows us to make each iteration in constant time instead of linear time. 
So, this new algorithm works in linear time, because it has at most n iterations, and each iteration works in at most constant time. So, if we first apply some sorting algorithm to sort the items by decreasing value of vi over wi, and then apply this new knapsack procedure, the total running time will be n log n, because the sorting works in n log n time and the knapsack procedure itself works in linear time.
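
And a sketch of the optimized version, under the same assumptions as the previous sketch: sort once by value per unit of weight, then scan. Sorting dominates, so the total running time is O(n log n):

```python
def fractional_knapsack(values, weights, capacity):
    """Sort items by value per unit of weight, then take them greedily."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total_value = 0.0
    for i in order:              # i is the best item not considered yet
        if capacity == 0:
            break
        a = min(weights[i], capacity)
        total_value += a * values[i] / weights[i]
        capacity -= a
    return total_value

assert fractional_knapsack([20, 18, 14], [4, 3, 2], 7) == 42.0
```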

Review of Greedy Algorithms


Hi, in this lesson, we will review what we saw in this module about greedy 
algorithms and specify what is common and important to all greedy algorithms. 
The main ingredients of any greedy algorithm are greedy choice and 
reduction to a subproblem. 
You have to prove that your greedy choice is a safe move. 
And also, you have to check that the problem that is left after your safe move 
is really a subproblem. 
That is, a problem of the same kind but with fewer parameters. 
After you have both, you have a greedy algorithm. 
Then, you need to estimate its running time and 
check that it's good enough for you.
Play video starting at 41 seconds and follow transcript0:41
Safe moves in different problems are different. 
Basically, you have to invent something each time you have a new problem. 
In the first problem it was: select the maximum digit and put it first. In the last problem it was: select the item with the maximum value per unit of weight.
Play video starting at 58 seconds and follow transcript0:58
And you see that in every safe move, there's something like maximum, 
or minimum, or first, or leftmost, or rightmost. 
So, a safe move is always greedy, but not all greedy moves are safe.
So, you really have to prove every time 
that the move that you invented is really safe.
Play video starting at 1 minute 20 seconds and follow transcript1:20
Also, you can notice that sometimes we can 
optimize our initial greedy algorithm if we sort our objects somehow. So, you can try to solve the problem,
assuming that everything is sorted in some convenient order. 
And if you see that, because of that, 
your greedy algorithm can be implemented asymptotically faster, 
then you can just apply sorting first and then your greedy algorithm.
Play video starting at 1 minute 49 seconds and follow transcript1:49
The general strategy is: when you have a problem, you can try to come up with some greedy choices, and then for some of them,
you'll be able to prove that they're really safe moves. 
And if you've proven that this is a safe move, 
then you've reduced your problem to something. 
And then you have to check that this something is a subproblem. 
That is, the problem about the same thing, 
optimizing the same thing with the same restrictions. 
And then, this is a subproblem. 
And then you can solve it in the same way that you solved your initial problem. 
And you have this loop from problem to subproblem and back to the problem, each time reducing the number of parameters.
And in the end of this loop, you will have a problem so 
simple that you can solve it right away for one object or zero objects. 
And then you have your greedy algorithm.
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Knapsack: Section 6.5 of [BB]

References
[BB] Gilles Brassard and Paul Bratley. Fundamentals of Algorithmics. Prentice-Hall. 1996.
Programming assignment 3
http://dm.compsciclub.ru/app/quiz-balls-in-boxes
http://dm.compsciclub.ru/app/quiz-activity-selection
http://dm.compsciclub.ru/app/quiz-touch-all-segments
Week 4
Algorithmic Toolbox


Divide-and-Conquer

In this module you will learn about a powerful algorithmic technique called Divide and Conquer.
Based on this technique, you will see how to search huge databases millions of times faster than
using naïve linear search. You will even learn that the standard way to multiply numbers (the one you learned in grade school) is far from being the fastest! We will then apply the divide-and-
conquer technique to design two efficient algorithms (merge sort and quick sort) for sorting huge
lists, a problem that finds many applications in practice. Finally, we will show that these two
algorithms are optimal, that is, no algorithm can sort faster!

Key Concepts
- Express the recurrence relation on the running time of an algorithm
- Create a program for searching huge lists
- Create a program for finding a majority element
- Create a program for organizing a lottery
Introduction

Ungraded External Tool: Ungraded External ToolInteractive Puzzle: 21 questions game

10 min

Video: LectureIntro

3 min

Video: LectureLinear Search

7 min

Ungraded External Tool: Ungraded External ToolInteractive Puzzle: Two Adjacent Cells of Opposite Colors

10 min

Video: LectureBinary Search
7 min

Video: LectureBinary Search Runtime

8 min

Reading: Resources

10 min

Practice Quiz: Linear Search and Binary Search

4 questions

Polynomial Multiplication

Video: LectureProblem Overview and Naïve Solution

6 min

Video: LectureNaïve Divide and Conquer Algorithm

7 min

Video: LectureFaster Divide and Conquer Algorithm

6 min

Reading: Resources

5 min

Practice Quiz: Polynomial Multiplication

3 questions

Master Theorem

Video: LectureWhat is the Master Theorem?

4 min

Video: LectureProof of the Master Theorem

9 min

Reading: Resources

10 min

Practice Quiz: Master Theorem

1 question

Sorting Problem

Video: LectureProblem Overview

2 min

Video: LectureSelection Sort

8 min

Video: LectureMerge Sort

10 min


Video: LectureLower Bound for Comparison Based Sorting

12 min

Video: LectureNon-Comparison Based Sorting Algorithms

7 min

Reading: Resources

5 min

Practice Quiz: Sorting

4 questions

Quick Sort

Video: LectureOverview

2 min


Video: LectureAlgorithm

9 min

Video: LectureRandom Pivot

13 min

Video: LectureRunning Time Analysis (optional)

15 min

Video: LectureEqual Elements

6 min

Video: LectureFinal Remarks

8 min

Reading: Resources

10 min

Practice Quiz: Quick Sort

4 questions

Programming Assignment 4

Programming Assignment:  Programming Assignment 4: Divide and Conquer

3h

Due Jun 28, 11:59 PM PDT

Ungraded External Tool: Ungraded External ToolInteractive Puzzle: Local Maximum


Hi, I'm Neil Rhodes. Welcome to the divide and conquer module. In the last module, you learned
about how to use greedy algorithms to solve particular classes of problems.
Play video starting at 11 seconds and follow transcript0:11
In this module you'll learn about ways of solving problems using divide and conquer algorithms.
Play video starting at 16 seconds and follow transcript0:16
The term divide and conquer is quite old, and when applied to war, suggests that it's easier to defeat
several smaller groups of opponents than trying to defeat one large group.
Play video starting at 26 seconds and follow transcript0:26
In a similar fashion, divide and conquer algorithms take advantage of breaking a problem down into
one or more subproblems that can then be solved independently.
Play video starting at 37 seconds and follow transcript0:37
Just as not all problems can be solved with a greedy algorithm, not all problems can be solved using
divide and conquer. Instead, these are both techniques that are part of a toolbox of strategies to
solve problems. As you're designing an algorithm, you'll need to consider whether or not a greedy
algorithm might work. If not, would a divide and conquer algorithm work?
Play video starting at 59 seconds and follow transcript0:59
Let's look at the general structure of a divide and conquer algorithm. Here, we have a problem to be
solved represented abstractly as a blue rectangle.
Play video starting at 1 minute 8 seconds and follow transcript1:08
We break the problem down into a set of non-overlapping subproblems. Represented here, by
colored rectangles.
Play video starting at 1 minute 15 seconds and follow transcript1:15
It's important that the subproblems be of the same type as the original.
Play video starting at 1 minute 21 seconds and follow transcript1:21
For example, here's a way to break down the original rectangle problem into a set of subproblems
that are not of the same type. These subproblems are triangles. Thus this does not represent the
divide and conquer algorithm.
Play video starting at 1 minute 35 seconds and follow transcript1:35
In this case, we've broken down the original rectangle problem into a set of subproblems that are
themselves rectangles. The difficulty is that these subproblems overlap with one another. Thus it too
does not represent the divide and conquer algorithm.
Play video starting at 1 minute 51 seconds and follow transcript1:51
We return now to breaking down our problem into a set of non-overlapping subproblems of the same
original type.
Play video starting at 1 minute 59 seconds and follow transcript1:59
We break it apart, then we go ahead and solve each subproblem independently. We solve the first
problem, represented by a check mark. We then continue solving each problem, in turn.
Play video starting at 2 minutes 13 seconds and follow transcript2:13
Once we've successfully solved each of the subproblems, we combine the results into a solution to
the original problem.
Play video starting at 2 minutes 22 seconds and follow transcript2:22
One question that comes up, how do we solve each subproblem?
Play video starting at 2 minutes 26 seconds and follow transcript2:26
Since each subproblem is of the same type as the original, we can recursively solve the subproblem
using the same divide and conquer strategy. Thus, divide and conquer algorithms naturally lead to a
recursive solution.
Play video starting at 2 minutes 41 seconds and follow transcript2:41
In practice, while you can program a divide and conquer algorithm recursively, it's not uncommon to
rewrite the recursive program into an iterative one. This is often done both because some
programmers aren't as comfortable with recursion as they are with iteration, as well as because of
the additional space that a recursive implementation may take in terms of additional stack space. This
can be language and implementation dependent. In summary, the divide and conquer algorithm
consists of one: breaking the problem into non-overlapping subproblems of the same type. Two:
recursively solving those subproblems. And three: combining the results. In the next video, we'll see
an extremely simple example of divide and conquer.
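
As a toy illustration of these three steps (my own example, not from the lecture): summing an array by splitting it in half. The split gives two non-overlapping subproblems of the same type, each solved recursively, and the combine step is a single addition.

```python
def dc_sum(a, lo, hi):
    """Sum a[lo..hi) with divide and conquer: split, recurse, combine."""
    if hi - lo == 0:
        return 0                      # base case: empty range
    if hi - lo == 1:
        return a[lo]                  # base case: one element
    mid = (lo + hi) // 2
    left = dc_sum(a, lo, mid)         # two non-overlapping subproblems
    right = dc_sum(a, mid, hi)        # of the same type as the original
    return left + right               # combine the results

assert dc_sum([3, 1, 4, 1, 5], 0, 5) == 14
```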
We're going to start our divide and conquer algorithms with what might be considered a degenerate
form of divide and conquer: Searching in an unsorted array using linear search.
Play video starting at 10 seconds and follow transcript0:10
Here's an example of an array. To find a particular element of the array, we look at the first element,
if it's not there, we look at the second element. We continue until we either find the element we're
interested in, or until we reach the end of the array.
Play video starting at 24 seconds and follow transcript0:24
This same type of search is also used to find elements that are stored in a linked list.
Play video starting at 28 seconds and follow transcript0:28
Let me describe a real-life use of linear search. Twenty years ago I did consulting for a company developing software for one of the first hand-held computers, the Apple Newton.
Play video starting at 37 seconds and follow transcript0:37
The application translated words between any two of the languages English, French, Italian, German, and Spanish. Its data was stored in five parallel arrays. So, for example, car was at the second position in the English array. In Spanish, car is auto, so the second position of the Spanish array contained auto.
Play video starting at 55 seconds and follow transcript0:55
The program would take the user's input word along with the from and to languages. Then it would search through the corresponding from array (English, for example, when trying to translate car from English to Spanish). If it found a match, it returned the element at the same index location in the target language.
Play video starting at 1 minute 12 seconds and follow transcript1:12
With a small dictionary of three words, as in this example, this linear search is quick. However, I was
brought in as a consultant to speed up the application. When users clicked on the translate button,
it'd take seven to ten seconds to retrieve the translated word, an eternity as far as the user was
concerned.
Play video starting at 1 minute 30 seconds and follow transcript1:30
There were about 50,000 words in the dictionary, so on average it took 25,000 word checks in order
to find a match.
Play video starting at 1 minute 38 seconds and follow transcript1:38
In the next video, I'll show you how we sped up this application using binary search. The problem statement for linear search is as follows: given an unsorted array with n elements in it and a key k, find an index i of an array element that's equal to k. If no element of the array is equal to k, the output should be NOT_FOUND.
Play video starting at 1 minute 57 seconds and follow transcript1:57
Note that we say an index rather than the index, to account for the fact that there may be duplicates
in the array. This might seem pedantic, but it's important to be as careful as possible in specifying our
problem statement.
Play video starting at 2 minutes 12 seconds and follow transcript2:12
The well known solution to this problem is a linear search. Iterate through the array until you find the
chosen element. If you reach the end of the array and haven't yet found the element, return
NOT_FOUND.
Play video starting at 2 minutes 23 seconds and follow transcript2:23
We can construct a divide and conquer recursive algorithm to solve this problem. Our recursive
function will take four parameters: A, the array of values; low, the lower bound of the array in which
to search; high, the upper bound of the array in which to search; and k, the key for which to search. It
will return either: an index in the range low to high, if it finds a matching value; or NOT_FOUND, if it
finds no such match.
Play video starting at 2 minutes 48 seconds and follow transcript2:48
As with all recursive solutions, we'll need to accurately handle the base case.
Play video starting at 2 minutes 52 seconds and follow transcript2:52
In particular, the base cases for this problem will be either being given an empty array, or finding a match on the first element.
Play video starting at 3 minutes 0 seconds and follow transcript3:00
The subproblem is to search through the sub array constructed by skipping the first element.
Play video starting at 3 minutes 4 seconds and follow transcript3:04
We'll recursively search through that smaller sub array, and then just return the result of the recursive
search.
Play video starting at 3 minutes 10 seconds and follow transcript3:10
Although this is a recursive routine that breaks the problem into smaller problems, some would argue
that this shouldn't be called divide and conquer. They claim that a divide and conquer algorithm
should divide the problem into a smaller subproblem, where the smaller subproblem is some
constant fraction of the original problem. In this case the subproblem isn't 50%, or 80%, or even 95%
of the original problem size. Instead, it's just one smaller than the original problem size. I don't know,
maybe we should call this algorithm subtract and conquer rather than divide and conquer.
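
A Python sketch of this recursive ("subtract and conquer") linear search; the choice of -1 as the NOT_FOUND sentinel is mine, not the lecture's:

```python
NOT_FOUND = -1

def linear_search_recursive(A, low, high, key):
    """Search A[low..high] for key, one element at a time."""
    if high < low:                    # base case: empty array
        return NOT_FOUND
    if A[low] == key:                 # base case: match on the first element
        return low
    # subproblem: the subarray that skips the first element
    return linear_search_recursive(A, low + 1, high, key)
```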
Play video starting at 3 minutes 45 seconds and follow transcript3:45
In order to examine the runtime of our recursive algorithm it's often useful to define the time that the
algorithm takes in the form of a recurrence relation.
Play video starting at 3 minutes 53 seconds and follow transcript3:53
A recurrence relation defines a sequence of values in terms of a recursive formula.
Play video starting at 3 minutes 58 seconds and follow transcript3:58
The example here shows the recursive definition of the values in the Fibonacci sequence.
Play video starting at 4 minutes 3 seconds and follow transcript4:03
You can see that we defined the value for the n'th Fibonacci as the sum of the preceding two values.
Play video starting at 4 minutes 10 seconds and follow transcript4:10
As with any recursive definition, we need one or more base cases. Here, we define base cases when
evaluating F(0) and F(1).
Play video starting at 4 minutes 19 seconds and follow transcript4:19
From this recursive definition, we've defined values for evaluating F(n) for any non-negative integer,
n. The sequence starts with 0, 1, 1, 2, 3, 5, 8, and continues on.
Play video starting at 4 minutes 31 seconds and follow transcript4:31
When we're doing run-time analysis for divide and conquer algorithms, we usually define a
recurrence relation for T(n), where T stands for the worst-case time taken by the algorithm, and n is the
size of the problem. For this algorithm, the worst-case time is when an element isn't found because
we must check every element of the array. In this case we have a recursion for a problem of size n
which consists of a subproblem of size n minus one plus a constant amount of work. The constant
amount of work includes checking high versus low, checking A at low equals key, preparing the
parameters for the recursive call, and then returning the result of that call. Thus the recurrence is T(n)
equals T(n-1) plus c, where c is some constant.
Play video starting at 5 minutes 13 seconds and follow transcript5:13
The base case of the recursion is an empty array; there's a constant amount of work: checking whether high is less than low and then returning NOT_FOUND. Thus T(0) equals c. Let's look at a recursion tree in
order to determine how much total time the algorithm takes. As is normal, we're looking at worst-
case runtime, which will occur when no matching element is found.
Play video starting at 5 minutes 33 seconds and follow transcript5:33
In a recursion tree, we show the problem along with the size of the problem. We see that we have an
original problem of size n which then generates a subproblem of size n-1, and so on all the way down
to a subproblem of size zero: an empty array. The work column shows the amount of work that is
done at each level. We have a constant amount of work at each level which we represent by c, a
constant.
Play video starting at 5 minutes 56 seconds and follow transcript5:56
Alternatively, we could have represented this constant amount of work with big theta of one.
Play video starting at 6 minutes 1 second and follow transcript6:01
The total work is just the sum of the work done at each level that's a summation from zero to n of a
constant c. Which is n plus one times c, or just big theta of n.
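
Written out, the unrolling described above is:

```latex
T(n) = T(n-1) + c = T(n-2) + 2c = \cdots = T(0) + n \cdot c = (n+1)\,c = \Theta(n)
```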
Play video starting at 6 minutes 12 seconds and follow transcript6:12
This analysis seems overly complicated for such a simple result. We already know that searching
through n elements of the array will take big theta of n time.
Play video starting at 6 minutes 22 seconds and follow transcript6:22
However, this method of recurrence analysis will become more useful as we analyze more
complicated divide and conquer algorithms.
Play video starting at 6 minutes 32 seconds and follow transcript6:32
Many times a recursive algorithm is translated into an iterative one. Here we've done that for the
linear search. We search through the elements of array A from index low to index high. If we find a
match, we return the associated index. If not, we return NOT_FOUND.
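
A sketch of that iterative version, matching the recursive sketch above:

```python
NOT_FOUND = -1   # same sentinel as in the recursive sketch

def linear_search(A, low, high, key):
    """Iterative linear search over A[low..high]."""
    for i in range(low, high + 1):
        if A[i] == key:
            return i          # found a match at index i
    return NOT_FOUND          # reached the end without a match
```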
Play video starting at 6 minutes 46 seconds and follow transcript6:46
To summarize, what we've done is one: created a recursive solution;
Play video starting at 6 minutes 52 seconds and follow transcript6:52
two: defined a corresponding recurrence relation, T;
Play video starting at 6 minutes 55 seconds and follow transcript6:55
three: solved T of n to determine the worst-case runtime; and four: created an iterative solution from
the recursive one. What you've seen in this video, then, is an example of a trivial use of our divide and
conquer technique in order to do a linear search.
Play video starting at 7 minutes 10 seconds and follow transcript7:10
In our next video we'll look at a non-trivial use of the divide and conquer technique for searching in a sorted array: the well-known binary search.

http://dm.compsciclub.ru/app/quiz-opposite-colors
Binary Search
For "not found": based on the midpoint idea.
Hi, so let's talk now about binary search.
Play video starting at 4 seconds and follow transcript0:04
A dictionary is a good example of an ordered list.
Okay, basically where every word is in order. 
And that makes finding words much easier. 
You can imagine how difficult it would be to search a dictionary 
if the order of the words was random. 
You'd have to just search through every single page, and in fact, 
every word on every page. 
It'd take quite a long time.
Play video starting at 25 seconds and follow transcript0:25
So let's look at the problem statement for searching in a sorted array. 
So what we have coming in is, A, an array, along with a low and 
upper bound that specify the bounds within the array in which to search. 
What's important about the array is that it's in sorted order. 
What we mean by that is: if we look at the element at any index i and at the next element, the first is no more than the next.
We don't say less than because we want to allow for 
arrays that have repeated elements.
Play video starting at 58 seconds and follow transcript0:58
So officially this is called a monotonic non-decreasing array.
Play video starting at 1 minute 4 seconds and follow transcript1:04
The other input is the key to look for.
Play video starting at 1 minute 7 seconds and follow transcript1:07
The output for this is an index such that the element at 
that index in the array is equal to the key. 
We say an element and not the element, just as we did in linear search, because there may be more than one element that matches, since there may be duplicates in the array.
Play video starting at 1 minute 28 seconds and follow transcript1:28
If we don't have a match, 
instead of returning NOT_FOUND as we did in the linear search case, 
we're going to actually return somewhat more useful information, 
which is where in the array would you actually 
insert the element if you wanted to insert it? 
Or where would it have been, if it were there? 
So what we're going to return is the greatest index, 
such that A sub i is less than k. 
That is, if the key is not in the array, 
we're returning an index such that if you look at the element at that index, 
it's less than the key but the next element is greater than the key.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
And we do have to take account of the fact that what if every element in 
the array is greater than the key? 
In that case, we're going to go ahead and return low- 1.
Play video starting at 2 minutes 19 seconds and follow transcript2:19
So let's look at an example.
We've got this array with 7 elements in it, and the element 20 is repeated in it. 
So if we search in this array for 2, we want to go ahead and return 0, 
saying that every element in the array is larger than this. 
If on the other hand, we look for 3, we're going to return 1. 
If we look for 4, we're also going to return 1, which really signifies "between indices 1 and 2": it's bigger than 3 but less than 5.
Play video starting at 2 minutes 46 seconds and follow transcript2:46
If we search for 20, we return 4. 
Or we might also return 5. 
Either one of those is valid because 20 is present at each of those indexes. 
And if we search for 60, we'll return 7. 
But if we search for 70, we'll also return 7.
Play video starting at 3 minutes 2 seconds and follow transcript3:02
So let's look at our implementation of BinarySearch. 
So we're going to write a recursive routine, taking in A, low, high and key, 
just as we specified in the problem statement.
Play video starting at 3 minutes 12 seconds and follow transcript3:12
First our base case. 
If we have an empty array, that is if high is less than low, so 
no elements, then we're going to return low-1.
Play video starting at 3 minutes 22 seconds and follow transcript3:22
Otherwise, we're going to calculate the midpoint. 
So we want something halfway between low and high. 
So what we're going to do is figure the width, which is high- low, 
cut it in half, so divide by 2, and then add that to low. 
That might not be an integer 
because of the fact that high- low divided by 2 may give us a fractional portion, 
so we're going to take the floor of that.
Play video starting at 3 minutes 49 seconds and follow transcript3:49
For example, in the previous case, we had low 1 and high 7: 7 minus 1 is 6, divided by 2 is 3, plus our low of 1 gives 4, so the midpoint would be 4.
We'll see an example of this shortly.
Play video starting at 4 minutes 1 second and follow transcript4:01
And now we check and see is the element at that midpoint equal to our key. 
If so, we're done, we return it. 
If not, the good news is of course, 
we don't have to check all the other elements, we've ruled out half of them. 
So if the key is less than the midpoint element, 
then all the upper ones we can ignore. 
So we're going to go ahead and now return the BinarySearch in A from low to mid- 1, 
completely ignoring all the stuff over here. 
Otherwise, the key is greater than the midpoint, and again, 
we can throw away the lower stuff and go from midpoint + 1, all the way to high.
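
A Python sketch of this recursive binary search (low and high are explicit, so it works with 0- or 1-based bounds; returning low - 1 on a miss follows the lecture's convention):

```python
def binary_search(A, low, high, key):
    """Search sorted A[low..high]; on a miss, return the greatest index i
    with A[i] < key (low - 1 if every element is greater than key)."""
    if high < low:
        return low - 1                    # base case: empty range
    mid = low + (high - low) // 2         # floor of the midpoint
    if A[mid] == key:
        return mid
    if key < A[mid]:
        return binary_search(A, low, mid - 1, key)   # search the lower half
    return binary_search(A, mid + 1, high, key)      # search the upper half
```

With 0-based indexing, binary_search([3, 5, 8, 20, 20, 50, 60], 0, 6, 4) returns 0: the index of the 3, the greatest element smaller than 4.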
Play video starting at 4 minutes 41 seconds and follow transcript4:41
Let's look at an example. 
So let's say we're searching for the key 50 in this array with 11 elements. 
So we'll do a binary search on this array, from 1 to 11, looking for 50. 
Low is 1, high is 11. 
We'll calculate the midpoint, the midpoint will be 11- 1 is 10, 
divided by 2 is 5, add that to 1, the midpoint is 6. 
And now we check and see is the midpoint element equal to 50? 
Well, no. 
The midpoint element is 15 and the element we are looking for, 
the key we're looking for, is 50. 
So we're going to go ahead and ignore the lower half of the array and 
now call binary search again, with the low equal to 7, so one more than the midpoint.
Play video starting at 5 minutes 25 seconds and follow transcript5:25
So now we've got a smaller version of the problem. 
We're looking for 50 within the elements 7 to 11, we'll calculate the midpoint. 
11- 7 is 4 divided by 2 is 2, so 
we'll add that to 7 to get a midpoint of 9. 
We check, is the element at index 9 equal to our key? 
The element at index 9 is 20, our key is 50, they're not equal.
Play video starting at 5 minutes 49 seconds and follow transcript5:49
However, 50 is greater than 20, so we're going to go ahead and 
make a new recursive call with midpoint + 1, which is 10. 
So, again, we do our binary search from 10 to 11. 
We calculate the midpoint.
Play video starting at 6 minutes 4 seconds and follow transcript6:04
High minus low: 11 minus 10 is 1, divided by 2 is one half, plus 10 is 10 and a half; we take the floor of that, so our midpoint is 10.
And now we check. 
Is the value at element 10 equal to our key? 
Well the value at element 10 is 50, our key is 50 so yes. 
We're going to go ahead and return that midpoint which is 10. 
In summary then, what we've done is broken our problem into 
non-overlapping subproblems of the same type. 
We've recursively solved the subproblems. 
And then we're going to combine the results of those subproblems. 
We broke the problem into a problem of size half 
(slightly less than half). 
We recursively solved that single subproblem and 
then we combined the result very simply just by returning the result.
Play video starting at 6 minutes 53 seconds and follow transcript6:53
In the next video, we're going to go ahead and look at the runtime for 
binary search, 
along with an iterative version. 
And we'll get back to actually discussing that problem that I discussed with 
the dictionary translation problem.
Play video starting at 7 minutes 9 seconds and follow transcript7:09
We'll see you shortly.

Binary Search Runtime


Hi, in this video we're going to be looking at the run time of BinarySearch along with looking at an
iterative version of it. So here's our BinarySearch algorithm again. We look in the middle, if it's not
found, then we either look in the lower half or the upper half.
Play video starting at 14 seconds and follow transcript0:14
So what's our recurrence relation for the worst-case runtime? Well, the worst case is if we don't find an element. So we're going to look at T(n) = T(floor of n over 2) + c. We have a floor of n over 2 there because if n is odd, say there are five elements, the question is how big the problem size is on the next call. With five elements, we end up looking either in the upper half of the array (two elements) or the lower half of the array (two elements), because we skip the midpoint; we already checked it. Plus there is some constant amount of work: calculating the midpoint and checking the midpoint against the key. And then our base case is when we have an empty array, and that's just a constant amount of time to check.
Play video starting at 1 minute 11 seconds and follow transcript1:11
So what does the runtime look like? We've got our original size n, and we break it down: n over 2, n over 4, all the way down. How many of these problems are there? Well, if we're cutting something in two over and over again, it takes log base 2 of n such iterations until we get down to 1. So the total number of levels is log base 2 of n, plus 1. The amount of work we're doing at each level is c. So the total amount of work, if we sum it, is just the sum from i = 0 to log base 2 of n of c.
Play video starting at 1 minute 47 seconds and follow transcript1:47
That is just (log base 2 of n, plus 1) times c.
Play video starting at 1 minute 55 seconds and follow transcript1:55
And that is just theta of log base 2 of n, but really what we'd normally say is theta of log n, because the base doesn't matter; it's just a constant multiplicative factor.
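
In symbols, the recurrence and its solution are:

```latex
T(n) = T(\lfloor n/2 \rfloor) + c,\quad T(0) = c
\;\Longrightarrow\;
T(n) = \sum_{i=0}^{\log_2 n} c = (\log_2 n + 1)\,c = \Theta(\log n)
```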
Play video starting at 2 minutes 8 seconds and follow transcript2:08
All right, what does the iterative version look like? The iterative version has the same parameters: low, high, and key. And we have a while loop that plays the role of the base case: in the recursive version we were stopping if high is less than low; here, the while loop stops if high is less than low. We calculate the midpoint and then again check the key.
If it matches the element at the midpoint we return the midpoint. Otherwise, if the key is less than
the element, we know we're in the first half of the array and so instead of making a new recursive call
like we did in the recursive version we have the original array. And we want to look at the first half of
it so we're going to change the value of high and that will be mid minus one because we already
checked mid. Otherwise, we want to look in the upper half of the array so we move low up.
Play video starting at 3 minutes 5 seconds and follow transcript3:05
If we drop out of the while loop because high is less than low, that means we have nothing more to search: we have an empty array, and therefore we didn't find the element in the array. We're going to return low minus 1, the same result as the recursive version. The difference is we won't be using the stack space that the recursive version uses. You
remember we talked two videos ago about this real-life example where we had five languages and we
were translating words between any two of those languages. The way we had that represented was
parallel arrays, so that at any given index, each of the element in the arrays represented words that
were the same in all those languages. So for instance, chair in English is at index two, and in Spanish
that's silla and in Italian it's sedia. The problem was that lookups took a long time, right? We had 50,000 elements in our arrays, and a search took about ten seconds, because we had to search through all of them if the word wasn't there, or on average half of them, 25,000. So one question might be,
why didn't we use a sorted array? Right? You could imagine, for instance, sorting these arrays. Here
they're sorted. The good part is, it's easy to find a particular word in a particular language. So I can
find house in English, for instance, and find what index that is at very quickly, using binary search. The
problem is, I no longer have this correspondence, because the order of the words that are sorted in
English is different from the order of the words sorted in Spanish. So if I look at chair, for instance, in English, it no longer maps to silla; instead, chair now lines up with casa. So although we can
find a particular word in our source language, we don't know the corresponding word in the target
language. So the solution was to try and find some way we could do sorting and yet still preserve this
relationship where everything at an index meant the same translated word. The way to do that was
an augmented set of arrays. So what we really did was keep these augmented arrays which were
pointers back into the original arrays in sorted order. So we're having a kind of level of indirection. So
if I look at English for example, the order of the words in English is chair, house, pimple. Well, what
order is that in the original array? It is first element 2, and then element 1, and then element 3. So if
you want to do a binary search, you can use this sorted index array: whenever you want to know what an element
Play video starting at 5 minutes 43 seconds and follow transcript5:43
is in the represented sorted order, you follow the pointer. So for instance, if we look at the middle entry of the sorted index array, it has the value 1, and that says: go find house, the first element of the original array. So we're basically saying house is second in sorted order, chair is first, and pimple is third. The Spanish, of course, has a different
mapping, so in Spanish, the first sorted word happens to be the first word in the array. The second
sorted word is the third word in the Spanish array; and the third sorted word, silla, is the second
element.
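
A small sketch of this indirection in Python. The three-word lists are abridged from the lecture's example, and "grano" is my guess at the Spanish entry for pimple:

```python
english = ["house", "chair", "pimple"]        # original, unsorted parallel arrays
spanish = ["casa", "silla", "grano"]

# Augmented array: indices of the English words in sorted order.
eng_sorted = sorted(range(len(english)), key=lambda i: english[i])   # [1, 0, 2]

def translate(word):
    """Binary-search the sorted index array, then use the shared index."""
    lo, hi = 0, len(eng_sorted) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        w = english[eng_sorted[mid]]          # one level of indirection
        if w == word:
            return spanish[eng_sorted[mid]]   # same index in the parallel array
        if word < w:
            hi = mid - 1
        else:
            lo = mid + 1
    return None                               # word not in the dictionary

assert translate("chair") == "silla"
```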
Play video starting at 6 minutes 17 seconds and follow transcript6:17
So what happened when we ran this? Well, we had a space-time trade-off.
Play video starting at 6 minutes 23 seconds and follow transcript6:23
We had to pay extra space. And of course not only English and Spanish were sorted, but also French, Italian, and German. So, five extra arrays. Each array had 50,000 entries in it, and
what was the size of each element of the array? Well, it represented a number from one to 50,000
that can be represented in 16-bits which is two bytes. So we had 50,000 elements times 2 bytes, that
is 100,000 bytes times 5 is 500,000 bytes. So about a half a megabyte, which today is almost nothing.
And even then, was certainly doable 20 years ago.
Play video starting at 7 minutes 0 seconds and follow transcript7:00
That's the cost we have in space. What is the benefit that we get? Well, instead of having to do, say, 50,000 lookups in the worst case, we have to do log base 2 of 50,000 lookups. Log base 2 of 50,000 is about 16: log base 2 of 1,000 is about ten, because 2 to the 10th equals 1024, and that leaves another factor of 50; log base 2 of 50 is around six, because 2 to the 5th equals 32 and 2 to the 6th equals 64. So what that means is, we have 16 references into the array to do instead of 50,000, a factor of about three thousand. So when the user clicks translate, instead of taking ten seconds, it appeared to be instantaneous. It was well under a tenth of a second.
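
The estimate from the transcript, written out:

```latex
\log_2 50{,}000 \approx \log_2 1{,}000 + \log_2 50 \approx 10 + 6 = 16
\qquad (2^{10} = 1024,\; 2^6 = 64)
```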
Play video starting at 7 minutes 57 seconds and follow transcript7:57
So in summary, what we've seen is that the runtime of binary search is big theta of log n. Substantially
quicker than the big theta of n that linear search takes. So sorted arrays really help. In the next lesson
we're going to be looking at a more complicated application of divide and conquer, where we actually
have multiple subproblems instead of just one subproblem.
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

If you find this lesson difficult to follow


An elementary introduction to binary search at Khan Academy

Linear Search and Binary Search


Practice Quiz • 10 min

Linear Search and Binary Search


TOTAL POINTS 4
1.Question 1
You have an array with 1023 numbers. You use linear search to determine whether number 239 is
in this array or not. How many elements of the array will you look at if number 239 is not present in
the array?
1 point

1023
10

11

2.Question 2
Can you use binary search to find number 8 in the array [1, 24, 25, 23, 17, 8, 9]?
1 point

Yes, you can.

No, you cannot.

3.Question 3
You have a sorted array with 1023 elements. You use binary search to determine whether number
239 is present in this array or not. How many elements of the array will you compare it with if
number 239 is not present in this array?
1 point

10

1023

4.Question 4
What is the maximum number of iterations a binary search will make to find some number in the
array [1, 2, 3, 5, 8, 13, 21, 34]?
1 point

Polynomial multiplication:
Polynomial Multiplication

Video: LectureProblem Overview and Naïve Solution

6 min

Video: LectureNaïve Divide and Conquer Algorithm

7 min

Video: LectureFaster Divide and Conquer Algorithm

6 min

Reading: Resources

5 min

Practice Quiz: Polynomial Multiplication

3 questions
Problem Overview and Naïve Solution
In this lecture we're going to talk about a more complicated divide-and-conquer algorithm to solve
polynomial multiplication. So first we'll talk about what polynomial multiplication is. So polynomial
multiplication is basically just taking two polynomials and multiplying them together. It's used in a
variety of ways in computer science. Error correcting codes, if you want to multiply large integers
together. Right, so if you've got thousand digit integers and you want to multiply them together,
there's a quicker way than doing it the normal way you learned in elementary school. And that uses
the idea of multiplying polynomials. It is used for generating functions, and for convolution. Let's look
at an example. So let's say you have polynomial A, which is 3 x squared + 2x + 5, and polynomial B,
which is 5 x squared + x + 2. If you multiply them together you get 15 x to the fourth + 13 x cubed + 33
x squared + 9x + 10. Why is that? Well, let's look, for instance the 15 x to the fourth comes from
multiplying 3 x squared times 5 x squared, that's 15x to the fourth. The 10 comes from multiplying 5
by 2. The 13 x cubed comes from 3 x squared times x, which is 3 x cubed, plus 2x times 5 x squared,
which is 10 x cubed. For a total of 13 x cubed. So let's look at the problem statement. So we're going
to have two degree-(n-1) polynomials, all right? a sub n-1 is the coefficient of x to the n-1, all the way down to a0, which is the coefficient of the x to the 0 term, the constant term.
Play video starting at 1 minute 41 seconds and follow transcript1:41
And then we similarly have a b polynomial as well.
Play video starting at 1 minute 45 seconds and follow transcript1:45
Now first you may wonder what happens if you actually want to multiply polynomials that don't
happen to have the same degree? What if you want to multiply a degree three polynomial times a
degree two polynomial? Right, where the degree is just the exponent of the highest term.
Play video starting at 2 minutes 4 seconds and follow transcript2:04
Well in that case, what you could do is just pad out the smaller polynomial, the lower-degree polynomial, with zeros for its leading coefficients. I'll give an example of that in just a second. And
then the product polynomial is the result that we want to come up with so that's a higher degree
polynomial, right? If our incoming polynomials, are degree n- 1, then we're going to get a term of the
x to the n- 1 in a, times x to the n- 1 in b, and that's going to give us an x to the 2n- 2 in the c term. So,
the c sub 2n-2 term, comes about from multiplying the a sub n-1 term and the b sub n-1 term. The c
sub 2n-3 term comes from the a sub n-1, b sub n-2, and a sub n-2, b sub n-1. So it's got two terms that
multiply together. The c sub 2n-4 term would have three terms that multiply together. And we have
more and more terms that get multiplied and summed together, and then fewer and fewer back
down. So c sub 2 has three pairs which get added together, c sub 1 has two pairs and c sub 0 has one
pair. So here's an example. This is actually the same example we had before. So n is three and all we
need, notice, are the coefficients. We don't actually need to have the x's written out. So 3, 2, and 5
means 3 x squared plus 2x plus 5. 5, 1, 2 means 5 x squared plus x plus 2.
Play video starting at 3 minutes 38 seconds and follow transcript3:38
What if B were only a degree one polynomial? It was just x plus 2. Well then we would set B equal 0,
1, 2. That is, B's x squared term is 0 x squared. So A(x) is this, B(x) is that. When you multiply them
together, we get the same result we got before. And now we just pluck off the coefficients here, so
the 15, the 13, the 33, the 9, and the 10. And that's our resulting answer: those coefficients. So let's
look at a naive algorithm to solve this.
Play video starting at 4 minutes 15 seconds and follow transcript4:15
The naive algorithm basically just says, well first off, let's create a product array. This is basically going
to be the C, the result, and it's going to be of highest degree 2n-2. So it's going to have 2n-1 terms all
the way from the 0 term up to the 2n-2 term. So we'll initialize it to 0, and then we'll have a nested for
loop. For i equals 0 to n-1, for j equals 0 to n-1. And at every time, what we'll do is we will calculate a
particular pair. So we'll calculate the A[i], B[j] pair, multiply them together and add them into the
appropriate product. Which is the appropriate product to put it in? It's the i + j case. As an example,
when i is 0 and j is 0, we calculate A at 0 times B at 0 and we add that to product at 0. So that says the
two zero-degree terms in A and B get multiplied together to the zero-degree term in C. At the other
extreme, if i is n-1 and j is n-1, we take A at n-1 times B at n-1 and we store that in the product of 2n-
2. As you can see, the intermediate values in product are going to have more terms added to them
than the edges.
Play video starting at 5 minutes 41 seconds and follow transcript5:41
And, of course, then we return the product. How long does this take? Well, this takes order n
squared. Clearly, we've got two for loops, one smaller for loop that's from 0 to 2n-2, so that's order n.
And then a nested for loop, where i goes from 0 to n-1 and j goes from 0 to n-1; the initialization loop runs order n times, and the nested loops run n squared times. So our runtime is O(n squared).
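
A sketch of this naive algorithm in Python. One assumption here: coefficients are stored lowest degree first, so A[i] is the coefficient of x to the i (the lecture's examples list them highest first):

```python
def multiply_poly_naive(A, B):
    """Multiply polynomials given as coefficient lists (A[i] is the
    coefficient of x^i); O(n^2) pairwise products."""
    n = len(A)                       # assumes len(A) == len(B) == n
    product = [0] * (2 * n - 1)      # coefficients for degrees 0 .. 2n-2
    for i in range(n):
        for j in range(n):
            product[i + j] += A[i] * B[j]
    return product

# 3x^2 + 2x + 5 times 5x^2 + x + 2, lowest degree first:
assert multiply_poly_naive([5, 2, 3], [2, 1, 5]) == [10, 9, 33, 13, 15]
```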
Play video starting at 6 minutes 11 seconds and follow transcript6:11
In the next video, we're going to look at a divide and conquer algorithm to solve this problem.
Although, we'll see that it too will be somewhat naive.

Naïve Divide and Conquer Algorithm


Need to understand this.
So let's look at a naive divide and conquer algorithm, to solve polynomial multiplication problem.
Play video starting at 8 seconds and follow transcript0:08
The idea is, we're going to take our long polynomial and break it into two parts: the upper half and the lower half. So A(x) is going to be D1(x) times x to the n/2, plus D0(x), the bottom half.
Play video starting at 24 seconds and follow transcript0:24
Since we've pulled the factor x to the n/2 out of D1(x), its lowest coefficient is actually just a sub n/2. So we have two parallel sub-polynomials, the high and the low.
Play video starting at 38 seconds and follow transcript0:38
We do the same thing for B. So we break that into E sub 1 of x, and E sub 0 of x. Again, where E sub 1
of x is the high terms, E sub 0 of x is the low terms.
Play video starting at 49 seconds and follow transcript0:49
When we do our multiplication, then, we just multiply together (D sub 1 times x to the n over 2, plus D sub 0) and (E sub 1 times x to the n over 2, plus E sub 0). And that then yields four terms: D sub 1 E sub 1 times x to the n, plus (D sub 1 E sub 0 + D sub 0 E sub 1) times x to the n over 2, plus D sub 0 E sub 0.
Play video starting at 1 minute 13 seconds and follow transcript1:13
The key here is that, we now just need to calculate D1 E1, D1 E0, D0 E1, and D0 E0. Those are all
polynomials of degree n over 2. And so, now we can go ahead and use a recursive solution to solve
this problem. So it gives us a divide and conquer algorithm. Its run time is T(n) = 4 T(n/2) + kn. Why 4? Because we're breaking into 4 subproblems, each of which takes time T(n/2) because the problem is broken in half. Then, in order to take the results and do our additions, that takes order n time, so some constant k times n. Let's look at an example. We have n = 4, so we have degree three polynomials. We're going to break up A(x) into the top half, 4x + 3, and the bottom half, 2x + 1. Similarly, we break up B(x): its top half, x cubed plus 2 x squared, becomes x + 2, and its bottom half, 3x + 4, stays 3x + 4. Now, we compute D1 E1. So
multiplying together, 4x + 3, times x plus 2, gives us 4 x squared + 11x + 6. Similarly, we calculate D1
E0, D0 E1, and D0 E0. Now we've done all four of those computations, AB is just D1 E1, 4 x squared +
11x + 6 times x to the 4th, plus the sum of D1 E0 and D0 E1, times x squared, plus finally D0 E0. If we
sum this all together, we get 4 x to the 6th, plus 11 x to the 5th, plus 20 x to the 4th, plus 30 x cubed,
plus 20 x squared, plus 11x plus 4. Which is our solution. Now, how long does this take to run? We're going to look at that in a moment. Let's look at the actual code for it. So we're going to compute a resulting array, indexed from 0 to 2n-2, holding all the coefficients of the result.
Play video starting at 3 minutes 24 seconds and follow transcript3:24
And our base case is that if n is of size 1, we're going to multiply together A at a sub l times B at b sub l. Let's look at those parameters again. A and B are our arrays of coefficients, n is the size of the problem, a sub l is the first coefficient in A that we're interested in, and b sub l is the first coefficient in B that we're interested in.
Play video starting at 3 minutes 48 seconds and follow transcript3:48
So we're going to be going through b sub l, b sub l plus one, b sub l plus two, and so on, for n coefficients. The first thing we'll do is multiply together D sub 0 and E sub 0. So what we're doing is taking A and B, dividing the problem size by 2, and starting with those same coefficients. And we're going to assign those to the lower half of the elements in R.
Play video starting at 4 minutes 18 seconds and follow transcript4:18
Then we're going to do something similar, where we take the upper halves of each of A and B.
Play video starting at 4 minutes 24 seconds and follow transcript4:24
So again, the problem size becomes n/2, but now we're moving the lower coefficient we're interested
in from a sub l to a sub l + n/2 and b sub l to b sub l + n/2. And we're going to assign those to the high
coefficients in our result.
Play video starting at 4 minutes 41 seconds and follow transcript4:41
Then, what we have to do is calculate D sub 0 E1, and D1 E0. And then, sum those together. When we
sum those together, we're going to assign those to the middle elements of the resulting array. And
we'll then return that result.
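As a rough Python sketch of this naive divide and conquer (assuming low-degree-first coefficient arrays and n a power of two, so the halves always split evenly):

    # A sketch of the naive divide-and-conquer multiply described above.
    # A, B: coefficient arrays (low degree first); n: current problem size;
    # al, bl: first coefficients of the current subproblems.
    def multiply_dc(A, B, n, al, bl):
        R = [0] * (2 * n - 1)
        if n == 1:                      # base case: multiply two coefficients
            R[0] = A[al] * B[bl]
            return R
        half = n // 2
        D0E0 = multiply_dc(A, B, half, al, bl)                # low halves
        D1E1 = multiply_dc(A, B, half, al + half, bl + half)  # high halves
        D0E1 = multiply_dc(A, B, half, al, bl + half)
        D1E0 = multiply_dc(A, B, half, al + half, bl)
        for k in range(n - 1):
            R[k] += D0E0[k]                     # low part of the result
            R[k + n] += D1E1[k]                 # high part, shifted by x^n
            R[k + half] += D0E1[k] + D1E0[k]    # middle, shifted by x^(n/2)
        return R

For instance, multiply_dc([1, 2], [3, 4], 2, 0, 0) returns [3, 10, 8], i.e. (1 + 2x)(3 + 4x) = 3 + 10x + 8x squared.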
Play video starting at 5 minutes 2 seconds and follow transcript5:02
Now the question comes up, how long does it take? So we have an original problem of size n, we
break it into four problems of size n over 2. So, level 0 we have size n, level 1 we have size of n over 2,
at level i, our problems are of size n over 2 to the i. And all the way down to the bottom of the tree is
at log base 2 of n, and each of the problems are of size 1.
Play video starting at 5 minutes 24 seconds and follow transcript5:24
How many problems do we have? At level 0, we have 1 problem. At level 1, we have 4 problems. If we go to the i'th level, we have 4 to the i problems. And at the very bottom, then we have 4 to the log base
2 of n problems.
Play video starting at 5 minutes 37 seconds and follow transcript5:37
How much work is there? Well, we just need to multiply the number of problems by the amount of work per problem. At level 0 we have kn. At level 1 we have 4 times k(n/2): 4 because there are 4 problems, and k(n/2) because each problem has size n over 2 and the work per problem at that level is k times n over 2. So 4 times kn over 2 just equals k times 2n. At the ith level, for the 4 to the i problems, each problem takes k times n over 2 to the i
Play video starting at 6 minutes 10 seconds and follow transcript6:10
to deal with, so multiplying together gives k times 2 to the i times n. And at the very bottom, we have k amount of work per problem, since each problem has size 1, times 4 to the log base 2 of n problems. Well, 4 to the log base 2 of n is just n squared, so we have k n squared. Our total, as we sum up all the work, is the summation from i equals zero to log base two of n of four to the i, times k, times n over two to the i. And that sum is dominated by the very bottom term, which is big Theta of n squared. So that's our runtime.
This is kind of weird. We went through all this work to create a divide and conquer algorithm. And yet,
the run time is the same run time as it was with our naive original algorithm.
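Collected into one formula, the total work over all levels of this recursion tree is

$$T(n) \;=\; \sum_{i=0}^{\log_2 n} 4^i \cdot k\,\frac{n}{2^i} \;=\; kn \sum_{i=0}^{\log_2 n} 2^i \;=\; \Theta(n^2),$$

since the geometric sum is dominated by its last term, $2^{\log_2 n} = n$.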
Play video starting at 6 minutes 58 seconds and follow transcript6:58
We're going to see in the next video, a way to redo our divide and conquer algorithm, so we have less
work to do at each level, and so we actually get a better final run time.

Faster Divide and Conquer Algorithm


In this video we'll look at creating a faster divide and conquer algorithm in order to solve the
polynomial multiplication problem.
Play video starting at 8 seconds and follow transcript0:08
This approach was invented by Karatsuba in the early 1960s. He was a graduate student of Kolmogorov, a famous Russian mathematician. And Kolmogorov had conjectured that n squared was the best one could do, that there was a lower bound of n squared for polynomial multiplication. Karatsuba, a grad student, heard the problem, went away, and came back a week later
with a solution. So let's look at what is involved. So if we look at A(x) it's just a very simple polynomial,
a1x + a0. And B(x) = b1x + b0. Then C(x), their product, is a1b1x squared + (a1b0 +
a0b1)x + a0b0. So we'll notice here we need four multiplications. We need to multiply a1 times b1.
We need to multiply a1 times b0, a0 times b1, and a0 times b0. This is how we did the divide and
conquer in fact in our last video. So we need four multiplications. Karatsuba's insight was that there
was a way to re-write C(x), so that you only needed to do three multiplications. So basically what he
did is he re-wrote that inner term, a1b0 + a0b1 as something slightly more complicated. So he added
together, (a1 + a0) (b1 + b0). So (a1 + a0) (b1 + b0) is just a1b1 + a0b1 + a1b0 + a0b0. And then he
subtracted out the a1b1 and the a0b0, so he's left with a1b0 + a0b1. Which is exactly what's there to
begin with. The key here, though, is how many multiplications are needed. It only needs three: we compute a1b1 once, even though we use it twice; we compute a0b0 once, again even though we use it twice; and then we multiply together (a1 + a0) and (b1 + b0). So we do have some extra additions, but the key is that we have three multiplications instead of four.
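Written as a single identity, Karatsuba's rewriting of the middle coefficient is

$$a_1 b_0 + a_0 b_1 \;=\; (a_1 + a_0)(b_1 + b_0) \;-\; a_1 b_1 \;-\; a_0 b_0,$$

so the three products $a_1 b_1$, $a_0 b_0$, and $(a_1 + a_0)(b_1 + b_0)$ are enough to assemble all three coefficients of C(x).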
Play video starting at 2 minutes 17 seconds and follow transcript2:17
Why does this matter?
Play video starting at 2 minutes 19 seconds and follow transcript2:19
Well, why it matters is because we are reducing the number of problems at each level. But let's first
look at an example. So here we've got A(x). We're going to have 4 x cubed + 3 x squared + 2x +1. B(x)
= x cubed + 2 x squared + 3x + 4. We're going to go ahead and pull out D1 and D0 like we did before.
In our divide and conquer. The key is what we're going to actually do in terms of the subproblems. So
we have D1 and D0. We have E1 and we have E0. We're going to compute D1 E1, again, just like we
did before. We're going to compute D0 E0, again just like we did before. But now we won't compute
D1 E0 and D0 E1. Instead we're going to sum together D1 and D0. Sum together E1 and E0.
Play video starting at 3 minutes 6 seconds and follow transcript3:06
So (D1 + D0) is going to be (6x + 4). (E1 + E0) is going to be (4x plus 6). And then we multiply those two
polynomials together, yielding 24 x squared + 52x + 24. So, so far, how many multiplications have we
done? Three. And then, our final result for A(x) B(x) is D1E1 times x to the fourth +, now what do we
do here? We take that (D1 + D0) (E1 + E0). (24x squared + 52x + 24), okay?
Play video starting at 3 minutes 41 seconds and follow transcript3:41
Add that in the second term. And then subtract out D1 E1. Subtract out D0 E0.
Play video starting at 3 minutes 48 seconds and follow transcript3:48
And then our final term will be D0 E0. If we simplify that middle portion, and all of it, we just end up with 4 x to the sixth + 11 x to the fifth + 20 x to the fourth + 30 x cubed + 20 x squared + 11x + 4. Which is the exact same result we got doing the more naive divide and conquer, and also the same result we'd get doing it the straight naive way, okay? So we get the same result with three multiplications instead of four. Saving that one multiplication makes a big difference. Let's look at our
runtime. So our initial problem is of size n. When we break it down, we have three problems of size n
over 2, again, rather than 4.
Play video starting at 4 minutes 27 seconds and follow transcript4:27
So level 0, problem size n. Level 1, a problem of size n over 2. At level i, our problems are of size n
over 2 to the i, just like they were in the other divide and conquer problem. And we have the same
number of leaves.
Play video starting at 4 minutes 40 seconds and follow transcript4:40
So at log base 2 of n level, all the problems are of size 1. And the number of problems that we have, 1
of them at level 0, 3 instead of 4 at level 1, 3 to the i. instead of 4 to the i, at level i. And 3 to the log
base 2 of n, instead of 4 to the log base, 2 of n at the bottom level.
Play video starting at 4 minutes 58 seconds and follow transcript4:58
How much work? We'll multiply together, so we'll figure out for each problem how much it takes. In
this case at level 0 it's kn.
Play video starting at 5 minutes 7 seconds and follow transcript5:07
At level 1, each problem takes k(n/2) work. And there are three of them. So it's k(3/2) n. At the ith
level, we end up with k times (3/2) to the i times n. And at the bottom level, k times 3 to the log base
2 of n. a to the log base b of c, is the same thing as c to the log base b of a. So therefore this is the
same as kn to the log base 2 of 3.
Play video starting at 5 minutes 35 seconds and follow transcript5:35
We sum those: the summation from i = zero to log base 2 of n of 3 to the i, times k, times n over 2 to the i. This is a geometric series bounded by its last term, which is big Theta of n to the log
base 2 of 3. Log base 2 of 3 is about 1.58. So, we now have a problem where our solution is big Theta
of n to the 1.58. Compared to our original problem, which had a big Theta of n squared solution. So
this makes a huge difference as n gets large, in terms of our final runtime.
Play video starting at 6 minutes 18 seconds and follow transcript6:18
It's not uncommon for divide and conquer algorithms to require this sort of rethinking of how a problem is broken up, so that you have fewer subproblems.
Play video starting at 6 minutes 29 seconds and follow transcript6:29
Because the number of subproblems compounds from level to level, more subproblems at one level means more, and more, and more further down.
Play video starting at 6 minutes 37 seconds and follow transcript6:37
Reducing the number of subproblems reduces the final runtime.
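A compact Python sketch of the Karatsuba scheme just described (again assuming low-degree-first coefficient arrays with length n a power of two):

    # Karatsuba multiplication: three recursive products instead of four.
    def karatsuba(A, B):
        n = len(A)
        if n == 1:
            return [A[0] * B[0]]
        half = n // 2
        D0, D1 = A[:half], A[half:]          # low and high halves of A
        E0, E1 = B[:half], B[half:]          # low and high halves of B
        D0E0 = karatsuba(D0, E0)
        D1E1 = karatsuba(D1, E1)
        mid = karatsuba([x + y for x, y in zip(D0, D1)],
                        [x + y for x, y in zip(E0, E1)])
        # (D1 + D0)(E1 + E0) - D1E1 - D0E0 recovers the middle coefficients
        middle = [m - p - q for m, p, q in zip(mid, D0E0, D1E1)]
        R = [0] * (2 * n - 1)
        for k in range(n - 1):
            R[k] += D0E0[k]
            R[k + n] += D1E1[k]
            R[k + half] += middle[k]
        return R

On the lecture's example, karatsuba([1, 2, 3, 4], [4, 3, 2, 1]) returns [4, 11, 20, 30, 20, 11, 4], matching 4 x to the sixth + 11 x to the fifth + 20 x to the fourth + 30 x cubed + 20 x squared + 11x + 4, with coefficients listed low degree first.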

PRACTICE QUIZ • 15 MIN

Polynomial Multiplication
Submit your assignment
Master Theorem

Video: LectureWhat is the Master Theorem?

4 min

Video: LectureProof of the Master Theorem

9 min

Reading: Resources

10 min

Practice Quiz: Master Theorem

1 question

What is the Master Theorem?


Here we're going to talk about the master theorem. We'll describe what the master theorem is and
how to use it. And we'll reserve the proof for the next video.
Play video starting at 10 seconds and follow transcript0:10
So we've had many occasions where we have had to write a recurrence relation for a divide and
conquer problem. This is an example of one for binary search. We break a problem down into a
problem half as big and we do a constant amount of work at each level. And this gives us a solution
T(n) = O(log n). The problem is for each one of these we have to create a recurrence tree, figure out
how much work is done at each level, sum up that work. That's a lot to do to solve each recurrence
relation. Here's an example that we used for the polynomial multiplication. So we broke a problem
into four sub-problems, each half the size, and did a linear amount of work. And the solution was T(n)
= O(n squared). When we had the more efficient algorithm, where we had only three sub-problems
instead of four, we then got a solution of O(n to the log base 2 of 3). Sometimes we break a problem
into only two subproblems and there the solution is O(n log n). So, wouldn't it be nice if there was a
way that we just had a formula to tell us what the solution is rather than having to create this
recurrence tree each time? And that's what the Master Theorem basically does.
Play video starting at 1 minute 23 seconds and follow transcript1:23
So, the Master Theorem says: if you have a recurrence relation T(n) equals a, some constant, times T(the ceiling of n divided by b), plus O(n to the d), a polynomial in n of degree d.
Play video starting at 1 minute 39 seconds and follow transcript1:39
And that ceiling, by the way, could just as well be a floor or not be there at all if n were a power of b.
Play video starting at 1 minute 46 seconds and follow transcript1:46
In any case, a is a constant greater than 0, and b is greater than 1 because we want to actually make sure the problem size gets smaller. And d is greater than or equal to 0. Well, in that case, we have a solution for T of n. There are three cases, and all of these cases depend on the relationship between d and log base b of a. In particular, case number 1: is d greater than log base b of a? If so, the solution is just this polynomial in n, O(n to the d).
Play video starting at 2 minutes 14 seconds and follow transcript2:14
If d is exactly equal to log base b of a, then the solution is big O of n to the d with an extra factor of log n.
Play video starting at 2 minutes 23 seconds and follow transcript2:23
And finally, if d is less than log base b of a, then the solution is big O of n to the log base b of a.
Play video starting at 2 minutes 31 seconds and follow transcript2:31
So let's look at some applications of this theorem. So here's one where we go back to the polynomial
multiplication. Here a is 4, b is 2, and d is 1. Because O(n) is just O(n to the 1). And we look at the
relationship between d, which is 1, and log base b of a, which is log base 2 of 4 or 2. Well clearly d is
less than log base b of a, so we're in case three. Therefore T(n) = O(n to the log base b of a), or just
O(n squared). If now we change the 4 to a 3, a is 3, b is 2, d is 1. Now d is still less than log base b of a
because log base 2 of 3 is greater than 1, and so again we're in case three. T(n) equals O(n to the log
base b of a), which equals O(n to the log base 2 of 3).
Play video starting at 3 minutes 24 seconds and follow transcript3:24
If we reduce the 3 down to a 2, what happens? Well here, a is 2, b is 2, d is 1. Log base b of a is log base 2 of 2, which is just 1. So now d is equal to log base b of a. We're in case two now, and so T(n) = O(n to the d log n) = O(n log n). And here is another example of case two: the binary search example. a is 1, b is 2, d is 0. Well, log base b of a, which is log base 2 of 1, is equal to zero. So d is equal to log base b of a. We're in case two, T(n) = O(n to the d log n), which is n to the 0 times log n, which is just O(log n). And a final example where we are actually in case one. So here a is 2, b is 2, and d is 2.
So log base b of a is log base 2 of 2, which is one. So d is now greater than log base b of a. We are now
in case one, T(n) equals O(n to the d), which is O(n squared). So what we've seen now is that the master theorem, for most recurrences that arise from divide and conquer and fit this general formula, allows us to easily figure out which case we are in based on the relationship between a, b, and d, and then figure out the result quite quickly.
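As a quick sanity-check tool, here is a small Python sketch (my own, not from the course) that applies the three cases to a recurrence T(n) = a T(n/b) + O(n^d):

    import math

    # Classify T(n) = a*T(n/b) + O(n^d) into the three master theorem cases.
    def master_theorem(a, b, d):
        critical = math.log(a, b)            # log base b of a
        if d > critical:
            return "O(n^%g)" % d             # case 1: top level dominates
        if math.isclose(d, critical):
            return "O(n^%g log n)" % d       # case 2: equal work per level
        return "O(n^%.2f)" % critical        # case 3: leaves dominate

    # Examples from the lecture:
    # master_theorem(4, 2, 1) -> 'O(n^2.00)'      naive divide and conquer
    # master_theorem(3, 2, 1) -> 'O(n^1.58)'      Karatsuba
    # master_theorem(2, 2, 1) -> 'O(n^1 log n)'   i.e. O(n log n)
    # master_theorem(1, 2, 0) -> 'O(n^0 log n)'   i.e. O(log n), binary search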
Play video starting at 4 minutes 51 seconds and follow transcript4:51
In our next video we'll look at a proof of why the master theorem works.
Proof of the Master Theorem
In this video, we'll look at a proof of how the Master Theorem works.
Play video starting at 5 seconds and follow transcript0:05
So a reminder, the Master Theorem states that if T(n) equals a T of 
ceiling of n over b plus a polynomial, then we have these three cases.
Play video starting at 16 seconds and follow transcript0:16
So let's do as we normally do with a recurrence relation and 
let's create a recurrence tree.
Play video starting at 21 seconds and follow transcript0:21
So we'll have our recurrence at the top to just remind ourselves what that is. 
Let's assume for the sake of argument that n is a power of b. 
That's a reasonable assumption since we can always just pad n to be larger: if we increase it by a factor of no more than b, we can get to the next closest power of b, and then this will be a simpler analysis.
So we have our problem n. 
At the next level, we break the problem down into 
a copies of a problem n over b large.
Play video starting at 57 seconds and follow transcript0:57
So level zero. 
We have a problem of size n. 
Level 1 we have problems of size n/b. 
At the general level i we have problems of size n over b to the i. 
At the bottom level, which is level log base b of n, we have problems of size 1.
Play video starting at 1 minute 13 seconds and follow transcript1:13
How many problems are there? 
At level 0 there's of course one problem. 
At level 1, a problems. 
And in general at the ith level, a to the i problems. 
At the log base b of n level, it's a to the log base b of n.
Play video starting at 1 minute 28 seconds and follow transcript1:28
How much work do we have to do?
Play video starting at 1 minute 31 seconds and follow transcript1:31
Well work is just a function of how many problems we have and 
the amount of work for each problem. 
So at level zero we have just O(n to the d) work. 
There's one problem and it takes O(n to the d) time.
Play video starting at 1 minute 48 seconds and follow transcript1:48
And level one we have a problems. 
And each of them takes O(n over b to the d) work. 
Okay, we can pull out the a and the b and the d to be all together, and 
that's just O(n to the d) times a over b to the d.
Play video starting at 2 minutes 6 seconds and follow transcript2:06
At the ith level we have a to the i problems and 
each one is O(n over b to the i to the d).
Play video starting at 2 minutes 17 seconds and follow transcript2:17
Again, we can pull out the a to the i, the b to the i, and 
we're left with O(n to the d) 
times a over b to the d to the i. 
And finally, at the bottom level it's just a to the log base b of n because 
the problems are all size 1. 
It's just O(n to the log base b of a).
Play video starting at 2 minutes 37 seconds and follow transcript2:37
So the total amount of work is the summation from 0 to the log base b of n. 
O(n to the d) times the quantity a over b to the d, all that to the i.
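Collected into one formula, that total is

$$T(n) \;=\; \sum_{i=0}^{\log_b n} O\!\left(n^{d}\right)\left(\frac{a}{b^{d}}\right)^{i},$$

a geometric series whose ratio is a over b to the d; the three cases of the theorem correspond to this ratio being less than, equal to, or greater than 1.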
Play video starting at 2 minutes 48 seconds and follow transcript2:48
So let's look at what seems like a slight digression, and that is geometric series. 
So a Geometric Series is a series of numbers that 
progresses by some multiplicative factor.
Play video starting at 3 minutes 5 seconds and follow transcript3:05
I'll give you an example. 
If we take 1 + 2 + 4 + 8 + 16 + 32 + 64, that's a geometric series 
where our factor is a factor of 2 at each time.
Play video starting at 3 minutes 18 seconds and follow transcript3:18
Just as well, we could have a geometric series that goes down. 
So we could have, for instance, let's say 10,000, 
1,000, 100, 10, 1. 
Where we're going down by a constant factor of ten at each increment. 
Now it turns out that for our multiplicative factor, let's call it r,
as long as r is not equal to one, we have a simple closed form for this sum:
it is a times (1 minus r to the n), over (1 minus r).
And it turns out that in big O notation,
what happens is we care about the largest term.
Play video starting at 4 minutes 1 second and follow transcript4:01
So our sum is going to be bounded by a constant times our largest term. 
So, if r is less than 1 then our largest term is the first element a and 
therefore our solution is O(a). 
Okay, because it's our largest term, it gets smaller, 
smaller, smaller, smaller, smaller. 
And as long as it's by this multiplicative factor, 
then all that really matters is this first term, 
because the rest of it sums to no more than a constant times that first term. 
If on the other hand, r is greater than 1, then what matters is the very last term, 
because that's the biggest term and all the previous ones are smaller and smaller. 
So it's smallest, larger, larger, larger, largest.
Play video starting at 4 minutes 43 seconds and follow transcript4:43
And so that largest term is a r to the (n-1). 
So in a geometric series we care about either the first term or 
the last term, whichever one is bigger.
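In symbols, for first term a, ratio r not equal to 1, and n terms:

$$\sum_{i=0}^{n-1} a\,r^{i} \;=\; a\,\frac{1-r^{n}}{1-r} \;=\;
\begin{cases} O(a) & r < 1 \text{ (first term dominates)},\\[2pt]
O\!\left(a\,r^{\,n-1}\right) & r > 1 \text{ (last term dominates)}. \end{cases}$$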
Play video starting at 4 minutes 55 seconds and follow transcript4:55
Now if we take that back to the case of our recurrence tree, 
we notice our summation here. 
This is the same summation we had from our recurrence tree and we see that we have 
a geometric series. 
the first term a is taking the place of big O of n to the d, and
r is taking the place of a over b to the d.
So our multiplicative factor is a over b to the d. 
And there are three cases. 
You remember as we stated the solution to the Master Theorem. 
Case one is d is greater than log base b of a. 
Well it's equivalent to saying a over b to the d is less than 1. 
So now we have our multiplicative term is less than 1. 
So it's getting smaller and smaller and smaller. 
That means that the largest term is the first term. 
And that's the one that we have an order of. 
So this is, officially, big O of big O of n to the d,
which is just the same as big O of n to the d.
Play video starting at 5 minutes 55 seconds and follow transcript5:55
Case 2, where d equals log base b of a and equivalently, 
a over b to the d is equal 1. 
Well, if a over b to the d is equal to one, remember our geometric 
series formula didn't hold, so we're going to just have to calculate this. 
But if a over b to the d is 1, then a over b to the d to any power is still 1. 
So that means, that our summation
Play video starting at 6 minutes 20 seconds and follow transcript6:20
is just a summation from i equals 0 to log base b of n of O(n to the d).
Play video starting at 6 minutes 26 seconds and follow transcript6:26
And that's just 1 plus log base b of n, 
because that's the number of terms in our summation times O(n to the d). 
Well the 1 is a low order term we don't care about, and log base b of n can 
just be treated as log n, because a base change is just some multiplicative factor, 
and that disappears in our big O notation. 
So we end up with, as we see in the theorem, O(n to the d times log n). 
And then our final case, is d is less than log base b of a, 
which is equivalent to saying a over b to the d is greater than 1. 
So here, our multiplicative factor is greater than 1. 
So our smallest term is the first term and our largest term is the last term. 
So in this case, this is big O of our 
last term is O(n to the d) times a over b to the d to the log b of n. 
So, i is log base b of n. 
This is a bit of a mess. 
Let's see whether we can fix this a little bit. 
So let's go ahead and apply the log base b of n power separately to a and b to the d. 
So we have, in the numerator, a to the log base b of n. 
And then the denominator, b to the d times log base b of n. 
Well, b to the log base b of n is just n. 
So, that's going to disappear down to n to the d in the denominator. 
In the numerator, a to the log base b of n, 
by logarithmic identity is equal to n to the log base b of a. 
So we can swap one for the other.
Play video starting at 7 minutes 59 seconds and follow transcript7:59
And now, if we compare big O of n to the d and n to the d, 
we know big O of n to the d is bounded by some constant, k times n to the d. 
So we have k n to the d divided by n to the d, which is just some k. 
And that constant can go away because we're still talking about big O notation. 
So we're left just with big O of n to the log base b of a, 
which is what we have for the final case.
Play video starting at 8 minutes 24 seconds and follow transcript8:24
So the Master theorem is a shortcut.
Play video starting at 8 minutes 26 seconds and follow transcript8:26
Our master theorem again as a restatement is here.
Play video starting at 8 minutes 30 seconds and follow transcript8:30
I have a secret to tell you, however. 
I do not remember the master theorem and 
I don't actually even look up the master theorem. 
Here's what I do. 
When I have a recurrence of this rough form, I look at the amount of work 
done at the first level and at the second level (which is a very easy calculation), and
then I just ask myself: is that the same amount of work?
If it's the same amount of work it's going to be the same amount of work 
all the way down and so we're going to be in case two. 
So it's going to be the amount of work at the first level, 
which we known is O(n to the d), times log n because there are that many levels. 
On the other hand, if the first term is larger than the second term 
I know the first term is going to dwarf all the other terms. 
And so, we're left with just O(n to the d). 
And finally, if the first term is less than the second term, 
I know they're going to keep increasing and it's the bottom term that I need.
Play video starting at 9 minutes 26 seconds and follow transcript9:26
And that is just going to be the number of leaves which is n to the log base b of a.
Play video starting at 9 minutes 33 seconds and follow transcript9:33
The master theorem is really handy to use whether you memorize it or you have it 
written down and use it or in my case you sort of recreate it every time you need it.
Play video starting at 9 minutes 45 seconds and follow transcript9:45
Thanks.
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Master Theorem: Section 2.2 of [DPV08]

References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.

PRACTICE QUIZ • 10 MIN

Master Theorem
Submit your assignment
Sorting Problem

Problem Overview
Hello, and welcome to the sorting problem lesson.
Play video starting at 5 seconds and follow transcript0:05
As usual, we start with a problem overview.
Play video starting at 10 seconds and follow transcript0:10
So sorting is a fundamental computational problem. Your input in this problem consists of a sequence of elements, and your goal is to output these elements in, for example, non-decreasing order.
Play video starting at 24 seconds and follow transcript0:24
The formal statement of this problem is as follows. You are given a sequence of n elements. We will usually denote the sequence by A throughout this lesson. And your goal is to output these same elements in non-decreasing order.
Play video starting at 40 seconds and follow transcript0:40
Once again, sorting is an important computational task used in many efficient algorithms. For some algorithms, it is important to process the given elements in non-decreasing order, going from smaller ones to larger ones. In some other algorithms, just by sorting your input data, you gain the ability to perform queries much more efficiently. A canonical example of such a situation is the search problem. In this problem, we are given a sequence of n elements, and the goal is to check whether a particular element is present in the sequence. A simple way to solve this problem is, of course, just to scan the input sequence from left to right and check whether your element is present. This gives you a linear time algorithm. And you know already that if your input sequence is sorted, then you can do this much faster, basically in logarithmic time in the size of your input sequence. You first compare your element to the middle element.
Play video starting at 1 minute 45 seconds and follow transcript1:45
If it is equal to this element, then you are done; if it is not, you continue with the left half or the right half of your sequence. So with a logarithmic number of comparisons in the worst case, you will be able to say whether your element is present in the sequence or not. So if you are given a sequence and you are expecting many such queries, that is, you expect to be asked whether a given object is present or not for many such objects, then it just makes sense to first sort your input data and only then perform all these queries. This will give you a much more efficient algorithm in general. All right, and this is only a small example. We will see many other situations where sorting your data first helps to perform queries much more efficiently. So in the subsequent videos of this lesson, we will study many efficient sorting algorithms.
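A minimal Python sketch of the binary search idea just described (assuming the input list is already sorted):

    # Binary search over a sorted list: O(log n) comparisons in the worst
    # case, versus O(n) for a left-to-right scan.
    def contains(A, key):
        lo, hi = 0, len(A) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if A[mid] == key:
                return True            # found the element
            if A[mid] < key:
                lo = mid + 1           # continue with the right half
            else:
                hi = mid - 1           # continue with the left half
        return False

    # contains([2, 2, 4, 5, 8], 5) -> True
    # contains([2, 2, 4, 5, 8], 3) -> False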

Selection Sort
In this video, we will study one of the simplest sorting algorithms, called selection sort.
Play video starting at 6 seconds and follow transcript0:06
Its main idea is quite simple: we just keep growing the sorted part of our array. So let me illustrate it on a toy example. Assume we're given a sequence of length five consisting of the integers 8, 4, 2, 5, and 2. We start just by finding one of the minimum elements in this array; in this case it is 2. Now let's just do the following: let's swap it with the first element of our array.
Play video starting at 37 seconds and follow transcript0:37
After swapping, 2 stays in its final position: 2 is the minimum value of our array and it is already in the first position. Now let's just forget about this element; it is already in its final position, and let's repeat the same procedure with the remaining part of our array. Namely, we again first find the minimum value, which is again 2. We swap it with the first element of the remaining part and then forget about this element too. So again, we find the minimum value, which is now 4, and swap it with the first element of the remaining part, which is now the third element of our array. And then we just forget about the first three elements and continue with the remaining part. So once again, we just keep growing the sorted part of our array. In the end, what
we have, is that the whole array is sorted. The pseudocode shown here on the slide, directly
implements the idea of the selection sort algorithm that we just discussed.
Play video starting at 1 minute 40 seconds and follow transcript1:40
So here we have a loop where i ranges from 1 to n. Initially, i is equal to 1. Inside this loop, we compute the index of a minimal value in the array within the range from i to n. We do this as follows: we create a variable minIndex, which is initially equal to i, and then we go through all the remaining elements in this part, I mean through elements from i + 1 to n. If we find a smaller element, we update the variable minIndex. So at the end of this inner for loop, minIndex is the position of a minimal element inside the array from i to n.
Play video starting at 2 minutes 25 seconds and follow transcript2:25
Then we swap this element with the element A[i].
Play video starting at 2 minutes 30 seconds and follow transcript2:30
Namely, when i is equal to one, what we've done is find the minimal element in the whole array and swap it with the first element. So now the first element of our array is in its final position. Then on the second iteration of our loop, we do the same: we find the position of a minimum value inside the remaining part of our array and put it in the second place. On the third iteration we find the minimum value in the remaining part and put it in the third place, and so on. So we keep growing the sorted part of our array. Now it would be useful to check the online visualization to see how it goes, so let's do this.
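Before moving on, here is a direct Python rendering of this pseudocode (using 0-based indices rather than the lecture's 1-based ones):

    # Selection sort: grow the sorted prefix one element at a time.
    # Quadratic number of comparisons regardless of input order; in place.
    def selection_sort(A):
        n = len(A)
        for i in range(n):
            min_index = i
            for j in range(i + 1, n):          # scan the remaining part
                if A[j] < A[min_index]:
                    min_index = j
            A[i], A[min_index] = A[min_index], A[i]  # place in final position
        return A

    # selection_sort([8, 4, 2, 5, 2]) -> [2, 2, 4, 5, 8]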
Play video starting at 3 minutes 15 seconds and follow transcript3:15
This visualization shows how selection sort algorithm performs on a few different datasets. Namely on
the random datasets, on a sequence which is nearly sorted. Also on a sequence which is sorted in
reversed order. And on a sequence which contains just a few unique elements. So let's run this
algorithm and see what happens.
Play video starting at 3 minutes 44 seconds and follow transcript3:44
So you can see that indeed this algorithm just grows the sorted region, the sorted initial region of our
array. Another interesting property revealed by this visualization is the following: the running time of this algorithm actually does not depend on the input data. It only depends on the size of our initial sequence.
Play video starting at 4 minutes 11 seconds and follow transcript4:11
The running time of the algorithm is quadratic, and this is not difficult to see, right? So what we have is two nested loops. In the outer loop, i ranges from 1 to n. In the inner loop, j ranges from i plus 1 to n, to find a minimum inside the remaining part of our array. So in total we have a quadratic number of iterations. At this point, however, we should ask ourselves whether our estimate of the running time of the selection sort algorithm was too pessimistic. And this is what I mean by this. So recall that we have two nested loops. In the outer loop, i ranges from 1 to n. In the inner loop, j ranges from i + 1 to n. So when i is equal to 1, the number of iterations of the inner loop is n - 1. However, when i is equal to 2, the number of iterations of the inner loop is n - 2, and so on. So when i increases, the number of iterations of the inner loop decreases. So a more accurate estimate for the total number of iterations of the inner loop would be the following: (n - 1) + (n - 2) + (n - 3) and so on.
Play video starting at 5 minutes 19 seconds and follow transcript5:19
So it is definitely less than n squared. However, we will show that it is big Theta of n squared; namely, it is roughly equal to n squared divided by two.
Play video starting at 5 minutes 33 seconds and follow transcript5:33
The sum that we need to estimate is called an arithmetic series, and there is a known formula for it. Namely, 1 + 2 + 3 + and so on up to n is equal to n(n+1)/2. And this is how we can prove this formula.
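In symbols, together with the bound selection sort actually needs:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad (n-1)+(n-2)+\cdots+1 = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2}.$$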
Play video starting at 5 minutes 52 seconds and follow transcript5:52
Let's just write all our n integers in a row: 1, 2, and so on, up to n. Below them, let's write the same set of integers but in the reverse order: n, then n minus 1, and so on, 2, and 1. Then what we get is a table of size 2 by n, having n columns, and in each column the sum of the corresponding two integers is equal to n plus 1. So in the first column we have 1 and n, in the second column we have 2 and n minus 1, and so on, and in the last column we have n and 1. So the sum in each column is equal to n plus 1, and there are n columns. Which means that the sum of all the numbers in our table is equal to n multiplied by n plus 1. Since this table contains the sum of the integers from 1 to n twice, we conclude that the sum of all the numbers from 1 to n is equal to n(n+1)/2. Another possibility to find this formula, to see why this formula is correct, is to take a
rectangle of dimensions n by n plus 1, so its area is equal to n multiplied by n plus 1, and to cut it into two equal parts, as shown on the slide, such that the area of each of these two parts is equal to 1 + 2 + and so on up to n. We're now ready to conclude. So we've just discussed the selection sort algorithm. This algorithm is easy to implement, easy to analyze, and its running time is O(n squared), where n is the size of the input sequence. It sorts the input array in place, meaning that it requires almost no extra memory: all the extra memory required by this algorithm is only for storing indices, like i, j, and minIndex. There are many other quadratic sorting algorithms, like insertion sort and bubble sort. We're not going to cover them here; instead, in the next video we will proceed to a faster sorting algorithm.

Merge Sort
In this video, we will study the so-called merge sort algorithm. It is based on the divide and conquer technique, whose main idea is the following. To solve a given computational problem, you first split it into two or more disjoint subproblems, then you solve each of these subproblems recursively. And finally, you combine the results that you get from the recursive calls to get the result for your initial problem. And this is exactly what we're going to do in the merge sort algorithm. So let's consider a toy example. We're given an array of size eight, and we are going to sort it. First, we just split this array into two halves of size four, just the left half and the right half. Then we make two recursive calls to sort both these parts. These are the two resulting arrays. Now what remains to be done is to merge these two arrays of size four into one array of size eight. Well, let's think how this can be done. First of all, I claim that it is easy to find the minimal value in the resulting array. Indeed, we know that the minimum value in the first array is two, and the minimum value in the second array is one. Which means that the minimum value in the resulting merged array must be one. So let's take one from the right array, put it in the resulting array, and forget about it. It is already in its right place.
Play video starting at 1 minute 31 seconds and follow transcript1:31
What remains is an array of size four and an array of size three that still need to be merged. Well, again, it is easy to find the minimum value of
Play video starting at 1 minute 42 seconds and follow transcript1:42
the result of merging these two arrays. In this case, it is two, because the minimum value in the array of size four is two, and the minimum value in the array of size three is six. So two is smaller than six, so we take two out of our left array, put it into the resulting array after one, and proceed. In the end, we get the following sorted array. Again, the pseudocode of the merge sort algorithm directly implements this idea. So this pseudocode takes an array A of size n as input. If n is equal to 1, then nothing needs to be done; we can just return the array A itself. If n is greater than 1, on the other hand, then we split the array A into two roughly equal parts and sort them recursively. We call them B and C here. Then the only thing that needs to be done is to merge these two sorted arrays. This is done in the procedure merge, which we will present on the next slide. And finally, we just return the result of this merging procedure.
Play video starting at 2 minutes 55 seconds and follow transcript2:55
The pseudocode of the merging procedure is also straightforward. It assumes that we are given two sorted arrays, B and C, of sizes p and q respectively, and we would like to merge them into a sorted array of size p + q. So the first thing we do is create an array D of size p + q. It is initially empty. Then we keep doing the following. What is the minimum value among all the values stored in the arrays B and C? Well, it is easy to find. We know that the first element in the array B is its smallest element, and the first element in the array C is its smallest element. So the smaller of these two is the smallest element inside the union of these two arrays. So we just find the minimum of these first elements, move it from one of these arrays to the resulting array D, and forget about this element completely. Now what is left is essentially the same problem. We're left with two sorted arrays, and we still need to merge them. So we do exactly the same: we take the first elements of the two arrays, we compare them, and we move the smaller one to the resulting array. And we keep doing this while both of these arrays are non-empty; we need this to be able to take their first elements. When one of them becomes empty, we just copy the rest of the other array to the resulting array D. Well, it is not difficult to see that this procedure is correct, and its running time is O(p + q), namely the size of the array B plus the size of the array C. And this is just because we scan both of these arrays from left to right in the course of this merging procedure. This is how sorting our initial array of size eight by the merge sort algorithm looks. So the merge sort algorithm first splits the initial array of size eight into two arrays of size four. Each of these arrays of size four in turn is split into two arrays of size two, and each of them is split into two arrays of size one. Then the merge procedure starts merging these arrays of size one into arrays of size two, then these arrays of size two into arrays of size four, and finally it merges the two resulting arrays of size four into the resulting array of size eight. We are now going to prove that the running time of the merge sort algorithm, on a sequence containing n elements, is big O of n log n. Note that this is significantly faster than the quadratic selection sort algorithm. For example, it is perfectly okay to sort a sequence of size 1 million, 10 to the 6th, on your laptop using the merge sort algorithm. Meanwhile, for the quadratic time selection sort algorithm, sorting a sequence of size 10 to the 6th will take roughly 10 to the 12th operations, which is too much for modern computers.
Play video starting at 6 minutes 11 seconds and follow transcript6:11
Okay, so to prove this lemma, to prove the upper bound on the running time of the merge sort algorithm, first note that merging two parts of size n over 2 of our initial array takes linear time, namely big O of n, because the left part has size n over 2, the right part has size n over 2, and for merging we basically just scan these parts from left to right. So it takes just a linear amount of work to do this. Which, in turn, means that if we denote by T(n) the running time of our merge sort algorithm, then it satisfies the following recurrence: T(n) is at most 2T(n / 2) + big O(n). Here 2T(n / 2) corresponds to the two recursive calls: we denote by T(n) the running time of our algorithm on an input of size n, so when we sort two sequences of size n / 2, we spend time twice T(n / 2). So the big
O of n term corresponds to what we do before we make recursive calls and what we do after
recursive calls. So what we do before is just split the input array into two halves. What we do after is
merging the results of two arrays into one array of size n. So it is not difficult to see that all of this can
be done in linear time. So we get this recurrence, and on the next slide, we're going to show that this
recurrence implies that the running time of our algorithm is bounded from above by n log n. To
estimate the running time of this algorithm, let's consider its recursion tree. Namely, at the top of this
tree, we have one array of size n. So for this array of size n, we make two recursive calls for arrays of
size n over 2. Each of these arrays of size n over 2 in turn is split into two arrays of size n over 4. So we
get four arrays of size n over 4, and so on. So in this tree, we have log n levels. Now let's estimate the work done at each of the levels of this tree separately. Namely, once again, to solve a problem
of size n. To sort an array of size n, we first prepare to make recursive calls. In this case, we just split
the array into two halves of size n over 2. Then we do make recursive calls, and then we need to
combine the results. So all the work now inside recursive calls will be accounted for on the lower
levels of this tree. So now what we are going to do is to account for only the work done before the
recursive calls and after the recursive calls at each separate level. And we know already that it takes
linear time to do this. I mean, if we have an array of size n, it takes linear time to split it into two
halves. And then it takes linear time to combine the results of recursive calls into one array. So let's
just denote this time by cn, I mean let's denote the hidden constant inside big O by c. Then what we
can say is that on the top level we spend time cn. Then on the next level, for each subarray, we spend
time c times n over 2, because the size of array is n over 2. However, we have 2 arrays, so the total
work that we do at this level is 2 multiplied by c, multiplied by n over 2, which is again just cn. On the
next level, we spend time 4 (because we have 4 arrays) multiplied by c, multiplied by n over 4 (because the size of each array is now n over 4). This is cn again, and so on. So we have log n levels. At each
level, we do roughly cn operations. So the total number of operations in our algorithm is cn log n,
which proves our lemma. So again, what we've just proved is that the running time of the merge sort
algorithm is big O of n log n. So in the next video, we will show that actually no algorithm, no
comparison based algorithms, to be completely formal, can sort a given sequence of n elements
asymptotically faster than in n log n time. Which actually means that the merge sort algorithm is
asymptotically optimal.
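As an illustration, here is a minimal Python sketch of merge sort (the function names are ours, not from the lecture); the merge step does the linear work, and the recursion follows the recurrence T(n) <= 2T(n/2) + O(n):

    def merge(left, right):
        # Combine two sorted lists into one sorted list in linear time.
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        result.extend(left[i:])   # one of the two lists may have leftovers
        result.extend(right[j:])
        return result

    def merge_sort(a):
        # Split into halves, sort each recursively, merge the results.
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

For example, merge_sort([5, 2, 4, 1]) returns [1, 2, 4, 5].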
Lower Bound for Comparison Based Sorting
Need to understand
In the previous video, we proved that the running time of the merge sort algorithm on a sequence consisting of n elements is big O of n log n. In this video we will show that this bound is essentially optimal. We will do this by showing that any correct sorting algorithm that sorts n objects by comparing pairs of them must make at least on the order of n log n operations in the worst case. Once again, we say that a sorting algorithm is comparison based if it sorts the given objects just by comparing pairs of them. We can imagine the following situation: we have n objects that look the same, for example n balls, but have different weights. And we also have a pan balance, and this pan balance is our only way to compare pairs of these balls. And our goal is to rearrange these balls in order of increasing weights.
So, for example, the two sorting algorithms that we have already considered, namely the selection sort algorithm and the merge sort algorithm, are both comparison based algorithms. The selection sort algorithm at each iteration finds the minimum value in the remaining part of the array, and it does so exactly by comparing pairs of objects, right? The merge sort algorithm is also a comparison based algorithm: it first splits an array into halves, and then it needs to merge the two resulting arrays. And when merging the resulting arrays, it also uses comparisons, right? We take the first elements of the two sorted arrays, we compare them, and based on this comparison we take one of these elements out of one of those two arrays and put it into the resulting array.
So this is the formal statement that we're going to prove. It says that any comparison based algorithm that sorts n objects has running time at least big Omega of n log n in the worst case. In other words, we can say the following. Assume that you have an algorithm that sorts n objects by comparing pairs of them. It can be the case that for some given input sequences of n objects, your algorithm performs fewer than n log n operations, say, a linear number of operations. However, it cannot be the case that your algorithm always sorts in time asymptotically less than n log n. Meaning that there must exist a sequence of n objects on which your algorithm will perform at least on the order of n log n comparisons to sort it.
Any comparison based algorithm can be represented as a huge tree that contains all possible sequences of comparisons that can be made by this algorithm. For example, here on the slide we show a simple algorithm that sorts three objects. It starts by comparing a1 and a2. If a1 happens to be smaller than a2, then we proceed to comparing a2 and a3. If a2 is smaller than a3, then we already know the permutation of the input three objects in non-decreasing order: namely, we know that a1 is smaller than a2, and we know that a2 is smaller than a3. So we can just output the following permutation.
Right. 
If, on the other hand, a2 happened to be at least a3, then at this point we already know that a2 is greater than a1, and a2 is no smaller than a3.
So at this point, we know that a2 is the maximum element among our three elements. 
However, we still need to compare a1 and a3, so we do this comparison, 
and based on its result, we output either this permutation or this permutation.
Well, this was the case when a1 happened to be smaller than a2. However, we also need to consider the case when a1 happens to be at least a2.
So we proceed similarly in this case. This is just a toy example of a comparison based algorithm comparing three objects, sorting three objects. However, such a huge tree can be drawn for any comparison based sorting algorithm. At the root of this tree we have the first comparison, and its children are labeled with the next comparison that is made based on the result of the first comparison, and so on. So each internal node is labeled with some comparison, and each leaf is labeled with a permutation of the n input objects.
A simple but crucial observation for this argument is that this tree must have at least n factorial leaves. This is because there are n factorial different permutations of n input objects, where n factorial is defined to be the product of n, n minus 1, n minus 2, and so on, down to 1. So why is that? Why must every possible permutation appear as a leaf in our tree? Well, this is just because each permutation may be the right output of our algorithm. For example, on our previous slide, in our toy example, we have three objects; there are six possible permutations of these three objects, and there are six leaves in our tree. For example, one of them is 213, and it says that the second element is the smallest one, then goes the first element, and then goes the third element. And indeed there are cases when this is the right answer, right? Namely, when the input data consists of three objects such that the second element is the smallest one, the first one is the next one, and the third element is the largest one. So once again, in the huge tree which corresponds to a comparison based algorithm, there must be at least n factorial leaves, because each possible permutation must be present as a leaf in our tree.
On the other hand, the maximal number of comparisons made by our algorithm corresponds to the depth of our tree. The depth d is defined as the maximal number of edges on a path from the root to some leaf of our tree, and this is exactly the maximal possible number of comparisons which our algorithm makes. So now we would like to show that d must be large in our case, that it must be at least big Omega of n log n. And we know already that our tree contains many, many leaves; indeed, n factorial is a function that grows extremely fast. Okay, so intuitively we would like to show that if a tree has many, many leaves, then it has a large depth. And at least intuitively this is clear: if you have a tree of very small depth, then it has just a few leaves, right? But we know that our tree has many, many leaves, in fact at least n factorial leaves. To show this formally, we need the following estimate: the depth d of a binary tree is at least the binary logarithm of its number of leaves l, or equivalently, 2 to the d is at least l. Well, this can be proved formally, but let me just show you this informally. Let's consider a tree, for example, of depth 1. In this case, d is equal to 1, and it is clear that the maximal possible number of leaves in a tree of depth 1 is equal to 2. Now let's try to understand what is the maximal possible number of leaves in a tree of depth 2. For example, this is a tree of depth 2. This is another tree of depth 2; it has three leaves. And this is a tree of depth 2 that has the maximal possible number of leaves, in this case 4, which is indeed 2 to the d. And intuitively it is clear that to have a tree of depth d with the maximal possible number of leaves, we need to take a full binary tree of depth d, right? And this tree has exactly 2 to the d leaves. So the maximal number of leaves in a tree of depth d is 2 to the d.
Which proves that 2 to the d is at least l.
Okay, so the last step that we need is to show that if we have n factorial leaves, then the depth of our tree is at least big Omega of n log n. And we will show this on the next slide.
It remains to estimate log of n factorial. We're going to show here that log of n factorial is at least c times n log n, which means, in other words, that log of n factorial is big Omega of n log n. To do this, we express n factorial as the product of 1, 2, 3, and so on up to n, and then write the logarithm of this product as the sum of the logarithms of these numbers. So, log of n factorial is equal to log of 1 plus log of 2 plus log of 3, and so on, plus log of n. This is a sum of n terms. Let's just throw away the first half of these n terms and leave only the second half. In this second half, we have n over 2 terms, and each of them is at least log of n over 2, right? Because these are logarithms of numbers which are at least n over 2. So we have n over 2 terms, each of them at least log of n over 2. This allows us to conclude that the whole sum is at least n over 2 times log of n over 2, and this, in turn, is big Omega of n log n, for a simple reason. Log of n over 2 is equal to log n minus 1, and this grows like log n, right? Because log n is a growing function and 1 is a constant, so log n minus 1 grows as log n. And n over 2, up to a constant factor, is just n. So n over 2 times log of n over 2 grows like n log n.
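In LaTeX notation, the estimate we just carried out reads:

\log_2(n!) = \log_2 1 + \log_2 2 + \cdots + \log_2 n \;\ge\; \frac{n}{2}\log_2\frac{n}{2} \;=\; \frac{n}{2}(\log_2 n - 1) \;=\; \Omega(n \log n).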
Okay, so this concludes our proof, and this concludes the proof of the fact that any comparison based algorithm must make at least n log n operations in the worst case. Once again, another conclusion is that the merge sort algorithm that we considered in the previous lecture is asymptotically optimal. In the next video we will see an algorithm that actually sorts n given objects in time less than n log n, actually in just linear time, in time big O of n. However, it will sort the n given objects using some additional knowledge about these objects: it will only sort the given objects if they are small integers. And it will sort them without actually comparing them to each other.

Non-Comparison Based Sorting Algorithms


In this last video we will show that there are cases when we can sort 
the n given objects without actually comparing them to each other. 
And for such algorithms, our n log n lower bound does not apply.
Well, probably the most natural case when we can sort the n given objects 
without comparing them to each other 
is the case when our input sequence consists of small integers. 
We will illustrate it with a toy example. 
So consider an array of size 12 which consists of just three different digits. 
I mean each element of our array is equal to either 1, 2 or 3. 
Then we can do the following: let's just go through this array from left to right with a simple scan and count the number of occurrences of 1, 2 and 3.
Just by scanning this array you will find out that 1 appears two times, 
2 appears seven times, and 3 appears three times. 
And this information is enough for us to sort these objects, so 
we can use this information to fill in the resulting array, A prime. 
So we put 1 two times, then we put 2 seven times, and then we put 3 three times. 
And this gives us the resulting sorted array A prime, right? 
So what just happened is that we sorted this array, 
these n objects, without comparing these objects to each other. 
We just counted the number of occurrences of each number, and for this we used, 
essentially, the information that this array contains small integers. 
The algorithm that we just saw is called the counting sort algorithm.
Its main ideas are the following. 
Assume that we're given an array A of size n, and
we know that all its elements are integers in the range from 1 to M.
Then, we do the following. 
We create an array count of size M, and 
by scanning the initial array A just once from left to right, 
we count the number of occurrences of each i from 1 to M, and 
we store this value in the cell count of i. 
So, we scan the array A from left to right, and whenever we see
an element equal to i, we increment the value stored in the cell count of i. 
Then when this array is filled, we can use this information to fill 
in the resulting array A prime, as we did in our toy example. 
So this is the pseudocode of the counting sort algorithm. Here we're given an array A of size n, and we assume that all the elements of this array are integers from 1 to M.
So we introduce the array Count of size M, which is initially filled with zeroes.
Then by scanning our initial array we fill in this array. 
Namely, whenever we see an element k in our initial array, 
we increase the cell count of k. 
So after the first loop of this algorithm, we know exactly the total number of 
occurrences of each number k from 1 to M in our initial array. 
So for example in our toy example two slides before, 
we counted that the number 1 appears two times in our initial array, 
the number 2 appears seven times in our initial array, 
and number 3 appears three times.
So at this point, we know that in the resulting array, the first two elements 
will be occupied by the number 1, the next seven elements will be occupied by 
the number 2, and the next three elements will be occupied by the number 3. 
Now, instead of having just the lengths of these three intervals, we would like to compute the starting point of each interval.
So we do this in a new loop. 
And for this we introduce a new array Pos. 
So Pos[1] is equal to 1, 
meaning that number 1 will occupy a range starting from the first index. 
And the starting point for each subsequent range is computed as a starting 
point of each previous range, plus the length of this previous range. 
So Pos[j] is computed as Pos[j -1] + Count[j- 1]. 
So at this point we know the starting point for each range. 
Namely, in the resulting array, the number k will occupy a range starting from Pos[k]. Then we just scan our initial array, and whenever we see an element, we know exactly where to put it in the resulting array. And let me remind you that we do not just fill in the array with numbers from 1 to M; we copy elements from our initial array. This is because what we are looking for in the sorting problem is a permutation of our initial n given objects: what we have are probably not just numbers, not just integers from 1 to M, but these numbers are keys of some possibly complex objects.
Okay, so the running time of this algorithm can be easily seen to be big O of n plus M.
This is just because here we have three loops. 
So the first loop has n iterations, the second loop has M iterations, 
and the last loop also has n iterations.
So, this is the formal statement: the running time of the counting sort algorithm is big O of n plus M. And the final remark about this algorithm is that if M grows no faster than n, namely, for example, if our array is filled with integers from 1 to n, or if this array is filled just with integers which are upper bounded by some constant, then the running time of our counting sort algorithm is just linear in n.
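Here is a minimal Python sketch of the counting sort just described (the names are ours; the result uses 0-based positions, while the lecture's pseudocode is 1-based):

    def counting_sort(a, M):
        # Sort array a whose elements are integers in the range 1..M.
        # Running time: O(n + M).
        count = [0] * (M + 1)
        for k in a:                      # first scan: count occurrences
            count[k] += 1
        pos = [0] * (M + 1)
        pos[1] = 0                       # value 1's range starts at index 0
        for j in range(2, M + 1):        # start of each next range = start
            pos[j] = pos[j - 1] + count[j - 1]  # of previous + its length
        result = [None] * len(a)
        for k in a:                      # second scan: place each element
            result[pos[k]] = k
            pos[k] += 1
        return result

For the toy example above, counting_sort([1, 2, 2, 3, 2, 1, 2, 2, 3, 2, 3, 2], 3) returns the sorted array.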
I will now summarize the last three videos. 
So we first covered the merge sort algorithm. 
So this is a divide and conquer based algorithm that proceeds as follows. 
Given an array of size n it first splits it into two halves, 
both roughly equal size, then it sorts them recursively and 
then it merges them into the resulting array.
We showed that the running time of this algorithm is big O(n log n), which is quite fast, almost linear. We then showed that no comparison based algorithm can sort n given objects asymptotically faster than in n log n time.
We did this by showing that any comparison based algorithm must distinguish between too many cases: between n factorial possible permutations. For this, in the worst case, a comparison based algorithm must perform at least on the order of n log n comparisons. We then showed that sorting can sometimes be done faster: in certain cases, the sorting problem can be solved in time less than n log n. For example, this is the case when our input array contains small integers.
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Merge sort and lower bound for comparison based sorting: Section 2.3 of [DPV08]

If you find this lesson difficult to follow


An elementary introduction to sorting and selection sort at Khan Academy

Visualizations
Comparison based sorting algorithms by David Galles

sorting-algorithms.com

References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition). McGraw-Hill Higher Education. 2008.

PRACTICE QUIZ • 15 MIN

Sorting
Submit your assignment
Quick Sort
Quick Sort

Video: LectureOverview

2 min

Video: LectureAlgorithm

9 min

Video: LectureRandom Pivot

13 min

Video: LectureRunning Time Analysis (optional)

15 min

Video: LectureEqual Elements

6 min

Video: LectureFinal Remarks
8 min

Reading: Resources

10 min

Practice Quiz: Quick Sort

4 questions
Hello, and welcome to the next lesson in the Divide-and-Conquer module. This lesson is going to be devoted to the quick sort algorithm, which is one of the most efficient sorting algorithms and one of the most commonly used in practice.
Well, as usual, we start with an overview of this algorithm. The algorithm is comparison based, meaning that it sorts the n given elements by comparing pairs of them. Its running time is also asymptotically n log n, but not in the worst case, as it was for the merge sort algorithm; instead, it is n log n on average. This is because this algorithm is randomized: it uses random numbers to sort the given n objects. We will explain later in this lesson what this means. Finally, as I said
before, this algorithm is very efficient in practice and, at the same time, not so difficult to implement.
This is a toy example explaining the main idea of the quick sort algorithm. So given an array, in this
case of size 11, let's take its first element. In this case it is 6. And let's do the following. Let's rearrange
all the elements in this array such that the element 6 stays in its final position. All the elements that
go before it are actually at most 6. And all the elements that go after 6, after this element, are greater
than 6. Well, we will show that this can be done by a single scan of the initial array. This is how the
resulting array looks. So once again, 6 stays in its final position. All the elements before it are at
most 6. All the elements after it are greater than 6. So we do not need to move 6 anymore. It is
already in its final position. So what remains to be done is to sort all the elements that go before 6
and all the elements that go after 6. And this can be done just with two recursive calls to the same
algorithm, to the quick sort algorithm. So we do this, and immediately after these two recursive calls,
we have a sorted array. Well, in the next video we will explain all the details of this algorithm.
In this video, we'll provide the full pseudocode of the quick sort algorithm. As you remember, the algorithm is recursive. For this reason, we pass to this procedure an array A and also two indices l and r (for left and right), and this procedure sorts the subarray of A with indices from l to r. We first check whether l is at least r. If yes, then this means that the corresponding subarray contains at most one element, and this in turn means that nothing needs to be done, so we just return.
Otherwise we call the partition procedure with the same parameters. It returns an index m between l and r, and it rearranges all the elements inside this subarray with the following property: after the call to this procedure, A[m] stays in its final position, meaning that all the elements to the left of A[m] are at most A[m], and all the elements to the right are greater than A[m]. Once again, after the call to the partition procedure, A[m] stays in its final position. So what remains to be done is to sort all the elements that are at most A[m], which stay to the left of A[m], and all the elements that stay to the right. We do this just by making two recursive calls. This is how the whole algorithm looks pictorially. Again, we are given an array A with two indices l and r, and we are going to sort the subarray with indices from l to r. We first call the partition procedure with parameters A, l and r. It gives us an index m between l and r with the following property: all the elements to the left of m are at most the element A[m], and all the elements to the right are greater than A[m]. Then we make two recursive calls, to sort the left part with indices from l to m - 1 and to sort the right part with indices from m + 1 to r. And immediately after these two recursive calls, we have a sorted array.
Before showing the actual pseudocode of the partition procedure, we explain its main ideas, again on a toy example. First of all, we take the element A[l] and denote it by x. This will be called our pivot element. The pivot is the element with respect to which we're going to partition our subarray: x will be placed in its final position. So our goal now is to rearrange all the elements inside our current subarray so that x stays in its final position, all the elements to the left of x are at most x, and all the elements to the right of x are greater than x. We will do this by gradually extending the region of already discovered elements. For this we will use a counter i, and we will maintain the following invariant. i will go from l + 1 to r, and at each point of time, when we have already processed the i-th element, we will keep two subregions inside the region with indices from l + 1 to i. In the first region, with indices from l + 1 to j, we will keep all elements that are at most x. In the second, adjacent region, with indices from j + 1 to i, we will keep all elements that are greater than x. Let's see an example.
Assume that we are somewhere in the middle of this process. In this case, x is equal to 6, and we need to partition all the elements with respect to x. We already have two subregions: in the red region, we keep all elements that are at most x, that is, at most 6; in the blue region, we have all elements that are greater than 6.
Okay, now we move i to the next position and we discover the element 9. This element is greater than 6, so we just need to extend the second region, the blue region, the region of elements that are greater than 6. So in this case we just do nothing.
Well, the next case is more interesting. We move i to the next position, and we discover the element 4. In this case, we need to somehow move this element to the red region, to the region of elements which are at most 6. To do this, we just swap it with the currently first element of the blue region, which in this case is 9. If we do this, 4 will be the last element of the current red region, and 9 will go to the blue region. So we do this, and now we also increase j to reflect the fact that our red region has just been extended. Then we move on to the next element: we discover the element 7, which is greater than 6, which means that we can just extend the blue region. Then we discover another element, which is 6. 6 is at most 6 (it is actually equal to 6), so we need to move it to the red region. Again, we swap it with the first element of the blue region, and then we extend the red region: we increase j to reflect the fact that the red region has just been extended. Then we discover another element which is at most 6, and we move it to the end of the red region.
And finally, what we also need to do at the very end is to move the pivot element, which is 6 in this case, to its final position. And its final position can easily be found. We have the red region and the blue region: in the red region, all the elements are at most 6, and in the blue region, all the elements are greater than 6. So we can just swap 6 with the last element of the red region. In this case it is 1, and if we swap these two elements, then you can see that all the elements in the blue region are indeed greater than 6, and all the elements in the red region are at most 6. So we are done with the partition procedure.
We are now ready to present the pseudocode of the partition procedure. Recall that what we're going to do is to place some element x, which is called the pivot, into its final place, so that all the elements before x are at most x and all the elements after x are greater than x. As the pivot element in this procedure, we are going to use just the first element of the corresponding subarray, so x is assigned A[l].
We are also going to maintain the following subregions. First of all, we will gradually extend the region of discovered elements: i goes from l + 1 to r, and inside this region of discovered elements, we will maintain two subregions. In the first region, with indices from l + 1 to j, we will keep all the elements that are at most x. In the second region, with indices from j + 1 to i, we will keep all the elements that are greater than x. And we will gradually increase the value of i. So assume that i has just been increased, which means we discovered a new element A[i]. If A[i] is greater than x, then the second region, of elements that are greater than x, is extended automatically, and we do not need to do anything in this case. However, if the newly discovered element is at most x, then we need to move it to the first region. We do this as follows: we just increase the value of j to indicate the fact that the first region has just been extended, and then swap the elements A[j] and A[i]. This way, we maintain our invariant each time i is increased. At the very end, when i reaches the value of r, we also need to place our pivot element, which stays at the beginning, between the two regions. For this we just swap the element A[l] (our pivot) with the element A[j]. And we then return the value j as the index of our pivot element.
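As an illustration, here is a minimal Python sketch of this partition procedure and of the quick sort algorithm built on top of it (a sketch following the lecture's pseudocode; the function names are ours):

    def partition(a, l, r):
        # Place the pivot x = a[l] into its final position and return that index.
        x = a[l]
        j = l                          # a[l+1..j] holds elements <= x
        for i in range(l + 1, r + 1):  # a[j+1..i] holds elements > x
            if a[i] <= x:
                j += 1                 # extend the region of elements <= x
                a[i], a[j] = a[j], a[i]
        a[l], a[j] = a[j], a[l]        # move the pivot between the two regions
        return j

    def quick_sort(a, l, r):
        if l >= r:
            return                     # subarray of at most one element
        m = partition(a, l, r)
        quick_sort(a, l, m - 1)        # sort the elements at most a[m]
        quick_sort(a, m + 1, r)        # sort the elements greater than a[m]

Calling quick_sort(a, 0, len(a) - 1) sorts the whole array in place.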

Random Pivot
Now that the algorithm is presented, we need to estimate its running time. For the quick sort algorithm, the running time analysis is a little bit tricky, so before stating and proving the theorem about its running time, let's develop some intuition. First of all, let's consider a pathological case, when somehow it always happens that we select the minimum value of our current subarray as our pivot element. Let's see what happens with our current subarray, say of size n. We select its minimum value as a pivot and partition the subarray with respect to it. Since this is the minimum value, its final position is just the first position in the resulting array, right? Which means that we partition into two parts: the first part is just empty, we have no elements smaller than our pivot, and the second part contains n - 1 elements, because all the remaining elements are greater than our pivot.
Okay, so if this happens at each call to the partition procedure, then the running time of our quick sort algorithm satisfies the following relation: T(n) is equal to n plus T(n - 1). The term n here corresponds to the running time of the partition procedure. Well, it is actually big O of n, but just to simplify, let's put n here. Let me also recall that if we have an array of size n, then the partition procedure indeed works in time big O of n, because it just scans the subarray. Now let's see what the solution of this recurrence relation is. We can just unwind this recurrence relation term by term. We have n plus T(n - 1). Let's replace T(n - 1) by (n - 1) plus T(n - 2). Then we replace T(n - 2) by (n - 2) plus T(n - 3), and we keep doing so. What is left is the following sum: n + (n - 1) + (n - 2) and so on. And we know this sum already: this is an arithmetic series, and we know that it grows quadratically. This gives us something strange: our quick sort algorithm works in quadratic time, which means that it is not quick, actually, right?
We'll resolve this issue later. 
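Written out, the unwinding of this recurrence is:

T(n) = n + T(n-1) = n + (n-1) + T(n-2) = \cdots = n + (n-1) + \cdots + 1 = \frac{n(n+1)}{2} = \Theta(n^2).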
Now, let's consider a slightly different case. Assume that somehow we always partition into two parts such that one of them has size n - 5 and the other one has size 4. I claim that even in this case the running time is quadratic. First of all, note that both these cases correspond to very unbalanced partitions. In the first case, we have two parts, one of size 0 and one of size n - 1. In the second case, we have two parts, one of size 4 and one of size n - 5. So the sizes of these parts are very unbalanced; they are very different.
Okay, so I claimed that in this case the running time is also going to be quadratic, and this can also be shown just by unwinding the recurrence relation. Let's just throw away the T(4) term and leave only T(n - 5). So T(n) is equal to n plus T(n - 5). Let's replace T(n - 5) with (n - 5) + T(n - 10). Let's then replace T(n - 10) with (n - 10) + T(n - 15), and so on. This leaves us with the following sum: n plus (n - 5) plus (n - 10) and so on, and this is also an arithmetic progression. The only difference with the previous arithmetic progression is that now we have step 5: the difference between neighbors is 5, not 1. Still, this arithmetic progression has a linear number of terms, which means that its sum grows quadratically, with the only difference that the hidden constant inside the theta is smaller than in the previous case.
Now let's consider another pathological case. Assume that it somehow happens, for some unknown reason, that each call to the partition procedure partitions the array into two parts of roughly equal sizes.
Well, in this case we can write the following recurrence relation for the running time of our algorithm: T(n) is equal to 2T(n / 2) plus a linear term. And we know this recurrence already: it is exactly the recurrence that the running time of the merge sort algorithm satisfies, right? And we proved that in this case T(n) grows as n log n. Let me remind you how we proved this. We analyzed the recursion tree. At the root of this tree we have one array of size n. At the next level we have two arrays of size n over 2; at the next level we have four arrays of size n over 4, and so on. So the height of this tree is log base 2 of n; it is basically logarithmic. At each level, the sum of the sizes of all arrays is equal to n: we have an array of size n at the top, two arrays of size n over 2 at the next level, four arrays of size n over 4 at the next level (the total size is still n), and so on. At each level we spend a linear amount of work. This is essential: we spend a linear amount of work at each level, and we have a logarithmic number of levels, which means we spend n log n time in total.
Okay, let's consider another, again very pathological, case. Assume that we always split an array of size n into two parts, one of size n over 10 and one of size 9n over 10. In this case the recurrence is the following: T(n) is equal to T(n / 10) plus T(9n / 10) plus a linear term. I claim that even in this case we can prove an n log n upper bound on the running time, and this is how we can do this. Let's again consider the recursion tree of our algorithm. In this case, it is not balanced, right? Because when we go to the left branch, we reduce the size of the current subproblem by a factor of 10, and when we go to the right branch, we reduce the size of the current subproblem only by a factor of 10 over 9. This means that in our tree, the length of the leftmost branch is actually log base 10 of n, while the length of the rightmost branch is log base 10/9 of n. Still, the height of this tree is logarithmic: the difference with the previous case is that the base of the logarithm is different, but it is still a constant, so the height, log base 10/9 of n, is still on the order of log n. And also the previous property is still true.
The sum of the sizes of all arrays at each level is still equal to n; it is at most n, actually. At the root we have one array of size n. At the next level we have two arrays, one of size n/10 and the other of size 9n/10, right? So the total size is still n. At the next level it is the same, and so on. So we have a logarithmic number of levels, and at each level we spend a linear amount of work, which gives us an n log n upper bound once again.
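In recurrence form, the two balanced cases we just analyzed are:

T(n) = 2T(n/2) + O(n) = O(n \log n) \quad \text{and} \quad T(n) = T(n/10) + T(9n/10) + O(n) = O(n \log n),

where in the second case the recursion tree has depth \log_{10/9} n = O(\log n) and still does at most cn work per level.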
Okay, all this analysis was only about pathological cases, where we always split in a balanced way or always in an unbalanced way. In reality, when we run the quick sort algorithm on some array, some of the partitions are going to be balanced and some of them are going to be unbalanced. So we still do not know the actual running time of the quick sort algorithm; we still need to determine it. However, we already got an important message: the running time of the quick sort algorithm depends on how balanced our partitions are. What we know now is the following: if all our partitions are balanced, then the running time is at most n log n asymptotically. At the same time, if all of our partitions are unbalanced, then the running time is quadratic.
This means that we would like to have a way of selecting a pivot element that always guarantees a balanced partition. At the same time, it is not clear at all how to do this: how to guarantee that we can always quickly pick a pivot element such that, with respect to this pivot, the array is partitioned in a balanced way. So instead of this, we will use the following elegant solution: let's just select the pivot element from the current subarray randomly. To implement this solution, we do the following. Before calling the partition procedure, we just select a random index between l and r, and we swap the element A[l] with this random element. Then we call partition, and then we proceed in the usual way.
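As a sketch, the randomized version only adds a random swap before partitioning; it reuses the partition function from the earlier sketch, and random.randint(l, r) returns a random integer with both endpoints included:

    import random

    def randomized_quick_sort(a, l, r):
        if l >= r:
            return
        k = random.randint(l, r)    # pick a random pivot index in [l, r]
        a[l], a[k] = a[k], a[l]     # move the random pivot to the front
        m = partition(a, l, r)      # then partition as usual
        randomized_quick_sort(a, l, m - 1)
        randomized_quick_sort(a, m + 1, r)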
Let me explain intuitively why selecting a random pivot is going to help us prove a good upper bound on the running time of the quick sort algorithm. For this, consider an array A that we're going to partition with respect to a random pivot, and consider its sorted version. Assume for the moment that all the elements inside our array are different.
In the sorted version, consider the middle half of the elements: the n/2 elements that stay exactly in the middle. An important property of all these elements is the following: for each of these n/2 elements, there are at least n/4 elements that are greater than it and at least n/4 elements that are smaller. This means that if we select any of these elements inside our array A, then the partition with respect to this element is going to be balanced, right? Both parts will contain at least n/4 elements. These n/2 elements stay somewhere in the initial array: they stay in the middle of the sorted array, but where exactly they stay in the initial array doesn't matter for us. What is important for us is that there are at least n/2 elements with respect to which the partition is going to be balanced, which means that with probability one half we will get a balanced partition. And this happens to be enough to prove the upper bound. So we're going to show that the randomized quick sort algorithm is actually very fast: it is fast in practice, and we will also prove a theoretical upper bound on its running time.
This is the formal statement of an upper bound on the running time of the quick sort algorithm that we are going to prove in the next video. Assume that we are given an array A, and assume for the moment that all the elements of this array are pairwise different. Then the average running time of the quick sort algorithm on this array consisting of n elements is big O of n log n, while the worst case running time of this algorithm is big O of n squared.
Well, let me explain the words "on average" here. This means the following: for any fixed array, if we are very unlucky with the random bits, the running time could potentially be higher. However, on average, where the average is taken over all possible random bits, the running time of the quick sort algorithm is n log n, and this is true for any input. So this theorem doesn't say that for some arrays the running time is large, for some arrays the running time is low, but on average the running time is good enough. It says that for any fixed array, the average running time is n log n. Okay, so we are going to prove this theorem in the next video.

Running Time Analysis (optional)


In this video we are going to prove formally an upper bound on the running time of the randomized quick sort algorithm. Namely, we're going to prove that the running time of the randomized quick sort algorithm on average is n log n, that is, big O of n log n, and that, in the worst case, the running time is big O of n squared.
Well, before going into the details of the proof, let's again build some intuition. First of all, let's note that what we need to estimate is the number of comparisons. Why is that? Because the quick sort algorithm consists of, first, a call to the partition procedure and then two recursive calls. Each of these two recursive calls is in turn unwound into a call to the partition procedure and another two smaller recursive calls. So what is going on inside the quick sort algorithm is essentially many calls to the partition procedure, while inside the partition procedure, what we actually do is compare all the elements of the current subarray with the pivot element, right? So what we estimate is the total number of comparisons, which is not surprising, because the quick sort algorithm is a comparison based algorithm.
Well, now let me also explain why balanced partitions are better for us; I will explain this intuitively. Consider this toy example: an array of size seven. Assume that we selected 1 as the pivot element, so we partitioned the array as shown here on the left. Now let's see what we now know. We compared all the elements in this array with the pivot element, which is 1 in this case. So now we know that the final position of the pivot element is just the first position in the array; that is, we know that 1 is the minimum value. However, we know nothing about pairs of other elements, right? We only learned that 1 is the minimum value. Now consider another possibility: consider the balanced partition shown on the right, and assume that we selected 4 as the pivot element.
I claim that in this case this partition is much better for us, because we saved many subsequent comparisons. Look: in this case, in the subsequent calls of the partition procedure, we are not going to compare the elements 3, 1, 2 with the elements 6, 5, 7, because we already know that all the elements to the left are at most 4 and all the elements to the right are greater than 4. The left part will stay in a separate recursive call, and the right part will stay in a separate recursive call. So once again, balanced partitions save us a lot of comparisons that we do not need to make in the subsequent calls to the partition procedure. Another thing I would like to discuss with you before going into the details of the proof is the following. Our algorithm is randomized, so its running time and its number of comparisons depend on the random bits used inside the algorithm. In particular, for any two elements there is some probability that they are going to be compared.
Using the toy example shown on the slide, I would like to build an intuition on how to estimate this probability and on which factors this probability depends. So consider this small example: an array of size nine containing all the digits from 1 to 9. I would like to estimate the probability that the elements 1 and 9 are going to be compared if we call randomized quick sort on this array. So let's see what happens. Assume that in the very first call of the partition procedure, we select the element 3, for example, as the pivot element. What happens? In this case, 1 will go to the left of 3, and 9 will go to the right of 3. So 1 and 9 will be in different parts, and they will never be compared in any subsequent call of the partition procedure, just because they are already in different parts. For us, this means that we already know that 1 is smaller than 9, because 1 is smaller than 3 and 3 is smaller than 9, right? We do not need to compare them.
The same happens if we select as our pivot element any of the elements 2, 3, 4, 5, 6, 7 or 8. If, on the other hand, we select 1 or 9 as our first pivot element, then 1 and 9 will be compared, just because, if we select, for example, 9 as the pivot element, we compare it with all the elements of our array, in particular with 1. So there are two cases when 1 and 9 are compared, and these are exactly the cases when either 1 or 9 is selected as the first pivot. In all other seven cases, they are not compared. This means that the probability that they are compared is 2 over 9. Okay, makes sense?
Now let's try to estimate the probability that the elements 3 and 4 are compared. I claim that in this case this probability is equal to 1.
And the explanation is the following: there is no element inside our array that can help the randomized quick sort algorithm to understand that 3 is smaller than 4 without comparing them. For 1 and 9, there are seven such elements: all the elements between them. If we partition with respect to any of these elements, we already know that 1 is smaller than 9, because they go to different parts with respect to this pivot. For 3 and 4 there is no such element, so the algorithm just must compare these two elements to be sure that 3 is smaller than 4. So in this case the probability is 1. This shows that the probability of comparing two elements depends on how close they are in the sorted array. In particular, if they are very far apart from each other, then the probability is small, and if they are close to each other, then the probability is high. We will use this observation in the formal proof of our statement. We now start to formally prove an upper bound on the running time of the randomized quick sort algorithm. For this, we introduce the following random variable. Let i and j be different indices from 1 to n. We define c(i, j) to be equal to 1 if the two elements A'[i] and A'[j] are compared by the randomized quick sort algorithm, and to be equal to 0 otherwise (here A' denotes the sorted version of our array).
Once again, to estimate the running time of the quick sort algorithm, we would like to estimate the total number of comparisons made, so we would like to estimate, for any pair of elements, the probability that they are compared. As we discussed on the previous slide, the probability that two elements are compared depends on how close they are in the sorted version of our array. For this reason, we define c(i, j) in terms of the sorted array A'. We do not actually have this sorted array, right? We are only constructing it in the quick sort algorithm, but we use it just for the analysis, okay?
The next thing to note is the following. Any two elements (of our initial array or of the sorted array, it doesn't matter) are either compared just once or not compared at all. Why just once? Well, if two elements are compared at some point, then at this point one of these elements is a pivot, because in the partition procedure we compare the pivot with all the elements of the current subarray. So if two elements are compared, then one of them is a pivot. This also means that right after this call to the partition procedure, we are not going to use this pivot element anymore: we put the pivot element into its final place, and we do not touch it in any of the subsequent calls. This immediately implies the quadratic upper bound on the worst case running time of our algorithm.
Once again, we have a quadratic number of possible pairs of elements, and each pair of elements is either compared once or not compared at all. So the running time in the worst case is quadratic.
Now comes the most important observation of this proof. I claim that the elements A'[i] and A'[j] are compared if and only if the first pivot selected in the subrange of the sorted array A' from index i to index j is either A'[i] or A'[j].
Well let's see why.
First of all, while we select random pivots that are not in this subrange, all the elements of this subrange go together either to the left or to the right of such a pivot. So they all stay together in the same branch of the recursion tree: before we select a pivot which lies inside this range, all these elements stay together in the same subarray. Now assume that we selected a pivot from this subrange, and assume that it is not A'[i] or A'[j]. In this case, A'[i] and A'[j] will be split apart: they will go into different parts with respect to this pivot, right? At this point I use the assumption that all the elements in our array are different. So once again, if the first selected pivot from this subrange is neither A'[i] nor A'[j], then these two elements are not going to be compared, because right after the partition procedure uses this pivot from this range, A'[i] and A'[j] will go to different parts, right? If, on the other hand, the first selected pivot from this subrange is either A'[i] or A'[j], then these two elements are going to be compared, right? So this is the most important observation in this proof; everything else is just calculations. If this is clear, let's then estimate the probability that the corresponding two elements are compared. We know that they are compared if and only if the first selected pivot in this subrange is one of these two elements. This helps us to estimate the probability
of the event that c(i, j) is equal to 1. This probability is equal to 2, because we have only two good choices (either A'[i] or A'[j]), divided by the total number of choices, that is, the total number of elements in this subrange, which is j - i + 1. So the probability that c(i, j) is equal to 1 equals 2 divided by (j - i + 1). For example, if j and i differ by 1, so j is equal to i + 1 (neighboring elements in the sorted array), then this probability is equal to 2 divided by 1 + 1, which is 2 over 2, which is 1. This just reflects the fact that if there are two neighboring elements in the sorted array, then the algorithm just must compare them to understand that one of them is smaller: there is no other element that can help our algorithm to realize that one of these elements is smaller than the other one, okay.
This in turn helps us to estimate the expected value of this random variable. Recall that if we have a random variable which takes only values 0 and 1, then its expected value is 1 multiplied by the probability that it takes value 1, plus 0 multiplied by the probability that it takes value 0. Zero multiplied by something is zero, so what is left is just the probability that it takes value 1. So the expected value of c(i, j) is equal to 2 divided by (j - i + 1).
The final step in our proof is estimating the sum of these random variables over all possible i and j. Once again, the total number of comparisons made is the sum of c(i, j) over all pairs i, j with j greater than i, and the expected value of a sum is the sum of the expected values. So we can write the following: the average running time is equal to the sum, over all pairs of different indices i and j with j greater than i, of the expected values of c(i, j). And we know this expected value already, so this is the sum, over all such pairs, of 2 divided by (j - i + 1). We can take the constant 2 out and consider a fixed i. For this i, j ranges from i + 1 to n, so what we get for this fixed i is a subset of the following sum: 1/2 + 1/3 + 1/4 and so on, up to 1/n. And this is actually a known sum: it is called the harmonic series, and it is known that it grows logarithmically. Once again, 1/2 + 1/3 plus and so on plus 1/n is theta of the logarithm of n. This means that for each i, the corresponding sum, where j ranges from i + 1 to n, grows at most logarithmically. Since we have n choices for i, from 1 to n, the total grows as n log n. Okay, and this concludes our proof.
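In LaTeX notation, the final calculation reads:

\mathbb{E}[\#\text{comparisons}] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \;\le\; 2 \sum_{i=1}^{n} \left( \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) \;=\; 2n \cdot \Theta(\log n) \;=\; O(n \log n).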

Equal Elements
In this video we address the issue of equal elements in the input array. Recall that we proved the upper bound on the running time of the randomized quick sort algorithm under the assumption that all the elements of the given array are pairwise different.
And actually, we used this assumption essentially. Recall that we estimated the probability that two elements, A'[i] and A'[j], are compared, and we argued that if any element between them is selected as a pivot first, then they will not be compared. However, if they are equal, so if A'[i] is equal to A'[j], this actually means that all the elements in this range are equal. So if we select any element inside this range, in the middle of this range, as a pivot, then it is not true that these two elements will go into different parts with respect to this element. This is just because all the elements inside this range are equal, which means that if we partition with respect to this element, all the elements in this range will be at most this element: they all will be in the left part with respect to this element, okay? So our analysis doesn't work for equal elements,
but let's see what happens in real life: what happens if we run the quick sort algorithm on an array in which there are equal elements. For this, let's use the following online visualization. This visualization shows how different sorting algorithms perform on different datasets. There are eight sorting algorithms here; we are now interested in the QuickSort algorithm. And there are four different types of datasets. The first dataset is just a random sequence, the next one is a sorted sequence, the next one is a reversed sequence, and the next one, which is most interesting to us at the moment, is a sequence that contains just a few unique elements. So let's see how the quick sort algorithm performs on the last dataset. For this, let's just run all the algorithms on all datasets and see what happens. You may notice that, for example, the other algorithms have already sorted everything, while quick sort has just finished sorting the last dataset. So quick sort is not so fast on datasets that contain few unique elements, and this is why. Consider a dataset that consists of elements that are all equal to each other. This means that the partition procedure always
Play video starting at 3 minutes 5 seconds and follow transcript3:05
partitions the array with respect to the element x, right? And then in this case, one of the parts,
namely the part of the elements that are greater than x, is just empty. It has size zero. And the other
part has size n minus one. So the records and equalities, the records equalities on the running time of
how a algorithm on such a data set always satisfies the following relation, T of n is equal to T of n
minus 1 plus a linear term plus T of 0. And we know already, so this is an unbalanced partition. We
know the responds to the quadratic right in time so, which means that the running time of the quick
sort algorithm a very simple array. So it contains all the elements of this array are equal. Which means
that actually this array is already sorted. In this array our quick sort algorithm spends a quadratic time
to sort it to overcome this difficulty we'll do the following. Instead of partitioning our rate into two
regions. Namely these regions contain all elements that contain all x and all elements that are greater
than x. We are going to partition into three parts. The corresponding partition procedure is usually
called three-way partition. Formally, it returns two indices. m1 and m2, such that, all the elements
inside the region from m1 to m2 are equal to x. All the elements to the left of this region are smaller
than x. All the elements that are to the right of this region are greater than x.
This is how it looks pictorially. We have three regions: in the region from l to m1 - 1, all elements are smaller than x; in the region from m1 to m2, all elements are equal to x; and in the region from m2 + 1 to r, all elements are greater than x. This procedure can be implemented in a way similar to the original partition procedure. It can be implemented with a single scan of the array while maintaining the three regions, or it can be implemented with two scans: we first split the array into a region of elements that are at most x and a region of elements that are greater than x, and then split the first region into two parts.
This is how the modified randomized QuickSort algorithm looks. We just replace the call to the Partition procedure by a call to the three-way partition procedure. Now we have three regions, and the middle region is already in its final place, so we do not touch it after the partition procedure; we make two recursive calls, for the first region and for the last region. So let's see whether the resulting algorithm is indeed quick, and for this let's use the same visualization. The resulting algorithm is shown here in the last column. Let's, once again, run all the algorithms and see what happens in the last column. Well, we see that now the resulting algorithm is indeed quick.
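To make the three-way partition concrete, here is a minimal Python sketch (a reconstruction of the idea, not the lecture's pseudocode; the names partition3 and randomized_quick_sort3 are mine). It maintains the three regions in a single scan, the first of the two implementation options mentioned above.

```python
import random

def partition3(a, l, r):
    # Single-scan three-way partition around the pivot x = a[l]:
    # afterwards a[l..m1-1] < x, a[m1..m2] == x, a[m2+1..r] > x.
    x = a[l]
    m1, m2, i = l, r, l
    while i <= m2:
        if a[i] < x:
            a[i], a[m1] = a[m1], a[i]
            m1 += 1
            i += 1
        elif a[i] > x:
            a[i], a[m2] = a[m2], a[i]
            m2 -= 1
        else:
            i += 1
    return m1, m2

def randomized_quick_sort3(a, l, r):
    if l >= r:
        return
    k = random.randint(l, r)
    a[l], a[k] = a[k], a[l]               # move a random pivot to the front
    m1, m2 = partition3(a, l, r)
    randomized_quick_sort3(a, l, m1 - 1)  # elements smaller than the pivot
    randomized_quick_sort3(a, m2 + 1, r)  # elements greater than the pivot
```

On an array where all elements are equal, partition3 returns the whole range as the middle region, both recursive calls are empty, and the running time drops from quadratic to linear.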
In this last video of the QuickSort lesson, I would like to address two implementation issues. The first issue is the space complexity of the QuickSort algorithm. On one hand, when sorting an array with the QuickSort algorithm, we do not use any additional space: we just partition the array by swapping elements inside the array. On the other hand, the QuickSort algorithm is a recursive algorithm, and when we make a recursive call, we store some information on the stack. On one hand, it is possible to show that the average recursion depth is logarithmic, meaning that on average we need only logarithmic additional space. On the other hand, there is a very nice and elegant trick that allows us to re-implement the QuickSort algorithm such that its worst-case space complexity is at most logarithmic. For this, let's recall that the QuickSort algorithm consists of a call to the partition procedure followed by two recursive calls.
The situation when a recursive call is the last operation of a procedure is called tail recursion, and there is a known way to eliminate such a recursive call. In the second recursive call, we sort the right part of our array, that is, the part from index m + 1 to index r. Instead of making this recursive call, let's wrap the body of the procedure in a while loop. Inside this while loop, we call the partition procedure as shown on the slide, then we make a recursive call for the left part, but instead of making the recursive call for the right part, we just update the value of l to be equal to m + 1. Then we go to the beginning of the while loop, and this essentially mimics the recursive call.
So far so good: we've just realized that we can eliminate the last recursive call. At the same time, let's also realize the following thing. In our QuickSort algorithm, we first call the partition procedure and then make two recursive calls, and these two recursive calls are, in a sense, independent: it doesn't matter which of them comes first, they do not depend on each other. This means that we can just as well eliminate the recursive call for the first part, which in turn means that we can always select which one to eliminate. And for us, it is better to eliminate the recursive call for the longer part. Here is why: if we always make a recursive call for the shorter part, then we make a recursive call for a part that is at least twice shorter than the initial array. This in turn means that the depth of our recursion will be at most logarithmic, because the first recursive call is made for an array of size at most n/2, then at most n/4, and so on. So the depth is logarithmic, which is good. And this can be implemented as follows. We first call the partition procedure; it gives us a value of m. At this point, we know the lengths of the two parts, and we just compare them. If, for example, the first part is shorter, then we make a recursive call for this part, and instead of making the recursive call for the second part, we just update the value of l. In the other case, when the right part is shorter, we make the recursive call for that part, and instead of making the recursive call for the left part, we just update the value of r. So overall, this gives us an implementation of the QuickSort algorithm which uses, in the worst case, additional logarithmic space.
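Here is a minimal Python sketch of this trick (my reconstruction of the slide's pseudocode). The while loop replaces the eliminated recursive call, and we always recurse only into the shorter part, so the recursion depth is O(log n):

```python
import random

def partition(a, l, r):
    # Classic two-way (Lomuto) partition around x = a[l];
    # returns the final position of the pivot.
    x = a[l]
    m = l
    for i in range(l + 1, r + 1):
        if a[i] <= x:
            m += 1
            a[i], a[m] = a[m], a[i]
    a[l], a[m] = a[m], a[l]
    return m

def quick_sort(a, l, r):
    while l < r:
        k = random.randint(l, r)
        a[l], a[k] = a[k], a[l]
        m = partition(a, l, r)
        if m - l < r - m:
            quick_sort(a, l, m - 1)   # shorter part: recursive call
            l = m + 1                 # longer part: handled by the loop
        else:
            quick_sort(a, m + 1, r)   # shorter part: recursive call
            r = m - 1                 # longer part: handled by the loop
```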
The next implementation issue concerns the random bits used by our algorithm. Assume that we would like to have a deterministic version of our randomized QuickSort. This is a reasonable thing to want, because in practice we would like to have reproducibility, which is, for example, essential for debugging: we would like our program to always produce the same output on the same dataset. And this is why we would probably prefer not to use random numbers. Then we can do the following. The following approach is known as introsort and is used in many practical implementations of QuickSort.
Instead of selecting the pivot element randomly, let's select it using, for example, the following simple heuristic. Each time we are given a subarray and we need to partition it with respect to some pivot, we select the pivot as follows: we take the first element of the subarray, the last element, and the middle element. Then we have three elements, and we just compare them and select the median value of the three, and we use this element as our pivot. So this is a very simple heuristic, and it can be implemented very efficiently: we need just three comparisons to select this median. And in many cases this is enough for the QuickSort algorithm to work efficiently. However, this is not quite what we want: we are not happy with the statement that this algorithm works well in many cases. We would like our algorithm to work well on every possible input.
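A minimal sketch of this heuristic in Python (the name median_of_three is mine, not the lecture's):

```python
def median_of_three(a, l, r):
    # Deterministic pivot selection: compare the first, middle and last
    # elements of a[l..r] and return the index holding the median value.
    mid = (l + r) // 2
    candidates = sorted([(a[l], l), (a[mid], mid), (a[r], r)])
    return candidates[1][1]
```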
Unfortunately, there are pathological cases in which this heuristic works badly, but we can overcome this in the following way. While running our QuickSort algorithm, let's track the current depth of our recursion tree. At the point when it exceeds the value c log n, for some constant c, we just stop the current algorithm and switch to some other algorithm, for example to the heap sort algorithm. This is another efficient algorithm, asymptotically as good as MergeSort: its running time is n log n. However, QuickSort is usually faster in practice. So, at this point, we switch to the heap sort algorithm. This means that even on these pathologically bad instances for QuickSort with this simple pivot-selection heuristic, we still work in the worst case in time O(n log n): before we exceeded the depth c log n, we spent time O(n log n), and after this we stop the current algorithm immediately and run the heap sort algorithm, which also spends time O(n log n). So overall, we spend time O(n log n). This gives us an algorithm which in many cases performs like the QuickSort algorithm, and whose running time, even in the worst case, is bounded above by O(n log n). So, to conclude, the QuickSort algorithm is a comparison-based algorithm whose running time is big O of n log n in the average case, and big O of n squared in the worst case.
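Putting the two ideas together, here is a hypothetical sketch of the depth-limited scheme; it reuses the partition and median_of_three helpers from the sketches above, uses a heapq-based stand-in for heap sort (any worst-case O(n log n) sort would do), and the factor 2 in the depth limit is an arbitrary choice for illustration:

```python
import heapq
import math

def heap_sort(values):
    # Worst-case O(n log n) fallback based on a binary heap.
    heapq.heapify(values)
    return [heapq.heappop(values) for _ in range(len(values))]

def intro_sort(a, l, r, depth, depth_limit):
    if l >= r:
        return
    if depth > depth_limit:
        a[l:r + 1] = heap_sort(a[l:r + 1])  # give up on QuickSort here
        return
    k = median_of_three(a, l, r)
    a[l], a[k] = a[k], a[l]
    m = partition(a, l, r)
    intro_sort(a, l, m - 1, depth + 1, depth_limit)
    intro_sort(a, m + 1, r, depth + 1, depth_limit)

def sort(a):
    if a:
        intro_sort(a, 0, len(a) - 1, 0, 2 * int(math.log2(len(a))) + 1)
```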
What is important about this algorithm is that it is very efficient in practice: it is more efficient than the MergeSort algorithm, for example. For this reason it is commonly used in practice, and for this reason it is called QuickSort.

Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Quick sort: Chapter 7 of [CLRS]

If you find this lesson difficult to follow


An elementary introduction to quick sort at Khan Academy

Visualizations
sorting-algorithms.com

References
[CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to
Algorithms (3rd Edition). MIT Press and McGraw-Hill. 2009

PRACTICE QUIZ • 15 MIN


Quick Sort
Programming Assignment: Programming Assignment 4: Divide and
Conquer
Interactive Puzzle: Local Maximum

http://dm.compsciclub.ru/app/quiz-local-maximum
Week 5
Algorithmic Toolbox


Dynamic Programming 1

In this final module of the course you will learn about a powerful algorithmic technique for solving many optimization problems called Dynamic Programming. It turns out that dynamic programming can solve many problems that evade all attempts to solve them using a greedy or divide-and-conquer strategy. There are countless applications of dynamic programming in practice: from maximizing the advertisement revenue of a TV station, to searching for similar Internet pages, to gene finding (the problem where biologists need to find the minimum number of mutations to transform one gene into another). You will learn how the same idea helps to automatically make spelling corrections and to show the differences between two versions of the same text.


Key Concepts

 apply the dynamic programming technique to implement efficient programs


 compute the edit distance between two files
 practice applying the most popular algorithmic technique: dynamic programming

Interactive Puzzle: Number of Paths

http://dm.compsciclub.ru/app/quiz-number-of-paths
Interactive Puzzle: Two Rocks Game

http://dm.compsciclub.ru/app/quiz-take-the-last-stone
Interactive Puzzle: Three Rocks Game

http://dm.compsciclub.ru/app/quiz-three-rocks-game

Change Problem
Before recording this lecture, I stopped by the coffee shop. This cappuccino is good. And as soon as I gave $5 to the cashier, she faced an algorithmic problem of which coins to select to give me the change. And cashiers all over the world use an algorithmic approach called the greedy algorithm to solve this problem. Today we will learn how cashiers and computer scientists use greedy algorithms for solving many practical problems.
So the change problem is finding the minimum number of coins needed to make change. More formally, the input to the problem is an integer money and positive integers coin1, coin2, ..., coind that represent coin denominations. For example, in the US, coin1 would be 1 cent, coin2 would be 5 cents, then 10 cents, 25 cents, and 50 cents. And the output is the minimum number of coins, with denominations coin1, coin2, ..., coind, that changes money exactly.
So today in the morning, when the cashier had to return me 40 cents, she most likely used the following algorithm. First, find the largest coin denomination that is smaller than 40 cents: it is 25 cents. So she gave me 25 cents, 15 cents were left, and the next challenge is how to change 15 cents. In the next step, she probably found the largest coin smaller than 15 cents: it is 10 cents. She gave me 10 cents, and finally she returned 5 cents. As a result, she changed 40 cents as 25 plus 10 plus 5. Do you think it's the minimum number of coins she could possibly return?
It is the minimum number of coins in the United States. But if you travel to Tanzania, it won't be the minimum number of coins, because there is a 20-cent coin in Tanzania. And therefore this greedy approach to solving the change problem fails in Tanzania, because there is a better way to change 40 cents: simply as 20 cents plus 20 cents, using the Tanzanian 20-cent coin.
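A minimal Python sketch of the cashier's strategy; running it on 40 cents with and without a 20-cent denomination (the exact Tanzanian coin set is an assumption here) reproduces the example above:

```python
def greedy_change(money, coins):
    # Repeatedly take the largest denomination that still fits.
    change = []
    for coin in sorted(coins, reverse=True):
        while money >= coin:
            money -= coin
            change.append(coin)
    return change

print(greedy_change(40, [1, 5, 10, 25, 50]))      # [25, 10, 5]: 3 coins
print(greedy_change(40, [1, 5, 10, 20, 25, 50]))  # still [25, 10, 5], but [20, 20] is better
```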
Since the greedy approach to solving the change problem failed, let's try something different: a recursive algorithm for solving the same problem. Suppose we want to change 9 cents, and our denominations are 1 cent, 5 cents, and 6 cents. What would be the optimal way to change 9 cents? Well, if we only knew the optimal ways to change 9 minus 6 cents, 9 minus 5 cents, and 9 minus 1 cents, then we would know the optimal way to change 9 cents. In other words, to change 9 cents, we need to know how to change smaller numbers of cents, in our case 3 cents, 4 cents, and 8 cents. And therefore an approach to solving this problem would be to use this recurrence to write a recursive program.
This idea is implemented in the program RecursiveChange. To change money cents using coins coin1, coin2, ..., coind, we do the following. We recursively call RecursiveChange with the amounts money minus coin1, money minus coin2, ..., money minus coind, and find the minimum over these d choices. We add 1, because there is one more coin to add, and return this value.
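Here is a direct Python transcription of RecursiveChange (the code itself is a sketch, not the lecture's):

```python
def recursive_change(money, coins):
    # Try every coin as the last one and recurse on the remainder.
    # Correct, but exponentially slow: the same subproblems are
    # recomputed over and over again.
    if money == 0:
        return 0
    min_coins = float('inf')
    for coin in coins:
        if money >= coin:
            min_coins = min(min_coins, recursive_change(money - coin, coins) + 1)
    return min_coins
```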
This looks like the right approach to solving the problem, but let's check how fast the resulting program is.
So, when we are changing 76 cents, there are three choices: we need to recursively call RecursiveChange for 70 cents, 71 cents, and 75 cents. But for each of these values, we again have three choices. We continue growing this tree, and very quickly it turns into a gigantic tree. Let's check how many times we have already tried to change 70 cents. Three times, and we have only started expanding this tree. In fact, if we continue further, we will see that there were six [mistake: four] times when we needed to compute RecursiveChange for 70. How many times do you think we will need to run recursive calls when we compute the minimal number of coins for 30 cents? It turns out that we will need to call it trillions of times, which means that our seemingly very elegant RecursiveChange program will not finish before the end of your lifetime. So, as simple as the change problem looks, neither a greedy approach nor a recursive approach solves it in reasonable time. 60 years ago, a brilliant mathematician, Richard Bellman, had a different idea.
Wouldn't it be nice to know all the answers for changing money minus coin_i by the time we need to compute an optimal way of changing money? Then, instead of the time-consuming calls to RecursiveChange(money minus coin_i), which may need to be repeated trillions of times, we would simply look up these values. This idea resulted in the dynamic programming approach, which is applied in thousands of diverse, practical applications in a myriad of different fields.
And the key idea of dynamic programming is to start filling in this table not from the right to the left, as we did before in RecursiveChange, but instead from the left to the right. So we first ask the trivial question: what is the minimum number of coins needed to change 0 cents? And, of course, it is 0. What is the minimum number of coins needed to change 1 cent? Obviously it is one, but we can compute this number by taking the minimum number of coins needed to change 0 cents and adding one coin. We proceed in a similar fashion to compute the minimum number of coins to change 2 cents, 3 cents, and 4 cents; in each case there is only one way to derive this number from a previous one. For 5 cents there are actually two possibilities, green and blue: the green one derives it from 0 cents by adding one 5-cent coin, and the blue one derives it from 4 cents by adding one penny. Well, which possibility would you select? Of course, the one that gives you the minimum number of coins for 5 cents. We continue further and apply the same logic to 6 cents; there are three possibilities, and once again we select the optimal choice that corresponds to the minimum number of coins, say 0 cents plus one 6-cent coin. We continue for 7 cents, continue for 8 cents, and finally, very quickly, we find the correct answer for 9 cents: we need four coins to change 9 cents. And this results in the DPChange algorithm that simply fills in the table I just showed you from left to right.
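A minimal Python sketch of DPChange, filling in the table from left to right as just described:

```python
def dp_change(money, coins):
    # min_coins[m] is the minimum number of coins needed to change m cents.
    min_coins = [0] + [float('inf')] * money
    for m in range(1, money + 1):
        for coin in coins:
            if coin <= m:
                min_coins[m] = min(min_coins[m], min_coins[m - coin] + 1)
    return min_coins[money]

print(dp_change(9, [1, 5, 6]))  # 4 (for example, 6 + 1 + 1 + 1)
```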
DPChange is the first dynamic programming algorithm that you have seen in this course, and there will be thousands more. You may be wondering why this algorithm is called dynamic programming and what it has to do with programming. Well, in fact, dynamic programming has nothing to do with programming. Amazingly enough, dynamic programming is one of the most practical techniques computer scientists use. But when Richard Bellman was developing this idea for an Air Force project he was working on, it looked completely impractical. And he wanted to hide from the Secretary of Defense that he was really doing mathematics rather than working on the Air Force project. Therefore he invented a name that basically has nothing to do with what dynamic programming algorithms do. In his own words: what name could I choose? I was interested in planning, but planning is not a good word for various reasons. I decided therefore to use the word programming, and I wanted to get across the idea that this was dynamic. It was something not even a Congressman could object to.

PRACTICE QUIZ • 30 MIN

Change Money
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Change problem: Section "An Introduction to Dynamic Programming: The Change Problem" of
[CP]

Visualizations
Making change by David Galles

References
[CP] Phillip Compeau, Pavel Pevzner. Bioinformatics Algorithms: An Active Learning Approach.
Active Learning Publishers. 2014.

String Comparison
The Alignment Game
Cystic fibrosis is one of the most common genetic diseases in humans. Approximately one in 25 people carries a cystic fibrosis gene, and when both parents carry a faulty gene, there is a 25% chance that their child will have cystic fibrosis. In the early 1980s biologists started the hunt for the cystic fibrosis gene, one of the first gene hunting projects in the framework of the Human Genome Project. 30 years ago biologists narrowed the search for the cystic fibrosis gene to a million nucleotide-long region on chromosome 7. However, this region contained many genes, and it was not clear which of them is responsible for cystic fibrosis. How would you find which of these genes is the cause of cystic fibrosis?
I'll give you a hint. Cystic fibrosis involves sweat secretion with abnormally high sodium levels. Well, this is a biological hint that does not by itself help us solve the challenge of finding something in this one million nucleotide region that is responsible for cystic fibrosis. Let me give you hint number two. By the time the cystic fibrosis hunt was on, biologists already knew the sequences of some genes responsible for secretion. For example, ATP binding proteins act as transport channels responsible for secretion. You still may be wondering how these two hints help you find the cystic fibrosis gene in the one million nucleotide-long region on chromosome 7. But here's my third hint: should we search for genes in this region that are similar to known genes responsible for secretion? Biologists used this third hint and, bingo, they found that one of the genes in this region was similar to the ATP binding proteins that act as transport channels responsible for secretion. To learn how biologists find similarities between genes, we will first learn how to play a simple game called the alignment game.
Play video starting at 2 minutes 47 seconds and follow transcript2:47
The alignment game is a single person game. I give you two strings, and your goal is to remove symbol
from the strings in such a way that the number of points is maximized. I have to explain to you how
you can get points for playing the alignment game. You can either remove the first symbol from both
strings. And in this case, you get one point if they're the same symbol, you don't get any points if they
are different symbol. Or you can remove first symbol from one of the strings and in this case you also
don't get any points. So let's try to play this game. In the beginning it makes sense to remove the first
symbol from both strings, we'll get plus one. Then another pair of identical symbols, another plus one.
And now symbols are different so it doesn't make sense to remove them both because we'll get zero
point. Maybe we should only remove C from the second string and after we've done this there is a
possibility to remove two Gs from both string. We get another point we continue, continue, continue,
and finally after playing this game we get score of plus four. Do you think you can get score of plus
five playing this game?
We have also, by playing this game, constructed something that is called an alignment of two strings. An alignment of two strings is a two-row matrix such that the first row consists of the symbols of the first string, in order, possibly interspersed with the space symbol, and the second row consists of the symbols of the second string, in order, again possibly interspersed with the space symbol. After we have constructed the alignment, we can classify the columns of the alignment matrix as matches, mismatches, insertions, or deletions. Insertions correspond to the case when we selected a symbol from the second string only, and deletions correspond to the case when we selected a symbol from the first string only. Moreover, we can score this alignment by giving a premium of plus one for every match and a penalty for every mismatch and for every insertion and deletion, which we denote as indel. In our case, we will use a penalty of minus mu for mismatches and a penalty of minus sigma for insertions and deletions, or indels. For example, in our case, if mu equals zero and sigma equals one, then we get an alignment score equal to one. So we define the alignment score as the number of matches, minus mu times the number of mismatches, minus sigma times the number of indels. And the optimal alignment problem is: given two strings, a mismatch penalty mu, and an indel penalty sigma, find an alignment of the two strings maximizing the score. We will be particularly interested in one particular alignment score. We define a common subsequence as simply the matches in an alignment of two strings. In this case, the common subsequence is represented by ATGT, and the longest common subsequence problem that we will be interested in is the following: given two strings, find the longest common subsequence of these strings.
And of course, you have already recognized that to find the longest common subsequence we simply need to find a maximum score alignment with the parameters mu equal to zero and sigma equal to zero.
Another classical problem in computer science is the edit distance problem: given two strings, find the minimum number of elementary operations (insertions, deletions, or substitutions of symbols) that transforms one string into the other. And of course, the minimum number of insertions, deletions, and mismatches over all alignments of two strings equals the edit distance. For example, if you want to find the edit distance between the strings "editing" and "distance", you can construct an optimal alignment of these strings with appropriate scores. Here I show matches, mismatches, insertions, and deletions. And to see that the edit distance problem is equivalent to the alignment problem, let's consider this alignment between "editing" and "distance", and let's compute the total number of symbols in the two strings. Obviously, the total number of symbols in the two strings is equal to twice the number of matches, plus twice the number of mismatches, plus the number of insertions, plus the number of deletions. I will take the liberty of rearranging this expression, and after I rewrite it you will see that the first three terms correspond to the alignment score and the last three terms correspond to the edit distance. Therefore, minimizing the edit distance is the same as maximizing the alignment score, which means the edit distance problem is just one particular version of the alignment problem.

Computing Edit Distance


Let's now see how a dynamic programming algorithm solves the edit distance problem. We start by considering two strings, A of length n and B of length m, and we ask the question: what is an optimal alignment of an i-prefix of A, which is the first i symbols of A, and a j-prefix of B, which is the first j symbols of B? The last column of an optimal alignment is either an insertion, a deletion, a mismatch, or a match. And please notice that if we remove the last column from an optimal alignment of the strings, what is left is an optimal alignment of the corresponding two prefixes. And we can adjust the score of the optimal alignment for the i-prefix and j-prefix by adding 1 in the case of an insertion, 1 in the case of a deletion, 1 in the case of a mismatch, and nothing in the case of a match. Let's denote by D(i, j) the edit distance between an i-prefix and a j-prefix. In this case, the figure at the top of the slide illustrates the following recurrence: D(i, j) is equal to the minimum of the following values: D(i, j-1) + 1, D(i-1, j) + 1, D(i-1, j-1) + 1 in the case the last symbols of the i-prefix of A and the j-prefix of B are different, and D(i-1, j-1) if these last symbols are the same. Our goal now is to compute the edit distances D(i, j) between all i-prefixes of string A and all j-prefixes of string B.
In the case of the strings "editing" and "distance" we construct an eight by nine grid, and our goal is to compute all the edit distances D(i, j) corresponding to the nodes of this grid, for example for i and j equal to four and four. How will we compute the corresponding distance D(i, j)? Let's start by filling in the distances D(i, 0) in the first column of this matrix. This is easy, because here we are comparing an i-prefix of string A against the 0-prefix of string B, and therefore this edit distance is simply equal to i. That's what is shown here. Similarly, we can easily fill in the first row of this matrix. Now let's try to compute the distance D(1, 1), corresponding to the comparison of the string consisting of the single symbol E with the string consisting of the single symbol D. There are three possible ways to arrive at the node (1, 1): from the nodes (0, 0), (0, 1), and (1, 0). Which one should we select to find the optimal edit distance? According to the recurrence above, we should select the one of the three directions that gives the minimal value for D(i, j); this is the minimum of 2, 2, and 1, and therefore we arrive at node (1, 1) by the diagonal edge. Let's keep in memory that the right direction to arrive at node (1, 1) was the diagonal direction. We now compute the edit distance for the next node in the matrix. In this case D(2, 1) is the minimum of D(2, 0) + 1, D(1, 1) + 1, and D(1, 0) (here the corresponding symbols match, so the diagonal move adds nothing), which tells us that the optimal way to arrive at this node is again by the diagonal edge. We continue further, once again compare three values, and it turns out that the best way to arrive at the next node is by the vertical edge. We continue and fill in the whole second column of the matrix. Now, what about the circled node? For this node, D(1, 2) = min{D(1, 1) + 1, D(0, 2) + 1, D(0, 1) + 1}, which is the minimum of 2, 3, and 2. In fact, there are two optimal ways to arrive at this node, and in this case we show both of them: by the diagonal edge into this vertex and by the horizontal edge into this vertex. We continue further, and slowly but surely we fill in the whole matrix.
The EditDistance pseudocode implements the algorithm we just discussed. It first fills in the first column and the first row of the dynamic programming matrix, and then it continues filling it up by computing the cost of moving to vertex (i, j) using an insertion, a deletion, a mismatch, or a match; in other words, exploring all possibilities of moving to the vertex (i, j): by a vertical edge, a horizontal edge, or a diagonal edge. It then finds which of these possibilities results in the minimum edit distance.
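A minimal Python sketch of this pseudocode; it returns the whole matrix (rather than just the last entry) so that the alignment can be reconstructed from it later:

```python
def edit_distance(a, b):
    # d[i][j] is the edit distance between the i-prefix of a
    # and the j-prefix of b.
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i   # i deletions turn an i-prefix into the empty string
    for j in range(m + 1):
        d[0][j] = j   # j insertions build a j-prefix from the empty string
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = d[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            d[i][j] = min(d[i][j - 1] + 1,  # insertion (horizontal edge)
                          d[i - 1][j] + 1,  # deletion (vertical edge)
                          diag)             # match or mismatch (diagonal edge)
    return d

print(edit_distance("editing", "distance")[-1][-1])  # 5
```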
We now know how to compute the edit distance, that is, the score of an optimal alignment, by filling in the entries of the dynamic programming matrix. But this does not yet tell us how to construct the alignment itself: two rows, with the first row representing the first sequence and the second row representing the second sequence. Here's an idea: let's use the backtracking pointers that we constructed while filling in the dynamic programming matrix to reconstruct an optimal alignment of the strings. We can start by noting that any path from (0, 0) to (i, j) in the dynamic programming matrix spells an alignment of an i-prefix of A with a j-prefix of B. For example, let's start aligning the sequences, which means traveling from the point (0, 0) to the point (n, m) in our dynamic programming matrix. As we move along a diagonal edge, it corresponds to either a mismatch or a match; then we continue using horizontal or vertical edges, and they correspond to insertions or deletions. Then we use a diagonal edge once again, in this case a match, and we continue, constructing an alignment of the two strings. Please note that the path constructed this way corresponds to a larger distance and is not an optimal alignment, because we know that the optimal alignment distance is 5.
To construct an optimal alignment we will use the backtracking pointers, starting from the last vertex in this matrix, namely from the vertex where the edit distance is recorded as 5. Using the backtracking pointers, we see that there are two possible ways to arrive at this last vertex; let's arbitrarily choose one of them. One of them corresponds to a mismatch and the other corresponds to an insertion. So let's arbitrarily choose the mismatch edge, which corresponds to a mismatch between the last symbols of the two strings. Then, from the previous point, there is only one way to move into this point, and it corresponds to an indel. We continue further: a match, further, further, further, and we finally arrive at the initial point, at the same time constructing an optimal alignment of the two strings.
The OutputAlignment pseudocode implements this idea. We simply look at the backtracking pointer that enters the node (i, j). If we arrived at node (i, j) by a vertical edge, we output one column of the alignment with A[i] in the first row and a gap in the second row. If, on the other hand, it corresponds to a horizontal edge, we output a column with B[j] in the second row and a gap in the first row. And if it corresponds to a diagonal edge, we output a column of the alignment with A[i] in the first row and B[j] in the second row. It appears that we actually need to store all the backtracking pointers to output the alignment, but a slightly modified pseudocode shows that we can recompute the backtracking pointers by analyzing the entries of the dynamic programming matrix, thus saving a little space.
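A sketch of this space-saving variant in Python (my reconstruction), recomputing the backtracking pointers from the matrix d returned by the edit_distance sketch above; "-" stands for a gap:

```python
def output_alignment(a, b, d, i, j):
    # Walk backwards from (i, j) to (0, 0), recovering at each step which
    # of the three incoming edges achieves the value d[i][j].
    if i == 0 and j == 0:
        return "", ""
    if i > 0 and d[i][j] == d[i - 1][j] + 1:               # vertical: deletion
        top, bottom = output_alignment(a, b, d, i - 1, j)
        return top + a[i - 1], bottom + "-"
    if j > 0 and d[i][j] == d[i][j - 1] + 1:               # horizontal: insertion
        top, bottom = output_alignment(a, b, d, i, j - 1)
        return top + "-", bottom + b[j - 1]
    top, bottom = output_alignment(a, b, d, i - 1, j - 1)  # diagonal: (mis)match
    return top + a[i - 1], bottom + b[j - 1]

d = edit_distance("editing", "distance")
print(*output_alignment("editing", "distance", d, 7, 8), sep="\n")
```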
Edit distance is just one of many applications of string comparison in various disciplines, ranging from analyzing Internet pages to finding similar genes. We started this lecture with the example of the gene hunt for cystic fibrosis, one of the first successes of the Human Genome Project. If you want to learn more about comparing genes, proteins, and genomes, you may enroll in the Coursera specialization called Bioinformatics, or you can read the book Bioinformatics Algorithms: An Active Learning Approach.

PRACTICE QUIZ • 30 MIN

Edit Distance
Resources

Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.

Reading
Edit distance: Section 6.3 of [DPV]

Visualizations
Edit distance calculator by Peter Kleiweg

Longest common subsequence by David Galles (note the longest common subsequence problem
is a special case of the edit distance problem where we allow insertions and deletions only)

Advanced Reading
Chapter 5 "How Do We Compare Biological Sequences" of [CP]

Advanced dynamic programming lecture notes by Jeff Erickson

Both sources explain, in particular, Hirschberg's algorithm, which allows one to compute an optimal
alignment (and not just its score!) of two strings of length n and m in quadratic
time O(nm) and only linear space O(m + n).

References
[DPV] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition).
McGraw-Hill Higher Education. 2008.

[CP] Phillip Compeau, Pavel Pevzner. Bioinformatics Algorithms: An Active Learning Approach.
Active Learning Publishers. 2014.
Additional Slides

Dynamic programming is probably the hardest part of the course. At the same time, it is definitely
one of the most important algorithmic techniques. Please see additional slides that discuss an
alternative perspective for dynamic programming algorithms: to get to a dynamic programming
solution, we start from the most naive brute force solution and then start optimizing it. The slides
also contain many pieces of Python code.

Programming assignment 5

Interactive Puzzle: Primitive Calculator

http://dm.compsciclub.ru/app/quiz-primitive-calculator
Programming Assignment: Programming Assignment 5: Dynamic
Programming 1
Week 6
Algorithmic Toolbox


Dynamic Programming 2

In this module, we continue practicing implementing dynamic programming solutions....

Key Concepts

 continue practicing implementing dynamic programming solutions


 learn more complex applications of dynamic programming
 implement efficient solutions to various problems in combinatorial optimization
Hi, today we are going to revisit the Knapsack problem, the problem that we already discussed in the Greedy Algorithms module. In this very first segment of the lesson, we will recall the definition of this problem, as well as motivate its study by providing a few examples of applying this problem in real life. Our first example is the following. Assume that you are given a time slot, say two or three minutes, and together with this time slot you are given a set of TV commercials. For each commercial, you know its revenue and you know its duration, that is, its length in minutes, and your goal is to maximize the revenue. That is, you would like to select some subset of the available TV commercials so that the total revenue is as large as possible while the total length does not exceed the length of the available time slot.
In our second example, you are given a fixed budget, and your goal is to purchase a number of computers so as to maximize the total performance. Again, part of the input in this case is a set of available machines, and for each machine you know its price and its performance. Both of the considered problems are easily seen to be special cases of the following general problem, known as the Knapsack problem. In this problem, you are given a set of items together with the total capacity of a knapsack. For each item you know its value and its weight. For example, the value of the green item here is four, while its weight is 12. And your goal is to select a subset of items such that the total value is as large as possible while the total weight is at most the capacity of the knapsack; in our case, the total capacity of the knapsack is equal to 15. There are two versions of the knapsack problem: fractional knapsack and discrete knapsack. In the fractional version, which you are already familiar with, you can take any fraction of any item, while in the discrete version, for each item, you either take the whole item into your knapsack or you do not take it at all. The discrete version, in turn, has two variants. The first variant is knapsack with repetitions: in this case, you are given an unlimited quantity of each item. In the knapsack without repetitions variant, you are given just a single copy of each item. We know already that the fractional knapsack problem can be solved by a simple greedy algorithm: at each iteration, such an algorithm picks the item with the currently maximal value per unit of weight. This strategy, however, doesn't work for the discrete version of the knapsack problem. So instead of using a greedy strategy, we will design a dynamic programming solution to find an optimal value.
Now let me give you a toy example. Assume that our input consists of a knapsack of total capacity ten and the four items shown on the slide. Then the optimal value for the knapsack without repetitions problem is equal to 46, and it can be obtained by taking the first item and the third item into your knapsack. At the same time, for the knapsack with repetitions problem, the optimal value is equal to 48, and it can be obtained by taking one copy of the first item and two copies of the last item. Finally, for the fractional knapsack problem, the optimal value is equal to 48 and a half, and it can be obtained by taking the first item, the second item, and half of the last item.
Let's also use this example to show that the greedy algorithm fails for the discrete version of the knapsack problem. Recall that the greedy strategy for this problem is to first compute the value per unit of weight for each item. In our case, the value per unit of weight is five for the first item, four and two thirds for the second item, four for the third item, and four and a half for the last item. The first item has the maximal value per unit of weight, so we take it into our solution. The next available item with the maximal value per unit of weight is the second one, so we also take it into the solution. Now the remaining capacity is too small to add any other item. So this is the constructed solution, and it has value 44, which we already know is not optimal: for example, by replacing the second item with the third item, we would increase the total value. This actually means that taking an item with the maximal value per unit of weight is not a safe step: just by doing this, we can lose the possibility of constructing an optimal solution. So this means that we need some other algorithm to solve this problem optimally, and we will design such an algorithm based on the dynamic programming technique in the next video.
PRACTICE QUIZ • 10 MIN

Knapsack
In this video, we will design a dynamic programming solution for the Knapsack with repetitions problem. Recall that in this problem we are given an unlimited quantity of each item. This is the formal statement of the problem: we are given n items with weights w1, w2, ..., wn and values v1, v2, ..., vn. By capital W we denote the total capacity, or the total weight, of the knapsack. And our goal is to select a subset of items, where each item can be taken any number of times, such that the total weight is at most capital W while the total value is as large as possible.
To come up with a dynamic programming algorithm, let's analyze the structure of an optimal solution. For this, consider some subset of items of total weight at most capital W whose total value is maximal, and consider some item i in it. Let's see what happens if we take this item out of the solution: what remains is some subset of items whose total weight is at most capital W minus wi. Right? So this is easy. What is crucial for us is that the total value of this remaining subset of items must also be optimal; I mean, it must be maximal among all subsets of items whose total weight is at most capital W minus wi. Why is that? Well, assume that there is some other subset of items whose total weight is at most capital W minus wi but whose total value is higher. Let's then put the item i back into this other subset. What we get is a solution to our initial problem of higher value: its total weight is at most capital W, and its value is higher than the value of our initial solution. But this contradicts the fact that we started with an optimal solution. Such a trick is known as the cut-and-paste trick, and it is frequently used in designing dynamic programming algorithms. So let me repeat what we just proved: if we take an optimal solution for a knapsack of total weight W and take some item i out of it, then what remains must be an optimal solution for a knapsack of smaller weight.
So this suggests that we have a separate subproblem for each possible total weight from zero to capital W. Namely, let's define value(w) as the optimal total value of items whose total weight is at most w. This allows us to express value(w) using the values for knapsacks of smaller weight. Namely, to get an optimal solution for a knapsack of total weight w, we take an optimal solution for some smaller knapsack and add an item i to it. First of all, to be able to add item i and get a knapsack of total weight at most w, we need this smaller knapsack to be of total weight at most w minus wi; also, when adding the i-th item to it, we increase its value by vi. And the final thing is that we do not know which item to add exactly. For this reason, we just go through all n items and select the maximal value of the following quantity: value(w minus wi) plus vi.
Having this recurrent formula for value(w), as we just discussed, it is not so difficult to implement an algorithm solving the knapsack with repetitions problem. Recall that we expressed the solution for a knapsack through solutions for knapsacks of smaller weight. This means that it makes sense to solve our subproblems in order of increasing weight, and this is what we do in the pseudocode. Initially, we set value(0) to 0, just to reflect the fact that the maximal possible total value of a knapsack of weight 0 clearly equals 0. Then we go in a loop from w = 1 to W, and for each such w we compute the corresponding maximum as follows. We go through all items i such that wi is at most w, and for each such item i, we see what happens if we take an optimal solution for a knapsack of size w minus wi and add the item i to it. Clearly, in this case, the total value is value(w minus wi) plus vi, and the total weight is at most w, so this is a feasible solution for a knapsack of total weight w. So we check whether the resulting value is larger than what we currently have, and if it is, we update value(w). In the end, we just return value(W). This algorithm is clearly correct, because it just implements our recurrent formula; in particular, the inner loop computes the maximum from the previous slide. Now let's estimate the running time of this algorithm. It is not difficult to see that the running time is big O of n multiplied by capital W. Why is that? Well, just because we have two nested loops here: the first one has capital W iterations, and the second one has n iterations, and what happens inside the loops takes just constant time.
We conclude this video by applying our algorithm to the example considered a few minutes ago. In this case, we are given four items and a knapsack of total capacity 10. We are going to compute the optimal value for all knapsacks of total weight from zero to ten, which means that it makes sense to store all these values in an array, shown here on the slide. Initially, this array is filled with zeros, and we are going to fill it in with values from left to right. The first non-obvious cell is for weight two: this is the first weight for which we can add any item. In this case, to get a solution for a knapsack of total weight two, we take a solution for a knapsack of total weight 0 and add the last item to it, which gives us plus nine to the value. This is the only possible solution for this cell, so we do not even need to compute a maximum: the value is equal to nine. So what about the value for weight three? In this case, we already have a choice: we can either take an optimal solution for total weight one and add the fourth item to it, or take an optimal solution for a knapsack of total weight zero and add the second item to it, whose value is 14. Among these two choices, the second one is better: it gives us a solution of value 14, so we write 14 in this cell. Now, for weight 4, there are already three choices. Let's consider them: we can take an optimal solution for a knapsack of total weight two and add the last item to it (plus 9), or take an optimal solution for a knapsack of total weight one and add the second item to it (plus 14), or take an optimal solution for a knapsack of total weight 0 and add the third item (plus 16). So in this case, we need to select the maximum of 16, 14, and 9 plus 9, which is 18. 18 is the maximum value, so we write it in this cell. Continuing in the same manner, we can fill in the whole array and see that its last element is equal to 48; that is, the optimal value for this knapsack with repetitions problem is equal to 48. And also, let me remind you that this optimal value can be obtained by taking one copy of the first item and two copies of the last item. In the next video, we will learn how to solve this problem when repetitions are not allowed.
Knapsack without Repetitions
In this video we will design a dynamic programming solution for the Knapsack without Repetitions problem. Recall that in this problem we are given a single copy of each item. This is, again, the formal statement of the problem; we emphasize once more that we are not allowed to take more than a single copy of each item. Well, we already know that our previous reasoning cannot produce the right answer for our new variant, namely for the knapsack without repetitions problem. This is simply because in our toy example the optimal value for the knapsack with repetitions was 48, while the optimal value for the knapsack without repetitions was 46. So this means that if we just run our previous algorithm, it will produce an incorrect result. Still, it is important to understand where our algorithm, and our reasoning more generally, fails for this problem. So once again, let's consider an optimal subset of items for a knapsack of total weight capital W, and assume for the moment that we know that it contains the n-th item.
That is, the last item. So we argue, similarly to the previous case, that if we take this item out of the current knapsack, then what we get must be an optimal solution for a knapsack of smaller weight, namely of total weight W minus wn. So if we take this smaller solution and add the n-th item to it, we get an optimal solution for the initial knapsack of total weight W. Assume, however, that the optimal solution for the smaller knapsack already contains the n-th item. This means that we cannot add another copy of the n-th item to it, right, because then the resulting solution would contain two copies of the n-th item, which is now forbidden by the problem formulation. So this is why we need to come up with a different notion of a subproblem. So still, let's take a closer look at our optimal solution. It is not difficult to see that there are only two cases: either it contains the last item, or it doesn't. Assume that it does, and let's again take the n-th item out of our current solution. So what is left? First of all, it is some solution for a knapsack of total weight capital W minus wn, and it also uses only items from 1 to n minus 1, because, well, we just took out the n-th item, right? If, on the other hand, the initial optimal solution for the knapsack of total weight W does not contain the n-th item, well, then it contains only items from 1 to n minus 1. Right? Well, this simple observation will help us to get the right definition of a subproblem for this version of the knapsack problem.
Play video starting at 3 minutes 23 seconds and follow transcript3:23
Well, on the previous slide we argued as follows. Consider an optimal solution for a knapsack of total
weight capital W. There are two cases: either it contains the last item or it doesn't. If it
contains it, we can take it out and reduce the problem to a smaller knapsack using only items from 1 to
n - 1. On the other hand, if it doesn't contain the nth item, then we reduce it to another case
where the knapsack only uses items from 1 to n - 1. In any case, we reduce the number of items, and in
the first case, we also reduce the size of the knapsack, the total weight of the knapsack. We may
continue this process, and express the solution for all subproblems through solutions to smaller
subproblems. If we continue in the same fashion, what we get somewhere in the middle is a solution
for a knapsack of some weight that uses some first i items. Well, let's just use this as a definition of our
subproblem. Namely, for any w from 0 to W, and for any i from 0 to n, let's denote by value(w, i)
the maximum value that can be achieved by using only items from 1 to i and whose total weight is
at most w. Then it is easy to express it through solutions for smaller such problems. Once again,
value(w, i) is an optimal
Play video starting at 5 minutes 12 seconds and follow transcript5:12
value of a subset of the first i items whose total weight is at most w. So we know that in this optimal
subset, either the i-th item is contained or it is not. So there are two cases,
and we need to select the maximum over the two cases. In the first case, if we take the i-th item out, what
is left is an optimal solution for the following problem: we are allowed to use only the first i - 1 items,
and the total weight should be no more than w - wi. So this is the first term under the maximum.
Play video starting at 5 minutes 51 seconds and follow transcript5:51
In the second case, if the i-th item is not used in an optimal solution, then we just know that the
optimal solution is the same as for the knapsack of total weight w, using only the first i - 1 items.
Play video starting at 6 minutes 7 seconds and follow transcript6:07
So we managed to express the solution for our problem through solutions for smaller subproblems.
And this is probably the most important thing in designing dynamic programming algorithms.
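In symbols, the two cases above give the following recurrence (with the base cases value(0, j) = 0 and value(w, 0) = 0, which are discussed next):

    value(w, i) = max of:
        value(w - wi, i - 1) + vi    (the i-th item is taken; only allowed when wi <= w)
        value(w, i - 1)              (the i-th item is skipped)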
Play video starting at 6 minutes 21 seconds and follow transcript6:21
We now turn our recurrence formula into a dynamic programming algorithm. As usual, we start with
initialization: namely, we set all the values value(0, j) to 0 for all j, and all the values value(w, 0) to 0. Well,
this just expresses the fact that if we have no items, then the value is zero, and if we have a
knapsack of total weight zero, then the total value is also zero, of course. Then, we need to
somehow compute all other values value(w, i).
Play video starting at 7 minutes 0 seconds and follow transcript7:00
Recall that we expressed value(w, i) through values of
Play video starting at 7 minutes 7 seconds and follow transcript7:07
value(w - wi, i - 1) and value(w, i - 1). This means that we always reduce the problem from i items to
something with a smaller number of items, to i - 1. This actually helps us to understand that it makes
sense to gradually increase the number of allowed items. And this is why we have in this
pseudocode an outer loop where i goes from 1 to n. When i is fixed, we compute all the values
value(w, i). For this, we go from w equal to 1 to capital W and do the following. Now that i and w are
fixed, we need to compute value(w, i). First, we just check what is the solution
for the subproblem where we use a knapsack of the same weight w but only the first i - 1
items.
Play video starting at 8 minutes 11 seconds and follow transcript8:11
This is implemented as follows. We first assign value(w, i) to value(w, i - 1). Then we need to
check whether we can improve this value by using the i-th item. First of all, we can only do this if the
weight of the i-th item does not exceed the weight of the current knapsack, which is just w. So, if it
doesn't exceed it, we see what happens if we take an optimal value for the knapsack of total weight
w - wi that is filled only by elements from 1 to i - 1, and add the i-th element to it. If this
gives a larger value than we currently have, we update value(w, i). In the end, we just
return value(W, n), because this is the solution to our initial problem. It is a
solution for a knapsack of size capital W that may use all the n items, right? Now it is clear that this
algorithm is correct, just because it directly implements the recurrence formula that we already
discussed. So let's analyze its running time. It is not difficult to show, again, that its running time is
actually the same: it is again n multiplied by W. This is again just because we have two loops
here: the outer loop with n iterations and the inner loop with W iterations, and what is
going on inside only takes constant time.
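Here is a minimal Python sketch of this pseudocode (the toy weights and values are again the ones inferred from the lecture's example, so they are an assumption):

    def knapsack_without_repetitions(W, weights, values):
        n = len(weights)
        # value[w][i] = optimal value using only items 1..i with capacity w
        value = [[0] * (n + 1) for _ in range(W + 1)]
        for i in range(1, n + 1):
            wi, vi = weights[i - 1], values[i - 1]
            for w in range(1, W + 1):
                value[w][i] = value[w][i - 1]          # case: item i is not used
                if wi <= w:                            # case: try using item i
                    value[w][i] = max(value[w][i], value[w - wi][i - 1] + vi)
        return value[W][n]

    print(knapsack_without_repetitions(10, [6, 3, 4, 2], [30, 14, 16, 9]))  # 46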
Play video starting at 10 minutes 3 seconds and follow transcript10:03
Now let's apply the algorithm that we've just designed to our toy example. Recall that we need to
store the values of all subproblems value(w, i), for all w from zero to ten and all i from zero to four in our
case. For these purposes, it is natural to use a two-dimensional table, or two-dimensional array. You
can see such a two-dimensional array on the slide, already filled in. Here the rows
are indexed by all possible values of i, and the columns are indexed by all possible values of w.
Right, we start by initializing the first row and the first column of this table with zeros. That is, we fill this
row with zeros and we fill this column with zeros also. Then we start filling in this table row by row.
That is, we first fill in this cell, then this cell, then this cell, and so on. So we go like this.
So we first fill in this row, then fill in this row, then fill in this row, and then fill in this row. The
resulting value 46 is actually the answer to our initial problem. Now, let me show you how some
particular value in this table was computed. For example, consider this cell.
Play video starting at 11 minutes 53 seconds and follow transcript11:53
So formally, this is value(10, 2), which means that this is an optimal value of a knapsack of total
weight 10 that only uses the first two items. So assume that we don't know what to put here.
Play video starting at 12 minutes 18 seconds and follow transcript12:18
So we just need to compute it right now. Let's argue as we did before. So this is a knapsack of total
weight 10 that uses only the first two items. Well, we then say that the second item is either used or
not. If it is not used, then this is the same as filling in the knapsack of total weight ten using just the
first item. And we already know this value, because it is in the previous row: this is value(10, 1),
right? So the value in this case is 30. On the other hand, if the second item is used, then if we take it
out, what is left is an optimal solution for a knapsack of total weight 10 minus 3, because 3 is the
weight of the second item. This means that it is an optimal solution for a knapsack of total
weight 7 that is only allowed to use the first item. Now, if we add the second item to this
solution, we get 30 plus 14, which is much better than without using
the second item, right? So that's why we have 44 here.
Play video starting at 13 minutes 45 seconds and follow transcript13:45
And it is also for this reason that we fill this matrix row by row: when we need to compute the
value of this cell, we have already computed the values of these two cells.
Play video starting at 14 minutes 1 second and follow transcript14:01
So that's why we fill our matrix exactly row by row.
Play video starting at 14 minutes 7 seconds and follow transcript14:07
Now let me use the same example to illustrate an important technique in dynamic programming.
Namely reconstructing an optimal solution.
Play video starting at 14 minutes 17 seconds and follow transcript14:17
By reconstructing an optimal solution in this particular problem, I mean finding not only the optimal
value for the knapsack of the given total weight, but also the subset of items that leads to this optimal
value. For this, we first create a boolean array of size four. In this array, we will mark, for each item,
whether it is used in an optimal solution or not. Now what we're going to do is to backtrace the path
that led us to the optimal value, 46. In particular, let's try to understand how this value of 46 was
computed.
Play video starting at 15 minutes 2 seconds and follow transcript15:02
Well, first of all, 46 is formally value(10, 4), that is, an optimal value for a knapsack of total weight
ten using the first four items. We argued that the fourth item is either used or not. If it is not used,
then this value is the same as value(10, 3), which is shown here. That is the value of the knapsack
of the same weight, using the first three items. If, on the other hand, it is used, then what is left must
be an optimal solution for a knapsack of size 10 minus 2, which is 8, that also uses the first three items.
Well, this value is already computed, it is 30, so we need to compute the maximum of 30 plus 9
(because the value of the last item is 9) and 46. In this particular case, the maximum is equal
to 46, which means that at this point we decided not to use the last item, right? So we put 0 into our
boolean array to indicate this, and we move to this cell.
Play video starting at 16 minutes 18 seconds and follow transcript16:18
Again, let's try to understand how this value was computed. It was computed as the maximum
of two numbers which depend on the following values. Either we do not use the third item, and then it
is the same as the value of this cell; or we use the third item. In this case, what remains is a
knapsack of total weight 6 using the first two items, and its value is 30,
Play video starting at 16 minutes 49 seconds and follow transcript16:49
plus the value of the third item, which is 16. In this particular case, 30 plus 16 is larger than 44,
which means that this value of 46 was computed using this value. This, in turn, means that we
decided to use the third item. Let's mark it by putting 1 into our boolean array. Now we stay in this
cell and try to understand how it was computed. It was computed as the maximum of this 30 and
this 0 plus fourteen. Right, in this case, the first value is larger, so we move to this cell and mark
that we decided not to use the second item. Okay, and finally, we realize that we arrived at this value
30 from the upper left corner, right? So, this way we reconstructed the whole optimal
solution. Once again, we backtraced the path that led us to the optimal value.
Play video starting at 18 minutes 11 seconds and follow transcript18:11
What is shown here is that we decided to use the first item and the third item. So let's check
that it indeed gives us the optimal value of 46. Indeed, if we compute the sum of the weights of the
first and the third items, it is 10, while the total value is 30 plus 16, which is 46 indeed. And as I said
before, this technique is commonly used in dynamic programming algorithms to reconstruct the optimal
solution.
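This backtracing can be sketched in Python as follows. It is a minimal sketch assuming the full value table from the function above is available (return the table instead of just value[W][n]):

    def reconstruct(W, weights, values, value):
        # value is the (W+1) x (n+1) table filled in by the algorithm above
        n = len(weights)
        used = [False] * n
        w = W
        for i in range(n, 0, -1):
            wi, vi = weights[i - 1], values[i - 1]
            # if taking item i explains the stored optimum, mark it as used
            if wi <= w and value[w][i] == value[w - wi][i - 1] + vi:
                used[i - 1] = True
                w -= wi
            # otherwise value[w][i] == value[w][i - 1], so item i is skipped
        return used

On the toy instance this returns a list marking items 1 and 3 as used, exactly as in the walkthrough above.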
Final Remarks
Play video starting at 0 seconds and follow transcript0:00
We conclude this lesson with a few important remarks.
Play video starting at 4 seconds and follow transcript0:04
The first remark is about a trick called memoization.
Play video starting at 9 seconds and follow transcript0:09
Usually when designing a dynamic programming algorithm, you start 
with analyzing the structure of an optimal solution for your computational problem. 
You do this to come up with the right 
definition of a sub-problem that will allow you to express the solution for 
a sub-problem through solutions for smaller sub-sub-problems. 
So, when you write down this recurrence relation, you can actually transform it 
into an iterative algorithm or a recursive algorithm. 
The corresponding iterative algorithm just solves all subproblems, 
going from smaller ones to larger ones. 
And for this reason it is also sometimes called a bottom-up algorithm. 
On the other hand, the recursive algorithm to solve a sub-problem 
makes recursive calls to smaller sub-sub-problems. 
And for this reason it is sometimes called the top down approach. 
Well, if you implement a recursive algorithm 
straightforwardly, it might turn out to be very slow, because it will recompute
Play video starting at 1 minute 15 seconds and follow transcript1:15
some values many, many times. 
Like with Fibonacci numbers, for example. 
However, there is a simple trick, and it is called memoization, 
that allows you to avoid recomputing the same thing many times. 
Namely, you can do the following: when solving a subproblem, 
right after solving it you store its solution into a table, for example.
Play video starting at 1 minute 41 seconds and follow transcript1:41
And when you make a recursive call to solve some sub-problem, before 
trying to solve it, you check in a table whether its solution is already stored. 
And if its solution is already in the table which means that it was 
already computed then you just return it immediately. 
So this recursive call, turns out to be just a table look up. 
So this is how a recursive algorithm with memoization works. 
Let's see what a recursive algorithm with memoization for 
the Knapsack problem looks like. 
For simplicity, let's assume that we're talking about the Knapsack 
with repetitions. 
In this case, our subproblem for a knapsack of size w 
is just the optimal value of a knapsack of total weight w. 
We compute it as follows, by a recursive procedure. 
First of all, we check whether its solution is already in a hash table. 
We use a hash table to store pairs of objects: 
for a weight w, we store value(w) if it is already computed. 
If it is already in the table, we return it immediately; otherwise we just 
compute it, making recursive calls to compute the values for 
the subproblems on w minus wi, okay? 
And when the value is computed, we just store it in our hash table. 
So this way, we use memoization, storing results in the hash table 
to avoid recomputing the same thing once again. 
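A minimal Python sketch of this memoized recursion, with a dictionary playing the role of the hash table (the toy weights and values are the assumed instance from earlier):

    def knapsack_memoized(w, weights, values, memo):
        # memo maps a capacity to its already-computed optimal value
        if w in memo:
            return memo[w]        # the recursive call becomes a table lookup
        best = 0
        for wi, vi in zip(weights, values):
            if wi <= w:
                best = max(best, knapsack_memoized(w - wi, weights, values, memo) + vi)
        memo[w] = best            # store the solution right after computing it
        return best

    print(knapsack_memoized(10, [6, 3, 4, 2], [30, 14, 16, 9], {}))  # 48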
So once again, an iterative algorithm solves all sub-problems 
going from smaller ones to larger ones, right? 
And eventually solves the initial problem. 
On the other hand, the recursive algorithm goes as follows: 
it starts from the initial problem and 
makes recursive calls to smaller sub-subproblems, right? 
So in some sense, an iterative algorithm and a recursive algorithm are doing 
the same job, especially if we need to solve the whole range of subproblems. 
However, a recursive algorithm might turn out to be slightly slower because
Play video starting at 4 minutes 3 seconds and follow transcript4:03
it solves the same subproblems, on the one hand. 
On the other hand, when making a recursive call, you also need to
Play video starting at 4 minutes 12 seconds and follow transcript4:12
put the return address on the stack, for example. 
So, the recursive algorithm has some overhead. 
There are, however, cases when you do not need to solve all the subproblems, and 
the Knapsack problem is a nice illustration of this situation. 
So, imagine that we are given an input to the Knapsack problem where all 
the weights of the n items, together with the total weight of the knapsack, 
are divisible by 100, for example.
Play video starting at 4 minutes 41 seconds and follow transcript4:41
This means that we are actually not interested in subproblems 
where the weight of the knapsack is not divisible by 100. Why is that? 
Well, just because for any subset of items, since all the weights 
of the items are divisible by 100, their total weight is also divisible by 100. 
So in this case an iterative algorithm will still solve the 
whole range of subproblems, 
while a recursive algorithm will make only those recursive calls 
that are actually needed to compute the final solution. 
So, it will make recursive calls only to subproblems 
whose weights are divisible by 100. 
The final remark of this lesson is about the running time. 
If you remember, the running time of the algorithms that we designed in this 
lesson was big O of n multiplied by W. 
This running time looks polynomial; however, it is not. 
And this is why: consider, for example, the following input.
Play video starting at 5 minutes 48 seconds and follow transcript5:48
I mean, assume that the total weight of the knapsack is as shown on this slide. 
This is a huge number, roughly ten to the 20th; 
I mean, it has 20 digits in decimal representation. 
At the same time, the input size is really tiny, just 20 digits, right? 
So this is not gigabytes of data, just 20 digits, but on this input 
our algorithm will already need to perform roughly ten to the 20th operations. 
This is really huge; for example, we can't do this on our laptops. 
And this is because, to represent the value of W, we only need log W digits.
Play video starting at 6 minutes 30 seconds and follow transcript6:30
So, in the case of the Knapsack problem, 
our input size is proportional not to n plus W, but to n plus log W.
Play video starting at 6 minutes 40 seconds and follow transcript6:40
Okay, and if you express the running time in terms of n and log W, 
then you get the following expression: n multiplied by 2 to the log W, 
which means that our algorithm is in fact an exponential time algorithm.
Play video starting at 6 minutes 56 seconds and follow transcript6:56
Put otherwise, it can only process inputs where W is not too large, 
roughly less than 1 billion, for example.
Play video starting at 7 minutes 9 seconds and follow transcript7:09
Okay, and in fact, we believe that it is very difficult to construct an algorithm 
that solves this problem in truly polynomial time. 
In particular, we will learn later that this problem 
is considered to be so difficult that for solving the Knapsack problem 
in polynomial time, one gets $1 million.
Polynomial vs Pseudopolynomial
Many of you are surprised to learn that the running time O(nW) for the knapsack algorithm is
called pseudopolynomial rather than just polynomial. The catch is that the input size is proportional
to log W, rather than to W.
To further illustrate this, consider the following two scenarios:
1. The input consists of m objects (say, integers).


2. The input is an integer m.

They look similar, but there is a dramatic difference. Assume that we have an algorithm that loops
for m iterations. Then, in the first case it is a polynomial time algorithm (in fact, even linear time),
whereas in the second case it is an exponential time algorithm. This is because we always
measure the running time in terms of the input size. In the first case the input size is proportional
to m, but in the second case it is proportional to log m. Indeed, a file containing just the number
“100000” occupies about 7 bytes on your disk, while a file containing a sequence of 100000 zeroes
(separated by spaces) occupies about 200000 bytes (or 200 KB). Hence, in the first case the
running time of the algorithm is O(size), whereas in the second case the running time
is O(2^size).
Let’s also consider the same issue from a slightly different angle. Assume that we have a file
containing a single integer 74145970345617824751. If we treat it as a sequence
of m=20 digits, then an algorithm working in time O(m) will be extremely fast in practice. If, on
the other hand, we treat it as an integer m=74145970345617824751, then an algorithm
making m iterations will work for about
74145970345617824751 / (10^9 ⋅ 60 ⋅ 60 ⋅ 24 ⋅ 365) ≈ 2351
years, assuming that the underlying machine performs 10^9 operations per second.
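The estimate is easy to verify (a quick sanity check in Python):

    # ~10^9 operations per second, 60*60*24*365 seconds in a year
    print(74145970345617824751 / (10**9 * 60 * 60 * 24 * 365))  # ≈ 2351 years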
Further reading: a question at stackoverflow.
Resources
Slides
As usual, slides of the lectures can be downloaded under the video or under the first video of the
corresponding lesson.
Reading
Knapsack: Section 6.4 of [DPV08]
References
[DPV08] Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st edition).
McGraw-Hill Higher Education, 2008.
Placing Parentheses
Hello, and welcome to the next lesson of the dynamic programming module. In this lesson, we will be
applying the dynamic programming technique to solve a wide range of problems where your goal is
to find an optimal order of something. We will illustrate this technique by solving the so-called placing
parentheses problem. In this problem, your input is an arithmetic expression consisting of numbers or
digits and arithmetic operations, and your goal is to find an order of applying these arithmetic
operations that maximizes the value of the expression. You specify this order by placing parentheses,
and that's why the problem is called placing parentheses.
Play video starting at 43 seconds and follow transcript0:43
As usual, we start with a problem overview.
Play video starting at 47 seconds and follow transcript0:47
Consider the following toy arithmetic expression: 1 + 2 - 3 x 4 - 5. In this case we have five digits and
four arithmetic operations, and we would like to find an order of applying these four arithmetic
operations that maximizes the value of this expression. When the order of operations is fixed, you do
the following. You take the first operation and the two adjacent digits, and you apply the operation.
For example, if the operation is multiplication, then the two digits are 3 and 4,
so you multiply 3 and 4, you get 12, and you just replace 3 x 4 by 12. You then take the next
operation, apply it, and replace the two numbers and the arithmetic sign by the result, and you
proceed in a similar fashion. In the end, you get a single number, and your goal is to find an
order that guarantees that this number is as large as possible.
Play video starting at 1 minute 53 seconds and follow transcript1:53
You can specify an order just by placing a set of parentheses in your expression. For example, if you
would like to apply all four operations from left to right, you place the parentheses as
follows. In this particular case, we compute the result as follows. We first compute 1 + 2, this is 3.
We then subtract 3 from the result, which gives us 0. We then multiply the result by 4. This is still 0.
And finally, we subtract 5, so this gives us -5. And this is actually non-optimal, because, for example,
there is a better order. In this better order, we first multiply 3 and 4, which gives us 12. We then
subtract 5, which gives us 7. Then we compute the sum of 1 and 2, which gives us 3. The final
operation is subtraction: we subtract 7 from 3, which gives us -4. So in this case the order of applying
operations was the following. We first compute the product of 3 and 4, so this is the first operation. We
then subtract 5, this is the second operation. We then compute the result of 1 + 2, so this plus is the
third operation, and this minus is the fourth operation, the last one.
Play video starting at 3 minutes 20 seconds and follow transcript3:20
It is not difficult to see that the optimal value in this case is equal to 6. And it can be obtained as
follows. You first subtract 5 from 4. This gives you -1. You then multiply it by 3, and you get -3. You
then compute the sum of the first two digits. This is 1 + 2, and that is equal to 3. Finally you subtract
-3 from 3, which is the same as 3 + 3, and is equal to 6. Well, you might find the result as follows: you
just go through all possible orders. Let's see how many different orders there are. Well, there are four
arithmetic operations in this case, so you can choose any of the four arithmetic operations to
be the first one, any of the three remaining operations to be the second one, and
any of the two remaining operations to be the third one. And the last one is unique, it
is the only remaining operation. So, in total, there are 4 x 3 x 2 x 1 different orders. This is equal
to 24, and you can just enumerate all of them, write them down, compute an answer for each of
these orderings, and select the maximal value. However, our method of going through all possible
orderings does not scale well. And this is why: consider the toy example shown on the slide. In this
case we have six digits and five arithmetic operations. This example will require us to go through all
120 possible orderings.
Play video starting at 5 minutes 5 seconds and follow transcript5:05
So, just because there are five operations, any of the five can be the first one, any of the
remaining four can be the second one, and so on. So this is 5 x 4 x 3 x 2 x 1, which is
equal to 120. It is already not so easy to go through all possible such orderings by hand.
Well, it is not easy, but we can teach a computer to do this, right? We can implement
an algorithm that goes through all possible orderings. However, in general, this algorithm will perform
roughly n factorial steps, where n is the number of arithmetic operations, for exactly the same reason.
If you have n arithmetic operations, then any of them can be the first one, any of the remaining n
minus 1 operations can be the second one, and so on. So this is n times n minus 1, times n minus 2,
and so on. This is equal to n factorial, and n factorial is an extremely fast-growing function. For
example, 20 factorial already equals roughly 2 times 10 to the 18th. This means that if you implement
such an algorithm, it will not be able to compute the maximum value of an expression consisting of
just 20 digits in a reasonable time, even in one year, not to say in one second. Which means, as
usual, that we need another algorithm, as you might well have guessed. We will use dynamic
programming to design a more efficient algorithm. In the meantime, you might want to check
your intuition by trying a few possible orderings for this small expression in our in-video quiz.
PRACTICE QUIZ • 10 MIN
Maximum Value of an Arithmetic Expression


Submit your assignment
As usual, we start designing our dynamic programming algorithm 
by defining a subproblem in a way that allows us to solve 
a subproblem through solutions to smaller subproblems.
Play video starting at 16 seconds and follow transcript0:16
As we said already, this is probably the most important step 
in designing dynamic programming solutions.
Play video starting at 24 seconds and follow transcript0:24
So before doing this, we define our problem formally. 
The input consists of n digits: 
d1, d2, and so on, up to dn, 
and n - 1 operations between them, which we call op1, op2, and so on, up to op(n-1).
Play video starting at 41 seconds and follow transcript0:41
Each operation is either summation, subtraction, or multiplication. 
And our goal is to find an order of applying these operations so 
that the value of the resulting expression is maximized. 
As we discussed already, 
we can specify this order just by placing parentheses into our expression. 
We start building our intuition by reconsidering our toy example. 
So assume that the multiplication is the last operation in some optimal ordering, 
that is, an ordering leading to an optimal value in this toy example. 
Well, this means that in this expression we already have two pairs of parentheses, 
and our goal is to parenthesize the first subexpression and 
the second subexpression so as to maximize the value.
Play video starting at 1 minute 39 seconds and follow transcript1:39
This means that it would be good for us to know the optimal value of 
the first subexpression and of the second subexpression, right? 
And in general, if you have an expression and 
you select an arithmetic operation which is the last one, 
then it splits your initial expression into two subexpressions, right? 
And for both of them it would be good to know an optimal value. 
And, in turn, each of these two subexpressions is split into two 
sub-subexpressions by its last arithmetic operation, and so on. 
So this suggests that a good subproblem in our case would be to find an optimal value for 
any subexpression of our initial expression. 
So we've just realized that it would be good to know the optimal values for 
all subexpressions of our initial expression.
What do we mean however, by saying optimal values for all subexpressions? 
Assume, for example, that we need to compute the maximal value for 
the sum of two subexpressions, subexpression one and subexpression two. 
Well, this obviously means that we would like the first subexpression to be 
as large as possible and the second subexpression to be as large as possible.
Play video starting at 3 minutes 0 seconds and follow transcript3:00
If on the other hand we would like to compute 
the maximum value of subexpression one minus subexpression two. 
Well this means that we would like subexpression one to be 
as large as possible while we would like the value of subexpression two 
to be as small as possible, right? 
Just because we compute subexpression one minus subexpression two. 
This suggests that knowing just the maximal value for 
each subexpression would not be enough. 
And this often happens when designing a dynamic programming solution. 
It also suggests that, instead of computing just the maximum, 
we will maintain both the maximum value and 
the minimum possible value for each subexpression. 
Let's illustrate this reasoning once again with our previous toy example. 
So in this case we are maximizing the product of two small subexpressions. 
These two subexpressions are so 
small that it is not difficult to compute their minimal and maximal values. 
For example, for subexpression 5- 8 + 7, 
the minimum value is- 10 and the maximal value is 4, right?
Play video starting at 4 minutes 17 seconds and follow transcript4:17
At the same time, for the second subexpression, 
4 - 8 + 9, the minimum value is -13, while the maximum value is 5, right?
Now we would like to parenthesize both subexpressions so 
that their product is maximal. 
Well, it is not difficult to see that in this case the optimal way to do this is 
to take the minimal values of both subexpressions, right? 
So this will give us -10 multiplied by -13, which is equal to 130. 
Right? 
This is much larger than the product of the maximum 
values of these two subexpressions, which is 4 x 5 = 20.
Play video starting at 5 minutes 1 second and follow transcript5:01
Okay, we are now ready to write down the recurrence relation for our subproblems. 
Before this, let's formally define E(i, j) to be the subexpression of our initial 
expression resulting from taking digits from i to j and all operations between them.
Play video starting at 5 minutes 19 seconds and follow transcript5:19
Then our goal is to compute the maximum value of this subexpression, which we denote 
by capital M(i,j), and the minimum value of the subexpression, denoted by m(i,j). 
Okay, now consider the subexpression from i to j, and 
assume that we would like to compute one of its extreme 
values, that is, either its minimum or its maximum.
Play video starting at 5 minutes 45 seconds and follow transcript5:45
Well, we know that in any ordering for this subexpression there is some 
last operation, say opk. This operation splits our 
subexpression into two sub-subexpressions, namely 
subexpression (i, k) and subexpression (k+1, j). 
Right? 
To compute the maximum value, we just go through all possible such k's, 
from i to j - 1, and through all possible extreme values for the two subexpressions. 
I mean, either we apply operation opk to the maximum values of these two 
subexpressions, or we apply it to 
the minimum values of these two subexpressions, 
or we apply it to the maximum value of one subexpression and 
the minimum value of the other, or vice versa. 
To compute the maximum value of subexpression (i, j), 
we just select the maximum among all these possibilities, 
while to compute its minimum value, 
we simply select the minimum among all such possibilities.
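Written out, the recurrence reads as follows, where opk denotes the operation between digits k and k + 1:

    M(i, j) = max over k = i, ..., j - 1 of
              max{ M(i, k) opk M(k+1, j),   M(i, k) opk m(k+1, j),
                   m(i, k) opk M(k+1, j),   m(i, k) opk m(k+1, j) }

    m(i, j) = min over k = i, ..., j - 1 of
              min over the same four combinations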
We now convert our recurrence relation into a dynamic programming algorithm.
Play video starting at 6 seconds and follow transcript0:06
We start by implementing a procedure that computes the minimum and 
maximum value of the subexpression (i,j) through optimal values for 
smaller sub-subexpressions. 
So the procedure is called MinAndMax(i,j). 
So we first declare two variables, min and max. 
Initially min is equal to plus infinity and max is equal to minus infinity (or 
to a very large number and a very small number, respectively). 
Then we go through all possible values of k between i and j - 1. 
I mean, 
we just go through all possibilities of splitting our subexpression (i, 
j) into two sub-subexpressions, from i to k and from k + 1 to j.
Play video starting at 51 seconds and follow transcript0:51
When such a splitting is fixed, we compute four possible values: applying 
opk to the two maximum values of the sub-subexpressions, to the two minimum values, 
to the maximum and minimum values, or to the minimum and maximum values.
Play video starting at 1 minute 7 seconds and follow transcript1:07
When these values are computed, 
we check whether any of them improves our minimum or maximum value. 
If it does, we update the min or max variable. 
Finally, we return the minimum value and the maximum value for our subexpression.
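A minimal Python sketch of this procedure. Here m and M are assumed to be the two tables of minimum and maximum subexpression values (they are introduced next), and ops[k] is assumed to hold the operation between digits k and k + 1, with 1-based indexing as in the lecture:

    def apply_op(op, a, b):
        # evaluate a single operation: op is '+', '-' or '*'
        if op == '+':
            return a + b
        if op == '-':
            return a - b
        return a * b

    def min_and_max(i, j, ops, m, M):
        lo, hi = float('inf'), float('-inf')
        for k in range(i, j):                    # split into (i..k) and (k+1..j)
            for a in (m[i][k], M[i][k]):
                for b in (m[k + 1][j], M[k + 1][j]):
                    x = apply_op(ops[k], a, b)
                    lo, hi = min(lo, x), max(hi, x)
        return lo, hi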
Play video starting at 1 minute 27 seconds and follow transcript1:27
Our recurrence relation expresses the solution for a subexpression (i,j) through 
solutions for smaller sub-subexpressions. 
What do we mean by saying smaller? 
Well, we mean just that they are shorter, right? 
So once again, when we compute the value for a subexpression (i,j), we rely on 
the fact that the values for shorter subexpressions are already computed. 
This means that our algorithm needs to compute the solutions for 
all subproblems in order of increasing length. 
Namely, in order of increasing value of j minus i, right? 
So for this problem we have roughly a quadratic number of subproblems. 
Namely, our subproblem (i, j) is parameterized by the values of i and 
j, which in turn range from 1 to n. 
Right, so it makes sense in this case to store the values for 
all subproblems in a two-dimensional table of size n by n. 
Recall also that we need to solve our subproblems 
in order of increasing value of j - i. 
We can do this just by going through all subproblems in the order 
shown on the slide. 
So, why this order? 
Well, this is simply because it goes through all possible values of (i, 
j) in order of increasing j minus i, as required. 
So let's take a look. 
On this diagonal we have 
all the cells where j - i is equal to 0, right?
Play video starting at 3 minutes 7 seconds and follow transcript3:07
So the first cell here is (1, 1). 
The second cell is (2, 2). 
The third cell is (3, 3), and so on. 
We then proceed to this cell, where i is equal to 1 and 
j is equal to 2, so the difference is 1. 
We then proceed to this cell.
Play video starting at 3 minutes 25 seconds and follow transcript3:25
This is the cell (2, 3), with difference 1 again. 
We then proceed to this cell, which is (3, 4), and so on. 
So on this diagonal we have all the cells (i, 
j) where j - i = 0. 
On this diagonal we have all cells (i, j) where j - i = 1. 
For this diagonal, the difference is equal to two. 
For this diagonal, the difference is equal to three, and so on. 
The resulting value for 
our initial problem will be computed as the value of the last cell.
Play video starting at 4 minutes 6 seconds and follow transcript4:06
Right, because this cell corresponds to the initial subexpression from 1 to n. 
Now everything is ready to write down an algorithm. 
In the algorithm we will maintain two tables, m and capital M. 
The first one for storing the minimum values for all subexpressions, and 
the second one for storing the maximum values for all subexpressions. 
We start by initializing these tables as follows. 
When a subexpression contains just one digit, 
that is, when j = i, then there is actually nothing 
to minimize or maximize, because there are no operations, 
and hence no order of operations. 
Because of that, we just initialize 
the main diagonals of these tables with the corresponding digits. 
This is done with the following loop: 
m(i,i) and M(i,i) = di. 
Then we go through all possible subproblems in order of increasing size. 
And this is done as follows. 
We gradually increase the parameter s from 1 to n - 1. 
This is done in the following loop. 
When s is fixed, i goes from 1 to n - s, 
and j is computed as i + s. 
This is done to go through all possible pairs (i,j) 
such that j - i = s. 
Right, when i and j are fixed, we call the procedure MinAndMax 
to compute the minimum and maximum values of the subexpression (i,j). 
All right. 
So finally we return the value of capital M(1,n) as the result for 
our initial problem, because the subexpression 
(1, n) corresponds to our initial problem, 
containing all digits from 1 to n.
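Putting it together, a sketch of the whole algorithm in Python, reusing min_and_max from the earlier sketch. As an assumption on representation, the ops list is padded with None at index 0, so that ops[k] sits between digits k and k + 1:

    def maximum_value(digits, ops):
        n = len(digits)
        m = [[0] * (n + 1) for _ in range(n + 1)]   # minimum values, 1-based
        M = [[0] * (n + 1) for _ in range(n + 1)]   # maximum values, 1-based
        for i in range(1, n + 1):
            m[i][i] = M[i][i] = digits[i - 1]       # one digit: nothing to order
        for s in range(1, n):                        # s = j - i, increasing length
            for i in range(1, n - s + 1):
                j = i + s
                m[i][j], M[i][j] = min_and_max(i, j, ops, m, M)
        return M[1][n]

    # Toy example from the lecture: 5 - 8 + 7 x 4 - 8 + 9
    print(maximum_value([5, 8, 7, 4, 8, 9], [None, '-', '+', '*', '-', '+']))  # 200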
Play video starting at 6 minutes 10 seconds and follow transcript6:10
Okay so the running time of this algorithm is cubic.
Play video starting at 6 minutes 14 seconds and follow transcript6:14
Namely, big O of n cubed. 
And this can be seen by noting that we have two nested loops. 
The first one has n - 1 iterations, and the inner one 
has n - s iterations, which is at most n. 
Also, inside these two loops we have a call to the MinAndMax procedure. 
The running time of the MinAndMax 
procedure is proportional to j - i, which is also at most n. 
So the running time of the whole algorithm is at most 
n times n times n, which is n cubed. 
This slide shows an example of how the tables m and 
capital M look if we run our algorithm on our toy example, 
namely the expression 5 - 8 + 7 x 4 - 8 + 9. 
Let's just go through this example step by step. 
So we start by filling in the values on the main diagonal in both matrices. 
So this is 5, this is 8, this is 7, this is 4, 8, 9. 
These correspond to subexpressions consisting of just one digit, 
so there is nothing to maximize or minimize.
Play video starting at 7 minutes 34 seconds and follow transcript7:34
So we do the same for capital M matrix.
Play video starting at 7 minutes 39 seconds and follow transcript7:39
We then proceed to the second diagonal.
Play video starting at 7 minutes 43 seconds and follow transcript7:43
We put -3 here, and this corresponds to the subexpression 5 - 8. 
Again, in this case there is just one operation, 
so there is nothing to minimize or maximize here, 
because there is just one order when we have just one sign. 
So we put -3 here; this corresponds to the subproblem (1, 2). 
Let me put all the indices here, by the way.
Play video starting at 8 minutes 20 seconds and follow transcript8:20
Then we proceed to the cell (2, 3), which corresponds to this subproblem. 
Again, there is nothing to maximize or minimize, so we continue in the same way. 
In this case it is not so interesting, and 
we then proceed to the third diagonal, namely to this cell.
Play video starting at 8 minutes 43 seconds and follow transcript8:43
This corresponds to the subexpression (1,3), which 
consists of three digits and two operations, minus and plus. 
We know that one of them is the last operation in the optimal order, 
when computing the minimal value, for example. 
So assume that it is the minus. 
This splits the subexpression into two sub-subexpressions, 5 and 8 + 7. 
For both of these subexpressions we already know their maximum and minimum values. 
So once again, this subexpression corresponds to (1, 1), 
and this subexpression corresponds to (2, 3); 
that is, from the second to the third digit, and from the first to the first digit. 
So for the first subexpression we already know its minimum 
value, it is here, and its maximum value, it is here. 
For the second subexpression, we already know its minimum value, 
it is here. 
It is 15, and also its maximum value. 
It is also 15. 
So by going through all possible pairs of maximum and 
minimum values, which in this case are all the same, 
we compute the minimum value, which is just 5 - 15. 
It is minus ten. 
However, this was only the first case of splitting this
Play video starting at 10 minutes 9 seconds and follow transcript10:09
subexpression into two sub-subexpressions. 
Another possibility would be the following: 
we can split it into the following two subexpressions, 
corresponding to (1, 2) and (3, 3).
Play video starting at 10 minutes 28 seconds and follow transcript10:28
Right? 
So, for (1, 2) we know its minimum value, it is minus three, 
and its maximum value, it is also minus three. 
For (3, 3) we know its minimum value and its maximum value. 
It is here, seven. 
So then we can compute -3 + 7, 
which gives us just 4. 
So for the maximum value of the subexpression (1,3) we select 4. 
For the minimum value we select -10. 
So we proceed filling in this table in a similar fashion. 
We then put 36 here in this cell, then -20 in this cell,
Play video starting at 11 minutes 22 seconds and follow transcript11:22
and in parallel we put 60 here, 20 here, and so on. 
So, in the end, we see the value 200 here. 
And this is the maximum value of our initial expression. 
This still doesn't give us the optimal ordering itself, but 
we will be able to reconstruct it from these two tables. 
Now we are sure that the maximum value of our initial expression is 200, 
and we will find the optimal ordering, or the optimal parenthesizing, in a minute.
Reconstructing a Solution
In this last video of this lesson, we show a method of reconstructing an actual 
solution from two tables computed by our dynamic programming algorithm.
Play video starting at 13 seconds and follow transcript0:13
Okay, here on this slide we see two tables, m and capital M, 
computed by our dynamic programming 
algorithm, which contain the minimal and maximal values, respectively, for 
all possible subexpressions of our initial expression.
Play video starting at 30 seconds and follow transcript0:30
Let me first put indices on 
all the rows and columns of these two matrices, 
as well as numbers for our initial digits.
Play video starting at 49 seconds and follow transcript0:49
Well, in particular, we see, by reading the contents of the cell 
capital M(1,6), that the maximal value of our initial expression is equal to 200, 
and our goal is to unwind the whole solution, I mean, 
the parenthesizing of the initial expression, from these two tables. 
So our first goal on this way is to understand from which two 
subexpressions of the initial expression the value 200 was computed.
Play video starting at 1 minute 22 seconds and follow transcript1:22
Well, let's see, when computing the value for the maximal value for 
subexpression (1,6), we tried all possible splittings 
of the expression (1,6) into two subexpressions. 
Well, let's just go through all of them. 
The first possibility is to split it into two subexpressions (1,1), 
which corresponds just to the first digit which 
is just 5, and subexpression (2,6), 
with a minus sign between them, right. 
So for both these two subexpressions we already know minimal values and 
maximal values. 
Well, let me mark them. 
So this is the minimal value for the subexpression (1,1). 
This is the maximal value for subexpression (1,1). 
For (2,6), this is the minimal value, 
-195, and this is a maximal value, 75. 
So we would like to maximize this subexpression 
one minus subexpression two, which means that we would like the first subexpression 
to be as large as possible and the second subexpression to be as small as possible. 
Well, this means that we need to 
try to take the maximal value of the first subexpression which is five and 
the minimal value of the second subexpression which is -195. 
Well, we see that in this case, 
5 minus -195 is the same as 5 plus 195, 
which equals exactly 200, right, 
which allows us to conclude, actually, 
that the value 200 can be obtained as follows. 
So, we subtract the minimum value of the second subexpression, 
which is -195, from 5, right. 
So we restored the last operation in an optimal 
parenthesizing of the initial expression. 
However, we still need to find out how to obtain 
-195 out of the second subexpression. 
Well, let's do this.
Play video starting at 3 minutes 52 seconds and follow transcript3:52
Okay, so we need to find how the minimum value 
of the subexpression (2,6) was obtained.
Play video starting at 4 minutes 2 seconds and follow transcript4:02
Well, there are several possible splittings, once again, 
of the subexpression (2,6) into two smaller sub-subexpressions. 
The first of them is to split (2,6) into (2,2), 
which just corresponds to the digit 8 plus (3,6). 
Well, in this case, we would like the value to be as small as possible and 
our sign is plus in this case, which means that we would like the value of 
subexpression (2,2) to be as small as possible and 
the value of subexpression (3,6) also to be as small as possible. 
And you already know these values, they are in our tables, 
so the minimal value of subexpression (2,2) is 8, 
while the minimum value of subexpression (3, 6) is minus 91, right. 
So we see that the sum of these two values is not equal to -195, 
right, which means that plus is not the last operation 
in the optimal parenthesizing that gives the minimum 
value of subexpression (2, 6), right. 
So let's check the next one. 
Another possibility to split the subexpression (2, 6) is the following. 
We split it into subexpression (2, 
3) times subexpression (4, 6), right. 
So once again, we would like to find the minimum value of subexpression (2, 6). 
Well, let's see just all possibilities. 
The minimum value of subexpression (2, 3) is 15.
Play video starting at 5 minutes 52 seconds and follow transcript5:52
Its maximal value is also 15. 
As to subexpression (4,6), its minimum value is -13. 
Its maximal value is 5.
Play video starting at 6 minutes 4 seconds and follow transcript6:04
And we would like the product of these two values to be as small as possible. 
Well, it is not difficult to see that if we take 15, 
which is the minimum value of subexpression (2,3), 
and multiply it by the minimum value of the subexpression (4,6), 
which is -13, then we get exactly -195. 
And this, in turn, tells us how to get -195 
from the subexpression (2,6). 
We can do it as follows. 
We first compute the sum of 8 and 7. 
This gives us 15. 
And then we multiply it by the result of the second subexpression.
Play video starting at 6 minutes 50 seconds and follow transcript6:50
Well, now it remains to find out how to get -13 out of the subexpression 
(4,6), but in this small example, it is already easy to get -13. 
We just first compute the sum of 8 and 9 and 
then subtract it from 4, right. 
So this way we reconstructed the whole solution, I mean, 
an optimal parenthesizing, or an optimal ordering, of our arithmetic operations, 
leading to this maximal value, 200. 
Let's just check once again that our parenthesizing indeed leads to the value 200. 
So we first compute the sum of 8 and 9. 
This gives us 17. 
We then subtract 17 from 4. 
And this gives us -13. 
We then compute the sum of 8 and 7. 
This gives us 15. 
We multiply 15 by -13. 
It gives us -195, and, finally, 
we subtract this number from 5, and we get 200, indeed. 
So we reconstructed the whole solution. 
In general, for an expression consisting of n digits and 
n minus 1 operations, the reconstruction algorithm makes roughly a quadratic number 
of steps, because it needs to reconstruct n minus 1 operations, I mean, 
an order of n minus 1 operations, going from the last one to the first one. 
And for each operation, it potentially needs to go through all possible 
splittings into two subexpressions, and this number is at most n. 
So the running time is big O of n times n, which is big O of n squared. 
And this technique is quite general. 
It applies in many dynamic programming algorithms.
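A sketch of this reconstruction in Python, reusing apply_op and the tables m, M, and ops from the earlier sketches. For each subexpression it finds a splitting that reproduces the stored extreme value, exactly as in the walkthrough above:

    def reconstruct(i, j, target, digits, ops, m, M):
        # build a fully parenthesized string for E(i, j) whose value is `target`
        if i == j:
            return str(digits[i - 1])
        for k in range(i, j):                 # find a splitting that explains target
            for a in (m[i][k], M[i][k]):
                for b in (m[k + 1][j], M[k + 1][j]):
                    if apply_op(ops[k], a, b) == target:
                        return ('(' + reconstruct(i, k, a, digits, ops, m, M) + ops[k]
                                + reconstruct(k + 1, j, b, digits, ops, m, M) + ')')

    # e.g. reconstruct(1, 6, M[1][6], digits, ops, m, M) returns a parenthesizing
    # evaluating to 200, such as '(5-((8+7)*(4-(8+9))))'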
Programming assignment

Programming Assignment: Programming Assignment 6: Dynamic Programming 2
