- Hi, welcome to the course.

In this course, there are two

main subjects that we will study.

One is probability and the other is statistics.

These are very related subjects,

but they are still different, so let's start

by thinking about what is probability.

So probability is the mathematical framework

for computing probabilities of complex events.

That's a mouthful, and we make the assumption

that we know the probabilities of the basic events.

What precisely do we mean by probability and by event?

Those will be defined later in the class.

For now, let's just think in terms of common sense.

So let's think about a simple question.

We flip a coin and we get tails.

We flip it again, we get heads.

Okay, what we believe somehow is that

the probabilities are equal, but

what do we really mean by that?

What does that mean? Does it mean that

we'll get exactly the same number of heads and tails?

No, it just means that

if we flip the coin many times,

say some very large number like ten thousand,

then the number of heads will be about five thousand.

Okay, that's what we expect, but

what do we mean by about? How can we

express this notion of about more precisely?

Because we might actually be interested

in knowing how far from that we are.

Okay, so we're going to simulate coin flips.

We'll use a pseudo random number generator

to simulate coin flips, and instead of heads and tails

it'll be more convenient to use one and minus one,

and then the number of heads relates to

summing all of these plus ones and minus ones

and what we expect is that the sum

will be zero or close to zero.

So we will vary the number of

coin flips, which we denote by k,

and here is a little bit of code

for generating such random coin flips.

Here we're generating the coin flips themselves

and here we're summing the coin flips

along a particular sequence.

We're generating many sequences at once,

and this is the number n, which

here is by default one hundred.


So this is the central part of the code,

but I'm not going to show you all of the code;

to see that, you have to download

the notebook yourself and play with it to see the details.
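The notebook itself is not reproduced here, but a minimal sketch of that central part might look like the following (the function name, defaults, and details are assumptions and may differ from the actual notebook):

    import numpy as np

    def generate_counts(k=1000, n=100):
        # n independent sequences of k coin flips each.
        # Each flip is +1 or -1 with equal probability.
        X = 2 * (np.random.rand(k, n) > 0.5) - 1
        # Sum along each sequence of k flips: one sum per sequence.
        S = X.sum(axis=0)
        return S

Here S is an array of n sums, one for each simulated sequence of k coin flips.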

So here is a histogram that shows,

for flipping a coin a thousand times,

the distribution of this sum.

We said the sum is about zero, but you see it's not exactly zero,

and every time that I rerun this experiment,

what you see is that the histogram

you get is slightly different.

However, even though it is

slightly different each time, there is

something all of these histograms have in common.

They're all concentrated around zero,

but they're not exactly zero, and for this

number of coin flips, one thousand,

it is extremely unlikely that the sum is

below minus two hundred and fifty

or above two hundred and fifty.

Okay, so with probability theory,

we can calculate how small we expect Sk, the sum, to be.

More precisely, the absolute value of the sum,

since the sum can be either negative or positive.

And what we will show is that the probability

that this Sk is larger than four times

the square root of k is extremely small.

It is two times ten to the minus eight,

or 0.000002 percent,

so we'll have to flip

the sequence of one thousand coins

many, many times before we see the sum

exceed four times the square root of k.
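As a quick sanity check of the scale involved, here is what four times the square root of k works out to for the values of k used below (just arithmetic, assuming numpy; not a snippet from the notebook):

    import numpy as np

    for k in (100, 1000, 10000):
        # boundary 4 * sqrt(k): 40.0, about 126.5, 400.0
        print(k, 4 * np.sqrt(k))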

So let's actually do the simulation

and see if that is the case.

Okay, so here is our simulation.

What we see is, here, we have one hundred coin flips.

Here, one thousand coin flips,

and here ten thousand coin flips,

and the red lines mark what probability theory says

is the boundary within which the sum

of the coin flips is very, very likely to reside.
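A hedged sketch of what this plotting step might look like, assuming numpy and matplotlib (the function name, bin count, and styling are assumptions, not necessarily the notebook's):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_sums(k_values=(100, 1000, 10000), n=100):
        fig, axes = plt.subplots(1, len(k_values), figsize=(12, 3))
        for ax, k in zip(axes, k_values):
            # n sequences of k coin flips, each flip +1 or -1.
            X = 2 * (np.random.rand(k, n) > 0.5) - 1
            S = X.sum(axis=0)
            bound = 4 * np.sqrt(k)              # the boundary 4 * sqrt(k)
            ax.hist(S, bins=20)
            ax.axvline(-bound, color='red')     # red lines at +/- 4 * sqrt(k)
            ax.axvline(+bound, color='red')
            ax.set_title(f'k = {k}')
        plt.show()

    plot_sums()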

So I can rerun this experiment too.

And you see that again, each time

the distribution is somewhat different

but it never goes outside of the red bar.

So that's consistent with what we said.

Now, here it seems that all of them are very similar.

It doesn't really matter if you do


one hundred, one thousand, or ten thousand coin flips,

but that's really because I'm scaling it

according to this boundary, so the boundary

here is minus forty to forty, here it's

minus one hundred and something to one hundred and something

and here it's minus four hundred to four hundred.

If instead we plot the full scale

of these coin flips, what we see is the following.

We see something like this, so when

we plot the whole scale from

minus one hundred to one hundred for one hundred coin flips,

and from minus ten thousand to ten thousand

for ten thousand coin flips, then we see

that the distribution becomes more and more concentrated

around zero, relative to this scale.

So if I run it again ...

Again you get each time a different distribution

but you get that the distribution

is more and more concentrated if you

flip the coin more and more times

and the width of this likely range is on the order of the square root of k,

two times four times the square root of k, so you see

the more times you flip the coin,

the closer the sum is to zero relative to the full range.
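For completeness, a sketch of the full-scale version, under the same assumptions as the previous sketch; the only real change is fixing the x-axis to the full range from minus k to k:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_sums_full_scale(k_values=(100, 1000, 10000), n=100):
        fig, axes = plt.subplots(1, len(k_values), figsize=(12, 3))
        for ax, k in zip(axes, k_values):
            X = 2 * (np.random.rand(k, n) > 0.5) - 1
            S = X.sum(axis=0)
            ax.hist(S, bins=20)
            ax.set_xlim(-k, k)   # the sum of k flips can range from -k to +k
            ax.set_title(f'k = {k}')
        plt.show()

    plot_sums_full_scale()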

Okay, so let's summarize.


We did some experiments where we summed k random numbers

x_i that correspond to coin flips,

so each x_i is minus one or plus one

with probability half and half.

And our experiments show that the sum

is almost always in the range

minus four square root of k to plus four square root of k.

Okay, so we can write it this way.

As k goes to infinity, we have four square root of k

as the range, and dividing that by k gives

four divided by the square root of k,

which goes to zero as k increases.

And so what we can say is that Sk relative to k,

that is, the difference between the number of heads

and the number of tails, divided by k, goes to zero.

And that's basically what we mean

by the probabilities being half and half.
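A small numerical illustration of that last point, assuming numpy (not code from the lecture): the ratio of the absolute sum to k shrinks as k grows, on the scale of four divided by the square root of k:

    import numpy as np

    for k in (100, 1000, 10000, 100000):
        X = 2 * (np.random.rand(k) > 0.5) - 1   # one sequence of k flips
        S = X.sum()
        # observed |S| / k next to the 4 / sqrt(k) scale it should track
        print(k, abs(S) / k, 4 / np.sqrt(k))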

Okay so again, what is probability theory?

It's the math involved in proving, in a precise way,

the statements that we made above.

Okay, so before, we just kind of did simulations

and alluded to something that we will prove in the future,

but that's really what probability theory is:

proving these statements in a precise way.

In most cases, we can approximate

these probabilities using simulations.

These are called Monte-Carlo simulations,

and that's essentially what we did

in this little experiment.
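As a concrete illustration of the Monte-Carlo idea (a sketch, not code from the lecture): to approximate the probability that the absolute sum exceeds four times the square root of k, simulate many sequences and count how often that happens:

    import numpy as np

    def monte_carlo_estimate(k=1000, n_trials=10000):
        # n_trials sequences of k coin flips; estimate
        # P(|S_k| > 4 * sqrt(k)) as the fraction of exceedances.
        X = 2 * (np.random.rand(k, n_trials) > 0.5) - 1
        S = X.sum(axis=0)
        return np.mean(np.abs(S) > 4 * np.sqrt(k))

With a probability as small as the one quoted above, even this many simulated sequences will typically show no exceedances at all, which already hints at the accuracy issue discussed next.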

So why isn't that enough?

Because, first of all, calculating

the probability gives you a precise answer,

and doing Monte-Carlo simulations

just gives you an approximation

and you need to run the experiment longer and longer

to get more and more accurate answers.

And the second reason is that calculating the probability

is much faster than running Monte-Carlo simulations,

essentially for the same reason.

Okay, so that is a quick description of what probability is,

and next time, we're going to talk about what statistics is.

See you there.
