
Random Variables

A random variable is a variable whose value is determined by the outcome of a random experiment.

Example: In a single die toss experiment, the sample space consists of six
elements, the outcomes 1, …, 6, denoted by i, i = 1, …, 6. A random variable
may be defined for this experiment: let the value of the random variable
equal the number shown on the die, i.e.

x(i) = i, i=1, 2, …, 6

The random variable in our example can take on only a discrete set of
values. Such random variables are known as discrete random variables
(i.e. they assume countable values). Random variables of another type,
known as continuous random variables, may take on any value within a
continuous range.
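Since a random variable is simply a rule that maps each outcome to a number, the die example can be written down directly. The Python sketch below assumes a fair die (each face has probability 1/6, which the notes do not state); the names `x` and `pmf` are our own illustrative choices.

```python
from fractions import Fraction

# Sample space of a single die toss: the six outcomes 1, ..., 6.
sample_space = [1, 2, 3, 4, 5, 6]

def x(i):
    """Random variable x(i) = i: the number shown on the die."""
    return i

# Assumption: the die is fair, so each outcome has probability 1/6.
pmf = {}
for outcome in sample_space:
    value = x(outcome)
    pmf[value] = pmf.get(value, Fraction(0)) + Fraction(1, 6)

print(pmf[3])             # probability that x = 3
print(sum(pmf.values()))  # the probabilities of all values sum to 1
```

Because x takes only the six countable values 1 to 6, every value can be listed in the dictionary, which is exactly what makes it a discrete random variable.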
Binomial Distribution
In many geographic studies, we deal with a random variable that takes only
two values (zero-one, yes-no, presence-absence) over a given period of time.
Since there are only two possible outcomes, knowing the probability of one
gives the probability of the other:

P(1)=p
P(0)=1-p=q

If the random experiment is conducted n times independently, then the probability
that the event happens x times follows the binomial distribution:

$$P(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$$

where n! is the factorial of n, e.g. 5! = 5*4*3*2*1 = 120 (and 0! = 1 by convention).
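The formula translates directly into code. A minimal Python sketch, assuming the n trials are independent with the same probability p; `binomial_pmf` is a hypothetical helper name, and the standard-library `math.comb(n, x)` supplies the n!/(x!(n - x)!) term (it also handles 0! = 1).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x occurrences in n independent trials."""
    q = 1 - p                               # probability of non-occurrence
    return comb(n, x) * p**x * q**(n - x)   # comb(n, x) = n! / (x! (n - x)!)

# Two tosses of a fair coin: exactly one head has probability 2 * 0.5 * 0.5.
print(binomial_pmf(1, 2, 0.5))
```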


Binomial Distribution Example

For example, the presence or absence of drought in a year directly influences
agricultural profit, because irrigation costs are added in a dry year. Suppose
a geographer is hired by an agricultural company to perform a risk analysis of
whether a piece of land is profitable for agriculture. Past experience shows
that irrigation can be afforded in at most one year out of five. According to
weather records, 4 of the last 25 years suffered from drought in the area.

Let 1 denote drought presence and 0 denote drought absence. Then the probability
of drought in a single year is p = P(1) = 4/25 = 0.16, so q = P(0) = 1 - 0.16 = 0.84.

Over 5 years, the number of drought years has six possible values: 0, 1, 2, 3, 4, 5.


Drought occurrence probability in 5 years (probability mass function):

$$
\begin{aligned}
P(0) &= \frac{5!}{0!\,(5-0)!}\, 0.16^{0}\, 0.84^{5-0} \approx 0.418 \\
P(1) &= \frac{5!}{1!\,(5-1)!}\, 0.16^{1}\, 0.84^{5-1} \approx 0.398 \\
P(2) &= \frac{5!}{2!\,(5-2)!}\, 0.16^{2}\, 0.84^{5-2} \approx 0.152 \\
P(3) &= \frac{5!}{3!\,(5-3)!}\, 0.16^{3}\, 0.84^{5-3} \approx 0.029 \\
P(4) &= \frac{5!}{4!\,(5-4)!}\, 0.16^{4}\, 0.84^{5-4} \approx 0.003 \\
P(5) &= \frac{5!}{5!\,(5-5)!}\, 0.16^{5}\, 0.84^{5-5} \approx 0.000
\end{aligned}
$$

The probability that agriculture is profitable is the sum of the probabilities
of no drought and of one drought in five years, i.e. 0.418 + 0.398 = 0.816.
Thus the risk is 1 - 0.816 = 0.184, or 18.4%, over five years.
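The table and the risk figure can be reproduced with a few lines of Python, as a check on the arithmetic. This sketch uses only the numbers given above (p = 4/25, n = 5, profitable if at most one drought year). Without intermediate rounding the profitable probability is about 0.8165 and the risk about 18.35%; the 0.816 and 18.4% above come from adding the already-rounded 0.418 and 0.398.

```python
from math import comb

p = 4 / 25   # probability of drought in a year (4 of the last 25 years)
q = 1 - p    # probability of no drought in a year
n = 5        # planning horizon in years

# Probability of exactly x drought years out of 5, for x = 0, ..., 5.
probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(f"P({x}) = {prob:.3f}")

# Agriculture stays profitable if at most one drought year occurs.
profitable = probs[0] + probs[1]
risk = 1 - profitable
print(f"P(profitable) = {profitable:.4f}, risk = {risk:.2%}")
```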
Poisson Distribution

A discrete random variable is said to follow the Poisson distribution if its
probability mass function is

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

where x = 0, 1, 2, …, and λ is the mean.
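A minimal Python version of the Poisson probability mass function follows; `poisson_pmf` is a hypothetical helper name. The two sums check numerically that the probabilities add up to 1 and that the mean is λ; cutting the infinite sum off at x = 49 is our own simplification, harmless for a small λ such as 2.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = e**(-lam) * lam**x / x!, for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
total = sum(poisson_pmf(x, lam) for x in range(50))     # should be about 1
mean = sum(x * poisson_pmf(x, lam) for x in range(50))  # should be about lam
print(total, mean)
```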


Poisson Distribution Example
Suppose a geographer is assessing the risk that devastating hailstorms pose to
summer wheat yields in a particular geographic location. Weather records for the
past 35 years show 10 years with no hailstorm, 13 years with one hailstorm,
8 years with 2 hailstorms, 3 years with 3 hailstorms, and 1 year with
4 hailstorms. Assume that the occurrence of a hailstorm is independent of past or
future occurrences and can be considered random. Then the number of hailstorms
in any given year follows a Poisson distribution. In this example there are
13 + 16 + 9 + 4 = 42 hailstorms in 35 years, so the mean number of hailstorms
per year is λ = 42/35 = 1.2, and

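$$
\begin{aligned}
P(0) &= \frac{e^{-1.2}\, 1.2^{0}}{0!} \approx 0.301 \\
P(1) &= \frac{e^{-1.2}\, 1.2^{1}}{1!} \approx 0.361 \\
P(2) &= \frac{e^{-1.2}\, 1.2^{2}}{2!} \approx 0.217 \\
P(3) &= \frac{e^{-1.2}\, 1.2^{3}}{3!} \approx 0.087 \\
P(4) &= \frac{e^{-1.2}\, 1.2^{4}}{4!} \approx 0.026
\end{aligned}
$$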
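The whole example, from the weather records to the probabilities, fits in a short Python sketch. The dictionary `years_with` is our own encoding of the records above (number of hailstorms in a year mapped to the number of such years); the mean λ is estimated as total hailstorms divided by total years, as in the text.

```python
from math import exp, factorial

# Weather records: hailstorms in a year -> number of years with that count.
years_with = {0: 10, 1: 13, 2: 8, 3: 3, 4: 1}

n_years = sum(years_with.values())                            # 35 years
n_storms = sum(k * count for k, count in years_with.items())  # 42 hailstorms
lam = n_storms / n_years                                      # 1.2 per year

# Poisson probabilities of 0, 1, 2, 3 and 4 hailstorms in a year.
probs = [exp(-lam) * lam**x / factorial(x) for x in range(5)]
for x, p in enumerate(probs):
    print(f"P({x}) = {p:.3f}")
```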
Normal Distribution

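A continuous random variable is said to be normally distributed if its
probability density function (pdf) is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where (μ, σ) are the distribution parameters: μ is the mean and σ is the
standard deviation (σ² is the variance).

[Figure: bell-shaped curve of f(x) plotted against x]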
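The density is easy to evaluate directly. In this minimal Python sketch, `normal_pdf` is a hypothetical helper name. The loop prints the peak height f(μ) = 1/(√(2π)σ) for two values of σ, which previews the next two slides: the curve peaks at x = μ, and a larger σ gives a lower peak.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

# Peak height at x = mu for two standard deviations (mu = 0 here).
for sigma in (1.0, 2.0):
    print(f"sigma = {sigma}: f(mu) = {normal_pdf(0.0, 0.0, sigma):.4f}")
```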

What Does the Mean Tell Us?
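For a random variable that follows the normal distribution with parameters (μ, σ):

[Figure: two normal curves f(x) centered at μ1 and μ2 along the x axis]

The mean tells us where the values of x are most concentrated: the curve peaks at x = μ.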


What Does the Variance Tell Us?

[Figure: two normal curves f(x), both centered at μ, with σ2 > σ1]

The variance tells us how the values are spread. The larger the variance,
the more evenly the values spread over a wide range, and the lower and
flatter the curve. Is this good or bad?
[Figure: two further panels showing f(x) against x]
Does the variance change here? Why?
Standard Normal Distribution

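[Figure: standard normal curve (mean 0, standard deviation 1) for x from -3 to 3; the probability within ±1, ±2 and ±3 of the mean is 68.3%, 95.5% and 99.7%]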
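The three percentages follow from the standard normal distribution alone. For a standard normal Z, P(-k < Z < k) = erf(k/√2), and Python's standard-library `math.erf` evaluates this directly. This is a sketch of the check, not how the figure was produced; the exact values are about 68.27%, 95.45% and 99.73%, which the figure rounds to 68.3%, 95.5% and 99.7%.

```python
from math import erf, sqrt

# For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
probs = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}
for k, prob in probs.items():
    print(f"P(-{k} < Z < {k}) = {prob:.4f}")
```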
Hypothesis Testing

One application of probability distributions is hypothesis testing, a standard
statistical analysis of a “difference” or an “effect”. For example, before a new
drug is put on the market, the FDA requires a detailed statistical report on how
effective the new drug is. This often requires many random experiments.
Typically, a group of volunteers takes the new drug, while another group takes
either nothing or a traditional drug for the same purpose. The effectiveness of
the new drug is then tested. The analysis is based on two statements:
Statement 1: The new drug is not effective
Statement 2: The new drug is effective

These two statements are mutually exclusive, meaning that the negation of
Statement 1 naturally leads to Statement 2. These two statements are hypotheses.
The experimental results from the volunteers are used to test which statement
is acceptable; this is called hypothesis testing.

Null Hypothesis (H0): no effect or no difference

Alternative Hypothesis (H1): there is an effect or a difference

A probability distribution is developed under the assumption that the null
hypothesis is true, and probabilities from that distribution are then used
to test the hypotheses.

[Figure: normal curve with the probability within μ ± σ, μ ± 2σ and μ ± 3σ marked as 68.3%, 95.5% and 99.7%]

However, a decision based on the statistics is not always correct. Two types
of error can occur:

Type I: H0 is true, but is rejected
Type II: H0 is false, but is not rejected
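A Type I error can be made concrete with a small simulation. The assumptions here are ours, not the notes': the test statistic is standard normal when H0 is true, and H0 is rejected whenever the statistic falls outside ±2 (the 95.5% band in the figure). Since H0 is true in every simulated experiment, every rejection is a Type I error, and the rejection rate should be close to 1 - 0.9545 ≈ 4.55%.

```python
import random
from math import erf, sqrt

random.seed(1)  # fixed seed so the run is reproducible

CRITICAL = 2.0      # assumed rule: reject H0 when |z| > 2
N_TRIALS = 100_000  # number of simulated experiments, all with H0 true

# Each rejection here is a Type I error, because H0 is always true.
rejections = sum(abs(random.gauss(0.0, 1.0)) > CRITICAL for _ in range(N_TRIALS))
simulated_rate = rejections / N_TRIALS
theoretical_rate = 1 - erf(CRITICAL / sqrt(2))  # about 0.0455

print(f"simulated {simulated_rate:.4f}, theoretical {theoretical_rate:.4f}")
```

Widening the rejection band (for example to ±3) lowers the Type I error rate, but makes it harder to reject a false H0, which raises the chance of a Type II error.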
Student t distribution
Probability Density Function:

$$f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}$$

where k is the degrees of freedom and Γ is the gamma function.

Mean: 0 (for k > 1)

Variance: k/(k-2), k>2
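The density can be coded with the standard-library `math.gamma`, and simple numerical integration checks the stated moments. In this sketch, `t_pdf` and `integrate` are hypothetical helper names. The midpoint rule, the range [-200, 200] and the step count are our own choices; the tails beyond ±200 are negligible for k = 5.

```python
from math import gamma, pi, sqrt

def t_pdf(x, k):
    """Student t density with k degrees of freedom."""
    coef = gamma((k + 1) / 2) / (sqrt(k * pi) * gamma(k / 2))
    return coef * (1 + x**2 / k) ** (-(k + 1) / 2)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

k = 5
area = integrate(lambda x: t_pdf(x, k), -200, 200)             # about 1
variance = integrate(lambda x: x**2 * t_pdf(x, k), -200, 200)  # about k/(k-2)
print(f"area {area:.5f}, variance {variance:.5f}, k/(k-2) = {k / (k - 2):.5f}")
```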


Exponential Distribution
Probability density function

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$

Mean: 1/λ

Variance: 1/λ²
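A quick Python sketch shows the density and checks the mean and variance by simulation. It uses the standard-library `random.expovariate(lambd)`, which draws from this distribution with rate λ. The rate λ = 0.5, the seed and the sample size are arbitrary choices, so the sample values only approximate 1/λ = 2 and 1/λ² = 4.

```python
import random
from math import exp

def exponential_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

random.seed(0)
lam = 0.5
samples = [random.expovariate(lam) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"sample mean {mean:.3f} (1/lam = {1 / lam}), sample variance {var:.3f} (1/lam^2 = {1 / lam**2})")
```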
Chi-Square distribution
Probability Density Function:

$$f(x) = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}$$

where k is the degrees of freedom and x ≥ 0.

Mean: k

Variance: 2k
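The stated mean k and variance 2k can be checked numerically. In this sketch, `chi2_pdf` and `integrate` are hypothetical helper names. Returning 0 for x ≤ 0 is our own convention: it avoids a division by zero at x = 0 for k = 1 and does not affect any probability. The range [0, 200] and the step count are arbitrary but leave negligible tails for k = 4.

```python
from math import exp, gamma

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (taken as 0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

k = 4
mean = integrate(lambda x: x * chi2_pdf(x, k), 0, 200)                 # about k
variance = integrate(lambda x: (x - k) ** 2 * chi2_pdf(x, k), 0, 200)  # about 2k
print(f"mean {mean:.4f}, variance {variance:.4f}")
```

With k = 2 the density reduces to 0.5 e^(-x/2), the exponential density with λ = 1/2.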
F distribution
An F-distributed random variable is defined as the ratio

$$X = \frac{U_1 / d_1}{U_2 / d_2}$$

where U1 and U2 are independent chi-square random variables with d1 and d2
degrees of freedom, respectively.
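Because the F distribution is defined as a ratio, it can be sampled by building that ratio directly. This simulation sketch makes two assumptions that go beyond the notes. First, each chi-square draw is generated as a sum of d squared standard normals, the standard construction. Second, the F mean is d2/(d2 - 2) for d2 > 2, a known result used here only as a reference value. `chi2_sample` and `f_sample` are hypothetical helper names.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible

def chi2_sample(d):
    """One chi-square draw with d degrees of freedom: sum of d squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d))

def f_sample(d1, d2):
    """One F draw: (U1 / d1) / (U2 / d2) with independent chi-square U1 and U2."""
    return (chi2_sample(d1) / d1) / (chi2_sample(d2) / d2)

d1, d2 = 5, 10
draws = [f_sample(d1, d2) for _ in range(50_000)]
mean = sum(draws) / len(draws)
print(f"simulated mean {mean:.3f}, d2/(d2 - 2) = {d2 / (d2 - 2):.3f}")
```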