
Introduction to Econometrics

Chapter 2

Review of Probability

Ms. Bayan Baabad


2.1 Random Variables and Probability Distributions

Before we start, we need to know some definitions:

• Probability: The probability of an outcome is the proportion of the time that the outcome occurs in the long run.

• The gender of the next new person you meet, your grade on an exam, and the number of times your

computer will crash while you are writing a term paper all have an element of chance or

randomness.

• Outcomes: The mutually exclusive potential results of a random process.


Probability Distribution of a Discrete Random Variable

• Probability distribution: a list of all possible values of the variable (here M, the number of computer crashes) and the probability (Pr) that each value will occur.

• Probabilities of events: The probability of an event can be computed from the probability

distribution.

• For example: the probability of the event of one or two crashes is the sum of the probabilities of the constituent outcomes. That is, Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2) = 0.10 + 0.06 = 0.16, or 16%.


• Cumulative probability distribution: the probability that the random variable is less than or equal to a particular value.

• For example, the probability of at most one crash, Pr(M ≤ 1), is 90%, which is the sum of the probabilities of no crashes (80%) and of one crash (10%).
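
The two calculations above can be sketched in a few lines of Python. The probabilities for 0, 1, and 2 crashes come from these notes; the probabilities for 3 and 4 crashes are assumptions added only so that the distribution sums to 1, and the helper names are illustrative.

```python
# Probability distribution of M, the number of computer crashes.
# Pr(M=0), Pr(M=1), Pr(M=2) are taken from the notes.
# Pr(M=3) and Pr(M=4) are ASSUMED so the probabilities sum to 1.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}


def prob_event(pmf, outcomes):
    """Probability of an event: sum over its mutually exclusive outcomes."""
    return sum(pmf[m] for m in outcomes)


def cdf(pmf, value):
    """Cumulative probability Pr(M <= value)."""
    return sum(p for m, p in pmf.items() if m <= value)


p_one_or_two = prob_event(pmf, [1, 2])  # Pr(M = 1 or M = 2)
p_at_most_one = cdf(pmf, 1)             # Pr(M <= 1)

print(round(p_one_or_two, 2))   # 0.16
print(round(p_at_most_one, 2))  # 0.9
```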
2.2 Expected Values, Mean, and Variance

• Expected value: The expected value, or mean, of a random variable is the average value over many repeated trials or occurrences.
Example of Expected Values:
Assume that you assign the following subjective probabilities for your final grade in your econometrics course (the standard GPA scale of 4 = A to 0 = F applies):

Grade   Probability
A       0.20
B       0.50
C       0.20
D       0.08
F       0.02

The expected value is:

A) 3.0
B) 3.5
C) 2.78
D) 3.25
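
A short Python check of the grade example, as a sketch: the expected value is each grade's point value weighted by its probability. The point values for B, C, and D (3, 2, 1) are assumed from the "standard GPA scale" wording; the variable names are illustrative.

```python
# Grade points on the standard GPA scale (4 = A ... 0 = F).
# The intermediate values 3, 2, 1 for B, C, D are ASSUMED from that scale.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Subjective probabilities from the example.
probabilities = {"A": 0.20, "B": 0.50, "C": 0.20, "D": 0.08, "F": 0.02}

# E(Y) = sum over grades of (points * probability)
expected_grade = sum(grade_points[g] * probabilities[g] for g in grade_points)

print(round(expected_grade, 2))  # 2.78
```

Under these assumptions the result matches option C.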
The Variance:
• The variance of a random variable Y is the expected value of the square of the deviation of Y from its mean.
• The variance is a measure of the spread of a probability distribution.
• The variance of Y equals $\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2]$.
The Skewness:

• The skewness of a distribution provides a mathematical way to describe how much the distribution deviates from symmetry.

• The skewness of the distribution of a random variable Y is $E[(Y - \mu_Y)^3] / \sigma_Y^3$.


Kurtosis: a measure of how much mass is in the tails of a distribution.
• If a random variable has extreme values (“outliers”), the kurtosis will be high.

• The kurtosis is unit free and cannot be negative.

• For a normal distribution, the skewness is 0 and the kurtosis is 3.
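
As a rough illustration, the variance, skewness, and kurtosis of a discrete distribution can all be computed from its central moments. The sketch below reuses the grade distribution from the expected-value example (grade points 4 to 0 with the stated probabilities); the function name `central_moment` is illustrative.

```python
# Grade points and their subjective probabilities from the earlier example.
values = [4, 3, 2, 1, 0]
probs = [0.20, 0.50, 0.20, 0.08, 0.02]


def central_moment(values, probs, k):
    """k-th central moment E[(Y - mu)^k] of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** k for v, p in zip(values, probs))


variance = central_moment(values, probs, 2)        # E[(Y - mu)^2]
std_dev = variance ** 0.5
skewness = central_moment(values, probs, 3) / std_dev ** 3
kurtosis = central_moment(values, probs, 4) / std_dev ** 4

print(round(variance, 4))  # 0.8516
print(skewness < 0)        # True: the longer tail is toward low grades
print(kurtosis > 0)        # True: kurtosis cannot be negative
```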


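The Conditional Distribution:
• The distribution of a random variable conditional on another random variable taking on a specific value.
• In general, the conditional distribution of Y given X = x is $\Pr(Y = y \mid X = x) = \Pr(X = x, Y = y) / \Pr(X = x)$.

• The conditional expectation of Y given X = x is $E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$.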


Law of iterated expectations

• States that the mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X: $E(Y) = \sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)$.
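
A minimal numerical check of the law of iterated expectations. The joint distribution below is hypothetical (it does not come from these notes); the sketch computes E(Y) directly and again as the probability-weighted average of the conditional expectations.

```python
# HYPOTHETICAL joint distribution Pr(X = x, Y = y), for illustration only.
joint = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.40}

x_values = sorted({x for x, _ in joint})

# Marginal distribution of X: Pr(X = x) = sum over y of Pr(X = x, Y = y)
marginal_x = {
    x: sum(p for (xi, _), p in joint.items() if xi == x) for x in x_values
}


def cond_exp_y(x):
    """E(Y | X = x) = sum over y of y * Pr(X = x, Y = y) / Pr(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x[x]


# E(Y) computed directly from the joint distribution.
mean_y_direct = sum(y * p for (_, y), p in joint.items())

# E(Y) computed as the weighted average of conditional expectations.
mean_y_iterated = sum(cond_exp_y(x) * marginal_x[x] for x in x_values)

print(round(mean_y_direct, 4), round(mean_y_iterated, 4))  # 0.7 0.7
```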
• Independence: Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other.
• Covariance: One measure of the extent to which two random variables move together is their covariance. The covariance between X and Y is the expected value of $(X - \mu_X)(Y - \mu_Y)$: $\mathrm{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$.

• Correlation: an alternative measure of dependence between X and Y that solves the “units” problem of the covariance: $\mathrm{corr}(X, Y) = \mathrm{cov}(X, Y) / (\sigma_X \sigma_Y)$.

X and Y are uncorrelated in each of the following cases:

• X and Y are independent

• X and Y have zero covariance

• E(Y | X) = 0 (more generally, the conditional mean of Y does not depend on X)
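
The sketch below computes the covariance and correlation for two hypothetical joint distributions (neither comes from these notes): one in which X and Y are dependent, and one built as the product of marginals so that X and Y are independent and therefore uncorrelated. The function name `cov_corr` is illustrative.

```python
from math import sqrt


def cov_corr(joint):
    """Return (covariance, correlation) for a joint pmf {(x, y): prob}."""
    mean_x = sum(x * p for (x, _), p in joint.items())
    mean_y = sum(y * p for (_, y), p in joint.items())
    cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items())
    var_x = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mean_y) ** 2 * p for (_, y), p in joint.items())
    return cov, cov / sqrt(var_x * var_y)


# HYPOTHETICAL dependent joint distribution.
joint_dependent = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.40}

# HYPOTHETICAL independent joint distribution: product of the marginals.
p_x = {0: 0.4, 1: 0.6}
p_y = {0: 0.3, 1: 0.7}
joint_independent = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

cov_dep, corr_dep = cov_corr(joint_dependent)
cov_ind, corr_ind = cov_corr(joint_independent)

print(round(cov_dep, 4))       # -0.02
print(abs(corr_ind) < 1e-9)    # True: independence implies zero correlation
```

Unlike the covariance, the correlation is unit free and always lies between -1 and 1.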
The Normal Distribution:
• A normal distribution with mean μ and standard deviation σ (variance σ²) is denoted N(μ, σ²).
• To standardize a variable, first subtract the mean, then divide the result by the standard deviation: $Z = (Y - \mu) / \sigma$.
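
A small sketch of standardization using Python's standard-library `statistics.NormalDist`. The mean, standard deviation, and cutoff value are hypothetical; the point is that a probability computed from the original normal distribution equals the probability computed from N(0, 1) after standardizing.

```python
from statistics import NormalDist

# HYPOTHETICAL parameters and cutoff, for illustration only.
mu, sigma = 1.0, 2.0
y = 2.0

# Standardize: subtract the mean, then divide by the standard deviation.
z = (y - mu) / sigma

# Pr(Y <= y) from the original distribution ...
p_direct = NormalDist(mu, sigma).cdf(y)
# ... equals Pr(Z <= z) from the standard normal distribution.
p_standard = NormalDist().cdf(z)

print(z)                                  # 0.5
print(abs(p_direct - p_standard) < 1e-12) # True
```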
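The Student t distribution
• The Student t distribution with m degrees of freedom is the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m.

• That is, if Z is standard normal and W is chi-squared with m degrees of freedom, independent of Z, then $Z / \sqrt{W/m}$ has a Student t distribution with m degrees of freedom.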


The sample average:

• The sample average of a randomly drawn sample is a random variable with a probability distribution called the sampling distribution.

• The mean of the sample average $\bar{Y}$ is $E(\bar{Y}) = \mu_Y$.
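
The sampling distribution can be illustrated with a small simulation. This is only a sketch: the population (normal with mean 5 and standard deviation 2), the sample size, and the number of replications are all hypothetical choices. Each replication draws a fresh sample and records its average; the averages vary from sample to sample, and their mean is close to the population mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# HYPOTHETICAL population and simulation settings.
MU, SIGMA = 5.0, 2.0
N, REPLICATIONS = 50, 2000


def sample_average(n):
    """Average of one randomly drawn sample of size n."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


# Each entry is the sample average from one independent sample.
averages = [sample_average(N) for _ in range(REPLICATIONS)]

# The mean of the sampling distribution should be close to MU.
mean_of_averages = statistics.fmean(averages)
print(round(mean_of_averages, 2))
```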
Review of Statistics

• Hypothesis Testing for Mean of Samples

• Confidence Interval for Mean of Samples


Hypothesis Testing for the Mean
The hypothesis testing problem (for the mean): make a provisional decision
based on the evidence at hand whether a null hypothesis is true, or instead
that some alternative hypothesis is true. That is, test
• H0: E(Y) = μY,0 vs. H1: E(Y) > μY,0 (1-sided, >)
• H0: E(Y) = μY,0 vs. H1: E(Y) < μY,0 (1-sided, <)
• H0: E(Y) = μY,0 vs. H1: E(Y) ≠ μY,0 (2-sided)

• A statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected.
• The numerical value obtained from a statistical test is called the test value.
• The level of significance is the maximum probability of committing a type I error. This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
• The critical value separates the critical region from the noncritical region. The symbol for critical value is C.V.
• The critical or rejection region is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.
• The noncritical or nonrejection region is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.
• A one-tailed test indicates that the null hypothesis should be rejected when the test value
is in the critical region on one side of the mean.
• In a two-tailed test, the null hypothesis should be rejected when the test value is in either
of the two critical regions.

1. If σ is known, use the z test. The variable must be normally distributed if n < 30.
2. If σ is unknown but n ≥ 30, use the t test.
3. If σ is unknown and n < 30, use the t test.

Finding the critical value for the z statistic
Finding the critical value for the t statistic
(d.f. = n − 1)

Comments on Student t distribution, ctd.

2. If the sample size is moderate (several dozen) or large (hundreds or more), the difference between the t-distribution and N(0,1) critical values is negligible. Here are some 5% critical values for 2-sided tests:

degrees of freedom (n – 1)    5% t-distribution critical value
10                            2.23
20                            2.09
30                            2.04
60                            2.00
∞                             1.96
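
The standard normal critical values can be computed with the standard library, as sketched below, and compared with the t critical values from the table above (which are copied in by hand, since the standard library has no Student t distribution). Variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05

# Two-sided test: alpha/2 in each tail. One-sided test: alpha in one tail.
z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
z_one_sided = NormalDist().inv_cdf(1 - alpha)

print(round(z_two_sided, 2))   # 1.96
print(round(z_one_sided, 3))   # 1.645

# 5% two-sided t critical values, copied from the table above.
t_table = {10: 2.23, 20: 2.09, 30: 2.04, 60: 2.00}

# The gap between the t and N(0,1) critical values shrinks as d.f. grows.
for df, t_crit in t_table.items():
    print(df, t_crit, round(t_crit - z_two_sided, 2))
```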

Running a z test on your data requires five steps:

1. State the null hypothesis and alternate hypothesis.
2. Choose an alpha level.
3. Find the critical value of z in a z table.
4. Make a decision (use any of the three methods: traditional, p-value, or confidence interval).
5. Summarize the results.

Running a t test on your data requires five steps:

If Yi, i = 1,…, n is i.i.d. N(μY, σY²), then the t-statistic has the Student t-distribution with n – 1 degrees of freedom.
The critical values of the Student t-distribution are tabulated in the back of all statistics books. Remember the recipe?
1. State the null hypothesis and alternate hypothesis.
2. Choose an alpha level.
3. Find the critical value of t in a t table.
4. Make a decision (use any of the three methods: traditional, p-value, or confidence interval).
5. Summarize the results.
Methods used to test hypotheses
The three methods used to test hypotheses are

• 1. The traditional method

• 2. The P-value method

• 3. The confidence interval method

The Traditional Method

A researcher wishes to see if the mean number of days that a basic, low-price, small automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers has a mean of 30.1 days for basic, low-price, small automobiles. At α = 0.05, test the claim that the mean time is greater than 29 days. The standard deviation of the population is 3.8 days.

A medical investigation claims that the average number of infections per week at a hospital in southwestern Pennsylvania is 16.3. A random sample of 10 weeks had a mean number of 17.7 infections. The sample standard deviation is 1.8. Is there enough evidence to reject the investigator’s claim at α = 0.05?
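
One way to work the two examples above with the traditional method is sketched below. The automobile example has a known population standard deviation, so it uses a right-tailed z test; the infections example has only a sample standard deviation, so it uses a two-tailed t test with 9 degrees of freedom. The t critical value 2.262 is taken from a standard t table and is an assumption not stated in these notes.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# Automobile example: H0: mu = 29 vs. H1: mu > 29, sigma known (z test).
z_stat = (30.1 - 29) / (3.8 / sqrt(30))
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
reject_autos = z_stat > z_crit

# Infections example: H0: mu = 16.3 vs. H1: mu != 16.3, sigma unknown (t test).
t_stat = (17.7 - 16.3) / (1.8 / sqrt(10))
t_crit = 2.262  # ASSUMED: two-tailed 5% value for d.f. = 9 from a t table
reject_infections = abs(t_stat) > t_crit

print(round(z_stat, 2), round(z_crit, 3), reject_autos)       # 1.59 1.645 False
print(round(t_stat, 2), t_crit, reject_infections)            # 2.46 2.262 True
```

Under these assumptions, there is not enough evidence that the mean lot time exceeds 29 days, while the claim of 16.3 infections per week is rejected.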

p-value Method

A researcher wishes to test the claim that the average cost of tuition and fees
at a four year public college is greater than $5700. She selects a random
sample of 36 four-year public colleges and finds the mean to be $5950. The
population standard deviation is $659. Is there evidence to support the claim at
α = 0.05? Use the P-value method.
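
A sketch of the P-value method for the tuition example: compute the z statistic, find the right-tail area beyond it, and reject the null hypothesis if that area is no larger than α. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# H0: mu = 5700 vs. H1: mu > 5700 (right-tailed), sigma known.
sample_mean, mu_0, sigma, n = 5950, 5700, 659, 36

z_stat = (sample_mean - mu_0) / (sigma / sqrt(n))

# P-value: probability of a z value at least this large under H0.
p_value = 1 - NormalDist().cdf(z_stat)
reject = p_value <= alpha

print(round(z_stat, 2))   # 2.28
print(round(p_value, 3))  # 0.011
print(reject)             # True
```

Since the P-value is below 0.05, the sample supports the claim that mean tuition and fees exceed $5700.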

Comments on Student t distribution, ctd.
4. You might not know this. Consider the t-statistic testing the hypothesis
that two means (groups s, l) are equal:
$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$$
Even if the population distribution of Y in the two groups is normal, this
statistic doesn’t have a Student t distribution!
There is a statistic testing this hypothesis that has a Student t distribution when the populations are normal, the “pooled variance” t-statistic (see SW, Section 3.6). However, the pooled variance t-statistic is only valid if the variances of the normal distributions are the same in the two groups. Would you expect this to be true, say, for men’s v. women’s wages?
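
The unpooled t-statistic for the difference of two means can be sketched as below. The group summary statistics are hypothetical numbers chosen for easy arithmetic, and the function name `two_sample_t` is illustrative.

```python
from math import sqrt


def two_sample_t(mean_s, var_s, n_s, mean_l, var_l, n_l):
    """Unpooled t-statistic for H0: the two group means are equal.

    var_s and var_l are the sample variances of the two groups.
    Returns (t, standard error of the difference in means).
    """
    se = sqrt(var_s / n_s + var_l / n_l)
    return (mean_s - mean_l) / se, se


# HYPOTHETICAL summary statistics for groups s and l.
t_stat, se_diff = two_sample_t(
    mean_s=20.0, var_s=16.0, n_s=100,
    mean_l=18.0, var_l=9.0, n_l=100,
)

print(se_diff)  # 0.5
print(t_stat)   # 4.0
```

Because the variances are not pooled, this statistic does not require the two groups to have the same variance.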

Confidence Interval
• A 95% confidence interval for μY is an interval that contains the true
value of μY in 95% of repeated samples.
• Digression: What is random here? The values of Y1,...,Yn and thus any
functions of them – including the confidence interval. The confidence
interval will differ from one sample to the next. The population
parameter, μY, is not random; we just don’t know it.
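Confidence Intervals for the Mean When σY² Is Known

$$\bar{Y} \pm z_{\alpha/2} \frac{\sigma_Y}{\sqrt{n}}$$

Confidence Intervals for the Mean When σY² Is Unknown

$$\bar{Y} \pm t_{\alpha/2} \frac{s_Y}{\sqrt{n}} \quad (\text{d.f.} = n - 1)$$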

Ten randomly selected people were asked how long they slept at night. The mean time was
7.1 hours, and the standard deviation was 0.78 hour. Find the 95% confidence interval of the
mean time. Assume the variable is normally distributed.
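
A sketch of the sleep example: with n = 10 and only the sample standard deviation available, the interval uses the t distribution with 9 degrees of freedom. The critical value 2.262 comes from a standard t table and is an assumption not stated in these notes.

```python
from math import sqrt

sample_mean, sample_sd, n = 7.1, 0.78, 10

# ASSUMED: two-tailed 5% t critical value for d.f. = n - 1 = 9.
t_crit = 2.262

# Margin of error = t critical value * standard error of the mean.
margin = t_crit * sample_sd / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(round(lower, 2), round(upper, 2))  # 6.54 7.66
```

Under these assumptions, the 95% confidence interval for the mean sleep time is roughly 6.54 to 7.66 hours.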
Testing the Difference Between Two Means of Independent Samples: Using the t Test

Hypothesis Testing for the Difference of Two Means: Independent Samples

Confidence Intervals for the Difference of Two Means: Independent Samples