StockWatson Econ CH 2

Introduction to Econometrics
Chapter 2
Review of Probability
Ms. Bayan Baabad

2.1 Random Variables and Probability Distributions
Before we start, we need a few definitions.
• Probability: The probability of an outcome is the proportion of the time that the outcome occurs
in the long run.
• The gender of the next new person you meet, your grade on an exam, and the number of times your
computer will crash while you are writing a term paper all have an element of chance or
randomness.
• Outcomes: The mutually exclusive potential results of a random process.

Probability Distribution of a Discrete Random Variable
• Probability distribution: the list of all possible values of the variable and the probability (Pr)
that each value will occur. In the running example, M is the number of times the computer crashes.
• Probabilities of events: The probability of an event can be computed from the probability
distribution.
• For example, the probability of the event of one or two crashes is the sum of the probabilities of
the constituent outcomes. That is, Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2)
= 0.10 + 0.06 = 0.16, or 16%.

• Cumulative probability distribution: the probability that the random variable
is less than or equal to a particular value.
• For example, the probability of at most one crash, Pr(M ≤ 1), is 90%, which
is the sum of the probabilities of no crashes (80%) and of one crash (10%).
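The two calculations above can be reproduced with a short sketch. It uses only the three probabilities quoted in this section (0, 1 and 2 crashes); larger values of M are not listed here, so they are left out. The helper names `prob_event` and `cdf` are illustrative, not part of the course material.

```python
# Probabilities stated in the text for M = number of crashes.
# Values of M above 2 are not listed in this section, so they are omitted;
# the listed values account for 0.96 of the total probability.
pmf = {0: 0.80, 1: 0.10, 2: 0.06}


def prob_event(pmf, outcomes):
    """Probability of an event: sum over its mutually exclusive outcomes."""
    return sum(pmf[m] for m in outcomes)


def cdf(pmf, value):
    """Cumulative probability Pr(M <= value)."""
    return sum(p for m, p in pmf.items() if m <= value)


p_one_or_two = prob_event(pmf, [1, 2])
p_at_most_one = cdf(pmf, 1)

print(f"Pr(M = 1 or M = 2) = {p_one_or_two:.2f}")
print(f"Pr(M <= 1)         = {p_at_most_one:.2f}")
```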
2.2 Expected Values, Mean, and Variance
• Expected value: The expected value, or mean, of a random
variable is the long-run average value over many repeated trials or
occurrences.
Example of Expected Values:
Assume that you assign the following subjective probabilities for your final grade in your
econometrics course (the standard GPA scale of 4 = A to 0 = F applies):

Grade   Probability
A       0.20
B       0.50
C       0.20
D       0.08
F       0.02

The expected value is:
A) 3.0
B) 3.5
C) 2.78
D) 3.25
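As a check on the question above, the expected value is the probability-weighted sum of the grade points. The sketch below uses the table's probabilities and the stated GPA scale; the variable and function names are illustrative.

```python
# Subjective grade distribution from the example (GPA scale: A = 4, ..., F = 0).
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
probabilities = {"A": 0.20, "B": 0.50, "C": 0.20, "D": 0.08, "F": 0.02}


def expected_value(values, probs):
    """E(Y) = sum of value * probability over all outcomes."""
    return sum(values[k] * probs[k] for k in probs)


expected_gpa = expected_value(grade_points, probabilities)
print(f"E(grade) = {expected_gpa:.2f}")
```

The sum is 4(0.20) + 3(0.50) + 2(0.20) + 1(0.08) + 0(0.02) = 2.78, which corresponds to option C.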
The Variance:
• The variance of a random variable Y is the expected value of the square of the
deviation of Y from its mean.
• The variance is a measure of the spread of a probability distribution.
• The variance of Y equals var(Y) = E[(Y − μY)²] = σY².
The Skewness:
• The skewness of a distribution provides a mathematical way to describe how much the
distribution deviates from symmetry.
• The skewness of the distribution of a random variable Y is E[(Y − μY)³] / σY³.

Kurtosis:
• The kurtosis is a measure of how much mass is in the tails of a distribution.
• If a random variable has extreme values (“outliers”), the
kurtosis will be high.
• The kurtosis is unit free and cannot be negative.
• For a normal distribution, the skewness is 0 and the kurtosis is 3.
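A minimal sketch of the three measures, applied to the grade distribution from the expected-value example. It assumes the standard moment definitions: variance is the second central moment, skewness is the third central moment divided by σ³, and kurtosis is the fourth central moment divided by σ⁴. The function name `central_moment` is illustrative.

```python
# Grade distribution from the expected-value example (A = 4, ..., F = 0).
values = [4, 3, 2, 1, 0]
probs = [0.20, 0.50, 0.20, 0.08, 0.02]


def central_moment(values, probs, k):
    """E[(Y - mu)^k] for a discrete random variable."""
    mu = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mu) ** k for v, p in zip(values, probs))


variance = central_moment(values, probs, 2)
std_dev = variance ** 0.5
skewness = central_moment(values, probs, 3) / std_dev ** 3
kurtosis = central_moment(values, probs, 4) / std_dev ** 4

print(f"variance = {variance:.4f}")
print(f"skewness = {skewness:.3f}")
print(f"kurtosis = {kurtosis:.3f}")
```

The skewness comes out negative here because the low grades (D and F) form a longer left tail. Both skewness and kurtosis are ratios, so they carry no units.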

The Conditional Distribution:
• The distribution of a random variable conditional on another random
variable taking on a specific value.
• In general, the conditional distribution of Y given X = x is
Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x).
• The conditional expectation of Y given X = x is
E(Y | X = x) = Σi yi Pr(Y = yi | X = x).
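The sketch below works through a conditional distribution and a conditional expectation on a small joint distribution. The joint probabilities are hypothetical, invented for illustration only (X = 1 for a new computer and 0 for an old one, Y = number of crashes). It also checks that averaging the conditional means, weighted by the probabilities of X, recovers the mean of Y.

```python
# Hypothetical joint distribution Pr(X = x, Y = y); illustrative numbers only.
# X = 1 for a new computer, 0 for an old one; Y = number of crashes.
joint = {
    (0, 0): 0.30, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.40, (1, 1): 0.08, (1, 2): 0.02,
}


def marginal_x(joint, x):
    """Pr(X = x), summing the joint probabilities over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_dist(joint, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)."""
    px = marginal_x(joint, x)
    return {y: p / px for (xi, y), p in joint.items() if xi == x}


def conditional_mean(joint, x):
    """E(Y | X = x) = sum over y of y * Pr(Y = y | X = x)."""
    return sum(y * p for y, p in conditional_dist(joint, x).items())


mean_y = sum(y * p for (_, y), p in joint.items())
weighted = sum(conditional_mean(joint, x) * marginal_x(joint, x) for x in (0, 1))

print(f"E(Y | X = 0) = {conditional_mean(joint, 0):.2f}")
print(f"E(Y | X = 1) = {conditional_mean(joint, 1):.2f}")
print(f"E(Y) = {mean_y:.2f}, weighted average of conditional means = {weighted:.2f}")
```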

Law of iterated expectations
• The law of iterated expectations states that the mean of Y is the weighted average of the
conditional expectation of Y given X, weighted by the
probability distribution of X: E(Y) = Σi E(Y | X = xi) Pr(X = xi).
• Independence: Two random variables X and Y are
independently distributed, or independent, if knowing the
value of one of the variables provides no information about
the other.
• Covariance: One measure of the extent to which two
random variables move together is their covariance.
The covariance between X and Y is the expected value E[(X − μX)(Y − μY)].
• Correlation: an alternative measure of dependence between X and Y that solves the
“units” problem of the covariance by dividing the covariance by the two standard
deviations: corr(X, Y) = cov(X, Y) / (σX σY).
X and Y are uncorrelated in each of the following cases:
• X and Y are independent
• X and Y have zero covariance
• E(Y | X) = 0 (more generally, the conditional mean of Y does not depend on X)
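A short sketch of covariance and correlation on a discrete joint distribution. The joint probabilities are hypothetical and purely illustrative. The correlation uses the standard definition (covariance divided by the product of the standard deviations), and the last part builds an independent joint distribution from the marginals to show that independence gives zero covariance.

```python
from math import sqrt

# Hypothetical joint distribution Pr(X = x, Y = y); illustrative numbers only.
joint = {
    (0, 0): 0.30, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.40, (1, 1): 0.08, (1, 2): 0.02,
}


def expect(joint, g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def covariance(joint):
    """cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = expect(joint, lambda x, y: x)
    mu_y = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - mu_x) * (y - mu_y))


def correlation(joint):
    """corr(X, Y) = cov(X, Y) / (sd_X * sd_Y); unit free, between -1 and 1."""
    mu_x = expect(joint, lambda x, y: x)
    mu_y = expect(joint, lambda x, y: y)
    sd_x = sqrt(expect(joint, lambda x, y: (x - mu_x) ** 2))
    sd_y = sqrt(expect(joint, lambda x, y: (y - mu_y) ** 2))
    return covariance(joint) / (sd_x * sd_y)


# Independent version: same marginals, joint probability = product of marginals.
px = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
py = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1, 2)}
independent = {(x, y): px[x] * py[y] for x in px for y in py}

print(f"cov(X, Y)  = {covariance(joint):.4f}")
print(f"corr(X, Y) = {correlation(joint):.4f}")
print(f"cov under independence = {covariance(independent):.4f}")
```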
The Normal Distribution:
• A normal distribution with mean μ and variance σ² (standard deviation σ) is denoted
N(μ, σ²).
• To standardize a variable, first subtract the mean, then
divide the result by the standard deviation: Z = (Y − μ) / σ.
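To illustrate standardization, the sketch below computes Pr(Y ≤ c) two ways: by standardizing and using the standard normal distribution, and directly from the distribution of Y. The values μ = 1, σ = 2 and c = 2 are hypothetical. It relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical example: Y ~ N(1, 4), so mu = 1 and sigma = 2.
mu, sigma = 1.0, 2.0
c = 2.0

# Standardize: subtract the mean, then divide by the standard deviation.
z = (c - mu) / sigma

p_via_z = NormalDist().cdf(z)             # Pr(Z <= z) with Z ~ N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(c)   # Pr(Y <= c) computed directly

print(f"z = {z:.2f}")
print(f"Pr(Y <= {c}) via standardization = {p_via_z:.4f}")
print(f"Pr(Y <= {c}) computed directly   = {p_direct:.4f}")
```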
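The Student t distribution
• The Student t distribution with m degrees of freedom is the distribution of the ratio of a
standard normal random variable to the square root of an independently distributed chi-squared
random variable with m degrees of freedom divided by m.
• That is, if Z is standard normal and W is chi-squared with m degrees of freedom, independent
of Z, then Z / √(W/m) has a Student t distribution with m degrees of freedom.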
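The definition can be illustrated by simulation. This is a sketch under stated assumptions: the chi-squared variable is generated as the sum of m squared independent standard normals, and the choices m = 10, 20,000 draws and the random seed are arbitrary. For m > 2 the variance of a Student t variable is m / (m − 2), which gives a rough check on the draws.

```python
import random
import statistics


def draw_t(m, rng):
    """One draw of Z / sqrt(W / m), with Z ~ N(0, 1) and W ~ chi-squared(m)."""
    z = rng.gauss(0.0, 1.0)
    # Chi-squared with m degrees of freedom: sum of m squared standard normals,
    # drawn independently of z.
    w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return z / (w / m) ** 0.5


rng = random.Random(0)   # arbitrary seed, for reproducibility
m = 10                   # degrees of freedom (arbitrary choice)
draws = [draw_t(m, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)

print(f"sample mean     = {sample_mean:.3f} (theory: 0)")
print(f"sample variance = {sample_var:.3f} (theory: {m / (m - 2):.3f})")
```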

The sample average:
• The sample average Ȳ of a randomly drawn sample is a random
variable with a probability distribution called the sampling
distribution.
• The mean of Ȳ is E(Ȳ) = μY.
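For a small sample, the sampling distribution of the sample average can be enumerated exactly. The sketch below assumes n = 3 independent draws from the grade distribution used in the expected-value example (the choice of n is arbitrary) and confirms that the mean of the sample average equals the population mean.

```python
from itertools import product

# Population: grade distribution from the expected-value example.
values = [4, 3, 2, 1, 0]
probs = [0.20, 0.50, 0.20, 0.08, 0.02]
n = 3  # sample size (arbitrary, kept small so enumeration is cheap)

population_mean = sum(v * p for v, p in zip(values, probs))

# Enumerate every possible i.i.d. sample of size n and accumulate the
# probability of each value of the sample average.
sampling_dist = {}
for sample in product(range(len(values)), repeat=n):
    prob = 1.0
    total = 0
    for i in sample:
        prob *= probs[i]      # independence: probabilities multiply
        total += values[i]
    average = total / n
    sampling_dist[average] = sampling_dist.get(average, 0.0) + prob

mean_of_average = sum(a * p for a, p in sampling_dist.items())

print(f"population mean        = {population_mean:.2f}")
print(f"mean of sample average = {mean_of_average:.2f}")
```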
Review of Statistics
• Hypothesis Testing for Mean of Samples
• Confidence Interval for Mean of Samples

Hypothesis Testing for MEAN
The hypothesis testing problem (for the mean): make a provisional decision
based on the evidence at hand whether a null hypothesis is true, or instead
that some alternative hypothesis is true. That is, test
• H0: E(Y) = μY,0 vs. H1: E(Y) > μY,0 (1-sided, >)
• H0: E(Y) = μY,0 vs. H1: E(Y) < μY,0 (1-sided, <)
• H0: E(Y) = μY,0 vs. H1: E(Y) ≠ μY,0 (2-sided)
• A statistical test uses the data obtained from a sample to make a decision about
whether the null hypothesis should be rejected.
• The numerical value obtained from a statistical test is called the test value.
• The level of significance is the maximum probability of committing a type I error
(rejecting a null hypothesis that is true). This probability is symbolized by α
(Greek letter alpha). That is, P(type I error) = α.
• The critical value separates the critical region from the noncritical region. The symbol
for critical value is C.V.
• The critical or rejection region is the range of values of the test value that indicates
that there is a significant difference and that the null hypothesis should be rejected.
• The noncritical or nonrejection region is the range of values of the test value that
indicates that the difference was probably due to chance and that the null hypothesis
should not be rejected.
• A one-tailed test indicates that the null hypothesis should be rejected when the test value
is in the critical region on one side of the mean.
• In a two-tailed test, the null hypothesis should be rejected when the test value is in either
of the two critical regions.
1. If σ is known, use the z test. The variable must be normally distributed if n < 30.
2. If σ is unknown but n ≥ 30, use the t test.
3. If σ is unknown and n < 30, use the t test. The variable must be approximately normally distributed.
Finding critical values for the z-statistic
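Instead of reading a z table, the critical value can be computed from the inverse of the standard normal cumulative distribution. A minimal sketch using `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(alpha, tails):
    """Positive critical value of z for a one-tailed or two-tailed test."""
    if tails == 2:
        # Two-tailed: alpha is split equally between the two tails.
        return NormalDist().inv_cdf(1 - alpha / 2)
    # One-tailed: all of alpha sits in a single tail.
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: one-tailed C.V. = {z_critical(alpha, 1):.3f}, "
          f"two-tailed C.V. = ±{z_critical(alpha, 2):.3f}")
```

For a left-tailed test the critical value is the negative of the one-tailed value.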
Finding critical values for the t-statistic
(d.f. = n − 1)
Comments on Student t distribution, ctd.
2. If the sample size is moderate (several dozen) or large (hundreds or more),
the difference between the t-distribution and N(0,1) critical values is
negligible. Here are some 5% critical values for 2-sided tests:

degrees of freedom (n – 1)    5% t-distribution critical value
10                            2.23
20                            2.09
30                            2.04
60                            2.00
∞                             1.96
Running a z test on your data requires five steps:
1. State the null hypothesis and the alternative hypothesis.
2. Choose an alpha level.
3. Find the critical value of z in a z table.
4. Make the decision (use any of the three methods: traditional, p-value, or confidence interval).
5. Summarize the results.
Running a t test on your data requires five steps:

If Yi, i = 1,…, n are i.i.d. N(μY, σY²), then the t-statistic has the Student t-
distribution with n – 1 degrees of freedom.
The critical values of the Student t-distribution are tabulated in the back of all
statistics books. Remember the recipe?
1. State the null hypothesis and the alternative hypothesis.
2. Choose an alpha level.
3. Find the critical value of t in a t table.
4. Make the decision (use any of the three methods: traditional, p-value, or confidence interval).
5. Summarize the results.
Methods used to test hypotheses
The three methods used to test hypotheses are
• 1. The traditional method
• 2. The P-value method
• 3. The confidence interval method
The Traditional Method
Example 1: A researcher wishes to see if the mean number of days that a basic, low-price,
small automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers
has a mean of 30.1 days for basic, low-price, small automobiles. At α = 0.05, test
the claim that the mean time is greater than 29 days. The standard deviation of
the population is 3.8 days.
Example 2: A medical investigation claims that the average number of infections per
week at a hospital in southwestern Pennsylvania is 16.3. A random sample
of 10 weeks had a mean number of 17.7 infections. The sample standard
deviation is 1.8. Is there enough evidence to reject the investigator’s claim
at α = 0.05?
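Both examples can be worked with the traditional method in a few lines. This is a sketch, not the slides' own solution: the automobile example uses a right-tailed z test because the population standard deviation is given, and the infections example uses a two-tailed t test with 9 degrees of freedom because only the sample standard deviation is given. The t critical value 2.262 is taken from a standard t table, and the helper name `compute_test_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def compute_test_value(sample_mean, hypothesized_mean, sd, n):
    """(sample mean - hypothesized mean) / (standard deviation / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sd / sqrt(n))


alpha = 0.05

# Automobile example: H0: mu = 29 vs. H1: mu > 29, sigma known, so z test.
z = compute_test_value(30.1, 29, 3.8, 30)
z_cv = NormalDist().inv_cdf(1 - alpha)        # right-tailed critical value
reject_auto = z > z_cv

# Infections example: H0: mu = 16.3 vs. H1: mu != 16.3, sigma unknown, so t test.
t = compute_test_value(17.7, 16.3, 1.8, 10)
t_cv = 2.262  # from a standard t table: 9 d.f., two-tailed, alpha = 0.05
reject_infections = abs(t) > t_cv

print(f"Automobiles: z = {z:.2f}, C.V. = {z_cv:.3f}, reject H0: {reject_auto}")
print(f"Infections:  t = {t:.2f}, C.V. = ±{t_cv}, reject H0: {reject_infections}")
```

Under these assumptions the automobile test value falls short of the critical value, so the null hypothesis is not rejected, while the infections test value lies in the critical region, so the investigator's claim is rejected.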
p-value Method
A researcher wishes to test the claim that the average cost of tuition and fees
at a four-year public college is greater than $5700. She selects a random
sample of 36 four-year public colleges and finds the mean to be $5950. The
population standard deviation is $659. Is there evidence to support the claim at
α = 0.05? Use the P-value method.
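A sketch of the P-value method for this example, treating it as a right-tailed z test (H0: μ = 5700 vs. H1: μ > 5700) because the population standard deviation is given. The P-value is the area to the right of the test value under the standard normal curve.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
sample_mean, hypothesized_mean = 5950, 5700
sigma, n = 659, 36

# Test value for the z test.
z = (sample_mean - hypothesized_mean) / (sigma / sqrt(n))

# Right-tailed test: P-value = Pr(Z > z).
p_value = 1 - NormalDist().cdf(z)
reject = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Because the P-value is below α = 0.05, the null hypothesis is rejected, which supports the claim that the average cost exceeds $5700.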
Comments on Student t distribution, ctd.
4. You might not know this. Consider the t-statistic testing the hypothesis
that two means (groups s, l) are equal:

t = (Ȳ_s − Ȳ_l) / √(s_s²/n_s + s_l²/n_l) = (Ȳ_s − Ȳ_l) / SE(Ȳ_s − Ȳ_l)

Even if the population distribution of Y in the two groups is normal, this
statistic doesn’t have a Student t distribution!
There is a statistic testing this hypothesis that has a Student t distribution,
the “pooled variance” t-statistic (see SW, Section 3.6). However, the
pooled variance t-statistic is only valid if the variances of the normal
distributions are the same in the two groups. Would you expect this to
be true, say, for men’s v. women’s wages?
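The two-sample t-statistic with separate variances can be sketched as follows. The two samples are hypothetical numbers chosen only to make the arithmetic easy to follow, and the function name `two_sample_t` is illustrative. `statistics.variance` returns the sample variance (divisor n − 1).

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample_s, sample_l):
    """t = (mean_s - mean_l) / sqrt(s_s^2 / n_s + s_l^2 / n_l)."""
    standard_error = sqrt(
        variance(sample_s) / len(sample_s) + variance(sample_l) / len(sample_l)
    )
    return (mean(sample_s) - mean(sample_l)) / standard_error


# Hypothetical data for groups s and l (illustrative only).
group_s = [1, 2, 3, 4, 5]     # mean 3, sample variance 2.5
group_l = [2, 4, 6, 8, 10]    # mean 6, sample variance 10

t_stat = two_sample_t(group_s, group_l)
print(f"t = {t_stat:.3f}")
```

The standard error here does not pool the two variances, so it remains valid when the groups have different variances.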
Confidence Interval
• A 95% confidence interval for μY is an interval that contains the true
value of μY in 95% of repeated samples.
• Digression: What is random here? The values of Y1,...,Yn and thus any
functions of them – including the confidence interval. The confidence
interval will differ from one sample to the next. The population
parameter, μY, is not random; we just don’t know it.
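The "95% of repeated samples" interpretation can be illustrated by simulation. This is a sketch with hypothetical settings: a normal population with μ = 5 and known σ = 2, samples of size 25, 2,000 repetitions, and an arbitrary seed. Each repetition builds the interval Ȳ ± 1.96 σ/√n and records whether it contains μ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(1)                  # arbitrary seed, for reproducibility
mu, sigma = 5.0, 2.0                    # hypothetical population parameters
n, repetitions = 25, 2000

z = NormalDist().inv_cdf(0.975)         # about 1.96 for a 95% interval
half_width = z * sigma / sqrt(n)        # sigma treated as known

covered = 0
for _ in range(repetitions):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    y_bar = mean(sample)
    # The interval is random (it depends on the sample); mu is fixed.
    if y_bar - half_width <= mu <= y_bar + half_width:
        covered += 1

coverage = covered / repetitions
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should land close to 0.95, with some simulation noise.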
Confidence Intervals for the Mean When σY² Is Known
Ȳ ± z(α/2) · σY / √n

Confidence Intervals for the Mean When σY² Is Unknown
Ȳ ± t(α/2) · sY / √n, with n − 1 degrees of freedom
Ten randomly selected people were asked how long they slept at night. The mean time was
7.1 hours, and the standard deviation was 0.78 hour. Find the 95% confidence interval of the
mean time. Assume the variable is normally distributed.
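A sketch of the calculation for the sleep example. Because the population variance is unknown and n = 10, it uses the t distribution with 9 degrees of freedom; the critical value 2.262 is taken from a standard t table.

```python
from math import sqrt

n = 10
sample_mean = 7.1      # hours
sample_sd = 0.78       # hours

t_cv = 2.262  # from a standard t table: 9 d.f., 95% two-sided

half_width = t_cv * sample_sd / sqrt(n)
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"95% confidence interval: {lower:.2f} < mu < {upper:.2f} hours")
```

The interval runs from about 6.54 to about 7.66 hours.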
Testing the Difference Between Two Means of Independent Samples: Using the t Test
Hypothesis Testing for the Difference of Two Means: Independent Samples
Confidence Intervals for the Difference of Two Means: Independent Samples