
**Gov2000: Quantitative Methodology for Political Science I**

Lecture 3: Univariate Statistical Inference

October 1, 2007


Outline

1. Point Estimation
   - Sampling Distributions for Point Estimators
   - Small Sample Properties
   - Large Sample Properties
2. Interval Estimation
   - Sampling Distributions for Interval Estimators
   - Small Sample Properties
   - Large Sample Properties
3. Testing
   - Some Statistical Decision Theory
   - Sampling Distributions for Test Statistics
   - p-Values, Rejection Regions, and CIs


Point Estimation

Suppose we are primarily interested in specific characteristics of the population distribution.

- A parameter is a characteristic of the population distribution (e.g. the mean), and is often denoted with a Greek letter (e.g. $\theta$).
- A statistic is a function of the sample. Often we use a statistic to estimate (or guess) the value of a parameter, and we denote this with a hat (e.g. $\hat\theta$). Such estimation is known as point estimation.
- Point estimators, written as $\hat\theta$ or perhaps $\bar X$, are random quantities.
- Point estimates are realized values of an estimator, and hence they are not random (e.g. $\bar x$).


Consider income data from the 1996 ANES

[Figure: Histogram of income (x-axis: income, 0 to 20; y-axis: Density)]


[Figure: Histogram of income with the Population Density curve (x-axis: income; y-axis: Density)]


The Balance Point for the Density

We may not have enough data to get a good estimate of the density (the infinite-data histogram), but we may have enough data to estimate one characteristic (parameter) of the density. Often we choose the balance point as our parameter of interest. Also known as: expected value, µ, population mean, true mean, true average, infinite-data average.


[Figure: Histogram of income with the Density Balance Point marked (x-axis: income; y-axis: Density)]


Why the balance point?

- It is a reasonable measure for the “center” of the density.
- We have some intuition about balance points.
- The balance point tells us a lot about the normal density.
- Many intuitive estimators for the density balance point have properties that are easy to describe.


Estimators for the Density Balance Point

**Some possibilities for $\hat\mu$:**

1. $Y_1$, the first data observation
2. $\frac{1}{2}(Y_1 + Y_n)$, the average of the first and the last observations
3. the number 7
4. $\bar Y_n = \frac{1}{n}(Y_1 + \cdots + Y_n)$, the sample average

Clearly, some of these estimators are better than others (which ones?), but how can we define “better”?


**Sampling Distributions of Point Estimators**

In order to assess the properties of an estimator, we assume it has a distribution under “repeated sampling”, and we call this distribution a sampling distribution.

Illustrative example: X = the number of times a respondent voted in the last two presidential elections. We will assume three possible values {0, 1, 2}, with

$$P(x) = \begin{cases} 1/4 & x = 0 \\ 1/2 & x = 1 \\ 1/4 & x = 2 \end{cases}$$

and assume n = 2.

Exercise:

1. List all the possible samples.
2. Calculate the probability of each sample under repeated sampling.
3. Form the sampling distribution for the sample mean.
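One way to check answers to this exercise is to enumerate the samples by machine. The Python sketch below uses exact fractions and treats samples as ordered pairs, so (0, 1) and (1, 0) count as different samples; the function name is our own, not from the lecture.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# P(x) from the slide: number of times a respondent voted in the last two elections
P = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def sampling_distribution_of_mean(pmf, n):
    """Exact sampling distribution of the sample mean for an i.i.d. sample of size n."""
    dist = defaultdict(Fraction)                   # missing keys start at Fraction(0)
    for sample in product(pmf, repeat=n):          # step 1: every ordered sample
        prob = Fraction(1)
        for x in sample:                           # step 2: independence, so multiply
            prob *= pmf[x]
        dist[Fraction(sum(sample), n)] += prob     # step 3: pool samples with the same mean
    return dict(sorted(dist.items()))

dist = sampling_distribution_of_mean(P, n=2)
for mean, prob in dist.items():
    print(f"x_bar = {float(mean):.1f}: probability {prob}")
```

For n = 2 this gives P(x̄ = 0) = 1/16, P(x̄ = 0.5) = 1/4, P(x̄ = 1) = 3/8, P(x̄ = 1.5) = 1/4, and P(x̄ = 2) = 1/16, a distribution centered at µ = 1.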


ANES Example

If we think of the data as randomly sampled from a density, then $Y_1, \ldots, Y_n$ are independent and identically distributed (i.i.d.) random variables with $E[Y_i] = \mu$ and $V[Y_i] = \sigma^2$. Then $\hat\mu$, which is a function of $Y_1, \ldots, Y_n$, will be a random variable with its own expectation and variance.


**How to draw a sampling distribution for $\hat\mu$:**

1. Sample an infinite number of data sets of size n.
2. Calculate $\hat\mu$ for each data set.
3. Form an infinite “data” histogram for $\hat\mu$, where the “data” are the $\hat\mu$s from each data set.

The next slide shows an approximation of this procedure for the four proposed estimators. I simulated 10,000 data sets of size n from the density shown at the beginning of the lecture notes.
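The procedure above can be sketched in Python. Because the lecture's income density is not reproduced here, this sketch assumes a hypothetical Gamma(shape 4, scale 4) population (so µ = 16 and σ² = 64), n = 20, and 10,000 simulated data sets; the names muHat1 to muHat4 follow the figure labels, and everything else is our own illustration.

```python
import random
import statistics

# Hypothetical population (an assumption): Gamma(shape=4, scale=4), so mu = 16, sigma^2 = 64
SHAPE, SCALE = 4.0, 4.0
MU, SIGMA2 = SHAPE * SCALE, SHAPE * SCALE ** 2

ESTIMATORS = {
    "muHat1": lambda y: y[0],                  # the first observation
    "muHat2": lambda y: (y[0] + y[-1]) / 2,    # average of first and last observations
    "muHat3": lambda y: 7.0,                   # the number 7
    "muHat4": lambda y: sum(y) / len(y),       # the sample average
}

def simulate_sampling_distributions(n, reps=10_000, seed=2007):
    """Approximate each estimator's sampling distribution with `reps` simulated data sets."""
    rng = random.Random(seed)
    draws = {name: [] for name in ESTIMATORS}
    for _ in range(reps):
        y = [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]
        for name, estimator in ESTIMATORS.items():
            draws[name].append(estimator(y))
    return draws

draws = simulate_sampling_distributions(n=20)
for name, values in draws.items():
    print(f"{name}: mean = {statistics.fmean(values):6.2f}, "
          f"variance = {statistics.pvariance(values):6.2f}")
```

Under these assumptions the theoretical variances are σ² = 64, σ²/2 = 32, 0, and σ²/n = 3.2, so the simulated values should land near those, with means near 16 for every estimator except muHat3.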


[Figure: Simulated sampling distributions of the four estimators, one panel each for muHat1, muHat2, muHat3 (plotted as a probability mass), and muHat4]


Bias

Bias is the expected difference between the estimator and the parameter. Bias is not the difference between an estimate and the parameter.

$$\text{Bias}(\hat\theta) = E[\hat\theta - \theta] = E[\hat\theta] - \theta$$

For example, the sample mean is an unbiased estimator for µ:

$$\text{Bias}(\bar X_n) = E\left[\bar X_n - E[X]\right] = E[\hat\mu - \mu] = 0$$


Example

1. $E[Y_1] = \mu$
2. $E[\frac{1}{2}(Y_1 + Y_n)] = \frac{1}{2}(\mu + \mu) = \mu$
3. $E[7] = 7$
4. $E[\bar Y_n] = \frac{1}{n} n\mu = \mu$

Estimators 1, 2, and 4 all get the right answer on average. Which is better?


[Figure: Simulated sampling distributions of the four estimators (muHat1 to muHat4), repeated]


Election Example

Let π be the proportion of voters who will vote for the Republican candidate in the 2008 general election. Let’s examine two estimators.

1. $\hat\pi = Y_1 = \begin{cases} 1 & \text{if the first respondent votes Republican} \\ 0 & \text{otherwise} \end{cases}$
2. $\hat\pi$ = class guess

Which is unbiased? Which do you prefer?


Variance

All else equal, we prefer estimators with small variance. In particular, if two estimators are unbiased, we prefer the estimator with the smaller variance. Low variance means that under repeated sampling, the estimates are likely to be similar to one another. Note that this doesn’t necessarily mean that a particular estimate is close to the true parameter value. Note also that the standard deviation of a sampling distribution is often called the standard error.


Variance

1. $V[Y_1] = \sigma^2$
2. $V[\frac{1}{2}(Y_1 + Y_n)] = \frac{1}{4} V[Y_1 + Y_n] = \frac{1}{4}(\sigma^2 + \sigma^2) = \frac{1}{2}\sigma^2$
3. $V[7] = 0$
4. $V[\bar Y_n] = \frac{1}{n^2} n\sigma^2 = \frac{1}{n}\sigma^2$

Among the unbiased estimators, the sample average has the smallest variance (for n > 2; when n = 2, Estimators 2 and 4 coincide). This means that Estimator 4 (the sample average) is likely to be closer to the true value µ than Estimators 1 and 2. To fully understand this, it is helpful to look at the sampling distributions again.


[Figure: Simulated sampling distributions of the four estimators (muHat1 to muHat4), repeated]


Properties and comparisons of the estimators

**Recall the definitions of the estimators:**

1. $Y_1$, the first data observation
2. $\frac{1}{2}(Y_1 + Y_n)$, the average of the first and the last observations
3. the number 7
4. $\bar Y_n = \frac{1}{n}(Y_1 + \cdots + Y_n)$, the sample average

From the pictures on the previous slide:

- Estimators 1, 2, and 4 are unbiased.
- Estimator 3 has no variance.
- Estimator 4 has the lowest variance among the unbiased estimators.


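**Least Squares Estimation**

Choose $a$ to minimize the sum of the squared errors:

$$\begin{aligned}
\sum_{i=1}^n (x_i - a)^2 &= \sum_{i=1}^n \{(x_i - \bar x) + (\bar x - a)\}^2 \\
&= \sum_{i=1}^n \left\{(x_i - \bar x)^2 + 2(\bar x - a)(x_i - \bar x) + (\bar x - a)^2\right\} \\
&= \sum_{i=1}^n (x_i - \bar x)^2 + 2(\bar x - a)\sum_{i=1}^n (x_i - \bar x) + \sum_{i=1}^n (\bar x - a)^2 \\
&= \sum_{i=1}^n (x_i - \bar x)^2 + n(\bar x - a)^2
\end{aligned}$$

The last step uses $\sum_{i=1}^n (x_i - \bar x) = 0$. Only the second term depends on $a$, and it is zero exactly when $a = \bar x$, so the sample mean is the least squares estimate.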
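The decomposition can be checked numerically. This Python sketch uses arbitrary simulated data (an assumption; any numbers would do) and confirms both the identity and that a crude grid search for the minimizing $a$ lands on $\bar x$ up to the grid spacing.

```python
import random
import statistics

def sum_squared_errors(x, a):
    """Sum of squared errors when every observation is predicted by the constant a."""
    return sum((xi - a) ** 2 for xi in x)

rng = random.Random(1)
x = [rng.uniform(0, 20) for _ in range(50)]   # arbitrary illustrative data
x_bar = statistics.fmean(x)

# Identity: sum (x_i - a)^2 = sum (x_i - x_bar)^2 + n (x_bar - a)^2
for a in [0.0, 5.0, x_bar, 12.3]:
    lhs = sum_squared_errors(x, a)
    rhs = sum_squared_errors(x, x_bar) + len(x) * (x_bar - a) ** 2
    print(f"a = {a:6.3f}: SSE = {lhs:9.3f}, decomposition = {rhs:9.3f}")

# A grid search over a in [0, 20] with step 0.01 finds (nearly) the sample mean
grid = [i / 100 for i in range(2001)]
a_star = min(grid, key=lambda a: sum_squared_errors(x, a))
print(f"grid minimizer = {a_star:.2f}, sample mean = {x_bar:.4f}")
```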


Best Linear Unbiased Estimator for µ

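Let $X_1, \ldots, X_n \sim_{\text{i.i.d.}} ?(\mu, \sigma^2)$, i.e. draws from some unspecified distribution with mean µ and variance σ². Any estimator of the form $\sum_{i=1}^n w_i X_i$ is a linear estimator for µ. Show that $\bar X$ is the best linear unbiased estimator for µ (i.e. the linear unbiased estimator with the smallest variance).

1. Use $E[\sum_{i=1}^n w_i X_i] = \mu$ to derive something about $\sum_{i=1}^n w_i$.
2. Simplify $V[\sum_{i=1}^n w_i X_i]$.
3. Write each $w_i$ in this simplified expression as $\frac{1}{n} + c_i$.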
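A numerical illustration (not a proof) of the claim: for independent $X_i$ with common variance σ², $V[\sum w_i X_i] = \sigma^2 \sum w_i^2$, and unbiasedness requires $\sum w_i = 1$. The Python sketch below draws random weights of the form $\frac{1}{n} + c_i$ with $\sum c_i = 0$ and compares their variance with the equal-weight (sample mean) case; n = 10, σ² = 4, and the helper names are hypothetical.

```python
import random

def linear_estimator_variance(weights, sigma2):
    """V[sum w_i X_i] = sigma2 * sum(w_i^2), assuming independent X_i with common variance sigma2."""
    return sigma2 * sum(w * w for w in weights)

def random_unbiased_weights(n, rng, spread=0.05):
    """Weights 1/n + c_i with the c_i centered so that sum(w_i) = 1, hence E[sum w_i X_i] = mu."""
    c = [rng.gauss(0.0, spread) for _ in range(n)]
    c_bar = sum(c) / n
    return [1 / n + ci - c_bar for ci in c]

n, sigma2 = 10, 4.0
rng = random.Random(0)
print(f"equal weights (X-bar): variance = {linear_estimator_variance([1 / n] * n, sigma2):.4f}")
for _ in range(5):
    w = random_unbiased_weights(n, rng)
    print(f"sum(w) = {sum(w):.6f}, variance = {linear_estimator_variance(w, sigma2):.4f}")
```

Every unbiased perturbation should print a variance at least σ²/n = 0.4.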


**Mean Square Error**

MSE is the expected squared difference between the estimator and the parameter. MSE is not the squared difference between an estimate and the parameter. Furthermore, MSE can be written as the bias squared plus the variance:

$$\text{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2] = \text{Bias}(\hat\theta)^2 + V(\hat\theta)$$

For example, consider the sample mean:

$$\text{MSE}(\bar X_n) = E[(\bar X_n - \mu)^2] = \text{Bias}(\bar X_n)^2 + V(\bar X_n) = 0 + V(\bar X_n) = \frac{\sigma^2}{n}$$


Example

**Assume an i.i.d. sample and recall the two possible definitions of sample variance:**

$$S_{0n}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 \qquad S_{1n}^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2$$

Which has less bias? Which has smaller variance? Which has smaller MSE?
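A Monte Carlo sketch can answer these questions for one specific case. It assumes normal data with σ² = 1 and n = 10 (assumptions; the answer to the MSE question can change for other populations). For normal data, theory gives $E[S_{0n}^2] = \frac{n-1}{n}\sigma^2$ and $V[S_{1n}^2] = \frac{2\sigma^4}{n-1}$. The sketch also confirms that the simulated MSE equals bias² plus variance.

```python
import random
import statistics

def compare_variance_estimators(n=10, sigma2=1.0, reps=20_000, seed=42):
    """Monte Carlo bias, variance, and MSE of S0^2 (divide by n) and S1^2 (divide by n - 1)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    s0, s1 = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(x) / n
        ss = sum((xi - x_bar) ** 2 for xi in x)
        s0.append(ss / n)
        s1.append(ss / (n - 1))
    results = {}
    for name, draws in (("S0^2", s0), ("S1^2", s1)):
        bias = statistics.fmean(draws) - sigma2
        var = statistics.pvariance(draws)
        mse = statistics.fmean([(d - sigma2) ** 2 for d in draws])
        results[name] = (bias, var, mse)
    return results

for name, (bias, var, mse) in compare_variance_estimators().items():
    print(f"{name}: bias = {bias:+.4f}, variance = {var:.4f}, MSE = {mse:.4f}, "
          f"bias^2 + variance = {bias ** 2 + var:.4f}")
```

For normal data, $S_{1n}^2$ is unbiased, while $S_{0n}^2$ has bias $-\sigma^2/n$ but enough less variance that its MSE, $(2n-1)\sigma^4/n^2$, is smaller than $2\sigma^4/(n-1)$.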


Asymptotic Unbiasedness

$$E[\hat\theta_n] \to \theta \quad \text{as } n \to \infty$$

[Figure: Sampling densities $f(\hat\theta)$ for n = 1, n = 10, and n = 100, with θ marked]


Consistency

An estimator $\hat\theta$ is consistent if it converges in probability to the estimand (parameter of interest):

$$\hat\theta_n \to_p \theta$$


**The Weak Law of Large Numbers Revisited**

If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $-\infty < E[X_1] = \mu < \infty$, then

$$\bar X_n \to_p \mu$$

[Figure: Sampling densities $f(\bar X_n)$ for n = 1, n = 10, and n = 100]
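Convergence in probability says that $P(|\bar X_n - \mu| > \epsilon) \to 0$ for every $\epsilon > 0$. This Python sketch estimates that probability by simulation, assuming Exponential(1) data (so µ = 1) and ε = 0.25; both choices are ours, for illustration only.

```python
import random

def prob_far_from_mu(n, eps=0.25, reps=5_000, seed=7):
    """Monte Carlo estimate of P(|X_bar_n - mu| > eps) for Exponential(1) data, where mu = 1."""
    rng = random.Random(seed)
    mu = 1.0
    far = 0
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        far += abs(x_bar - mu) > eps          # True counts as 1
    return far / reps

for n in (1, 10, 100):
    print(f"n = {n:3d}: P(|X_bar - mu| > 0.25) is about {prob_far_from_mu(n):.3f}")
```

The estimated probability should shrink steadily as n grows, matching the narrowing densities in the figure.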


Asymptotic Sampling Distribution

An estimator $\hat\theta_n$, with possibly unknown sampling distribution, has asymptotic sampling distribution F if

1. $\hat\theta_n$ has a sampling distribution described by cdf $F_n$, and
2. $F_n \to_d F$ as $n \to \infty$.


**The Classical Central Limit Theorem**

If $X_1, X_2, \ldots, X_n, \ldots$ are i.i.d. with $E[X_1] = \mu$, $V[X_1] = \sigma^2$, and $E|X_1|^2 < \infty$, then

$$\sqrt{n}(\bar X_n - \mu) \to_d N(0, \sigma^2)$$

[Figure: Simulated sampling distributions of muHat4 for n = 1, n = 2, n = 10, and n = 30 (y-axis: Density)]
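The CLT can also be watched numerically. The figure uses income data; this Python sketch instead assumes strongly right-skewed Exponential(1) draws (µ = σ = 1) and tracks the skewness of $\sqrt{n}(\bar X_n - \mu)/\sigma$. Theory puts that skewness at $2/\sqrt{n}$, shrinking toward the normal distribution's 0.

```python
import random
import statistics

def standardized_means(n, reps=10_000, seed=11):
    """Draws of sqrt(n) * (X_bar_n - mu) / sigma for Exponential(1) data, where mu = sigma = 1."""
    rng = random.Random(seed)
    return [n ** 0.5 * (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0)
            for _ in range(reps)]

def sample_skewness(z):
    """Third standardized sample moment; 0 for a symmetric distribution such as N(0, 1)."""
    m = statistics.fmean(z)
    s = statistics.pstdev(z)
    return statistics.fmean([((v - m) / s) ** 3 for v in z])

for n in (1, 2, 10, 30):
    z = standardized_means(n)
    print(f"n = {n:2d}: sd = {statistics.pstdev(z):.3f}, skewness = {sample_skewness(z):.3f}")
```

The standard deviation should stay near 1 for every n, while the skewness falls roughly like 2, 1.41, 0.63, 0.37.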


**What is Interval Estimation?**

Point estimates attempt to predict a scalar parameter with a single number. We might want more information about the uncertainty in our estimate, or we may want a bound for an estimate instead of trying to predict the parameter with a single number. Interval estimation accomplishes both of these goals.

For a scalar parameter θ, an interval estimator takes the form

$$[\hat\theta_{\text{lower}}, \hat\theta_{\text{upper}}]$$

where the lower and upper bounds are random quantities. An interval estimate is a realized value from an interval estimator, for example

$$\left[\bar x - 1.96 \cdot \frac{s}{\sqrt{n}},\ \bar x + 1.96 \cdot \frac{s}{\sqrt{n}}\right]$$

where the lower and upper bounds are fixed quantities.


Example: Party ID

QUESTION:
Generally speaking, do you usually think of yourself as a REPUBLICAN, a DEMOCRAT, an INDEPENDENT, or what? Would you call yourself a STRONG [Democrat/Republican] or a NOT VERY STRONG [Democrat/Republican]? Do you think of yourself as CLOSER to the Republican Party or to the Democratic party?

VALID CODES:
0. Strong Democrat (2/1/.)
1. Weak Democrat (2/5-8-9/.)
2. Independent-Democrat (3-4-5/./5)
3. Independent-Independent (3/./3-8-9 ; 5/./3-8-9 if not apolitical)
4. Independent-Republican (3-4-5/./1)
5. Weak Republican (1/5-8-9/.)
6. Strong Republican (1/1/.)


Sampling Distribution for PID Interval Estimator

Let X be a discrete random variable describing PID with the following distribution.

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| f(x) | .16 | .15 | .17 | .10 | .12 | .14 | .16 |

**Consider the following procedure.**

1. Take a random sample of size n.
2. Construct an interval estimate for µ (E[X]) with the form $[\bar x - s, \bar x + s]$.
3. Repeat.
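The procedure can be simulated directly from the PID distribution in the table. The slide does not define s; this Python sketch assumes it is the sample standard deviation (n − 1 divisor), and picks n = 30 and 10 repetitions for illustration. The true mean here is µ = 2.93.

```python
import random
import statistics

# PID distribution from the slide: values 0..6 with probabilities f(x)
PID_VALUES = [0, 1, 2, 3, 4, 5, 6]
PID_PROBS = [0.16, 0.15, 0.17, 0.10, 0.12, 0.14, 0.16]
MU = sum(x * p for x, p in zip(PID_VALUES, PID_PROBS))   # E[X], about 2.93

def pid_interval_estimates(n, reps=10, seed=3):
    """Repeat: draw a sample of size n and form [x_bar - s, x_bar + s], s = sample SD (assumed)."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(reps):
        sample = rng.choices(PID_VALUES, weights=PID_PROBS, k=n)
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)                      # n - 1 divisor
        intervals.append((x_bar - s, x_bar + s))
    return intervals

intervals = pid_interval_estimates(n=30)
for i, (lo, hi) in enumerate(intervals, start=1):
    print(f"sample {i:2d}: [{lo:.2f}, {hi:.2f}]  covers mu: {lo <= MU <= hi}")
```

Each repetition yields a different interval, which is the sense in which an interval estimator has a sampling distribution.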


**Sampling Distribution for PID Interval Estimator**

[Figure: Interval Estimates; one interval per sample (y-axis: sample; x-axis: µ)]


Example: Feeling Thermometer Scores

B1. INTRO THERMOMETERS PRE

Please look at page 2 of the booklet. I’d like to get your feelings toward some of our political leaders and other people who are in the news these days. I’ll read the name of a person and I’d like you to rate that person using something we call the feeling thermometer. Ratings between 50 degrees and 100 degrees mean that you feel favorable and warm toward the person. Ratings between 0 degrees and 50 degrees mean that you don’t feel favorable toward the person and that you don’t care too much for that person. You would rate the person at the 50 degree mark if you don’t feel particularly warm or cold toward the person. If we come to a person whose name you don’t recognize, you don’t need to rate that person. Just tell me and we’ll move on to the next one.


**Clinton and Edwards FTS**

[Figure: Histogram of hcFTS (x-axis: hcFTS, 0 to 100; y-axis: Frequency)]

[Figure: Histogram of jeFTS (x-axis: jeFTS, 0 to 100; y-axis: Frequency)]


**Sampling Distribution for FTS Score Interval Estimator**

[Figure: Clinton FTS Mean Interval Estimates (y-axis: sample; x-axis: $\hat\mu$, 0 to 100)]

[Figure: Edwards FTS Mean Interval Estimates (y-axis: sample; x-axis: $\hat\mu$, 0 to 100)]


Coverage Probability

Coverage probability is the probability that an interval estimator contains the true value of the parameter:

$$P(\hat\theta_{\text{lower}} \le \theta \le \hat\theta_{\text{upper}}) = 1 - \alpha$$

This is usually written as 1 − α (to be explained later).

Question: What is the probability that an interval estimate contains the true value of the parameter? For example,

$$\left[\bar x - 1.96 \cdot \frac{s}{\sqrt{n}},\ \bar x + 1.96 \cdot \frac{s}{\sqrt{n}}\right]$$


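**FTS Example: Mean from Normal Distribution (Variance Known)**

Suppose we assume that JE FTS scores are normally distributed, and we know (somehow) that σ = 25.5. Recall that if $X_1, \ldots, X_n \sim_{\text{i.i.d.}} N(\mu, \sigma^2)$, then

$$\frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

so

$$P\left(-1.96 \le \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 95\%$$

$$P\left(\hat\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \hat\mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$$

which gives the interval

$$\hat\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$$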


Is 95% all there is?

Our 95% CI had the following form:

$$\hat\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$$

Where did the 1.96 come from?

$$P\left(-1.96 \le \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 95\%$$


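100(1 − α)% Confidence Intervals

Let $z_{\alpha/2}$ be the value with $P(Z > z_{\alpha/2}) = \alpha/2$ for $Z \sim N(0, 1)$. Then

$$P\left(-z_{\alpha/2} \le \frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\hat\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \hat\mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

We usually construct the 100(1 − α)% confidence interval with the following formula:

$$\hat\mu \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Question: Why not 100% confidence?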
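In Python, $z_{\alpha/2}$ is available from the standard library's `statistics.NormalDist().inv_cdf`. This sketch wraps the known-σ interval in a small function; σ = 25.5 comes from the slide, while the values of $\hat\mu$ and n are made up for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha):
    """z_{alpha/2}: the standard normal quantile with probability alpha/2 above it."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(mu_hat, sigma, n, alpha=0.05):
    """100(1 - alpha)% CI for mu with sigma known: mu_hat +/- z_{alpha/2} * sigma / sqrt(n)."""
    half_width = z_critical(alpha) * sigma / n ** 0.5
    return mu_hat - half_width, mu_hat + half_width

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z_(alpha/2) = {z_critical(alpha):.3f}")   # 1.645, 1.960, 2.576

# Hypothetical inputs: sigma = 25.5 as on the slide; mu_hat and n are illustrative only
print(z_interval(mu_hat=50.0, sigma=25.5, n=100))
```

As α shrinks, $z_{\alpha/2}$ grows without bound; `inv_cdf(1.0)` raises an error rather than returning a finite value, which mirrors why a 100% interval is not useful.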


FTS Example: Mean from Normal Distribution (Variance Unknown)

Suppose we model JE FTS scores as normally distributed with σ unknown. Recall that if $X_1, \ldots, X_n \sim_{\text{i.i.d.}} N(\mu, \sigma^2)$, then

$$\frac{\hat\mu - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Question: Why can’t our previous interval be used?

$$\hat\mu \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$


Estimating σ and the SE

Recall that the sample variance can be written as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2$$

and that the sample standard deviation is $S = \sqrt{S^2}$. We will plug in S for σ, and our estimated standard error will be

$$\widehat{SE}[\hat\mu] = \frac{S}{\sqrt{n}}$$


Recall the t distribution

If $Z \sim N(0, 1)$, $Y \sim \chi^2_\nu$, and Z and Y are independent, then

$$X \equiv \frac{Z}{\sqrt{Y/\nu}}$$

follows a $t_\nu$ distribution. If a sample $(X_1, \ldots, X_n)$ of any size n is taken from a normal distribution with mean µ and unknown variance, then the sample mean minus µ, divided by the estimated standard error $S/\sqrt{n}$, has the t distribution with ν = n − 1 degrees of freedom.


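100(1 − α)% t-Intervals

$$\frac{\hat\mu - \mu}{\hat\sigma/\sqrt{n}} \sim t_{n-1}$$

$$P\left(-t_{n-1,\alpha/2} \le \frac{\hat\mu - \mu}{\hat\sigma/\sqrt{n}} \le t_{n-1,\alpha/2}\right) = 1 - \alpha$$

$$P\left(\hat\mu - t_{n-1,\alpha/2}\frac{\hat\sigma}{\sqrt{n}} \le \mu \le \hat\mu + t_{n-1,\alpha/2}\frac{\hat\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

We usually construct the 100(1 − α)% confidence interval with the following formula:

$$\hat\mu \pm t_{n-1,\alpha/2}\frac{\hat\sigma}{\sqrt{n}}$$

For a 95% confidence interval, $t_{n-1,\alpha/2}$ is often close to 2.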
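Python's standard library has no t distribution, so this sketch computes $t_{n-1,\alpha/2}$ by integrating the t density with Simpson's rule and inverting the cdf by bisection. In practice `scipy.stats.t.ppf` is the usual library route; it is avoided here only to keep the example dependency-free. All function names are hypothetical.

```python
import math
import statistics

def t_cdf(x, nu, steps=1000):
    """P(T <= x) for Student's t with nu degrees of freedom, via Simpson's rule on [0, |x|]."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    c = math.exp(log_c)

    def pdf(t):
        return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

    a = abs(x)
    h = a / steps                                  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area    # the t density is symmetric about 0

def t_quantile(p, nu):
    """The x with P(T <= x) = p, for 0.5 < p < 1, found by bisection."""
    if not 0.5 < p < 1:
        raise ValueError("this sketch only handles 0.5 < p < 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:                       # widen the bracket until it holds the root
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(sample, alpha=0.05):
    """mu_hat +/- t_{n-1, alpha/2} * S / sqrt(n), where S uses the n - 1 divisor."""
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    se_hat = statistics.stdev(sample) / math.sqrt(n)
    half_width = t_quantile(1 - alpha / 2, n - 1) * se_hat
    return mu_hat - half_width, mu_hat + half_width

for nu in (1, 4, 15, 30, 100):
    print(f"nu = {nu:3d}: t_(nu, 0.025) = {t_quantile(0.975, nu):.3f}")
```

The critical values fall from about 12.7 at ν = 1 toward 1.96 as ν grows, which is why $t_{n-1,\alpha/2}$ is close to 2 for moderate n.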


Asymptotic Coverage Probability

Without making an assumption about the population distribution, we will often not know the sampling distribution of the interval estimator, and therefore we will not know the coverage probability. We may be able to derive the asymptotic coverage probability instead:

$$P(\hat\theta_{\text{lower},n} \le \theta \le \hat\theta_{\text{upper},n}) \to 1 - \alpha \quad \text{as } n \to \infty$$


FTS Example: Mean from Unknown Distribution

Suppose we do not assume a distribution for HC FTS. Recall that if $X_1, \ldots, X_n \sim_{\text{i.i.d.}} ?(\mu, \sigma^2)$, then

$$\frac{\hat\mu_n - \mu}{1/\sqrt{n}} \to_d N(0, \sigma^2)$$

and, since $\hat\sigma_n \to_p \sigma$, it can be shown that

$$\frac{\hat\mu_n - \mu}{\hat\sigma_n/\sqrt{n}} \to_d N(0, 1)$$

Therefore, our normal-quantile confidence intervals will have valid asymptotic coverage (and so will t-quantile intervals).
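Asymptotic coverage can be checked by simulation. Since the HC FTS population is unknown, this Python sketch assumes a strongly skewed stand-in, Exponential(1) data with µ = 1, and estimates the coverage of $\bar x \pm z_{\alpha/2} S/\sqrt{n}$ at several sample sizes; the helper name is hypothetical.

```python
import random
from statistics import NormalDist

def coverage(n, reps=3_000, alpha=0.05, seed=5):
    """Share of intervals x_bar +/- z_{alpha/2} * S / sqrt(n) containing mu = 1 (Exponential(1) data)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = 1.0
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = sum(x) / n
        s = (sum((v - x_bar) ** 2 for v in x) / (n - 1)) ** 0.5   # sample SD, n - 1 divisor
        hits += abs(x_bar - mu) <= z * s / n ** 0.5
    return hits / reps

for n in (5, 20, 100, 250):
    print(f"n = {n:3d}: estimated coverage = {coverage(n):.3f}")
```

With skewed data the coverage falls noticeably short of 95% at n = 5 and approaches the nominal level as n grows, which is exactly what "valid asymptotic coverage" promises and no more.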


[Figure: Densities of the $t_1$, $t_4$, and $t_{15}$ distributions (x-axis: x, from −4 to 4; y-axis: Density)]


**Example: Clinton and Edwards FTS Interval Estimates**

[Figure: Clinton and Edwards 95% CIs, plotted on a $\hat\mu$ axis from 40 to 60]


The Trial Analogy

Suppose we must decide whether to convict or acquit a defendant based on evidence presented at a trial. There are four possible outcomes.

**Table: Decisions and Outcomes**

| Decision | Truth: Guilty | Truth: Innocent |
|---|---|---|
| Convict | Correct | Type I Error |
| Acquit | Type II Error | Correct |

Our goal is to limit the probability of error.


**The Trial Analogy**

Suppose we can somehow model the probabilities for the various outcomes conditional on the true state of the world.

**Table: Probabilities given the true state of the world**

| Decision | Truth: Guilty | Truth: Innocent |
|---|---|---|
| Convict | 1 − β | α |
| Acquit | β | 1 − α |

We would like α and β to be small, but it may be difficult to achieve both goals. The standard statistical approach is to pick a small level for α (e.g. 5%), and then try to minimize β given this constraint.


**The Statistical Version**

Suppose we must decide whether to reject or fail to reject a prior hypothesis about the world (null hypothesis) in favor of an alternative hypothesis.

**Table: Decisions and Outcomes**

| Decision | Truth: Alternative Hypothesis | Truth: Null Hypothesis |
|---|---|---|
| Reject | Correct | Type I Error |
| Fail to Reject | Type II Error | Correct |

**Table: Probabilities given the true state of the world**

| Decision | Truth: Alternative Hypothesis | Truth: Null Hypothesis |
|---|---|---|
| Reject | 1 − β | α |
| Fail to Reject | β | 1 − α |
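The trade-off between α and β can be made concrete for a simple case. This sketch assumes a one-sided z-test with known σ (a simplification of the t-test used later) of $H_0: \mu \le \mu_0$ against $H_1: \mu > \mu_0$, and computes β at a particular true mean; µ0 = 55 and σ = 25.5 echo the slides, while the true mean 58 and the sample sizes are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """beta for the one-sided z-test of H0: mu <= mu0 vs H1: mu > mu0, when the true mean is mu1.

    The test rejects when Z = (x_bar - mu0) / (sigma / sqrt(n)) exceeds z_{1 - alpha}.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / n ** 0.5)    # mean of Z when the true mean is mu1
    return std_normal.cdf(z_crit - shift)        # P(fail to reject | mu = mu1)

# Hypothetical numbers: mu0 = 55, sigma = 25.5, true mean 58
for alpha in (0.10, 0.05, 0.01):
    for n in (50, 200):
        beta = type_ii_error(55, 58, 25.5, n, alpha)
        print(f"alpha = {alpha:.2f}, n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

For a fixed n, lowering α raises β; for a fixed α, a larger sample lowers β. That is why the standard approach fixes α first and then looks for the test (and sample size) with the smallest β.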


Edwards FTS Example

As in our previous example, let µ be the expected value of JE FTS for the population. Let’s assume the population mean for HC FTS is 55 (i.e. equal to its sample mean). Here are two possible hypothesis tests:

- $H_0: \mu = 55$ vs. $H_1: \mu \ne 55$
- $H_0: \mu \le 55$ vs. $H_1: \mu > 55$


Test Statistics

A test statistic is a function of the sample and the null hypothesis (and may provide evidence against the null hypothesis). Examples:

1. If $H_0: \mu = 55$, then $\bar X - 55$ would be a test statistic.
2. If $H_0: \mu \le 55$, then $\bar X - 55$ would be a test statistic.

Why does the second test statistic make sense given the inequality in the null hypothesis?


The One Sample t-Statistic

Let $\mu_0$ be the “null” value of the parameter µ (e.g. 55). Then the one-sample t-statistic can be written as

$$\frac{\bar X - \mu_0}{S/\sqrt{n}}$$

Notice that, being a function of the sample, this t-statistic has a sampling distribution.


Null Distributions for Test Statistics

A null distribution is the sampling distribution of the test statistic when the null hypothesis is true. More exactly, the null distribution is the sampling distribution of the test statistic when $\theta = \theta_0$. For our example, the null distribution is the sampling distribution of the t-statistic

$$\frac{\bar X - 55}{S/\sqrt{n}}$$

when µ = 55.


The Null Distribution for the t-Statistic

Suppose we model JE FTS scores as normally distributed with σ unknown. Recall that if $X_1, \ldots, X_n \sim_{\text{i.i.d.}} N(\mu, \sigma^2)$, then

$$\frac{\bar X - 55}{S/\sqrt{n}} \sim t_{n-1}$$

when µ = 55.


**Null Distribution (µ = 55 and n = 520)**

[Figure: Null Distribution; density f(test statistic) plotted against the test statistic from −3 to 3]


p-Value

The p-value is the probability, under the null distribution, of getting a sample at least as extreme as the one we got. “Extreme” is defined by the alternative hypothesis. Examples:

- $H_1: \mu \ne 55 \Rightarrow$ p-value $= P(t_{\text{stat}} \ge |t_{\text{obs}}| \cup t_{\text{stat}} \le -|t_{\text{obs}}| \mid \mu = 55)$
- $H_1: \mu > 55 \Rightarrow$ p-value $= P(t_{\text{stat}} \ge t_{\text{obs}} \mid \mu = 55)$
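Both p-values can be computed from summary statistics. This sketch uses N(0, 1) as an approximation to the $t_{n-1}$ null distribution, which is an assumption justified here only because n = 520 makes $t_{519}$ very close to normal; the summary statistics passed in are hypothetical, not the actual ANES values.

```python
import math
from statistics import NormalDist

def one_sample_t(x_bar, s, n, mu0):
    """One-sample t-statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def p_values(t_obs):
    """Two- and one-sided p-values, approximating the t_{n-1} null by N(0, 1)."""
    std_normal = NormalDist()
    two_sided = 2 * (1 - std_normal.cdf(abs(t_obs)))   # H1: mu != mu0
    one_sided = 1 - std_normal.cdf(t_obs)              # H1: mu > mu0
    return two_sided, one_sided

# Hypothetical summary statistics (not the actual ANES values)
t_obs = one_sample_t(x_bar=56.8, s=25.5, n=520, mu0=55)
two_sided, one_sided = p_values(t_obs)
print(f"t_obs = {t_obs:.3f}, two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

When $t_{\text{obs}} > 0$, the two-sided p-value is exactly twice the one-sided one, because the two tails of a symmetric null distribution have equal area.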


**One and Two Sided p-values**

[Figure: Two Sided p-value; null density f(test statistic) with t-obs and −t-obs marked]

[Figure: One Sided p-value; null density f(test statistic) with t-obs marked]


Rejection Regions

Recall that α is the probability of Type I Error. Often we want to limit α to 5% while minimizing the probability of Type II Error. This can be accomplished in the following manner.

[Figure: Two Sided Rejection Region (α = 5%); null density f(test statistic) with the fences and t-obs marked]

[Figure: One Sided Rejection Region (α = 5%); null density f(test statistic) with the fence and t-obs marked]
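The fences in these figures are quantiles of the null distribution. This sketch again approximates the null by N(0, 1) (an assumption that fits the large-n example), computes the 5% fences, and applies the decision rule; the function names are hypothetical.

```python
from statistics import NormalDist

def fences(alpha=0.05, two_sided=True):
    """Critical values of an (approximately) N(0, 1) null distribution."""
    std_normal = NormalDist()
    if two_sided:
        c = std_normal.inv_cdf(1 - alpha / 2)
        return -c, c                                # reject if t_obs < -c or t_obs > c
    return (std_normal.inv_cdf(1 - alpha),)         # reject if t_obs > c

def reject(t_obs, alpha=0.05, two_sided=True):
    """Reject H0 exactly when t_obs falls in the rejection region."""
    if two_sided:
        lo, hi = fences(alpha, two_sided=True)
        return t_obs < lo or t_obs > hi
    (c,) = fences(alpha, two_sided=False)
    return t_obs > c

for t_obs in (1.5, 1.8, 2.2):
    print(f"t_obs = {t_obs}: two-sided reject = {reject(t_obs)}, "
          f"one-sided reject = {reject(t_obs, two_sided=False)}")
```

A value such as 1.8 lies beyond the one-sided fence (about 1.645) but inside the two-sided fences (about ±1.96). Equivalently, $t_{\text{obs}}$ falls in the rejection region exactly when its p-value is below α.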


**Rejection Regions and p-values**

Notice the relationship between α and the p-value.

[Figure: Two Sided Rejection Region (α = 5%); null density f(test statistic) with the fences, t-obs, and −t-obs marked]

[Figure: One Sided Rejection Region (α = 5%); null density f(test statistic) with the fence and t-obs marked]


**α Rejection Regions and 1 − α CIs**

[Figure: Rejection Regions and CIs (α = 5%); density f(X | H0) plotted against X from 50 to 60, with the fences and the CI marked]
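The figure's point is a duality: a two-sided level-α test rejects $H_0: \mu = \mu_0$ exactly when $\mu_0$ lies outside the $1 - \alpha$ confidence interval. This sketch checks that over a grid of candidate null values, assuming a normal approximation with a known standard error; $\bar x$ = 56.8 and SE = 1.12 are hypothetical numbers.

```python
from statistics import NormalDist

def z_ci(x_bar, se, alpha=0.05):
    """Normal-approximation 100(1 - alpha)% CI: x_bar +/- z_{alpha/2} * se."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return x_bar - c * se, x_bar + c * se

def two_sided_rejects(x_bar, se, mu0, alpha=0.05):
    """Level-alpha two-sided test of H0: mu = mu0 using |(x_bar - mu0) / se| > z_{alpha/2}."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return abs((x_bar - mu0) / se) > c

# Hypothetical x_bar and standard error; scan candidate null values mu0 = 50, 50.5, ..., 60
x_bar, se = 56.8, 1.12
lo, hi = z_ci(x_bar, se)
for mu0 in [50 + 0.5 * k for k in range(21)]:
    outside = mu0 < lo or mu0 > hi
    assert outside == two_sided_rejects(x_bar, se, mu0)   # reject exactly when mu0 is outside the CI
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```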
