
Practical Statistics I

Georgi Boshnakov georgi.boshnakov@manchester.ac.uk

School of Mathematics The University of Manchester

2010–2011, Semester 2

Part I

Review of basic concepts and terminology

Types of data

Random samples

Population and sample characteristics

Probability distributions

Quantile function

Descriptive statistics and plots

Standard errors

p-values


Statistics and data

Statistics is about making inferences (drawing conclusions) from data.

Types of data

numerical

categorical (factors, nominal): e.g. Male/Female

ordered (ordered factors): e.g. grades: 1st class, upper 2nd, ...

Numerical data

The basic types of numerical data are

continuous (e.g. temperatures)

discrete (e.g. class sizes)

Samples versus complete data

We will assume that the data are representative of the population(s) from which they are drawn but they do not account for every member (subject, item) of the population.

Elections for parliament

Before election—surveys on samples of voters with the aim to make inference about the number of seats for each party, i.e. about the distribution of the voters’ preferences in the population.

After election—count the number of seats for each party. It does not make sense to make inference from subsamples.

We will be concerned mainly with inference from samples but some of the descriptive methods are useful for complete populations as well.


Random samples

A set of data, x_1, ..., x_n, is said to be a random sample if x_1, ..., x_n are observations on some random variables X_1, ..., X_n.

The random variables X_1, ..., X_n are independent and identically distributed (i.i.d.).

Characteristics of the population distribution.

The common distribution of X_1, ..., X_n is known as the population distribution or the underlying distribution.

The mean of the population distribution is known as the population mean.

The variance of the population distribution is known as the population variance.

This terminology extends naturally to other characteristics of the population distribution.

Population and sample characteristics

Population characteristics should not be confused with sample characteristics. The latter are computed from the sample.

For example, the sample mean x̄ = (1/n) Σ_{i=1}^n x_i and the population mean µ = E X_i are different things.

Also, if we take another sample from the same distribution, then we will almost certainly get another value for x̄, whereas µ remains the same.


Sample characteristics as random variables

The term "sample mean" has two meanings, which are normally clear from the context.

1. The number x̄ = (1/n) Σ_{i=1}^n x_i calculated from the data.

2. The random variable X̄ = (1/n) Σ_{i=1}^n X_i.

When we talk about the distribution of the sample mean we have the random variable X̄ = (1/n) Σ_{i=1}^n X_i in mind.

A similar note applies to all sample characteristics of a distribution/population.

Some terminology conventions

Lectures, notes, textbooks

In my lectures and in the notes I normally omit the qualifier "population" and simply say mean, variance, distribution, etc.

On the other hand, I usually say sample mean, sample variance, etc., i.e. the qualiﬁer sample is (almost) always present.

Computer output

In computer output the qualiﬁer sample is always omitted.


Statistics

Deﬁnition 3.1

Any quantity computed from the data is called a statistic.

Examples

Sample mean

Sample variance

Sample median

The smallest observation in a sample

Some continuous distributions

N(µ, σ²) — Normal (Gaussian) distribution,

f(x) = (1/(σ√(2π))) e^{-(1/2)((x − µ)/σ)²}.

Expo(λ) — Exponential distribution with rate λ,

f(x) = λ e^{−λx} for x ≥ 0.

(mean = 1/λ, variance = 1/λ²)

Gamma(α, β) — Gamma distribution with shape α and scale β,

f(x) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β}, for x ≥ 0.

(mean = αβ, variance = αβ²)
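As a quick numerical check of the means and variances above, here is a Python sketch (Python and scipy are used purely for illustration and are not part of the notes; the parameter values are arbitrary):

```python
# Numerical check of the moments listed above (scipy assumed;
# the parameter values are arbitrary illustrations).
from scipy import stats

# Normal N(mu, sigma^2); like R, scipy is parameterised by (mu, sigma).
norm = stats.norm(loc=1.0, scale=2.0)      # mu = 1, sigma = 2
print(norm.mean(), norm.var())             # 1.0 4.0

# Exponential with rate lambda = 0.5; scipy uses scale = 1/lambda.
expo = stats.expon(scale=1 / 0.5)
print(expo.mean(), expo.var())             # 1/lambda = 2, 1/lambda^2 = 4

# Gamma with shape alpha = 3 and scale beta = 2.
gam = stats.gamma(a=3, scale=2)
print(gam.mean(), gam.var())               # alpha*beta = 6, alpha*beta^2 = 12
```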


Some discrete distributions

Bernoulli(p)

P(X = k) = p when k = 1,  1 − p when k = 0.

(mean = p, variance = p(1 − p))

Binom(n, p)

(mean = np, variance = np(1 − p))

If X_1, ..., X_n are i.i.d. Bernoulli(p), then X_1 + ··· + X_n is Binomial(n, p).

Pois(λ)

P(X = k) = (λ^k / k!) e^{−λ} for k = 0, 1, 2, ...
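The statement that a sum of i.i.d. Bernoulli(p) variables is Binomial(n, p) can be checked by simulation (an illustrative Python sketch, numpy assumed; the numbers are made up):

```python
import numpy as np

# Simulation check: a sum of n i.i.d. Bernoulli(p) variables should
# behave like a Binomial(n, p) variable, with mean np and
# variance np(1 - p).
rng = np.random.default_rng(0)
n, p = 10, 0.3
reps = 100_000

# Each row is one realisation of (X_1, ..., X_n); sum each row.
sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

print(sums.mean())  # close to np = 3
print(sums.var())   # close to np(1 - p) = 2.1
```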

Parameterisations of distributions

Normal distribution

µ, σ² — almost universally adopted, more convenient mathematically.

µ, σ — more intuitive, adopted by R (among others).

Exponential distribution

rate λ — almost universally adopted.

mean µ = 1/λ — more intuitive in some cases.


Quantile function

Let F be a cumulative distribution function (cdf), p be a number in the interval (0, 1), and x_p be a value such that

F(x_p) = p.

Then we say that x_p is the p-th quantile of F.

If F is strictly increasing, then the quantile can be written using the inverse cdf as x_p = F^{-1}(p). If F is not strictly increasing, then F^{-1} is not uniquely defined for some values of the argument. In such cases we choose the smallest value of x which satisfies the equation F(x) = p as the value of the inverse. More formally,

F^{-1}(p) = inf{x : F(x) ≥ p},   0 < p < 1.

The inverse cdf, F^{-1}, is also called the quantile function and we will denote it by Q(p), i.e. Q(p) = F^{-1}(p).

Descriptive statistics and plots

Counts

total # of observations (rows)
# of non-missing observations
# of missing observations

Measures of location

Mean
Median
Mode

Quartiles

Lower quartile
Upper quartile
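Returning to the quantile function: the "smallest x such that F(x) ≥ p" convention can be sketched for a step-function cdf. Here is a Python illustration for a hypothetical three-point discrete distribution (the probabilities are made up):

```python
import bisect

# Quantile function Q(p) = inf{x : F(x) >= p} for a discrete distribution,
# illustrating the "smallest x" convention when F is a step function.
# Hypothetical distribution: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3.
values = [1, 2, 3]
cum_probs = [0.2, 0.7, 1.0]   # F evaluated at each support point

def Q(p):
    # Smallest support point x with F(x) >= p.
    idx = bisect.bisect_left(cum_probs, p)
    return values[idx]

print(Q(0.2))   # 1  (F(1) = 0.2 >= 0.2, so the infimum is attained at 1)
print(Q(0.5))   # 2
print(Q(0.95))  # 3
```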

Descriptive statistics and plots (cont.)

Measures of dispersion

Sample statistics:

Standard deviation
Variance
Range
Inter-quartile range
Coefficient of variation

Measures of shape

Statistic        Definition             Estimate
Skewness         E(X − µ)^3 / σ^3       (1/n) Σ (x_i − x̄)^3 / s^3
Kurtosis         E(X − µ)^4 / σ^4       (1/n) Σ (x_i − x̄)^4 / s^4
Kurtosis excess  E(X − µ)^4 / σ^4 − 3   (1/n) Σ (x_i − x̄)^4 / s^4 − 3

Descriptive statistics and plots (cont.)

Plots

Histogram

Box plot (a.k.a. Box-and-Whiskers plot)


Standard error of the sample mean

Suppose that we wish to estimate the mean of a population using a random sample X_1, ..., X_n.

The sample mean, X̄, is used routinely for this purpose, but how good is it?

Let µ be the population mean and σ² the population variance. Let also, as usual,

X̄ = (1/n) Σ_{i=1}^n X_i,   s² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)².

From probability theory (or from simple calculations) we know that

Var X̄ = σ²/n,

i.e. the standard deviation of X̄ is Std X̄ = σ/√n.

A large value of the standard deviation, Std X̄, suggests a bad estimate of the mean. A small value of the standard deviation, Std X̄, suggests a good estimate of the mean.

Standard error of the sample mean (cont.)

In practice, we usually do not know σ and use s in its place. This gives the estimated standard deviation, s/√n, of X̄.

The estimated standard deviation of X̄ is normally referred to as the standard error of X̄. We denote the standard error of X̄ by S_x̄.

From the above we have S_x̄ = s/√n.
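The computation S_x̄ = s/√n can be sketched for a small sample (a Python illustration; the numbers are made up):

```python
import math

# Standard error of the sample mean, S_xbar = s / sqrt(n), computed
# from the definition for a small illustrative sample.
x = [4.1, 5.3, 4.8, 6.0, 5.1, 4.7]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance, divisor n - 1
se = math.sqrt(s2) / math.sqrt(n)                   # standard error of xbar
print(round(xbar, 3), round(se, 3))
```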

Estimated standard deviations are useful characteristics of estimators of parameters other than the mean. Hence, the following deﬁnition.


Standard errors

Deﬁnition 7.1

The estimated standard deviation of the estimate of a parameter is called standard error.

Notation

If θ̂ is an estimate of a parameter θ, then we denote the standard error of θ̂ by S_θ̂.

Typical interpretation (but not the only possible)

Small standard error suggests that the estimate is good.

Large standard error suggests that the estimate is bad.

Standard errors for estimates of parameters are useful in interpretation and evaluation of statistical models and are routinely produced by statistical software.

P-values

Problem: Test H_0 against H_A at level of significance α.

Typical values for α: 0.05, 0.01, 0.1, 0.001.

Instead of reporting critical values and critical regions, computer software gives a more versatile statistic, the p-value.

Deﬁnition 8.1

The p -value is the smallest signiﬁcance level at which we would reject H 0 in favour of H A .

So,

reject H_0 in favour of H_A when α ≥ p,
retain H_0 when α < p.

Equivalently, the p -value is the probability, when H 0 is true, for the test statistic to be less favourable for H 0 than the observed value of the test statistic.
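The decision rule "reject H_0 when α ≥ p" can be illustrated with any test that reports a p-value. Here is a Python sketch using a one-sample t-test from scipy (the test, the data, and the hypothesised mean are made up for illustration):

```python
# The decision rule: reject H0 at level alpha exactly when alpha >= p.
from scipy import stats

x = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.2, 5.7]
t_obs, p_value = stats.ttest_1samp(x, popmean=5.0)  # H0: mu = 5

for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if alpha >= p_value else "retain H0"
    print(f"alpha = {alpha:.2f}: p = {p_value:.4f} -> {decision}")
```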


Informal interpretation of p-values

The equivalent deﬁnition of p-values is convenient for computation.

Example 8.2

Suppose that in a KS test we have d_n = 0.4. Since we reject H_0 when d_n is large, less favourable for H_0 values are those greater than 0.4. Hence, the p-value here is Pr(D_n > 0.4).

Interpretation of p-values

In practice p values are often interpreted more informally.

Notice that H_0 is rejected for "almost any" α when the p-value is very small. For example, if p = 10^{-5} you may see expressions such as the following.

The null hypothesis is rejected at any typical significance level since p < 10^{-4}.

Informal interpretation of p-values (cont.)

Another useful way to communicate the information obtained from a p -value is to use expressions like the following.

p > 0.1 — The data gives no evidence against H_0. The data seems consistent with H_0.

0.05 < p < 0.1 — The data gives no evidence against H_0, but further investigation may be needed.

0.01 < p < 0.05 — The data gives evidence to reject H_0 in favour of H_A.

0.001 < p < 0.01 — The data gives strong evidence to reject H_0 in favour of H_A.

p < 0.001 — The data gives very strong evidence to reject H_0 in favour of H_A.

These are only guiding examples. The borders and the language used are subjective and may depend on the application.


Part II

Bivariate data

Linear correlation

Spearman's rank correlation coefficient

Order statistics and ranks

Pearson's sample correlation coefficient

Let (x_i, y_i), i = 1, ..., n be n pairs of numbers. Let

x̄ = (1/n) Σ_{i=1}^n x_i,   ȳ = (1/n) Σ_{i=1}^n y_i,

s_xx = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²,   s_yy = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)²,

s_xy = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ).

The Pearson sample correlation coefficient between x and y is defined as

r_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / √( Σ_{i=1}^n (x_i − x̄)² Σ_{i=1}^n (y_i − ȳ)² ) = s_xy / (√s_xx √s_yy).

r xy is usually referred to as sample correlation coeﬃcient or correlation coeﬃcient.
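Computing r_xy directly from the definition can be sketched as follows (a Python illustration with made-up numbers):

```python
import math

# Pearson's sample correlation coefficient r_xy = s_xy / (sqrt(s_xx) sqrt(s_yy)),
# computed from the definition for a small made-up dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
r = sxy / (math.sqrt(sxx) * math.sqrt(syy))
print(round(r, 4))   # close to 1: strong positive linear relationship
```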


Correlation coeﬃcient: numerical properties

−1 ≤ r_xy ≤ 1.

r_xy = ±1 if and only if there is a perfect linear relationship between y_i and x_i, i.e. there exist constants a and b such that y_i = a + b x_i for i = 1, ..., n.

r_xy is a measure of how well the points (x_i, y_i), i = 1, ..., n, are approximated by a straight line.

The value of r xy does not change if one or both variables are linearly transformed. In particular, r xy does not depend on the units of measurement of the two variables.

r xy = r yx , i.e. r xy does not depend on which variable is called x .

Correlation coeﬃcient: interpretation

The value of the sample correlation coefficient is associated in the following way with the pattern of a scatter plot of the data points (x_i, y_i), i = 1, ..., n.

r_xy = ±1 — perfect linear relation, all points lie on a single straight line.

r_xy = 0 — no linear relationship between the points.

r_xy close to 1 — strong positive linear relationship, i.e. larger xs tend to be paired with larger ys.

r_xy close to −1 — strong negative linear relationship, i.e. larger xs tend to be paired with smaller ys.

Importance of the qualiﬁer “linear”

There may be a strong, even perfect, non-linear relationship between the points when |r_xy| < 1. This includes the case r_xy = 0.

Never forget this.


Sample correlation coeﬃcient as estimator

Let z_i = (x_i, y_i), i = 1, ..., n be a random sample from a bivariate distribution with distribution function F(x, y).

In other words, z_i, i = 1, ..., n is a realization of n independent bivariate random variables, Z_i = (X_i, Y_i), i = 1, ..., n, all having the same distribution F. Let

µ_x = E X_i,   µ_y = E Y_i,

σ_x² = Var X_i,   σ_y² = Var Y_i,

ρ = E((X_i − µ_x)(Y_i − µ_y)) / (σ_x σ_y).

In this case the sample correlation coefficient, r_xy, is an estimate of the population correlation coefficient, ρ, between X and Y.

A test for zero correlation

In data analysis it is often important to test if two variables are independent. This is a diﬃcult task but in many cases it is suﬃcient to answer the simpler question.

Are the variables linearly dependent?

We will devise a test based on the following result.

Theorem 9.1

If ρ = Corr(X_i, Y_i) = 0, and X_i, Y_i, i = 1, ..., n, are independent and jointly normally distributed, then the statistic

t = r_xy √(n − 2) / √(1 − r_xy^2)

has a Student's t distribution with n − 2 degrees of freedom.


A test for zero correlation (cont.)

Given the data (x_i, y_i), i = 1, ..., n, we wish to test

H_0: ρ = 0   vs   H_A: ρ ≠ 0,

at a level of significance α.

When H_0 is true, the statistic t = r_xy √(n − 2) / √(1 − r_xy^2) ∼ t_{n−2}.

1. Compute the critical value t_{n−2; α/2} and set the critical region (CR) to {t : |t| > t_{n−2; α/2}}.

2. Compute r_xy.

3. Compute the observed value of the test statistic t_obs = r_xy √(n − 2) / √(1 − r_xy^2).

4. Reject H_0 in favour of H_A if t_obs is in the critical region and retain H_0 otherwise (i.e. reject if |t_obs| > t_{n−2; α/2} and retain otherwise).
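The four steps can be sketched in Python (scipy assumed; the data and α = 0.05 are made up for illustration):

```python
import math
from scipy import stats

# The four steps of the zero-correlation test, for illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 1.9, 3.8, 3.2, 5.1, 4.8, 6.2, 6.6]
n = len(x)
alpha = 0.05

# Step 1: critical value t_{n-2; alpha/2}.
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# Step 2: sample correlation coefficient.
r, _ = stats.pearsonr(x, y)

# Step 3: observed test statistic.
t_obs = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Step 4: decision.
print("reject H0" if abs(t_obs) > t_crit else "retain H0")
```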

Conﬁdence intervals for the correlation coeﬃcient

Let Z_i = (X_i, Y_i), i = 1, ..., n be a random sample from a bivariate distribution with distribution function F(x, y) and correlation coefficient ρ.

We wish to construct a confidence interval for ρ with coverage probability 1 − α.

To use the sample correlation coefficient, r_xy, we need its distribution, which is rather complicated. We will use the following approximate result.

Let R be the sample correlation coefficient. Consider the following transformation of R:

Z = (1/2) ln((1 + R)/(1 − R)).

Result 1

The distribution of the random variable Z is approximately normal with mean (1/2) ln((1 + ρ)/(1 − ρ)) and variance 1/(n − 3).


CI for the correlation coeﬃcient (cont.)

The transformation Z = (1/2) ln((1 + R)/(1 − R)) is often referred to as Fisher's z-transform.

Standard calculations can be used to show (bonus qu.!) that a 100(1 − α)% CI for ρ is

( (1 + r − (1 − r)v) / (1 + r + (1 − r)v) ,  (1 + r − (1 − r)u) / (1 + r + (1 − r)u) ),

where u = e^{−2 z_{α/2}/√(n−3)}, v = e^{2 z_{α/2}/√(n−3)}, and z_{α/2} is the upper α/2 quantile of the standard normal distribution.
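A numerical check of this interval, and of the fact that it agrees with transforming the endpoints ẑ ± z_{α/2}/√(n − 3) back with tanh (a Python sketch; the values r = 0.6, n = 30 are made up):

```python
import math
from scipy import stats

# 100(1 - alpha)% CI for rho via Fisher's z-transform, for
# illustrative values r = 0.6, n = 30, alpha = 0.05.
r, n, alpha = 0.6, 30, 0.05
z_half = stats.norm.ppf(1 - alpha / 2)      # upper alpha/2 quantile

u = math.exp(-2 * z_half / math.sqrt(n - 3))
v = math.exp(2 * z_half / math.sqrt(n - 3))
lower = (1 + r - (1 - r) * v) / (1 + r + (1 - r) * v)
upper = (1 + r - (1 - r) * u) / (1 + r + (1 - r) * u)

# Same interval via tanh(atanh(r) +/- z_{alpha/2} / sqrt(n - 3)).
lo2 = math.tanh(math.atanh(r) - z_half / math.sqrt(n - 3))
hi2 = math.tanh(math.atanh(r) + z_half / math.sqrt(n - 3))
print(round(lower, 4), round(upper, 4))   # agrees with (lo2, hi2)
```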

Spearman’s rank correlation coeﬃcient

Let Z_i = (X_i, Y_i), i = 1, ..., n be a random sample from a bivariate distribution with distribution function F(x, y). Let

R_i = rank of X_i in the sample X_1, ..., X_n,
S_i = rank of Y_i in the sample Y_1, ..., Y_n.

Now we consider the pairs (R_i, S_i) for i = 1, 2, ..., n. If the Z_i's are correlated, then we expect that the ranks (R_i, S_i) will be correlated too. The following notation is useful:

D_i = R_i − S_i,   i = 1, ..., n.


Spearman’s rank correlation coeﬃcient (cont.)

Definition 10.1

Spearman's rank correlation coefficient for Z_i = (X_i, Y_i), i = 1, ..., n, is defined as

ρ = Σ_{i=1}^n (R_i − R̄)(S_i − S̄) / √( Σ_{i=1}^n (R_i − R̄)² Σ_{i=1}^n (S_i − S̄)² ),

where

R̄ = (1/n) Σ_{i=1}^n R_i,   S̄ = (1/n) Σ_{i=1}^n S_i.

So, Spearman’s ρ is obtained by replacing the original data ( X i , Y i ) by their ranks R i and S i , respectively, and calculating the ordinary sample correlation coeﬃcient of ( R i , S i ).

Spearman's rank correlation coefficient (cont.)

When there are no ties in the data, an equivalent expression for Spearman’s ρ may be given in terms of the diﬀerences

D_i = R_i − S_i:

ρ = 1 − 6 Σ_{i=1}^n D_i² / (n(n² − 1)).

Properties of Spearman’s ρ

−1 ≤ ρ ≤ 1.

ρ = 1 if there is a perfect match between the ranks of the Xs and Ys.

ρ = −1 if the Ys are in reverse order of that of the corresponding Xs.

ρ = 0 when there is no correlation between the ranks of the X s and the Y s.
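Spearman's ρ can be computed both as the sample correlation of the ranks and via the D_i formula; with no ties the two agree. A Python sketch with made-up data (scipy assumed):

```python
from scipy import stats

# Spearman's rho two ways: as the Pearson correlation of the ranks,
# and via the D_i formula (valid here since there are no ties).
x = [10, 20, 30, 40, 50]
y = [1.2, 0.9, 3.4, 2.8, 5.0]

rx = stats.rankdata(x)
ry = stats.rankdata(y)
rho_via_ranks, _ = stats.pearsonr(rx, ry)

n = len(x)
d = rx - ry
rho_via_d = 1 - 6 * sum(d ** 2) / (n * (n ** 2 - 1))

print(rho_via_ranks, rho_via_d)   # the two values coincide
```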


Order statistics

Let X_1, ..., X_n be a random sample from a distribution with cdf F. Let

X_(1) = the smallest among X_1, ..., X_n,
X_(2) = the second smallest among X_1, ..., X_n,
...
X_(r) = the r-th smallest among X_1, ..., X_n,
...
X_(n) = the largest among X_1, ..., X_n.

We have

X_(1) ≤ X_(2) ≤ ··· ≤ X_(n).

The random variables X_(1), ..., X_(n) are termed the order statistics of the random sample X_1, ..., X_n. The r-th smallest, X_(r), is called the r-th order statistic.

Ranks

Let x_1, ..., x_n be a sequence of numbers.

Definition (i): The number of observations less than or equal to x_i is called its rank.

An alternative definition, which may be easier to grasp, is:

Definition (ii): The rank of x_i is the position of x_i in the ordered sequence x_(1), ..., x_(n).

The rank will be denoted by rank(x_i).

These two definitions are equivalent when all elements in the sequence are different. If some elements of x_1, ..., x_n are equal (such elements are said to be tied), then the second definition is ambiguous. Moreover, there is more than one "sensible" way to define ranks of tied data.


Ranks (cont.)

Suppose that

x_(l) < x_(l+1) = x_(l+2) = ··· = x_(l+m) < x_(l+m+1).

Here are some standard ways to deal with ties:

Assign rank l + 1 to the tied data (sport).

Assign rank l + m to the tied data (first definition above).

Assign the average rank to the tied data: ((l + 1) + (l + 2) + ··· + (l + m)) / m.

Randomly assign ranks l + 1, ..., l + m to the tied data.

Ranks of the variables in a random sample X_1, ..., X_n are defined in the same way. In this case rank(X_i) is random as well.
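These tie-handling conventions correspond to the method argument of scipy's rankdata, shown here on a small made-up sequence in which the three tied values occupy positions 2, 3, 4 of the sorted order (l = 1, m = 3 in the notation above):

```python
from scipy import stats

# Tie-handling conventions via scipy.stats.rankdata.
# For x = [7, 3, 5, 5, 5, 9] the three 5s are tied in positions 2, 3, 4.
x = [7, 3, 5, 5, 5, 9]
print(stats.rankdata(x, method="min"))      # sport: tied values get rank l+1 = 2
print(stats.rankdata(x, method="max"))      # first definition: rank l+m = 4
print(stats.rankdata(x, method="average"))  # average rank: (2+3+4)/3 = 3
```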

Part III

Simple linear regression

Estimating a mean

Diversion: standard errors and signiﬁcance

The simple regression model

The model

Least squares estimation

Motivation

Least squares estimator

Derivation of the normal equations

Fitted values and residuals


Estimating a mean

Let Y_i ∼ N(µ, σ²), i = 1, ..., n, be i.i.d. random variables. Let ε_i = Y_i − µ for i = 1, ..., n. Then

Y_i = µ + ε_i,   i = 1, ..., n,

where ε_i ∼ N(0, σ²) are i.i.d.

This is an instance of a model of the form

data = mean function + error ,

where

the mean function is the best guess about the data that can be made from the available sources.

the error term cannot be predicted from the available information.

Estimating a mean (cont.)

In this case the mean function is a constant. Usually we do not know the mean function, µ, but we can estimate it by the sample mean,

µ̂ = Ȳ = (1/n) Σ_{i=1}^n Y_i.

Also, we estimate σ² by the sample variance,

σ̂² = (1/(n−1)) Σ_{i=1}^n (Y_i − Ȳ)².

The variance of µ̂ is σ²/n. When we replace σ² by σ̂² in the last expression we obtain the estimated variance, σ̂²/n, of µ̂.

The standard error of µ̂ is

S_µ̂ = √(σ̂²/n) = σ̂/√n.

The test statistic for the null hypothesis H_0: µ = µ_0 is (µ̂ − µ_0)/S_µ̂ = √n (Ȳ − µ_0)/s and its distribution under H_0 is t_{n−1}.
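The statistic √n (Ȳ − µ_0)/s can be computed from scratch and compared with scipy's built-in one-sample t-test (a Python sketch; the sample and µ_0 are made up):

```python
import math
from scipy import stats

# The t statistic sqrt(n) * (Ybar - mu0) / s and its two-sided p-value
# under H0: mu = mu0, for a small made-up sample.
y = [10.2, 9.8, 11.1, 10.6, 9.9, 10.8, 10.4]
mu0 = 10.0
n = len(y)
ybar = sum(y) / n
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
t_obs = math.sqrt(n) * (ybar - mu0) / s
p = 2 * stats.t.sf(abs(t_obs), df=n - 1)   # two-sided p-value from t_{n-1}
print(round(t_obs, 3), round(p, 3))
```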

Diversion: standard errors and signiﬁcance

A very useful characteristic of an estimated parameter is

given in the following deﬁnition.

Deﬁnition 12.1

The estimated standard deviation of an estimate of a parameter is called a standard error.

Testing the signiﬁcance of a parameter

We say that a parameter is signiﬁcant when we decide to reject the hypothesis that it is equal to zero. The alternative hypothesis is usually that the parameter is not zero.

Interpretation

If a parameter turns out to be signiﬁcant in an analysis, we

may interpret this as evidence that the corresponding predictor variable is important and should not be omitted from the model. We say also that the data provide evidence

against the claim that the parameter is zero.

Simple linear regression model

Consider a dataset consisting of n pairs (x_i, y_i), i = 1, ..., n.

The simple linear regression model describes the relationship between the xs and the ys by the equation

Y_i = α + βx_i + ε_i,   i = 1, 2, ..., n.

Terminology:

x is the predictor (or independent) variable.

y is the response (or dependent) variable.

ε is the error variable.

We will assume that

the ε_i's are jointly i.i.d. random variables with E ε_i = 0 and common variance Var ε_i = σ²;

the xs are non-random variables;

the ε_i are normally distributed.

Y_i = α + βx_i + ε_i,   i = 1, 2, ..., n.

α + βx i is the mean function.

α and β are the (regression) parameters.

The model is linear because the mean function is a linear combination of the parameters.

The model is simple linear since there is only one predictor variable and the mean function is a straight line as a function of x .

From the assumption that ε i is random it follows that Y i is also random.

Does the assumption that the ε_i's are identically distributed imply that Y_1, ..., Y_n are also identically distributed?

Motivation. Sum of squares

We do not know the parameters of the model, but for any given (b_0, b_1) we can compute deviations defined by

e_i ≡ e_i(b_0, b_1) = Y_i − b_0 − b_1 x_i,   i = 1, ..., n.

It seems natural to estimate (α, β) by values of (b_0, b_1) which make the deviations e_i "small". To quantify the meaning of "small" we define the sum of squares of the deviations. For any (b_0, b_1) let

S(b_0, b_1) = Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − b_0 − b_1 x_i)².

Least squares estimator

Principle of least squares

Estimate the parameters by values that make the sum of squares as small as possible.

Applied to the simple regression model, the principle of least squares leads to the following deﬁnition.

Definition 13.1

The pair (α̂, β̂) is a least squares estimator (l.s.e.) of (α, β) if S(α̂, β̂) ≤ S(b_0, b_1) for any choice of (b_0, b_1).

Derivation of the normal equations

Let S be the sum of squares,

S(b_0, b_1) = Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − b_0 − b_1 x_i)².

Its minimum may be found by solving the system

∂S/∂b_0 = 0,   ∂S/∂b_1 = 0.

Note that
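A numerical sketch of where the minimisation leads: the standard closed-form least squares estimates β̂ = Σ(x_i − x̄)(Y_i − Ȳ)/Σ(x_i − x̄)² and α̂ = Ȳ − β̂x̄ solve this system. The closed form is stated here as an assumption (the notes have not yet derived it at this point), and the data are made up:

```python
# Least squares for the simple regression model, using the standard
# closed-form solution of the normal equations (assumed, not derived here).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.9, 6.1, 8.2, 9.6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
beta_hat = sxy / sxx              # slope estimate
alpha_hat = ybar - beta_hat * xbar  # intercept estimate
print(round(alpha_hat, 3), round(beta_hat, 3))
```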