
Practical Statistics I

Georgi Boshnakov georgi.boshnakov@manchester.ac.uk

School of Mathematics The University of Manchester

2010–2011, Semester 2

Part I

Review of basic concepts and terminology

Types of data

Random samples

Population and sample characteristics

Probability distributions

Quantile function

Descriptive statistics and plots

Standard errors

p-values


Statistics and data

Statistics is about making inferences (drawing conclusions) from data.

Types of data

numerical

categorical (factors, nominal): e.g. Male/Female

ordered (ordered factors): e.g. grades: 1st class, upper 2nd, ...

Numerical data

The basic types of numerical data are

continuous (e.g. temperatures)

discrete (e.g. class sizes)

Samples versus complete data

We will assume that the data are representative of the population(s) from which they are drawn but they do not account for every member (subject, item) of the population.

Elections for parliament

Before election—surveys on samples of voters with the aim to make inference about the number of seats for each party, i.e. about the distribution of the voters’ preferences in the population.

After election—count the number of seats for each party. It does not make sense to make inference from subsamples.

We will be concerned mainly with inference from samples but some of the descriptive methods are useful for complete populations as well.


Random samples

A set of data, $x_1, \dots, x_n$, is said to be a random sample if

$x_1, \dots, x_n$ are observations on some random variables $X_1, \dots, X_n$;

the random variables $X_1, \dots, X_n$ are independent and identically distributed (i.i.d.).

Characteristics of the population distribution

The common distribution of $X_1, \dots, X_n$ is known as the population distribution or the underlying distribution.

The mean of the population distribution is known as population mean.

The variance of the population distribution is known as population variance.

This terminology extends naturally to other characteristics of the population distribution.

Population and sample characteristics

Population characteristics should not be confused with sample characteristics. The latter are computed from the sample.

For example, the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and the population mean $\mu = \mathrm{E}\, X_i$ are different things.

Also, if we take another sample from the same distribution, then we will almost certainly get another value for $\bar{x}$, whereas $\mu$ remains the same.
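For illustration, here is a small R sketch (the population and its mean are hypothetical, chosen only for the simulation): two samples from the same population give different values of the sample mean, while the population mean stays fixed.

    set.seed(1)                         # for reproducibility
    mu <- 10                            # population mean (known here only because we simulate)
    x1 <- rnorm(30, mean = mu, sd = 2)  # first sample of size 30
    x2 <- rnorm(30, mean = mu, sd = 2)  # second sample from the same population
    mean(x1)                            # one value of the sample mean
    mean(x2)                            # almost certainly a different value; mu is unchanged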


Sample characteristics as random variables

The term sample mean has two meanings, which are normally clear from the context:

1. the number $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ calculated from the data;

2. the random variable $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

When we talk about the distribution of the sample mean we have the random variable $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ in mind.

A similar note applies to all sample characteristics of a distribution/population.

Some terminology conventions

Lectures, notes, textbooks

In my lectures and in the notes I normally omit the qualifier population and simply say mean, variance, distribution, etc.

On the other hand, I usually say sample mean, sample variance, etc., i.e. the qualifier sample is (almost) always present.

Computer output

In computer output the qualifier sample is always omitted.


Statistics

Definition 3.1

Any quantity computed from the data is called a statistic.

Examples

Sample mean

Sample variance

Sample median

The smallest observation in a sample
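For a numeric vector of data, these statistics are computed in R as follows (a sketch with made-up data):

    x <- c(4.1, 5.6, 3.8, 7.2, 5.0)   # a small illustrative sample
    mean(x)                           # sample mean
    var(x)                            # sample variance (divisor n - 1)
    median(x)                         # sample median
    min(x)                            # the smallest observation in the sample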

Some continuous distributions

$N(\mu, \sigma^2)$ — Normal (Gaussian) distribution,
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}. \]

$\mathrm{Expo}(\lambda)$ — Exponential distribution with rate $\lambda$,
\[ f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0 \]
(mean $= 1/\lambda$, variance $= 1/\lambda^2$).

$\mathrm{Gamma}(\alpha, \beta)$ — Gamma distribution with shape $\alpha$ and scale $\beta$,
\[ f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \quad \text{for } x \ge 0 \]
(mean $= \alpha\beta$, variance $= \alpha\beta^2$).


Some discrete distributions

Bernoulli($p$)
\[ P(X = k) = \begin{cases} p & \text{when } k = 1, \\ 1 - p & \text{when } k = 0 \end{cases} \]
(mean $= p$, variance $= p(1-p)$).

Binom($n, p$)
(mean $= np$, variance $= np(1-p)$).

If $X_1, \dots, X_n$ are i.i.d. Bernoulli($p$), then $X_1 + \cdots + X_n$ is Binomial($n, p$).

Pois($\lambda$)
\[ P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda} \quad \text{for } k = 0, 1, 2, \dots \]

Parameterisations of distributions

Normal distribution

$\mu, \sigma^2$ — almost universally adopted, more convenient mathematically.

$\mu, \sigma$ — more intuitive, adopted by R (among others).

Exponential distribution

rate $\lambda$ — almost universally adopted.

mean $\mu = 1/\lambda$ — more intuitive in some cases.
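In R, for instance, these parameterisations show up as follows (a sketch; the argument values are arbitrary):

    dnorm(0, mean = 1, sd = 2)        # normal density; note the sd argument, not the variance
    dexp(1, rate = 0.5)               # exponential density with rate lambda = 0.5 (mean 1/lambda = 2)
    dgamma(1, shape = 2, scale = 3)   # gamma density with shape alpha = 2 and scale beta = 3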


Quantile function

Let $F$ be a cumulative distribution function (cdf), $p$ be a number in the interval $(0, 1)$, and $x_p$ be a value such that
\[ F(x_p) = p. \]
Then we say that $x_p$ is the $p$th quantile of $F$.

If $F$ is strictly increasing, then the quantile can be written using the inverse cdf as $x_p = F^{-1}(p)$. If $F$ is not strictly increasing, then $F^{-1}$ is not uniquely defined for some values of the argument. In such cases we choose the smallest value of $x$ which satisfies the equation $F(x) = p$ as the value of the inverse. More formally,
\[ F^{-1}(p) = \inf\{x : F(x) \ge p\}, \qquad 0 < p < 1. \]

The inverse cdf, $F^{-1}$, is also called the quantile function and we will denote it by $Q(p)$, i.e.
\[ Q(p) = F^{-1}(p). \]
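In R the quantile function of a standard distribution is the corresponding q-function, and quantile() computes sample quantiles; a brief sketch:

    qnorm(0.975)                                       # Q(0.975) for N(0, 1), approximately 1.96
    pnorm(qnorm(0.975))                                # applying F after Q recovers p = 0.975
    quantile(rnorm(100), probs = c(0.25, 0.5, 0.75))   # sample quantiles of simulated data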

 

Descriptive statistics and plots

Counts
  total # of observations (rows)
  # of non-missing observations
  # of missing observations

Measures of location
  Mean
  Median
  Mode

Quartiles
  Lower quartile
  Upper quartile

Descriptive statistics and plots (cont.)

Measures of dispersion
  Standard deviation
  Variance
  Range
  Inter-quartile range
  Coefficient of variation

Measures of shape

  Statistic         Definition                            Estimate
  Skewness          $\mathrm{E}(X-\mu)^3/\sigma^3$        $\frac{1}{n}\sum_i (x_i - \bar{x})^3 / s^3$
  Kurtosis          $\mathrm{E}(X-\mu)^4/\sigma^4$        $\frac{1}{n}\sum_i (x_i - \bar{x})^4 / s^4$
  Kurtosis excess   $\mathrm{E}(X-\mu)^4/\sigma^4 - 3$    $\frac{1}{n}\sum_i (x_i - \bar{x})^4 / s^4 - 3$
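Base R has no built-in skewness or kurtosis function; the estimates in the table can be computed directly, e.g. as in this sketch (x is illustrative simulated data):

    x <- rnorm(200)                              # illustrative data
    n <- length(x)
    s <- sd(x)                                   # sample standard deviation
    skew <- sum((x - mean(x))^3) / (n * s^3)     # (1/n) * sum((x_i - xbar)^3) / s^3
    kurt <- sum((x - mean(x))^4) / (n * s^4)     # (1/n) * sum((x_i - xbar)^4) / s^4
    kurt_excess <- kurt - 3                      # kurtosis excess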

Descriptive statistics and plots (cont.)

Plots

Histogram

Box plot (a.k.a. Box-and-Whiskers plot)
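In R (a sketch, for a numeric vector x):

    x <- rnorm(100)   # illustrative data
    hist(x)           # histogram
    boxplot(x)        # box-and-whiskers plot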


Standard error of the sample mean

Suppose that we wish to estimate the mean of a population using a random sample $X_1, \dots, X_n$.

The sample mean, $\bar{X}$, is used routinely for this purpose, but how good is it?

Let $\mu$ be the population mean and $\sigma^2$ the population variance. Let also, as usual,
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]

From probability theory (or from simple calculations) we know that
\[ \operatorname{Var} \bar{X} = \sigma^2/n, \]
i.e. the standard deviation of $\bar{X}$ is $\operatorname{Std} \bar{X} = \sigma/\sqrt{n}$.

A large value of the standard deviation, $\operatorname{Std} \bar{X}$, suggests a bad estimate of the mean.

A small value of the standard deviation, $\operatorname{Std} \bar{X}$, suggests a good estimate of the mean.

Standard error of the sample mean (cont.)

In practice, we usually do not know $\sigma$ and use $s$ in its place.

This gives the estimated standard deviation, $s/\sqrt{n}$, of $\bar{X}$.

The estimated standard deviation of $\bar{X}$ is normally referred to as the standard error of $\bar{X}$.

We denote the standard error of $\bar{X}$ by $S_{\bar{x}}$. From the above we have $S_{\bar{x}} = s/\sqrt{n}$.
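A minimal sketch of this computation in R, for an illustrative sample x:

    x <- rnorm(50, mean = 10, sd = 3)    # illustrative sample
    s_xbar <- sd(x) / sqrt(length(x))    # standard error of the sample mean, s / sqrt(n)
    s_xbar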

Estimated standard deviations are useful characteristics of estimators of parameters other than the mean. Hence, the following definition.


Standard errors

Definition 7.1

The estimated standard deviation of the estimate of a parameter is called standard error.

Notation

If $\hat{\theta}$ is an estimate of a parameter $\theta$, then we denote the standard error of $\hat{\theta}$ by $S_{\hat{\theta}}$.

Typical interpretation (but not the only possible)

Small standard error suggests that the estimate is good.

Large standard error suggests that the estimate is bad.

Standard errors for estimates of parameters are useful in interpretation and evaluation of statistical models and are routinely produced by statistical software.

P-values

Problem: Test $H_0$ against $H_A$ at level of significance $\alpha$.

Typical values for $\alpha$: 0.05, 0.01, 0.1, 0.001.

Instead of reporting critical values and critical regions, computer software gives a more versatile statistic, the p-value.

Definition 8.1

The p-value is the smallest significance level at which we would reject $H_0$ in favour of $H_A$.

So,
  reject $H_0$ in favour of $H_A$ when $\alpha \ge p$,
  retain $H_0$ when $\alpha < p$.

Equivalently, the p-value is the probability, when $H_0$ is true, for the test statistic to be less favourable for $H_0$ than the observed value of the test statistic.


Informal interpretation of p-values

The equivalent definition of p-values is convenient for computation.

Example 8.2

Suppose that in a KS test we have $d_n = 0.4$. Since we reject $H_0$ when $d_n$ is large, values less favourable for $H_0$ are those greater than 0.4. Hence, the p-value here is $\Pr(D_n > 0.4)$.

Interpretation of p-values

In practice p-values are often interpreted more informally.

Notice that $H_0$ is rejected for “almost any” $\alpha$ when the p-value is very small. For example, if $p = 10^{-5}$ you may see expressions such as the following.

The null hypothesis is rejected at any typical significance level since $p < 10^{-4}$.

Informal interpretation of p-values (cont.)

Another useful way to communicate the information obtained from a p-value is to use expressions like the following.

$p > 0.1$: The data gives no evidence against $H_0$. The data seems consistent with $H_0$.

$0.05 < p < 0.1$: The data gives no evidence against $H_0$ but further investigation may be needed.

$0.01 < p < 0.05$: The data gives evidence to reject $H_0$ in favour of $H_A$.

$0.001 < p < 0.01$: The data gives strong evidence to reject $H_0$ in favour of $H_A$.

$p < 0.001$: The data gives very strong evidence to reject $H_0$ in favour of $H_A$.

These are only guiding examples. The borders and the language used are subjective and may depend on the application.


Part II

Bivariate data

Linear correlation

Spearman’s rank correlation coefficient

Order statistics and ranks

Pearson’s sample correlation coefficient

Let $(x_i, y_i)$, $i = 1, \dots, n$, be $n$ pairs of numbers. Let
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \]
\[ s_{xx} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s_{yy} = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2, \]
\[ s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]

Pearson's sample correlation coefficient between $x$ and $y$ is defined as
\[ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{s_{xy}}{\sqrt{s_{xx}}\,\sqrt{s_{yy}}}. \]

$r_{xy}$ is usually referred to as the sample correlation coefficient or the correlation coefficient.
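In R the sample correlation coefficient is given by cor(); a sketch with simulated x and y:

    set.seed(2)
    x <- rnorm(20)
    y <- 2 * x + rnorm(20, sd = 0.5)     # y roughly linear in x
    cor(x, y)                            # Pearson's r_xy
    cov(x, y) / (sd(x) * sd(y))          # the same value, via s_xy / (sqrt(s_xx) * sqrt(s_yy))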


Correlation coefficient: numerical properties

$-1 \le r_{xy} \le 1$.

$r_{xy} = \pm 1$ if and only if there is a perfect linear relationship between $y_i$ and $x_i$, i.e. there exist constants $a$ and $b$ such that $y_i = a + b x_i$ for $i = 1, \dots, n$.

$r_{xy}$ is a measure of how well the points $(x_i, y_i)$, $i = 1, \dots, n$, are approximated by a straight line.

The value of $r_{xy}$ does not change if one or both variables are linearly transformed with positive slope; in particular, $r_{xy}$ does not depend on the units of measurement of the two variables.

$r_{xy} = r_{yx}$, i.e. $r_{xy}$ does not depend on which variable is called $x$.

Correlation coefficient: interpretation

The value of the sample correlation coefficient is associated in the following way with the pattern of a scatter plot of the data points $(x_i, y_i)$, $i = 1, \dots, n$.

$r_{xy} = \pm 1$ — perfect linear relation, all points lie on a single straight line.

$r_{xy} = 0$ — no linear relationship between the points.

$r_{xy}$ close to 1 — strong positive linear relationship, i.e. larger $x$'s tend to be paired with larger $y$'s.

$r_{xy}$ close to $-1$ — strong negative linear relationship, i.e. larger $x$'s tend to be paired with smaller $y$'s.

Importance of the qualifier “linear”

There may be a strong, even perfect, non-linear relationship between the points when $|r_{xy}| < 1$. This includes the case $r_{xy} = 0$.

Never forget this.
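A small illustration of this warning (a sketch): a perfect quadratic relationship whose sample correlation is essentially zero.

    x <- seq(-1, 1, by = 0.1)   # values symmetric around 0
    y <- x^2                    # perfect, but non-linear, relationship
    cor(x, y)                   # essentially 0: there is no linear relationship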


Sample correlation coefficient as estimator

Let $z_i = (x_i, y_i)$, $i = 1, \dots, n$, be a random sample from a bivariate distribution with distribution function $F(x, y)$.

In other words, $z_i$, $i = 1, \dots, n$, is a realization of $n$ independent bivariate random variables, $Z_i = (X_i, Y_i)$, $i = 1, \dots, n$, all having the same distribution $F$.

Let
\[ \mu_x = \mathrm{E}\, X_i, \quad \sigma_x^2 = \operatorname{Var} X_i, \quad \mu_y = \mathrm{E}\, Y_i, \quad \sigma_y^2 = \operatorname{Var} Y_i, \]
\[ \rho = \frac{\mathrm{E}\big( (X_i - \mu_x)(Y_i - \mu_y) \big)}{\sigma_x \sigma_y}. \]

In this case the sample correlation coefficient, $r_{xy}$, is an estimate of the population correlation coefficient, $\rho$, between $X$ and $Y$.

A test for zero correlation

In data analysis it is often important to test if two variables are independent. This is a difficult task but in many cases it is sufficient to answer the simpler question.

Are the variables linearly dependent?

We will devise a test based on the following result.

Theorem 9.1

If $\rho \equiv \operatorname{Corr}(X_i, Y_i) = 0$, and $(X_i, Y_i)$, $i = 1, \dots, n$, are jointly independent and normally distributed, then the statistic
\[ t = \frac{r_{xy}\sqrt{n-2}}{\sqrt{1 - r_{xy}^2}} \]
has a Student's $t$ distribution with $n - 2$ degrees of freedom.


A test for zero correlation (cont.)

Given the data $(x_i, y_i)$, $i = 1, \dots, n$, we wish to test
\[ H_0: \rho = 0 \quad \text{vs} \quad H_A: \rho \neq 0 \]
at a level of significance $\alpha$.

When $H_0$ is true, the statistic $t = \dfrac{r_{xy}\sqrt{n-2}}{\sqrt{1 - r_{xy}^2}} \sim t_{n-2}$.

1. Compute the critical value $t_{n-2;\,\alpha/2}$ and set the critical region (CR) to $\{ t : |t| > t_{n-2;\,\alpha/2} \}$.

2. Compute $r_{xy}$.

3. Compute the observed value of the test statistic
\[ t_{\text{obs}} = \frac{r_{xy}\sqrt{n-2}}{\sqrt{1 - r_{xy}^2}}. \]

4. Reject $H_0$ in favour of $H_A$ if $t_{\text{obs}}$ is in the critical region and retain $H_0$ otherwise (i.e. reject if $|t_{\text{obs}}| > t_{n-2;\,\alpha/2}$ and retain otherwise).
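In R this test is available via cor.test(), which reports the observed t statistic, the n - 2 degrees of freedom and the p-value; a sketch with simulated data:

    set.seed(3)
    x <- rnorm(30)
    y <- 0.5 * x + rnorm(30)
    cor.test(x, y)                       # t test of H0: rho = 0 against HA: rho != 0
    r <- cor(x, y); n <- length(x)
    r * sqrt(n - 2) / sqrt(1 - r^2)      # the same t statistic computed by hand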

Confidence intervals for the correlation coefficient

Let $Z_i = (X_i, Y_i)$, $i = 1, \dots, n$, be a random sample from a bivariate distribution with distribution function $F(x, y)$ and correlation coefficient $\rho$.

We wish to construct a confidence interval for $\rho$ with coverage probability $1 - \alpha$.

To use the sample correlation coefficient, $r_{xy}$, we need its distribution, which is rather complicated. We will use the following approximate result.

Let $R$ be the sample correlation coefficient. Consider the following transformation of $R$:
\[ Z = \frac{1}{2} \ln \frac{1 + R}{1 - R}. \]

Result 1

The distribution of the random variable $Z$ is approximately normal with mean $\frac{1}{2} \ln \frac{1 + \rho}{1 - \rho}$ and variance $\frac{1}{n-3}$.


CI for the correlation coefficient (cont.)

The transformation $Z = \frac{1}{2}\ln\frac{1+R}{1-R}$ is often referred to as Fisher's $z$-transform.

Standard calculations can be used to show (bonus qu.!) that a $100(1-\alpha)\%$ CI for $\rho$ is
\[ \left( \frac{1 + r - (1-r)v}{1 + r + (1-r)v},\ \frac{1 + r - (1-r)u}{1 + r + (1-r)u} \right), \]
where $u = e^{-2 z_{\alpha/2}/\sqrt{n-3}}$, $v = e^{2 z_{\alpha/2}/\sqrt{n-3}}$, and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
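A minimal sketch of this interval in R, using atanh() for Fisher's z-transform and tanh() to invert it (x and y are simulated for illustration):

    set.seed(4)
    x <- rnorm(40)
    y <- 0.6 * x + rnorm(40)                   # illustrative bivariate sample
    r <- cor(x, y); n <- length(x)
    z <- atanh(r)                              # (1/2) * log((1 + r) / (1 - r))
    half <- qnorm(1 - 0.05 / 2) / sqrt(n - 3)  # z_{alpha/2} / sqrt(n - 3), for a 95% interval
    tanh(c(z - half, z + half))                # back-transform the endpoints to the rho scale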

Spearman’s rank correlation coefficient

Let $Z_i = (X_i, Y_i)$, $i = 1, \dots, n$, be a random sample from a bivariate distribution with distribution function $F(x, y)$. Let
\[ R_i = \text{rank of } X_i \text{ in the sample } X_1, \dots, X_n, \]
\[ S_i = \text{rank of } Y_i \text{ in the sample } Y_1, \dots, Y_n. \]

Now we consider the pairs $(R_i, S_i)$ for $i = 1, 2, \dots, n$. If the $Z_i$'s are correlated, then we expect that the ranks $(R_i, S_i)$ will be correlated too. The following notation is useful:
\[ D_i = R_i - S_i, \qquad i = 1, \dots, n. \]


Spearman’s rank correlation coefficient (cont.)

Definition 10.1

Spearman's rank correlation coefficient for $Z_i = (X_i, Y_i)$, $i = 1, \dots, n$, is defined as
\[ \rho = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2}\,\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}}, \]
where
\[ \bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i, \qquad \bar{S} = \frac{1}{n}\sum_{i=1}^{n} S_i. \]

So, Spearman's $\rho$ is obtained by replacing the original data $(X_i, Y_i)$ by their ranks $R_i$ and $S_i$, respectively, and calculating the ordinary sample correlation coefficient of $(R_i, S_i)$.

Spearman's rank correlation coefficient (cont.)

When there are no ties in the data, an equivalent expression for Spearman's $\rho$ may be given in terms of the differences $D_i = R_i - S_i$:
\[ \rho = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}. \]

Properties of Spearman's $\rho$

$-1 \le \rho \le 1$.

$\rho = 1$ if there is a perfect match between the ranks of the $X$'s and $Y$'s.

$\rho = -1$ if the $Y$'s are in reverse order of that of the corresponding $X$'s.

$\rho = 0$ when there is no correlation between the ranks of the $X$'s and the $Y$'s.
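In R, Spearman's rho can be obtained either as the Pearson correlation of the ranks or directly with method = "spearman"; a sketch:

    set.seed(5)
    x <- rnorm(15)
    y <- x^3 + rnorm(15, sd = 0.1)       # monotone but non-linear relationship
    cor(rank(x), rank(y))                # ordinary correlation coefficient of the ranks
    cor(x, y, method = "spearman")       # the same value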


Order statistics

Let $X_1, \dots, X_n$ be a random sample from a distribution with cdf $F$. Let
\[ \begin{aligned}
X_{(1)} &= \text{the smallest among } X_1, \dots, X_n, \\
X_{(2)} &= \text{the second smallest among } X_1, \dots, X_n, \\
&\ \ \vdots \\
X_{(r)} &= \text{the } r\text{th smallest among } X_1, \dots, X_n, \\
&\ \ \vdots \\
X_{(n)} &= \text{the largest among } X_1, \dots, X_n.
\end{aligned} \]

We have
\[ X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}. \]

The random variables $X_{(1)}, \dots, X_{(n)}$ are termed the order statistics of the random sample $X_1, \dots, X_n$. The $r$th smallest, $X_{(r)}$, is called the $r$th order statistic.

Ranks

Let $x_1, \dots, x_n$ be a sequence of numbers.

Definition (i): The number of observations less than or equal to $x_i$ is called its rank.

An alternative definition which may be easier to grasp is:

Definition (ii): The rank of $x_i$ is the position of $x_i$ in the ordered sequence $x_{(1)}, \dots, x_{(n)}$.

The rank will be denoted by $\operatorname{rank}(x_i)$.

These two definitions are equivalent when all elements in the sequence are different.

If some elements of $x_1, \dots, x_n$ are equal (such elements are said to be tied), then the second definition is ambiguous. Moreover, there is more than one “sensible” way to define ranks of tied data.


Ranks (cont.)

Suppose that
\[ x_{(l)} < x_{(l+1)} = x_{(l+2)} = \cdots = x_{(l+m)} < x_{(l+m+1)}. \]

Here are some standard ways to deal with ties:

Assign rank $l + 1$ to the tied data (sport).

Assign rank $l + m$ to the tied data (first definition above).

Assign the average rank to the tied data:
\[ \frac{(l+1) + (l+2) + \cdots + (l+m)}{m}. \]

Randomly assign ranks $l+1, l+2, \dots, l+m$ to the tied data.

Ranks of the variables in a random sample $X_1, \dots, X_n$ are defined in the same way. In this case $\operatorname{rank}(X_i)$ is random as well.
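R's rank() covers several of these conventions through its ties.method argument; a sketch:

    x <- c(3, 1, 4, 1, 5)              # the two 1's are tied
    rank(x, ties.method = "average")   # average rank for the tied data (the default)
    rank(x, ties.method = "min")       # the "sport" convention (rank l + 1)
    rank(x, ties.method = "max")       # rank l + m, as in definition (i) of the rank
    rank(x, ties.method = "random")    # ranks assigned to the tied data at random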

Part III

Simple linear regression

Estimating a mean

Diversion: standard errors and significance

The simple regression model

The model

Least squares estimation

Motivation

Least squares estimator

Derivation of the normal equations

Fitted values and residuals


Estimating a mean

Let $Y_i \sim N(\mu, \sigma^2)$, $i = 1, \dots, n$, be i.i.d. random variables.

Let $\varepsilon_i = Y_i - \mu$ for $i = 1, \dots, n$. Then
\[ Y_i = \mu + \varepsilon_i, \qquad i = 1, \dots, n, \]
where $\varepsilon_i \sim N(0, \sigma^2)$ are i.i.d.

This is an instance of a model of the form
\[ \text{data} = \text{mean function} + \text{error}, \]
where

the mean function is the best guess about the data that can be made from the available sources;

the error term cannot be predicted from the available information.

Estimating a mean (cont.)

In this case the mean function is a constant. Usually we do not know the mean function, $\mu$, but we can estimate it by the sample mean,
\[ \hat{\mu} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i. \]

Also, we estimate $\sigma^2$ by the sample variance,
\[ \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2. \]

The variance of $\hat{\mu}$ is $\sigma^2/n$. When we replace $\sigma^2$ by $\hat{\sigma}^2$ in the last expression we obtain the estimated variance, $\hat{\sigma}^2/n$, of $\hat{\mu}$.

The standard error of $\hat{\mu}$ is
\[ S_{\hat{\mu}} = \sqrt{\hat{\sigma}^2/n} = \hat{\sigma}/\sqrt{n}. \]

The test statistic for the null hypothesis $H_0: \mu = \mu_0$ is
\[ t = (\hat{\mu} - \mu_0)/S_{\hat{\mu}} = \sqrt{n}\,(\bar{Y} - \mu_0)/s, \]
and its distribution under $H_0$ is $t_{n-1}$.
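In R this one-sample t test is performed by t.test(); a sketch with simulated data and the hypothetical null value mu0 = 10:

    set.seed(6)
    y <- rnorm(25, mean = 10.5, sd = 2)       # illustrative sample
    t.test(y, mu = 10)                        # tests H0: mu = 10; reports t, df = n - 1, p-value
    sqrt(length(y)) * (mean(y) - 10) / sd(y)  # the same t statistic computed by hand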

Diversion: standard errors and significance

A very useful characteristic of an estimated parameter is given in the following definition.

Definition 12.1

The estimated standard deviation of the estimate of a parameter is called standard error.

Testing the significance of a parameter

We say that a parameter is significant when we decide to reject the hypothesis that it is equal to zero. The alternative hypothesis is usually that the parameter is not zero.

Interpretation

If a parameter turns out to be significant in an analysis, we may interpret this as evidence that the corresponding predictor variable is important and should not be omitted from the model. We say also that the data provide evidence against the claim that the parameter is zero.

Simple linear regression model

Consider a dataset consisting of $n$ pairs $(x_i, y_i)$, $i = 1, \dots, n$.

The simple linear regression model describes the relationship between the $x$'s and the $y$'s by the equation
\[ Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n. \]

Terminology:

$x$ is the predictor (or independent) variable.

$y$ is the response (or dependent) variable.

$\varepsilon$ is the error variable.

We will assume that

the $\varepsilon_i$'s are jointly i.i.d. random variables with $\mathrm{E}\,\varepsilon_i = 0$ and common variance $\operatorname{Var}\varepsilon_i = \sigma^2$;

the $x$'s are non-random variables;

the $\varepsilon_i$'s are normally distributed.

 


\[ Y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n. \]

$\alpha + \beta x_i$ is the mean function.

$\alpha$ and $\beta$ are the (regression) parameters.

The model is linear because the mean function is a linear combination of the parameters.

The model is simple linear since there is only one predictor variable and the mean function is a straight line as a function of $x$.

From the assumption that $\varepsilon_i$ is random it follows that $Y_i$ is also random.

Does the assumption that the $\varepsilon_i$'s are identically distributed imply that $Y_1, \dots, Y_n$ are also identically distributed?
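In R the model is fitted by least squares with lm(); a sketch with simulated data:

    set.seed(7)
    x <- runif(30, 0, 10)
    y <- 2 + 0.5 * x + rnorm(30)   # data generated from Y = alpha + beta * x + error
    fit <- lm(y ~ x)               # least squares fit of the simple linear regression
    summary(fit)                   # estimates, standard errors, t statistics, p-values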

Motivation. Sum of squares

We do not know the parameters of the model, but for any given $(b_0, b_1)$ we can compute deviations defined by
\[ e_i \equiv e_i(b_0, b_1) = Y_i - b_0 - b_1 x_i, \qquad i = 1, \dots, n. \]

It seems natural to estimate $(\alpha, \beta)$ by values of $(b_0, b_1)$ which make the deviations $e_i$ “small”. To quantify the meaning of “small” we define the sum of squares of the deviations.

For any $(b_0, b_1)$ let
\[ S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 x_i)^2. \]

Least squares estimator

Principle of least squares

Estimate the parameters by values that make the sum of squares as small as possible.

Applied to the simple regression model, the principle of least squares leads to the following definition.

Definition 13.1

The pair $(\hat{\alpha}, \hat{\beta})$ is a least squares estimator (l.s.e.) of $(\alpha, \beta)$ if $S(\hat{\alpha}, \hat{\beta}) \le S(b_0, b_1)$ for any choice of $(b_0, b_1)$.

Derivation of the normal equations

Let $S$ be the sum of squares,
\[ S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 x_i)^2. \]

Its minimum may be found by solving the system
\[ \frac{\partial S}{\partial b_0} = 0, \qquad \frac{\partial S}{\partial b_1} = 0. \]

Note that