Practical Statistics I
Georgi Boshnakov georgi.boshnakov@manchester.ac.uk
School of Mathematics The University of Manchester
2010–2011, Semester 2
Part I
Review of basic concepts and terminology
Types of data
Random samples
Population and sample characteristics
Probability distributions
Quantile function
Descriptive statistics and plots
Standard errors
p-values
Statistics and data
Statistics is about making inferences (drawing conclusions) from data.
Types of data
◮ numerical
◮ categorical (factors, nominal): e.g. Male/Female
◮ ordered (ordered factors): e.g. grades: 1st class, upper 2nd, …
Numerical data
The basic types of numerical data are
◮ continuous (e.g. temperatures)
◮ discrete (e.g. class sizes)
Samples versus complete data
We will assume that the data are representative of the population(s) from which they are drawn but they do not account for every member (subject, item) of the population.
Elections for parliament
◮ Before election—surveys on samples of voters with the aim to make inference about the number of seats for each party, i.e. about the distribution of the voters’ preferences in the population.
◮ After election—count the number of seats for each party. It does not make sense to make inference from subsamples.
We will be concerned mainly with inference from samples but some of the descriptive methods are useful for complete populations as well.
Random samples
A set of data, x_1, …, x_n, is said to be a random sample if
◮ x_1, …, x_n are observations on some random variables X_1, …, X_n;
◮ the random variables X_1, …, X_n are independent and identically distributed (i.i.d.).

Characteristics of the population distribution
◮ The common distribution of X_1, …, X_n is known as the population distribution or the underlying distribution.
◮ The mean of the population distribution is known as the population mean.
◮ The variance of the population distribution is known as the population variance.

This terminology extends naturally to other characteristics of the population distribution.
Population and sample characteristics
Population characteristics should not be confused with sample characteristics. The latter are computed from the sample.
For example, the sample mean x̄ = (1/n) Σ_{i=1}^n x_i and the population mean µ = E X_i are different things.

Also, if we take another sample from the same distribution, then we will almost certainly get another value for x̄, whereas µ remains the same.
Sample characteristics as random variables
The term sample mean has two meanings which are normally clear from the context:
◮ the number x̄ = (1/n) Σ_{i=1}^n x_i calculated from the data;
◮ the random variable X̄ = (1/n) Σ_{i=1}^n X_i.

When we talk about the distribution of the sample mean we have the random variable X̄ = (1/n) Σ_{i=1}^n X_i in mind.

A similar note applies to all sample characteristics of a distribution/population.
Some terminology conventions
Lectures, notes, textbooks
◮ In my lectures and in the notes I normally omit the qualifier population and simply say mean, variance, distribution, etc.
◮ On the other hand, I usually say sample mean, sample variance, etc., i.e. the qualifier sample is (almost) always present.

Computer output
◮ In computer output the qualifier sample is always omitted.
Statistics
Definition 3.1
Any quantity computed from the data is called a statistic.
Examples
◮ Sample mean
◮ Sample variance
◮ Sample median
◮ The smallest observation in a sample
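Each of these is a one-line computation in statistical software; a minimal sketch in R (the data are made up for illustration):

    x <- c(3.1, 4.7, 2.8, 5.0, 3.9)   # a small made-up sample
    mean(x)     # sample mean
    var(x)      # sample variance (divisor n - 1)
    median(x)   # sample median
    min(x)      # the smallest observation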
Some continuous distributions
N(µ, σ²) — Normal (Gaussian) distribution,

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²}.

Expo(λ) — Exponential distribution with rate λ,

f(x) = λ e^{−λx},   for x ≥ 0.

(mean = 1/λ, variance = 1/λ²)

Gamma(α, β) — Gamma distribution with shape α and scale β,

f(x) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β},   for x ≥ 0.

(mean = αβ, variance = αβ²)
Some discrete distributions
Bernoulli(p)

P(X = k) = p for k = 1 and P(X = k) = 1 − p for k = 0.

(mean = p, variance = p(1 − p))

Binom(n, p)

(mean = np, variance = np(1 − p))

If X_1, …, X_n are i.i.d. Bernoulli(p), then X_1 + ··· + X_n is Binomial(n, p).

Pois(λ)

P(X = k) = (λ^k / k!) e^{−λ},   for k = 0, 1, 2, …
Parameterisations of distributions
Normal distribution
◮ µ, σ² — almost universally adopted, more convenient mathematically.
◮ µ, σ — more intuitive, adopted by R (among others).
Exponential distribution
◮ rate λ — almost universally adopted.
◮ µ = 1/λ — more intuitive in some cases.
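A quick way to check which parameterisation a package uses is to evaluate a density at a point where the answer is known. A minimal sketch in R, which takes the standard deviation for the normal and the rate for the exponential:

    # R parameterises the normal by (mean, sd), not (mean, variance)
    dnorm(0, mean = 0, sd = 2)   # density of N(0, 4) at 0, uses sigma = 2
    1 / (2 * sqrt(2 * pi))       # the same value computed by hand

    # R parameterises the exponential by the rate lambda
    dexp(1, rate = 0.5)          # lambda = 0.5, i.e. mean 1/lambda = 2
    0.5 * exp(-0.5)              # the same value computed by hand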
Quantile function

Let F be a cumulative distribution function (cdf), p be a number in the interval (0, 1), and x_p be a value such that

F(x_p) = p.

Then we say that x_p is the p-th quantile of F.

If F is strictly increasing, then the quantile can be written using the inverse cdf as x_p = F^{−1}(p). If F is not strictly increasing, then F^{−1} is not uniquely defined for some values of the argument. In such cases we choose the smallest value of x which satisfies the equation F(x) = p as the value of the inverse. More formally,

F^{−1}(p) = inf{ x : F(x) ≥ p },   0 < p < 1.

The inverse cdf, F^{−1}, is also called the quantile function and we will denote it by Q(p), i.e.

Q(p) = F^{−1}(p).

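In R the quantile function of each built-in distribution carries a q prefix; a minimal sketch checking that Q undoes the cdf F:

    p <- 0.975
    x_p <- qnorm(p)   # Q(p) for the standard normal
    pnorm(x_p)        # applying the cdf recovers p = 0.975

    # for Expo(lambda) the inverse has a closed form: Q(p) = -log(1 - p)/lambda
    qexp(0.5, rate = 2)   # median of Expo(2)
    -log(1 - 0.5) / 2     # the same value from the closed-form inverse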
Descriptive statistics and plots

Counts
◮ total # of observations (rows)
◮ # of non-missing observations
◮ # of missing observations

Measures of location
◮ Mean
◮ Median
◮ Mode

Quartiles
◮ Lower quartile
◮ Upper quartile
Descriptive statistics and plots (cont.)
Measures of dispersion
◮ Standard deviation
◮ Variance
◮ Range
◮ Interquartile range
◮ Coefficient of variation

Measures of shape

Statistic          Definition            Estimate
Skewness           E(X − µ)³/σ³          (1/n) Σ (x_i − x̄)³/s³
Kurtosis           E(X − µ)⁴/σ⁴          (1/n) Σ (x_i − x̄)⁴/s⁴
Kurtosis excess    E(X − µ)⁴/σ⁴ − 3      (1/n) Σ (x_i − x̄)⁴/s⁴ − 3
Descriptive statistics and plots (cont.)

Plots
◮ Histogram
◮ Box plot (a.k.a. box-and-whiskers plot)
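Both plots are one-liners in R; a minimal sketch with simulated data:

    set.seed(1)                        # reproducible made-up data
    x <- rnorm(100, mean = 10, sd = 2)
    hist(x)      # histogram: the shape of the sample distribution
    boxplot(x)   # box plot: median, quartiles and outliers at a glance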
Standard error of the sample mean
Suppose that we wish to estimate the mean of a population using a random sample X_1, …, X_n.

The sample mean, X̄, is used routinely for this purpose, but how good is it?

Let µ be the population mean and σ² the population variance. Let also, as usual,

X̄ = (1/n) Σ_{i=1}^n X_i,    s² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)².

From probability theory (or from simple calculations) we know that

Var(X̄) = σ²/n,

i.e. the standard deviation of X̄ is Std(X̄) = σ/√n.

A large value of the standard deviation, Std(X̄), suggests a bad estimate of the mean. A small value of the standard deviation, Std(X̄), suggests a good estimate of the mean.
Standard error of the sample mean (cont.)
In practice, we usually do not know σ and use s in its place. This gives the estimated standard deviation, s/√n, of X̄.

The estimated standard deviation of X̄ is normally referred to as the standard error of X̄. We denote the standard error of X̄ by S_x̄. From the above we have S_x̄ = s/√n.

Estimated standard deviations are useful characteristics of estimators of parameters other than the mean. Hence, the following definition.
Standard errors
Definition 7.1
The estimated standard deviation of the estimate of a parameter is called its standard error.

Notation
If θ̂ is an estimate of a parameter θ, then we denote the standard error of θ̂ by S_θ̂.
Typical interpretation (but not the only possible)
◮ Small standard error suggests that the estimate is good.
◮ Large standard error suggests that the estimate is bad.
Standard errors for estimates of parameters are useful in interpretation and evaluation of statistical models and are routinely produced by statistical software.
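For the sample mean the standard error is easy to compute directly; a minimal sketch in R with simulated data:

    set.seed(42)
    x <- rnorm(50, mean = 5, sd = 3)     # simulated sample, n = 50
    se_mean <- sd(x) / sqrt(length(x))   # S_xbar = s / sqrt(n)
    se_mean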
P-values

Problem: Test H_0 against H_A at level of significance α.
Typical values for α: 0.05, 0.01, 0.1, 0.001.

Instead of reporting critical values and critical regions, computer software gives a more versatile statistic, the p-value.

Definition 8.1
The p-value is the smallest significance level at which we would reject H_0 in favour of H_A.

So,
reject H_0 in favour of H_A   when α ≥ p,
retain H_0                    when α < p.

Equivalently, the p-value is the probability, when H_0 is true, for the test statistic to be less favourable for H_0 than the observed value of the test statistic.
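For a two-sided t test, for example, this is the probability in both tails of the t distribution beyond the observed statistic; a minimal sketch in R (the observed value and degrees of freedom are made up):

    t_obs <- 2.31                  # hypothetical observed test statistic
    df    <- 18                    # hypothetical degrees of freedom
    p <- 2 * pt(-abs(t_obs), df)   # P(|T| > |t_obs|) under H_0
    p                              # reject H_0 at level alpha whenever alpha >= p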
Informal interpretation of p-values

The equivalent definition of p-values is convenient for computation.

Example 8.2
Suppose that in a KS test we have d_n = 0.4. Since we reject H_0 when d_n is large, less favourable for H_0 values are those greater than 0.4. Hence, the p-value here is Pr(D_n > 0.4).

Interpretation of p-values
In practice p-values are often interpreted more informally. Notice that H_0 is rejected for "almost any" α when the p-value is very small. For example, if p = 10^{−5} you may see expressions such as the following.

The null hypothesis is rejected at any typical significance level since p < 10^{−4}.
Informal interpretation of p-values (cont.)

Another useful way to communicate the information obtained from a p-value is to use expressions like the following.

p > 0.1: The data gives no evidence against H_0. The data seems consistent with H_0.
0.05 < p < 0.1: The data gives no evidence against H_0 but further investigation may be needed.
0.01 < p < 0.05: The data gives evidence to reject H_0 in favour of H_A.
0.001 < p < 0.01: The data gives strong evidence to reject H_0 in favour of H_A.
p < 0.001: The data gives very strong evidence to reject H_0 in favour of H_A.

These are only guiding examples. The borders and the language used are subjective and may depend on the application.
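Such a verbal scale is easy to encode; a minimal sketch of a hypothetical helper following the bands above (the wording and cut-offs are the subjective ones just listed):

    # hypothetical helper mapping a p-value to the verbal scale above
    p_label <- function(p) {
      if      (p > 0.1)   "no evidence against H0"
      else if (p > 0.05)  "no evidence against H0, but investigate further"
      else if (p > 0.01)  "evidence to reject H0"
      else if (p > 0.001) "strong evidence to reject H0"
      else                "very strong evidence to reject H0"
    }
    p_label(0.03)   # "evidence to reject H0"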
Part II
Bivariate data

Linear correlation
Spearman's rank correlation coefficient
Order statistics and ranks

Pearson's sample correlation coefficient
Let (x_i, y_i), i = 1, …, n, be n pairs of numbers. Let

x̄ = (1/n) Σ_{i=1}^n x_i,    ȳ = (1/n) Σ_{i=1}^n y_i,

s_xx = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)²,    s_yy = (1/(n−1)) Σ_{i=1}^n (y_i − ȳ)²,

s_xy = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ).

Pearson's sample correlation coefficient between x and y is defined as

r_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / √( Σ_{i=1}^n (x_i − x̄)² · Σ_{i=1}^n (y_i − ȳ)² ) = s_xy / √(s_xx s_yy).

r_xy is usually referred to as the sample correlation coefficient or the correlation coefficient.
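This is the default method of R's cor function; a minimal sketch checking the definition against the built-in with made-up data:

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
    cor(x, y)   # Pearson's r_xy (the default method)

    # the same value straight from the definition
    sum((x - mean(x)) * (y - mean(y))) /
      sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))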
Correlation coefficient: numerical properties

◮ −1 ≤ r_xy ≤ 1.
◮ r_xy = ±1 if and only if there is a perfect linear relationship between y_i and x_i, i.e. there exist constants a and b such that y_i = a + bx_i for i = 1, …, n.
◮ r_xy is a measure of how well the points (x_i, y_i), i = 1, …, n, are approximated by a straight line.
◮ The value of r_xy does not change if one or both variables are linearly transformed. In particular, r_xy does not depend on the units of measurement of the two variables.
◮ r_xy = r_yx, i.e. r_xy does not depend on which variable is called x.
Correlation coefficient: interpretation

The value of the sample correlation coefficient is associated in the following way with the pattern of a scatter plot of the data points (x_i, y_i), i = 1, …, n.
◮ r_xy = ±1 — perfect linear relation, all points lie on a single straight line.
◮ r_xy = 0 — no linear relationship between the points.
◮ r_xy close to 1 — strong positive linear relationship, i.e. larger xs tend to be paired with larger ys.
◮ r_xy close to −1 — strong negative linear relationship, i.e. larger xs tend to be paired with smaller ys.

Importance of the qualifier "linear"
There may be a strong, even perfect, nonlinear relationship between the points when |r_xy| < 1. This includes the case r_xy = 0. Never forget this.
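A minimal sketch in R of a perfect nonlinear relationship with zero correlation, using a symmetric made-up design:

    x <- -5:5
    y <- x^2     # perfect (quadratic) relationship between x and y
    cor(x, y)    # exactly 0 here, by the symmetry of x around 0
    plot(x, y)   # the scatter plot reveals what r_xy misses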
Sample correlation coefficient as estimator

Let z_i = (x_i, y_i), i = 1, …, n, be a random sample from a bivariate distribution with distribution function F(x, y). In other words, z_i, i = 1, …, n, is a realization of n independent bivariate random variables Z_i = (X_i, Y_i), i = 1, …, n, all having the same distribution F. Let

µ_x = E X_i,   σ_x² = Var X_i,
µ_y = E Y_i,   σ_y² = Var Y_i,

ρ = E((X_i − µ_x)(Y_i − µ_y)) / (σ_x σ_y).

In this case the sample correlation coefficient, r_xy, is an estimate of the population correlation coefficient, ρ, between X and Y.
A test for zero correlation
In data analysis it is often important to test if two variables are independent. This is a difficult task, but in many cases it is sufficient to answer the simpler question: are the variables linearly dependent?

We will devise a test based on the following result.

Theorem 9.1
If ρ ≡ Corr(X_i, Y_i) = 0, and X_i, Y_i, i = 1, …, n, are jointly independent and normally distributed, then the statistic

t = r_xy √(n − 2) / √(1 − r_xy²)

has a Student's t distribution with n − 2 degrees of freedom.
A test for zero correlation (cont.)
Given the data (x_i, y_i), i = 1, …, n, we wish to test

H_0: ρ = 0   vs   H_A: ρ ≠ 0,

at a level of significance α. When H_0 is true, the statistic t = r_xy √(n − 2) / √(1 − r_xy²) ∼ t_{n−2}.

1. Compute the critical value t_{n−2; α/2} and set the critical region (CR) to { t : |t| > t_{n−2; α/2} }.
2. Compute r_xy.
3. Compute the observed value of the test statistic

   t_obs = r_xy √(n − 2) / √(1 − r_xy²).

4. Reject H_0 in favour of H_A if t_obs is in the critical region and retain H_0 otherwise (i.e. reject if |t_obs| > t_{n−2; α/2} and retain otherwise).
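In R this test, together with its p-value, is available as cor.test; a minimal sketch also computing t_obs by hand with made-up data:

    x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3, 7.1, 8.9)
    y <- c(2.0, 2.9, 3.5, 5.1, 4.8, 6.8, 7.0, 9.2)
    n <- length(x)
    r <- cor(x, y)
    t_obs  <- r * sqrt(n - 2) / sqrt(1 - r^2)   # observed test statistic
    t_crit <- qt(1 - 0.05 / 2, df = n - 2)      # critical value t_{n-2; alpha/2}
    abs(t_obs) > t_crit                         # TRUE here: reject H0 at alpha = 0.05
    cor.test(x, y)                              # the same test done by R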
Confidence intervals for the correlation coefficient

Let Z_i = (X_i, Y_i), i = 1, …, n, be a random sample from a bivariate distribution with distribution function F(x, y) and correlation coefficient ρ.

We wish to construct a confidence interval for ρ with coverage probability 1 − α.

To use the sample correlation coefficient, r_xy, we need its distribution, which is rather complicated. Instead we will use the following approximate result. Let R be the sample correlation coefficient and consider the following transformation of R:

Z = (1/2) ln((1 + R)/(1 − R)).

Result 1
The distribution of the random variable Z is approximately normal with mean (1/2) ln((1 + ρ)/(1 − ρ)) and variance 1/(n − 3).
CI for the correlation coefficient (cont.)

The transformation Z = (1/2) ln((1 + R)/(1 − R)) is often referred to as Fisher's z transform.

Standard calculations can be used to show (bonus qu.!) that a 100(1 − α)% CI for ρ is

( (1 + r − (1 − r)v) / (1 + r + (1 − r)v),   (1 + r − (1 − r)u) / (1 + r + (1 − r)u) ),

where u = e^{−2 z_{α/2}/√(n−3)}, v = e^{2 z_{α/2}/√(n−3)}, and z_{α/2} is the upper α/2 quantile of the standard normal distribution.
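A minimal sketch in R of this interval, written directly from the formula (r, n and α are made-up inputs):

    r     <- 0.6    # hypothetical sample correlation
    n     <- 30     # hypothetical sample size
    alpha <- 0.05
    z <- qnorm(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    u <- exp(-2 * z / sqrt(n - 3))
    v <- exp( 2 * z / sqrt(n - 3))
    lo <- (1 + r - (1 - r) * v) / (1 + r + (1 - r) * v)
    hi <- (1 + r - (1 - r) * u) / (1 + r + (1 - r) * u)
    c(lo, hi)                   # approximate 95% CI for rho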
Spearman’s rank correlation coeﬃcient
Let Z _{i} =( X _{i} , Y _{i} ), i = 1,
bivariate distribution with distribution function F ( x, y ).
Let
, n be a random sample from a
R _{i} = Rank of X _{i} in the sample X _{1} ,
S _{i} = Rank of Y _{i} in the sample Y _{1} ,
, X _{n} ,
, Y _{n} .
Now we consider the pairs ( R _{i} , S _{i} )
Z _{i} ’s are correlated, then we expect that the ranks ( R _{i} , S _{i} ) will be correlated too. The following notation is useful:
for i = 1, 2 ,
, n . If the
D _{i} = R _{i} − S _{i} ,
i = 1,
, n.
Spearman’s rank correlation coeﬃcient (cont.)
Deﬁnition 10.1
Spearman’s rank correlation coeﬃcient for Z _{i} =( X _{i} , Y _{i} ),
i = 1,
where
, n , is deﬁned as
ρ =
¯ ¯
n
i
_{=}_{1} ( R _{i} − R )( S _{i} − S )
^{} ^{} n
i
_{=}_{1} ( R _{i} − R ) ^{2} ^{} ^{n} _{=}_{1} ( S _{i} − S ) ^{2} ,
¯
i
¯
R = ^{1}
¯
n
n
i
=1
R
_{i} ,
S = ^{1}
^{¯}
n
n
i
=1
S
_{i} .
So, Spearman’s ρ is obtained by replacing the original data ( X _{i} , Y _{i} ) by their ranks R _{i} and S _{i} , respectively, and calculating the ordinary sample correlation coeﬃcient of ( R _{i} , S _{i} ).
Spearman’s rank correlation coeﬃcient (cont )
When there are no ties in the data, an equivalent expression for Spearman’s ρ may be given in terms of the diﬀerences
D _{i} = R _{i} − S _{i} :
ρ =1− ^{6} n
i =1 ^{D} i
2
n ( n ^{2} − 1) ^{.}
Properties of Spearman’s ρ
◮ −1 ≤ ρ ≤ 1.
◮ ρ = 1 if there is a perfect match between the ranks of the Xs and Ys.
◮ ρ = −1 if the Ys are in reverse order of that of the corresponding Xs.
◮ ρ = 0 when there is no correlation between the ranks of the Xs and the Ys.
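In R Spearman's ρ is a method of cor; a minimal sketch also checking the no-ties formula with made-up data:

    x <- c(3, 1, 4, 15, 9, 2)
    y <- c(10, 2, 30, 160, 6, 90)
    cor(x, y, method = "spearman")   # Spearman's rho

    # the same value from the D_i formula (valid because there are no ties)
    D <- rank(x) - rank(y)
    n <- length(x)
    1 - 6 * sum(D^2) / (n * (n^2 - 1))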
Order statistics
Let X_1, …, X_n be a random sample from a distribution with cdf F. Let

X_(1) = the smallest among X_1, …, X_n,
X_(2) = the second smallest among X_1, …, X_n,
…
X_(r) = the r-th smallest among X_1, …, X_n,
…
X_(n) = the largest among X_1, …, X_n.

We have

X_(1) ≤ X_(2) ≤ ··· ≤ X_(n).

The random variables X_(1), …, X_(n) are termed the order statistics of the random sample X_1, …, X_n. The r-th smallest, X_(r), is called the r-th order statistic.
Ranks
Let x_1, …, x_n be a sequence of numbers.

Definition (i) The number of observations less than or equal to x_i is called its rank.

An alternative definition which may be easier to grasp is:

Definition (ii) The rank of x_i is the position of x_i in the ordered sequence x_(1), …, x_(n).

The rank will be denoted by rank(x_i).

These two definitions are equivalent when all elements in the sequence are different. If some elements of x_1, …, x_n are equal (such elements are said to be tied), then the second definition is ambiguous. Moreover, there is more than one "sensible" way to define ranks of tied data.
Ranks (cont.)
Suppose that

x_(l) < x_(l+1) = x_(l+2) = ··· = x_(l+m) < x_(l+m+1).

Here are some standard ways to deal with ties:
◮ Assign rank l + 1 to the tied data (sport).
◮ Assign rank l + m to the tied data (first definition above).
◮ Assign the average rank to the tied data:
  ((l + 1) + (l + 2) + ··· + (l + m)) / m.
◮ Randomly assign ranks l + 1, …, l + m to the tied data.

Ranks of the variables in a random sample X_1, …, X_n are defined in the same way. In this case rank(X_i) is random as well.
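These conventions correspond to the ties.method argument of R's rank function; a minimal sketch with a small tied sample:

    x <- c(5, 7, 7, 7, 9)              # three tied values
    rank(x, ties.method = "min")       # sport: the tied values all get rank 2
    rank(x, ties.method = "max")       # first definition above: all get rank 4
    rank(x, ties.method = "average")   # average rank: all get rank 3
    rank(x, ties.method = "random")    # ranks 2, 3, 4 assigned at random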
Part III
Simple linear regression
Estimating a mean
Diversion: standard errors and signiﬁcance
The simple regression model
The model
Least squares estimation
Motivation
Least squares estimator
Derivation of the normal equations
Fitted values and residuals
Estimating a mean
Let Y_i ∼ N(µ, σ²), i = 1, …, n, be i.i.d. random variables. Let ε_i = Y_i − µ for i = 1, …, n. Then

Y_i = µ + ε_i,   i = 1, …, n,

where the ε_i ∼ N(0, σ²) are i.i.d.
This is an instance of a model of the form
data = mean function + error,
where
◮ the mean function is the best guess about the data that can be made from the available sources.
◮ the error term cannot be predicted from the available information.
Estimating a mean (cont.)
In this case the mean function is a constant. Usually we do not know the mean function, µ, but we can estimate it by the sample mean,

µ̂ = Ȳ = (1/n) Σ_{i=1}^n Y_i.

Also, we estimate σ² by the sample variance,

σ̂² = (1/(n−1)) Σ_{i=1}^n (Y_i − Ȳ)².

The variance of µ̂ is σ²/n. When we replace σ² by σ̂² in the last expression we obtain the estimated variance, σ̂²/n, of µ̂.

The standard error of µ̂ is

S_µ̂ = √(σ̂²/n) = σ̂/√n.

The test statistic for the null hypothesis H_0: µ = µ_0 is (µ̂ − µ_0)/S_µ̂ = √n (Ȳ − µ_0)/s and its distribution under H_0 is t_{n−1}.
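A minimal sketch in R of this test computed by hand and checked against t.test (the data and µ_0 = 5 are made up):

    set.seed(7)
    y   <- rnorm(20, mean = 5.5, sd = 2)   # simulated data, n = 20
    mu0 <- 5                               # hypothesised mean
    n   <- length(y)
    tstat <- (mean(y) - mu0) / (sd(y) / sqrt(n))   # ~ t_{n-1} under H_0
    2 * pt(-abs(tstat), df = n - 1)                # two-sided p-value
    t.test(y, mu = mu0)$p.value                    # the same p-value from t.test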
Diversion: standard errors and signiﬁcance
A very useful characteristic of an estimated parameter is
given in the following deﬁnition.
Definition 12.1
The estimated standard deviation of an estimate of a parameter is called its standard error.
Testing the significance of a parameter
We say that a parameter is significant when we decide to reject the hypothesis that it is equal to zero. The alternative hypothesis is usually that the parameter is not zero.

Interpretation
If a parameter turns out to be significant in an analysis, we may interpret this as evidence that the corresponding predictor variable is important and should not be omitted from the model. We say also that the data provide evidence against the claim that the parameter is zero.
Simple linear regression model
Consider a dataset consisting of n pairs (x_i, y_i), i = 1, …, n. The simple linear regression model describes the relationship between the xs and the ys by the equation

Y_i = α + βx_i + ε_i,   i = 1, 2, …, n.

Terminology:
◮ x is the predictor (or independent) variable.
◮ y is the response (or dependent) variable.
◮ ε is the error variable.

We will assume that
◮ the ε_i's are jointly i.i.d. random variables with E ε_i = 0 and common variance Var ε_i = σ²;
◮ the x's are nonrandom variables;
◮ the ε_i's are normally distributed.
Simple linear regression model (cont.)

Y_i = α + βx_i + ε_i,   i = 1, 2, …, n.

◮ α + βx_i is the mean function.
◮ α and β are the (regression) parameters.
◮ The model is linear because the mean function is a linear combination of the parameters.
◮ The model is simple linear since there is only one predictor variable and the mean function is a straight line as a function of x.

From the assumption that ε_i is random it follows that Y_i is also random.

Does the assumption that the ε_i's are identically distributed imply that Y_1, …, Y_n are also identically distributed?
Motivation. Sum of squares
We do not know the parameters of the model, but for any given (b_0, b_1) we can compute deviations defined by

e_i ≡ e_i(b_0, b_1) = Y_i − b_0 − b_1 x_i,   i = 1, …, n.

It seems natural to estimate (α, β) by values of (b_0, b_1) which make the deviations e_i "small". To quantify the meaning of "small" we define the sum of squares of the deviations: for any (b_0, b_1) let

S(b_0, b_1) ≡ Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − b_0 − b_1 x_i)².
Least squares estimator
Principle of least squares
Estimate the parameters by values that make the sum of squares as small as possible.
Applied to the simple regression model, the principle of least squares leads to the following deﬁnition.
Definition 13.1
The pair (α̂, β̂) is a least squares estimator (l.s.e.) of (α, β) if S(α̂, β̂) ≤ S(b_0, b_1) for any choice of (b_0, b_1).
Derivation of the normal equations
Let S be the sum of squares,

S(b_0, b_1) ≡ Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − b_0 − b_1 x_i)².

Its minimum may be found by solving the system

∂S/∂b_0 = 0,   ∂S/∂b_1 = 0.
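Solving this system gives the least squares estimates; a minimal sketch in R comparing a direct numerical minimisation of S with the built-in lm fit (the data are simulated):

    set.seed(3)
    x <- 1:20
    y <- 2 + 0.5 * x + rnorm(20, sd = 1)   # simulate Y_i = alpha + beta x_i + eps_i

    S <- function(b) sum((y - b[1] - b[2] * x)^2)   # sum of squares S(b0, b1)
    optim(c(0, 0), S)$par                  # numerical least squares estimates
    coef(lm(y ~ x))                        # the same estimates from R's lm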