
Ignacio Cascos Fernández
Department of Statistics, Universidad Carlos III de Madrid

Parameter estimation
Statistics — 2011–2012
A random sample of size n is formed by n independent random variables X1, X2, ..., Xn sampled from the same population. That is, X1, X2, ..., Xn all follow the distribution of the population random variable X.

1 Statistics (estimators)

Our aim is to obtain information about the population parameters (mean, variance, proportion, ...) when only a sample is available. A statistic is any transformation (function) of the observations of a random sample; consequently, it is a random variable, f(X1, X2, ..., Xn). An estimator of a parameter θ is any statistic θ̂ = f(X1, X2, ..., Xn) that provides us with an approximate value of θ. The value that an estimator assumes on a real sample (when the random variables are substituted by real numbers) is called an estimate.
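As a minimal illustration (a Python sketch using NumPy; the normal population and the sample size below are arbitrary assumptions), the sample mean is a statistic, and its value on observed data is the corresponding estimate of µ:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical population X ~ N(mu = 5, sigma = 2); one sample of size n = 50.
    mu, sigma, n = 5.0, 2.0, 50
    x = rng.normal(mu, sigma, size=n)

    # The statistic f(X1, ..., Xn) = (1/n) * sum(Xi), evaluated on the observed
    # sample, yields a single number: the estimate of the population mean mu.
    print(x.mean())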

1.1 Properties of estimators

Unbiased estimator. An estimator θ̂ of a parameter θ is unbiased if its expected value is θ, that is, θ̂ is unbiased if E[θ̂] = θ. The distance from the expectation of the estimator to the true value of the parameter, E[θ̂] − θ, is called the bias,

bias[θ̂] = E[θ̂] − θ.
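A simulation sketch (Python with NumPy; the population parameters and the number of replications are arbitrary choices) approximates E[θ̂] over many samples, showing that the sample mean is unbiased while the variance estimator that divides by n underestimates σ²:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 5.0, 2.0, 10, 100_000

    samples = rng.normal(mu, sigma, size=(reps, n))

    # Two estimators evaluated on each replicated sample.
    xbar = samples.mean(axis=1)            # sample mean
    var_n = samples.var(axis=1, ddof=0)    # divides by n (biased for sigma^2)

    print("bias of sample mean:", xbar.mean() - mu)           # approximately 0
    print("bias of 1/n variance:", var_n.mean() - sigma**2)   # approximately -sigma^2/n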


Efficiency. The efficiency of an estimator is the inverse of its variance,

Eff[θ̂] = 1 / var[θ̂].

From several unbiased estimators of a parameter, we would choose the one with the smaller variance. We can compare two unbiased estimators by means of their relative efficiency, which is given by

RE[θ̂₂, θ̂₁] = Eff[θ̂₂] / Eff[θ̂₁] = var[θ̂₁] / var[θ̂₂].

The standard error of an estimator is its standard deviation, that is, σ_θ̂ = √var[θ̂]. When the standard error depends on the true value of the parameter θ, we can substitute θ by an estimation in order to obtain the estimated standard error σ̂_θ̂.

Consistency. An estimator is consistent if the probability of it being arbitrarily close to the true value of the parameter converges to one as the sample size tends to infinity. It is the minimal requirement for a good estimator.

Mean Square Error. We can compare biased and unbiased estimators by means of the Mean Square Error, given by

MSE[θ̂] = E[(θ̂ − θ)²] = var[θ̂] + bias[θ̂]².
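For example (a simulation sketch in Python; the normal population and the choice of the sample median as a competing estimator are assumptions made only for illustration), both the sample mean and the sample median are unbiased for the mean µ of a normal population, and their relative efficiency and mean square errors can be approximated by simulation:

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 0.0, 1.0, 25, 100_000

    samples = rng.normal(mu, sigma, size=(reps, n))
    xbar = samples.mean(axis=1)          # estimator 1: sample mean
    xmed = np.median(samples, axis=1)    # estimator 2: sample median

    # Relative efficiency RE[median, mean] = var[mean] / var[median] (< 1 here).
    print("RE[median, mean]:", xbar.var() / xmed.var())

    # MSE = var + bias^2; both biases are approximately 0 for a normal population.
    print("MSE of mean:  ", ((xbar - mu) ** 2).mean())
    print("MSE of median:", ((xmed - mu) ** 2).mean())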

2 Sampling distributions

2.1 Distribution of the sample mean

Given a random variable X with mean µ and known standard deviation σ, consider X1, X2, ..., Xn, a random sample from X of size n; that is, X1, X2, ..., Xn are n independent random variables with the distribution of X. The sample mean is given by

X̄ = (1/n) ∑_{i=1}^n X_i,

which is obviously a random variable. The sample mean is an unbiased estimator of the population mean µ, that is, E[X̄] = µ, and its variance is var[X̄] = σ²/n. If X is normally distributed, then X̄ also follows a normal distribution, that is, X̄ ∼ N(µ, σ/√n). Further, after the Central Limit Theorem (if n ≥ 30), the distribution of X̄ is approximately N(µ, σ/√n).

Distribution of the sample proportion. Let p denote the population proportion of individuals with a certain characteristic. We take an individual at random from the population and consider the random variable X that assumes value 1 if the individual has the characteristic and 0 otherwise. Obviously X ∼ B(1, p). Let us now take a random sample of X of size n, X1, X2, ..., Xn. Then

X̄ = (1/n) ∑_{i=1}^n X_i = p̂

represents the number of individuals from the sample with the characteristic divided by the sample size, that is, the sample proportion. The sample proportion is a particular case of the sample mean. Finally, after the Central Limit Theorem, if n ≥ 30, the distribution of p̂ is approximately N(p, √(p(1 − p)/n)).

2.2 Distribution of the sample variance

Consider a random sample X1, X2, ..., Xn of a random variable X (defining the population distribution) with mean µ and variance σ². The sample variance, given by

S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)²,

is an unbiased estimator of the population variance σ², that is, E[S²] = σ².
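A brief simulation sketch (Python with NumPy; the Bernoulli and normal populations below are illustrative assumptions) checking the normal approximation for the sample proportion and the unbiasedness of the sample variance:

    import numpy as np

    rng = np.random.default_rng(3)
    reps = 200_000

    # Sample proportion: Bernoulli(p) population, n >= 30.
    p, n = 0.3, 50
    phat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
    print(phat.mean(), p)                          # E[p^] = p
    print(phat.std(), np.sqrt(p * (1 - p) / n))    # CLT standard error

    # Sample variance: S^2 (dividing by n - 1) is unbiased for sigma^2.
    mu, sigma = 5.0, 2.0
    s2 = rng.normal(mu, sigma, size=(reps, n)).var(axis=1, ddof=1)
    print(s2.mean(), sigma**2)                     # E[S^2] = sigma^2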

2.3 Sampling distributions for normal populations

Distribution of the sample variance (normal population). Let X ∼ N(µ, σ) and X1, X2, ..., Xn be a random sample of X of size n. The distribution of the sample variance S² satisfies

(n − 1)S²/σ² ∼ χ²_{n−1},

where χ²_{n−1} stands for the chi-square distribution with n − 1 degrees of freedom.

Distribution of the sample mean with unknown variance (normal population). Let X ∼ N(µ, σ) and X1, X2, ..., Xn be a random sample of X of size n. If σ is unknown, we can estimate the variance through the sample variance, and the distribution of the sample mean X̄ satisfies

(X̄ − µ)/√(S²/n) ∼ t_{n−1},

where t_{n−1} stands for the t distribution with n − 1 degrees of freedom.

Distribution of the quotient of sample variances (normal populations). Let X ∼ N(µ_X, σ_X) and X1, X2, ..., Xn be a random sample of X of size n, and let Y ∼ N(µ_Y, σ_Y) and Y1, Y2, ..., Ym be a random sample of Y of size m. The ratio of their sample variances satisfies

(S²_X/σ²_X) / (S²_Y/σ²_Y) ∼ F_{n−1,m−1},

where F_{n−1,m−1} stands for the F distribution with n − 1 and m − 1 degrees of freedom.
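These pivotal quantities can be checked by simulation (a sketch in Python with NumPy and SciPy; the particular normal population and sample size are assumptions for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    mu, sigma, n, reps = 10.0, 3.0, 15, 100_000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)        # sample variance S^2 (divides by n - 1)

    # (n - 1) S^2 / sigma^2 should follow a chi-square with n - 1 d.f.
    q = (n - 1) * s2 / sigma**2
    print(np.quantile(q, 0.95), stats.chi2.ppf(0.95, df=n - 1))

    # (Xbar - mu) / sqrt(S^2 / n) should follow a t distribution with n - 1 d.f.
    t = (xbar - mu) / np.sqrt(s2 / n)
    print(np.quantile(t, 0.95), stats.t.ppf(0.95, df=n - 1))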

3 Maximum Likelihood Estimation

We start with a random sample X1, X2, ..., Xn from a known distribution model depending on one (or several) parameters. Our aim is to approximate the true value (θ0) of the parameter θ. Let us denote our observations by x = (x1, x2, ..., xn); x is a vector in Rⁿ.

If our distribution model is discrete, the likelihood function is the joint probability function of the sample,

l(θ|x) = ∏_{i=1}^n P(X_i = x_i | θ);

if it is continuous, it is the joint density function,

l(θ|x) = ∏_{i=1}^n f(x_i | θ),

where f(·|θ) denotes the density function under the assumption that the value of the parameter is θ. In either case, the likelihood is evaluated at the sample and considered as a function of the parameter.

The Maximum Likelihood Estimator (MLE) is the value of θ that maximizes the likelihood function (joint probability or density function), that is, the value of θ that makes our sample most likely. Given a distribution model with known range that does not depend on any parameter, we obtain the MLE of θ in the following way:

1. Likelihood function. Write l(θ|x); the target is to obtain the value of θ that maximizes l(θ|x).

2. Loglikelihood function. L(θ|x) = ln l(θ|x).

3. First derivative. Solve ∂L(θ|x)/∂θ = 0 and find θ̂, the candidate for MLE.

4. Second derivative. Check ∂²L(θ̂)/∂θ² < 0 to confirm that a maximum is attained at θ̂; call it θ̂_ML.

Properties of MLEs. MLEs are:

• Asymptotically unbiased, E[θ̂_ML] → θ as n → ∞.

• Asymptotically of minimal variance.

• Asymptotically normal, θ̂_ML ≈ N(θ, var[θ̂_ML]), with var[θ̂_ML] = −(∂²L(θ̂_ML)/∂θ²)^{−1}.

• Invariant under bijective transformations: if θ̂_ML is the MLE of θ, then g(θ̂_ML) is the MLE of g(θ).
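As an illustration (a Python sketch; the exponential model and the use of scipy.optimize.minimize_scalar are choices made here for the example, not part of the notes), the MLE of the rate λ of an exponential sample has the closed form λ̂_ML = 1/X̄, which can be confirmed by maximizing the loglikelihood numerically:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(5)
    lam_true = 2.0
    x = rng.exponential(scale=1 / lam_true, size=200)   # observed sample

    def neg_loglik(lam):
        # L(lambda | x) = n*ln(lambda) - lambda*sum(x) for the exponential model
        return -(x.size * np.log(lam) - lam * x.sum())

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
    print("numerical MLE:", res.x)
    print("closed form 1/xbar:", 1 / x.mean())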

Appendix: Common sampling distributions

Pearson's Chi-square distribution. Given n independent standard normal random variables X1, X2, ..., Xn, the random variable

Y = X1² + X2² + ... + Xn²

follows a Chi-square distribution with n degrees of freedom, denoted by Y ∼ χ²_n. A Chi-square random variable only assumes positive values and its parameters are E[Y] = n and var[Y] = 2n.

Student's t distribution. Given two independent random variables X and Y such that X follows a standard normal distribution and Y follows a Chi-square distribution with n degrees of freedom, the random variable

Z = X / √(Y/n)

follows a t distribution with n degrees of freedom, denoted by Z ∼ t_n. A random variable with distribution t_n can assume any real value, with E[Z] = 0 and var[Z] = n/(n − 2) if n ≥ 3. For large enough values of n, the t_n distribution is very similar to the standard normal.

Fisher's F distribution. Given two independent random variables X and Y such that X follows a Chi-square distribution with n1 degrees of freedom and Y follows a Chi-square distribution with n2 degrees of freedom, the random variable

Z = (X/n1) / (Y/n2)

follows an F distribution with n1 and n2 degrees of freedom, denoted by Z ∼ F_{n1,n2}. A random variable with distribution F can only assume positive values.
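These constructions can be reproduced directly (a Python sketch with NumPy and SciPy; the degrees of freedom are arbitrary), building a t_n variable from a standard normal and an independent Chi-square and comparing its quantiles with scipy.stats.t:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, reps = 8, 500_000

    z = rng.standard_normal(reps)          # X ~ N(0, 1)
    y = rng.chisquare(df=n, size=reps)     # Y ~ chi-square with n d.f., independent of X
    t = z / np.sqrt(y / n)                 # Z = X / sqrt(Y/n) ~ t_n

    print(np.quantile(t, [0.9, 0.95, 0.975]))
    print(stats.t.ppf([0.9, 0.95, 0.975], df=n))
    print(t.var(), n / (n - 2))            # var[Z] = n/(n - 2) for n >= 3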