
STATISTICAL INFERENCE PRACTICE

A. Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the normal population $N(\mu, \sigma^2)$ and let
$X = (X_1, X_2, \dots, X_n)$. We have $X \sim N_n(\boldsymbol{\mu}, \sigma^2 I_n)$ with $\boldsymbol{\mu} = (\mu, \mu, \dots, \mu)$. Using the properties of
the multivariate normal distribution, we have:
$\bar{X} \sim N(\mu, \sigma^2 / n)$. (1)
> mu <- 2
> sigma <- 3
> Y <- function(n) rnorm(n,mu,sigma)
> n=4
> MeanY <- function() mean(Y(n))
> MeanY()
[1] 1.903066
> samplemeanY <- function(m) replicate(m,MeanY())
> m=10000
> hist(samplemeanY(m),freq=0,breaks=40)
> curve(dnorm(x,mu,sigma/sqrt(n)),col="red",add=TRUE)
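Besides the histogram, property (1) can be checked numerically by comparing the simulated mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The sketch below reuses the values of the session above; the seed and the variable name `xbar` are assumptions added for illustration.

```r
# Numerical check of (1): sample means of N(mu, sigma^2) draws
set.seed(1)                       # assumed seed, only for reproducibility
mu <- 2; sigma <- 3; n <- 4; m <- 10000
xbar <- replicate(m, mean(rnorm(n, mu, sigma)))

# Simulated versus theoretical mean and standard deviation of the sample mean
c(simulated_mean = mean(xbar), theoretical_mean = mu)
c(simulated_sd = sd(xbar), theoretical_sd = sigma / sqrt(n))
```

The two pairs of numbers should agree up to simulation noise, which shrinks as `m` grows.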
$\frac{(n - 1) S_X^2}{\sigma^2} \sim \chi^2(n - 1)$, (2)
> mu<-2
> sigma<-3
> n<-100
> Y<-function() {
+ x<-rnorm(n,mu,sigma)
+ (n-1)*var(x)/sigma^2
+ }
> vecY<-function(m) replicate(m,Y())
> hist(vecY(1000),main=" Histogram of VecY",xlab="VecY",freq=0,breaks=40)
> curve(dchisq(x,df=n-1),col="red",add=TRUE)
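Property (2) can be checked in the same spirit: a chi-square distribution with $n - 1$ degrees of freedom has mean $n - 1$ and variance $2(n - 1)$. The following sketch, with an assumed seed and an assumed number of replications, compares these with the simulated moments.

```r
# Numerical check of (2): scaled sample variance of normal draws
set.seed(2)                       # assumed seed, only for reproducibility
mu <- 2; sigma <- 3; n <- 100; m <- 10000
stat <- replicate(m, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)

# Moments of chi-square(n - 1): mean n - 1, variance 2 * (n - 1)
c(simulated_mean = mean(stat), theoretical_mean = n - 1)
c(simulated_var = var(stat), theoretical_var = 2 * (n - 1))
```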

B. In the case of a non-normal population, i.e. when $X_1, X_2, \dots, X_n$ is only a random sample of size $n$
from a population with finite mean $\mu$ and variance $\sigma^2$, the above properties are, as an application of the Central Limit
Theorem, also used for large values of the sample size $n$: (1) and (2)
are taken to hold as $n \to \infty$, so that for a large value of $n$ we get from (1) and (2) the asymptotic
(approximate) distributions of $\bar{X}$ and $\frac{(n - 1) S_X^2}{\sigma^2}$. (For (2), the quality of the approximation depends on the fourth moment of the population.) As an application, let $X_1, X_2, \dots, X_n$ be a random
sample from the Bernoulli distribution $B(1, p)$ with mean $\mu = p$ and variance $\sigma^2 = p(1 - p)$. Since

$\bar{X} = \frac{\#\{i : X_i = 1\}}{n} = f$

is the relative frequency of the value 1 (a "success") in the sample, (1) and (2)
become
$f \approx N(p, p(1 - p) / n)$. (1')
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ mean(x)
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of Mean X",xlab =
+ "meanX",freq=0,breaks=40)
> curve(dnorm(x,p,sqrt(p*(1-p)/n)),col="red",add=TRUE)
$\frac{(n - 1) f (1 - f)}{p (1 - p)} \approx \chi^2(n - 1)$, (2')
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ ((n-1)*var(x))/(p*(1-p))
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of scaled sample variance",xlab =
+ "(n-1)*var(x)/(p*(1-p))",freq=0,breaks=40)
> curve(dchisq(x,n-1),col="red",add=TRUE)
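A rough way to judge how well (2') fits is to compare the first two simulated moments of the statistic with those of a chi-square distribution with $n - 1$ degrees of freedom, namely $n - 1$ and $2(n - 1)$. The sketch below reuses p = 0.7 and n = 100 from the session above; the seed and the number of replications are assumptions. The means should agree closely, while the variances need not, because the spread of a sample variance depends on the fourth moment of the population.

```r
# Moment check of (2') for a Bernoulli(p) population
set.seed(7)                       # assumed seed, only for reproducibility
p <- 0.7; n <- 100; m <- 10000
stat <- replicate(m, (n - 1) * var(rbinom(n, 1, p)) / (p * (1 - p)))

# Compare with the chi-square(n - 1) moments
c(simulated_mean = mean(stat), chisq_mean = n - 1)
c(simulated_var = var(stat), chisq_var = 2 * (n - 1))
```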
C. Construct the confidence intervals for the mean, variance, and proportion.
1. Confidence intervals for the mean
Case of a large sample (n > 30) or of a small sample under the assumption of normality
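A confidence interval at level $(1 - \alpha)$ for the mean $\mu$ is

$CI_{1-\alpha}(\mu) = \left[ \bar{x} - t^{n-1}_{1-\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t^{n-1}_{1-\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}} \right]$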
This confidence interval is returned by the function t.test().
Using the DataExcel data on students at a university, we build a CI for the mean of the T1 score.
> t.test(T1,conf.level=0.9)$conf.int
[1] 5.678953 6.213047
attr(,"conf.level")
[1] 0.9
We get the confidence interval [ 5.68, 6.21] with confidence level 0.9.
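The interval returned by t.test() can be reproduced directly from the formula. Since DataExcel is not available here, the sketch below uses simulated scores as a stand-in for T1; the helper name `ci_mean`, the seed, and the simulated data are assumptions for illustration only.

```r
# Hypothetical helper: t-based confidence interval for the mean
ci_mean <- function(x, conf.level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf.level
  # Half-width: Student quantile times the estimated standard error
  half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean(x) - half, mean(x) + half)
}

set.seed(3)                                # assumed seed
scores <- rnorm(100, mean = 6, sd = 1.6)   # simulated stand-in for T1

ci_mean(scores, conf.level = 0.9)
t.test(scores, conf.level = 0.9)$conf.int  # should give the same bounds
```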
Simulating the meaning of a confidence interval for the mean: in repeated sampling, about 90% of the intervals at level 0.9 should contain $\mu$.
> mu<-2
> sigma<-4
> CIX<-function(n)
+ {
+ x<-rnorm(n,mu,sigma)
+ CI<-c(mean(x)-qt(0.95,df=n-1)*sqrt(var(x)/n),
+ mean(x)+qt(0.95,df=n-1)*sqrt(var(x)/n));
+ return (CI);
+ }
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIX(n);
+ if ((mu>=CI[1])&&(mu<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.89
> countCI(1000)
[1] 0.901
> countCI(10000)
[1] 0.9035
2. Confidence intervals for proportion
In the case of large samples ($np \ge 5$ and $n(1 - p) \ge 5$), a confidence interval at level $(1 - \alpha)$ for the
unknown proportion $p$ is

$CI_{1-\alpha}(p) = \left[ \hat{p} - u_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + u_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]$
Using the DataExcel data on students at a university, we build a CI for the proportion of female students at the
university.
> n<-length(GT)
> f<-table(GT)[1]/n
> CI.prop<-c(f-qnorm(0.95)*sqrt(f*(1-f)/n),f+qnorm(0.95)*sqrt(f*(1-f)/n))
> CI.prop
F F
0.3978231 0.5621769
We get the confidence interval [0.398, 0.562] for the proportion of female students, with confidence level 0.9.
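The same interval can be obtained from the counts alone. The sketch below wraps the formula in a helper named `ci_prop` (a hypothetical name, not part of the handout) and applies it to 48 females out of 100 students, the counts implied by the sample proportion 0.48 and the GT table reported later in this document.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
ci_prop <- function(successes, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  f <- successes / n                                   # observed frequency
  half <- qnorm(1 - alpha / 2) * sqrt(f * (1 - f) / n) # half-width
  c(lower = f - half, upper = f + half)
}

ci_prop(48, 100, conf.level = 0.9)   # approximately 0.398 and 0.562
```

Passing the sample size as an argument avoids hard-coding 100 in the formula.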
Simulating the meaning of a confidence interval for the proportion (here at level 0.95):
> p<-0.4
> alpha<-0.05
> CIprop<-function(n)
+ {
+ x<-rbinom(n,1,p);
+ f<-mean(x)
+ CI<-c(f-qnorm(1-alpha/2)*sqrt(f*(1-f)/n),
+ f+qnorm(1-alpha/2)*sqrt(f*(1-f)/n))
+ return(CI)
+ }
> CIprop(100)
[1] 0.2565157 0.4434843
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIprop(n);
+ if ((p>=CI[1])&&(p<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.947
> countCI(10000)
[1] 0.9521
3. Confidence intervals for variance
Under the assumption of normality, a confidence interval at level $(1 - \alpha)$ for the
variance $\sigma^2$ is

$CI_{1-\alpha}(\sigma^2) = \left[ \frac{(n - 1)\hat{\sigma}^2}{q^{n-1}_{1-\alpha/2}},\ \frac{(n - 1)\hat{\sigma}^2}{q^{n-1}_{\alpha/2}} \right]$
Using the DataExcel data on students at a university, we build a CI for the variance of T1 (at level 0.9):
> S2<-var(T1)
> n<-length(T1)
> CIVar<-c((n-1)*S2/qchisq(0.95,df=n-1),(n-1)*S2/qchisq(0.05,df=n-1))
> CIVar
[1] 2.078214 3.323823
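The computation above can be wrapped in a small function so that the confidence level is a parameter rather than hard-coded quantile orders. This is a minimal sketch: the helper name `ci_var`, the seed, and the simulated sample (used because DataExcel is not available here) are assumptions.

```r
# Hypothetical helper: chi-square confidence interval for the variance
ci_var <- function(x, conf.level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf.level
  s2 <- var(x)
  # The upper chi-square quantile gives the lower bound and vice versa
  c(lower = (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1),
    upper = (n - 1) * s2 / qchisq(alpha / 2, df = n - 1))
}

set.seed(4)                          # assumed seed
x <- rnorm(100, mean = 2, sd = 4)    # simulated normal sample
ci_var(x, conf.level = 0.9)
```

Note that the interval is not symmetric around the sample variance, because the chi-square distribution is skewed.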
Simulating the meaning of a confidence interval for the variance (here at level 0.95):
> alpha<-0.05
> mu<-2
> sigma<-4
> CIVar<-function(n)
+ {
+ x<-rnorm(n,mu,sigma)
+ s<-sqrt(var(x))
+ CI<-c((n-1)*(s^2)/qchisq(1-alpha/2,df=n-1),
+ (n-1)*(s^2)/qchisq(alpha/2,df=n-1));
+ return (CI);
+ }
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIVar(n);
+ if ((sigma^2>=CI[1])&&(sigma^2<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.945
> countCI(10000)
[1] 0.9478
D. Test statistical hypothesis on the mean, variance, and proportion.
1. Test statistical hypothesis on the mean
Comparing the theoretical mean and a reference value (case with one sample)
Let $X$ be a quantitative variable with theoretical mean $\mu$ and variance $\sigma^2$. Using a sample of size
$n$, we wish to compare the theoretical mean to a reference value $\mu_0$. The hypotheses of the test
are $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$. Under $H_0$, when the data are normally distributed or the sample
size is large ($n > 30$), the test statistic is

$T = \sqrt{n} \left( \frac{\bar{X} - \mu_0}{\hat{\sigma}} \right) \sim St(n - 1)$
For example, suppose we want to know whether the average T1 score in DataExcel is above 6.
> t.test(T1,mu=6,alternative = "greater")

One Sample t-test

data: T1
t = -0.33575, df = 99, p-value = 0.6311
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
5.678953 Inf
sample estimates:
mean of x
5.946
Since the p-value (0.6311) is greater than $\alpha = 5\%$, we cannot conclude that the mean T1 score is above 6 at this risk level.
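To see where the reported t and p-value come from, the statistic can be computed by hand and compared with t.test(). The sketch below uses a simulated sample as a stand-in for T1, since DataExcel is not available here; the seed and the data are assumptions.

```r
set.seed(5)                            # assumed seed
x <- rnorm(100, mean = 6, sd = 1.6)    # simulated stand-in for T1
mu0 <- 6
n <- length(x)

# Test statistic T = sqrt(n) * (mean - mu0) / sd, Student with n - 1 df under H0
t_stat <- sqrt(n) * (mean(x) - mu0) / sd(x)
# One-sided p-value for H1: mu > mu0
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

res <- t.test(x, mu = mu0, alternative = "greater")
c(by_hand = t_stat, t.test = unname(res$statistic))
c(by_hand = p_value, t.test = res$p.value)
```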
Comparing two theoretical means (case with two samples)
Let $X_1$ and $X_2$ be two quantitative variables (measuring the same quantity, but in two different
populations). We assume that $X_1$ has theoretical mean $\mu_1$ and variance $\sigma_1^2$ and that $X_2$ has
theoretical mean $\mu_2$ and variance $\sigma_2^2$. Using estimates computed from two samples of
respective sizes $n_1$ and $n_2$ from the two populations, we wish to compare $\mu_1$ and $\mu_2$. The
hypotheses of the test are $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \neq \mu_2$. Under $H_0$, when the data are normally
distributed and $X_1$ and $X_2$ have equal variances, the test statistic is

$T = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim St(n_1 + n_2 - 2)$

where $\hat{\sigma}^2 = \frac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}$, with $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ being estimators of the variance in the two populations.
For example, we test whether there is a significant difference between the means of T1 and T2.
> t.test(T1,T2,var.equal = TRUE)

Two Sample t-test

data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374

Since the p-value (0.05766) is greater than $\alpha = 5\%$, we cannot conclude that the means of T1 and T2 differ at this risk level.
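The pooled statistic can also be computed step by step. The following sketch uses two simulated samples in place of T1 and T2 (an assumption, as are the seed and the chosen means and standard deviation) and checks the result against t.test() with var.equal = TRUE.

```r
set.seed(6)                              # assumed seed
x1 <- rnorm(100, mean = 6.0, sd = 1.6)   # simulated stand-in for T1
x2 <- rnorm(100, mean = 6.4, sd = 1.6)   # simulated stand-in for T2
n1 <- length(x1); n2 <- length(x2)

# Pooled variance estimate
s2_pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
# Test statistic and two-sided p-value, Student with n1 + n2 - 2 df
t_stat <- (mean(x1) - mean(x2)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
p_value <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)

res <- t.test(x1, x2, var.equal = TRUE)
c(by_hand = t_stat, t.test = unname(res$statistic))
c(by_hand = p_value, t.test = res$p.value)
```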
2. Test statistical hypothesis on the variance
Comparing two theoretical variances (case with two samples)
This test is often useful as a prerequisite for other tests, such as the comparison of two means in
the case with small samples. Indeed, in this case, the statistic is not the same depending on
whether the variances of $X_1$ (variable for the first sample) and $X_2$ (variable for the second
sample) can be considered as equal or not.

The hypotheses of the test are $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \neq \sigma_2^2$; the test statistic is

$T = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F(n_1 - 1, n_2 - 1)$
For example, we test whether there is a significant difference between the variances of T1 and T2.
> var.test(T1,T2)

F test to compare two variances

data: T1 and T2
F = 1.061, num df = 99, denom df = 99, p-value = 0.769
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7138635 1.5768465
sample estimates:
ratio of variances
1.060968
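The F statistic is simply the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail probability. A minimal sketch on simulated data (the samples and the seed are assumptions standing in for T1 and T2):

```r
set.seed(8)                              # assumed seed
x1 <- rnorm(100, mean = 6.0, sd = 1.6)   # simulated stand-in for T1
x2 <- rnorm(100, mean = 6.4, sd = 1.6)   # simulated stand-in for T2
df1 <- length(x1) - 1; df2 <- length(x2) - 1

# Ratio of sample variances, F(df1, df2) under H0
f_stat <- var(x1) / var(x2)
# Two-sided p-value: twice the smaller of the two tail probabilities
p_left <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_left, 1 - p_left)

res <- var.test(x1, x2)
c(by_hand = f_stat, var.test = unname(res$statistic))
c(by_hand = p_value, var.test = res$p.value)
```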
The F test does not reject the equality of variances (p-value = 0.769), so the means of T1 and T2 can be compared with the equal-variance t-test, as before:
> t.test(T1,T2,var.equal = TRUE)

Two Sample t-test

data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374

Since the p-value (0.05766) is greater than $\alpha = 5\%$, we cannot conclude that the means of T1 and T2 differ at this risk level.
3. Test statistical hypothesis of proportion
Comparing a theoretical proportion to a reference value (case with one sample)
Let $p$ be the unknown frequency of a trait in a given population. We observe data of
presence/absence of this trait on individuals in a sample of size $n$ from this population. The
hypotheses of the test are $H_0: p = p_0$ and $H_1: p \neq p_0$. Under $H_0$, provided the sample is large enough
($np_0 \ge 5$, $n(1 - p_0) \ge 5$), the test statistic is

$U = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} \sim N(0, 1)$

For example, we test whether the female proportion is less than $p_0 = 0.4$.

> prop.test(table(GT)[1],length(GT),0.4,alternative = "less",correct = FALSE)

1-sample proportions test without continuity correction

data: table(GT)[1] out of length(GT), null probability 0.4
X-squared = 2.6667, df = 1, p-value = 0.9488
alternative hypothesis: true p is less than 0.4
95 percent confidence interval:
0.0000000 0.5616158
sample estimates:
p
0.48
Since the p-value (0.9488) is greater than $\alpha = 5\%$, we cannot conclude that the female proportion is less than 0.4 at this risk level.
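The X-squared value reported by prop.test() is the square of the statistic $U$ defined above. The sketch below recomputes it from the counts behind the output (48 females out of 100, as given by the sample estimate 0.48 and the GT table in this document); the variable names are chosen for illustration.

```r
x <- 48; n <- 100; p0 <- 0.4     # counts taken from the output above
p_hat <- x / n

# U = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), standard normal under H0
u <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# One-sided p-value for H1: p < p0
p_value <- pnorm(u)

res <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)
c(U = u, U_squared = u^2, X_squared = unname(res$statistic))
c(by_hand = p_value, prop.test = res$p.value)
```

Because the observed proportion lies above $p_0$, $U$ is positive and the one-sided p-value is close to 1.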
Comparing two theoretical proportions (case with two samples)
Let $p_1$ (respectively $p_2$) be the unknown proportion of individuals with a given trait within a
population $P_1$ (respectively $P_2$). We wish to compare $p_1$ and $p_2$. To this end, we use the
frequencies $\hat{p}_1$ and $\hat{p}_2$ of this trait in two representative samples of the two populations, of
respective sizes $n_1$ and $n_2$. The hypotheses of the test are $H_0: p_1 = p_2$ and $H_1: p_1 \neq p_2$. Under $H_0$,
provided the samples are large enough ($n_1 \hat{p} \ge 5$, $n_1(1 - \hat{p}) \ge 5$, $n_2 \hat{p} \ge 5$, $n_2(1 - \hat{p}) \ge 5$), the test statistic is

$U = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}} \sim N(0, 1), \quad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$

For example, we test whether the female proportion in KV1 equals the female proportion in KV2.
> table(GT,KV)
KV
GT 1 2 2NT
F 24 13 11
M 36 6 10
> mytable<-as.matrix(table(GT,KV)[,c(1,2)])
> prop.test(mytable,correct = FALSE)

2-sample test for equality of proportions without continuity correction

data: mytable
X-squared = 4.6812, df = 1, p-value = 0.03049
alternative hypothesis: two.sided
95 percent confidence interval:
-0.39520594 -0.02178248
sample estimates:
prop 1 prop 2
0.6486486 0.8571429
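We can conclude that there is a significant difference between the proportions of females in KV1 and
in KV2 at the $\alpha = 5\%$ risk level (p-value = 0.03049). Note that prop.test() treats the rows of mytable (F and M) as the two groups, so the reported estimates 0.649 and 0.857 are the shares of KV1 among females and among males; the X-squared statistic and the p-value are the same for the transposed table, so the conclusion is unaffected.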
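The statistic $U$ for two proportions can be computed directly from the counts in the table above: 24 females out of 60 students in KV1 and 13 females out of 19 students in KV2. The sketch below is an illustration with variable names chosen here; it compares the hand computation with prop.test() called on the counts.

```r
x <- c(24, 13)                   # females in KV1 and KV2 (from the table)
n <- c(60, 19)                   # column totals for KV1 and KV2
p_hat <- x / n                   # female proportions in each group
p_pool <- sum(x) / sum(n)        # pooled proportion under H0

# U = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
u <- (p_hat[1] - p_hat[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
p_value <- 2 * pnorm(-abs(u))    # two-sided p-value

res <- prop.test(x, n, correct = FALSE)
c(U_squared = u^2, X_squared = unname(res$statistic))
c(by_hand = p_value, prop.test = res$p.value)
```

Called this way, the sample estimates reported by prop.test() are the female proportions in KV1 and KV2 themselves, while the X-squared statistic matches the one shown above.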
