INFERENCE STATISTIC PRACTICE

A. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from the normal population $N(\mu, \sigma^2)$ and let
$X = (X_1, X_2, \ldots, X_n)^T$. We have $X \sim N_n(\boldsymbol{\mu}, \sigma^2 I_n)$ with $\boldsymbol{\mu} = (\mu, \mu, \ldots, \mu)^T$. Using the properties of
the multivariate normal distribution, we have:
$\bar{X} \sim N(\mu, \sigma^2 / n)$. (1)
> mu <- 2
> sigma <- 3
> Y <- function(n) rnorm(n,mu,sigma)
> n=4
> MeanY <- function() mean(Y(n))
> MeanY()
[1] 1.903066
> samplemeanY <- function(m) replicate(m,MeanY())
> m=10000
> hist(samplemeanY(m),freq=0,breaks=40)
> curve(dnorm(x,mu,sigma/sqrt(n)),col="red",add=TRUE)
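As a numerical complement to the histogram, the following sketch compares the empirical mean and standard deviation of the simulated sample means with the values $\mu$ and $\sigma/\sqrt{n}$ predicted by (1). The seed and the names `sample_means`, `emp_mean` and `emp_sd` are illustrative assumptions, not part of the original session.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
mu <- 2; sigma <- 3; n <- 4; m <- 10000
# m independent sample means, each computed from n normal observations
sample_means <- replicate(m, mean(rnorm(n, mu, sigma)))
emp_mean <- mean(sample_means)   # should be close to mu
emp_sd   <- sd(sample_means)     # should be close to sigma / sqrt(n)
c(emp_mean = emp_mean, theory_mean = mu,
  emp_sd = emp_sd, theory_sd = sigma / sqrt(n))
```

With m = 10000 replications the empirical and theoretical values should typically agree to about two decimal places.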
$\dfrac{(n-1) S_X^2}{\sigma^2} \sim \chi^2(n-1)$. (2)
> mu<-2
> sigma<-3
> n<-100
> Y<-function() {
+ x<-rnorm(n,mu,sigma)
+ (n-1)*var(x)/sigma^2
+ }
> vecY<-function(m) replicate(m,Y())
> hist(vecY(1000),main=" Histogram of VecY",xlab="VecY",freq=0,breaks=40)
> curve(dchisq(x,df=n-1),col="red",add=TRUE)
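The histogram can be complemented by a moment check: a $\chi^2(n-1)$ variable has mean $n-1$ and variance $2(n-1)$. The sketch below, with an arbitrary seed and the illustrative name `stat`, compares these with the simulated values of the statistic in (2).

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
mu <- 2; sigma <- 3; n <- 100; m <- 1000
# m simulated values of (n-1) * S^2 / sigma^2 from normal samples
stat <- replicate(m, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
c(emp_mean = mean(stat), theory_mean = n - 1,
  emp_var = var(stat), theory_var = 2 * (n - 1))
```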
B. In the case of a non-normal population, i.e. when $X_1, X_2, \ldots, X_n$ is only a random sample of size n
from a population with finite mean $\mu$ and variance $\sigma^2$, as an application of the Central Limit
Theorem, the above properties also hold for large values of the sample size n, i.e. (1) and (2)
hold as $n \to \infty$, so that for a large value of n we get from (1) and (2) the asymptotic
(approximate) distributions of $\bar{X}$ and $\dfrac{(n-1) S_X^2}{\sigma^2}$. As an application, let $X_1, X_2, \ldots, X_n$ be a random
sample from the Bernoulli distribution $B(1, p)$ with mean $\mu = p$ and variance $\sigma^2 = p(1-p)$. Since
$\bar{X} = \dfrac{\#\{i : X_i = 1\}}{n} = f$
is the relative frequency of the number 1 (the number of "successes") in the sample, (1) and (2)
become
$f \sim N(p, p(1-p)/n)$. (1')
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ mean(x)
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of Mean X",
+ xlab = "meanX",freq=0,breaks=40)
> curve(dnorm(x,p,sqrt(p*(1-p)/n)),col="red",add=TRUE)
$\dfrac{(n-1) f (1-f)}{p(1-p)} \sim \chi^2(n-1)$. (2')
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ ((n-1)*var(x))/(p*(1-p))
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of (n-1)*var(X)/(p*(1-p))",
+ xlab = "(n-1)*var(X)/(p*(1-p))",freq=0,breaks=40)
> curve(dchisq(x,n-1),col="red",add=TRUE)
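The chi-square curve in (2') is only an approximation for Bernoulli data. One rough way to judge it, sketched here with an arbitrary seed and the illustrative name `stat`, is to compare the first two moments of the simulated statistic with those of the $\chi^2(n-1)$ distribution, namely $n-1$ and $2(n-1)$. The mean should agree, whereas for a non-normal population the variance of the statistic depends on the population kurtosis and need not match $2(n-1)$.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
p <- 0.7; n <- 100; m <- 10000
# m simulated values of (n-1) * var(x) / (p * (1-p)) from Bernoulli samples
stat <- replicate(m, {
  x <- rbinom(n, 1, p)
  (n - 1) * var(x) / (p * (1 - p))
})
c(emp_mean = mean(stat), chisq_mean = n - 1,
  emp_var = var(stat), chisq_var = 2 * (n - 1))
```

If the empirical variance is clearly smaller than $2(n-1)$, the histogram will look narrower than the red chi-square curve.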
C. Construct the confidence intervals for the mean, variance, and proportion.
1. Confidence intervals for the mean
Case of a large sample (n > 30) or of a small sample under the assumption of normality
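A confidence interval at level $(1 - \alpha)$ for the mean $\mu$ is
$CI_{1-\alpha}(\mu) = \left[ \bar{x} - t^{n-1}_{1-\alpha/2} \dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t^{n-1}_{1-\alpha/2} \dfrac{\hat{\sigma}}{\sqrt{n}} \right]$
where $t^{n-1}_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ of the Student distribution with $n - 1$ degrees of freedom and $\hat{\sigma}$ is the sample standard deviation.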
The confidence interval is given by the function t.test().
Using the DataExcel data set of students at a university, we build a confidence interval for the mean of the T1 score.
> t.test(T1,conf.level=0.9)$conf.int
[1] 5.678953 6.213047
attr(,"conf.level")
[1] 0.9
We get the confidence interval [ 5.68, 6.21] with confidence level 0.9.
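To connect the formula with t.test(), the sketch below computes the interval by hand and compares it with the output of t.test(). Because the DataExcel data are not reproduced here, the vector `x` is simulated and is only a hypothetical stand-in for T1. The names `half_width`, `ci_manual` and `ci_ttest` are likewise illustrative.

```r
set.seed(1)                            # arbitrary seed
x <- rnorm(100, mean = 6, sd = 1.6)    # hypothetical scores, NOT the real T1
n <- length(x); alpha <- 0.10          # confidence level 1 - alpha = 0.9
# half-width: Student quantile times the estimated standard error
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
ci_manual <- c(mean(x) - half_width, mean(x) + half_width)
ci_ttest  <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
rbind(ci_manual, ci_ttest)             # the two rows should coincide
```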
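Simulating the meaning of a confidence interval for the mean.
> mu<-2
> sigma<-4
> CIX<-function(n)
+ {
+ x<-rnorm(n,mu,sigma);  # draw a new normal sample of size n
+ CI<-c(mean(x)-qt(0.95,df=n-1)*sqrt(var(x)/n),
+ mean(x)+qt(0.95,df=n-1)*sqrt(var(x)/n));  # interval at level 0.9
+ return (CI);
+ }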
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIX(n);
+ if ((mu>=CI[1])&(mu<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.89
> countCI(1000)
[1] 0.901
> countCI(10000)
[1] 0.9035
2. Confidence intervals for proportion
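Case of large samples ($np \geq 5$ and $n(1 - p) \geq 5$): a confidence interval at level $(1 - \alpha)$ for the
unknown proportion p is
$CI_{1-\alpha}(p) = \left[ \hat{p} - u_{1-\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + u_{1-\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} \right]$
where $u_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ of the standard normal distribution.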
Using the DataExcel data set of students at a university, we build a confidence interval at level 0.9 for the proportion of
female students at the university.
> f<-table(GT)[1]/length(GT)
> CI.prop<-c(f-qnorm(0.95)*sqrt(f*(1-f)/length(GT)),f+qnorm(0.95)*sqrt(f*(1-f)/length(GT)))
> CI.prop
F F
0.3978231 0.5621769
We get the confidence interval [0.398, 0.562] for the proportion of females, with confidence level 0.9.
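The same computation can be wrapped in a small helper that works from counts alone. This is a minimal sketch: the function name `prop_ci` and its arguments are hypothetical, and the counts (48 females out of 100 students) are read off the estimate 0.48 and the table of GT by KV shown elsewhere in this document.

```r
# Normal-approximation confidence interval for a proportion (illustrative helper)
prop_ci <- function(x, n, conf.level = 0.90) {
  f <- x / n                                   # observed relative frequency
  half_width <- qnorm(1 - (1 - conf.level) / 2) * sqrt(f * (1 - f) / n)
  c(lower = f - half_width, upper = f + half_width)
}
ci <- prop_ci(48, 100)   # 48 females out of 100 students
ci
```

With these counts the helper should reproduce the interval [0.398, 0.562] obtained above.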
Simulating the meaning of a confidence interval for the proportion.
> p<-0.4
> alpha<-0.05
> CIprop<-function(n)
+ {
+ x<-rbinom(n,1,p);
+ f<-mean(x)
+ CI<-c(f-qnorm(1-alpha/2)*sqrt(f*(1-f)/n),
+ f+qnorm(1-alpha/2)*sqrt(f*(1-f)/n))
+ return(CI)
+ }
> CIprop(100)
[1] 0.2565157 0.4434843
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIprop(n);
+ if ((p>=CI[1])&&(p<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.947
> countCI(10000)
[1] 0.9521
3. Confidence intervals for variance
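Case of samples under the assumption of normality: a confidence interval at level $(1 - \alpha)$ for the
variance $\sigma^2$ is
$CI_{1-\alpha}(\sigma^2) = \left[ \dfrac{(n-1)\hat{\sigma}^2}{q^{n-1}_{1-\alpha/2}},\ \dfrac{(n-1)\hat{\sigma}^2}{q^{n-1}_{\alpha/2}} \right]$
where $q^{n-1}_{\beta}$ is the quantile of order $\beta$ of the chi-square distribution with $n - 1$ degrees of freedom.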
Using the DataExcel data set of students at a university, we build a confidence interval at level 0.9 for the variance of T1:
> S2<-var(T1)
> n<-length(T1)
> CIVar<-c((n-1)*S2/qchisq(0.95,df=n-1),(n-1)*S2/qchisq(0.05,df=n-1))
> CIVar
[1] 2.078214 3.323823
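The computation above can be packaged as a reusable function. The sketch below is only an illustration: the helper name `var_ci` is hypothetical, and the data vector `x` is simulated as a stand-in for T1, which is not reproduced here.

```r
# Chi-square confidence interval for a normal variance (illustrative helper)
var_ci <- function(x, conf.level = 0.90) {
  n <- length(x); alpha <- 1 - conf.level
  s2 <- var(x)                                  # unbiased variance estimate
  c(lower = (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1),
    upper = (n - 1) * s2 / qchisq(alpha / 2, df = n - 1))
}
set.seed(1)                                     # arbitrary seed
x <- rnorm(100, mean = 6, sd = 1.6)             # hypothetical data, NOT the real T1
ci <- var_ci(x)
ci
sqrt(ci)   # corresponding interval for the standard deviation
```

Note that the interval is not symmetric around the estimate var(x), because the chi-square distribution is skewed.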
Simulating the meaning of a confidence interval for the variance.
> alpha<-0.05
> mu<-2
> sigma<-4
> CIVar<-function(n)
+ {
+ x<-rnorm(n,mu,sigma);  # draw a new normal sample of size n
+ s<-sqrt(var(x))
+ CI<-c((n-1)*(s^2)/qchisq(1-alpha/2,df=n-1),
+ (n-1)*(s^2)/qchisq(alpha/2,df=n-1));
+ return (CI);
+ }
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIVar(n);
+ if ((sigma^2>=CI[1])&(sigma^2<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.945
> countCI(10000)
[1] 0.9478
D. Test statistical hypothesis on the mean, variance, and proportion.
1. Test statistical hypothesis on the mean
Comparing the theoretical mean and a reference value (case with one sample)
Let X be a quantitative variable with theoretical mean $\mu$ and variance $\sigma^2$. Using a sample of size
n, we wish to compare the theoretical mean to a reference value $\mu_0$. The hypotheses of the test
are $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$. Under $H_0$, when the data are normally distributed or the sample
size is large (n > 30), the test statistic is
$T = \sqrt{n} \left( \dfrac{\bar{X} - \mu_0}{\hat{\sigma}} \right) \sim St(n-1)$
For example, we want to know whether the average of T1 in DataExcel is above 6.
> t.test(T1,mu=6,alternative = "greater")
One Sample t-test
data: T1
t = -0.33575, df = 99, p-value = 0.6311
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
5.678953 Inf
sample estimates:
mean of x
5.946
Since the p-value (0.6311) is greater than $\alpha = 5\%$, we cannot give a positive answer to the question at the specified risk level.
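The p-value printed by t.test() can be recovered from the reported statistic and degrees of freedom alone. The following sketch uses the values t = -0.33575 and df = 99 from the output above. The variable names are illustrative.

```r
t_obs <- -0.33575; df <- 99      # values reported by t.test() above
# one-sided p-value for H1: mu > 6 is the upper tail of the Student distribution
p_greater <- pt(t_obs, df, lower.tail = FALSE)
# for comparison, the two-sided p-value for H1: mu != 6
p_two_sided <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)
c(p_greater = p_greater, p_two_sided = p_two_sided)
```

Because the observed statistic is negative, the one-sided p-value for the alternative "greater" exceeds 0.5, which is why the test cannot support the claim.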
Comparing two theoretical means (case with two samples)
Let $X_1$ and $X_2$ be two quantitative variables (measuring the same quantity, but in two different
populations). We assume that $X_1$ has theoretical mean $\mu_1$ and variance $\sigma_1^2$ and that $X_2$ has
theoretical mean $\mu_2$ and variance $\sigma_2^2$. Using estimations computed from two samples of
respective sizes $n_1$ and $n_2$ from the two populations, we wish to compare $\mu_1$ and $\mu_2$. The
hypotheses of the test are $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \neq \mu_2$. Under $H_0$, when the data are normally
distributed and $X_1$ and $X_2$ have equal variances, the test statistic is
$T = \dfrac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim St(n_1 + n_2 - 2)$
where $\hat{\sigma}^2 = \dfrac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}$, $\hat{\sigma}_1^2, \hat{\sigma}_2^2$ being estimators of the variance in the two populations.
For example, we want to see whether there is a significant difference between the means of T1 and T2.
> t.test(T1,T2,var.equal = TRUE)
Two Sample t-test
data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374
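To make the pooled statistic concrete, the sketch below computes it by hand and checks it against t.test(..., var.equal = TRUE). The samples `x1` and `x2` are simulated and purely hypothetical (they are not T1 and T2), and all variable names are illustrative.

```r
set.seed(1)                                    # arbitrary seed
x1 <- rnorm(100, 5.9, 1.6)                     # hypothetical sample 1
x2 <- rnorm(100, 6.4, 1.6)                     # hypothetical sample 2
n1 <- length(x1); n2 <- length(x2)
# pooled variance estimate, weighting each sample variance by its degrees of freedom
s2_pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_manual <- (mean(x1) - mean(x2)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
p_manual <- 2 * pt(abs(t_manual), df = n1 + n2 - 2, lower.tail = FALSE)
res <- t.test(x1, x2, var.equal = TRUE)
c(t_manual = t_manual, t_ttest = unname(res$statistic),
  p_manual = p_manual, p_ttest = res$p.value)
```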
2. Test statistical hypothesis on the variance
Comparing two theoretical variances (case with two samples)
This test is often useful as a prerequisite for other tests, such as the comparison of two means in
the case with small samples. Indeed, in this case, the statistic is not the same depending on
whether the variances of X 1 (variable for the first sample) and X 2 (variable for the second
sample) can be considered as equal or not.

The hypotheses of the test are $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \neq \sigma_2^2$. Under $H_0$, the test statistic is
$T = \dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F(n_1 - 1, n_2 - 1)$
For example, we want to see whether there is a significant difference between the variances of T1 and T2.
> var.test(T1,T2)
F test to compare two variances
data: T1 and T2
F = 1.061, num df = 99, denom df = 99, p-value = 0.769
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7138635 1.5768465
sample estimates:
ratio of variances
1.060968
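The F statistic and its two-sided p-value can also be computed directly from the two sample variances. The sketch below does so on simulated, hypothetical samples (not T1 and T2) and compares the result with var.test(). Variable names are illustrative.

```r
set.seed(1)                                    # arbitrary seed
x1 <- rnorm(100, 5.9, 1.6)                     # hypothetical sample 1
x2 <- rnorm(100, 6.4, 1.6)                     # hypothetical sample 2
f_manual <- var(x1) / var(x2)                  # ratio of the variance estimates
df1 <- length(x1) - 1; df2 <- length(x2) - 1
p_low <- pf(f_manual, df1, df2)                # lower-tail probability
p_manual <- 2 * min(p_low, 1 - p_low)          # two-sided p-value
res <- var.test(x1, x2)
c(f_manual = f_manual, f_vartest = unname(res$statistic),
  p_manual = p_manual, p_vartest = res$p.value)
```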
Since the F test does not reject the equality of variances (p-value = 0.769), the means of T1 and T2 are compared with var.equal = TRUE.
> t.test(T1,T2,var.equal = TRUE)
Two Sample t-test
data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374
3. Test statistical hypothesis of proportion
Comparing a theoretical proportion to a reference value (case with one sample)
Let p be the unknown frequency of a trait in a given population. We observe data of
presence/absence of this trait on individuals in a sample of size n in this population. The

hypotheses of the test are $H_0: p = p_0$ and $H_1: p \neq p_0$. Under $H_0$, provided the sample is large enough
($n p_0 \geq 5$, $n(1 - p_0) \geq 5$), the test statistic is
$U = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \sim N(0, 1)$
For example, we want to see whether the proportion of females is less than p0 = 0.4.
> prop.test(table(GT)[1],length(GT),0.4,alternative = "less",correct = FALSE)
1-sample proportions test without continuity correction
data: table(GT)[1] out of length(GT), null probability 0.4
X-squared = 2.6667, df = 1, p-value = 0.9488
alternative hypothesis: true p is less than 0.4
95 percent confidence interval:
0.0000000 0.5616158
sample estimates:
p
0.48
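prop.test() reports a chi-square statistic rather than U, but without continuity correction the two are linked by X-squared = U^2. The sketch below checks this with the counts behind the output above (48 females out of 100, consistent with the estimate 0.48). Variable names are illustrative.

```r
x <- 48; n <- 100; p0 <- 0.4            # 48 females out of 100, reference value 0.4
p_hat <- x / n
u <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # statistic U from the formula
p_less <- pnorm(u)                            # p-value for H1: p < p0
res <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)
c(u = u, u_squared = u^2, X_squared = unname(res$statistic),
  p_manual = p_less, p_prop_test = res$p.value)
```

Since the observed proportion 0.48 lies above 0.4, U is positive and the p-value for the alternative "less" is close to 1, so the data give no evidence that the proportion of females is below 0.4.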
Comparing two theoretical proportions (case with two samples)
Let p1 (respectively p2 ) be the unknown proportion of individuals with a given trait within a
population P1 (respectively P2 ). We wish to compare p1 and p2 . To this end, we use the
frequencies p̂1 and p̂2 of this trait in two representative samples of the two populations, of

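respective sizes $n_1$ and $n_2$. The hypotheses of the test are $H_0: p_1 = p_2$ and $H_1: p_1 \neq p_2$. Under $H_0$,
provided the samples are large enough ($n_1 \hat{p} \geq 5$, $n_1 (1 - \hat{p}) \geq 5$, $n_2 \hat{p} \geq 5$, $n_2 (1 - \hat{p}) \geq 5$), the test statistic is
$U = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}} \sim N(0, 1), \quad \hat{p} = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$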
For example, we want to see whether the proportion of females in KV1 equals the proportion of females in KV2.
> table(GT,KV)
KV
GT 1 2 2NT
F 24 13 11
M 36 6 10
> mytable<-as.matrix(table(GT,KV)[,c(1,2)])
> prop.test(mytable,correct = FALSE)
2-sample test for equality of proportions without continuity correction
data: mytable
X-squared = 4.6812, df = 1, p-value = 0.03049
alternative hypothesis: two.sided
95 percent confidence interval:
-0.39520594 -0.02178248
sample estimates:
prop 1 prop 2
0.6486486 0.8571429
We can conclude that there is a significant difference between the proportion of females in KV1 and
the proportion of females in KV2 at the $\alpha = 5\%$ risk level.
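One point about the output above deserves care: when prop.test() receives a two-column matrix, it treats each row as a group and the two columns as counts of successes and failures. With rows F and M, "prop 1" and "prop 2" are therefore the proportions of KV1 among females and among males, not the female rates in KV1 and KV2. The sketch below rebuilds the 2 x 2 table from the counts shown above (the object names are illustrative) and transposes it so that the groups are KV1 and KV2. The X-squared statistic, and hence the p-value and the conclusion, should be unchanged, because the Pearson chi-square statistic of a 2 x 2 table does not depend on which margin defines the groups.

```r
# Counts copied from table(GT, KV) above, restricted to KV = 1 and KV = 2
counts <- matrix(c(24, 36, 13, 6), nrow = 2,
                 dimnames = list(GT = c("F", "M"), KV = c("1", "2")))
by_gender <- prop.test(counts, correct = FALSE)      # groups = F, M (as above)
by_kv     <- prop.test(t(counts), correct = FALSE)   # groups = KV1, KV2
by_kv$estimate                                 # female rate in KV1 and in KV2
c(by_gender = unname(by_gender$statistic),
  by_kv = unname(by_kv$statistic))             # same X-squared in both layouts
```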
