A. Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the normal population $N(\mu, \sigma^2)$ and let
$X = (X_1, X_2, \dots, X_n)$. We have $X \sim N_n(\boldsymbol{\mu}, \sigma^2 I_n)$ with $\boldsymbol{\mu} = (\mu, \mu, \dots, \mu)$. Using the properties of
the multivariate normal distribution, we have:
$\bar{X} \sim N(\mu, \sigma^2 / n)$. (1)
> mu <- 2
> sigma <- 3
> Y <- function(n) rnorm(n,mu,sigma)
> n=4
> MeanY <- function() mean(Y(n))
> MeanY()
[1] 1.903066
> samplemeanY <- function(m) replicate(m,MeanY())
> m=10000
> hist(samplemeanY(m),freq=0,breaks=40)
> curve(dnorm(x,mu,sigma/sqrt(n)),col="red",add=TRUE)
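The histogram above can be complemented by a numerical check of (1). The following is a minimal sketch, not part of the original session: the seed and the number of replications are arbitrary choices, and the name `means` is introduced here for illustration only.

```r
# Numerical check of (1): the mean of n draws from N(mu, sigma^2)
# should have mean mu and standard deviation sigma / sqrt(n).
set.seed(1)                       # arbitrary seed, for reproducibility only
mu <- 2; sigma <- 3; n <- 4
means <- replicate(10000, mean(rnorm(n, mu, sigma)))

# Empirical values next to the theoretical ones
c(empirical_mean = mean(means), theoretical_mean = mu)
c(empirical_sd = sd(means), theoretical_sd = sigma / sqrt(n))
```

The two pairs of numbers should be close, with the gap shrinking as the number of replications grows.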
$\dfrac{(n-1) S_X^2}{\sigma^2} \sim \chi^2(n-1)$, (2)
> mu<-2
> sigma<-3
> n<-100
> Y<-function() {
+ x<-rnorm(n,mu,sigma)
+ (n-1)*var(x)/sigma^2
+ }
> vecY<-function(m) replicate(m,Y())
> hist(vecY(1000),main=" Histogram of VecY",xlab="VecY",freq=0,breaks=40)
> curve(dchisq(x,df=n-1),col="red",add=TRUE)
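As a numerical complement to the plot, (2) implies that the statistic has mean $n - 1$ and variance $2(n - 1)$. The sketch below checks both moments by simulation; the seed, the number of replications and the name `stat` are assumptions made for this illustration.

```r
# Numerical check of (2): (n - 1) * S^2 / sigma^2 should behave like a
# chi-squared variable with n - 1 degrees of freedom.
set.seed(1)                       # arbitrary seed, for reproducibility only
mu <- 2; sigma <- 3; n <- 100
stat <- replicate(5000, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)

# A chi-squared(n - 1) variable has mean n - 1 and variance 2 * (n - 1)
c(empirical_mean = mean(stat), theoretical_mean = n - 1)
c(empirical_var = var(stat), theoretical_var = 2 * (n - 1))
```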
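B. When each $X_i$ takes the value 1 with probability $p$ and 0 otherwise,
$\bar{X} = \dfrac{\#\{i : X_i = 1\}}{n} = f$
is the relative frequency of the value 1 (the proportion of "successes") in the sample, and for large $n$ (1) and (2)
become
$f \overset{\text{approx.}}{\sim} N(p, p(1-p)/n)$. (1')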
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ mean(x)
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of Mean X",xlab = "meanX",freq=0,breaks=40)
> curve(dnorm(x,p,sqrt(p*(1-p)/n)),col="red",add=TRUE)
$\dfrac{(n-1) f(1-f)}{p(1-p)} \overset{\text{approx.}}{\sim} \chi^2(n-1)$, (2')
> p<-0.7
> n<-100
> X<-function(n)
+ {
+ x<-rbinom(n,1,p)
+ ((n-1)*var(x))/(p*(1-p))
+ }
> samplemeanmauX<-function(m) replicate(m,X(n))
> hist(samplemeanmauX(10000),main = "Histogram of (n-1)*var(X)/(p*(1-p))",xlab = "statistic",freq=0,breaks=40)
> curve(dchisq(x,n-1),col="red",add=TRUE)
C. Construct the confidence intervals for the mean, variance, and proportion.
1. Confidence intervals for the mean
Case of a large sample (n > 30) or of a small sample under the assumption of normality
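A confidence interval at level $(1 - \alpha)$ for the mean $\mu$ is
$CI_{1-\alpha}(\mu) = \left[ \bar{x} - t^{n-1}_{1-\alpha/2} \dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t^{n-1}_{1-\alpha/2} \dfrac{\hat{\sigma}}{\sqrt{n}} \right]$,
where $t^{n-1}_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ of the Student distribution with $n - 1$ degrees of freedom and $\hat{\sigma}$ is the sample standard deviation.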
The confidence interval is given by the function t.test().
Using the DataExcel data set on students at a university, we build a confidence interval for the mean of the T1 score.
> t.test(T1,conf.level=0.9)$conf.int
[1] 5.678953 6.213047
attr(,"conf.level")
[1] 0.9
We get the confidence interval [ 5.68, 6.21] with confidence level 0.9.
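To connect the formula with the output of t.test(), the sketch below computes the interval by hand and compares it with the built-in result. Since the DataExcel data are not reproduced here, it uses simulated scores; the vector `x`, its distribution and the seed are hypothetical and stand in for T1.

```r
# Manual confidence interval for the mean versus t.test().
set.seed(1)                            # arbitrary seed
x <- rnorm(100, mean = 6, sd = 1.6)    # hypothetical scores, NOT the DataExcel data
n <- length(x)
alpha <- 0.1                           # confidence level 1 - alpha = 0.9

# Half-width: Student quantile times the estimated standard error
half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
manual <- c(mean(x) - half, mean(x) + half)

builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
all.equal(manual, builtin)             # both routes give the same interval
```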
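Simulation illustrating the meaning of a confidence interval for the mean: the proportion of simulated intervals at level 0.9 that contain $\mu$ should be close to 0.9.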
> mu<-2
> sigma<-4
> CIX<-function(n)
+ {
+ x<-rnorm(n,mu,sigma)
+ CI<-c(mean(x)-qt(0.95,df=n-1)*sqrt(var(x)/n),
+ mean(x)+qt(0.95,df=n-1)*sqrt(var(x)/n));
+ return (CI);
+ }
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIX(n);
+ if ((mu>=CI[1])&(mu<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.89
> countCI(1000)
[1] 0.901
> countCI(10000)
[1] 0.9035
2. Confidence intervals for proportion
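For large samples ($np \ge 5$ and $n(1-p) \ge 5$), a confidence interval at level $(1 - \alpha)$ for the
unknown proportion $p$ is
$CI_{1-\alpha}(p) = \left[ \hat{p} - u_{1-\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + u_{1-\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \right]$,
where $u_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ of the standard normal distribution.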
Using the DataExcel data set on students at a university, we build a confidence interval for the proportion of
female students.
> f<-table(GT)[1]/length(GT)
> CI<-c(f-qnorm(0.95)*sqrt(f*(1-f)/100),f+qnorm(0.95)*sqrt(f*(1-f)/100))
> CI.prop<-c(f-qnorm(0.95)*sqrt(f*(1-f)/100),f+qnorm(0.95)*sqrt(f*(1-f)/100))
> CI.prop
F F
0.3978231 0.5621769
We get the confidence interval [0.398, 0.562] for the proportion of females, with confidence level 0.9.
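The same computation can be wrapped in a small function. This is a sketch only: `prop_ci` is a hypothetical helper name, and the counts (48 females out of 100 students) are read off the GT-by-KV table in this document, under the assumption that GT has 100 observations.

```r
# Normal-approximation confidence interval for a proportion.
# `prop_ci` is a hypothetical helper, not a function from any package.
prop_ci <- function(successes, n, conf.level = 0.95) {
  f <- successes / n                        # observed proportion
  u <- qnorm(1 - (1 - conf.level) / 2)      # standard normal quantile
  half <- u * sqrt(f * (1 - f) / n)         # half-width of the interval
  c(lower = f - half, upper = f + half)
}

# 48 = 24 + 13 + 11 females among 100 students (assumed from the GT table)
ci <- prop_ci(48, 100, conf.level = 0.9)
ci
```

With these counts the function returns the interval reported in the text.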
Simulation illustrating the meaning of a confidence interval for a proportion (here at level 0.95, since alpha = 0.05).
> p<-0.4
> alpha<-0.05
> CIprop<-function(n)
+ {
+ x<-rbinom(n,1,p);
+ f<-mean(x)
+ CI<-c(f-qnorm(1-alpha/2)*sqrt(f*(1-f)/n),
+ f+qnorm(1-alpha/2)*sqrt(f*(1-f)/n))
+ return(CI)
+ }
> CIprop(100)
[1] 0.2565157 0.4434843
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIprop(n);
+ if ((p>=CI[1])&&(p<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.947
> countCI(10000)
[1] 0.9521
3. Confidence intervals for variance
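For samples under the assumption of normality, a confidence interval at level $(1 - \alpha)$ for the
variance $\sigma^2$ is
$CI_{1-\alpha}(\sigma^2) = \left[ \dfrac{(n-1)\hat{\sigma}^2}{q^{n-1}_{1-\alpha/2}},\ \dfrac{(n-1)\hat{\sigma}^2}{q^{n-1}_{\alpha/2}} \right]$,
where $q^{n-1}_{\beta}$ is the quantile of order $\beta$ of the chi-squared distribution with $n - 1$ degrees of freedom.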
Using the DataExcel data set on students at a university, we build a confidence interval at level 0.9 for the variance of T1:
> S2<-var(T1)
> n<-length(T1)
> CIVar<-c((n-1)*S2/qchisq(0.95,df=n-1),(n-1)*S2/qchisq(0.05,df=n-1))
> CIVar
[1] 2.078214 3.323823
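A reusable version of this computation might look as follows. It is a sketch under the normality assumption stated above; `var_ci` is a hypothetical helper name, and the data are simulated because the DataExcel scores are not reproduced here.

```r
# Chi-squared confidence interval for a variance (normal data assumed).
# `var_ci` is a hypothetical helper, not a function from any package.
var_ci <- function(x, conf.level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf.level
  s2 <- var(x)                                   # sample variance
  c(lower = (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1),
    upper = (n - 1) * s2 / qchisq(alpha / 2, df = n - 1))
}

set.seed(1)                            # arbitrary seed
x <- rnorm(100, mean = 6, sd = 1.6)    # hypothetical scores, NOT the DataExcel data
ci <- var_ci(x, conf.level = 0.9)
ci
sqrt(ci)   # the same interval expressed on the standard-deviation scale
```

Note that the larger quantile gives the lower bound, which is why the interval is not symmetric around the sample variance.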
Simulation illustrating the meaning of a confidence interval for the variance (here at level 0.95, since alpha = 0.05).
> alpha<-0.05
> mu<-2
> sigma<-4
> CIVar<-function(n)
+ {
+ x<-rnorm(n,mu,sigma)
+ s<-sqrt(var(x))
+ CI<-c((n-1)*(s^2)/qchisq(1-alpha/2,df=n-1),
+ (n-1)*(s^2)/qchisq(alpha/2,df=n-1));
+ return (CI);
+ }
> n<-100
> countCI<-function(m)
+ {
+ count=0;
+ for (i in 1:m)
+ {
+ CI<-CIVar(n);
+ if ((sigma^2>=CI[1])&(sigma^2<=CI[2]))
+ count=count+1;
+ }
+ return(count/m)
+ }
> countCI(100)
[1] 0.93
> countCI(1000)
[1] 0.945
> countCI(10000)
[1] 0.9478
D. Test statistical hypothesis on the mean, variance, and proportion.
1. Test statistical hypothesis on the mean
Comparing the theoretical mean and a reference value (case with one sample)
Let $X$ be a quantitative variable with theoretical mean $\mu$ and variance $\sigma^2$. Using a sample of size
$n$, we wish to compare the theoretical mean $\mu$ to a reference value $\mu_0$. The hypotheses of the test
are $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$. Under $H_0$, when the data are normally distributed or the sample
size is large ($n > 30$), the test statistic is
$T = \sqrt{n}\, \dfrac{\bar{X} - \mu_0}{\hat{\sigma}} \sim St(n-1)$
For example, we want to know whether the average of T1 in DataExcel is above 6.
> t.test(T1,mu=6,alternative = "greater")
data: T1
t = -0.33575, df = 99, p-value = 0.6311
alternative hypothesis: true mean is greater than 6
95 percent confidence interval:
5.678953 Inf
sample estimates:
mean of x
5.946
Since the p-value (0.6311) is above the risk level $\alpha = 5\%$, we cannot reject $H_0$: the data do not show that the mean of T1 is greater than 6.
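The statistic and p-value reported by t.test() can be reproduced from the formula for $T$. The sketch below does this on simulated data; the vector `x` and the seed are hypothetical and stand in for T1, so the numbers will differ from the output above.

```r
# One-sample t statistic and one-sided p-value, computed by hand.
set.seed(1)                            # arbitrary seed
x <- rnorm(100, mean = 6, sd = 1.6)    # hypothetical scores, NOT the DataExcel data
mu0 <- 6
n <- length(x)

t_manual <- sqrt(n) * (mean(x) - mu0) / sd(x)
# H1: mu > mu0, so the p-value is the upper tail of St(n - 1)
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

res <- t.test(x, mu = mu0, alternative = "greater")
c(t_manual = t_manual, t_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```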
Comparing two theoretical means (case with two samples)
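Let $X_1$ and $X_2$ be two quantitative variables (measuring the same quantity, but in two different
populations). We assume that $X_1$ has theoretical mean $\mu_1$ and variance $\sigma_1^2$ and that $X_2$ has
theoretical mean $\mu_2$ and variance $\sigma_2^2$. Using estimates computed from two samples of
respective sizes $n_1$ and $n_2$ from the two populations, we wish to compare $\mu_1$ and $\mu_2$. The
hypotheses of the test are $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \neq \mu_2$. Under $H_0$, when the data are normally
distributed and $X_1$ and $X_2$ have equal variances, the test statistic is
$T = \dfrac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim St(n_1 + n_2 - 2)$,
where $\hat{\sigma}^2 = \dfrac{(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2}{n_1 + n_2 - 2}$ is the pooled variance estimate.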
> t.test(T1,T2,var.equal = TRUE)
data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374
Since the p-value (0.05766) is above the risk level $\alpha = 5\%$, we cannot reject $H_0$: the means of T1 and T2 do not differ significantly at this level.
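The equal-variance statistic can also be computed directly from the two samples using the pooled variance. The following sketch uses simulated data; `x1`, `x2`, their distributions and the seed are hypothetical stand-ins for T1 and T2.

```r
# Two-sample t statistic with pooled variance, computed by hand.
set.seed(1)                             # arbitrary seed
x1 <- rnorm(100, mean = 6.0, sd = 1.6)  # hypothetical sample 1
x2 <- rnorm(100, mean = 6.4, sd = 1.6)  # hypothetical sample 2
n1 <- length(x1); n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_manual <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)   # two-sided p-value

res <- t.test(x1, x2, var.equal = TRUE)
c(t_manual = t_manual, t_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```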
2. Test statistical hypothesis on the variance
Comparing two theoretical variances (case with two samples)
This test is often useful as a prerequisite for other tests, such as the comparison of two means in
the case with small samples. Indeed, in this case, the statistic is not the same depending on
whether the variances of X 1 (variable for the first sample) and X 2 (variable for the second
sample) can be considered as equal or not.
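The hypotheses of the test are $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \neq \sigma_2^2$. Under $H_0$, when the data are normally distributed, the test statistic is
$T = \dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \sim F(n_1 - 1, n_2 - 1)$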
Example: we want to see whether the variances of T1 and T2 differ significantly.
> var.test(T1,T2)
data: T1 and T2
F = 1.061, num df = 99, denom df = 99, p-value = 0.769
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7138635 1.5768465
sample estimates:
ratio of variances
1.060968
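The F statistic is simply the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail probability. The sketch below checks this against var.test() on simulated data; `x1`, `x2` and the seed are hypothetical stand-ins for T1 and T2.

```r
# Variance-ratio F statistic and two-sided p-value, computed by hand.
set.seed(1)                             # arbitrary seed
x1 <- rnorm(100, mean = 6.0, sd = 1.6)  # hypothetical sample 1
x2 <- rnorm(100, mean = 6.4, sd = 1.6)  # hypothetical sample 2
n1 <- length(x1); n2 <- length(x2)

f_manual <- var(x1) / var(x2)
p_lower <- pf(f_manual, df1 = n1 - 1, df2 = n2 - 1)   # lower-tail probability
p_manual <- 2 * min(p_lower, 1 - p_lower)             # two-sided p-value

res <- var.test(x1, x2)
c(f_manual = f_manual, f_builtin = unname(res$statistic))
c(p_manual = p_manual, p_builtin = res$p.value)
```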
The test does not reject equality of the variances (p-value = 0.769), so we compare the means of T1 and T2 with the equal-variance t-test.
> t.test(T1,T2,var.equal = TRUE)
data: T1 and T2
t = -1.9093, df = 198, p-value = 0.05766
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.87005122 0.01405122
sample estimates:
mean of x mean of y
5.946 6.374
The p-value (0.05766) is above the risk level $\alpha = 5\%$, so we cannot conclude that the means of T1 and T2 differ.
3. Test statistical hypothesis of proportion
Comparing a theoretical proportion to a reference value (case with one sample)
Let $p$ be the unknown frequency of a trait in a given population. We observe data of
presence/absence of this trait on individuals in a sample of size $n$ in this population. The
hypotheses of the test are $H_0: p = p_0$ and $H_1: p \neq p_0$. Under $H_0$, when the sample is large enough
($np_0 \ge 5$ and $n(1 - p_0) \ge 5$), the test statistic is
$U = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} \sim N(0,1)$
Example: we want to see whether the proportion of females is less than $p_0 = 0.4$.
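One possible way to carry out this test is sketched below. It is not taken from the original session: the counts (48 females out of 100 students) are read off the GT-by-KV table in this document, assuming GT has 100 observations, and the statistic $U$ is computed by hand before being compared with prop.test() without continuity correction.

```r
# One-sample test of a proportion against p0, with H1: p < p0.
x <- 48; n <- 100; p0 <- 0.4      # 48 = 24 + 13 + 11 females (assumed from the GT table)
p_hat <- x / n

U <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- pnorm(U)               # lower tail, because H1 is p < p0

# prop.test() reports X-squared = U^2 when correct = FALSE
res <- prop.test(x, n, p = p0, alternative = "less", correct = FALSE)
c(U = U, U_squared = U^2, X_squared = unname(res$statistic))
c(p_manual = p_value, p_builtin = res$p.value)
```

With these counts the observed proportion (0.48) is above 0.4, so $U$ is positive (about 1.63) and the lower-tail p-value is large: the data give no support for a proportion below 0.4.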
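Comparing two theoretical proportions (case with two samples)
Let $p_1$ and $p_2$ be the unknown proportions in two populations, estimated by $\hat{p}_1$ and $\hat{p}_2$ from samples of
respective sizes $n_1$ and $n_2$. The hypotheses of the test are $H_0: p_1 = p_2$ and $H_1: p_1 \neq p_2$. Under $H_0$, for large samples, the test statistic is
$U = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}} \sim N(0,1), \qquad \hat{p} = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$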
Example: we want to see whether the proportion of females in KV1 equals the proportion of females in KV2.
> table(GT,KV)
KV
GT 1 2 2NT
F 24 13 11
M 36 6 10
> mytable<-as.matrix(table(GT,KV)[,c(1,2)])
> prop.test(mytable,correct = FALSE)
data: mytable
X-squared = 4.6812, df = 1, p-value = 0.03049
alternative hypothesis: two.sided
95 percent confidence interval:
-0.39520594 -0.02178248
sample estimates:
prop 1 prop 2
0.6486486 0.8571429
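Since the p-value (0.03049) is below the risk level $\alpha = 5\%$, we conclude that the proportion of females differs
significantly between KV1 and KV2. Note that, with this table layout, the reported estimates (0.649 and 0.857) are the proportions of KV1 among females and among males; the chi-squared statistic and the p-value are the same whichever way the 2 × 2 table is oriented.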
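When prop.test() receives a two-column matrix, it treats each row as a group and the first column as the count of successes. The sketch below rebuilds the 2 × 2 table from the counts printed above (the object names `tab`, `by_gender` and `by_region` are introduced here for illustration) and shows that transposing it yields the proportions of females within KV1 and KV2 directly.

```r
# Rebuild the 2 x 2 table of GT (rows) by KV (columns 1 and 2) from the printed counts.
tab <- matrix(c(24, 36, 13, 6), nrow = 2,
              dimnames = list(GT = c("F", "M"), KV = c("1", "2")))

# Rows = F, M: estimates are the proportions of KV1 among females and among males
by_gender <- prop.test(tab, correct = FALSE)

# Rows = KV1, KV2: estimates are the proportions of females in each KV
by_region <- prop.test(t(tab), correct = FALSE)

by_gender$estimate
by_region$estimate
c(by_gender = by_gender$p.value, by_region = by_region$p.value)
```

Both calls give the same test statistic and p-value; only the estimated proportions and the confidence interval for their difference depend on the orientation.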