POPULATION MEAN
3.1 Introduction
This lesson deals with inference problems about a multivariate population mean. As with a univariate population, inference problems may arise in three different scenarios:
Case I: Inference problems regarding a single multivariate population
Case II: Inference problems regarding two means from a paired population
Case III: Inference problems regarding two means from two independent populations
All three cases are dealt with briefly in this lesson, and all three make use of Hotelling's T-square statistic.
11/02/2023
3.2 Focus of Analysis
3.3 Assumptions Made in Each Case
1. Distribution
   Univariate case: The data all have a common mean $\mu$; mathematically, $E(X_i) = \mu$, $i = 1, \ldots, n$. This implies that there is a single population of subjects and no sub-populations with different means.
   Multivariate case: The data have a common mean vector $\boldsymbol{\mu}$; i.e., $E(\mathbf{X}_i) = \boldsymbol{\mu}$, $i = 1, \ldots, n$. This also implies that there are no sub-populations with different mean vectors.
2. Homoskedasticity
   Univariate case: The data have a common variance $\sigma^2$; mathematically, $\operatorname{var}(X_i) = \sigma^2$, $i = 1, \ldots, n$.
   Multivariate case: The data for all subjects have a common variance-covariance matrix $\Sigma$; i.e., $\operatorname{var}(\mathbf{X}_i) = \Sigma$, $i = 1, \ldots, n$.
3. Independence
   Both cases: The subjects are independently sampled.
4. Normality
   Univariate case: The subjects are sampled from a normal distribution.
   Multivariate case: The subjects are sampled from a multivariate normal distribution.
3.4 Hypothesis Testing in Each Case
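Univariate case: $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
Multivariate case: $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs. $H_a: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$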
3.5 Univariate Statistics: t-test
Hypothesis: $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$
Under univariate statistics, the null hypothesis can be tested using the t-statistic
$t = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}}.$
Under $H_0$, this t-statistic has a t distribution with $n - 1$ degrees of freedom. We would reject $H_0$ at level $\alpha$ if the absolute value of the test statistic is greater than the critical value from the t-table, evaluated at $\alpha/2$:
$|t| > t_{n-1, \alpha/2}.$
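A minimal sketch of the two-sided univariate test follows. It computes $t = (\bar{x} - \mu_0)/\sqrt{s^2/n}$ and compares $|t|$ with $t_{n-1, \alpha/2}$. It assumes SciPy is available for the critical value (`scipy.stats.t.ppf`). The function name `one_sample_t_test` and the three data values are hypothetical and only illustrative.

```python
import math
import statistics

from scipy import stats


def one_sample_t_test(x, mu0, alpha=0.05):
    """Two-sided univariate t-test of H0: mu = mu0 (illustrative sketch).

    Returns the t statistic, the critical value t_{n-1, alpha/2},
    and whether H0 is rejected at level alpha.
    """
    n = len(x)
    xbar = statistics.mean(x)
    s2 = statistics.variance(x)                    # sample variance, divisor n - 1
    t = (xbar - mu0) / math.sqrt(s2 / n)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # upper alpha/2 point of t_{n-1}
    return t, t_crit, abs(t) > t_crit


# Hypothetical data, not taken from the lesson
t, t_crit, reject = one_sample_t_test([6, 10, 8], mu0=9)
print(t, t_crit, reject)
```

With only $n = 3$ observations the critical value is large (about 4.30), so the small departure from $\mu_0$ is not significant.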
3.6 A Naive Approach
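Following the univariate method, a naive approach to testing the multivariate hypothesis is to compute the t-test statistic for each individual variable, i.e.,
$t_j = \dfrac{\bar{x}_j - \mu_{0j}}{\sqrt{s_j^2/n}}, \quad j = 1, \ldots, p.$
Thus $t_j$ is the t-statistic for the $j$th variable. We may then reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if $|t_j| > t_{n-1, \alpha/2}$ for at least one variable $j \in \{1, \ldots, p\}$.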
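The naive rule can be sketched as follows. It computes one t-statistic per column and rejects if any $|t_j|$ exceeds $t_{n-1, \alpha/2}$. The sketch assumes NumPy and SciPy. The helper name `naive_componentwise_t` and the small data matrix are hypothetical. This is the uncorrected procedure whose error rate is discussed next, not a recommended test.

```python
import numpy as np
from scipy import stats


def naive_componentwise_t(X, mu0, alpha=0.05):
    """Naive approach: a separate two-sided t-test for each variable.

    X is an (n, p) data matrix and mu0 the hypothesized mean vector.
    Rejects H0: mu = mu0 if |t_j| > t_{n-1, alpha/2} for at least one j.
    """
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    s2 = X.var(axis=0, ddof=1)            # per-variable sample variances s_j^2
    t = (xbar - mu0) / np.sqrt(s2 / n)    # t_j for j = 1, ..., p
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return t, bool(np.any(np.abs(t) > t_crit))


# Hypothetical bivariate data (n = 3, p = 2)
t_stats, reject = naive_componentwise_t([[6, 9], [10, 6], [8, 3]], mu0=[9, 5])
print(t_stats, reject)
```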
3.6.1 Problem with Family-wide error rate
The basic problem with this naive approach is that it does not control the family-wide error rate. By definition, the family-wide error rate is the probability of rejecting at least one of the null hypotheses when all of the $H_0$'s are true.
To understand the family-wide error rate, suppose that the population variance-covariance matrix is diagonal. This implies zero covariances between the variables. If the data are multivariate normally distributed, then all of the variables are independently distributed. In this case, the family-wide error rate is
$1 - (1 - \alpha)^p,$
which exceeds $\alpha$ whenever $p \geq 2$. Here $p$ is the dimension of the multivariate data and $\alpha$ is the level of significance.
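The formula $1 - (1 - \alpha)^p$ can be evaluated directly to see how quickly the naive approach inflates the error rate. This is a minimal sketch under the same independence assumption. The function name is hypothetical.

```python
def naive_familywise_error(alpha, p):
    """Family-wide error rate of p independent tests, each at level alpha: 1 - (1 - alpha)^p."""
    return 1 - (1 - alpha) ** p


for p in (1, 2, 5, 10):
    print(p, round(naive_familywise_error(0.05, p), 4))
```

At $\alpha = 0.05$ the family-wide error rate is already about 0.226 for $p = 5$ and about 0.401 for $p = 10$.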
3.6.2 Consequence
The naive approach yields a liberal test. That is, we will tend to reject the null hypothesis more often than we should.
3.7 Bonferroni Correction
Under the Bonferroni correction, we would reject the null hypothesis that the mean vector $\boldsymbol{\mu}$ equals the hypothesized mean vector $\boldsymbol{\mu}_0$ at level $\alpha$ if, for at least one variable $j$ between 1 and $p$, the absolute value of $t_j$ is greater than the critical value from the t-table with $n - 1$ degrees of freedom evaluated at $\alpha/(2p)$:
$|t_j| > t_{n-1, \alpha/(2p)}$ for at least one $j \in \{1, \ldots, p\}$.
Note: For independent data, this yields a family-wide error rate of approximately $\alpha$. For example, for $\alpha = 0.05$, the table below shows the family-wide error rate for different values of $p$. These are all close to the desired level of 0.05.
p     Family-wide error rate
2     0.049375
3     0.049171
4     0.049070
5     0.049010
10    0.048890
20    0.048830
50    0.048794
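The tabulated values agree with $1 - (1 - \alpha/p)^p$. This is the family-wide error rate of $p$ independent tests when each is run at level $\alpha/p$. The sketch below reproduces the table under that independence assumption. The function name is hypothetical.

```python
def bonferroni_familywise_error(alpha, p):
    """FWER of p independent tests, each run at the Bonferroni level alpha / p."""
    return 1 - (1 - alpha / p) ** p


for p in (2, 3, 4, 5, 10, 20, 50):
    print(p, f"{bonferroni_familywise_error(0.05, p):.6f}")
```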
3.8 Hotelling’s T-Square
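$T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)' S^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0),$
where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j$ and $S = \frac{1}{n-1}\sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'$.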
The statistic $T^2$ is called Hotelling's $T^2$ in honor of Harold Hotelling, a pioneer in multivariate analysis, who first obtained its sampling distribution. Here $\frac{1}{n}S$ is the estimated covariance matrix of $\bar{\mathbf{X}}$.
Under $H_0$, $T^2$ is distributed as $\frac{(n-1)p}{n-p} F_{p, n-p}$. That is,
$F = \frac{n-p}{p(n-1)} T^2 \sim F_{p, n-p},$
and we would reject the null hypothesis $H_0$ at level $\alpha$ if the test statistic $F$ is greater than the critical value from the F-table with $p$ and $n - p$ degrees of freedom, evaluated at level $\alpha$.
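A minimal sketch of the one-sample Hotelling test computes $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)' S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$. It converts $T^2$ to $F = \frac{n-p}{p(n-1)} T^2$ and compares $F$ with $F_{p, n-p, \alpha}$. The sketch assumes NumPy and SciPy (`scipy.stats.f.ppf`). The function name is hypothetical. It solves $S\mathbf{y} = \bar{\mathbf{x}} - \boldsymbol{\mu}_0$ rather than forming $S^{-1}$ explicitly, which is numerically preferable.

```python
import numpy as np
from scipy import stats


def hotelling_one_sample(X, mu0, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: mu = mu0 (illustrative sketch).

    Returns T^2, the F statistic, the critical value F_{p, n-p, alpha},
    and whether H0 is rejected at level alpha.
    """
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # sample covariance matrix, divisor n - 1
    d = xbar - mu0
    T2 = n * d @ np.linalg.solve(S, d)       # n (xbar - mu0)' S^{-1} (xbar - mu0)
    F = (n - p) / (p * (n - 1)) * T2         # ~ F_{p, n-p} under H0
    F_crit = stats.f.ppf(1 - alpha, p, n - p)
    return T2, F, F_crit, F > F_crit


# Hypothetical bivariate data (n = 3, p = 2), for illustration only
T2, F, F_crit, reject = hotelling_one_sample([[6, 9], [10, 6], [8, 3]], mu0=[9, 5])
print(T2, F, F_crit, reject)
```

For these illustrative numbers, $T^2 = 7/9$ and $F = 7/36$, far below $F_{2,1,0.05} = 199.5$. Under $H_0$, $T^2$ is distributed as $\frac{(3-1)2}{3-2}F_{2,1} = 4F_{2,1}$.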
Example 3.1:
Let the data matrix for a random sample of size $n$ from a bivariate normal population be
Evaluate the observed $T^2$ for the hypothesized mean vector $\boldsymbol{\mu}_0$. What is the sampling distribution of $T^2$ in this case?
Test the hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs. $H_a: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$.
3.9 Confidence Regions and Simultaneous Comparisons of Component Means
Consider the following example:
Vitamin C 75 mg 78.9 mg
Q2. If they fail to meet the guidelines, for which nutrients do the women fail to meet them?
Let us first compare the univariate case with the analogous multivariate case in the following
tables.
A naive approach to addressing Q2 is to calculate confidence intervals for each of the nutrient intake levels, one at a time, using the univariate method:
$\bar{x}_j \pm t_{n-1, \alpha/2} \sqrt{\frac{s_j^2}{n}}, \quad j = 1, \ldots, p.$
If we consider only a single variable, we can say with (1 - α) × 100% confidence that the
interval includes the corresponding population mean.
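The one-at-a-time intervals $\bar{x}_j \pm t_{n-1, \alpha/2}\sqrt{s_j^2/n}$ can be sketched column by column as follows. The sketch assumes NumPy and SciPy. The function name and the small data matrix are hypothetical, not the nutrient data. Each interval has $(1 - \alpha)$ coverage on its own, but not jointly.

```python
import numpy as np
from scipy import stats


def one_at_a_time_ci(X, alpha=0.05):
    """Univariate (1 - alpha) CIs for each variable mean.

    Row j is [lower, upper] = xbar_j -/+ t_{n-1, alpha/2} * sqrt(s_j^2 / n).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half = t_crit * np.sqrt(X.var(axis=0, ddof=1) / n)
    return np.column_stack([xbar - half, xbar + half])


ci = one_at_a_time_ci([[6, 9], [10, 6], [8, 3]])
print(ci)
```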
A one-at-a-time 95% confidence interval for calcium is obtained by substituting the sample values into the formula, as shown below:
The one-at-a-time confidence intervals are summarized in the table below:
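Here the squared Mahalanobis distance between $\bar{\mathbf{x}}$ and $\boldsymbol{\mu}$ is being used. Note that a closely related equation for a hyper-ellipse is
$n(\bar{\mathbf{x}} - \boldsymbol{\mu})' S^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}) = \frac{p(n-1)}{n-p} F_{p, n-p, \alpha}.$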
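Beginning at the center $\bar{\mathbf{x}}$, the axes of the confidence ellipsoid are
$\pm \sqrt{\lambda_i} \sqrt{\frac{p(n-1)}{n(n-p)} F_{p, n-p, \alpha}} \, \mathbf{e}_i,$
where $S\mathbf{e}_i = \lambda_i \mathbf{e}_i$, $i = 1, \ldots, p$. That is, the ellipsoid extends $\sqrt{\lambda_i} \sqrt{\frac{p(n-1)}{n(n-p)} F_{p, n-p, \alpha}}$ units along the eigenvector $\mathbf{e}_i$. The ratios of the $\lambda_i$'s help identify the relative amount of elongation along pairs of axes.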
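The axes of the $T^2$ confidence ellipsoid follow from an eigendecomposition of $S$. The half-length along $\mathbf{e}_i$ is $\sqrt{\lambda_i}\sqrt{\frac{p(n-1)}{n(n-p)}F_{p,n-p,\alpha}}$. The sketch below assumes NumPy and SciPy; `numpy.linalg.eigh` returns eigenvalues in ascending order. The function name is hypothetical. Each axis endpoint lies exactly on the boundary $n(\bar{\mathbf{x}} - \boldsymbol{\mu})'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}) = \frac{p(n-1)}{n-p}F_{p,n-p,\alpha}$.

```python
import numpy as np
from scipy import stats


def confidence_ellipsoid_axes(X, alpha=0.05):
    """Center, half-lengths and axis directions of the (1 - alpha) T^2 confidence ellipsoid.

    Half-length along e_i: sqrt(lambda_i) * sqrt(p (n - 1) / (n (n - p)) * F_{p, n-p, alpha}),
    where S e_i = lambda_i e_i (columns of the returned matrix are the e_i).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam, E = np.linalg.eigh(S)               # eigenvalues ascending, eigenvectors as columns
    c = np.sqrt(p * (n - 1) / (n * (n - p)) * stats.f.ppf(1 - alpha, p, n - p))
    return X.mean(axis=0), np.sqrt(lam) * c, E


center, half, E = confidence_ellipsoid_axes([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
print(center, half, half.max() / half.min())
```

The last printed value, the ratio of the longest to the shortest half-length, equals $\sqrt{\lambda_{\max}/\lambda_{\min}}$. It summarizes how elongated the ellipse is.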
Simultaneous Confidence Statements
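A (1 - α) × 100% confidence ellipse yields simultaneous (1 - α) × 100% confidence intervals for all linear combinations of the variable means. Consider linear combinations of the population means:
$Z = c_1 X_1 + c_2 X_2 + \cdots + c_p X_p = \mathbf{c}'\mathbf{X},$ so that $\mu_Z = E(Z) = \mathbf{c}'\boldsymbol{\mu}$ and $\sigma_Z^2 = \operatorname{var}(Z) = \mathbf{c}'\Sigma\mathbf{c}$.
Moreover, for a random sample $\mathbf{X}_1, \ldots, \mathbf{X}_n$, the corresponding values $Z_j = \mathbf{c}'\mathbf{X}_j$ have sample mean and sample variance
$\bar{z} = \mathbf{c}'\bar{\mathbf{x}}, \quad s_z^2 = \mathbf{c}'S\mathbf{c},$
where $\bar{\mathbf{x}}$ and $S$ are the sample mean vector and covariance matrix of the $\mathbf{x}_j$'s, respectively.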
Simultaneous confidence intervals can be developed from a consideration of confidence intervals for $\mathbf{c}'\boldsymbol{\mu}$ for various choices of $\mathbf{c}$. The argument proceeds as follows.
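For a fixed $\mathbf{c}$ and unknown $\sigma_Z^2$, a (1 - α) × 100% confidence interval for $\mu_Z = \mathbf{c}'\boldsymbol{\mu}$ is based on Student's t-ratio
$t = \frac{\bar{z} - \mu_Z}{s_z/\sqrt{n}} = \frac{\sqrt{n}(\mathbf{c}'\bar{\mathbf{x}} - \mathbf{c}'\boldsymbol{\mu})}{\sqrt{\mathbf{c}'S\mathbf{c}}},$
which leads to the statement
$\bar{z} - t_{n-1, \alpha/2}\frac{s_z}{\sqrt{n}} \le \mu_Z \le \bar{z} + t_{n-1, \alpha/2}\frac{s_z}{\sqrt{n}}$
or
$\mathbf{c}'\bar{\mathbf{x}} - t_{n-1, \alpha/2}\frac{\sqrt{\mathbf{c}'S\mathbf{c}}}{\sqrt{n}} \le \mathbf{c}'\boldsymbol{\mu} \le \mathbf{c}'\bar{\mathbf{x}} + t_{n-1, \alpha/2}\frac{\sqrt{\mathbf{c}'S\mathbf{c}}}{\sqrt{n}}.$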
In terms of interpreting the (1 - α) × 100% confidence ellipse, we can say that we are (1 - α) × 100% confident that all such confidence intervals cover their respective linear combinations of the population means, regardless of which linear combinations we may wish to consider. In particular, we can consider the trivial linear combinations that correspond to the individual variables. So we are also (1 - α) × 100% confident that all of the intervals
$\bar{x}_j - \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{jj}}{n}} \le \mu_j \le \bar{x}_j + \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{jj}}{n}}, \quad j = 1, \ldots, p,$
cover their respective population means.
Result 3.1:
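Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from an $N_p(\boldsymbol{\mu}, \Sigma)$ population with $\Sigma$ positive definite. Then, simultaneously for all $\mathbf{c}$, the interval
$\left(\mathbf{c}'\bar{\mathbf{X}} - \sqrt{\frac{p(n-1)}{n(n-p)} F_{p, n-p, \alpha}\, \mathbf{c}'S\mathbf{c}},\ \mathbf{c}'\bar{\mathbf{X}} + \sqrt{\frac{p(n-1)}{n(n-p)} F_{p, n-p, \alpha}\, \mathbf{c}'S\mathbf{c}}\right)$
will contain $\mathbf{c}'\boldsymbol{\mu}$ with probability $1 - \alpha$.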
$\bar{x}_2 - \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{22}}{n}} \le \mu_2 \le \bar{x}_2 + \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{22}}{n}}$
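These intervals all hold simultaneously with confidence coefficient (1 - α). Note that, without modifying the coefficient (1 - α), we can make statements about the differences $\mu_i - \mu_k$ corresponding to $\mathbf{c}' = [0, \ldots, 0, c_i, 0, \ldots, 0, c_k, 0, \ldots, 0]$, where $c_i = 1$ and $c_k = -1$. In this case $\mathbf{c}'S\mathbf{c} = s_{ii} - 2s_{ik} + s_{kk}$, and we have the statement
$\bar{x}_i - \bar{x}_k - \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}} \le \mu_i - \mu_k \le \bar{x}_i - \bar{x}_k + \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{ii} - 2s_{ik} + s_{kk}}{n}}.$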
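In addition, we can make statements about $(\mu_i, \mu_k)$ belonging to the sample-mean-centered ellipses
$n \begin{bmatrix} \bar{x}_i - \mu_i & \bar{x}_k - \mu_k \end{bmatrix} \begin{bmatrix} s_{ii} & s_{ik} \\ s_{ik} & s_{kk} \end{bmatrix}^{-1} \begin{bmatrix} \bar{x}_i - \mu_i \\ \bar{x}_k - \mu_k \end{bmatrix} \le \frac{p(n-1)}{n-p} F_{p, n-p, \alpha}$
and still maintain the confidence coefficient (1 - α) for the whole set of statements.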
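The $T^2$ simultaneous interval for any linear combination $\mathbf{c}'\boldsymbol{\mu}$ can be sketched as $\mathbf{c}'\bar{\mathbf{x}} \pm \sqrt{\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}}\sqrt{\mathbf{c}'S\mathbf{c}/n}$. This is algebraically the same as the interval in Result 3.1. A unit vector $\mathbf{c}$ gives an interval for a single mean. A vector with $c_i = 1$, $c_k = -1$ gives the interval for $\mu_i - \mu_k$, where $\mathbf{c}'S\mathbf{c} = s_{ii} - 2s_{ik} + s_{kk}$. The sketch assumes NumPy and SciPy. The function name and data are hypothetical.

```python
import numpy as np
from scipy import stats


def t2_simultaneous_interval(X, c, alpha=0.05):
    """T^2 simultaneous interval for c' mu:
    c' xbar -/+ sqrt(p (n - 1) / (n - p) * F_{p, n-p, alpha}) * sqrt(c' S c / n).
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    center = c @ X.mean(axis=0)
    mult = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
    half = mult * np.sqrt(c @ S @ c / n)
    return center - half, center + half


# Difference mu_1 - mu_2 for hypothetical bivariate data
lo, hi = t2_simultaneous_interval([[6, 9], [10, 6], [8, 3]], c=[1, -1])
print(lo, hi)
```

Because the same multiplier is used for every $\mathbf{c}$, any number of such intervals can be computed without lowering the joint confidence level below $1 - \alpha$.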
Example 3.5: (Simultaneous confidence intervals as shadows of the confidence ellipsoid)
Find the 95% confidence ellipse for the means of the fourth roots of the door-closed and
door-open microwave radiation measurements.
Example 3.6: (Constructing simultaneous confidence intervals and ellipses)
The scores obtained by n = 87 college students on the College Level Examination Program (CLEP) subtests and the College Qualification Test (CQT) subtests are given. Construct simultaneous confidence intervals.
Where
Example 3.7: (Constructing Bonferroni simultaneous confidence intervals and comparing them with $T^2$-intervals)
Let us return to the microwave oven radiation data. We shall obtain the simultaneous 95% Bonferroni confidence intervals for the means of the fourth roots of the door-closed and door-open measurements.
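Both kinds of simultaneous interval for the component means have the form $\bar{x}_j \pm (\text{multiplier})\sqrt{s_{jj}/n}$. The comparison therefore reduces to the two multipliers: $t_{n-1, \alpha/(2p)}$ for Bonferroni and $\sqrt{\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}}$ for $T^2$. This is a minimal sketch assuming SciPy. The function name and the sample sizes used are hypothetical, not the microwave data.

```python
import numpy as np
from scipy import stats


def interval_multipliers(n, p, alpha=0.05):
    """Critical multipliers for simultaneous intervals on the p component means.

    Both multiply sqrt(s_jj / n):
      Bonferroni: t_{n-1, alpha / (2p)}
      T^2:        sqrt(p (n - 1) / (n - p) * F_{p, n-p, alpha})
    """
    bonf = stats.t.ppf(1 - alpha / (2 * p), df=n - 1)
    t2 = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
    return bonf, t2


for p in (1, 2, 5):
    print(p, interval_multipliers(42, p))
```

When only the $p$ component means are of interest, the Bonferroni multiplier is typically the smaller one, so its intervals are shorter. The $T^2$ intervals, however, also cover every other linear combination. For $p = 1$ the two multipliers coincide, since $F_{1, n-1, \alpha} = t_{n-1, \alpha/2}^2$.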
3.10 Profile Plots
If the data are of very high dimension, tables of simultaneous or Bonferroni confidence intervals are hard to grasp at a cursory glance. A better approach is to visualize the coverage of the confidence intervals through a profile plot.
Procedure
A profile plot is obtained by using the following three step procedure:
Step 1: Standardize each of the observations by dividing it by its hypothesized mean, i.e., $z_{ij} = x_{ij}/\mu_{0j}$.
Step 2: Compute the sample mean of the $z_{ij}$'s to obtain the sample means $\bar{z}_j$ corresponding to each of the variables $j = 1, \ldots, p$. These sample means are then plotted against the variable $j$.
Step 3: Plot either simultaneous or Bonferroni confidence bands for the population means of the transformed variables, $\mu_{z_j} = \mu_j/\mu_{0j}$.
Simultaneous confidence bands are given by the usual formula, using the $z$'s instead of the usual $x$'s:
$\bar{z}_j \pm \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{z_j}^2}{n}}$
The same substitutions are made for the Bonferroni confidence band formula:
$\bar{z}_j \pm t_{n-1, \alpha/(2p)} \sqrt{\frac{s_{z_j}^2}{n}}$
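The three profile-plot steps can be sketched as follows. The columns are divided by their hypothesized means $\mu_{0j}$, the means $\bar{z}_j$ are computed, and simultaneous or Bonferroni bands are attached. This minimal sketch assumes NumPy and SciPy. The function name, its `method` parameter and the data are hypothetical. It returns the plotting quantities rather than drawing the plot; plotting $\bar{z}_j$ and the band limits against $j$ (for example with matplotlib) gives the profile plot.

```python
import numpy as np
from scipy import stats


def profile_bands(X, mu0, alpha=0.05, method="simultaneous"):
    """Quantities for a profile plot: zbar_j and lower/upper band limits, j = 1..p."""
    Z = np.asarray(X, dtype=float) / np.asarray(mu0, dtype=float)  # step 1: z_ij = x_ij / mu0_j
    n, p = Z.shape
    zbar = Z.mean(axis=0)                                          # step 2: plotted against j
    se = np.sqrt(Z.var(axis=0, ddof=1) / n)                        # sqrt(s_{z_j}^2 / n)
    if method == "simultaneous":                                   # step 3: choose the band type
        mult = np.sqrt(p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p))
    elif method == "bonferroni":
        mult = stats.t.ppf(1 - alpha / (2 * p), df=n - 1)
    else:
        raise ValueError("method must be 'simultaneous' or 'bonferroni'")
    return zbar, zbar - mult * se, zbar + mult * se


zbar, lo_s, hi_s = profile_bands([[6, 9], [10, 6], [8, 3]], mu0=[8, 6])
print(zbar, lo_s, hi_s)
```

A band that does not contain 1 flags a variable whose mean appears to differ from its hypothesized value.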
3.11 Large Sample Inferences about a Population Mean Vector
When the sample size is large, tests of hypotheses and confidence regions for $\boldsymbol{\mu}$ can be constructed without the assumption of a normal population. For large $n$, we are able to make inferences about the population mean even though the parent distribution is discrete. In fact, serious departures from a normal population can be overcome by large sample sizes. Both tests of hypotheses and simultaneous confidence statements will then possess (approximately) their nominal levels.
The advantages associated with large samples may be partially offset by a loss in sample information caused by using only the summary statistics $\bar{\mathbf{x}}$ and $S$. On the other hand, since $(\bar{\mathbf{x}}, S)$ is a sufficient summary for normal populations, the closer the underlying population is to multivariate normal, the more efficiently the sample information will be utilized in making inferences. All large-sample inferences about $\boldsymbol{\mu}$ are based on a $\chi^2$-distribution.
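Result 3.2: Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a population with mean $\boldsymbol{\mu}$ and positive definite covariance matrix $\Sigma$. When $n - p$ is large, the hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ is rejected in favor of $H_a: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$, at a level of significance approximately $\alpha$, if the observed
$n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)' S^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > \chi^2_{p, \alpha},$
where $\chi^2_{p, \alpha}$ is the upper $\alpha$ critical value of the chi-square distribution with $p$ degrees of freedom.
Result 3.3: Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a population with mean $\boldsymbol{\mu}$ and positive definite covariance $\Sigma$. If $n - p$ is large,
$\mathbf{c}'\bar{\mathbf{X}} \pm \sqrt{\chi^2_{p, \alpha}} \sqrt{\frac{\mathbf{c}'S\mathbf{c}}{n}}$
will contain $\mathbf{c}'\boldsymbol{\mu}$, for every $\mathbf{c}$, with probability approximately $1 - \alpha$.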
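The large-sample procedures replace the F-based constant with a chi-square quantile. The test rejects $H_0$ when $n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > \chi^2_{p,\alpha}$. The simultaneous intervals for the component means, the unit-vector choices of $\mathbf{c}$, are $\bar{x}_j \pm \sqrt{\chi^2_{p,\alpha}}\sqrt{s_{jj}/n}$. This is a minimal sketch assuming NumPy and SciPy (`scipy.stats.chi2.ppf`). The function name and the simulated data are hypothetical. The approximation is only trustworthy when $n - p$ is large.

```python
import numpy as np
from scipy import stats


def large_sample_inference(X, mu0, alpha=0.05):
    """Large-sample chi-square test of H0: mu = mu0 plus simultaneous intervals for each mu_j."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = xbar - mu0
    stat = n * d @ np.linalg.solve(S, d)          # n (xbar - mu0)' S^{-1} (xbar - mu0)
    crit = stats.chi2.ppf(1 - alpha, df=p)        # upper alpha point of chi-square_p
    half = np.sqrt(crit) * np.sqrt(np.diag(S) / n)
    ci = np.column_stack([xbar - half, xbar + half])
    return stat, crit, stat > crit, ci


# Simulated (hypothetical) data: n = 500 draws in p = 3 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
stat, crit, reject, ci = large_sample_inference(X, mu0=np.zeros(3))
print(stat, crit, reject)
print(ci)
```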
Example 3.8: (Constructing large sample simultaneous confidence intervals)
A music educator tested thousands of Finnish students on their native musical ability in order to set national norms in Finland. Summary statistics for part of the data set are given in Table 5.5. These statistics are based on a sample of n = 96 Finnish 12th graders.