
Chap. 5, page 1

Case Study 5.1

A randomized experiment to compare lifetimes in six different diets among female mice.

• Randomization: female mice were randomly assigned to the six treatments. Randomization

ensures no bias in the assignment of mice to treatments. It does not guarantee that the

groups will be identical, but it allows us to use probability to assess whether the differences

observed could have occurred by chance.

• Replication – important for estimating variability within groups

• Other?

Scope of inference

• The scope of inference is to what would have happened if all mice had been fed each diet.

• The scope of inference can be expanded further if these mice can be viewed as

representative of a larger population of female mice.

• Since it’s an experiment, we can infer cause-and-effect if the experiment was well-run.

Comparison of all six diets was of interest, but there were some specific comparisons that were

of interest as outlined in Display 5.3.

Although each of these comparisons could be carried out with separate two-sample t

procedures (if normality assumptions are satisfied), there are some advantages to a more

comprehensive procedure:

• If the variability within each treatment is about the same for all treatments, then it makes

sense to estimate a pooled standard deviation from all the treatments even if we’re only

comparing any two at a time.

• We may want to carry out more complicated comparisons, such as a comparison of a

control group to the average of the other five groups.

• A standard first question of interest when comparing several groups is whether there is

evidence that any of the means are different from each other. Comparing all the treatments

pairwise with two-sample t tests results in a lot of individual tests (15 for 6 treatments). An

overall test of equality of all the treatment means is much more efficient and will not suffer

from the problem of running multiple tests (where statistically significant results have to be

considered in the context of how many tests were run).

An Ideal Model which allows the problems above to be solved fairly easily

• Population distributions are normal

• Population standard deviations are equal

• Independent random samples from each population (a randomized experiment satisfies this

assumption)

This model is exactly the model for the pooled two-sample t-test when there are two groups:

different means, but common standard deviation

The assumption of equal standard deviations is very important and must be checked. If there are

large differences in variability, this may be of interest in and of itself and the reasons for this

should be addressed. Often, differing variability is caused by higher values of the variable in

some groups than another. For example, the variability in lifetimes of animals is likely to be

greater the longer they tend to live. Transformations (such as log) can sometimes solve this

problem.

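The two-sample pooled t procedure for comparing any pair of means, say $\mu_1$ and $\mu_2$, uses

$$\bar{Y}_1 - \bar{Y}_2 \quad\text{and}\quad \mathrm{SE}(\bar{Y}_1 - \bar{Y}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad\text{where}\quad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$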

The only change in adapting this to several groups is to use the pooled standard deviation from

all of the groups if the assumption of equal standard deviations seems reasonable.
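The pooled standard deviation and the standard error of a difference can be sketched in a few lines of Python. The function names and the two-group summary statistics below are hypothetical, chosen only to illustrate the formulas; the same `pooled_sd` function accepts any number of groups.

```python
import math

def pooled_sd(ns, sds):
    """Pooled standard deviation and its degrees of freedom.

    ns  -- group sample sizes
    sds -- group sample standard deviations
    """
    numerator = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df = sum(n - 1 for n in ns)
    return math.sqrt(numerator / df), df

def se_difference(sp, n1, n2):
    """Standard error of the difference between two group means."""
    return sp * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical summary statistics for two groups (not from the mice data)
sp_two, df_two = pooled_sd([12, 15], [4.0, 5.0])
se_two = se_difference(sp_two, 12, 15)
print(round(sp_two, 3), df_two, round(se_two, 3))
```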

Descriptives: Months survived

Diet          N     Mean    Std. Deviation   Minimum   Maximum
NP           49    27.40        6.134           6.4      35.5
N/N85        57    32.69        5.125          17.9      42.3
N/R50        71    42.30        7.768          18.6      51.9
R/R50        56    42.89        6.683          24.2      50.7
N/R lopro    56    39.69        6.992          23.4      49.7
N/R40        60    45.12        6.703          19.6      54.6

The equal variance assumption seems reasonable for this experiment so we will use the pooled standard

deviation from all 6 treatments.

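$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_I - 1)s_I^2}{(n_1 - 1) + (n_2 - 1) + \dots + (n_I - 1)}} = \sqrt{\frac{48(6.134^2) + 56(5.125^2) + \dots + 59(6.703^2)}{48 + 56 + \dots + 59}} = \sqrt{44.599} = 6.678$$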

The degrees of freedom for the t distribution when you use this pooled standard deviation is the

denominator in the above expression which is n − I , where n is the total sample size (349 in our

example) and I is the number of groups or treatments (6 in our example). So we use a t with 343

degrees of freedom for the mice experiment.
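As a check on this arithmetic, the pooled standard deviation can be recomputed from the group sizes and standard deviations in the descriptives. This is a minimal sketch; the variable names are illustrative, and small differences in the third decimal place come from rounding in the reported standard deviations.

```python
import math

# (sample size, sample standard deviation) for each diet, from the descriptives
diets = {
    "NP": (49, 6.134),
    "N/N85": (57, 5.125),
    "N/R50": (71, 7.768),
    "R/R50": (56, 6.683),
    "N/R lopro": (56, 6.992),
    "N/R40": (60, 6.703),
}

ss_within = sum((n - 1) * sd ** 2 for n, sd in diets.values())
df_within = sum(n - 1 for n, _ in diets.values())   # n - I
pooled_variance = ss_within / df_within
pooled_sd_all = math.sqrt(pooled_variance)

print(df_within, round(pooled_variance, 3), round(pooled_sd_all, 3))
```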

• One desired comparison is between groups 1 and 2: the unrestricted non-purified diet

(NP) to a standard 85 calorie diet (N/N85). The result is summarized in part e) on p. 116.

First, note that

$$\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 6.678\sqrt{\frac{1}{49} + \frac{1}{57}} = 1.301.$$

The 95% confidence interval for $\mu_1 - \mu_2$ is then

$$\bar{Y}_1 - \bar{Y}_2 \pm t_{343}(.975)\,\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2) = (27.40 - 32.69) \pm 1.967(1.301) = -5.29 \pm 2.56 \approx -7.8 \text{ months to } -2.7 \text{ months}$$

Conclusion: It is estimated that the 85 calorie standard diet increases mean life expectancy by 5.3 months over an unrestricted diet, with a 95% confidence interval of 2.7 to 7.8 months.

• A test of the null hypothesis that $\mu_1 = \mu_2$ against a one-sided alternative that $\mu_1 < \mu_2$ (we would have to decide before collecting the data that we were only interested in detecting an increase in mean life expectancy with the 85 calorie diet):

$$\text{Test statistic} = \frac{\bar{Y}_1 - \bar{Y}_2}{\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2)} = \frac{-5.29}{1.301} = -4.07$$

Conclusion: The data provide very strong evidence that the 85-calorie diet increases life

expectancy over the unrestricted diet.

Note: if the equal standard deviations assumption did not appear reasonable, then we could

have done the confidence interval and hypothesis test the usual way using the pooled

standard deviation from the two groups or the unpooled Welch’s t procedures. The

advantage of pooling all 6 groups is a better estimate with increased degrees of freedom.
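The interval and test statistic for this comparison can be reproduced with the short sketch below. It takes the NP and N/N85 sample means (27.40 and 32.69) and sample sizes from the descriptives, the pooled standard deviation 6.678, and the critical value 1.967 quoted above; the variable names are illustrative.

```python
import math

s_p = 6.678              # pooled SD from all six groups (343 df)
t_crit = 1.967           # t_343(.975), as quoted in the notes
n1, mean1 = 49, 27.40    # NP
n2, mean2 = 57, 32.69    # N/N85

diff = mean1 - mean2
se = s_p * math.sqrt(1 / n1 + 1 / n2)
margin = t_crit * se
ci = (diff - margin, diff + margin)   # 95% CI for mu1 - mu2
t_stat = diff / se                    # compare with a t distribution on 343 df

print(round(se, 3), round(ci[0], 2), round(ci[1], 2), round(t_stat, 2))
```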

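The one-way analysis of variance F-test is designed to answer the question: is there evidence of a difference between any of the means?

The null hypothesis is that all the population means are equal, $\mu_1 = \mu_2 = \dots = \mu_I$. The alternative

hypothesis is that at least one mean is different from the others. The alternative hypothesis

would include all these possibilities: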

• All the means are different from one another

• Five means are the same and one is different

• Three of the means are the same, the other three are the same but different from the first

group

The idea of a one-sided alternative hypothesis is meaningless with three or more groups.

Testing the hypothesis of equal means relies on a general approach which we will use frequently

in the rest of the course:

Full model: a general model which adequately describes the data.

Reduced model: a special case of the full model obtained by imposing the restriction of the

null hypothesis.

For testing the equality of several population means, these models are:

Full model: the population distributions are normal with the same standard deviations, but

different (possibly) means

Reduced model: the population distributions are normal with the same standard deviations,

and the same means

The general idea is that we “fit” both these models to the data (like regression). Each model

gives a predicted value for every case. The full model uses each observation’s group mean as the

predicted value. The reduced model uses the mean of all the observations together. We then

measure how well the data fit the models by computing the sum of squared residuals. The full

model can fit no worse than the reduced model because the reduced model is a special case of the

full model.

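Predicted values under each model:

Group      1             2             3             4             5             6
Full       $\bar{Y}_1$   $\bar{Y}_2$   $\bar{Y}_3$   $\bar{Y}_4$   $\bar{Y}_5$   $\bar{Y}_6$
Reduced    $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$     $\bar{Y}$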

Example:

To illustrate these calculations, we’ll use a small hypothetical example, with 3 groups and 10

observations in all.

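Group 1: 10.7 13.2 15.7         $n_1 = 3$    $\bar{Y}_1 = 13.2$    $s_1 = 2.500$

Group 2: 12.1 14.2 16.0 16.5    $n_2 = 4$    $\bar{Y}_2 = 14.7$    $s_2 = 1.995$

Group 3: 20.9 24.4 27.3         $n_3 = 3$    $\bar{Y}_3 = 24.2$    $s_3 = 3.205$

Total:                          $n = 10$     $\bar{Y} = 17.1$      $s_p = 2.535$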

                        Predicted   Residual    Squared      Predicted   Residual   Squared
                        (reduced    (reduced    residual     (full       (full      residual
Group   Obs   Response  model)      model)      (reduced)    model)      model)     (full)
1       1     10.7      17.1        -6.4         40.96       13.2        -2.5        6.25
1       2     13.2      17.1        -3.9         15.21       13.2         0.0        0.00
1       3     15.7      17.1        -1.4          1.96       13.2         2.5        6.25
2       1     12.1      17.1        -5.0         25.00       14.7        -2.6        6.76
2       2     14.2      17.1        -2.9          8.41       14.7        -0.5        0.25
2       3     16.0      17.1        -1.1          1.21       14.7         1.3        1.69
2       4     16.5      17.1        -0.6          0.36       14.7         1.8        3.24
3       1     20.9      17.1         3.8         14.44       24.2        -3.3       10.89
3       2     24.4      17.1         7.3         53.29       24.2         0.2        0.04
3       3     27.3      17.1        10.2        104.04       24.2         3.1        9.61
Total                                           264.88                              44.98

Extra sum of squares = Residual sum of squares (reduced) – Residual sum of squares (full)

The residual sum of squares for a model represents the variability in the original data which is

not explained by the model. The extra sum of squares therefore represents the amount of the

unexplained variability in the reduced model that is explained by the full model.
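The two residual sums of squares for the hypothetical three-group example can be recomputed directly from the ten observations. The sketch below fits the reduced model (one grand mean) and the full model (one mean per group) and takes the difference; the variable names are illustrative.

```python
# The ten observations from the hypothetical example, by group
groups = {
    1: [10.7, 13.2, 15.7],
    2: [12.1, 14.2, 16.0, 16.5],
    3: [20.9, 24.4, 27.3],
}

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# Reduced model: every observation is predicted by the grand mean
rss_reduced = sum((y - grand_mean) ** 2 for y in all_obs)

# Full model: every observation is predicted by its own group mean
rss_full = 0.0
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    rss_full += sum((y - group_mean) ** 2 for y in ys)

extra_ss = rss_reduced - rss_full
print(round(rss_reduced, 2), round(rss_full, 2), round(extra_ss, 2))
```

The three printed values should agree with the column totals in the table (264.88 and 44.98) and their difference.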

The question now is whether the improved fit represents something real or could just be

attributed to sampling variability. We use the F-statistic to test the null hypothesis that the

populations follow the reduced model against the alternative that they follow the full model and

not the reduced model.

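$$F\text{-statistic} = \frac{(\text{Extra sum of squares}) / (\text{Extra degrees of freedom})}{\hat{\sigma}^2_{\text{full}}}$$

Extra degrees of freedom = # params for full model – # params for reduced model

= 4 – 2 = 2

$\hat{\sigma}^2_{\text{full}}$ = estimate of $\sigma^2$ based on full model = $s_p^2$ (square of pooled standard deviation)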

The numerator of the F-statistic is the average reduction in residual sum of squares for each

parameter added and the denominator is the reduction we would expect per extra parameter just

by chance.

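For the example, the F-statistic is $\dfrac{219.90 / 2}{2.535^2} = \dfrac{109.95}{6.426} = 17.11$. The P-value is found from an F distribution with 2 numerator

degrees of freedom and 7 denominator degrees of freedom.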

ANOVA: Response

                  Sum of Squares   df   Mean Square   F        Sig.
Between Groups       219.900        2     109.950     17.111   .002
Within Groups         44.980        7       6.426
Total                264.880        9
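The F and Sig. entries for the small example can be checked from the sums of squares alone. The sketch below uses only the standard library; the closed-form tail area it uses is valid only when the numerator has 2 degrees of freedom, as here, so it is a special case and not a general F-distribution routine.

```python
extra_ss, extra_df = 219.90, 2    # between-groups (extra) sum of squares and df
rss_full, df_full = 44.98, 7      # within-groups sum of squares and df

sigma2_full = rss_full / df_full               # MSW = s_p squared
f_stat = (extra_ss / extra_df) / sigma2_full

# Upper-tail area of an F(2, d) distribution: (1 + 2x/d) ** (-d/2).
# This shortcut is only valid for 2 numerator degrees of freedom.
p_value = (1 + extra_df * f_stat / df_full) ** (-df_full / 2)

print(round(sigma2_full, 3), round(f_stat, 3), round(p_value, 3))
```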

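$$\text{Total sum of squares} = \mathrm{SST} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$$

$$\text{Sum of squares between groups} = \mathrm{SSB} = \sum_{i=1}^{I} n_i (\bar{Y}_i - \bar{Y})^2$$

$$\text{Sum of squares within groups} = \mathrm{SSW} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$

Note that SST = SSB + SSW and Extra sum of squares = SST – SSW, hence SSB = ESS.

$$\text{Mean square between groups} = \mathrm{MSB} = \frac{\mathrm{SSB}}{I - 1}$$

$$\text{Mean square within groups} = \mathrm{MSW} = \frac{\mathrm{SSW}}{n - I} = s_p^2$$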

Source            Sum of Squares   df     Mean Square   F
Between groups    SSB              I-1    MSB           MSB/MSW
Within groups     SSW              n-I    MSW
Total             SST              n-1
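The general table can be turned into a small function that works for any number of groups and unequal sample sizes. This is a sketch under the definitions above; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the data are the ten observations from the hypothetical example.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of groups."""
    all_obs = [y for ys in groups for y in ys]
    n, num_groups = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(ys) / len(ys) for ys in groups]

    ssb = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    sst = sum((y - grand_mean) ** 2 for y in all_obs)

    msb = ssb / (num_groups - 1)
    msw = ssw / (n - num_groups)
    return {
        "SSB": ssb, "SSW": ssw, "SST": sst,
        "df_between": num_groups - 1, "df_within": n - num_groups,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }

table = one_way_anova([
    [10.7, 13.2, 15.7],
    [12.1, 14.2, 16.0, 16.5],
    [20.9, 24.4, 27.3],
])
for name, value in table.items():
    print(name, round(value, 3))
```

Computing SST separately, instead of as SSB + SSW, gives a simple check of the decomposition SST = SSB + SSW.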

It’s easiest to see why the F-test works if the sample sizes are equal: $n_1 = n_2 = \dots = n_I$. Call the common sample size

n*. Remember that we always assume that the population distributions are normal, the standard

deviations are all equal, and the samples are independent.

MSW = $s_p^2$ is an estimate of $\sigma^2$ whether or not the null hypothesis is correct.

If the population means are equal (i.e., if the null hypothesis is true) then

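$\bar{Y}_1$ is $N(\mu, \sigma / \sqrt{n^*})$

$\bar{Y}_2$ is $N(\mu, \sigma / \sqrt{n^*})$

…

$\bar{Y}_I$ is $N(\mu, \sigma / \sqrt{n^*})$

Since the samples are independent, $\bar{Y}_1, \bar{Y}_2, \dots, \bar{Y}_I$ are like a random sample from a normal population with mean $\mu$ and standard deviation $\sigma / \sqrt{n^*}$. Therefore, the sample variance of $\bar{Y}_1, \bar{Y}_2, \dots, \bar{Y}_I$ is an estimate of $\sigma^2 / n^*$:

$$\frac{1}{I - 1}\sum_{i=1}^{I} (\bar{Y}_i - \bar{Y})^2 \quad\text{is an estimate of}\quad \frac{\sigma^2}{n^*}.$$

Hence,

$$\frac{1}{I - 1}\sum_{i=1}^{I} n^* (\bar{Y}_i - \bar{Y})^2 = \mathrm{MSB} \quad\text{is an estimate of}\quad \sigma^2.$$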

To summarize:

• MSB is an estimate of σ 2 only if the reduced model (the equal means model) is correct.

If the reduced model is not correct, then MSB will tend to overestimate σ 2 .

• MSW is an estimate of $\sigma^2$ whether or not the reduced model is correct.

Therefore,

• if the null hypothesis is true (i.e., the equal means model is correct), then MSB/MSW

should be 1 except for sampling error

• if the null hypothesis is false, MSB/MSW will tend to be bigger than 1

• if the null hypothesis is true, the sampling distribution of MSB/MSW is an F distribution

with I-1 d.f. in the numerator and n-I d.f. in the denominator.

• large values of MSB/MSW are evidence in favor of the alternative hypothesis; therefore,

the P-value is the area to the right of MSB/MSW in the F distribution.
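The behaviour of MSB and MSW summarized above can be illustrated with a small simulation. The sketch below is only an illustration: the group means, σ = 2, common sample size 10, number of repetitions, and seed are all arbitrary choices, not values from the mice experiment.

```python
import random

def mean_squares(groups):
    """Return (MSB, MSW) for a list of groups of observations."""
    all_obs = [y for ys in groups for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(ys) / len(ys) for ys in groups]
    ssb = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    return ssb / (len(groups) - 1), ssw / (len(all_obs) - len(groups))

def simulate(group_means, sigma, n_star, reps, seed):
    """Average MSB and MSW over repeated normal samples with equal SDs."""
    rng = random.Random(seed)
    total_msb = total_msw = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n_star)]
                  for mu in group_means]
        msb, msw = mean_squares(groups)
        total_msb += msb
        total_msw += msw
    return total_msb / reps, total_msw / reps

# Equal means (null hypothesis true): both averages should be near sigma^2 = 4
null_msb, null_msw = simulate([10.0] * 6, sigma=2.0, n_star=10, reps=2000, seed=1)

# Unequal means: MSW still estimates sigma^2, but MSB is inflated
alt_msb, alt_msw = simulate([10.0, 10.0, 10.0, 12.0, 12.0, 12.0],
                            sigma=2.0, n_star=10, reps=2000, seed=1)

print(round(null_msb, 2), round(null_msw, 2))
print(round(alt_msb, 2), round(alt_msw, 2))
```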

ANOVA: Months survived

                  Sum of Squares   df    Mean Square   F        Sig.
Between Groups      12733.942       5     2546.788     57.104   .00000
Within Groups       15297.415     343       44.599
Total               28031.357     348

Conclusion: There is overwhelming evidence that there is a difference in the mean lifetimes

under the different diets. This does not mean that all the diets are different, only that at least one

of them is.

Robustness to assumptions: see Section 5.5.1, p. 130. The main distributional assumptions we

need to worry about are:

• Population standard deviations are roughly equal

• There are no extreme outliers; the F-test is not resistant to outliers, particularly with small

samples

We can judge these assumptions from side-by-side dotplots or boxplots of the raw data. Judging

equality of standard deviations is a little easier if we subtract off the mean of each group. That is, we

examine the residuals for the full (separate means) model: $Y_{ij} - \bar{Y}_i$. As in regression, we plot the

residuals versus the predicted values. The predicted value for an observation is the group mean.

Judging from this plot, the original boxplots, and the sample standard deviations, there doesn’t

seem to be any reason to doubt the assumptions of the F test.

Examining models between the separate means and the equal means models

Suppose we wanted to examine the model which assumes the two control groups (NP and

N/N85) have the same mean lifetime and the remaining four calorie restricted diets have the

same mean lifetime. The question is: how much of the difference among the means is due

simply to the differences between these two groups of diets?

This is a two-mean model that is between the separate means model (with 6 parameters to

describe the means) and the equal means model (with 1 parameter to describe the mean).

Separate means

Two means

Equal means

These three models are said to be nested because each model is a special case of the ones above

it.

We can fit the two-means model in SPSS by creating a new

categorical variable which identifies the first two diets as group 1 and the remaining four diets as

group 2. We then run the ANOVA with this new variable as the explanatory variable.

ANOVA: Months survived

                  Sum of Squares   df    Mean Square   F         Sig.
Between Groups      11131.393       1    11131.393     228.556   .000
Within Groups       16899.964     347       48.703
Total               28031.357     348

This ANOVA table is comparing the two means model to the equal means model. We see that it

is significant. Now, to compare the two-means model to the separate means model we need to

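use the sums of squares to compute a new F statistic. Recall

$$F = \frac{(\text{Extra sum of squares}) / (\text{Extra degrees of freedom})}{\hat{\sigma}^2_{\text{full}}}$$

where, for this comparison, the full model is the separate means model and the reduced model is the two-means model. The extra sum of squares is therefore the difference between the within-groups sums of squares of the two ANOVA tables, and $\hat{\sigma}^2_{\text{full}}$ is the within-groups mean square from the separate means ANOVA.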

Calculate the F statistic to test the separate means model against the two-means model:

F , =
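One way to carry out this calculation is sketched below, using the within-groups sums of squares and degrees of freedom from the two ANOVA tables for the mice data. The variable names are illustrative; the resulting statistic would be compared with an F distribution on the extra and full-model degrees of freedom.

```python
# Within-groups (residual) sums of squares and df from the two ANOVA tables
rss_separate, df_separate = 15297.415, 343     # full model: separate means
rss_two_means, df_two_means = 16899.964, 347   # reduced model: two means

extra_ss = rss_two_means - rss_separate        # extra sum of squares
extra_df = df_two_means - df_separate          # extra degrees of freedom
sigma2_full = rss_separate / df_separate       # MSW from the separate means fit

f_stat = (extra_ss / extra_df) / sigma2_full
print(round(extra_ss, 3), extra_df, round(f_stat, 2))
```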
