
Analysis of Covariance (ANCOVA)
Biostatistics Consulting Center
蔡培癸
Outline
1. ANOVA recap
2. What is ANCOVA?
3. Different research questions
4. Assumptions of ANCOVA
5. Basic equations
6. SPSS example
ANOVA recap (1)
• You will remember that
– ANOVA is used to test for significant differences
between means (2 or more groups) by analyzing
variance.
– The total variation among a set of scores is divided into
between groups and within groups variation.
– Between groups variation = systematic variance due to
differences in DV means between groups
– Within groups variation = unsystematic variance
(error), due to all other sources of variance
ANOVA recap (2)
• ANOVA Model
$$F = \frac{\text{Between-groups variation (due to IV)}}{\text{Within-groups variation (due to error)}}$$
– The F ratio compares the two sources of variation (after taking
into account the degrees of freedom).
– Under the null hypothesis the expected value of F is approximately 1,
because both sources of variance then estimate the same underlying
population variability among scores
– The more that F exceeds one, the greater the indication
that differences in means between groups did not occur by
chance due to sampling error
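As a minimal sketch of the ratio described above, the following computes a one-way ANOVA F from raw scores. The function name `one_way_f` and the scores are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from the slides.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = mean(all_scores)
    # Between-groups SS: group size times squared distance of the group mean
    # from the grand mean (systematic variance due to the IV).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared distance of each score from its own group mean
    # (unsystematic variance, i.e. error).
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three groups (illustrative only).
scores = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
f_ratio, df_b, df_w = one_way_f(scores)
print(f_ratio, df_b, df_w)
```

Dividing each sum of squares by its degrees of freedom before taking the ratio is what "after taking into account the degrees of freedom" refers to.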
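ANCOVA model
• $Y = \beta_0 + \beta_1 X + \beta_2 A + E$
– Y: continuous dependent variable (DV)
– X: continuous independent variable (covariate; CV)
– A: categorical independent variable (factor; IV)
– E: residual
– $\beta_0$, $\beta_1$, $\beta_2$: population regression coefficients
• Dropping E gives the predicted value $\hat{Y} = \beta_0 + \beta_1 X + \beta_2 A$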
Dummy variables
• In the ANCOVA model the categorical independent variable A has a levels. How should A be coded?
• 1. As a single variable
• 2. As dummy variables

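1. As a single variable
• For example, code A as 1, 2, 3, …, a
• (1) When A = 1 and X = x: $\hat{Y}_1 = \beta_0 + \beta_1 x + \beta_2 \times 1$
• (2) When A = 2 and X = x: $\hat{Y}_2 = \beta_0 + \beta_1 x + \beta_2 \times 2$
• (3) When A = 3 and X = x: $\hat{Y}_3 = \beta_0 + \beta_1 x + \beta_2 \times 3$
• (2)-(1): $\hat{Y}_2 - \hat{Y}_1 = \beta_2$
• (3)-(2): $\hat{Y}_3 - \hat{Y}_2 = \beta_2$
• (3)-(1): $\hat{Y}_3 - \hat{Y}_1 = 2 \times \beta_2$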
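1. As a single variable… (2)
• Let A = 1, 2, 3, …, a
– A = 1 versus A = 2, at the same X: the difference in $\hat{Y}$ is $\beta_2$
– A = 2 versus A = 3, at the same X: the difference in $\hat{Y}$ is the same ($\beta_2$)
– A = 3 versus A = 1, at the same X: the difference in $\hat{Y}$ is different ($2 \times \beta_2$)
• A is a nominal variable: its levels have no order and the differences between levels are not equally spaced, so this coding is not appropriate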
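The constraint that single-variable coding imposes can be checked numerically. This is a small sketch with hypothetical coefficient values (`b0`, `b1`, `b2` are illustrative, not estimates from any data): whatever values are chosen, adjacent levels always differ by the same amount.

```python
def predict_single_code(x, a, b0, b1, b2):
    """Predicted Y when the factor A enters the model as one numeric variable."""
    return b0 + b1 * x + b2 * a

# Hypothetical coefficients and covariate value (illustrative only).
b0, b1, b2 = 10.0, 0.5, 3.0
x = 20.0

y1, y2, y3 = (predict_single_code(x, a, b0, b1, b2) for a in (1, 2, 3))

diff_21 = y2 - y1  # always b2
diff_32 = y3 - y2  # always b2 again: levels are forced to be equally spaced
diff_31 = y3 - y1  # always 2 * b2
print(diff_21, diff_32, diff_31)
```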
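2. As dummy variables
• When A has a levels, define a − 1 dummy variables
• Each dummy variable takes the value 0 or 1: it is 1 when A is at the level it represents and 0 otherwise.
– Coding dummy variables as 0 and 1 gives the results that are easiest to interpret.
• The level for which all dummy variables are 0 is the reference group
– Note: a level with too few subjects is not suitable as the reference group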
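2. As dummy variables… (2)
• Example: A = 1, 2, 3, …, a, with A = 1 as the reference group
A    A2   A3   …   Aa
1    0    0    …   0
2    1    0    …   0
3    0    1    …   0
⋮
a    0    0    …   1
The dummy-variable names record the original level of A:
A = 2 → A2 = 1, all others = 0; A = 3 → A3 = 1, all others = 0; …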
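The coding table above can be generated mechanically. The helper below is a sketch, not part of the original slides: the name `dummy_code` and the assumption that levels are the integers 1..a are illustrative choices.

```python
def dummy_code(level, n_levels, reference=1):
    """Return the a - 1 indicator values (A2, A3, ..., Aa) for one observation.

    Assumes the levels of A are the integers 1..n_levels. The reference
    level is represented by all zeros.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("level must be between 1 and n_levels")
    non_reference = [k for k in range(1, n_levels + 1) if k != reference]
    return [1 if level == k else 0 for k in non_reference]

# Coding table for a hypothetical factor with a = 4 levels, reference A = 1.
rows = {level: dummy_code(level, n_levels=4) for level in range(1, 5)}
for level, indicators in rows.items():
    print(level, indicators)
```

Passing a different `reference` value changes which level is coded as all zeros, which matters when the default reference level has few subjects.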
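2. As dummy variables… (3)
• The ANCOVA model becomes:
$$\hat{Y} = \beta_0 + \beta_1 X + \beta_2 A_2 + \beta_3 A_3 + \dots + \beta_a A_a$$
• (1) When X = x and A = 1 ($A_2 = A_3 = \dots = A_a = 0$): $\hat{Y}_1 = \beta_0 + \beta_1 x$
• (2) When X = x and A = 2 ($A_2 = 1$, $A_3 = \dots = A_a = 0$): $\hat{Y}_2 = \beta_0 + \beta_1 x + \beta_2$
• (3) When X = x and A = 3 ($A_3 = 1$, $A_2 = A_4 = \dots = A_a = 0$): $\hat{Y}_3 = \beta_0 + \beta_1 x + \beta_3$
• (2)-(1): $\hat{Y}_2 - \hat{Y}_1 = \beta_2$
• (3)-(1): $\hat{Y}_3 - \hat{Y}_1 = \beta_3$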
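2. As dummy variables… (4)
• A = 1 versus A = 2, at the same X: the difference in $\hat{Y}$ is $\beta_2$
• A = 1 versus A = 3, at the same X: the difference in $\hat{Y}$ is $\beta_3$
• Dummy coding respects the nominal nature of A: the differences in $\hat{Y}$ between levels need not be equal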
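The meaning of ANCOVA
• Understanding ANCOVA algebraically
– The coding of the categorical factor has been set
– The ANCOVA model has been determined
• When A = 1, X and Y have a straight-line relationship
– slope = $\beta_1$, intercept = $\beta_0$
• When A = 2, X and Y have a straight-line relationship
– slope = $\beta_1$, intercept = $\beta_0 + \beta_2$
• When A = 3, X and Y have a straight-line relationship
– slope = $\beta_1$, intercept = $\beta_0 + \beta_3$

• In ANCOVA, the straight-line relationship between X and Y
– does not change with the level of A (parallel lines, no interaction)
– at the same X, the other levels of A differ from A = 1 by $\beta_2$, $\beta_3$, …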
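The parallel-lines reading of the model can be illustrated with a short sketch. The coefficient values and the helper `predict_ancova` are hypothetical; the point is only that every level shares one slope while the intercept shifts by that level's coefficient.

```python
def predict_ancova(x, level, b0, b1, group_effects):
    """Predicted Y under dummy coding with level 1 as the reference group.

    group_effects maps each non-reference level to its coefficient
    (beta_2 for level 2, beta_3 for level 3, ...).
    """
    return b0 + b1 * x + group_effects.get(level, 0.0)

# Hypothetical coefficients (illustrative only).
b0, b1 = 10.0, 0.5
effects = {2: 3.0, 3: -2.0}

lines = {}
for level in (1, 2, 3):
    intercept = predict_ancova(0.0, level, b0, b1, effects)
    slope = predict_ancova(1.0, level, b0, b1, effects) - intercept
    lines[level] = (intercept, slope)
    print(level, intercept, slope)

# The gap between level 2 and level 1 is the same at any covariate value.
gap_at_20 = predict_ancova(20.0, 2, b0, b1, effects) - predict_ancova(20.0, 1, b0, b1, effects)
gap_at_40 = predict_ancova(40.0, 2, b0, b1, effects) - predict_ancova(40.0, 1, b0, b1, effects)
```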


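ANCOVA graph, assuming factor A has only 3 levels
• The main purpose of ANCOVA is to test whether the mean of the continuous variable Y differs across the levels of the categorical factor A
– 1. Adjust (control) for the effect of X on Y (through their linear relationship)
– 2. Test $\beta_2$, $\beta_3$, … for significance (whether they differ from 0)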
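Basic Example
• DV: achievement scores after instruction with different teaching methods
• IV: three teaching methods (lecture, programmed instruction, heuristic method)
• Covariate (CV): student intelligence
– because intelligence is a covariate that affects achievement scores
• The covariate helps partial out part of the variance, so that the final differences between groups are not affected by the covariate.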
ANCOVA

• ANCOVA is an extension of ANOVA in which main
effects and interactions of IVs on DV are assessed
after removal of the effects of one or more
covariates.

• A covariate is a source of extraneous variation; when
removed from the DV, it should reduce the magnitude
of the error term
2 major purposes of ANCOVA
1. To increase the sensitivity of the test by
reducing error
2. To adjust the means on the DV itself to
what they would be if all cases scored
identically on the CV(s)
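The second purpose can be made concrete with a small fit. The sketch below is an illustration under stated assumptions, not the deck's own procedure: the data are hypothetical and noise-free (generated from y = 10 + 0.5x plus group effects of 0, 3, and -2), the helpers `solve` and `ols` are ad hoc, and adjusted means are taken as model predictions at the grand mean of the covariate.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: level -> (covariate values, DV values). Group 1 is the reference.
data = {
    1: ([10.0, 20.0, 30.0], [15.0, 20.0, 25.0]),
    2: ([20.0, 30.0, 40.0], [23.0, 28.0, 33.0]),
    3: ([30.0, 40.0, 50.0], [23.0, 28.0, 33.0]),
}

design, y, all_x = [], [], []
for level, (xs, ys) in data.items():
    for x, score in zip(xs, ys):
        # Columns: intercept, covariate X, dummy A2, dummy A3.
        design.append([1.0, x, float(level == 2), float(level == 3)])
        y.append(score)
        all_x.append(x)

b0, b1, b2, b3 = ols(design, y)
grand_x = mean(all_x)

raw_means = {level: mean(ys) for level, (xs, ys) in data.items()}
effects = {1: 0.0, 2: b2, 3: b3}
# Adjusted mean: predicted DV if every group sat at the grand covariate mean.
adjusted_means = {level: b0 + b1 * grand_x + effects[level] for level in data}
print(raw_means)
print(adjusted_means)
```

In this constructed example groups 2 and 3 have identical raw means, yet their adjusted means differ because group 3 scored higher on the covariate.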
To increase sensitivity of the test in
experimental design
• Most common use of ANCOVA
• ANCOVA increases the power of an F test by removing
unsystematic variance (noise) in the DV.

[Figure: two diagrams of the DV. ANOVA: IV and Error. ANCOVA: IV, Covariate, and Error.]
To adjust the means on the DVs in
non-experimental design

• In situations where subjects can’t be randomly
assigned:
– Differences between subjects on covariates are
removed so the remaining differences would be the
real effects of IV(s) on DV.
– Example: comparing infant formulas for newborns to see which one is better
• Possible covariates
– Newborn body weight
Example (continued)
• Main effects of IVs
• Interactions among IVs
– After partialling out covariates and accounting for main
effects, is there an interaction among IVs?
• Planned comparisons, contrasts, and trend analysis
– Planned comparison: After partialling out covariate.
– Trend analysis only relevant for ordered (rather than
categorical) levels of IVs
Kinds of research questions (2)
• Effects of covariates
– The covariate can be construed either as a nuisance factor or as a
legitimate (continuous) IV in its own right:
• Strength of association
– If a main effect is found, how much variance is accounted for by
the IV(s)? Usually expressed as eta-squared, a ratio of effect
variance vs. total variance. You can ask this question about
interactions and covariates as well.
• Parameter estimates
– Assuming significant main effects or interactions, what are the
estimated means for each group (adjusted under the assumption of
equality between the sample and population means for the
covariate)? The covariate also has an estimate, in the form of a B/beta coefficient.
Assumptions of ANCOVA (1)
• Same as ANOVA*
– Normal distribution
– Homogeneity of variance in all of the cells of the design
– Absence of Outliers
– Random sampling (with replacement)
– Random assignment (for experiments)
• Special concerns: linearity among CV(s) and DV,
homogeneity of regression
– Absence of multicollinearity
• Covariates should not be highly correlated with each other.
• A CV with SMC ($R^2$) > .50 (using other CVs as predictors) is
considered redundant and should be deleted.
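For the special case of exactly two covariates, the SMC of one covariate predicted from the other reduces to the squared Pearson correlation, so the redundancy rule can be sketched without a regression routine. The covariate values below are hypothetical, and `squared_correlation` is an illustrative helper name.

```python
from statistics import mean

def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical scores on two candidate covariates (illustrative only).
cv1 = [1, 2, 3, 4, 5]
cv2 = [2, 1, 4, 3, 5]

smc = squared_correlation(cv1, cv2)
redundant = smc > 0.50  # the cut-off quoted on the slide
print(smc, redundant)
```

With three or more covariates the SMC would instead come from regressing each covariate on all of the others.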
Assumptions of ANCOVA (2)

[Figure: two panels plotting DV (Y) against Covariate (X) for Groups 1, 2, and 3. Left panel: homogeneity of regression. Right panel: heterogeneity of regression.]
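A first, purely descriptive look at homogeneity of regression is to compare the covariate slope within each group. The sketch below uses hypothetical data in which two groups share a slope and the third does not. It is not a formal test; a formal check would usually test the group × covariate interaction.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope of ys on xs within one group."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical (covariate, DV) data per group (illustrative only).
groups = {
    "Group 1": ([1, 2, 3], [2, 4, 6]),
    "Group 2": ([1, 2, 3], [5, 7, 9]),
    "Group 3": ([1, 2, 3], [9, 8, 7]),
}

slopes = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}
for name, b in slopes.items():
    print(name, b)
# Groups 1 and 2 give parallel lines; Group 3's slope has the opposite sign,
# which is the heterogeneous pattern that ANCOVA assumes away.
```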
ANCOVA vs ANOVA
(data from Tabachnick, p.283, 287-289)

ANCOVA
Source of Variance   Adjusted SS   df   MS        F
Between groups       366.202       2    183.101   6.13*
Within groups        149.439       5    29.888
*p < .05

ANOVA
Source of Variance   SS            df   MS        F
Between groups       432.889       2    216.444   4.52
Within groups        287.333       6    47.889
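The two F ratios can be reproduced from the sums of squares and degrees of freedom in the tables above. This is a small arithmetic check; the helper name `f_from_table` is illustrative.

```python
def f_from_table(ss_between, df_between, ss_within, df_within):
    """F ratio as (between-groups mean square) / (within-groups mean square)."""
    return (ss_between / df_between) / (ss_within / df_within)

# Values copied from the ANCOVA and ANOVA tables above.
f_ancova = f_from_table(366.202, 2, 149.439, 5)
f_anova = f_from_table(432.889, 2, 287.333, 6)

print(round(f_ancova, 2), round(f_anova, 2))
```

The covariate costs one error degree of freedom (5 instead of 6) but shrinks the within-groups sum of squares from 287.333 to 149.439, which is why the ANCOVA F is larger even though the adjusted between-groups sum of squares is smaller.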
ANCOVA – SPSS example
Same dataset from Tabachnick, p. 283

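[Screenshot: data file with columns for the IV, the covariate, and the DV]
Analyze → General Linear Model → Univariate
[Screenshot: Univariate dialog, with these callouts]
– DV, IV(s), and Covariate: the fields where each variable is entered
– Click here for graphs of adjusted means
– Click here for adjusted means (N/A here)
– Put IVs here for estimated means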
Output (1) - ANCOVA
Between-Subjects Factors

Treatment-IV   N
1              3
2              3
3              3

IV = Groups
Treatment groups = 1 & 2
Control group = 3
Output (2) - ANCOVA
Tests of Between-Subjects Effects
Dependent Variable: Posttest-DV

Source                   Type III Sum of Squares   df   Mean Square   F       Sig.
Corrected Model          570.784a                  3    190.261       6.366   .037
Intercept                29.103                    1    29.103        .974    .369
PRETEST                  137.895                   1    137.895       4.614   .084
GROUP (between groups)   366.201                   2    183.101       6.126   .045
Error (within groups)    149.439                   5    29.888
Total                    82707.000                 9
Corrected Total          720.222                   8
a. R Squared = .793 (Adjusted R Squared = .668)

The Type III sums of squares are the adjusted SS.
For GROUP: F = 6.13*
*p < .05
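The summary statistics in this output can be recovered from the table's own sums of squares. The sketch below uses the standard definitions of R squared, adjusted R squared, and partial eta-squared. Partial eta-squared is an added illustration of strength of association that SPSS does not print in this table.

```python
# Values copied from the Tests of Between-Subjects Effects table.
ss_model = 570.784
ss_group = 366.201
ss_error = 149.439
ss_corrected_total = 720.222
n_cases = 9    # Corrected Total df + 1
df_model = 3   # one covariate plus two group dummies

r_squared = ss_model / ss_corrected_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - df_model - 1)

# Partial eta-squared for GROUP: effect SS relative to effect SS plus error SS.
partial_eta_squared_group = ss_group / (ss_group + ss_error)

print(round(r_squared, 3), round(adjusted_r_squared, 3), round(partial_eta_squared_group, 2))
```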
ANOVA
Use the same data but ignore the effect of covariate

ANOVA
Posttest-DV

Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   432.889          2    216.444       4.520   .063
Within Groups    287.333          6    47.889
Total            720.222          8

The p value is greater than .05: not significant.
Output (3) - ANCOVA

Estimated marginal means: Treatment-IV
Dependent Variable: Posttest-DV

                                       95% Confidence Interval
Treatment-IV   Mean       Std. Error   Lower Bound   Upper Bound
1              101.337a   3.160        93.214        109.461
2              98.486a    3.204        90.250        106.722
3              86.510a    3.181        78.334        94.686
a. Evaluated at covariates appeared in the model: Pretest-Covariate = 86.11.
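The confidence bounds in this table are consistent with mean ± t × standard error, where t is the two-sided 95% critical value of Student's t on the 5 error degrees of freedom. That reading is an assumption checked against the printed numbers, not something stated in the output. The critical value 2.5706 is the usual table value.

```python
T_CRIT_DF5 = 2.5706  # two-sided 95% critical value of Student's t with 5 df

# (adjusted mean, standard error) per treatment group, copied from the table.
rows = {1: (101.337, 3.160), 2: (98.486, 3.204), 3: (86.510, 3.181)}

intervals = {
    group: (m - T_CRIT_DF5 * se, m + T_CRIT_DF5 * se)
    for group, (m, se) in rows.items()
}
for group, (lower, upper) in intervals.items():
    print(group, round(lower, 3), round(upper, 3))
```

Small differences in the third decimal place are expected, because the printed means and standard errors are themselves rounded.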
General Linear Model

To put it all together..
You may wonder...
• You may notice that..
– ANOVA, ANCOVA, and regression are very similar.
– Regression involves 2 or more continuous variables (at least 1
IV and 1 DV)
– ANOVA has at least 1 categorical variable (IV) and
exactly 1 continuous variable (DV)
– ANCOVA deals with at least 1 categorical variable
(IV), at least 1 continuous variable (covariate: CV) and
a single continuous DV
– MANOVA and MANCOVA are similar except that they
allow for multiple, intercorrelated DVs
The tests are similar..
• Regression
– IV = level of computer skills
– DV = Self-reported attitude toward the system
• ANOVA
– IV = Grouping variable – High, Medium, Low
levels of computer skills
– DV = Self-reported attitude toward the system
Compare the outputs (1)
Compare the outputs (2)
• ANOVA
And the answer is..
• ANOVA (and its variants) and
regression are applications of the
general linear model (GLM).
Regression and ANOVA
• They both use the same mathematical notation:
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_p X_{pi} + \varepsilon_i$$

where $Y_i$ represents the score of subject $i$ on the dependent variable
$\alpha$ represents the mean of the population when the value of every X is zero
the Xs are values on the independent variables (i.e., they provide information on group membership)
the $\beta$s are effect parameters (regression coefficients) that describe the
relationship between a particular independent variable and the dependent variable
$\varepsilon_i$ represents error.

• Also, they share the same goal..
– To estimate the effect parameters, given information on the population
mean, group membership, and scores on the dependent variable.
Some GLM techniques we’ve covered
• Multiple regression
– All Xs continuous, Y continuous
• ANOVA
– all Xs discrete, Y continuous
• ANCOVA
– Some Xs continuous and some discrete, Y
continuous
Coding categorical variables for
regression analysis

• Group memberships will be coded and used as a
categorical IV in a regression analysis.
• Regression with categorical predictors and
ANOVA will yield identical overall results
– Though intermediate results and their interpretation
may differ due to the coding process.
• The coding schemes commonly used are:
– Dummy coding (1 & 0)
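The equivalence between dummy-coded regression and ANOVA can be seen in a small fit. In the sketch below the data and the helpers `solve` and `ols` are hypothetical and written with the standard library only. With group 1 as the reference, the intercept should equal the reference group's mean and each dummy coefficient should equal that group's mean minus the reference mean.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical scores for three groups (illustrative only); group 1 is the reference.
groups = {1: [4.0, 5.0, 6.0], 2: [7.0, 8.0, 9.0], 3: [10.0, 11.0, 12.0]}

design, y = [], []
for level, scores in groups.items():
    for score in scores:
        # Columns: intercept, dummy for group 2, dummy for group 3.
        design.append([1.0, float(level == 2), float(level == 3)])
        y.append(score)

b0, b2, b3 = ols(design, y)
group_means = {level: mean(scores) for level, scores in groups.items()}
print(b0, b2, b3, group_means)
```

The fitted values are therefore the group means, so the regression's model and residual sums of squares match the ANOVA's between-groups and within-groups sums of squares.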
Regression versus ANOVA
• Regression with categorical IVs
– Regression provides a natural measure of effect size in
standard output ($R^2$)
– More flexible in that both continuous and categorical variables
can be used in the analysis
– Facilitate the test of competing hypotheses by entering
independent variables in a sequential or hierarchical manner
– Unbalanced designs are not problematic as in ANOVA
– However, ANOVA is more computationally efficient (but we
have SPSS).
The End