
**STA303H5S - Winter 2014: Data Analysis II**

LECTURE 2: One-Way ANOVA and Linear Regression

Ramya Thinniyam

January 9th, 2014


**The Spock Conspiracy Trial**

Q: Is there evidence of gender bias in the jury selection for Spock’s trial?

A1 (last class): We used a two-sample t-test to answer the question of interest:

$H_0: \mu_{\text{spock}} = \mu_{\text{other}}$ vs. $H_a: \mu_{\text{spock}} \neq \mu_{\text{other}}$

| t-test Method | Test Statistic | p-value |
|---|---|---|
| Pooled t-test (assuming equal variances) | 5.67 | < 0.0001 |
| Satterthwaite approximation (unequal variances) | 7.16 | < 0.0001 |

We concluded that there is very strong evidence of a difference between the mean percentage of women on Spock’s judge’s venires and that on the other judges’ venires.
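To recall how the two statistics in the table are computed, here is a minimal R sketch on simulated data. The numbers are made up and are NOT the Spock venire data; only the group sizes (9 and 37) mirror the case study. `t.test(..., var.equal = TRUE)` gives the pooled test, and the default `t.test()` uses the Welch/Satterthwaite approximation.

```r
# Pooled vs. Welch (Satterthwaite) two-sample t statistics.
# Simulated data for illustration only; these are NOT the Spock venire data.
set.seed(303)
y_other <- rnorm(37, mean = 29, sd = 7)  # hypothetical "other judges" venires
y_spock <- rnorm(9, mean = 15, sd = 5)   # hypothetical Spock venires

pooled <- t.test(y_other, y_spock, var.equal = TRUE)  # assumes equal variances
welch  <- t.test(y_other, y_spock)                    # Satterthwaite df (R's default)

# The same statistics computed by hand
n1 <- length(y_other); n2 <- length(y_spock)
diff_means <- mean(y_other) - mean(y_spock)
s_p <- sqrt(((n1 - 1) * var(y_other) + (n2 - 1) * var(y_spock)) / (n1 + n2 - 2))
t_pooled <- diff_means / (s_p * sqrt(1 / n1 + 1 / n2))   # pooled SE
t_welch  <- diff_means / sqrt(var(y_other) / n1 + var(y_spock) / n2)  # unpooled SE

print(c(pooled = unname(pooled$statistic), t_pooled = t_pooled,
        welch = unname(welch$statistic), t_welch = t_welch))
```

The pooled test has $n_1 + n_2 - 2 = 44$ degrees of freedom, the same residual degrees of freedom that appear in the regression output later in this lecture.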


**Spock Conspiracy Trial**

A2: Use a Linear Model approach / ANOVA


Recall the multiple linear regression model:

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_p X_{p,i} + e_i, \quad i = 1, 2, \ldots, n$$

where

- $Y_i$: response for the $i$th case (quantitative variable)
- $X_{1,i}, X_{2,i}, \ldots, X_{p,i}$: the $p$ predictors for the $i$th case (quantitative or categorical)
- $e_i$: error term for the $i$th case, where $e_i \overset{iid}{\sim} N(0, \sigma^2)$
- $\beta_0, \beta_1, \ldots, \beta_p$: regression coefficients/parameters; $\beta_0$ is the intercept
- $n$: number of cases (sample size)

If we want to include a factor (categorical variable) with $\ell$ levels, we model it with $\ell - 1$ indicator (dummy) variables: choose one level as the default, which has no indicator variable, and give each of the other levels its own indicator. Q: Why do we use $\ell - 1$ indicator variables instead of $\ell$?


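A: If a case does not belong to level 1, 2, 3, ..., or $\ell - 1$, then by default it belongs in level $\ell$, so an $\ell$-th indicator would carry no new information. Put another way, with an intercept in the model the $\ell$ indicators would add up to 1 for every case, exactly reproducing the intercept's column of 1s, so the regression coefficients could not be uniquely estimated.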
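The collinearity argument can be checked numerically. The sketch below uses a hypothetical factor with $\ell = 3$ levels and made-up responses: with an intercept plus all $\ell$ indicators the design matrix has 4 columns but rank 3, while the intercept plus $\ell - 1$ indicators (level $\ell$ = "C" as the default) has full column rank.

```r
# With an intercept, using all l indicators makes the design matrix rank deficient.
# Hypothetical factor with l = 3 levels and made-up responses.
g <- factor(c("A", "A", "B", "B", "C", "C"))
y <- c(10, 12, 20, 22, 30, 33)

# One indicator column per level: I_all[i, k] = 1 if case i is in level k
I_all <- sapply(levels(g), function(lev) as.numeric(g == lev))

X_full <- cbind(intercept = 1, I_all)          # intercept + l indicators: 4 columns
X_ref  <- cbind(intercept = 1, I_all[, 1:2])   # intercept + (l - 1) indicators: 3 columns

qr(X_full)$rank   # 3: the l indicators add up to the intercept column
qr(X_ref)$rank    # 3: full column rank, so the estimates are unique

# lm() detects the redundancy and reports NA for the aliased indicator
coef(lm(y ~ I_all))
```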


**Using Indicator Variables**

Suppose a factor has $\ell$ levels. We can define $\ell - 1$ indicator variables as follows. For $k = 1, 2, \ldots, \ell - 1$:

$$I_{k,i} = \begin{cases} 1, & \text{if the } i\text{th case belongs in factor level } k \\ 0, & \text{otherwise} \end{cases}$$

Then, in the Spock example:

$$I_{\text{spock},i} = \begin{cases} 1, & \text{if the } i\text{th venire has Spock’s judge} \\ 0, & \text{otherwise} \end{cases}$$

Fit the model $Y_i = \beta_0 + \beta_1 I_{\text{spock},i} + e_i$ for $i = 1, 2, \ldots, 46$, where $Y_i$ = % women on the $i$th venire → a Simple Linear Regression Model.
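A minimal R sketch of the indicator definition, using a hypothetical factor with $\ell = 3$ levels (the Spock example is the $\ell = 2$ case). It builds $I_{k,i}$ directly from the definition, with level $\ell$ as the default, and compares the result with R's treatment coding from `model.matrix()`, where `contr.treatment(..., base = l)` makes the last level the baseline.

```r
# Building the l - 1 indicators from the definition, with level l as the default.
# Hypothetical factor with l = 3 levels; the Spock example is the l = 2 case.
g <- factor(c("A", "B", "C", "A", "C", "B"))
lev <- levels(g)                 # "A" "B" "C"; level l = "C" is the default
l <- length(lev)

# I_mat[i, k] = 1 if case i belongs in level k, 0 otherwise, for k = 1, ..., l - 1
I_mat <- sapply(lev[1:(l - 1)], function(k) as.numeric(g == k))
I_mat

# R's treatment coding with the last level as baseline gives the same columns
X <- model.matrix(~ g, contrasts.arg = list(g = contr.treatment(lev, base = l)))
X
```

Note that R's default baseline is the *first* level; the choice of default level does not change the fitted means, only how the coefficients are interpreted.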


**Least Squares Estimates of Regression Parameters**

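$$\hat{\beta}_0 \equiv b_0 = \bar{y} - b_1 \bar{x}$$

$$\hat{\beta}_1 \equiv b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$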

Q: In the Spock example, what are the following quantities?

- $x_i =$
- $\sum_{i=1}^{n} x_i =$
- $\bar{x} =$
- $\sum_{i=1}^{n} x_i^2 =$
- $\sum_{i=1}^{n} x_i y_i =$


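A: Here $x_i = I_{\text{spock},i}$, which is either 0 or 1. There are $n = 46$ venires, 9 of them from Spock’s judge, so:

- $\sum_{i=1}^{n} x_i = n_{\text{spock}} = 9$
- $\bar{x} = 9/46$
- $\sum_{i=1}^{n} x_i^2 = 9$ (since $x_i^2 = x_i$ for a 0/1 variable)
- $\sum_{i=1}^{n} x_i y_i = \sum_{\text{spock}} y_i = 9\,\bar{y}_{\text{spock}}$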
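These sums can be checked numerically. The sketch below uses simulated responses (NOT the Spock data); only the group sizes 9 and 37 follow the slides. It plugs the sums into the $SS_{XY}/SS_{XX}$ formula and compares the result with `lm()`.

```r
# Least squares slope and intercept from the sums above, when x is a 0/1 indicator.
# Simulated y values for illustration only (NOT the Spock data); the group sizes
# 9 (Spock) and 37 (other) follow the slides.
set.seed(2014)
x <- c(rep(1, 9), rep(0, 37))               # I_spock for n = 46 venires
y <- c(rnorm(9, 15, 5), rnorm(37, 29, 7))   # hypothetical % women
n <- length(y)

sum(x)            # n_spock = 9
mean(x)           # 9 / 46
sum(x^2)          # also 9, because x^2 = x for 0/1 values
sum(x * y)        # sum of y over Spock's venires only

SSXY <- sum(x * y) - n * mean(x) * mean(y)
SSXX <- sum(x^2) - n * mean(x)^2
b1 <- SSXY / SSXX
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)
coef(lm(y ~ x))   # same estimates from lm()
```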


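**Parameter Interpretation**

For the model $Y_i = \beta_0 + \beta_1 I_{\text{spock},i} + e_i$:

$$E(Y_i) = \begin{cases} \beta_0 + \beta_1, & \text{if the } i\text{th venire has Spock’s judge} \\ \beta_0, & \text{if the } i\text{th venire has another judge} \end{cases}$$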

So, $\beta_0$ is the mean % of women on the other judges’ venires, and $\beta_1$ is the difference in the mean % of women (the response) between Spock’s judge’s venires and the other judges’ venires:

- $\beta_1 = 0$: no difference between the mean % women for Spock’s judge and the other judges
- $\beta_1 > 0$: the mean % women is higher for Spock’s judge than for the other judges
- $\beta_1 < 0$: the mean % women is lower for Spock’s judge than for the other judges

Caution: If the factor has more than two levels, the interpretation is slightly different: each coefficient compares a level's mean with the mean of the default factor level. Write out the model using indicators and take expectations to interpret the parameters correctly.
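A small illustration of the caution, with made-up data for a factor with $\ell = 3$ levels. Taking expectations gives $E(Y) = \beta_0$ for the default level and $\beta_0 + \beta_k$ for level $k$, so each slope coefficient is a difference from the default level's mean. R's default level here is the first one, "A".

```r
# With treatment coding, each coefficient is a difference from the default level.
# Made-up data with l = 3 levels; R's default level is the first one, "A".
g <- factor(c("A", "A", "B", "B", "C", "C"))
y <- c(10, 14, 20, 24, 31, 35)
fit <- lm(y ~ g)

tapply(y, g, mean)   # group means: A = 12, B = 22, C = 33
coef(fit)            # (Intercept) = 12 (mean of A); gB = 22 - 12 = 10; gC = 33 - 12 = 21
```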


**Regression Parameter Estimates**

The parameter estimates in Spock’s example simplify to: $b_1 = \bar{y}_{\text{spock}} - \bar{y}_{\text{other}}$
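Proof sketch. Write $x_i = I_{\text{spock},i}$, and let $n_s$ and $n_o$ denote the numbers of Spock and other venires, with $n = n_s + n_o$ (here 9 and 37; these symbols are introduced just for this derivation). Substituting the 0/1 sums into the $SS_{XY}/SS_{XX}$ formula:

$$
\begin{aligned}
\sum_{i=1}^{n} x_i &= n_s, \qquad \bar{x} = \frac{n_s}{n}, \qquad \sum_{i=1}^{n} x_i^2 = n_s, \qquad \sum_{i=1}^{n} x_i y_i = n_s\,\bar{y}_{\text{spock}}, \qquad \bar{y} = \frac{n_s\,\bar{y}_{\text{spock}} + n_o\,\bar{y}_{\text{other}}}{n} \\
SS_{XX} &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = n_s - \frac{n_s^2}{n} = \frac{n_s\,n_o}{n} \\
SS_{XY} &= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = n_s\,\bar{y}_{\text{spock}} - n_s\,\bar{y} = \frac{n_s}{n}\left(n\,\bar{y}_{\text{spock}} - n_s\,\bar{y}_{\text{spock}} - n_o\,\bar{y}_{\text{other}}\right) = \frac{n_s\,n_o}{n}\left(\bar{y}_{\text{spock}} - \bar{y}_{\text{other}}\right) \\
b_1 &= \frac{SS_{XY}}{SS_{XX}} = \bar{y}_{\text{spock}} - \bar{y}_{\text{other}}
\end{aligned}
$$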


Homework Exercise: Show that $b_0 = \bar{y}_{\text{other}}$.


**Testing using a Linear Regression Model**

$H_0: \beta_1 = \beta_{10}$ vs. $H_a: \beta_1 \neq \beta_{10}$

$$t = \frac{b_1 - \beta_{10}}{se(b_1)} \sim t_{n-2} \quad \text{under } H_0$$

This holds assuming that:

- the model has the correct form;
- the Gauss-Markov conditions hold:
  1. $E(e_i) = 0$
  2. $Var(e_i) = \sigma^2$ (constant variance)
  3. $E(e_i e_j) = 0$ for $i \neq j$ (uncorrelated errors)
- the $e_i$ are Normal.

Testing whether the two group means differ is equivalent to testing whether the parameter $\beta_1$ is significant (nonzero) in the regression.
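A minimal R sketch of this test for a general null value $\beta_{10}$, on simulated data (NOT the Spock data). The helper `slope_t_test()` is hypothetical, written only for this illustration; it reads $b_1$ and $se(b_1)$ from `coef(summary(fit))` and uses `df.residual(fit)` ($n - 2$ here). With $\beta_{10} = 0$ it reproduces the `t value` and `Pr(>|t|)` that `summary()` prints.

```r
# t test of H0: beta1 = beta10 vs. Ha: beta1 != beta10 from an lm() fit.
# Simulated data for illustration only (NOT the Spock data).
set.seed(303)
x <- c(rep(1, 9), rep(0, 37))
y <- 29 - 15 * x + rnorm(46, sd = 7)
fit <- lm(y ~ x)

# Hypothetical helper: t statistic, df and two-sided p-value for one coefficient
slope_t_test <- function(fit, term, beta10 = 0) {
  est <- coef(summary(fit))[term, "Estimate"]
  se  <- coef(summary(fit))[term, "Std. Error"]
  t_stat <- (est - beta10) / se            # t = (b1 - beta10) / se(b1)
  df <- df.residual(fit)                   # n - 2 in simple linear regression
  p  <- 2 * pt(-abs(t_stat), df)           # two-sided p-value from t_{n-2}
  c(t = t_stat, df = df, p.value = p)
}

slope_t_test(fit, "x")                 # beta10 = 0: agrees with summary(fit)
slope_t_test(fit, "x", beta10 = -10)   # a hypothetical non-zero null value
```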


**Connection to ANOVA**

When $\beta_{10} = 0$ (as in the Spock example), using the linear model is the same as One-Way Analysis of Variance (ANOVA) with one factor: testing whether the group means differ. In general, ANOVA extends to multiple factors and to factors with more than two levels, testing whether all the factor level means are equal or whether any of them differ. We will discuss ANOVA next class and use it to answer the questions of interest in the Spock Conspiracy case study:


- Question of Interest 1: Is there evidence of a difference in the mean percent of women on Spock’s judge’s venires compared with the other judges’ venires? → One-Way ANOVA with 2 factor levels (Spock and other)
- Question of Interest 2: Is there evidence of differences in women’s representation on the venires of the other 6 judges? → One-Way ANOVA with 6 factor levels (A, B, C, D, E, F)
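For a factor with 2 levels, the three approaches give the same test. The sketch below uses simulated data (NOT the Spock data): the regression t statistic for the indicator equals the pooled two-sample t statistic, and the one-way ANOVA F statistic from `anova()` equals $t^2$.

```r
# One-way ANOVA vs. regression t test vs. pooled two-sample t test (2 levels).
# Simulated data for illustration only (NOT the Spock data).
set.seed(303)
judge <- factor(c(rep("spock", 9), rep("other", 37)), levels = c("other", "spock"))
y <- ifelse(judge == "spock", 15, 29) + rnorm(46, sd = 7)

fit <- lm(y ~ judge)                                   # "other" is the default level
t_reg   <- coef(summary(fit))["judgespock", "t value"] # regression t for beta1
F_anova <- anova(fit)["judge", "F value"]              # one-way ANOVA F statistic
t_pool  <- unname(t.test(y[judge == "spock"], y[judge == "other"],
                         var.equal = TRUE)$statistic)  # pooled two-sample t

c(t_reg = t_reg, t_pool = t_pool, F = F_anova, t_squared = t_reg^2)
```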


**Spock Linear Model in R**

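```r
> I_spock = rep(0, 46)      # start with every venire coded 0 (another judge)
> I_spock
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[24] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> for (i in 1:length(judge1)) {   # judge1: judge for each venire (loaded earlier)
+   if (judge1[i] == "SPOCK") {
+     I_spock[i] = 1              # 1 if the ith venire has Spock's judge
+   }
+ }
> # Vectorized equivalent: I_spock = as.numeric(judge1 == "SPOCK")
> I_spock
 [1] 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[24] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```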


```r
> spock_linearreg = lm(percentwomen ~ I_spock)
> summary(spock_linearreg)

Call:
lm(formula = percentwomen ~ I_spock)

Residuals:
     Min       1Q   Median       3Q      Max
-12.9919  -4.6669   0.2581   3.7854  19.4081

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   29.492      1.160   25.42  < 2e-16 ***
I_spock      -14.870      2.623   -5.67 1.03e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.056 on 44 degrees of freedom
Multiple R-squared:  0.4222,	Adjusted R-squared:  0.409
F-statistic: 32.15 on 1 and 44 DF,  p-value: 1.03e-06
```
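The key numbers in this output are linked by simple arithmetic. The sketch below only copies values printed above; small discrepancies come from the rounding of the displayed estimates. It recovers the estimated Spock mean ($b_0 + b_1$), the t values, $F = t^2$ for a single indicator, and $R^2 = F/(F + df)$, which holds when the numerator has 1 degree of freedom.

```r
# Quick arithmetic checks on the summary(spock_linearreg) output shown above.
b0 <- 29.492; se_b0 <- 1.160     # (Intercept) row
b1 <- -14.870; se_b1 <- 2.623    # I_spock row
df_resid <- 46 - 2               # n - 2 = 44 residual degrees of freedom
F_reported <- 32.15

b0 + b1                          # estimated mean % women, Spock's judge: 14.622
b0 / se_b0                       # about 25.42, the intercept's t value
t_b1 <- b1 / se_b1               # about -5.67, the I_spock t value
t_b1^2                           # about 32.14; F = 32.15 up to rounding of b1, se(b1)
2 * pt(-abs(t_b1), df_resid)     # two-sided p-value, on the order of 1e-06
F_reported / (F_reported + df_resid)   # R-squared = F / (F + df), about 0.4222
```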


**Example: Spock Conspiracy**

Q: Answer the first question of interest using a linear model approach. Include all the necessary elements and assumptions, and state a conclusion. A:
