
Linear Regression 

What was done?


A multiple linear regression analysis was performed to examine the influence of the variables gender, group B,
group C, group A, group D, bachelor's degree, some college, master's degree, associate's degree, high school, lunch
and test preparation course on the variable Running Avg.
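The predictors group A to group D and the education levels read like 0/1 indicator (dummy) variables derived from categorical columns. The following is a minimal Python sketch of such coding, assuming that group E is the omitted reference category (it appears in the ANOVA of this report but not among the regressors). The helper name `dummy_code` and the sample rows are hypothetical.

```python
def dummy_code(values, categories, reference):
    """Return one dict of 0/1 indicators per value, omitting the reference category."""
    kept = [c for c in categories if c != reference]
    return [{c: int(v == c) for c in kept} for v in values]

groups = ["group A", "group B", "group C", "group D", "group E"]
sample = ["group B", "group E", "group D"]  # hypothetical rows, not from the report

coded = dummy_code(sample, groups, reference="group E")
for original, row in zip(sample, coded):
    print(original, row)
```

A row belonging to the reference category has all indicators equal to zero, so each coefficient is read as a difference from that category.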

Model Summary
The regression model showed that the variables gender, group B, group C, group A, group D, bachelor's degree,
some college, master's degree, associate's degree, high school, lunch and test preparation course explained 24.23%
of the variance in the variable Running Avg. An ANOVA was used to test whether this value was significantly
different from zero. Using the present sample, it was found that the effect was significantly different from
zero, F = 26.3, p < .001, R² = 0.24.
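The F statistic of the overall test can be recovered from the explained variance. The sketch below uses the standard relation F = (R²/k) / ((1 - R²)/(n - k - 1)), and assumes that the sample size is n = 1,000, which is the total N reported in the ANOVA table of this report rather than a value stated in the model summary.

```python
r_squared = 0.2423  # explained variance from the model summary
k = 12              # number of predictors in the model
n = 1000            # assumption: total N taken from the ANOVA table

# Ratio of explained variance per predictor to unexplained variance per residual df
f_value = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_value, 1))
```

Under this assumption the result agrees with the reported F = 26.3.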

Regression coefficients
The following regression model is obtained:
Running Avg = 70.47 - 3.72 · gender - 5.4 · group B - 4.54 · group C - 6.93 · group A - 1.8 · group D
+ 7.08 · bachelor's degree + 3.61 · some college + 8.63 · master's degree + 4.54 · associate's degree
- 0.63 · high school - 8.78 · lunch + 7.64 · test preparation course

When all independent variables are zero, the value of the variable Running Avg is 70.47.
If the value of the variable gender changes by one unit, the value of the variable Running Avg changes by -3.72.
If the value of the variable group B changes by one unit, the value of the variable Running Avg changes by -5.4.
If the value of the variable group C changes by one unit, the value of the variable Running Avg changes by -4.54.
If the value of the variable group A changes by one unit, the value of the variable Running Avg changes by -6.93.
If the value of the variable group D changes by one unit, the value of the variable Running Avg changes by -1.8.
If the value of the variable bachelor's degree changes by one unit, the value of the variable Running Avg changes
by 7.08.
If the value of the variable some college changes by one unit, the value of the variable Running Avg changes by
3.61.
If the value of the variable master's degree changes by one unit, the value of the variable Running Avg changes by
8.63.
If the value of the variable associate's degree changes by one unit, the value of the variable Running Avg changes
by 4.54.
If the value of the variable high school changes by one unit, the value of the variable Running Avg changes by -0.63.
If the value of the variable lunch changes by one unit, the value of the variable Running Avg changes by -8.78.
If the value of the variable test preparation course changes by one unit, the value of the variable Running Avg
changes by 7.64.
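The fitted equation can be evaluated directly. The sketch below is an illustration only: the report does not state which category of each variable is coded as 1, so the example input is hypothetical, and the names `COEFFICIENTS` and `predict_running_avg` are introduced here for convenience.

```python
INTERCEPT = 70.47
COEFFICIENTS = {
    "gender": -3.72, "group B": -5.4, "group C": -4.54, "group A": -6.93,
    "group D": -1.8, "bachelor's degree": 7.08, "some college": 3.61,
    "master's degree": 8.63, "associate's degree": 4.54, "high school": -0.63,
    "lunch": -8.78, "test preparation course": 7.64,
}

def predict_running_avg(indicators):
    """Evaluate the fitted equation; predictors not listed are treated as 0."""
    unknown = set(indicators) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in indicators.items())

baseline = predict_running_avg({})  # all predictors zero: the intercept
example = predict_running_avg(      # hypothetical case
    {"group B": 1, "lunch": 1, "test preparation course": 1})
print(baseline, round(example, 2))
```

For the hypothetical case the prediction is 70.47 - 5.4 - 8.78 + 7.64 = 63.93.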

Standardized regression coefficients
The standardized coefficients beta do not depend on the units in which the variables are measured and usually lie
between -1 and 1. The larger the absolute value of beta, the greater the contribution of the respective independent
variable to explaining the dependent variable Running Avg. In this model, the variable lunch has the greatest
influence on the variable Running Avg.
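A standardized coefficient is the unstandardized coefficient multiplied by the ratio of the standard deviations of predictor and outcome. The sketch below illustrates this for the simplest case of a single predictor, where beta coincides with the Pearson correlation. The data are hypothetical toy values, not taken from the report.

```python
from statistics import mean, stdev

# Hypothetical toy data, not taken from the report
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b = sxy / sxx                    # unstandardized slope
beta = b * stdev(x) / stdev(y)   # standardized slope
r = sxy / (sxx * syy) ** 0.5     # Pearson correlation, for comparison
print(round(b, 3), round(beta, 3), round(r, 3))
```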

p-value

The calculated regression coefficients refer to the sample used for the calculation of the regression analysis.
It is therefore of interest whether the individual coefficients only deviate from zero by chance or whether they also
deviate from zero in the population. To test this, the null hypothesis that the coefficient is equal to zero in the
population is tested for each coefficient.

The standard error now indicates how much the respective coefficient will scatter on average when the regression
analysis is calculated for a further sample.

The test statistic t is then calculated from the standard error and the coefficient.
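As a rough sketch of this step: t is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of t. The report does not list the standard errors, so the value used below is hypothetical. The sketch also uses the normal approximation to the t distribution, which is an assumption that is reasonable only because the residual degrees of freedom are large here.

```python
import math

def t_and_p(coefficient, standard_error):
    """Return t = b / SE and a two-sided p-value (normal approximation)."""
    t = coefficient / standard_error
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Coefficient of gender from the report; the standard error 0.80 is hypothetical
t, p = t_and_p(-3.72, 0.80)
print(round(t, 2), p < 0.05)
```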

The p-value for the coefficient of gender is <.001. Thus, the p-value is smaller than the significance level of 0.05 and
the null hypothesis that the coefficient of gender is zero in the population is rejected. Thus, it is assumed that the
coefficient for the variable gender in the population is different from zero.

The p-value for the coefficient of group B is <.001. Thus, the p-value is smaller than the significance level of 0.05
and the null hypothesis that the coefficient of group B is zero in the population is rejected. Thus, it is assumed that
the coefficient for the variable group B in the population is different from zero.

The p-value for the coefficient of group C is <.001. Thus, the p-value is smaller than the significance level of 0.05
and the null hypothesis that the coefficient of group C is zero in the population is rejected. Thus, it is assumed that
the coefficient for the variable group C in the population is different from zero.

The p-value for the coefficient of group A is <.001. Thus, the p-value is smaller than the significance level of 0.05
and the null hypothesis that the coefficient of group A is zero in the population is rejected. Thus, it is assumed that
the coefficient for the variable group A in the population is different from zero.

The p-value for the coefficient of group D is .171. Thus, the p-value is greater than the significance level of 0.05 and
the null hypothesis that the coefficient of group D is zero in the population is retained. Thus, there is no evidence
that the coefficient for the variable group D in the population is different from zero.

The p-value for the coefficient of bachelor's degree is <.001. Thus, the p-value is smaller than the significance level
of 0.05 and the null hypothesis that the coefficient of bachelor's degree is zero in the population is rejected. Thus,
it is assumed that the coefficient for the variable bachelor's degree in the population is different from zero.

The p-value for the coefficient of some college is .004. Thus, the p-value is smaller than the significance level of
0.05 and the null hypothesis that the coefficient of some college is zero in the population is rejected. Thus, it is
assumed that the coefficient for the variable some college in the population is different from zero.

The p-value for the coefficient of master's degree is <.001. Thus, the p-value is smaller than the significance level of
0.05 and the null hypothesis that the coefficient of master's degree is zero in the population is rejected. Thus, it is
assumed that the coefficient for the variable master's degree in the population is different from zero.

The p-value for the coefficient of associate's degree is <.001. Thus, the p-value is smaller than the significance level
of 0.05 and the null hypothesis that the coefficient of associate's degree is zero in the population is rejected. Thus,
it is assumed that the coefficient for the variable associate's degree in the population is different from zero.

The p-value for the coefficient of high school is .627. Thus, the p-value is greater than the significance level of 0.05
and the null hypothesis that the coefficient of high school is zero in the population is retained. Thus, there is no
evidence that the coefficient for the variable high school in the population is different from zero.

The p-value for the coefficient of lunch is <.001. Thus, the p-value is smaller than the significance level of 0.05 and
the null hypothesis that the coefficient of lunch is zero in the population is rejected. Thus, it is assumed that the
coefficient for the variable lunch in the population is different from zero.

The p-value for the coefficient of test preparation course is <.001. Thus, the p-value is smaller than the significance
level of 0.05 and the null hypothesis that the coefficient of test preparation course is zero in the population is
rejected. Thus, it is assumed that the coefficient for the variable test preparation course in the population is
different from zero.

Cohen's f²
Variable    f²

gender 0.31
group B 0.31
group C 0.31
group A 0.3
group D 0.3
bachelor's degree 0.3
some college 0.3
master's degree 0.3
associate's degree 0.3
high school 0.3
lunch 0.3
test preparation course 0.3
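For orientation, Cohen's f² for a whole model is conventionally computed from the explained variance as f² = R² / (1 - R²). The sketch below applies that convention to the R² of this report. How the per-variable values in the table above were derived is not stated in the report, so this global figure is shown only as a point of comparison, not as a reconstruction of the table.

```python
r_squared = 0.2423  # explained variance from the model summary

# Global effect size of the model (conventional definition)
f_squared = r_squared / (1 - r_squared)
print(round(f_squared, 2))
```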

Normality of errors

Tests for normal distribution of the residuals

Test    Statistic    p
Kolmogorov-Smirnov    0.02    .607
Kolmogorov-Smirnov (Lilliefors Corr.)    0.02    .182
Shapiro-Wilk    0.99    <.001
Anderson-Darling    0.97    .015

Multicollinearity
Problematic if Tolerance < 0.10 or VIF > 10
Model    Tolerance    VIF
gender    0.99    1.01   
group B    0.52    1.94   
group C    0.44    2.25   
group A    0.66    1.52   
group D    0.47    2.15   
bachelor's degree    0.68    1.47   
some college    0.56    1.77   
master's degree    0.79    1.27   
associate's degree    0.57    1.77   
high school    0.59    1.71   
lunch    0.99    1.01   
test preparation course    0.98    1.02   
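The two columns carry the same information, since VIF = 1 / Tolerance. The sketch below recomputes the VIF from the tolerances listed above and applies the stated rule of thumb. Because the tolerances are rounded to two decimals, the recomputed values can differ from the table in the second decimal.

```python
tolerance = {
    "gender": 0.99, "group B": 0.52, "group C": 0.44, "group A": 0.66,
    "group D": 0.47, "bachelor's degree": 0.68, "some college": 0.56,
    "master's degree": 0.79, "associate's degree": 0.57, "high school": 0.59,
    "lunch": 0.99, "test preparation course": 0.98,
}

# Variance inflation factor is the reciprocal of the tolerance
vif = {name: 1 / tol for name, tol in tolerance.items()}

# Rule of thumb from the report: Tolerance < 0.10 or VIF > 10
problematic = [name for name, tol in tolerance.items()
               if tol < 0.10 or vif[name] > 10]

for name, value in vif.items():
    print(f"{name}: {value:.2f}")
print("problematic:", problematic)
```

None of the predictors crosses either threshold, so the table gives no indication of problematic multicollinearity.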

Heteroskedasticity
Fleiss Kappa Reliability analysis

Summary

An inter-rater reliability analysis was performed between the dependent samples
of gender, Running Avg, race/ethnicity, parental level of education, lunch and test
preparation course. For this purpose, the Fleiss Kappa was calculated, which is a
measure of the agreement between more than two dependent categorical samples.
The Fleiss Kappa showed that there was no agreement between the
samples gender, Running Avg, race/ethnicity, parental level of education, lunch and test
preparation course, with κ = -0.06.
Fleiss Kappa    Standard Error    lower 95% CI    upper 95% CI    p   
-0.06    0    -0.06    -0.06       
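To make the statistic concrete, here is a minimal sketch of the usual Fleiss Kappa computation. It assumes the data are arranged as a table of counts, where `counts[i][j]` is the number of raters who put subject i into category j and every subject has the same number of raters. The function name and the example tables are hypothetical and unrelated to the data of this report.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rating counts."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Observed agreement per subject, averaged over subjects
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Agreement expected by chance from the overall category proportions
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_bar - p_expected) / (1 - p_expected)

perfect = fleiss_kappa([[3, 0], [0, 3]])  # all raters agree on every subject
mixed = fleiss_kappa([[2, 1], [1, 2]])    # agreement below chance level
print(perfect, round(mixed, 3))
```

Values near 1 indicate strong agreement, values near 0 indicate agreement at chance level, and negative values indicate less agreement than expected by chance.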
ANOVA

Analysis of variance
A one-factor analysis of variance has shown that there is a significant difference
in the variable Running Avg between the groups of the categorical variable
race/ethnicity, F = 9.1, p < .001. Thus, with the available data, the null hypothesis is rejected.

Post hoc Test


The ANOVA showed that there was a significant difference. A Bonferroni Post hoc test
was used to compare the groups in pairs to find out which was significantly different.
The Bonferroni post-hoc test revealed that the pairwise group comparisons of group B -
group E, group C - group E, group A - group D and group A - group E have a p-value
less than 0.05 and thus, based on the available data, it can be assumed that these
groups are each significantly different pairwise.
Group    N    Mean    Std. Deviation
group B    190    65.47    14.73
group C    319    67.13    13.87
group A    89    62.99    14.45
group D    262    69.18    13.25
group E    140    72.75    14.57
Total    1,000    67.77    14.26

Levene test of variance equality

F    df1    df2    p
0.56    4    995    .688

Anova

Source    Sum of Squares    df    Mean Squares    F    p    Critical F-Value
Between Groups    7,163.33    4    1,790.83    9.1    <.001    2.38
Within Groups    195,904.31    995    196.89
Total    203,067.64    999
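The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below rebuilds the between-groups sum of squares from the group sizes and means reported above and takes the within-groups sum of squares from the table. Because the means are rounded to two decimals, the rebuilt value differs slightly from the reported 7,163.33.

```python
# (N, mean) per group, from the descriptive table of the report
groups = {
    "group B": (190, 65.47), "group C": (319, 67.13), "group A": (89, 62.99),
    "group D": (262, 69.18), "group E": (140, 72.75),
}
ss_within = 195904.31  # within-groups sum of squares, from the ANOVA table

n_total = sum(n for n, _ in groups.values())
grand_mean = sum(n * m for n, m in groups.values()) / n_total

# Between-groups sum of squares: weighted squared deviations of the group means
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_value = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 1), df_between, df_within, round(f_value, 1))
```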

Bonferroni Post-hoc-Tests
Group 1     Group 2    Mean diff.    Std. Error    p    95% CI lower limit    95% CI upper limit
group B     group C    -1.66    1.286    1    -5.4    2.07   
group B     group A    2.48    1.802    1    -2.76    7.71   
group B     group D    -3.71    1.337    .056    -7.59    0.17   
group B     group E    -7.28    1.563    <.001    -11.82    -2.74   
group C     group A    4.14    1.682    .14    -0.75    9.03   
group C     group D    -2.05    1.17    .804    -5.45    1.35   
group C     group E    -5.62    1.423    .001    -9.75    -1.49   
group A     group D    -6.19    1.722    .003    -11.19    -1.19   
group A     group E    -9.76    1.902    <.001    -15.28    -4.23   
group D     group E    -3.57    1.469    .152    -7.84    0.69   
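The table can be approximately reproduced from the group sizes, the group means and the within-groups mean square. In the sketch below the standard error of each difference is sqrt(MS_within · (1/n1 + 1/n2)) and the Bonferroni adjustment multiplies each raw p-value by the number of comparisons, capped at 1. The raw p-values use the normal approximation instead of the t distribution, which is an assumption made to keep the sketch dependency-free, so the adjusted values deviate slightly from the table.

```python
import math
from itertools import combinations

MS_WITHIN = 196.89  # within-groups mean square from the ANOVA table
groups = {          # (N, mean) per group, in the order used by the report
    "group B": (190, 65.47), "group C": (319, 67.13), "group A": (89, 62.99),
    "group D": (262, 69.18), "group E": (140, 72.75),
}
pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)

def bonferroni(g1, g2):
    """Return mean difference, standard error and Bonferroni-adjusted p."""
    (n1, m1), (n2, m2) = groups[g1], groups[g2]
    diff = m1 - m2
    se = math.sqrt(MS_WITHIN * (1 / n1 + 1 / n2))
    p_raw = math.erfc(abs(diff / se) / math.sqrt(2))  # normal approximation
    return diff, se, min(1.0, n_comparisons * p_raw)

results = {pair: bonferroni(*pair) for pair in pairs}
significant = [pair for pair, (_, _, p) in results.items() if p < 0.05]

for (g1, g2), (diff, se, p) in results.items():
    print(f"{g1} - {g2}: diff={diff:.2f}, SE={se:.3f}, p_adj={p:.3f}")
```

With these inputs the same four pairs fall below 0.05 as in the table: group B - group E, group C - group E, group A - group D and group A - group E.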

Post-Hoc-Test Scheffe
Critical Scheffe value    9.52    

Variables    Average difference    F   


group B - group C    -1.66    1.67   
group B - group A    2.48    1.89   
group B - group D    -3.71    7.7   
group B - group E    -7.28    21.72   
group C - group A    4.14    6.06   
group C - group D    -2.05    3.06   
group C - group E    -5.62    15.61   
group A - group D    -6.19    12.92   
group A - group E    -9.76    26.32   
group D - group E    -3.57    5.91   
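The Scheffé comparison can be sketched in the same way. The F value of a pair is the squared mean difference divided by MS_within · (1/n1 + 1/n2), and the critical Scheffé value is (k - 1) times the critical F value of the ANOVA, here 4 · 2.38 = 9.52. Inputs are the rounded values from the tables of this report, so the results may differ from the table in the second decimal.

```python
from itertools import combinations

MS_WITHIN = 196.89  # within-groups mean square from the ANOVA table
CRITICAL_F = 2.38   # critical F value from the ANOVA table
groups = {          # (N, mean) per group
    "group B": (190, 65.47), "group C": (319, 67.13), "group A": (89, 62.99),
    "group D": (262, 69.18), "group E": (140, 72.75),
}
critical_scheffe = (len(groups) - 1) * CRITICAL_F

def scheffe_f(g1, g2):
    """F value of a pairwise contrast for the Scheffé test."""
    (n1, m1), (n2, m2) = groups[g1], groups[g2]
    return (m1 - m2) ** 2 / (MS_WITHIN * (1 / n1 + 1 / n2))

f_values = {pair: scheffe_f(*pair) for pair in combinations(groups, 2)}
exceeding = [pair for pair, f in f_values.items() if f > critical_scheffe]

for (g1, g2), f in f_values.items():
    print(f"{g1} - {g2}: F={f:.2f}")
```

The pairs whose F exceeds 9.52 are the same four pairs that the Bonferroni test flags.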
