
JTMS-03 Applied Statistics with R

Spring Semester 2023

Lab 11 – Partial correlation and multiple linear regression – Solution


April 25, 2023

You work at the Federal Ministry of Labor and Social Affairs and have to devise evidence-based policy
recommendations for improving social cohesion in Germany with a focus on migration. To this end, you draw
on the data of the 2017 Social Cohesion Radar. The study developed an index of social cohesion for the
so-called planning regions of Germany (Raumordnungsregionen) [1] and related it to various regional
characteristics. Your analyses employing bivariate correlations point to a positive, moderately strong and
statistically significant association between the share of migrants and the level of cohesion: regions
with a greater share of migrants are more cohesive (see Lab 09). In a team discussion, your
colleagues advise you to conduct additional analyses in order to make sure that this is not a spurious
relationship. In particular, they suggest controlling for the level of economic affluence of the regions in terms
of their per capita GDP.

Data:    SCR2017_ROR.sav
Source:  Bertelsmann Stiftung (2017) [2]

Variables (only the relevant ones):
  region      Planning region
  abscoh      Degree of social cohesion in region (0 = very weak to 100 = very strong)
  sharemgrnt  Share of migrants (of the total population) in the region
  gdpEUR      Gross domestic product per capita of region (in thousand EUR)

Reading the data in R

setwd("Type/your/directory/here")
library(foreign)

data.lab11 <- read.spss("SCR2017_ROR.sav", header= TRUE, to.data.frame= TRUE,
                        use.value.labels= FALSE, use.missings= TRUE)
attach(data.lab11)

Tasks
1. Data preparation: Previous analyses on the same data (Lab 10) showed that the distribution of GDP
deviates from normality (pronounced positive skew and leptokurtosis). Transform the variable by taking
the natural logarithm of the raw values. Describe the distribution of the transformed values.

lngdp <- log(gdpEUR)


data.lab11 <- data.frame(data.lab11, lngdp)

Transforming the raw values of per capita GDP by taking their natural logarithm alleviates the detected
deviations from a normal distribution. The tendencies towards positive skewness (g1 = 0.80) and
leptokurtosis (g2 = 0.67) are still present, but contained within acceptable limits. The transformed
values can therefore be treated as approximately normally distributed.

[1] Germany has 96 planning regions. However, in order to keep the total sample size within feasible limits
(over 5,000 respondents), the study merged some neighboring planning regions with similar
socio-demographic characteristics, thereby arriving at 79 'homogenized' regions.
[2] https://www.bertelsmann-stiftung.de/en/publications/publication/did/sozialer-zusammenhalt-in-deutschland-2017

library(psych)
describe(lngdp)
## vars n mean sd median min max range skew kurtosis se
## X1 1 79 3.47 0.23 3.46 3.06 4.22 1.16 0.8 0.67 0.03
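The skewness and kurtosis columns above can also be reproduced by hand. The following is a minimal sketch on a small hypothetical vector (gdp_demo is made up for illustration and is not the lab data). It assumes the "type 3" formulas, which are the default in psych::describe(), and shows how the log transformation pulls in a long right tail.

```r
# Hypothetical right-skewed values in thousand EUR (NOT the lab data)
gdp_demo <- c(21, 23, 24, 26, 27, 29, 31, 34, 40, 68)

# "Type 3" skewness: third central moment divided by the cubed sample SD
skew_type3 <- function(x) {
  d <- x - mean(x)
  (sum(d^3) / length(x)) / sd(x)^3
}

# "Type 3" excess kurtosis: fourth central moment over sd^4, minus 3
kurt_type3 <- function(x) {
  d <- x - mean(x)
  (sum(d^4) / length(x)) / sd(x)^4 - 3
}

# Compare the raw and the log-transformed values
round(c(skew_raw = skew_type3(gdp_demo),
        skew_log = skew_type3(log(gdp_demo)),
        kurt_raw = kurt_type3(gdp_demo),
        kurt_log = kurt_type3(log(gdp_demo))), 2)
```

In this toy example the skewness of the logged values is smaller than that of the raw values, which mirrors the pattern described for per capita GDP.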

2. Use the method of linear regression in order to control for the affluence of the regions in the relationship
between the share of migrants and cohesion. Assess the significance of the estimates at the 5 % level.
a) Start with a model on the bivariate relationship between the share of migrants and social cohesion.
Report and interpret the unstandardized and standardized regression coefficients.

m1 <- lm(abscoh ~ sharemgrnt)


library(lm.beta)
summary(lm.beta(m1))
##
## Call:
## lm(formula = abscoh ~ sharemgrnt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6511 -1.7760 0.1509 1.8164 4.3950
##
## Coefficients:
## Estimate Standardized Std. Error t value Pr(>|t|)
## (Intercept) 59.77525 0.00000 0.53205 112.350 < 2e-16 ***
## sharemgrnt 0.18530 0.36676 0.05356 3.459 0.000886 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.213 on 77 degrees of freedom
## Multiple R-squared: 0.1345, Adjusted R-squared: 0.1233
## F-statistic: 11.97 on 1 and 77 DF, p-value: 0.0008857

According to the first model, which considers only the effect of the share of migrants, the relationship of the
latter with the regional level of cohesion is positive and highly significant: b = 0.19, p < 0.001. An additional
percentage point in the share of migrants in a region is related to an increase of about 0.19 points in its
level of cohesion. In other words, regions with a greater share of migrants are significantly more cohesive.

According to the standardized estimate (β = 0.37), a one standard deviation increase in the share of
migrants is related to an increase of 0.37 standard deviations in the level of social cohesion. The size of
the standardized regression coefficient, thus, points to a moderate relationship.
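As a quick plausibility check, the Model 1 figures can be cross-checked against each other. The sketch below only uses numbers copied from the output above and from the correlation matrix in task 2e; the variable names are chosen here for illustration. It relies on two standard facts: the t value is the estimate divided by its standard error, and in a regression with a single predictor the standardized slope equals Pearson's r, so that R-squared equals r squared.

```r
# Values copied from the Model 1 output and the correlation matrix of this lab
b_m1      <- 0.18530    # unstandardized slope of sharemgrnt
se_m1     <- 0.05356    # its standard error
beta_m1   <- 0.36676    # standardized slope
r_coh_mig <- 0.3667606  # cor(abscoh, sharemgrnt)

t_m1  <- b_m1 / se_m1   # t value of the slope
r2_m1 <- r_coh_mig^2    # R-squared of a one-predictor model

round(c(t = t_m1, beta = beta_m1, r = r_coh_mig, r_squared = r2_m1), 4)
```

Up to rounding, the results agree with the reported t value of 3.459 and the multiple R-squared of 0.1345.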

b) In a second step, specify a model that controls for the effect of the economic affluence of the
regions. Report and interpret the unstandardized and standardized regression coefficients.

According to the second model, which controls for the economic affluence of the regions, the effect of
affluence is positive and significant at the 5 % level in a two-sided test: b = 3.86, p = 0.04. Holding the share of
migrants constant, an additional unit of GDP [3], as transformed using the natural logarithm of the raw values,
is associated with an increase in social cohesion of about 3.86 points. The standardized coefficient (β = 0.38)
indicates that a one standard deviation increase in log GDP is associated with an increase in social cohesion
of about 0.38 standard deviations.
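Because GDP enters the model in logged form, its coefficient is often easier to read in terms of relative changes. The following minimal sketch uses the reported estimate for lngdp and assumes the usual reading of a level-log model: a proportional change in GDP by a factor k shifts lngdp by log(k), and hence the predicted cohesion score by b * log(k). The names effect_1pct and effect_double are illustrative.

```r
# Unstandardized coefficient of lngdp, copied from the Model 2 output
b_lngdp <- 3.86436

# Predicted change in cohesion for a 1 % increase and for a doubling of GDP,
# holding the share of migrants constant
effect_1pct   <- b_lngdp * log(1.01)
effect_double <- b_lngdp * log(2)

# The same one-unit step in log GDP spans different Euro amounts,
# depending on where in the observed range (3.06 to 4.22) it starts
gap_low  <- exp(4.06) - exp(3.06)
gap_high <- exp(4.22) - exp(3.22)

round(c(effect_1pct = effect_1pct, effect_double = effect_double,
        gap_low = gap_low, gap_high = gap_high), 3)
```

Under these assumptions, a 1 % higher per capita GDP goes along with roughly 0.04 additional cohesion points, and a doubling with roughly 2.7 points.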

As to the effect of the share of migrants, once economic affluence has been taken into account, the
relationship between the share of migrants in the regions and their level of social cohesion loses its
statistical significance (p = 0.78). Moreover, the relationship weakens considerably: holding economic
affluence constant, an additional percentage point in the share of migrants in a region is now associated
with an increase in the level of cohesion of only about 0.03 points (b = 0.03). The size of the standardized
regression coefficient (β = 0.05) makes it even clearer that the relationship has weakened to practically
zero once GDP is included.

m2 <- lm(abscoh ~ sharemgrnt + lngdp)


summary(lm.beta(m2))
##
## Call:
## lm(formula = abscoh ~ sharemgrnt + lngdp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5482 -1.5091 -0.2397 1.5023 4.0397
##
## Coefficients:
## Estimate Standardized Std. Error t value Pr(>|t|)
## (Intercept) 47.77017 0.00000 5.78572 8.257 3.55e-12 ***
## sharemgrnt 0.02644 0.05233 0.09254 0.286 0.7759
## lngdp 3.86436 0.38161 1.85482 2.083 0.0406 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.166 on 76 degrees of freedom
## Multiple R-squared: 0.1813, Adjusted R-squared: 0.1597
## F-statistic: 8.414 on 2 and 76 DF, p-value: 0.0005004

c) Compare the two models with an appropriate test. Report the evidence regarding the amount of
explained variance by each model and its change between the two models.

Model 1 explains about 13.45 % (R²_M1 = 0.1345) of the variation in social cohesion. The overall ANOVA
test of model fit renders this amount statistically significant: F(1,77) = 11.97, p < 0.001. Taken together, the
share of migrants and GDP (Model 2) explain about 18.13 % (R²_M2 = 0.1813) of the differences in social
cohesion across the regions. According to the respective overall ANOVA test of model fit, this amount of
explained variance is also statistically significant: F(2,76) = 8.41, p < 0.001. As such, the inclusion of GDP
adds 4.68 percentage points (R²_M2 - R²_M1 = 0.0468) to the explanation of the differences in social
cohesion above and beyond the effect of the share of migrants. This change in R² is statistically significant
at the 5 % level: F(1,76) = 4.34, p = 0.04.

[3] The interpretation of the unstandardized regression coefficient of GDP is technically correct. However, its
substantive interpretation is not straightforward because the natural logarithm is a non-linear transformation.
To exemplify the issue, consider the observed range of the transformed GDP values. It goes
from 3.06 to 4.22. A one-unit increase from, e.g., 3.06 to 4.06 corresponds to about 36.647 thousand Euros, i.e.
exp(4.06) - exp(3.06), but a one-unit increase from, e.g., 3.22 to 4.22 corresponds to about 43.005 thousand Euros,
i.e. exp(4.22) - exp(3.22).

anova(m1, m2)
## Analysis of Variance Table
##
## Model 1: abscoh ~ sharemgrnt
## Model 2: abscoh ~ sharemgrnt + lngdp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 77 376.99
## 2 76 356.62 1 20.368 4.3406 0.04058 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
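The F test for the change between the two nested models can be reconstructed from the residual sums of squares in the table above. The sketch below is a hand calculation using the reported (rounded) values; the variable names are illustrative. It also uses the fact that, when a single predictor is added, the F statistic for the change equals the squared t value of that predictor.

```r
# Residual sums of squares and residual degrees of freedom from anova(m1, m2)
rss_m1 <- 376.99; df_m1 <- 77
rss_m2 <- 356.62; df_m2 <- 76

# F statistic for the improvement in fit from adding lngdp
f_change <- ((rss_m1 - rss_m2) / (df_m1 - df_m2)) / (rss_m2 / df_m2)

# Upper-tail p-value from the F distribution
p_change <- pf(f_change, df_m1 - df_m2, df_m2, lower.tail = FALSE)

# Squared t value of lngdp from the Model 2 output
t_lngdp <- 2.083

round(c(F = f_change, p = p_change, t_squared = t_lngdp^2), 4)
```

Up to rounding of the inputs, this reproduces F(1,76) = 4.34 and p = 0.04.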

d) What is the term for your analytical strategy thus far?

The analytical strategy applied here is hierarchical regression.

e) Investigate, report and interpret the bivariate correlations among the three variables involved in
your analyses.

cor(data.frame(abscoh, sharemgrnt, lngdp))


## abscoh sharemgrnt lngdp
## abscoh 1.0000000 0.3667606 0.4247281
## sharemgrnt 0.3667606 1.0000000 0.8239544
## lngdp 0.4247281 0.8239544 1.0000000

Social cohesion exhibits positive and moderate correlations with the share of migrants (r = 0.37) and with
economic affluence (r = 0.42). This suggests that the more cohesive regions have a greater share of
migrants and are economically more affluent. As to the correlation between the share of migrants and
economic affluence, it is also positive, suggesting that a greater share of migrants in a region is associated
with a higher level of economic affluence. The strength of this relationship (r = 0.82) points to a very close
correspondence between the share of migrants and the economic affluence of the regions.
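The three bivariate correlations are sufficient to recover both the partial correlation between cohesion and the share of migrants (controlling for log GDP) and the standardized slopes of Model 2. The sketch below applies the textbook formulas for the case of two predictors to the coefficients reported above; the variable names are chosen here for illustration.

```r
# Bivariate correlations copied from the matrix above
r_coh_mig <- 0.3667606  # abscoh and sharemgrnt
r_coh_gdp <- 0.4247281  # abscoh and lngdp
r_mig_gdp <- 0.8239544  # sharemgrnt and lngdp

# First-order partial correlation of cohesion and migrant share,
# with log GDP partialled out of both variables
partial_coh_mig <- (r_coh_mig - r_coh_gdp * r_mig_gdp) /
  sqrt((1 - r_coh_gdp^2) * (1 - r_mig_gdp^2))

# Standardized slopes of the two-predictor model from the same correlations
beta_mig <- (r_coh_mig - r_coh_gdp * r_mig_gdp) / (1 - r_mig_gdp^2)
beta_gdp <- (r_coh_gdp - r_coh_mig * r_mig_gdp) / (1 - r_mig_gdp^2)

round(c(partial = partial_coh_mig, beta_mig = beta_mig, beta_gdp = beta_gdp), 3)
```

The partial correlation comes out at roughly 0.03, and the two standardized slopes match the values of about 0.05 and 0.38 reported for Model 2.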

f) Conclude with a technical and a substantive explanation for the relationship between the share of
migrants and the degree of social cohesion in the German regions.

The technical explanation for the disappearance of the relationship between the share of migrants and
social cohesion, once GDP has been accounted for, involves the very strong correlation between the share
of migrants and GDP. This is a sign of collinearity between the two predictors in the regression model. On
substantive grounds, the evidence leads to the conclusion that the relationship between the share of
migrants and social cohesion is spurious. It is driven by the economic affluence of the regions: richer regions
attract more migrants and are, at the same time, more cohesive.
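One way to put a number on the collinearity is the variance inflation factor (VIF). It is not part of the original analysis and is shown here only as an illustrative sketch. With exactly two predictors, the VIF of each predictor is 1 / (1 - r²), where r is the correlation between the two predictors.

```r
# Correlation between the two predictors, copied from the matrix in task 2e
r_mig_gdp <- 0.8239544

# Tolerance and variance inflation factor in a two-predictor model
tolerance <- 1 - r_mig_gdp^2
vif       <- 1 / tolerance

# Factor by which the standard errors are inflated relative to
# uncorrelated predictors
se_inflation <- sqrt(vif)

round(c(tolerance = tolerance, vif = vif, se_inflation = se_inflation), 2)
```

The resulting VIF of about 3.1 lies below commonly cited rule-of-thumb thresholds such as 5 or 10. Even so, it implies standard errors roughly 1.8 times as large as they would be with uncorrelated predictors, which is in line with the increase in the standard error of sharemgrnt from Model 1 to Model 2.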
