Data analysis class 2022/2023

Exercise sheet 9

Exercise
The file « temperature.csv » contains data on monthly temperatures in 35 European
cities, as well as the annual temperature average, the range of temperatures, and the
region in which the city lies.

1) Open the file in R, attach it to the R session and display its structure.
2) Give the correlation matrix of the quantitative variables (variables 3 to 18).
Above which absolute value are the correlations statistically significant? Which
variables are significantly correlated with the average temperature?
3) Create the binary variable “Origin”, which takes the value 1 for the first two
regions and the value 2 for regions 3 and 4.
4) Test if the variable “Mean” is normally distributed. Use the appropriate test to
check if the cities in the first origin group have a significantly different mean
temperature from those in the second origin group.
5) Perform a multiple regression to explain the annual mean temperature as a
function of the quantitative variables (variables 3 to 18). Eliminate
multicollinearity problems (remove, one at a time, every variable whose variance
inflation factor is larger than 4) and, if necessary, drop variables that do not
have enough explanatory power. What is the final model, and how much of the
variance of the mean temperature does it explain? Is that surprising?
6) Compute the standardized residuals of the linear model and add them to the data
set. Check for outliers graphically by plotting the standardized residuals. Add
dotted horizontal lines at -2 and 2 and add the names of the cities to the plot.
Which cities are outliers?
7) Compute the residuals of the final model and add them to the data set. Test
whether the residuals are correlated with the explanatory variables of the final
model. What does this imply for the assumptions of the model?
8) Fit a simple regression model to explain the mean temperature as a function of
the October temperature. Compare this model with an exponential one.
9) Draw a scatterplot showing these two models in different colours.
10) Do a principal component analysis of the quantitative variables (variables 3 to 18).
How many components should we keep?
11) Give an interpretation of the first two principal components. Save the first two
factors as variables and add them to the data file. Which cities have extreme
values on the two components?
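
Hint for question 2: the significance threshold for a correlation can be derived from the t test of a zero correlation, where t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. Solving for r at the critical t value gives the smallest significant |r|. The sketch below is only an illustration in base R. It assumes a two-sided test at the 5% level and n = 35 cities. The function name `critical_r` and the data frame name `temperature` in the final comment are hypothetical and do not come from the sheet.

```r
# Smallest absolute correlation that is significantly different from zero
# for n observations (two-sided test at level alpha).
# Obtained by solving |t| = t_crit for r, with t = r * sqrt(n - 2) / sqrt(1 - r^2).
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2                            # degrees of freedom of the t test
  t_crit <- qt(1 - alpha / 2, df = df)   # two-sided critical t value
  t_crit / sqrt(t_crit^2 + df)
}

# Threshold for the 35 cities of the exercise (assumption: no missing values)
r_crit <- critical_r(35)
print(r_crit)

# Hypothetical use on the data, assuming it was read into `temperature`:
# abs(cor(temperature[, 3:18])) > r_crit
```

The threshold depends only on the sample size and the chosen level, so the same value applies to every cell of the correlation matrix as long as no observations are missing.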
