Data analysis class 2022/2023

Exercise sheet 10

Exercise

The file “crime2.csv” contains data on crimes committed in 92 US cities. The variables in the file are the following:

pop        population
crimes     total number of crimes
unem       unemployment rate
officers   number of police officers
pcinc      per capita income
west       = 1 if the city is in the West
nrtheast   = 1 if the city is in the Northeast
south      = 1 if the city is in the South
year       82 or 87 (i.e. 1982 or 1987)
area       land area in square miles
popden     people per square mile
crmrte     crimes per 1,000 people
offarea    police officers per square mile
polpc      police officers per 1,000 people

1) Open the file, attach it to R's search path (so that its variables can be accessed by name) and display its structure.
2) Compute the correlation matrix of the data set. Above which absolute value are the correlations statistically significant? Which variables are significantly correlated with the number of crimes?
3) Delete the variable “crmrte” from the data set (and do not use it for the rest of the exercise).
4) Test whether the variable “crimes” is normally distributed. Use the appropriate test to check whether the number of crimes in 1987 differs significantly from that in 1982.
5) Perform a multiple regression to explain the number of crimes as a function of the other variables. Eliminate multicollinearity problems (keep only variables with a variance inflation factor (VIF) smaller than 4). Which variables do you exclude, and why does that make sense?
6) Exclude the variables that do not have enough explanatory power. What is the final model, and how much of the variance of the number of crimes does it explain? Consider two cities that are identical except that one of them is in the South: what is the difference in the number of crimes between them?
7) Compute the standardized residuals of the linear model and add them to the data
set. Define the subset of all outliers. How many outliers are there?
8) Perform a principal component analysis (PCA) of all original variables (except “crmrte”). How many components should we keep?
9) Interpret the retained principal components. Save the component scores as variables and add them to the data file.
10) Save the final version of the data file as a CSV file (which Excel can open) under the name “exam.csv”.
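For question 2, the significance threshold follows from the usual t-test of a Pearson correlation coefficient. The sketch below assumes that the file has n = 92 rows and that the test is two-sided at the 5% level; neither assumption is stated in the exercise, so adjust n and α if your data differ.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2}
\quad\Longrightarrow\quad
|r| > r_{\text{crit}} = \frac{t_{1-\alpha/2,\,n-2}}{\sqrt{t_{1-\alpha/2,\,n-2}^{2} + n - 2}}

n = 92,\ \alpha = 0.05:\quad
r_{\text{crit}} = \frac{1.987}{\sqrt{1.987^{2} + 90}} \approx 0.205
```

In R, cor.test() reports the p-value for each pair of variables directly, so this threshold can be cross-checked against its output.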

You might also like