
Data analysis class 2022/2023

Exercise sheet 13

Exercise
The data set “USJudgeRatings”, which is built into R (you do not need to open
it!), gives lawyers' ratings of state judges in the US Superior Courts. The ratings are
on a scale from 0 to 10.
The 12 variables in the file are the following:
CONT Number of contacts of lawyer with judge.
INTG Judicial integrity.
DMNR Demeanor.
DILG Diligence.
CFMG Case flow managing.
DECI Prompt decisions.
PREP Preparation for trial.
FAMI Familiarity with law.
ORAL Sound oral rulings.
WRIT Sound written rulings.
PHYS Physical ability.
RTEN Worthy of retention.

1) Rename the file “USJudgeRatings” to “USJR”.
2) Attach the file to the working memory of R and ask for its structure.
3) Give the correlation matrix of the variables. From which value upwards are the
correlations statistically significant? Which variables are significantly
correlated with judicial integrity?
4) Test whether the variables “DILG” and “PREP” are normally distributed. Use the
appropriate test to check whether these two variables have significantly different means.
5) Perform a multiple regression to explain judicial integrity as a function of the
other variables. Rule out multicollinearity problems (remove, one at a time, all
variables with variance inflation factors larger than 5). How many explanatory
variables are we allowed to keep for the multiple regression model?
6) If necessary, exclude variables that do not have enough explanatory power.
What is the final model, and how much of the variance does it explain?
7) Define the subset of outliers of the final model. How many are there?
8) Define a linear regression model to explain INTG as a function of DMNR, then a
parabolic and an exponential model. Which of the models is the best one?
9) Draw a scatterplot with these three regression models in different colours.
10) Do a principal component analysis of the variables. How many components
should we keep?
11) Give an interpretation of the first two principal components. Save the scores of the
first two components and add them to the data file. Which judges have extreme values
on these two components?
12) Save the final version of the data file as a CSV file (readable in Excel) under the name “exam.csv”.
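
As a starting point, the first four tasks could be approached along the following lines. This is only a minimal sketch in base R, not a model solution. It assumes a significance level of 0.05 (the sheet does not state one) and treats DILG and PREP as paired measurements, since both ratings refer to the same judges. The helper names r, n, t_crit, r_crit and r_intg are chosen here for illustration.

```r
# Task 1: copy the built-in data set under the shorter name.
USJR <- USJudgeRatings

# Task 2: attach the data frame and show its structure.
attach(USJR)
str(USJR)

# Task 3: correlation matrix, rounded for readability.
r <- cor(USJR)
print(round(r, 2))

# Smallest |r| that is significant in a two-sided test at alpha = 0.05
# (assumed level), obtained from the t statistic with n - 2 degrees of freedom.
n <- nrow(USJR)
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)
print(r_crit)

# Variables whose correlation with INTG exceeds that threshold.
r_intg <- r["INTG", colnames(r) != "INTG"]
print(names(r_intg)[abs(r_intg) > r_crit])

# Task 4: Shapiro-Wilk normality tests for the two variables.
print(shapiro.test(DILG))
print(shapiro.test(PREP))

# Paired t-test, assuming the normality tests do not speak against it.
# Otherwise wilcox.test(DILG, PREP, paired = TRUE) is a common alternative.
print(t.test(DILG, PREP, paired = TRUE))

detach(USJR)
```

The threshold r_crit depends only on the number of observations and the chosen significance level, so it applies to every entry of the correlation matrix at once.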
