Patient Satisfaction: A hospital administrator wants to study the relationship between patient satisfaction (Y) and patient age (X1, in years), severity of illness (X2, an index), and anxiety level (X3, an index). The administrator randomly selected 46 patients and collected the data presented below, where larger values of Y, X2, and X3 are associated with more patient satisfaction, increased severity of illness, and more anxiety, respectively.
a) Prepare a histogram for each of the predictor variables. Are there any noteworthy features revealed by these plots?
This is a histogram of patient age (X1). The ages are spread fairly uniformly, except for the 20-25 age group. The number of patients aged 25-40 is similar to the number aged 40-55.
This is a histogram of the severity of illness (X2). Most patients had an illness severity between 45 and 55, and the frequencies decline sharply on both sides of that range.
This is a histogram of the frequency distribution of anxiety level (X3). The majority of patients had an anxiety level between 1.8 and 2.4, with frequencies declining above that range. The largest single group of patients had an anxiety level between 2.2 and 2.4.
b) Obtain the scatter plot matrix. Interpret these and state your principal findings
This is a scatter plot matrix of the three predictor variables. Looking at the bottom three plots, age and severity appear to have a positive linear relationship: as age increases, the severity of illness tends to increase. There also appears to be a positive linear relationship between age and anxiety level: as age increases, anxiety level tends to increase. Lastly, anxiety level and severity show a positive linear association: as anxiety level increases, so does the severity of illness.
c) Fit the regression model (6.5) for the three predictor variables to the data and state the estimated regression function. How is b2 interpreted?
Call:
lm(formula = y_i ~ x_i1 + x_i2 + x_i3)

Coefficients:
(Intercept)         x_i1         x_i2         x_i3
    158.491       -1.142       -0.442      -13.470
The estimated regression function is Y_hat = 158.491 - 1.142*X_i1 - 0.442*X_i2 - 13.470*X_i3. In other words, Y_hat = 158.491 - 1.142*(patient's age) - 0.442*(illness severity) - 13.470*(anxiety level).
We estimate that mean patient satisfaction decreases by 0.442 for each one-unit increase in the severity of illness (X2) when patient age (X1) and anxiety level (X3) are held constant.
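The interpretation of b2 can be checked numerically. The sketch below uses only the rounded coefficients reported above; the helper `y_hat` and the chosen predictor values are illustrative assumptions, not part of the original solution.

```r
# Rounded coefficient estimates reported in part (c)
b <- c(intercept = 158.491, age = -1.142, severity = -0.442, anxiety = -13.470)

# Hypothetical helper: fitted mean satisfaction for given predictor values
y_hat <- function(age, severity, anxiety) {
  sum(b * c(1, age, severity, anxiety))
}

# Raise severity by one unit while holding age and anxiety fixed
change <- y_hat(35, 46, 2.2) - y_hat(35, 45, 2.2)
print(change)  # equals b2 up to floating-point error
```

The difference does not depend on the particular age and anxiety values chosen, which is what "held constant" means in a first-order model without interaction terms.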
d) Obtain the residuals and prepare a boxplot of the residuals. Do there appear to be any outliers?
There appear to be outliers: some residuals lie near 15 and, at the other extreme, near -15. The mean of the residuals is always zero, since the residuals sum to zero.
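A boxplot flags a point as an outlier when it lies more than 1.5 times the interquartile range beyond the hinges. The following minimal sketch shows how that rule can be applied with `boxplot.stats`; the vector `resid_example` is made up for illustration and is not the residuals from the patient data.

```r
# Made-up residual-like values (NOT the patient satisfaction residuals)
resid_example <- c(-18, -6, -4, -2, -1, 0, 1, 2, 3, 5, 17)

# boxplot.stats applies the same 1.5 * IQR rule that boxplot() uses
stats <- boxplot.stats(resid_example, coef = 1.5)
flagged <- sort(stats$out)
print(flagged)
```

With the fitted model from part (c), the same call would be `boxplot.stats(model$residuals)$out`.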
e) Plot the residuals against Y_hat and against each of the predictor variables. Also prepare a normal probability plot. Interpret your plots and summarize your findings.
Plots of Residuals against each predictor variable
Normal Probability Plot
Comments: In each plot of the residuals against a predictor variable, the residuals are spread fairly evenly around the line y=0. However, each plot shows some outliers that may need to be examined further. In the normal probability plot, the residuals follow the line only moderately well, which suggests some departure from normality.
6.16- Refer to Patient Satisfaction problem 6.15. Assume that the regression model 6.5 for three predictor variables with independent normal errors is appropriate.
a) Test whether there is a regression relation: Use alpha=.05 State the alternatives, decision rule, and conclusion. What does your test imply about B1,B2, and B3? What is the p-value of the test?
Alternatives: The null hypothesis is H0: B_1=B_2=B_3=0 and the alternative hypothesis is Ha: at least one B_k is nonzero for 1<=k<=3.
Decision Rule: The test statistic is F*=MSR/MSE=((8275.4+480.9+364.2)/3)/101.2=30.04 and the critical value is F(.95;3,42), which is approximately 2.83. Reject H0 if F* exceeds the critical value.
Conclusion: Since F*=30.04 > 2.83, we have enough statistical evidence to reject the null hypothesis. Hence we conclude that at least one of B_1, B_2, and B_3 is nonzero.
The p-value of this test is P(F(3,42) > 30.04), which is less than .0001 and therefore far below our alpha value of .05.
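The test quantities can be reproduced from the ANOVA sums of squares quoted above. This is a small sketch that assumes only those reported values (the three sequential regression sums of squares and MSE = 101.2); the variable names are illustrative.

```r
# Sequential regression sums of squares and MSE as quoted from the ANOVA table
ssr <- 8275.4 + 480.9 + 364.2
mse <- 101.2
df_reg <- 3    # number of predictors
df_err <- 42   # n - p = 46 - 4

f_star  <- (ssr / df_reg) / mse                         # test statistic
f_crit  <- qf(0.95, df_reg, df_err)                     # critical value at alpha = .05
p_value <- pf(f_star, df_reg, df_err, lower.tail = FALSE)

print(c(f_star = f_star, f_crit = f_crit, p_value = p_value))
```

The same F statistic and p-value appear on the last line of `summary(model)`.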
b) Obtain joint interval estimates of B1, B2, and B3 using a 99 percent family confidence
coefficient. Interpret your results.
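Using the multiple t(.995;42), B1 is between -1.721626 and -.5623744, B2 is between -1.7696 and .885599, and B3 is between -32.62868 and 5.68868 (small round-off differences arise because rounded estimates were entered in R). With this multiple, each interval has 99% confidence on its own. For a 99 percent family confidence coefficient over all three intervals, the Bonferroni procedure uses the larger multiple t(1-.01/6;42), which gives wider intervals. Only the interval for B1 excludes zero.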
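A Bonferroni version of these intervals can be sketched from the estimates and standard errors reported in the R code for 6.16b. The numbers below are those rounded values; the variable names are assumptions made for illustration.

```r
# Rounded estimates and standard errors as reported in the solution
b   <- c(b1 = -1.142, b2 = -0.442, b3 = -13.470)
s_b <- c(0.21483, 0.492056, 7.100963)

g     <- 3      # number of intervals in the family
alpha <- 0.01   # 1 - family confidence coefficient

# Bonferroni multiple: split alpha across g two-sided intervals
B <- qt(1 - alpha / (2 * g), 42)

bonf <- cbind(lower = b - B * s_b, upper = b + B * s_b)
print(B)
print(bonf)
```

With the fitted model available, `confint(model, level = 1 - 0.01/3)` should give the same kind of Bonferroni limits without retyping rounded values.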
Problem 6.17
Refer to patient satisfaction problem 6.15. Assume that regression model 6.5 for three predictor variables with independent normal error terms is appropriate.
a) Obtain an interval estimate of the mean satisfaction when X_h1=35, X_h2=45, and X_h3=2.2. Use a 90% confidence coefficient. Interpret your confidence interval.
We are 90 percent confident that when X_h1=35, X_h2=45, and X_h3=2.2, the mean satisfaction is between 64.53663 and 73.45737 (the limits are approximate because rounded values were used in the matrix computation).
b) Obtain a prediction interval for a new patient's satisfaction when X_h1=35, X_h2=45, and X_h3=2.2. Use a 90% confidence coefficient. Interpret your confidence interval.
With confidence coefficient .90, we predict that a new patient with X_h1=35, X_h2=45, and X_h3=2.2 will have a satisfaction level between 51.50965 and 86.51092.
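The prediction interval is much wider than the interval for the mean response because the prediction variance adds the error variance: s^2{pred} = MSE + s^2{Y_hat_h}. The sketch below illustrates this relationship on simulated data; the data frame `sim`, its column names, and the coefficients used to generate it are all hypothetical and are not the patient satisfaction data.

```r
# Simulated data for illustration only (NOT the CH06PR15 data)
set.seed(1)
n <- 46
sim <- data.frame(x1 = runif(n, 20, 55),
                  x2 = runif(n, 40, 60),
                  x3 = runif(n, 1.8, 2.9))
sim$y <- 150 - 1.1 * sim$x1 - 0.4 * sim$x2 - 13 * sim$x3 + rnorm(n, sd = 10)

fit   <- lm(y ~ x1 + x2 + x3, data = sim)
new_x <- data.frame(x1 = 35, x2 = 45, x3 = 2.2)

# 90% interval for the mean response and for a new observation
conf_int <- predict(fit, new_x, interval = "confidence", level = 0.90)
pred_int <- predict(fit, new_x, interval = "prediction", level = 0.90)

# Rebuild the prediction half-width by hand: s^2{pred} = MSE + s^2{Y_hat_h}
mse    <- summary(fit)$sigma^2
se_fit <- predict(fit, new_x, se.fit = TRUE)$se.fit
half   <- qt(0.95, fit$df.residual) * sqrt(mse + se_fit^2)

print(conf_int)
print(pred_int)
```

The hand-built half-width `half` matches `pred_int[, "upr"] - pred_int[, "fit"]`, which confirms that `predict` applies the same formula.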
R Code
#Problem 6.15
data=read.table("C:/Users/Hellangel31/Desktop/CH06PR15.txt")
names(data)=c("Patient Satisfaction","Age","Severity","Anxiety")
data
data$"Patient Satisfaction"
#Part A
hist(data$"Age",xlab="Patient Age",main="Histogram of Patient Age")
hist(data$"Severity", xlab="Severity of Illness", main="Histogram of Illness Severity")
hist(data$"Anxiety",xlab="Anxiety Level",main="Histogram of Anxiety Levels")
#Part B
pairs(data[c("Age","Severity","Anxiety")])
#Part C
y_i=data$"Patient Satisfaction"
x_i1=data$"Age"
x_i2=data$"Severity"
x_i3=data$"Anxiety"
model=lm(y_i~x_i1+x_i2+x_i3)
model
#Part D
model$residuals
boxplot(model$residuals,main="Residuals")
#Part E
#Plot of residuals against Y hat
plot(model$fitted.values,model$residuals,xlab="Fitted Values",ylab="Residuals")
#Plot of residuals against each predictor variable
plot(x_i1,model$residuals,xlab="Age of Patient",ylab="Residuals")
plot(x_i2,model$residuals,xlab="Severity of Illness",ylab="Residuals")
plot(x_i3,model$residuals,xlab="Anxiety Level",ylab="Residuals")
abline(h=0)
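#QQ Plot
qqnorm(model$residuals)
qqline(model$residuals)
#ANOVA table (sums of squares and MSE used in 6.16)
anova(model)
#Problem 6.16b
x=cbind(rep(1,46),data$"Age", data$"Severity", data$"Anxiety")
#Estimated variance-covariance matrix of b, using MSE=101.2
var_b= 101.2*solve(t(x) %*% x)
var_b
#Standard errors are the square roots of the diagonal elements
s_b=sqrt(diag(var_b))
#s_b1=0.21483, s_b2=0.492056, s_b3=7.100963
#Confidence interval for B1
lower1= -1.142-qt(.995,42)*0.21483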
upper1=-1.142+qt(.995,42)*0.21483
#Confidence interval for B2
lower2=-.442-qt(.995,42)*0.492056
upper2=-.442+qt(.995,42)*0.492056
#Confidence Interval for B3
lower3=-13.470-qt(.995,42)*7.100963
upper3=-13.470+qt(.995,42)*7.100963
#Problem 6.16C
anova(model)
summary(model)
R=sqrt(9120.5/13369.3)
#problem 6.17A
x_h=cbind(c(1,35,45,2.2))
t(x_h)
b=cbind(c(158.491,-1.142,-.442,-13.470))
y_h=t(x_h) %*% b
#Estimated standard deviation of Y_hat_h, using MSE=101.2 as in 6.16b
s_y_h=sqrt(101.2*t(x_h) %*% solve(t(x) %*% x) %*%x_h)
#Lower Limit
y_h-qt(1-.10/2,42)*s_y_h
#Upper Limit
y_h+qt(1-.10/2,42)*s_y_h
# Problem 6.17b
predict(model,newdata=data.frame(x_i1=35,x_i2=45,x_i3=2.2),interval="prediction",level=.90)