Uploaded by Shruti Pandey

Assignment for Data Analytics using R

1. Develop a suitable simple linear regression model to check if there is any relationship between “Total Cost to Hospital” and “AGE”. For the fitted model, interpret the regression coefficient corresponding to “AGE”.

> library("ISLR", lib.loc="~/R/win-library/3.3")

> d<-read.csv('E:/KOZHI official/4. Term 4/DA-R/Assignment 2 mission hospital/Mission_2.csv',header=T)

> names(d)

 [1] "SL."                  "AGE"                  "GENDER"
 [4] "MALE"                 "MARITAL.STATUS"       "UNMARRIED"
 [7] "KEY.COMPLAINTS..CODE" "ACHD"                 "CAD.DVD"
[10] "CAD.SVD"              "CAD.TVD"              "CAD.VSD"
 .
 .
 .

> attach(d)

> mod_1<-lm(TOTAL.COST.TO.HOSPITAL~AGE)

> summary(mod_1)

Call:
lm(formula = TOTAL.COST.TO.HOSPITAL ~ AGE)

Residuals:
    Min      1Q  Median      3Q     Max
-232683  -61888  -19440   28238  600773

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 141216.6    10610.7  13.309  < 2e-16 ***
AGE           1991.2      273.8   7.273 4.67e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 111400 on 246 degrees of freedom
Multiple R-squared:  0.177,  Adjusted R-squared:  0.1736
F-statistic:  52.9 on 1 and 246 DF,  p-value: 4.672e-12

> plot(mod_1,which=c(1,2))

The graphs above show that the assumptions of normality and homoscedasticity are not met. In the residuals vs fitted graph we can see a pattern: the residuals are clustered at lower fitted values and spread far apart at higher fitted values. This shows that the variance is not constant; it depends on the fitted values. Similarly, the Normal Q-Q plot shows that the values deviate from the normal line. Hence the underlying assumptions for a linear relationship are not satisfied, so we try the log-linear model.

Log model

> mod_2<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE)
> summary(mod_2)

Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ AGE)

Residuals:
     Min       1Q   Median       3Q      Max
-1.51748 -0.24402 -0.00536  0.25388  1.39912

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.814724   0.043326 272.693  < 2e-16 ***
AGE          0.008565   0.001118   7.662 4.21e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.455 on 246 degrees of freedom
Multiple R-squared:  0.1927,  Adjusted R-squared:  0.1894
F-statistic:  58.7 on 1 and 246 DF,  p-value: 4.212e-13

> plot(mod_2,which=c(1,2))

In the residuals vs fitted graph the residuals now show random scatter, and the pattern visible in the previous graph is gone. The Normal Q-Q plot also shows a better fit to normality than the previous plot. The coefficient of AGE (beta 1 = 0.008565) shows that a one-year increase in age multiplies the total cost to hospital by a factor of exp(0.008565) = 1.0086, i.e. an increase of about 0.86%.
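The back-transformation of the log-model slope can be checked directly. The sketch below uses only the AGE coefficient reported in the summary above; the variable names are illustrative and not part of the original session.

```r
# Reported AGE coefficient from summary(mod_2); the response is log(cost).
beta_age <- 0.008565

# Multiplicative effect on cost of one additional year of age.
factor_per_year <- exp(beta_age)

# The same effect expressed as a percentage change.
pct_per_year <- (factor_per_year - 1) * 100

# Effects compound multiplicatively, so ten years is exp(10 * beta).
factor_per_decade <- exp(10 * beta_age)

print(round(c(per_year = factor_per_year,
              pct_per_year = pct_per_year,
              per_decade = factor_per_decade), 4))
```

Because the response is on the log scale, the coefficient describes a relative change in cost rather than a fixed rupee amount.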

2. Based on the fitted model in (1), suppose a patient’s age is 50 years. At the time of admission, what will be the minimum cost of treatment for this patient at 95% confidence level?

> p<-predict(mod_2,data.frame(AGE=50),interval="prediction")
> p
       fit      lwr      upr
1 12.24298 11.34373 13.14223
> exp(p[2])
[1] 84434.41

Taking the lower prediction limit back to the original scale, the minimum cost is about INR 84,434.

3. Suppose Mission Hospital is planning to introduce a package price for the treatment and has decided to charge INR 250,000 for patients of age 50 years. What is the probability that the treatment cost will exceed the package price? Do you think that the Mission Hospital should revise the package price?

The residual standard error acts as a proxy for sigma.
Residual standard error: 0.455 on 246 degrees of freedom

> 1-pnorm((log(250000)-p[1])/.455)
[1] 0.3411574

The probability that the treatment cost exceeds the package price is about 34%. The hospital should not revise the package price, as it is greater than the mean predicted cost for this age.

4. Build a simple linear regression model between “Total Cost to Hospital” and “GENDER”. Interpret the results.

> mod_3<-lm(TOTAL.COST.TO.HOSPITAL~GENDER)
> plot(mod_3,which=c(1,2))
> mod_4<-lm(log(TOTAL.COST.TO.HOSPITAL)~GENDER)
> plot(mod_4,which=c(1,2))
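The numbers in questions 2 and 3 above can be reproduced from the reported prediction alone. This is a minimal sketch under two assumptions: the log cost is treated as normal around the fitted value, and the residual standard error is plugged in for sigma, ignoring the uncertainty in the fit itself. The `price_10pct` line is a hypothetical extension, not something the assignment computes. Note also that the lower limit of a two-sided 95% interval is really a one-sided 97.5% bound.

```r
# Values reported above for a 50-year-old patient (log scale).
fit_log <- 12.24298   # predicted log cost
lwr_log <- 11.34373   # lower limit of the two-sided 95% prediction interval
sigma   <- 0.455      # residual standard error, used as an estimate of sigma

# Question 2: lower prediction limit on the rupee scale.
min_cost <- exp(lwr_log)

# Question 3: probability that cost exceeds the package price.
package_price <- 250000
p_exceed <- 1 - pnorm((log(package_price) - fit_log) / sigma)

# Hypothetical extension: the price that would be exceeded only 10% of the time.
price_10pct <- exp(fit_log + qnorm(0.90) * sigma)

print(round(c(min_cost = min_cost, p_exceed = p_exceed,
              price_10pct = price_10pct), 4))
```

If a one-sided 95% minimum is wanted instead, `predict()` can be called with `level = 0.90` so that 5% of the probability lies below the lower limit.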

> summary(mod_4)

Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ GENDER)

Residuals:
     Min       1Q   Median       3Q      Max
-1.31142 -0.28273 -0.08258  0.26109  1.57082

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.93436    0.05503 216.865  < 2e-16 ***
GENDERM      0.19082    0.06726   2.837  0.00493 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4983 on 246 degrees of freedom
Multiple R-squared:  0.03168,  Adjusted R-squared:  0.02774
F-statistic: 8.048 on 1 and 246 DF,  p-value: 0.004934

> contrasts(GENDER)
  M
F 0
M 1
> exp(0.19082)
[1] 1.210242

Gender, being a qualitative variable, enters the regression model as a dummy variable. The dummy variable formed is GENDERM. The contrasts command shows that it is coded as 1 for male and 0 for female. The model shows that for males the total cost to hospital is higher by a factor of 1.210242 (about 21%), and the p-value shows the effect to be significant.

5. Build a simple linear regression model between “Total Cost to Hospital” and “MARITAL STATUS”. Interpret the results.
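The dummy coding used for GENDER in question 4 can be illustrated on a toy data frame. Everything in `toy` is made up for illustration and does not come from the hospital data. The sketch shows that, with one dummy predictor and a log response, the exponentiated coefficient equals the ratio of the two groups' geometric mean costs.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  GENDER = factor(c("F", "M", "M", "F", "M", "F")),
  COST   = c(100000, 150000, 130000, 90000, 160000, 110000)
)

# Default treatment contrasts: first level (F) is the baseline, M is coded 1.
print(contrasts(toy$GENDER))

# Log-linear model with the single dummy predictor GENDERM.
fit <- lm(log(COST) ~ GENDER, data = toy)
ratio <- exp(coef(fit)[["GENDERM"]])

# Geometric mean cost per group, for comparison with the model ratio.
gm <- tapply(toy$COST, toy$GENDER, function(x) exp(mean(log(x))))

print(c(model_ratio = ratio, geometric_mean_ratio = gm[["M"]] / gm[["F"]]))
```

Passing `data = toy` to `lm()` avoids relying on `attach()`, which can silently pick up the wrong object when names collide.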

> contrasts(MARITAL.STATUS)
          UNMARRIED
MARRIED           0
UNMARRIED         1
> summary(mod_6)

Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ MARITAL.STATUS)

Residuals:
    Min      1Q  Median      3Q     Max
-1.3608 -0.2360 -0.0334  0.2396  1.4042

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             12.29182    0.04466 275.229   <2e-16 ***
MARITAL.STATUSUNMARRIED -0.40697    0.05944  -6.847    6e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4641 on 246 degrees of freedom
Multiple R-squared:  0.1601,  Adjusted R-squared:  0.1566
F-statistic: 46.88 on 1 and 246 DF,  p-value: 5.998e-11

> exp(-0.40697)
[1] 0.6656642

Marital status, being a qualitative variable, enters the regression model as a dummy variable. The dummy variable formed is MARITAL.STATUSUNMARRIED. The contrasts command shows that it is coded as 1 for unmarried and 0 for married. The model shows that for unmarried people the total cost to hospital is lower: it is multiplied by a factor of 0.6656642 (about 33% lower), and the p-value shows that the effect is significant.

6. Build a multiple linear regression model with “Total Cost to Hospital” as dependent variable, and “AGE”, “GENDER” and “MARITAL STATUS” as predictors. Compare the results with that of (4) and (5).

> mod_11<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE+GENDER+MARITAL.STATUS)
> summary(mod_11)

Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ AGE + GENDER + MARITAL.STATUS)

Residuals:
    Min      1Q  Median      3Q     Max
-1.5285 -0.2603 -0.0104  0.2470  1.3529

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             11.790187   0.151136  78.011  < 2e-16 ***
AGE                      0.007637   0.002555   2.989  0.00308 **
GENDERM                  0.104211   0.062490   1.668  0.09667 .
MARITAL.STATUSUNMARRIED -0.032630   0.132570  -0.246  0.80578
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4543 on 244 degrees of freedom
Multiple R-squared:  0.2019,  Adjusted R-squared:  0.1921
F-statistic: 20.58 on 3 and 244 DF,  p-value: 6.394e-12

Only AGE is significant at the 5% level. Gender and marital status are insignificant as seen by their p-values, although in (4) and (5) these variables came out as significant. This shows that, considered independently, gender and marital status appear to have a significant impact on the total cost to hospital; in the combined model, however, their effect is not significant.
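One possible reason a predictor is significant alone but not in the combined model is that it is correlated with another predictor. The standard error of the marital-status coefficient rises from 0.059 to 0.133 once AGE is included, which is consistent with that reading. The simulation below is a sketch under assumed values, not an analysis of the hospital data: marital status is made to depend only on age and has no direct effect on cost, yet it looks strongly related to cost when fitted on its own.

```r
set.seed(42)

# Simulated data (all values are assumptions chosen for illustration).
n <- 2000
AGE <- round(runif(n, 1, 80))
UNMARRIED <- as.integer(AGE < 22)                      # status driven by age only
log_cost <- 11.8 + 0.0086 * AGE + rnorm(n, sd = 0.45)  # no direct marital effect
sim <- data.frame(AGE, UNMARRIED, log_cost)

# Marital status alone picks up the age effect it is correlated with.
alone <- lm(log_cost ~ UNMARRIED, data = sim)

# Once AGE is in the model, the marital coefficient shrinks towards zero
# and its standard error grows because the two predictors overlap.
joint <- lm(log_cost ~ AGE + UNMARRIED, data = sim)

print(round(summary(alone)$coefficients, 4))
print(round(summary(joint)$coefficients, 4))
```

Comparing the two coefficient tables shows the same qualitative pattern as questions 5 and 6: a large, precisely estimated effect alone, and a small, imprecise one after adjusting for age.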

7. Build a multiple linear regression model with appropriate set of predictors. Identify the statistically significant predictors that the Mission Hospital can use in predicting “Total Cost to Hospital”. Comment on the performance of the fitted model.

> mod_9<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE+MALE+UNMARRIED+ACHD+CAD.DVD+CAD.SVD+
    CAD.TVD+CAD.VSD+OS.ASD+other.general+other.heart+other.nervous+other.respiratory+
    other.tertalogy+PM.VSD+RHD+BODY.WEIGHT+BODY.HEIGHT+HR.PULSE+BP.HIGH+BP.LOW+RR+
    Diabetes1+Diabetes2+hypertension1+hypertension2+hypertension3+other+HB+UREA+
    CREATININE+AMBULANCE+TRANSFERRED+ELECTIVE)
> summary(mod_9)

Residuals:
     Min       1Q   Median       3Q      Max
-0.96533 -0.18093 -0.01659  0.19462  1.19165

[Coefficient table: intercept and 34 predictors]

Residual standard error: 0.3965 on 156 degrees of freedom
  (57 observations deleted due to missingness)
Multiple R-squared:  0.5307,  Adjusted R-squared:  0.4285
F-statistic:  5.19 on 34 and 156 DF,  p-value: 5.174e-13

The predictors significant at the 5% level are AGE, CAD.DVD, CAD.TVD, other.general, other.heart, other.tertalogy, RHD, HR.PULSE and CREATININE. The model is refitted with only these predictors.

> mod_10<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE+CAD.DVD+CAD.TVD+other.general+
    other.heart+other.tertalogy+RHD+HR.PULSE+CREATININE)
> summary(mod_10)

Residuals:
     Min       1Q   Median       3Q      Max
-1.06605 -0.20151 -0.02119  0.19485  1.26342

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     10.974447   0.176893  62.040  < 2e-16 ***
AGE              0.006630   0.001672   3.965 0.000101 ***
CAD.DVD          0.401122   0.105391   3.806 0.000186 ***
CAD.TVD          0.388842   0.109755   3.543 0.000490 ***
other.general   -1.544496   0.419724  -3.680 0.000298 ***
other.heart      0.221803   0.074259   2.987 0.003162 **
other.tertalogy  0.288918   0.114124   2.532 0.012103 *
RHD              0.490360   0.100450   4.882 2.11e-06 ***
HR.PULSE         0.005739   0.001594   3.600 0.000399 ***
CREATININE       0.223745   0.064466   3.471 0.000633 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4061 on 205 degrees of freedom
  (33 observations deleted due to missingness)
Multiple R-squared:  0.4232,  Adjusted R-squared:  0.3979
F-statistic: 16.71 on 9 and 205 DF,  p-value: < 2.2e-16

The fitted model with all the significant predictors has a multiple R-squared of 42.32% and an adjusted R-squared of 0.3979, showing that the model explains 42.32% of the variation in the log of total cost to hospital.
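The adjusted R-squared values quoted for the two multiple regression models follow from the multiple R-squared, the sample size and the number of predictors. In the sketch below the sample sizes are an assumption derived from the reported residual degrees of freedom (n = residual df + predictors + 1), and `adj_r2` is an illustrative helper rather than a function from the original session.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Full model: 34 predictors, 156 residual df, so n = 156 + 34 + 1 = 191.
full <- adj_r2(0.5307, n = 191, p = 34)

# Reduced model: 9 predictors, 205 residual df, so n = 205 + 9 + 1 = 215.
reduced <- adj_r2(0.4232, n = 215, p = 9)

print(round(c(full = full, reduced = reduced), 4))
```

The two models are fitted on different numbers of complete observations (191 versus 215), so their R-squared values are not strictly comparable.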
