1 Introduction
2 Abbreviations
3 Identification of dataset
4 Description of the dataset
5 Set “Objective”
6 The Hypothesis being tested
7 Multiple regression modelling to identify prediction expression
8 Scatterplot to analyse correlation among variables
9 Highlights/ conclusions
DDMB project report submission against dataset “Salary of university professor” by Group-5
1 Introduction
This project report summarises the analysis our group carried out on the selected dataset. The professor
initially provided a set of candidate datasets; we selected one of them and analysed it using multiple
regression modelling.
2 Abbreviations
Abbreviation Description
DDMB Data-driven decision making for business
VIF Variance inflation factor
p-value Probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true
3 Identification of dataset
We identified the dataset “Salary of the University Professors” based on the following factors-
5 Set “Objective”
To identify the predictors that influence a professor's salary (the response variable) at a university, using
correlation and multiple regression modelling, and to examine the relationships among the variables.
Analysis: Because the dataset contains categorical variables alongside numerical ones, we ran “Fit model”
and used the “Indicator function parameterization” output rather than “parameter estimates”. We then
compared each p-value with the level of significance (0.05): we failed to reject the null hypothesis where the
p-value was greater than 0.05 and rejected it where the p-value was lower than 0.05.
Outcomes:
Based on its p-value, we found no statistically significant relation between gender and salary, so this
variable is excluded from further analysis.
All other variables are retained as predictors based on their p-values.
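The indicator parameterization mentioned in the analysis can be illustrated with a minimal sketch: each categorical variable is turned into 0/1 columns, one per level except a baseline level. The baseline choices here are assumptions inferred from the prediction expression (no term for Prof, and only “Discipline A” named); the label "B" for the other discipline and the helper name `indicator_code` are hypothetical.

```python
# Baseline (reference) levels come first in each tuple. They are assumptions
# inferred from the report's prediction expression, not stated in the report.
RANK_LEVELS = ("Prof", "Assoc. Prof", "Asst. Prof")
DISCIPLINE_LEVELS = ("B", "A")  # "B" is an assumed name for the non-A level


def indicator_code(value, levels):
    """Return 0/1 indicators for every level except the first (baseline)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels[1:]}


rank_coded = indicator_code("Assoc. Prof", RANK_LEVELS)
discipline_coded = indicator_code("B", DISCIPLINE_LEVELS)
print(rank_coded)
print(discipline_coded)
```

With this coding, a baseline observation has all indicators equal to 0, so its effect is absorbed into the intercept of the fitted model.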
Considering both types of variables, we arrived at the following prediction expression:
Salary = 129661.85 - (32456.15 × Assoc. Prof) - (45287.69 × Asst. Prof) - (14505.15 × Discipline A) +
(534.63 × Years since Ph.D.) - (476.72 × Years in service)
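As a worked illustration, the prediction expression above can be evaluated with a small function. This is only a sketch: the coefficients are copied from the report, while the function name `predict_salary`, the label "B" for the non-A discipline, and the example inputs (10 years since Ph.D., 8 years in service) are hypothetical.

```python
def predict_salary(rank, discipline, years_since_phd, years_in_service):
    """Evaluate the report's prediction expression for one professor."""
    # Prof and the non-A discipline have no term in the expression, so they
    # are treated as baselines contributing 0 (an assumption).
    rank_effect = {"Prof": 0.0, "Assoc. Prof": -32456.15, "Asst. Prof": -45287.69}
    discipline_effect = {"A": -14505.15, "B": 0.0}  # "B" is an assumed label
    return (
        129661.85
        + rank_effect[rank]
        + discipline_effect[discipline]
        + 534.63 * years_since_phd
        - 476.72 * years_in_service
    )


# Hypothetical professor, not taken from the dataset.
example = predict_salary("Assoc. Prof", "A", years_since_phd=10, years_in_service=8)
print(round(example, 2))
```

For this hypothetical associate professor in Discipline A, the expression gives a predicted salary of roughly 84,233.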
Option-1 (predicted values when both variables considered), prediction expression:
Salary = 129661.85 - (32456.15 × Assoc. Prof) - (45287.69 × Asst. Prof) - (14505.15 × Discipline A) +
(534.63 × Years since Ph.D.) - (476.72 × Years in service)
Option-2 (predicted values when both variables rejected), prediction expression:
Salary = 133549.12 - (34082.3 × Assoc. Prof) - (47843.84 × Asst. Prof) - (13760.96 × Discipline A)

Statistic      Actual dataset    Predicted (Option-1)    Predicted (Option-2)
Avg. salary    113706.5          113706.5                113706.5
Std. Dev.      30289.04          20423.81                20204.87
Min. salary    57800             66763.55                71944.33
Max. salary    231545            142676.8                133549.1
R square       -                 45.25%                  44.49%
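As per the table above, option-1 (considering both variables, Years since Ph.D. and Years in service) is a
better approach than option-2 (rejecting both variables): its R square is slightly higher and its predicted
values lie closer to the actual dataset.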
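As a rough consistency check on the table, for a least-squares fit with an intercept the R square equals the variance of the predicted values divided by the variance of the actual values. The sketch below applies that identity to the standard deviations reported in the table; the function name `r_squared_from_std` is hypothetical, and the check assumes both models were fitted by ordinary least squares with an intercept.

```python
def r_squared_from_std(std_predicted, std_actual):
    """R square as the ratio of predicted to actual variance (OLS with intercept)."""
    return (std_predicted / std_actual) ** 2


STD_ACTUAL = 30289.04  # Std. Dev. of the actual salaries, from the table

r2_option_1 = r_squared_from_std(20423.81, STD_ACTUAL)
r2_option_2 = r_squared_from_std(20204.87, STD_ACTUAL)

print(f"Option-1: {r2_option_1:.4f}")
print(f"Option-2: {r2_option_2:.4f}")
```

Both ratios come out at roughly 0.45 and 0.44, in line with the reported R square values and with the same ordering. Any small difference from the reported figures may come from rounding in the table or from how the software reports R square; the report does not say which.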
9 Highlights/ conclusions
1. Four variables affect the response (salary): two are numerical and two are categorical.
2. Two variables, Years since Ph.D. and Years in service, are closely correlated.
3. The VIF value for both of these variables is more than 5 but less than 10. This indicates
multicollinearity, but we still kept these variables in the prediction expression because the predictions
are closer to the original dataset and the R-square value is higher.
4. R-square value: for the linear function, the R-square value is 0.4525, or 45.25%.
Rank: Salary increases with rank; it is highest for Prof and lowest for Asst. Prof.
Years since Ph.D.: Salary increases slightly with the number of years since the Ph.D.
Years in service: Salary decreases slightly as years in service increase; this needs further
detailed analysis and exploration.
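The VIF range reported in point 3 can be made more concrete using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the other predictors. The sketch below converts between the two; the function names are hypothetical and the report's own VIF values are not reproduced here, only the bounds 5 and 10.

```python
def vif_from_r2(r2_aux):
    """VIF for a predictor whose auxiliary regression has R square r2_aux."""
    if not 0.0 <= r2_aux < 1.0:
        raise ValueError("auxiliary R square must be in [0, 1)")
    return 1.0 / (1.0 - r2_aux)


def r2_from_vif(vif):
    """Auxiliary R square implied by a given VIF (inverse of vif_from_r2)."""
    if vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# Bounds quoted in the report: VIF more than 5 but less than 10.
for vif in (5.0, 10.0):
    print(f"VIF {vif:>4}: auxiliary R square = {r2_from_vif(vif):.2f}")
```

A VIF between 5 and 10 therefore corresponds to between 80% and 90% of a predictor's variance being explained by the other predictors. That degree of overlap between Years since Ph.D. and Years in service may be one reason the Years in service coefficient is negative, although this would need to be confirmed by further analysis.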