EMPLOYEE COMPENSATION
GROUP 4
ARVIND M | H21010
ASHRAY SHARMA | H21012
SRAVYA RANGAVAJJULA | H21025
NIHARIKA PRADHAN | H21032
PINAKI MISHRA | H21034
ACKNOWLEDGEMENT
We want to express our gratitude to Prof. P. C. Padhan for giving us the chance to conduct this
research and gain first-hand knowledge of how econometrics is applied in the specialized field
of human resource management. We gained fresh insights into the field by quantitatively analyzing
human resource management and human behaviour, and our research and experimentation with the
same led to extensive learning.
Table of Contents
ACKNOWLEDGEMENT ............................................................................................................................................ 2
TOPIC ...................................................................................................................................................................... 4
PART – A ................................................................................................................................................................. 5
A. VARIOUS REGRESSION MODELS ..................................................................................................................... 5
Straight Line Regression Model: A = β1 + β2*D.............................................................................................. 5
Reciprocal Regression Model: A = β1 + β2/D ................................................................................................. 6
Exponential Regression Model: log(A) = β1 + β2*log(D) ................................................................................ 7
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2) ................................................................................ 8
B. COMPARISON OF THE MODELS ...................................................................................................................... 9
C. HYPOTHESIS TESTING FOR THE BEST MODEL ............................................................................................... 10
1. Sign of the coefficient ............................................................................................................................... 10
2. T-test ......................................................................................................................................................... 10
3. F-Test........................................................................................................................................................ 10
4. Chi-Square Test ......................................................................................................................................... 11
PART - B ................................................................................................................................................................ 11
MULTIPLE LINEAR REGRESSION MODEL ........................................................................................................... 11
1. T-test ......................................................................................................................................................... 12
2. F-test ......................................................................................................................................................... 13
3. Normality .................................................................................................................................................. 13
4. Multicollinearity ........................................................................................................................................ 14
5. Heteroscedasticity .................................................................................................................................... 14
6. Autocorrelation ......................................................................................................................................... 16
7. Chow Test.................................................................................................................................................. 16
NOTE : ........................................................................................................................................................... 17
Owing to the size of the data set, Chi Square was not possible. Hence, it has been excluded above. ........ 17
TOPIC
The research aims to study the relationship between various factors and the monthly compensation of
employees, especially the cash component of the salary.
We chose this topic because, since the onset of the pandemic, many companies, both big and small,
have changed their compensation structures amid large-scale layoffs and mass firings. Now that
things are improving, firms are analyzing past data to work out what they should pay their
employees in order to retain them and avoid mass attrition. This will help firms formulate
better compensation policies and benefit programs for employees based on their demographics
and personal details.
• Older employees tend to earn much more than younger employees, but this can mostly be
attributed to the experience they gain and how far they have climbed on the corporate ladder in
the time they have been working.
• Distance from home does not have a significant impact on monthly income. Prior research has
established that the distance between residence and workplace has a strong effect on employee
absenteeism and attrition rates, but not on the compensation structure.
• The number of companies an employee has worked for before the current organization also has
no significant impact on the employee’s salary. Some people choose to stay with a firm for the
long term and grow internally, while others might see switching jobs as a faster and easier
way to rise in designation and compensation.
• Percent salary hike refers to how much the salary has increased since the last promotion.
There is a positive correlation between the salary hike and the monthly compensation.
• The total number of years of experience a person has positively impacts the employee’s
compensation, when other variables are held constant.
PART – A
A. VARIOUS REGRESSION MODELS
Model 1:
Straight Line Regression Model: A = β1 + β2*D
Interpretation:
Covariance = -5028.61
3. R2 = 0.2479
R2 of the model is 0.2479, which means that 24.79% of the observed variation can be explained
by the model's inputs.
4. AIC = 28621.58
5. CV=0.6278
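The straight-line estimates above come from ordinary least squares. As a sketch of the mechanics (on illustrative numbers, not the study's data set), the slope is the ratio of the sample covariance of A and D to the variance of D, and R² is one minus the residual sum of squares over the total sum of squares:

```python
# OLS for A = b1 + b2*D on illustrative data (not the study's data set).
from statistics import mean

D = [25, 30, 35, 40, 45, 50]                # stand-in for Age
A = [3000, 4200, 4600, 6100, 6500, 8000]    # stand-in for Monthly Income

d_bar, a_bar = mean(D), mean(A)
sxx = sum((d - d_bar) ** 2 for d in D)
sxy = sum((d - d_bar) * (a - a_bar) for d, a in zip(D, A))
b2 = sxy / sxx                  # slope: cov(D, A) / var(D)
b1 = a_bar - b2 * d_bar         # intercept

rss = sum((a - (b1 + b2 * d)) ** 2 for d, a in zip(D, A))
tss = sum((a - a_bar) ** 2 for a in A)
r2 = 1 - rss / tss              # share of variation in A explained by D
print(round(b2, 2), round(r2, 4))
```

The same decomposition underlies the R² values reported for each of the four models.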
Model 2:
Reciprocal Regression Model: A = β1 + β2/D
Interpretation:
Covariance = -7846223687
3. R2 = 0.2236
R2 of the model is 0.2236, which means that 22.36% of the observed variation can be explained
by the model's inputs.
4. AIC=28668.22
5. CV=0.6379
Model 3:
Exponential Regression Model: log(A) = β1 + β2*log(D)
Interpretation:
Covariance = 0.1322619
3. R2 = 0.2544
R2 of the model is 0.2544, which means that 25.44% of the observed variation can be explained
by the model's inputs.
4. AIC=2543.27
5. CV=8.82277e-05
Model 4:
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2)
Interpretation:
• Because the model is quadratic in D, the effect of a one-unit increase in D on the Monthly
Compensation is not constant: it equals β2 + 2*β3*D, so it changes with the level of D.
• β2 captures the linear component of the relationship, while β3 captures its curvature, so
their individual impacts on the Monthly Compensation are not comparable one-for-one.
• β1 is the intercept, the predicted Monthly Compensation when D = 0, and is interpreted
differently from the slope coefficients.
3. R2 = 0.2481
R2 of the model is 0.2481, which means that 24.81% of the observed variation can be explained
by the model's inputs.
4. AIC = 28623.13
5. CV=0.627776
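Because the parabolic model is quadratic in D, the marginal effect of D on A is β2 + 2*β3*D rather than a constant. A small sketch with hypothetical coefficient values (the fitted estimates are not reported above):

```python
# Marginal effect dA/dD = b2 + 2*b3*D for the quadratic model.
# Coefficient values below are hypothetical, for illustration only.
b2, b3 = 150.0, 2.0

def marginal_effect(d):
    """Change in A for a one-unit increase in D, evaluated at D = d."""
    return b2 + 2 * b3 * d

print(marginal_effect(25))  # slope at D = 25 -> 250.0
print(marginal_effect(50))  # slope at D = 50 -> 350.0
```

The changing slope is why the impact of D cannot be read off a single coefficient in this specification.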
B. COMPARISON OF THE MODELS
1. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a
model fits the data it was generated from.
2. The coefficient of variation (CV) is a relative measure of variability that indicates the size
of a standard deviation in relation to its mean.
Note: Since our data is cross-sectional, we are not using the AIC values to choose the model.
We will choose the model based on the CV value, because the model with the lowest CV value
is the best model.
S. No  Model                           CV Value
1      Straight Line Regression Model  0.6278
2      Reciprocal Regression Model     0.6379
3      Exponential Regression Model    8.82277e-05
4      Parabolic Regression Model      0.627776
Exponential Regression Model is the best because it has the lowest CV value.
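For reference, both selection criteria can be computed from a model's residual sum of squares. The sketch below assumes CV is the residual standard error divided by the mean of the dependent variable, and uses the common AIC form n·ln(RSS/n) + 2k (which differs from some software's AIC by an additive constant); all input numbers are hypothetical:

```python
# AIC and CV from a model's residual sum of squares (hypothetical numbers).
# Assumed definitions: AIC = n*ln(RSS/n) + 2k (a common form, offset by a
# constant from R's AIC); CV = residual standard error / mean(y).
import math

def aic(rss, n, k):
    return n * math.log(rss / n) + 2 * k

def cv(rss, n, k, y_mean):
    se = math.sqrt(rss / (n - k))   # residual standard error
    return se / y_mean

a = aic(rss=1.0e10, n=1470, k=2)
c = cv(rss=1.0e10, n=1470, k=2, y_mean=6500.0)
print(round(a, 1), round(c, 4))
```

Because AIC depends on the scale of the dependent variable, models with a transformed dependent variable (like the log-log model) are not directly comparable on it, which motivates the CV-based choice above.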
C. HYPOTHESIS TESTING FOR THE BEST MODEL
1. Sign of the coefficient
The sign of the coefficient is positive, which indicates that as Age increases by 1 unit, the Monthly
Compensation also increases by a minuscule value. This matches our theory.
2. T-test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
The p value corresponding to the independent variable's t statistic is 0.000, which means that the
coefficient is statistically significant. Hence the null hypothesis is rejected.
3. F-Test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
Since the p value corresponding to the F statistic is 0.000, we can reject the null hypothesis.
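In a two-parameter model the F statistic for H0: β2 = 0 can be recovered from R² alone, and it equals the square of the slope's t statistic, so the t-test and F-test necessarily agree here. A sketch using the R² of the chosen model (n is assumed to be 1470, the sample size reported in Part B):

```python
# F-test for H0: b2 = 0, computed from R^2.
# F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)); with k = 2 this equals t^2.
r2 = 0.2544     # R^2 of the log-log model, from the text
n, k = 1470, 2  # n assumed from the Part B output; k = intercept + slope

f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(round(f_stat, 1))
```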
4. Chi-Square Test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
The Chi-squared test can be used to measure the goodness-of-fit of a trained regression model on
the training, validation, or test data sets.
The result shows that the p value is significant; hence we reject the null hypothesis.
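The chi-square goodness-of-fit statistic compares observed and expected frequencies. A sketch with hypothetical counts (the report's own chi-square output is not reproduced here):

```python
# Chi-square goodness-of-fit: sum of (observed - expected)^2 / expected.
# Counts below are hypothetical, for illustration only.
observed = [30, 45, 25]
expected = [33, 40, 27]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))
```

The statistic is compared against a chi-square distribution whose degrees of freedom depend on the number of categories.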
PART - B
MULTIPLE LINEAR REGRESSION MODEL
Independent Variables:
a) Age (A)
b) Distance from Home (DFH)
c) No. of companies (NC)
d) Total Working Years (TWY)
Multiple Linear Regression Model:

                    Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)         2088.89    310.30       6.731    2.398e-11  ***
Age                  -22.95     10.00      -2.294    0.02191    *
TotalWorkingYears    490.42     15.18      33.313    < 2.2e-16  ***
NumCompaniesWorked   -57.27     34.37      -1.666    0.09589    .
DistanceFromHome     -12.62      9.31      -1.355    0.17553

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1. T-test
H0: There is no statistically significant relationship between monthly income and age.
Ha: There is a significant relationship between monthly income and age.
H0: There is no statistically significant relationship between monthly income and total working years.
Ha: There is a significant relationship between monthly income and total working years.
H0: There is no statistically significant relationship between monthly income and number of
companies worked.
Ha: There is a significant relationship between monthly income and number of companies worked.
H0: There is no statistically significant relationship between monthly income and distance from home.
Ha: There is a significant relationship between monthly income and distance from home.
Since the p value is greater than 0.05 for NumCompaniesWorked and DistanceFromHome, we fail to
reject the null hypothesis and conclude that there is no statistically significant relationship.
However, since the p value is lower than 0.05 for TotalWorkingYears and Age, we reject the null
hypothesis and conclude that there is a statistically significant relationship between these
variables and monthly income.
2. F-test
H0: All the regression coefficients are equal to zero.
Ha: At least one regression coefficient is not equal to zero.
Since the observed p value of the F-test is close to 0, the null hypothesis can be rejected.
3. Normality
To test normality, we computed the Jarque-Bera statistic on the residuals.
The Jarque-Bera statistic was found to have a large value of 77.525 with a p value less than
0.0001, which is lower than the significance level alpha of 0.05. Hence, the null hypothesis is
rejected, and the interpretation follows that the residuals are not normally distributed.
To correct this, we transformed the dependent variable to square root of monthly income. Upon
testing for normality again, we found Jarque-Bera statistic to be equal to 5.648 with p value equal to
0.0594, which is higher than the significance level alpha of 0.05 and hence it follows that the
residuals are normally distributed.
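The Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals as JB = n/6 · (S² + (K − 3)²/4); under normality both terms should be near zero. A minimal sketch on illustrative residuals:

```python
# Jarque-Bera normality statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4),
# where S is the sample skewness and K the sample kurtosis of the residuals.
from statistics import mean

def jarque_bera(residuals):
    n = len(residuals)
    m = mean(residuals)
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# symmetric illustrative residuals -> the skewness term vanishes
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb, 3))
```

A large JB value, as found before the square-root transformation, indicates departure from normality.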
4. Multicollinearity
Coefficients (dependent variable: sqrt_MonthlyIncome)

                    Unstandardized          Standardized
                    B        Std. Error    Beta      t        Sig.   Tolerance  VIF
(Constant)          49.946   2.073                   24.090   .000
Age                 -.095    .068          -.033     -1.402   .161   .517       1.935
TotalWorkingYears   2.708    .078           .794     34.751   .000   .536       1.866
NumCompaniesWorked  -.203    .187          -.019     -1.090   .276   .907       1.102
DistanceFromHome    -.032    .055          -.010     -.591    .555   .999       1.001
Since the Variance Inflating Factor (VIF) is less than 10 for all variables, there is no multicollinearity.
If multicollinearity were present, we would have corrected it through one or more of the following steps:
i. Removing highly correlated variables
ii. Linearly combining some of the independent variables
iii. Performing principal components analysis or partial least squares regression
iv. Using other regression models
5. Heteroscedasticity
To correct heteroscedasticity, we applied different transformations like log, square root, cubic root,
weighted/generalized least square but we couldn’t correct for heteroscedasticity.
So, we applied White heteroscedasticity-consistent standard errors.
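For a simple regression, the White (HC0) heteroscedasticity-robust variance of the slope has a closed form: Var(b2) = Σ wᵢ²eᵢ² / (Σ wᵢ²)², with wᵢ = xᵢ − x̄. The sketch below illustrates this on toy data; the multivariate version used for the actual model requires matrix algebra:

```python
# White (HC0) robust standard error for the slope of a simple regression:
# Var(b2) = sum(w_i^2 * e_i^2) / (sum(w_i^2))^2, where w_i = x_i - xbar.
# Illustrative data only.
import math
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

hc0_var = sum(((xi - xbar) ** 2) * (ei ** 2) for xi, ei in zip(x, resid)) / sxx ** 2
hc0_se = math.sqrt(hc0_var)
print(round(hc0_se, 4))
```

Robust standard errors change the inference (t values, p values) but leave the coefficient estimates themselves unchanged.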
6. Autocorrelation
Model Summary (dependent variable: sqrt_MonthlyIncome)

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .768   .590      .589               17.01253                    1.998

Predictors: (Constant), DistanceFromHome, Age, NumCompaniesWorked, TotalWorkingYears
n = 1470 and k’ = 4
dL = 1.872 and dU = 1.883
Since dU < d = 1.998 < 4 − dU, there is no evidence of first-order autocorrelation.
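The Durbin-Watson statistic itself is d = Σ(eₜ − eₜ₋₁)² / Σeₜ²; it lies in [0, 4] and sits near 2 when there is no first-order autocorrelation, as in the output above. A minimal sketch on illustrative residual series:

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no first-order autocorrelation.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

d1 = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0])   # alternating -> d near 4
d2 = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0])   # persistent runs -> d near 0
print(d1)  # 3.2
print(d2)  # 0.8
```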
Using Newey-West Standard Error method to correct for autocorrelation (just to demonstrate though
no autocorrelation is present):
                    Estimate    Std. Error  t value   Pr(>|t|)
(Intercept)         2088.8859   309.6343     6.7463   2.18e-11   ***
Age                  -22.9526     9.9892    -2.2977   0.02172    *
TotalWorkingYears    490.4237    15.1297    32.4147   < 2.2e-16  ***
NumCompaniesWorked   -57.2671    34.2731    -1.6709   0.09495    .
DistanceFromHome     -12.6193     9.2874    -1.3587   0.17444

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7. Chow Test
H0 : Parameter Stability
H1 : Parameter Instability
The Chow Test score is 0.4 and the covariance is 2.628; the resulting p-value of 19.23% is higher
than the significance level of 5%, hence we fail to reject the null hypothesis that the parameters
are stable, and conclude that there is no structural break in the data.
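The Chow statistic compares the pooled residual sum of squares with the sum from the two subsample fits: F = [(RSSₚ − RSS₁ − RSS₂)/k] / [(RSS₁ + RSS₂)/(n₁ + n₂ − 2k)]. The sketch below uses hypothetical RSS values (the report's own inputs are not reproduced above):

```python
# Chow test for a structural break:
# F = ((RSS_p - RSS_1 - RSS_2) / k) / ((RSS_1 + RSS_2) / (n1 + n2 - 2k)),
# where RSS_p is the pooled residual sum of squares and k the number of
# parameters per regime. All values below are hypothetical.
def chow_f(rss_pooled, rss1, rss2, n1, n2, k):
    num = (rss_pooled - (rss1 + rss2)) / k
    den = (rss1 + rss2) / (n1 + n2 - 2 * k)
    return num / den

f = chow_f(rss_pooled=1021.0, rss1=500.0, rss2=520.0, n1=735, n2=735, k=5)
print(round(f, 3))
```

A small F, as here and in the report's own result, means splitting the sample barely improves the fit, so the parameters are treated as stable.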
NOTE:
Owing to the size of the data set, Chi Square was not possible. Hence, it has been excluded above.