
IMPACT OF DIFFERENT FACTORS OVER

EMPLOYEE COMPENSATION

GROUP 4

ARVIND M | H21010
ASHRAY SHARMA | H21012
SRAVYA RANGAVAJJULA | H21025
NIHARIKA PRADHAN | H21032
PINAKI MISHRA | H21034

ACKNOWLEDGEMENT
We want to express our gratitude to Prof. P. C. Padhan for giving us the opportunity to conduct this
research and gain first-hand knowledge of how econometrics is applied in the specialized field of
human resource management. We gained fresh insights into the field by quantitatively analyzing
human resource management and human behaviour, and our research and experimentation with the
same led to extensive learning.

Table of Contents
ACKNOWLEDGEMENT ............................................................................................................................................ 2
TOPIC ...................................................................................................................................................................... 4
PART – A ................................................................................................................................................................. 5
A. VARIOUS REGRESSION MODELS ..................................................................................................................... 5
Straight Line Regression Model: A = β1 + β2*D.............................................................................................. 5
Reciprocal Regression Model: A = β1 + β2/D ................................................................................................. 6
Exponential Regression Model: log(A) = β1 + β2*log(D) ................................................................................ 7
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2) ................................................................................ 8
B. COMPARISON OF THE MODELS ...................................................................................................................... 9
C. HYPOTHESIS TESTING FOR THE BEST MODEL ............................................................................................... 10
1. Sign of the coefficient ............................................................................................................................... 10
2. T-test ......................................................................................................................................................... 10
3. F-Test........................................................................................................................................................ 10
4. Chi-Square Test ......................................................................................................................................... 11
PART - B ................................................................................................................................................................ 11
MULTIPLE LINEAR REGRESSION MODEL ........................................................................................................... 11
1. T-test ......................................................................................................................................................... 12
2. F-test ......................................................................................................................................................... 13
3. Normality .................................................................................................................................................. 13
4. Multicollinearity ........................................................................................................................................ 14
5. Heteroscedasticity .................................................................................................................................... 14
6. Autocorrelation ......................................................................................................................................... 16
7. Chow Test.................................................................................................................................................. 16
NOTE : ........................................................................................................................................................... 17
Owing to the size of the data set, Chi Square was not possible. Hence, it has been excluded above. ........ 17

TOPIC
The research aims to study the relationship between various factors and the monthly compensation of
employees, especially the cash component of the salary.

We chose this topic because, since the onset of the pandemic, many companies, both big and small,
have made changes to their compensation structures amid large-scale layoffs. Now that conditions are
improving, firms are analyzing past data to figure out what they should pay their employees in order
to retain them and avoid mass attrition. This will help firms formulate better compensation policies
and benefit programs for employees based on their demographics and personal details.

From theory, we assume the following:

• Older employees tend to earn much more than younger employees, but this can mostly be
attributed to the experience they gain and to how high they stand on the corporate ladder given
the time they have been working.

• Distance from home does not have a significant impact on monthly income. Prior research has
established that the distance between residence and workplace strongly affects employee
absenteeism and attrition rates, but not the compensation structure.

• The number of companies an employee has worked in before the current organization also has
no significant impact on the employee's salary. Some people choose to stay with a firm for the
long term and grow internally, while others might see switching jobs as a faster and easier
way to rise in designation and compensation.

• Percent salary hike refers to how much the salary has been increased since the last promotion.
There is a positive correlation between the percent salary hike and monthly compensation.

• The total number of years of working experience a person has positively impacts the
employee's compensation, when other variables are held constant.

PART – A

A. VARIOUS REGRESSION MODELS

Dependent Variable: Monthly Compensation (A)


Independent Variable: Age (D)

Model 1:
Straight Line Regression Model: A = β1 + β2*D

Interpretation:

1. Regression Equation: A = -2970.67 + 256.57*D

As Age increases by 1 unit, Monthly Compensation increases by Rs. 256.57 on average.

2. Covariance between the coefficients:

Covariance = -5028.61

The covariance between the coefficient estimates is negative, which means the sampling errors of the
two estimates move in opposite directions: a sample that overestimates β1 will tend to underestimate
β2, and vice versa.

3. R2 = 0.2479

R2 of the model is 0.2479, which means that 24.79% of the observed variation can be explained
by the model's inputs.

4. AIC = 28621.58

5. CV=0.6278
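As a minimal sketch of how such a straight-line fit can be reproduced (with hypothetical ages and salaries, since the project's data set is not included here):

```python
import numpy as np

# Hypothetical ages (D) and monthly compensation values (A); the
# original data set is not reproduced in this report.
D = np.array([25, 30, 35, 40, 45, 50], dtype=float)
A = 3000.0 + 250.0 * D + np.array([-100, 50, -50, 100, -80, 80], dtype=float)

# Fit A = b1 + b2*D by ordinary least squares.
X = np.column_stack([np.ones_like(D), D])
beta, *_ = np.linalg.lstsq(X, A, rcond=None)
b1, b2 = beta

resid = A - X @ beta
rss = resid @ resid
tss = ((A - A.mean()) ** 2).sum()
r2 = 1.0 - rss / tss

# Covariance matrix of the coefficient estimates: sigma^2 * (X'X)^-1.
sigma2 = rss / (len(A) - 2)
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

print(round(b2, 2), round(r2, 4), cov_beta[0, 1])
```

The off-diagonal entry of the coefficient covariance matrix is negative here because the mean age is positive, which is the usual source of a negative intercept-slope covariance like the one reported above.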

Model 2:
Reciprocal Regression Model: A = β1 + β2/D

Interpretation:

1. Regression Equation: A = 15144 - 299744.2/D

Since the coefficient on 1/D is negative, monthly income rises with age at a diminishing rate,
approaching Rs. 15,144 asymptotically.

2. Covariance between the coefficients:

Covariance = -7846223687

The covariance between the coefficient estimates is negative, which means the sampling errors of the
two estimates move in opposite directions: a sample that overestimates β1 will tend to underestimate
β2, and vice versa.

3. R2 = 0.2236

R2 of the model is 0.2236, which means that 22.36% of the observed variation can be explained
by the model's inputs.

4. AIC=28668.22

5. CV=0.6379
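The reciprocal specification can be estimated by ordinary least squares after transforming the regressor to 1/D; the data below are again hypothetical:

```python
import numpy as np

# Hypothetical ages and compensation values rising with age.
D = np.array([25, 30, 35, 40, 45, 50], dtype=float)
A = np.array([5100, 5700, 6400, 7100, 7500, 7900], dtype=float)

# A = b1 + b2*(1/D) is linear in the transformed regressor 1/D.
X = np.column_stack([np.ones_like(D), 1.0 / D])
(b1, b2), *_ = np.linalg.lstsq(X, A, rcond=None)

# A negative b2 means income rises with age, levelling off toward b1.
print(b2 < 0, round(b1))
```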

Model 3:
Exponential Regression Model: log(A) = β1 + β2*log(D)

Interpretation:

1. Regression Equation: log(A) = 3.75988 + 1.33946*log(D)

Since this is a double-log specification, β2 is an elasticity: a 1% increase in age is associated
with approximately a 1.34% increase in Monthly Compensation.

2. Covariance between the coefficients:

Covariance = 0.1322619

The covariance between the coefficient estimates is positive, which means the sampling errors of the
two estimates move in the same direction: a sample that overestimates β1 will tend to overestimate
β2 as well.

3. R2 = 0.2544

R2 of the model is 0.2544, which means that 25.44% of the observed variation can be explained
by the model's inputs.

4. AIC=2543.27

5. CV=8.82277e-05
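Using the coefficients reported above, the elasticity reading of β2 can be checked numerically: raising age by 1% should raise the fitted compensation by roughly 1.34%.

```python
import numpy as np

# Fitted coefficients of log(A) = b1 + b2*log(D), as reported above.
b1, b2 = 3.75988, 1.33946

age = 40.0
predicted = np.exp(b1 + b2 * np.log(age))              # fitted compensation
predicted_1pct = np.exp(b1 + b2 * np.log(age * 1.01))  # age raised by 1%
pct_change = 100 * (predicted_1pct / predicted - 1)
print(round(pct_change, 2))  # close to the elasticity b2
```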

Model 4:
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2)

Interpretation:

1. Regression Equation: A = -39999.1498 + 312.8211*D - 0.7247*(D^2)

Within the observed age range, monthly compensation increases with age; the negative coefficient on
D^2 means the increase slows at higher ages.

2. Covariance between the coefficients:

Covariance (β1, β2) = - 260242.931


Covariance (β2, β3) = - 1736.80235
Covariance (β1, β3) = - 2507714.848

All three covariances are negative, which means the sampling errors of each pair of coefficient
estimates move in opposite directions:

• a sample that overestimates β1 will tend to underestimate β2, and vice versa;

• a sample that overestimates β2 will tend to underestimate β3, and vice versa;

• a sample that overestimates β1 will tend to underestimate β3, and vice versa.

3. R2 = 0.2481

R2 of the model is 0.2481, which means that 24.81% of the observed variation can be explained
by the model's inputs.

4. AIC = 28623.13

5. CV=0.627776

B. COMPARISON OF THE MODELS

1.The Akaike information criterion (AIC) is a mathematical method for evaluating how well a
model fits the data it was generated from.

2.The coefficient of variation (CV) is a relative measure of variability that indicates the size of a
standard deviation in relation to its mean.

**Note: Since our data is cross-sectional, we are not using the AIC values to choose the model.
We will choose the model based on the CV value, because the model with the lowest CV value is
the best model.

S. No   Model                            CV Value
1       Straight Line Regression Model   0.6278
2       Reciprocal Regression Model      0.6379
3       Exponential Regression Model     8.82277e-05
4       Parabolic Regression Model       0.627776

The Exponential Regression Model is selected as the best because it has the lowest CV value (note
that its CV is computed on the log of compensation, so it sits on a different scale from the other
models).
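A sketch of the CV computation, assuming (as is common) that the reported CV is the residual standard error divided by the mean of the dependent variable:

```python
import numpy as np

# CV of a regression, taken here as residual standard error divided by
# the mean of the dependent variable (an assumption about how the
# reported values were computed).
def regression_cv(y, fitted, n_params):
    resid = y - fitted
    rse = np.sqrt(resid @ resid / (len(y) - n_params))
    return rse / y.mean()

# Tiny illustrative example, not the project's data.
y = np.array([5000., 6000., 7000., 8000.])
fitted = np.array([5100., 5900., 7050., 7950.])
cv = regression_cv(y, fitted, n_params=2)
print(round(cv, 4))
```

For Part A's models, y would be Monthly Compensation (or its log for the exponential model) and fitted would be the model's predictions.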

C. HYPOTHESIS TESTING FOR THE BEST MODEL

Exponential Regression Model: log(A) = β1 + β2*log(D)

Regression Equation: log(A) = 3.75988 + 1.33946*log(D)

1. Sign of the coefficient

The sign of the coefficient is positive, which indicates that Monthly Compensation increases with
Age: a 1% increase in age is associated with about a 1.34% increase in monthly compensation. This
matches our theory.

2. T-test

Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)

Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)

The p value corresponding to the t statistic of the independent variable is 0.000, which means the
coefficient is statistically significant. Hence the null hypothesis is rejected.

3. F-Test

Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)

Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)

Since the p value corresponding to the F statistic is 0.000, we reject the null hypothesis.
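The t-test and F-test agree here by construction: in a simple regression, the overall F statistic equals the square of the slope's t statistic, so the two p values coincide. A quick check with an illustrative t value (not the model's actual statistic):

```python
import numpy as np
from scipy import stats

# Illustrative t statistic and residual degrees of freedom.
t_stat = 5.0
df_resid = 100

# For one regressor, F = t^2 with (1, df_resid) degrees of freedom.
f_stat = t_stat ** 2

p_t = 2 * stats.t.sf(abs(t_stat), df_resid)  # two-sided t-test p value
p_f = stats.f.sf(f_stat, 1, df_resid)        # overall F-test p value
print(np.isclose(p_t, p_f))
```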

4. Chi-Square Test

Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)

Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)

The chi-square test can be used to measure the goodness of fit of the trained regression model.

The result shows that the p value is significant; hence we reject the null hypothesis.

PART - B
MULTIPLE LINEAR REGRESSION MODEL

Dependent Variable: Monthly Income (MI)

Independent Variables:
a) Age (A)
b) Distance from Home (DFH)
c) No. of companies (NC)
d) Total Working Years (TWY)

Regression Model: MI = β1 + β2*A + β3*DFH + β4*NC + β5*TWY

Multiple Linear Regression Model:

Monthly Income = 2088.89 - 22.95*Age + 490.42*TotalWorkingYears - 57.27*NumCompaniesWorked
- 12.62*DistanceFromHome

The final model, after checking for multicollinearity and applying White heteroscedasticity-consistent standard errors:

                     Estimate  Std. Error  t value   Pr(>|t|)
(Intercept)           2088.89      310.30    6.731  2.398e-11  ***
Age                    -22.95       10.00   -2.294    0.02191  *
TotalWorkingYears      490.42       15.18   33.313  < 2.2e-16  ***
NumCompaniesWorked     -57.27       34.37   -1.666    0.09589  .
DistanceFromHome       -12.62        9.31   -1.355    0.17553

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• Residual standard error: 2981 on 1465 degrees of freedom


• Multiple R-squared: 0.6001, Adjusted R-squared: 0.599
• F-statistic: 549.6 on 4 and 1465 DF, p-value: < 2.2e-16
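A sketch of fitting such a four-regressor model and computing R² and adjusted R², on simulated data (the project's data set is not reproduced here, and the simulated coefficients are illustrative only):

```python
import numpy as np

# Simulated data with the same four regressors as the report's model.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
twy = rng.uniform(0, 40, n)
ncomp = rng.integers(0, 9, n).astype(float)
dfh = rng.uniform(1, 30, n)
# True process: income driven by experience and (negatively) age.
mi = 2000 + 500 * twy - 20 * age + rng.normal(0, 2000, n)

# MI = b1 + b2*A + b3*DFH + b4*NC + b5*TWY, estimated by OLS.
X = np.column_stack([np.ones(n), age, dfh, ncomp, twy])
beta, *_ = np.linalg.lstsq(X, mi, rcond=None)

resid = mi - X @ beta
r2 = 1 - (resid @ resid) / ((mi - mi.mean()) ** 2).sum()
k = X.shape[1]
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)  # penalizes extra regressors
print(round(r2, 3), round(adj_r2, 3))
```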

1. T-test

H0: There is no statistically significant relationship between monthly income and age.
Ha: There is a significant relationship between monthly income and age.

H0: There is no statistically significant relationship between monthly income and total working years.
Ha: There is a significant relationship between monthly income and total working years.

H0: There is no statistically significant relationship between monthly income and number of
companies worked.
Ha: There is a significant relationship between monthly income and number of companies worked.

H0: There is no statistically significant relationship between monthly income and distance from home.
Ha: There is a significant relationship between monthly income and distance from home.

Since p value is greater than 0.05 for NumCompaniesWorked and DistanceFromHome, we fail to
reject the null hypothesis and conclude that there is no statistically significant relationship.

However, since the p value is lower than 0.05 for TotalWorkingYears and Age, we reject the null
hypothesis and conclude that there is a statistically significant relationship between these
variables and monthly income.

2. F-test
H0: All the regression coefficients are equal to zero.
Ha: At least one regression coefficient is not equal to zero.
Since the observed p value of the F-test is close to 0, the null hypothesis can be rejected.

3. Normality

To test normality, we computed the Jarque-Bera statistic:

JB = n * [S^2/6 + (K - 3)^2/24]

where n = total number of observations, S = skewness and K = kurtosis of the residuals.

The JB statistic follows a chi-square distribution with 2 degrees of freedom. Thus, we formulate the
following hypotheses:

H0: Residuals follow normal distribution


Ha: Residuals do not follow normal distribution.

Upon testing the normality of the residuals, the Jarque-Bera statistic was found to be 77.525 with a
p value less than 0.0001, lower than the significance level alpha of 0.05. Hence, the null
hypothesis is rejected, and it follows that the residuals are not normally distributed.
To correct this, we transformed the dependent variable to the square root of monthly income. Upon
testing for normality again, we found the Jarque-Bera statistic to be 5.648 with a p value of
0.0594, higher than the significance level alpha of 0.05, so we fail to reject the null hypothesis
that the residuals are normally distributed.
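The JB computation can be sketched directly from the formula above; the sample below is simulated, not the model's actual residuals:

```python
import numpy as np
from scipy import stats

# Jarque-Bera statistic: JB = n * (S^2/6 + (K-3)^2/24), where S is the
# skewness and K the (non-excess) kurtosis of the residuals.
def jarque_bera(resid):
    n = len(resid)
    s = stats.skew(resid)
    k = stats.kurtosis(resid, fisher=False)
    jb = n * (s ** 2 / 6 + (k - 3) ** 2 / 24)
    p = stats.chi2.sf(jb, df=2)  # chi-square with 2 degrees of freedom
    return jb, p

rng = np.random.default_rng(1)
resid_sample = rng.normal(size=1000)  # stand-in for model residuals
jb, p = jarque_bera(resid_sample)
print(round(jb, 3), round(p, 3))
```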

4. Multicollinearity

Coefficients (dependent variable: sqrt_MonthlyIncome)

                     Unstd. B   Std. Error   Std. Beta        t    Sig.   Tolerance     VIF
(Constant)             49.946        2.073                24.090    .000
Age                     -.095         .068       -.033    -1.402    .161        .517   1.935
TotalWorkingYears       2.708         .078        .794    34.751    .000        .536   1.866
NumCompaniesWorked      -.203         .187       -.019    -1.090    .276        .907   1.102
DistanceFromHome        -.032         .055       -.010     -.591    .555        .999   1.001

Since the Variance Inflating Factor (VIF) is less than 10 for all variables, there is no multicollinearity.
If multicollinearity were present, we would have corrected it through one or more of the following steps:
i. Removing highly correlated variables
ii. Linearly combining some of the independent variables
iii. Performing principal components analysis or partial least squares regression
iv. Using other regression models
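A sketch of how VIF values like those in the table can be computed, using each regressor's auxiliary regression R² (the data below are simulated):

```python
import numpy as np

# VIF of regressor j: 1 / (1 - R_j^2), where R_j^2 comes from
# regressing X_j on the remaining regressors (plus an intercept).
def vif(X):
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = x1 + 0.5 * rng.normal(size=500)  # strongly correlated with x1
x3 = rng.normal(size=500)             # independent regressor
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 2))
```

Correlated regressors (x1, x2) get inflated VIFs, while the independent regressor stays near 1, mirroring how all four VIFs in the table fall well below the usual threshold of 10.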

5. Heteroscedasticity

To correct heteroscedasticity, we applied different transformations (log, square root, cube root) and
weighted/generalized least squares, but none of these removed the heteroscedasticity.
So, we applied White heteroscedasticity-consistent standard errors.

Results with ordinary (uncorrected) standard errors:

                     Estimate  Std. Error  t value               Pr(>|t|)
(Intercept)           2088.89      363.33    5.749               1.09e-08  ***
Age                    -22.95       11.84   -1.938                 0.0528  .
TotalWorkingYears      490.42       13.66   35.909  < 0.0000000000000002  ***
NumCompaniesWorked     -57.27       32.69   -1.752                 0.0800  .
DistanceFromHome       -12.62        9.60   -1.315                 0.1889

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Results with White heteroscedasticity-consistent standard errors:

                     Estimate  Std. Error  t value   Pr(>|t|)
(Intercept)           2088.89      310.30    6.731  2.398e-11  ***
Age                    -22.95       10.00   -2.294    0.02191  *
TotalWorkingYears      490.42       15.18   33.313  < 2.2e-16  ***
NumCompaniesWorked     -57.27       34.37   -1.666    0.09589  .
DistanceFromHome       -12.62        9.31   -1.355    0.17553

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
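A sketch of the White (HC0) correction: the usual coefficient covariance is replaced by the sandwich estimator (X'X)^-1 X' diag(e^2) X (X'X)^-1. The data below are simulated with an error variance that grows with the regressor, the classic heteroscedastic pattern:

```python
import numpy as np

# OLS coefficients with White (HC0) heteroscedasticity-consistent SEs.
def white_se(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e ** 2)[:, None])  # X' diag(e^2) X
    cov = xtx_inv @ meat @ xtx_inv        # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + x * rng.normal(size=n)  # error variance grows with x
X = np.column_stack([np.ones(n), x])
beta, se = white_se(X, y)
print(np.round(beta, 2), np.round(se, 3))
```

The point estimates are the unchanged OLS coefficients; only the standard errors (and hence the t statistics and p values) are corrected, exactly as in the two tables above.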

6. Autocorrelation

Model Summary (dependent variable: sqrt_MonthlyIncome)

Model     R    R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1      .768        .590                .589                     17.01253           1.998
Predictors: (Constant), DistanceFromHome, Age, NumCompaniesWorked, TotalWorkingYears

n = 1470 and k' = 4
dL = 1.872 and dU = 1.883

The hypotheses for autocorrelation are:

H0: ρ = 0, zero autocorrelation
H1: ρ ≠ 0, autocorrelation is present (positive or negative)

The Durbin-Watson statistic lies between dU and 4-dU, therefore we fail to reject the null
hypothesis, i.e., no autocorrelation is present.
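The Durbin-Watson statistic underlying the table can be sketched as follows; residuals that are white noise, as here, give a value near 2:

```python
import numpy as np

# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 indicate no first-order autocorrelation; values toward
# 0 indicate positive and toward 4 negative autocorrelation.
def durbin_watson(resid):
    diff = np.diff(resid)
    return (diff @ diff) / (resid @ resid)

rng = np.random.default_rng(4)
white_noise = rng.normal(size=2000)  # stand-in for model residuals
print(round(durbin_watson(white_noise), 2))
```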

We also applied the Newey-West standard error method, purely for demonstration, since no
autocorrelation is present:

                     Estimate  Std. Error  t value                Pr(>|t|)
(Intercept)         2088.8859    309.6343   6.7463                2.18e-11  ***
Age                  -22.9526      9.9892  -2.2977                 0.02172  *
TotalWorkingYears    490.4237     15.1297  32.4147  < 0.00000000000000022  ***
NumCompaniesWorked   -57.2671     34.2731  -1.6709                 0.09495  .
DistanceFromHome     -12.6193      9.2874  -1.3587                 0.17444

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

7. Chow Test

Regression Stability Test (Chow test)


Score   Critical Value   P-Value   Stable at 5%?
0.401            1.837    19.23%            TRUE

H0 : Parameter Stability

H1 : Parameter Instability

The Chow test score is 0.401, below the critical value of 1.837, and the p-value of 19.23% is
higher than the significance level of 5%; hence we fail to reject the null hypothesis that the
parameters are stable, i.e., there is no structural break in the data.
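A sketch of the Chow test: fit the pooled sample and the two subsamples, then compare the restricted and unrestricted sums of squared residuals with an F test. The data below are simulated with no structural break:

```python
import numpy as np
from scipy import stats

# Sum of squared residuals of an OLS fit.
def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

# Chow F = ((SSR_pooled - SSR_1 - SSR_2) / k)
#          / ((SSR_1 + SSR_2) / (n1 + n2 - 2k))
def chow_test(X1, y1, X2, y2):
    k = X1.shape[1]
    n1, n2 = len(y1), len(y2)
    s_pooled = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    s1, s2 = ssr(X1, y1), ssr(X2, y2)
    f = ((s_pooled - s1 - s2) / k) / ((s1 + s2) / (n1 + n2 - 2 * k))
    p = stats.f.sf(f, k, n1 + n2 - 2 * k)
    return f, p

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 300)
y = 1 + 2 * x + rng.normal(size=300)  # same relation in both halves
X = np.column_stack([np.ones(300), x])
f, p = chow_test(X[:150], y[:150], X[150:], y[150:])
print(round(f, 3), round(p, 3))
```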

NOTE:
Owing to the size of the data set, the Chi-Square test was not feasible for Part B; hence it has
been excluded above.
