EMPLOYEE COMPENSATION
GROUP 4
ARVIND M | H21010
ASHRAY SHARMA | H21012
SRAVYA RANGAVAJJULA | H21025
NIHARIKA PRADHAN | H21032
PINAKI MISHRA | H21034
ACKNOWLEDGEMENT
We want to express our gratitude to Prof. P. C. Padhan for giving us the chance to conduct this
research and gain first-hand knowledge of how econometrics is applied in the specialized field
of human resource management. We gained fresh insights into the field by quantitatively analyzing
human resource management and human behaviour, and our research and experimentation with the
same led to extensive learning.
Table of Contents
ACKNOWLEDGEMENT ............................................................................................................................................ 2
TOPIC ...................................................................................................................................................................... 4
PART – A ................................................................................................................................................................. 5
A. VARIOUS REGRESSION MODELS ..................................................................................................................... 5
Straight Line Regression Model: A = β1 + β2*D.............................................................................................. 5
Reciprocal Regression Model: A = β1 + β2/D ................................................................................................. 6
Exponential Regression Model: log(A) = β1 + β2*log(D) ................................................................................ 7
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2) ................................................................................ 8
B. COMPARISON OF THE MODELS ...................................................................................................................... 9
C. HYPOTHESIS TESTING FOR THE BEST MODEL ............................................................................................... 10
1. Sign of the coefficient ............................................................................................................................... 10
2. T-test ......................................................................................................................................................... 10
3. F-Test........................................................................................................................................................ 10
4. Chi-Square Test ......................................................................................................................................... 11
PART - B ................................................................................................................................................................ 11
MULTIPLE LINEAR REGRESSION MODEL ........................................................................................................... 11
1. T-test ......................................................................................................................................................... 12
2. F-test ......................................................................................................................................................... 13
3. Normality .................................................................................................................................................. 13
4. Multicollinearity ........................................................................................................................................ 14
5. Heteroscedasticity .................................................................................................................................... 14
6. Autocorrelation ......................................................................................................................................... 16
7. Chow Test.................................................................................................................................................. 16
NOTE : ........................................................................................................................................................... 17
Owing to the size of the data set, Chi Square was not possible. Hence, it has been excluded above. ........ 17
TOPIC
The research aims to study the relationship between various factors and the monthly compensation of
employees, especially the cash component of the salary.
We chose this topic because, since the onset of the pandemic, many companies, both big and small,
have changed their compensation structures amid large-scale layoffs and mass firings. Now that
things are improving, firms are analyzing past data to work out what they should pay their
employees in order to retain them and avoid mass attrition. This will help firms formulate
better compensation policies and benefit programs for employees based on their demographics
and personal details.
• Older employees tend to earn much more than younger employees, but this can mostly be
attributed to the experience they gain and how far they have climbed on the corporate ladder in
the time they have been working.
• Distance from home does not have a significant impact on monthly income. Prior research has
established that the distance between residence and workplace has a strong effect on employee
absenteeism and attrition rates, but not on the compensation structure.
• The number of companies an employee has worked for before the current organization also has
no significant impact on the employee’s salary. Some people choose to stay with a firm for the
long term and grow internally, while others might see switching jobs as a faster and easier
way to rise in designation and compensation.
• Percent salary hike refers to how much the salary has increased since the last promotion.
There is a positive correlation between the salary hike and the monthly compensation.
• The total number of years of experience a person has positively impacts the employee’s
compensation, when other variables are held constant.
PART – A
A. VARIOUS REGRESSION MODELS
Model 1:
Straight Line Regression Model: A = β1 + β2*D
Interpretation:
Covariance = -5028.61
3. R2 = 0.2479
R2 of the model is 0.2479, which means that 24.79% of the observed variation can be explained
by the model's inputs.
4. AIC = 28621.58
5. CV=0.6278
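The straight-line estimates above come from ordinary least squares. As a sketch of the mechanics (on illustrative numbers, not the study's data set), the slope is the ratio of the sample covariance of A and D to the variance of D, and R² is one minus the residual sum of squares over the total sum of squares:

```python
# OLS for A = b1 + b2*D on illustrative data (not the study's data set).
from statistics import mean

D = [25, 30, 35, 40, 45, 50]                # stand-in for Age
A = [3000, 4200, 4600, 6100, 6500, 8000]    # stand-in for Monthly Income

d_bar, a_bar = mean(D), mean(A)
sxx = sum((d - d_bar) ** 2 for d in D)
sxy = sum((d - d_bar) * (a - a_bar) for d, a in zip(D, A))
b2 = sxy / sxx                  # slope: cov(D, A) / var(D)
b1 = a_bar - b2 * d_bar         # intercept

rss = sum((a - (b1 + b2 * d)) ** 2 for d, a in zip(D, A))
tss = sum((a - a_bar) ** 2 for a in A)
r2 = 1 - rss / tss              # share of variation in A explained by D
print(round(b2, 2), round(r2, 4))
```

The same decomposition underlies the R² values reported for each of the four models.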
Model 2:
Reciprocal Regression Model: A = β1 + β2/D
Interpretation:
Covariance = -7846223687
3. R2 = 0.2236
R2 of the model is 0.2236, which means that 22.36% of the observed variation can be explained
by the model's inputs.
4. AIC=28668.22
5. CV=0.6379
Model 3:
Exponential Regression Model: log(A) = β1 + β2*log(D)
Interpretation:
Covariance = 0.1322619
3. R2 = 0.2544
R2 of the model is 0.2544, which means that 25.44% of the observed variation can be explained
by the model's inputs.
4. AIC=2543.27
5. CV=8.82277e-05
Model 4:
Parabolic Regression Model: A = β1 + β2*D + β3*(D^2)
Interpretation:
• Because the model is quadratic in D, the effect of a one-unit increase in D on the Monthly
Compensation is not constant: it equals β2 + 2*β3*D, so it changes with the level of D.
• β2 captures the linear component of the relationship, while β3 captures its curvature, so
their individual impacts on the Monthly Compensation are not comparable one-for-one.
• β1 is the intercept, the predicted Monthly Compensation when D = 0, and is interpreted
differently from the slope coefficients.
3. R2 = 0.2481
R2 of the model is 0.2481, which means that 24.81% of the observed variation can be explained
by the model's inputs.
4. AIC = 28623.13
5. CV=0.627776
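Because the parabolic model is quadratic in D, the marginal effect of D on A is β2 + 2*β3*D rather than a constant. A small sketch with hypothetical coefficient values (the fitted estimates are not reported above):

```python
# Marginal effect dA/dD = b2 + 2*b3*D for the quadratic model.
# Coefficient values below are hypothetical, for illustration only.
b2, b3 = 150.0, 2.0

def marginal_effect(d):
    """Change in A for a one-unit increase in D, evaluated at D = d."""
    return b2 + 2 * b3 * d

print(marginal_effect(25))  # slope at D = 25 -> 250.0
print(marginal_effect(50))  # slope at D = 50 -> 350.0
```

The changing slope is why the impact of D cannot be read off a single coefficient in this specification.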
B. COMPARISON OF THE MODELS
1. The Akaike information criterion (AIC) is a mathematical method for evaluating how well a
model fits the data it was generated from.
2. The coefficient of variation (CV) is a relative measure of variability that indicates the size
of a standard deviation in relation to its mean.
Note: Since our data is cross-sectional, we are not using the AIC values to choose the model.
We will choose the model based on the CV value, because the model with the lowest CV value
is the best model.
S. No  Model                           CV Value
1      Straight Line Regression Model  0.6278
2      Reciprocal Regression Model     0.6379
3      Exponential Regression Model    8.82277e-05
4      Parabolic Regression Model      0.627776
Exponential Regression Model is the best because it has the lowest CV value.
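For reference, both selection criteria can be computed from a model's residual sum of squares. The sketch below assumes CV is the residual standard error divided by the mean of the dependent variable, and uses the common AIC form n·ln(RSS/n) + 2k (which differs from some software's AIC by an additive constant); all input numbers are hypothetical:

```python
# AIC and CV from a model's residual sum of squares (hypothetical numbers).
# Assumed definitions: AIC = n*ln(RSS/n) + 2k (a common form, offset by a
# constant from R's AIC); CV = residual standard error / mean(y).
import math

def aic(rss, n, k):
    return n * math.log(rss / n) + 2 * k

def cv(rss, n, k, y_mean):
    se = math.sqrt(rss / (n - k))   # residual standard error
    return se / y_mean

a = aic(rss=1.0e10, n=1470, k=2)
c = cv(rss=1.0e10, n=1470, k=2, y_mean=6500.0)
print(round(a, 1), round(c, 4))
```

Because AIC depends on the scale of the dependent variable, models with a transformed dependent variable (like the log-log model) are not directly comparable on it, which motivates the CV-based choice above.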
C. HYPOTHESIS TESTING FOR THE BEST MODEL
1. Sign of the coefficient
The sign of the coefficient is positive, which indicates that as Age increases by 1 unit, the Monthly
Compensation also increases by a minuscule value. This matches our theory.
2. T-test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
The p value corresponding to the independent variable's t statistic is 0.000, which means that the
coefficient is statistically significant. Hence the null hypothesis is rejected.
3. F-Test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
Since the p value corresponding to the F statistic is 0.000, we can reject the null hypothesis.
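In a two-parameter model the F statistic for H0: β2 = 0 can be recovered from R² alone, and it equals the square of the slope's t statistic, so the t-test and F-test necessarily agree here. A sketch using the R² of the chosen model (n is assumed to be 1470, the sample size reported in Part B):

```python
# F-test for H0: b2 = 0, computed from R^2.
# F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)); with k = 2 this equals t^2.
r2 = 0.2544     # R^2 of the log-log model, from the text
n, k = 1470, 2  # n assumed from the Part B output; k = intercept + slope

f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(round(f_stat, 1))
```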
4. Chi-Square Test
Null Hypothesis: There is no relationship between the Age and Monthly Income (H0 : β2= 0)
Alternate Hypothesis: There is a relationship between the Age and Monthly Income(H1: β2 ≠ 0)
The Chi-squared test can be used to measure the goodness-of-fit of a trained regression model on
the training, validation, or test data sets.
The result shows that the p value is significant; hence we reject the null hypothesis.
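The chi-square goodness-of-fit statistic compares observed and expected frequencies. A sketch with hypothetical counts (the report's own chi-square output is not reproduced here):

```python
# Chi-square goodness-of-fit: sum of (observed - expected)^2 / expected.
# Counts below are hypothetical, for illustration only.
observed = [30, 45, 25]
expected = [33, 40, 27]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))
```

The statistic is compared against a chi-square distribution whose degrees of freedom depend on the number of categories.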
PART - B
MULTIPLE LINEAR REGRESSION MODEL
Independent Variables:
a) Age (A)
b) Distance from Home (DFH)
c) No. of companies (NC)
d) Total Working Years (TWY)
Multiple Linear Regression Model:

                    Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)         2088.89    310.30       6.731    2.398e-11  ***
Age                  -22.95     10.00      -2.294    0.02191    *
TotalWorkingYears    490.42     15.18      33.313    < 2.2e-16  ***
NumCompaniesWorked   -57.27     34.37      -1.666    0.09589    .
DistanceFromHome     -12.62      9.31      -1.355    0.17553

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1. T-test
H0: There is no statistically significant relationship between monthly income and age.
Ha: There is a significant relationship between monthly income and age.
H0: There is no statistically significant relationship between monthly income and total working years.
Ha: There is a significant relationship between monthly income and total working years.
H0: There is no statistically significant relationship between monthly income and number of
companies worked.
Ha: There is a significant relationship between monthly income and number of companies worked.
H0: There is no statistically significant relationship between monthly income and distance from home.
Ha: There is a significant relationship between monthly income and distance from home.
Since the p value is greater than 0.05 for NumCompaniesWorked and DistanceFromHome, we fail to
reject the null hypothesis and conclude that there is no statistically significant relationship.
However, since the p value is lower than 0.05 for TotalWorkingYears and Age, we reject the null
hypothesis and conclude that there is a statistically significant relationship between these
variables and monthly income.
2. F-test
H0: All the regression coefficients are equal to zero.
Ha: At least one regression coefficient is not equal to zero.
Since the observed p value of the F-test is close to 0, the null hypothesis can be rejected.
3. Normality
To test normality, we computed the Jarque-Bera statistic on the residuals.
The Jarque-Bera statistic was found to have a large value of 77.525 with a p value less than
0.0001, which is lower than the significance level alpha of 0.05. Hence, the null hypothesis is
rejected, and the interpretation follows that the residuals are not normally distributed.
To correct this, we transformed the dependent variable to square root of monthly income. Upon
testing for normality again, we found Jarque-Bera statistic to be equal to 5.648 with p value equal to
0.0594, which is higher than the significance level alpha of 0.05 and hence it follows that the
residuals are normally distributed.
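The Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals as JB = n/6 · (S² + (K − 3)²/4); under normality both terms should be near zero. A minimal sketch on illustrative residuals:

```python
# Jarque-Bera normality statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4),
# where S is the sample skewness and K the sample kurtosis of the residuals.
from statistics import mean

def jarque_bera(residuals):
    n = len(residuals)
    m = mean(residuals)
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# symmetric illustrative residuals -> the skewness term vanishes
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb, 3))
```

A large JB value, as found before the square-root transformation, indicates departure from normality.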
4. Multicollinearity
Coefficients (dependent variable: sqrt_MonthlyIncome)

                    Unstandardized          Standardized
                    B        Std. Error    Beta      t        Sig.   Tolerance  VIF
(Constant)          49.946   2.073                   24.090   .000
Age                 -.095    .068          -.033     -1.402   .161   .517       1.935
TotalWorkingYears   2.708    .078           .794     34.751   .000   .536       1.866
NumCompaniesWorked  -.203    .187          -.019     -1.090   .276   .907       1.102
DistanceFromHome    -.032    .055          -.010     -.591    .555   .999       1.001
Since the Variance Inflating Factor (VIF) is less than 10 for all variables, there is no multicollinearity.
If multicollinearity were present, we would have corrected it through one or more of the following steps:
i. Removing highly correlated variables
ii. Linearly combining some of the independent variables
iii. Performing principal components analysis or partial least squares regression
iv. Using other regression models
5. Heteroscedasticity
To correct heteroscedasticity, we applied different transformations like log, square root, cubic root,
weighted/generalized least square but we couldn’t correct for heteroscedasticity.
So, we applied White heteroscedasticity-consistent standard errors.
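For a simple regression, the White (HC0) heteroscedasticity-robust variance of the slope has a closed form: Var(b2) = Σ wᵢ²eᵢ² / (Σ wᵢ²)², with wᵢ = xᵢ − x̄. The sketch below illustrates this on toy data; the multivariate version used for the actual model requires matrix algebra:

```python
# White (HC0) robust standard error for the slope of a simple regression:
# Var(b2) = sum(w_i^2 * e_i^2) / (sum(w_i^2))^2, where w_i = x_i - xbar.
# Illustrative data only.
import math
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

hc0_var = sum(((xi - xbar) ** 2) * (ei ** 2) for xi, ei in zip(x, resid)) / sxx ** 2
hc0_se = math.sqrt(hc0_var)
print(round(hc0_se, 4))
```

Robust standard errors change the inference (t values, p values) but leave the coefficient estimates themselves unchanged.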
6. Autocorrelation
Model Summary (dependent variable: sqrt_MonthlyIncome)

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .768   .590      .589               17.01253                    1.998

Predictors: (Constant), DistanceFromHome, Age, NumCompaniesWorked, TotalWorkingYears
n = 1470 and k’ = 4
dL = 1.872 and dU = 1.883
Since dU < d = 1.998 < 4 − dU, there is no evidence of first-order autocorrelation.
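The Durbin-Watson statistic itself is d = Σ(eₜ − eₜ₋₁)² / Σeₜ²; it lies in [0, 4] and sits near 2 when there is no first-order autocorrelation, as in the output above. A minimal sketch on illustrative residual series:

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no first-order autocorrelation.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

d1 = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0])   # alternating -> d near 4
d2 = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0])   # persistent runs -> d near 0
print(d1)  # 3.2
print(d2)  # 0.8
```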
Using Newey-West Standard Error method to correct for autocorrelation (just to demonstrate though
no autocorrelation is present):
                    Estimate    Std. Error  t value   Pr(>|t|)
(Intercept)         2088.8859   309.6343     6.7463   2.18e-11   ***
Age                  -22.9526     9.9892    -2.2977   0.02172    *
TotalWorkingYears    490.4237    15.1297    32.4147   < 2.2e-16  ***
NumCompaniesWorked   -57.2671    34.2731    -1.6709   0.09495    .
DistanceFromHome     -12.6193     9.2874    -1.3587   0.17444

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7. Chow Test
H0 : Parameter Stability
H1 : Parameter Instability
The Chow Test score is 0.4 and the covariance is 2.628; the resulting p-value of 19.23% is higher
than the significance level of 5%, hence we fail to reject the null hypothesis that the parameters
are stable, and conclude that there is no structural break in the data.
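The Chow statistic compares the pooled residual sum of squares with the sum from the two subsample fits: F = [(RSSₚ − RSS₁ − RSS₂)/k] / [(RSS₁ + RSS₂)/(n₁ + n₂ − 2k)]. The sketch below uses hypothetical RSS values (the report's own inputs are not reproduced above):

```python
# Chow test for a structural break:
# F = ((RSS_p - RSS_1 - RSS_2) / k) / ((RSS_1 + RSS_2) / (n1 + n2 - 2k)),
# where RSS_p is the pooled residual sum of squares and k the number of
# parameters per regime. All values below are hypothetical.
def chow_f(rss_pooled, rss1, rss2, n1, n2, k):
    num = (rss_pooled - (rss1 + rss2)) / k
    den = (rss1 + rss2) / (n1 + n2 - 2 * k)
    return num / den

f = chow_f(rss_pooled=1021.0, rss1=500.0, rss2=520.0, n1=735, n2=735, k=5)
print(round(f, 3))
```

A small F, as here and in the report's own result, means splitting the sample barely improves the fit, so the parameters are treated as stable.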
NOTE:
Owing to the size of the data set, Chi Square was not possible. Hence, it has been excluded above.