IIMB Term 1 DS Dean Answers

Easwaran Iyer for admitting students.

To identify the variables Easwaran should use for the analysis, let's run a regression analysis. The first step is to classify each variable by type and check whether any of its data points are missing:

Variable              Type          Missing values?
Gender                Categorical   No
Percent_SSC           Numerical     No
Board_SSC             Categorical   No
Percent_HSC           Numerical     No
Board_HSC             Categorical   No
Stream_HSC            Categorical   No
Percent_Degree        Numerical     No
Course_Degree         Categorical   No
Experience_Yrs        Interval      No
Entrance_Test         Categorical   Yes
Percentile_ET         Numerical     Yes
Percent_MBA           Numerical     No
Specialization_MBA    Categorical   No
Marks_Communication   Numerical     No
Marks_Projectwork     Numerical     No
Marks_BOCA            Numerical     No
Placement             Categorical   No
Salary                Numerical     No

The correlation matrix of all the numerical variables (except Salary, which is the dependent variable) is given below; it was computed in R.

Column key: Exp = Experience_Yrs, Proj = Marks_Projectwork, BOCA = Marks_BOCA, ET = Percentile_ET, HSC = Percent_HSC, Deg = Percent_Degree, Comm = Marks_Communication, MBA = Percent_MBA, SSC = Percent_SSC.

                      Exp    Proj   BOCA   ET     HSC    Deg    Comm   MBA    SSC
Experience_Yrs        100%   18%    14%    6%     5%     -6%    15%    21%    1%
Marks_Projectwork     18%    100%   18%    4%     13%    23%    28%    35%    9%
Marks_BOCA            14%    18%    100%   34%    15%    28%    19%    43%    29%
Percentile_ET         6%     4%     34%    100%   17%    9%     19%    30%    32%
Percent_HSC           5%     13%    15%    17%    100%   31%    23%    32%    33%
Percent_Degree        -6%    23%    28%    9%     31%    100%   35%    41%    33%
Marks_Communication   15%    28%    19%    19%    23%    35%    100%   73%    44%
Percent_MBA           21%    35%    43%    30%    32%    41%    73%    100%   49%
Percent_SSC           1%     9%     29%    32%    33%    33%    44%    49%    100%
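The matrix above was computed in R, whose cor() function uses the Pearson coefficient by default. As a hedged illustration of the same calculation, the sketch below computes pairwise Pearson correlations in plain Python; the five students' marks are hypothetical and are not taken from the case data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical marks for five students (illustrative only, not the case data).
data = {
    "Marks_Communication": [55, 62, 70, 58, 66],
    "Percent_MBA": [60, 65, 72, 59, 70],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    print(f"{a:>20} vs {b:<20} {r:6.0%}")
```

The matrix is symmetric with 100% on the diagonal, which is a quick sanity check on any correlation output.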

We also observed that a few data points are missing for Entrance_Test and Percentile_ET. However, since the analysis concerns the salary of placed students, and no data points are missing for the placed students, the regression uses only the observations of placed students.
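The filtering step described above can be sketched as follows. The records, the "Placed"/"Not Placed" coding of Placement, and the use of None for a missing value are all assumptions made for illustration.

```python
# Hypothetical records: None marks a missing value (illustrative data only).
students = [
    {"Placement": "Placed", "Entrance_Test": "MAT", "Percentile_ET": 55.0, "Salary": 270000},
    {"Placement": "Not Placed", "Entrance_Test": None, "Percentile_ET": None, "Salary": 0},
    {"Placement": "Placed", "Entrance_Test": "CAT", "Percentile_ET": 80.0, "Salary": 300000},
]


def count_missing(rows, fields):
    """Return {field: number of rows where the field is None}."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


# Keep only placed students, then confirm nothing is missing in that subset.
placed = [r for r in students if r["Placement"] == "Placed"]
fields = ["Entrance_Test", "Percentile_ET"]
print("all students:", count_missing(students, fields))
print("placed only: ", count_missing(placed, fields))
```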

With the preliminary steps done, let's estimate the regression parameters; we used StatTools for this. We also excluded the Placement variable, because it adds no new information: every observation in the regression is a placed student.
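StatTools estimates the coefficients by ordinary least squares (OLS). The sketch below is not StatTools' implementation; it is a minimal plain-Python illustration of OLS via the normal equations (X'X)b = X'y, assuming categorical variables such as Gender have already been coded as 0/1 dummy columns. The data are hypothetical and generated without noise, so the fit should recover the true coefficients up to rounding.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfect multicollinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(map(float, xi)) for xi in X]  # prepend a constant column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 (no noise).
X = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 3]]
y = [2 + 3 * a - b for a, b in X]
print(ols(X, y))
```

A singular X'X here corresponds to perfect multicollinearity, which is why the multicollinearity checks later in the analysis matter.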

The resulting regression table is:

Regression Table                           Coefficient   Standard Error   t-Value   p-Value
Constant                                   121087.1      49303.35         2.456     0.0147
Marks_Communication                        2620.877      638.9087         4.102     < 0.0001
Gender (F)                                 42246.35      12486.27         3.383     0.0008
Experience_Yrs                             20264.24      8333.768         2.432     0.0157
Specialization_MBA (Marketing & Finance)   10182.79      32686.56         0.312     0.7557
Specialization_MBA (Marketing & HR)        21459.02      33046.8          0.649     0.5167
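Each t-value in the regression table is the coefficient divided by its standard error (the test of whether that coefficient is zero). The short check below recomputes the t-values from the reported numbers. Converting t into a p-value also needs the t-distribution with the model's residual degrees of freedom, which this sketch does not attempt.

```python
# (coefficient, standard error, reported t-value) copied from the regression table.
table = {
    "Constant": (121087.1, 49303.35, 2.456),
    "Marks_Communication": (2620.877, 638.9087, 4.102),
    "Gender (F)": (42246.35, 12486.27, 3.383),
    "Experience_Yrs": (20264.24, 8333.768, 2.432),
    "Specialization_MBA (Marketing & Finance)": (10182.79, 32686.56, 0.312),
    "Specialization_MBA (Marketing & HR)": (21459.02, 33046.8, 0.649),
}

for name, (coef, se, reported_t) in table.items():
    t = coef / se  # t-statistic for H0: coefficient = 0
    print(f"{name:<42} t = {t:6.3f} (reported {reported_t})")
```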

However, the model diagnostics show that this model is not valid, because the errors are not normally distributed.

We therefore need to change the model. Let's try using ln(salary) as the dependent variable.
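A log transform often helps when salaries are right-skewed, because it compresses the long upper tail. The sketch below assumes, for illustration only, a hypothetical right-skewed set of salaries and compares the moment-based skewness before and after taking logs; note that the diagnosis in this analysis was made on the regression residuals, not on raw salaries.

```python
import math


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed salaries in INR (illustrative only, not the case data).
salaries = [200000, 220000, 240000, 250000, 260000, 280000, 300000, 450000, 940000]
log_salaries = [math.log(s) for s in salaries]

print(f"skewness of salary:     {skewness(salaries):.2f}")
print(f"skewness of ln(salary): {skewness(log_salaries):.2f}")
```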

The output with this change is:

Regression Table                           Coefficient   Standard Error   t-Value     p-Value
Constant                                   11.985        0.1542           77.712078   < 0.0001
Marks_Communication                        0.0086        0.002            4.3008496   < 0.0001
Gender (F)                                 -0.133        0.0392           -3.381449   0.0008
Course_Degree (Engineering)                0.1632        0.061            2.6734392   0.0080
Specialization_MBA (Marketing & Finance)   0.0417        0.1025           0.4064739   0.6847
Specialization_MBA (Marketing & HR)        -0.077        0.104            -0.736492   0.4621

The errors are now normal, as the histogram of residuals shows:

[Figure: Histogram of Residuals; y-axis: Frequency]

Multicollinearity Checking                 VIF           R-Square
Marks_Communication                        1.07762809    0.072036067
Gender (F)                                 1.082456251   0.076175135
Course_Degree (Engineering)                1.028763158   0.027958969
Specialization_MBA (Marketing & Finance)   8.513941538   0.882545588
Specialization_MBA (Marketing & HR)        8.570918457   0.88332639
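The VIF column follows from the R-Square column: VIF = 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The check below recomputes the VIFs from the reported R² values. Commonly cited rules of thumb treat VIFs above about 5 or 10 as a warning sign; here the two Specialization_MBA dummies (about 8.5) stand out, while the other predictors are close to 1.

```python
def vif(r_squared):
    """Variance inflation factor from the R-square of regressing one predictor on the others."""
    return 1.0 / (1.0 - r_squared)


# Auxiliary R-square values copied from the multicollinearity table.
aux_r2 = {
    "Marks_Communication": 0.072036067,
    "Gender (F)": 0.076175135,
    "Course_Degree (Engineering)": 0.027958969,
    "Specialization_MBA (Marketing & Finance)": 0.882545588,
    "Specialization_MBA (Marketing & HR)": 0.88332639,
}
for name, r2 in aux_r2.items():
    print(f"{name:<42} VIF = {vif(r2):.3f}")
```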

The residual vs. fit diagram also supports the model: there is no pattern, and the points are randomly scattered. Hence heteroscedasticity is absent, i.e., the errors are homoscedastic.

[Figure: Scatterplot of Fit vs ln(salary); x-axis: ln(salary), y-axis: Fit]

The output also shows that the Specialization_MBA variables are not statistically significant at the 5% significance level (95% confidence level): their p-values are 0.6847 and 0.4621.
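The selection rule can be written as a filter on the p-values at the 5% level. This is a minimal sketch using the p-values from the ln(salary) table, with "< 0.0001" stored as 0.0001 (an assumption that does not change the decision). Dropping all non-significant variables at once is a simplification; backward elimination instead removes one variable at a time and refits the model.

```python
ALPHA = 0.05  # 5% significance level, i.e. 95% confidence

# p-values from the ln(salary) regression table; "< 0.0001" is stored as 0.0001.
p_values = {
    "Marks_Communication": 0.0001,
    "Gender (F)": 0.0008,
    "Course_Degree (Engineering)": 0.0080,
    "Specialization_MBA (Marketing & Finance)": 0.6847,
    "Specialization_MBA (Marketing & HR)": 0.4621,
}

kept = [name for name, p in p_values.items() if p < ALPHA]
dropped = [name for name, p in p_values.items() if p >= ALPHA]
print("keep:", kept)
print("drop:", dropped)
```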

Dropping these variables gives the revised model:

Regression Table              Coefficient   Standard Error   t-Value     p-Value
Constant                      11.985        0.1542           77.712078   < 0.0001
Marks_Communication           0.0086        0.002            4.3008496   < 0.0001
Gender (F)                    -0.133        0.0392           -3.381449   0.0008
Course_Degree (Engineering)   0.1632        0.061            2.6734392   0.0080

Multicollinearity Checking    VIF           R-Square
Marks_Communication           1.07762809    0.072036067
Gender (F)                    1.082456251   0.076175135
Course_Degree (Engineering)   1.028763158   0.027958969

Thus, the dean should use Marks_Communication, Gender, and Course_Degree (whether or not the applicant holds an engineering degree) as the main decision variables.

The corresponding R-square, F-test, and standard error values are:

Stepwise Regression for ln(salary)

Summary     Multiple R   R-Square   Adjusted R-Square   Std. Err. of Estimate
            0.4053       0.1643     0.1479              0.280884336

ANOVA Table   Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio    p-Value
Explained     5                    3.9546           0.79092           10.02486   < 0.0001
Unexplained   255                  20.118           0.078896

The p-value corresponding to the F-ratio of about 10.02 is less than 0.0001, so the model is significant overall.
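The F-ratio and Multiple R can be cross-checked from the other reported numbers: the unexplained mean square equals the square of the standard error of estimate, F = MS(explained) / MS(unexplained), and Multiple R = √R². The sketch below uses only values from the summary and ANOVA output.

```python
import math

# Values copied from the stepwise summary and ANOVA output.
r_square = 0.1643
std_err_estimate = 0.280884336
ms_explained = 0.79092

ms_unexplained = std_err_estimate ** 2   # residual mean square = (std. error of estimate)^2
f_ratio = ms_explained / ms_unexplained  # F = MS(explained) / MS(unexplained)
multiple_r = math.sqrt(r_square)         # Multiple R = sqrt(R-square)

print(f"MS unexplained = {ms_unexplained:.6f}")
print(f"F-ratio        = {f_ratio:.3f}")
print(f"Multiple R     = {multiple_r:.4f}")
```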

at the time of admission. How should these

parameters be incorporated while building the

model?

Only variables that are available at the time of admission should be included in the analysis. The following table lists each variable and whether it is available at admission:

Variable              Available at admission?   Include?
Gender                Yes                       Yes
Percent_SSC           Yes                       Yes
Board_SSC             Yes                       Yes
Percent_HSC           Yes                       Yes
Board_HSC             Yes                       Yes
Stream_HSC            Yes                       Yes
Percent_Degree        Yes                       Yes
Course_Degree         Yes                       Yes
Experience_Yrs        Yes                       Yes
Entrance_Test         Yes                       Yes
Percentile_ET         Yes                       Yes
Percent_MBA           No                        No
Specialization_MBA    No                        No
Marks_Communication   No                        No
Marks_Projectwork     No                        No
Marks_BOCA            No                        No
Placement             No                        No
Salary                Dependent Variable        Dependent Variable
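One way to enforce this rule before fitting is to keep an explicit availability flag per variable and build the candidate predictor list from it, so that post-admission variables such as Marks_Communication cannot leak into the model. A minimal sketch, with the flags copied from the availability table (Salary is left out because it is the dependent variable):

```python
# Availability at admission, as listed in the table (Salary is the dependent variable).
available_at_admission = {
    "Gender": True, "Percent_SSC": True, "Board_SSC": True,
    "Percent_HSC": True, "Board_HSC": True, "Stream_HSC": True,
    "Percent_Degree": True, "Course_Degree": True, "Experience_Yrs": True,
    "Entrance_Test": True, "Percentile_ET": True,
    "Percent_MBA": False, "Specialization_MBA": False,
    "Marks_Communication": False, "Marks_Projectwork": False,
    "Marks_BOCA": False, "Placement": False,
}

# Only admission-time variables become candidate predictors.
candidates = [name for name, ok in available_at_admission.items() if ok]
print(len(candidates), "candidate predictors:", candidates)
```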

Regression output using only these admission-time variables:

Regression Table              Coefficient   Standard Error   t-Value     p-Value
Constant                      11.985        0.1542           77.712078   < 0.0001
Gender (F)                    -0.133        0.0392           -3.381449   0.0008
Course_Degree (Engineering)   0.1632        0.061            2.6734392   0.0080

Multicollinearity Checking    VIF           R-Square
Gender (F)                    1.082456251   0.076175135
Course_Degree (Engineering)   1.028763158   0.027958969

(percentage marks in different board exams) on the

salary earned at the time of graduation?

Regression output with SSC marks:

Regression Table    Coefficient   Standard Error   t-Value       p-Value
Constant            199867.5001   32318.03541      6.18439511    < 0.0001
Percent_SSC         1140.117736   486.0677216      2.345594421   0.0196

Thus, salary is directly related to SSC marks, and the effect is significant at the 5% level (p = 0.0196): for every one-percentage-point increase in SSC marks, salary increases by about INR 1,140.
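As a worked illustration of the fitted line Salary = 199867.5001 + 1140.117736 × Percent_SSC, the sketch below compares two hypothetical applicants with SSC scores of 60% and 70%. Because this model's errors are not normal, treat the result only as a point estimate.

```python
INTERCEPT = 199867.5001   # Constant from the SSC regression table
SLOPE_SSC = 1140.117736   # coefficient on Percent_SSC (INR per percentage point)


def predicted_salary(percent_ssc):
    """Predicted salary (INR) from the single-predictor SSC model."""
    return INTERCEPT + SLOPE_SSC * percent_ssc


# Hypothetical applicants with 60% and 70% in SSC.
low, high = predicted_salary(60), predicted_salary(70)
print(f"SSC 60%: INR {low:,.0f}")
print(f"SSC 70%: INR {high:,.0f}")
print(f"difference: INR {high - low:,.0f}")
```

A 10-point gap in SSC marks corresponds to roughly ten times the slope, i.e. about INR 11,400.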

However, this model is not valid, because its errors are not normally distributed, as the histogram of residuals shows. The solution above therefore holds only if we assume that the errors are normal.

[Figure: Histogram of Residuals; y-axis: Frequency]

Iyer for admitting students to the MBA program?

The final valid equation from the analysis depends on the applicant's Gender and Course_Degree. However, its R-square is only 16.43%.

The equation is:

$$\ln(\text{Salary}) = 11.985 - 0.133\,\text{Female\_Gender} + 0.1632\,\text{Engineering\_Degree}$$

Thus, salaries have been higher for male students and for engineering degree holders.
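Because the dependent variable is ln(Salary), a dummy coefficient b corresponds to a proportional salary change of e^b - 1 when the dummy switches from 0 to 1, holding the other variables in the model fixed; 100·b is only an approximation. The sketch below applies this to the two coefficients in the equation, giving about -12.5% for female applicants and about +17.7% for engineering graduates.

```python
import math

# Coefficients from the ln(salary) equation.
coefficients = {"Female_Gender": -0.133, "Engineering_Degree": 0.1632}

for name, b in coefficients.items():
    pct = math.exp(b) - 1.0  # exact % change in salary when the dummy goes from 0 to 1
    print(f"{name:<20} {pct:+.1%} (approximation 100*b = {b:+.1%})")
```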

Easwaran should therefore look out for applicants who are engineers and male. He should also be cautious, because these two variables explain only about 16% of the variation in salary. Other variables, not provided in the data, should therefore be explored; for example, salaries may depend on the applicants' incoming work function and industry.

