
1. Identify the variables that should be used by Easwaran Iyer for admitting students.
To identify the variables Easwaran should use, let's run a regression analysis, starting with a review of the data. The data has the following properties:


Variable               Type          Data points missing?
Gender                 Categorical   No
Percent_SSC            Numerical     No
Board_SSC              Categorical   No
Percent_HSC            Numerical     No
Board_HSC              Categorical   No
Stream_HSC             Categorical   No
Percent_Degree         Numerical     No
Course_Degree          Categorical   No
Experience_Yrs         Interval      No
Entrance_Test          Categorical   Yes
Percentile_ET          Numerical     Yes
Percent_MBA            Numerical     No
Specialization_MBA     Categorical   No
Marks_Communication    Numerical     No
Marks_Projectwork      Numerical     No
Marks_BOCA             Numerical     No
Placement              Categorical   No
Salary                 Numerical     No
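Categorical variables such as Gender, Course_Degree and Specialization_MBA enter a regression as 0/1 dummy columns, one per category other than a baseline; this is why the regression tables have rows like "Gender (F)". A minimal sketch of that encoding in Python, assuming the hypothetical category labels shown (they are illustrative, not read from the data set):

```python
def dummy_encode(values, baseline):
    """Return one 0/1 column per category other than `baseline`.

    Columns are keyed by category name; the baseline category gets no
    column because its effect is absorbed into the regression constant.
    """
    categories = sorted(set(values) - {baseline})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical Gender values for five applicants, with M as the baseline.
gender = ["M", "F", "M", "F", "M"]
print(dummy_encode(gender, baseline="M"))  # {'F': [0, 1, 0, 1, 0]}
```

The choice of baseline only changes how the coefficients are read (each dummy is measured relative to the baseline), not the fitted values.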

The correlation matrix of all the numerical variables (except Salary, the dependent variable) is given below; it was computed in R.

                           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
(1) Experience_Yrs        100%    18%    14%     6%     5%    -6%    15%    21%     1%
(2) Marks_Projectwork      18%   100%    18%     4%    13%    23%    28%    35%     9%
(3) Marks_BOCA             14%    18%   100%    34%    15%    28%    19%    43%    29%
(4) Percentile_ET           6%     4%    34%   100%    17%     9%    19%    30%    32%
(5) Percent_HSC             5%    13%    15%    17%   100%    31%    23%    32%    33%
(6) Percent_Degree         -6%    23%    28%     9%    31%   100%    35%    41%    33%
(7) Marks_Communication    15%    28%    19%    19%    23%    35%   100%    73%    44%
(8) Percent_MBA            21%    35%    43%    30%    32%    41%    73%   100%    49%
(9) Percent_SSC             1%     9%    29%    32%    33%    33%    44%    49%   100%

We also observed that a few data points are missing for Entrance_Test and Percentile_ET. However, our analysis concerns the salary of placed students, and when we examined the data we found that no data points are missing for placed students. We therefore ran the regression only on the records of placed students.
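The filtering step above amounts to keeping the placed students and confirming that the fields with gaps are complete for them. A minimal sketch, assuming each record is a dict and that placed students carry the hypothetical label "Placed" in the Placement field:

```python
def placed_complete_rows(records, fields):
    """Keep placed students and count missing (None) values per field among them."""
    placed = [r for r in records if r.get("Placement") == "Placed"]
    missing = {f: sum(1 for r in placed if r.get(f) is None) for f in fields}
    return placed, missing


# Hypothetical records; the "Placed"/"Not Placed" labels and test names are assumptions.
records = [
    {"Placement": "Placed", "Entrance_Test": "A", "Percentile_ET": 80.0, "Salary": 270000},
    {"Placement": "Not Placed", "Entrance_Test": None, "Percentile_ET": None, "Salary": 0},
    {"Placement": "Placed", "Entrance_Test": "B", "Percentile_ET": 92.5, "Salary": 310000},
]
placed, missing = placed_complete_rows(records, ["Entrance_Test", "Percentile_ET"])
print(len(placed), missing)  # 2 {'Entrance_Test': 0, 'Percentile_ET': 0}
```

If any count in `missing` were non-zero, those rows would need to be dropped or imputed before fitting.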
After these preliminary steps, let's estimate the regression parameters using StatTools. We excluded the Placement variable, as it adds no new information once only placed students are considered. The result is:

Regression Table                            Coefficient   Standard Error   t-Value   p-Value
Constant                                       121087.1         49303.35     2.456   0.0147
Marks_Communication                            2620.877         638.9087     4.102   < 0.0001
Gender (F)                                     42246.35         12486.27     3.383   0.0008
Experience_Yrs                                 20264.24         8333.768     2.432   0.0157
Specialization_MBA (Marketing & Finance)       10182.79         32686.56     0.312   0.7557
Specialization_MBA (Marketing & HR)            21459.02          33046.8     0.649   0.5167
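Each t-value in the table is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of that t-value. A minimal sketch that recomputes them from the table's coefficients and standard errors; as an assumption, it uses the standard normal distribution in place of the exact t distribution, which is close when there are a few hundred observations, so the p-values agree only approximately with StatTools:

```python
from statistics import NormalDist

# (coefficient, standard error) pairs copied from the regression table above.
rows = {
    "Constant": (121087.1, 49303.35),
    "Marks_Communication": (2620.877, 638.9087),
    "Gender (F)": (42246.35, 12486.27),
    "Experience_Yrs": (20264.24, 8333.768),
    "Specialization_MBA (Marketing & Finance)": (10182.79, 32686.56),
    "Specialization_MBA (Marketing & HR)": (21459.02, 33046.8),
}


def t_and_p(coef, se):
    """t = coef / se; two-sided p-value from a normal approximation to the t distribution."""
    t = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


for name, (coef, se) in rows.items():
    t, p = t_and_p(coef, se)
    print(f"{name:42s} t = {t:6.3f}   p ~ {p:.4f}")
```

A p-value above 0.05 (as for both Specialization_MBA rows) means the coefficient is not significantly different from zero at the 5% level.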

However, diagnosing the model shows that it is not valid, because the errors are not normal.

Thus, we need to change the model. Let's try using ln(salary) as the dependent variable.
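Salaries are typically right-skewed, and taking logs pulls in the long right tail. The author judged normality from a histogram of residuals; one common numeric alternative (not the author's method) is the Jarque-Bera statistic, which combines skewness and excess kurtosis and is near zero for normal data. A minimal sketch, applied for illustration to a hypothetical list of raw salaries rather than to model residuals:

```python
import math


def jarque_bera(xs):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4), using moments about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


# Hypothetical right-skewed salaries in INR (not the case data).
salaries = [200000, 220000, 240000, 250000, 260000, 300000, 400000, 900000]
print("JB, salary:    ", round(jarque_bera(salaries), 2))
print("JB, ln(salary):", round(jarque_bera([math.log(s) for s in salaries]), 2))
```

In practice the statistic would be computed on the regression residuals, and a smaller value after the transform is what "the errors are now closer to normal" means numerically.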
The output with this change is:

                                                                                      Multicollinearity Checking
Regression Table                          Coefficient  Standard Error  t-Value     p-Value    VIF          R-Square
Constant                                       11.985          0.1542  77.712078   < 0.0001
Marks_Communication                            0.0086           0.002  4.3008496   < 0.0001   1.07762809   0.072036067
Gender (F)                                     -0.133          0.0392  -3.381449   0.0008     1.082456251  0.076175135
Course_Degree (Engineering)                    0.1632           0.061  2.6734392   0.0080     1.028763158  0.027958969
Specialization_MBA (Marketing & Finance)       0.0417          0.1025  0.4064739   0.6847     8.513941538  0.882545588
Specialization_MBA (Marketing & HR)            -0.077           0.104  -0.736492   0.4621     8.570918457  0.88332639

Let's check the validity of the model.

The errors are now normal, as the histogram of residuals shows:

[Figure: Histogram of Residuals; y-axis: Frequency]

The residual-fit diagram also supports the model: there is no pattern, and the points are randomly scattered. Clearly, heteroscedasticity is absent, i.e. the errors are homoscedastic.

[Figure: Scatterplot of Fit vs ln(salary); x-axis: ln(salary), y-axis: Fit]

In addition, the output shows that the Specialization_MBA variables are not statistically significant at the 5% significance level (p-values of 0.6847 and 0.4621). After dropping them, the new model is:
                                                                           Multicollinearity Checking
Regression Table               Coefficient  Standard Error  t-Value     p-Value    VIF          R-Square
Constant                            11.985          0.1542  77.712078   < 0.0001
Marks_Communication                 0.0086           0.002  4.3008496   < 0.0001   1.07762809   0.072036067
Gender (F)                          -0.133          0.0392  -3.381449   0.0008     1.082456251  0.076175135
Course_Degree (Engineering)         0.1632           0.061  2.6734392   0.0080     1.028763158  0.027958969
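In the multicollinearity check, each R-Square is the R-square from regressing one predictor on all the other predictors, and the VIF (variance inflation factor) follows from it as VIF = 1 / (1 - R-square). A minimal sketch that reproduces the VIF column from the R-Square column of the ln(salary) model with both Specialization_MBA dummies:

```python
def vif(r_squared):
    """Variance inflation factor from the auxiliary-regression R-square of one predictor."""
    return 1.0 / (1.0 - r_squared)


# R-Square values copied from the multicollinearity check of the ln(salary) model.
aux_r2 = {
    "Marks_Communication": 0.072036067,
    "Gender (F)": 0.076175135,
    "Course_Degree (Engineering)": 0.027958969,
    "Specialization_MBA (Marketing & Finance)": 0.882545588,
    "Specialization_MBA (Marketing & HR)": 0.88332639,
}
for name, r2 in aux_r2.items():
    print(f"{name:42s} VIF = {vif(r2):.4f}")
```

A common rule of thumb treats a VIF above about 5 to 10 as a sign of multicollinearity; the two Specialization_MBA dummies (VIF about 8.5) are the only ones near that range, while the retained predictors stay close to 1.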

Thus, the dean should use communication marks, gender and course degree (whether or not it is engineering) as the main decision variables. The corresponding R-square, F-test and standard-error values are:
Stepwise Regression for ln(salary)

Summary        Multiple R   R-Square   Adjusted R-square   Std. Err. of Estimate
               0.4053       0.1643     0.1479              0.280884336

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F-Value
Explained      5                    3.9546           0.79092           10.02486
Unexplained    255                  20.118           0.078896

The p-value corresponding to the F-value of about 10 is less than 0.0001, so the model is statistically significant as a whole.
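The summary figures are tied together by the ANOVA table: F is the ratio of the two mean squares, R-square is the explained share of the total sum of squares, and adjusted R-square penalizes the number of predictors. A minimal sketch of that arithmetic, assuming the ANOVA values as read from the stepwise output (explained: 3.9546 on 5 degrees of freedom; unexplained: about 20.118 on 255 degrees of freedom):

```python
def anova_summary(ss_explained, df_explained, ss_unexplained, df_unexplained):
    """Return (F, R-square, adjusted R-square) from a regression ANOVA table."""
    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained
    f_value = ms_explained / ms_unexplained
    r_squared = ss_explained / (ss_explained + ss_unexplained)
    # n - 1 = df_explained + df_unexplained, and n - k - 1 = df_unexplained.
    n_minus_1 = df_explained + df_unexplained
    adj_r_squared = 1 - (1 - r_squared) * n_minus_1 / df_unexplained
    return f_value, r_squared, adj_r_squared


f_value, r2, adj_r2 = anova_summary(3.9546, 5, 20.118, 255)
print(f"F = {f_value:.3f}, R-square = {r2:.4f}, adjusted R-square = {adj_r2:.4f}")
```

With these inputs the results match the reported F of about 10.02, R-square of 0.1643 and adjusted R-square of 0.1479, and the square root of the unexplained mean square matches the standard error of estimate (about 0.2809).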

2. Parameters such as MBA marks will not be available at the time of admission. How should these parameters be incorporated while building the model?
Only variables that are available at the time of admission should be included in the analysis. The following table lists each variable and whether it is available at the time of admission.
Variable               Available at time of admission?   Include?
Gender                 Yes                               Yes
Percent_SSC            Yes                               Yes
Board_SSC              Yes                               Yes
Percent_HSC            Yes                               Yes
Board_HSC              Yes                               Yes
Stream_HSC             Yes                               Yes
Percent_Degree         Yes                               Yes
Course_Degree          Yes                               Yes
Experience_Yrs         Yes                               Yes
Entrance_Test          Yes                               Yes
Percentile_ET          Yes                               Yes
Percent_MBA            No                                No
Specialization_MBA     No                                No
Marks_Communication    No                                No
Marks_Projectwork      No                                No
Marks_BOCA             No                                No
Placement              No                                No
Salary                 Dependent Variable                Dependent Variable
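The screening rule in the table can be applied mechanically before any model is fitted: keep only predictors known at admission. A minimal sketch, with the availability flags copied from the table (Salary is left out because it is the dependent variable):

```python
# Availability at the time of admission, copied from the table above.
available_at_admission = {
    "Gender": True, "Percent_SSC": True, "Board_SSC": True,
    "Percent_HSC": True, "Board_HSC": True, "Stream_HSC": True,
    "Percent_Degree": True, "Course_Degree": True, "Experience_Yrs": True,
    "Entrance_Test": True, "Percentile_ET": True,
    "Percent_MBA": False, "Specialization_MBA": False,
    "Marks_Communication": False, "Marks_Projectwork": False,
    "Marks_BOCA": False, "Placement": False,
}

# Candidate predictors for an admission-time model.
candidates = [name for name, ok in available_at_admission.items() if ok]
print(candidates)
```

Selection methods such as stepwise regression would then be run only over `candidates`, so no post-admission variable can enter the model.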

Thus, after removing communication marks, the new model is as follows:


                                                                           Multicollinearity Checking
Regression Table               Coefficient  Standard Error  t-Value     p-Value    VIF          R-Square
Constant                            11.985          0.1542  77.712078   < 0.0001
Gender (F)                          -0.133          0.0392  -3.381449   0.0008     1.082456251  0.076175135
Course_Degree (Engineering)         0.1632           0.061  2.6734392   0.0080     1.028763158  0.027958969

3. What is the impact of academic performance (percentage marks in different board exams) on the salary earned at the time of graduation?
Regression output with SSC marks:

Regression Table   Coefficient    Standard Error   t-Value       p-Value
Constant           199867.5001    32318.03541      6.18439511    < 0.0001
Percent_SSC        1140.117736    486.0677216      2.345594421   0.0196

Thus, salary is positively related to SSC marks: for every one-percentage-point increase in SSC marks, salary increases by about 1,140 INR. However, this model is not valid, because the errors are not normal; the result above holds only if we assume that the errors are normal.

[Figure: Histogram of Residuals; y-axis: Frequency]

4. What are your final recommendations to Easwaran Iyer for admitting students to the MBA program?
The final valid equation from our analysis depends on the applicant's gender and course degree. However, its R-square value is only 16.43%.
The equation is:

$$\ln(\text{Salary}) = 11.985 - 0.133\,\text{Female\_Gender} + 0.1632\,\text{Engineering\_Degree}$$
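Because the dependent variable is ln(Salary), each dummy coefficient b acts multiplicatively: holding the other variable fixed, salary changes by a factor of exp(b), i.e. by (exp(b) - 1) x 100 percent. A minimal sketch using the coefficients of the equation above; note that exponentiating a predicted log salary gives a median-type estimate and ignores retransformation bias:

```python
import math

# Coefficients copied from the final ln(Salary) equation.
B0, B_FEMALE, B_ENGINEERING = 11.985, -0.133, 0.1632


def predicted_salary(female, engineering):
    """Back-transformed prediction exp(ln-scale fit); female and engineering are 0/1."""
    return math.exp(B0 + B_FEMALE * female + B_ENGINEERING * engineering)


def pct_effect(coef):
    """Percentage change in salary implied by a dummy coefficient on the log scale."""
    return (math.exp(coef) - 1) * 100


print(f"Female vs male:       {pct_effect(B_FEMALE):+.1f}%")       # about -12.5%
print(f"Engineering vs other: {pct_effect(B_ENGINEERING):+.1f}%")  # about +17.7%
```

So, other things equal, the model associates an engineering degree with roughly 18% higher salary and female gender with roughly 12% lower salary.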

Thus, salaries have been higher for male students and for engineering degree holders, so Easwaran should keep an eye out for applicants who are male engineers. However, Easwaran needs to be cautious, because these two variables explain only about 16% of the variation in ln(salary). Other variables, which are not provided in the data, need to be explored; for example, salaries may depend on the incoming work function and the industry of the applicants.