
Graduate School Application Advisor Based on Neural Classification System

Article  in  Advances in Intelligent Systems and Computing · January 2014


DOI: 10.1007/978-81-322-1768-8_80




Devarsh Bhonde, T. Sri Kalyan and Hari Sai Krishna Kanth

Abstract Neural classification systems are widely used in many fields for making
logical decisions. This paper proposes a neural classification system based on the
back-propagation algorithm as an advisory model for graduate school
admissions. It uses real and synthetically generated data to advise students
about the group of graduate schools where they have the highest probability of
being selected. The system takes into consideration the important aspects of
the student’s application, namely the GPA, GRE score, number of publications,
professor recommendation, parent institute rating and work experience, in order
to suggest the group of potential schools. A new parameter named Student Rating
Index (SRI) is also defined for a better representation of the quality of the
professor recommendation. The system comprises a two-layer feed-forward network
with sigmoid hidden and output neurons to classify the data sets. The results are
verified using the mean square error, the Receiver Operating Characteristic (ROC)
curve and confusion matrices. The verification indicates that the proposed system
is accurate and reliable. Thus the proposed advisory system can be
used by students to make more focused applications to graduate schools.

Keywords Neural classification · Graduate school application · Advisory model ·
Back propagation algorithm

1 Introduction

Across the globe, there may be numerous schools offering a particular graduate
program. Students generally shortlist graduate schools based on their rankings
and on the recommendations of peers currently studying in those schools. There
are discussion forums and admission counselors available for admission advice, but
no advisory model exists to suggest the graduate schools where a student’s
chances of being selected are best. The total cost of completing the
application procedure for several potential graduate schools can be very high.
In this project a neural classification system based on the back-propagation
algorithm is proposed to advise students about the group of graduate schools where
they have the maximum probability of being selected. The system takes into
consideration the important aspects of an application, namely the Grade Point
Average (GPA) of the student, GRE score, professor recommendation, number of
publications, parent institute rating and work experience. A new
parameter named Student Rating Index (SRI) is defined for a better representation
of the quality of the professor recommendation. The system is trained on data
available from prominent sources and it predicts the suitability of the student for
selection into each defined group of graduate schools. The proposed system is
useful for students to make more focused applications to the graduate schools
where their chances of being selected are high. It may also be used to reduce
unnecessary expenditure on application costs for the group of graduate schools
where their probability of selection is low.

D. Bhonde (corresponding author) · T. Sri Kalyan · H. S. Krishna Kanth
Department of Civil Engineering, Indian Institute of Technology Kharagpur,
Kharagpur, India
e-mail: devarshbhonde@gmail.com

M. Pant et al. (eds.), Proceedings of the Third International Conference on Soft
Computing for Problem Solving, Advances in Intelligent Systems and Computing 259,
DOI: 10.1007/978-81-322-1768-8_80, © Springer India 2014

2 Neural Classification System

A neural network classification system classifies cases into a set
of target categories based on the input parameters that represent each case.
It has a wide variety of applications in market forecasting, mortgage screening,
loan advising etc. A graduate application consists of various parameters that
represent the applicant’s academic and research performance during undergraduate
study. Due to the large number of input parameters available, a neural
classification system based on the back-propagation algorithm using real and
synthetic data is developed. Artificial neural networks analyze data sets one by
one, and learn by comparing the predicted classification of each data set with its
actual classification. The errors calculated from the initial classification of the
first set are fed back into the network and used to adjust the network’s weights
before the next pass, and this process is repeated for n iterations. This process of
learning from the error and updating the model is the basis of the back-propagation
algorithm.
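The error-feedback loop described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the MATLAB tool used in this paper: the function name `train_backprop`, the toy data, the layer sizes, the learning rate and the use of full-batch gradient descent are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=4, lr=1.0, n_iter=1000, seed=0):
    """Full-batch gradient descent on the MSE of a two-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    losses = []
    for _ in range(n_iter):
        # Forward pass: predict a classification for every data set.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        err = Y - T
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error back through both layers.
        dZ2 = (2.0 * err / err.size) * Y * (1.0 - Y)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        # Update the weights using the propagated error.
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return losses

# Toy data (hypothetical): the class is 1 when the two inputs sum to more than 1.
rng = np.random.default_rng(1)
X = rng.random((200, 2))
T = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)
losses = train_backprop(X, T)
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Each iteration repeats the same cycle of predicting, measuring the error and correcting the weights, which is the sense in which the network learns from its errors.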

2.1 Model Formulation

The model is formulated based on the back-propagation algorithm using MATLAB’s
Neural Network Pattern Recognition Tool. This tool helps in developing a neural
network to classify inputs into a set of target classes. The tool employs a two-layer
feed-forward network, with sigmoid hidden and output neurons to classify the data
sets.

Fig. 1 Schematic of neural network (Source MATLAB Neural Network Pattern Recognition
toolbox)

The model is built from a total of 1,000 data sets, each with 6 input parameters.
The network has 10 neurons in the hidden layer and 3 neurons in the output layer,
as shown in Fig. 1. In the model 70 % of the data sets are used for training, 15 %
are used for validation and the remaining 15 % are used for testing.
The input parameters used for each data set are:
1. Grade Point Average (GPA) of the student
2. Graduate Record Examination (GRE) score
3. Student Rating Index
4. Number of Publications
5. Parent Institute Rating
6. Work Experience.
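The 70/15/15 division of the 1,000 data sets can be illustrated with a short Python sketch. The MATLAB tool performs this division internally; the function name `split_indices` and the fixed random seed below are assumptions introduced only for this example.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly divide sample indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]   # whatever remains is used for testing
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```

The resulting set sizes match the sample counts reported for training, validation and testing in Table 2.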

2.2 Result Representation

The proposed neural classification system takes the input parameters for an
applicant into consideration and suggests the group of schools suitable for the
applicant’s profile. In order to group the schools, three grades of schools, namely
Grade A, Grade B and Grade C, are defined based on the ranking of schools published
by the ranking organization QS Rankings (http://www.topuniversities.com/).
Grade A schools are defined as the graduate schools ranked between 1
and 30, Grade B schools comprise the schools ranked between 31 and 70,
and Grade C schools are defined as the schools ranked between 71 and 100.
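The grade definitions above amount to a simple lookup on the QS rank, which might be written as follows. The function name `school_grade` is hypothetical, and rejecting ranks outside 1 to 100 is an assumption, since the paper does not define a grade for them.

```python
def school_grade(qs_rank):
    """Map a QS ranking position (1 to 100) to the school grade used in this paper."""
    if not 1 <= qs_rank <= 100:
        # Assumption: schools ranked outside the top 100 are not graded.
        raise ValueError("grades are only defined for ranks 1 to 100")
    if qs_rank <= 30:
        return "A"
    if qs_rank <= 70:
        return "B"
    return "C"

print(school_grade(12), school_grade(45), school_grade(88))  # A B C
```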

3 Input Data Collection

The input data sets required for training and formulating the neural network are
collected from prominent sources or generated synthetically on the basis of
observed trends. The data used to represent each input parameter are explained
in detail below.

Fig. 2 Histogram depicting the GPA distribution among the applicants (x-axis: GPA
of the applicant, 6 to 10; y-axis: frequency)

3.1 Grade Point Average (GPA) of the Student

The GPA of a student is one of the most influential parameters in deciding the
outcome of the application and hence is employed in the formulation of the
proposed neural network model. The GPAs of students studying at various graduate
schools are collected from the online discussion forums Gradcafe
(http://forum.thegradcafe.com/) and Edulix (http://www.edulix.com/forum/index.php).
From the data collected, it is observed that the GPA varies over the range of 6–10
on a 10 point scale. A histogram representing the range of GPA used for data
description is shown in Fig. 2.

3.2 Graduate Record Examination (GRE) Score

The GRE score is a prerequisite for many universities and hence forms an essential
part of every application. The GRE scores of previous years’ applicants are also
collected from the online discussion forums Gradcafe
(http://forum.thegradcafe.com/) and Edulix (http://www.edulix.com/forum/index.php).
The scores observed in the data set range from 300 to 340, with most lying between
315 and 325.

3.3 Student Rating Index

A new parameter called the Student Rating Index is defined to represent the value
of the professor recommendation more accurately. This index depends on the rating
of the student given by the professor and also on the reputation of the professor
in their field of research. The h-index of a professor measures the productivity
and impact of their published work. The h-index values for various professors are
collected from the citation database Scopus (https://www.scopus.com/home.url).
The final student rating index, ranging from 0 to 1, is formulated based on the
following equation:

\[ \text{Student rating index} = 0.7 \times (\text{rating of student}) + 0.3 \times (\text{normalized h-index}) \tag{1} \]

where the rating of the student is on a scale of 0–1 and the normalized h-index
is also on a scale of 0–1. It is evident from Eq. 1 that more weight is
given to the professor’s rating of the student than to the professor’s h-index.

Fig. 3 Histogram showing the distribution of no. of publications among the
applicants (x-axis: No. of publications, 1 to 6 and more; y-axis: frequency)
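As a worked illustration of Eq. 1, the index can be computed with a small Python function. The function name `student_rating_index` and the sample values are hypothetical; the paper does not state how the h-index is normalized, so the sketch assumes the caller supplies a value already scaled to 0–1.

```python
def student_rating_index(rating, normalized_h_index):
    """Eq. 1: weighted sum of the professor's rating and the normalized h-index."""
    for value in (rating, normalized_h_index):
        if not 0.0 <= value <= 1.0:
            raise ValueError("both inputs must lie on a 0-1 scale")
    return 0.7 * rating + 0.3 * normalized_h_index

# Hypothetical applicant: rated 0.9 by a professor whose normalized h-index is 0.5.
sri = student_rating_index(0.9, 0.5)
print(round(sri, 2))  # 0.78
```

Because both inputs lie between 0 and 1 and the weights sum to 1, the index itself always lies between 0 and 1.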

3.4 Number of Publications

The publications of an applicant represent their research background and are given
a high weight while judging different applications. Owing to the inherent
difficulty in collecting real world data for the number of publications, it is
assumed that it follows a normal probability distribution function with a mean of
three (μ = 3) and a standard deviation of one (σ = 1). The histogram depicting the
distribution can be seen in Fig. 3.
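Synthetic publication counts of this kind could be generated as in the following Python sketch. The paper does not say how continuous normal samples are turned into counts, so rounding to the nearest integer and clipping at zero are assumptions, as are the sample size and the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from a normal distribution with mean 3 and standard deviation 1.
raw = rng.normal(loc=3.0, scale=1.0, size=1000)

# Assumption: counts are obtained by rounding and disallowing negative values.
publications = np.clip(np.rint(raw), 0, None).astype(int)

values, counts = np.unique(publications, return_counts=True)
for value, count in zip(values, counts):
    print(f"{value} publications: {count} applicants")
```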

3.5 Parent Institute Rating

The reputation of an applicant’s parent institute is considered an influential
parameter in the selection of an application, as it represents the level of
competition the applicant experienced and the depth of exposure to their field. The
rating values of various universities are collected from the ranking organization
QS Rankings (http://www.topuniversities.com/).

Fig. 4 Histogram depicting the distribution of work experience among the
applicants (x-axis: Work Experience (years), 0.5 to 3; y-axis: frequency)

3.6 Work Experience

Work experience represents the first-hand application of a student’s knowledge in
a given field, which can be in the form of a job or an internship. Large data sets
of work experience for different applicants were not available in the resources,
hence synthetic data is generated for its representation. On observing the
available trends of work experience for various applicants, a skewed distribution
such as the lognormal probability distribution function is found to be more
suitable to represent this parameter, since most applicants have work experience
ranging from 1 to 2 years and there are fewer applicants with longer work
experience. The lognormal distribution used for its representation can be seen in
Fig. 4.
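A lognormal sample of this shape could be drawn as in the sketch below. The parameters of the distribution are not reported in this paper, so the median of 1.5 years and the log-scale standard deviation of 0.5 are purely illustrative assumptions, as are the sample size and seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters (not from the paper): median 1.5 years, log-scale sigma 0.5.
ASSUMED_MEDIAN_YEARS = 1.5
ASSUMED_SIGMA = 0.5

work_experience = rng.lognormal(
    mean=np.log(ASSUMED_MEDIAN_YEARS), sigma=ASSUMED_SIGMA, size=1000
)

share_1_to_2 = np.mean((work_experience >= 1.0) & (work_experience <= 2.0))
print(f"share of applicants with 1 to 2 years: {share_1_to_2:.2f}")
print(f"mean {work_experience.mean():.2f}, median {np.median(work_experience):.2f}")
```

The mean exceeding the median reflects the right skew that motivates the choice of a lognormal distribution: a few applicants with long work experience pull the mean upward.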

4 Example Application

The contribution of the applicant’s GPA, GRE score, number of publications,
parent institute rating, work experience and professor recommendation (which is
reflected in the student rating index) in determining the probable grade of schools
to which they can be admitted can be seen in Table 1.

Table 1 Sample results for the proposed model

Sr.  GPA   GRE    No. of        Parent     Work        Student  Suitability of admission  Probable
No.        score  publications  institute  experience  rating   into grade                grade of
                                rating                 index    A      B      C           school
1    9     316    2             3          4           0.8      1      0      0           A
2    6.7   313    0             5          0           0.6      0      0      1           C
3    8.2   320    1             3          2           0.7      0      0.998  0.002       B
4    8.76  327    2             4          1           0.9      1      3E-04  0           A
5    8     315    1             4          1           0.75     0      0.624  0.376       B
6    8.5   320    2             4          2           0.75     0.57   0.43   0           A

In the first example application, the student has a high GPA, two publications and
good work experience. Even though the applicant’s GRE score may be termed low,
the rest of the profile is strong enough for a top-tier university. The
system takes these factors into consideration and suggests that the applicant has
a high probability of being selected by Grade A schools. Similar trends are
observed in the real world data, where the academic profile is given higher
importance than the GRE score. In the second case, the applicant has a weak profile
(low GPA and other important factors) and hence is advised to apply to Grade C
schools, with a suitability of 1. In the third case, the student has a good profile
with a moderate GPA; as a result, the system suggests applying to Grade B schools
to improve the chances of selection. In the fourth case, the overall profile of the
applicant is strong, which is in turn reflected in the student rating index,
hence the system recommends applying to Grade A schools. In the fifth
case, the applicant has an above-average profile, which is more appropriate for
Grade B and Grade C schools. The system predicts the suitability of the
student for admission into Grade B and Grade C schools as 0.624 and 0.376
respectively, so the student is advised to apply to both Grade B and Grade C
schools for best results. Similar results are observed for the sixth case, where the
applicant’s profile is a border case between Grade A and Grade B schools. The
model predicts the suitability for admission to Grade A and Grade B
schools as 0.57 and 0.43 respectively. Hence the results obtained are in agreement
with the statistics available from various universities regarding their graduate
admissions.
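The way the probable grade in Table 1 follows from the three suitability values can be sketched as below. Taking the grade with the largest suitability is an assumption that happens to reproduce every row of Table 1; the paper does not state the rule explicitly. The helper `grades_worth_applying` and its 0.3 cut-off are hypothetical, added only to mirror the advice given for the border cases.

```python
GRADES = ("A", "B", "C")

def probable_grade(suitability):
    """Return the grade whose suitability value is largest."""
    best = max(range(len(GRADES)), key=lambda i: suitability[i])
    return GRADES[best]

def grades_worth_applying(suitability, min_suitability=0.3):
    """Hypothetical rule: list every grade whose suitability reaches the cut-off."""
    return [g for g, s in zip(GRADES, suitability) if s >= min_suitability]

# Suitability values (A, B, C) for the six cases in Table 1.
table_1 = [
    (1, 0, 0),
    (0, 0, 1),
    (0, 0.998, 0.002),
    (1, 3e-4, 0),
    (0, 0.624, 0.376),
    (0.57, 0.43, 0),
]

for case, suitability in enumerate(table_1, start=1):
    print(case, probable_grade(suitability), grades_worth_applying(suitability))
```

For the fifth and sixth cases the second helper returns two grades, matching the advice to apply to both groups of schools.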

4.1 Verification of Results

The performance of the system is verified by determining the mean square errors,
the Receiver Operating Characteristic (ROC) curve and by computing the confusion
matrix.
The Mean Square Error (MSE). It is the average squared difference between
the targets and the outputs, which indicates how accurate a model is. The MSE
values obtained in the model during training, validation and testing are 4.28e-3,
6.65e-3 and 4.83e-3 respectively (as shown in Table 2), indicating that the
model is very accurate. The percent error, which indicates the fraction of samples
misclassified, has a very low value of 6.67e-1 percent (about 0.67 %) on the
validation and testing sets. This means that the system misclassifies fewer than
7 in 1,000 cases (an accuracy of 99.33 %), thereby supporting the accuracy and
the reliability of the results.
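Both quantities can be computed directly from the targets and the network outputs, as in the following Python sketch. The function names and the toy data are hypothetical, and averaging the squared error over every output element is an assumption about how the reported MSE is defined. The toy set has 150 samples with a single misclassification, which gives the same percent error as the validation and testing rows of Table 2.

```python
import numpy as np

def mean_square_error(targets, outputs):
    """Average squared difference between targets and outputs over all elements."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return float(np.mean((targets - outputs) ** 2))

def percent_error(targets, outputs):
    """Percentage of samples whose largest output is not the target class."""
    true_class = np.argmax(targets, axis=1)
    predicted_class = np.argmax(outputs, axis=1)
    return float(100.0 * np.mean(true_class != predicted_class))

# Hypothetical set: 150 one-hot targets over 3 grades, one sample misclassified.
targets = np.eye(3)[np.arange(150) % 3]
outputs = targets.copy()
outputs[0] = [0.1, 0.8, 0.1]   # true grade A (index 0), predicted grade B

print(f"percent error: {percent_error(targets, outputs):.4f}")
print(f"MSE: {mean_square_error(targets, outputs):.6f}")
```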
Table 2 The mean square error and the percentage error for the proposed system (Source
MATLAB Neural Network Pattern Recognition toolbox)

Process      Samples   MSE          Error (%)
Training     700       4.27709e-3   0
Validation   150       6.64836e-3   6.66666e-1
Testing      150       4.83072e-3   6.66666e-1

Fig. 5 Receiver operating characteristic curve for the proposed system (Source MATLAB
Neural Network Pattern Recognition toolbox)

Fig. 6 Confusion matrices for the proposed system (Source MATLAB Neural Network Pattern
Recognition toolbox)

Receiver Operating Characteristic (ROC) curve. Another useful diagnostic
tool for assessing the accuracy of the model is the Receiver Operating
Characteristic (ROC) curve. If threshold values in the range of 0–1 are applied to
the output for each class of the classifier, the ROC curve plots the true positive
rate against the false positive rate. The true positive rate is the ratio of the
number of output values greater than or equal to the threshold, among targets
having a value of 1, to the number of targets having a value of 1. The false
positive rate is the ratio of the number of output values greater than or equal to
the threshold, among targets having a value of 0, to the number of targets having
a value of 0. This curve represents the inherent capacity of the model to
discriminate between different classes of outputs. From Fig. 5 it can be seen that
the curves pass close to the upper left corner, where the false positive rate is
near 0 (specificity near 100 %) and the true positive rate (sensitivity) is almost
100 %. Hence it can be concluded that the model can
accurately distinguish a particular class from the others.
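The construction of an ROC curve for a single class can be sketched as follows, treating that class as positive and all others as negative. The function name `roc_points`, the toy scores and the chosen thresholds are hypothetical; the sketch uses the standard definitions of the true and false positive rates.

```python
import numpy as np

def roc_points(targets, scores, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    targets = np.asarray(targets).astype(bool)
    scores = np.asarray(scores, dtype=float)
    points = []
    for t in thresholds:
        predicted = scores >= t
        # True positives over all positive targets.
        tpr = np.sum(predicted & targets) / np.sum(targets)
        # False positives over all negative targets.
        fpr = np.sum(predicted & ~targets) / np.sum(~targets)
        points.append((float(fpr), float(tpr)))
    return points

# Hypothetical outputs for one class: the positives all score higher than the negatives.
targets = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
points = roc_points(targets, scores, thresholds=[0.0, 0.35, 0.5, 1.0])
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

A threshold of 0.35 separates the two groups perfectly in this toy example, giving the upper left corner point (false positive rate 0, true positive rate 1) that a well-discriminating classifier approaches.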
Confusion matrices. The confusion matrices for training, validation, testing
and the overall process can be seen in Fig. 6. It is observed that the output is
accurate, as the numbers of correct responses, indicated in the green
squares [squares with indices (1, 1), (2, 2), (3, 3)], are high and the numbers of
incorrect responses, represented in the red squares [squares with indices (1, 2),
(1, 3), (2, 1), (2, 3), (3, 1), (3, 2)], are low. The overall accuracies indicated
in the lower right blue squares [squares with index (4, 4)] are high, supporting
the reliability of the system.
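A three-class confusion matrix and its overall accuracy can be computed as in the sketch below. The function names and the toy labels are hypothetical, and placing the output class on the rows and the target class on the columns is an assumption about the layout of Fig. 6. Classes 0, 1 and 2 stand for Grades A, B and C.

```python
import numpy as np

def confusion_matrix(true_classes, predicted_classes, n_classes=3):
    """Count samples by (output class, target class); the diagonal holds correct responses."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_classes, predicted_classes):
        cm[p, t] += 1   # assumed layout: rows = output class, columns = target class
    return cm

def overall_accuracy(cm):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    return float(np.trace(cm) / cm.sum())

# Hypothetical labels: one Grade B applicant is misclassified as Grade C.
true_classes = [0, 0, 1, 1, 2, 2]
predicted_classes = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(true_classes, predicted_classes)
print(cm)
print(f"overall accuracy: {overall_accuracy(cm):.3f}")
```

The diagonal entries correspond to the green squares described above, the off-diagonal entries to the red squares, and the overall accuracy to the value in the lower right square.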
It can be inferred from the verification of the results that the graduate school
application advisory model based on a neural classification system developed in
this project is an accurate and reliable model whose output closely resembles the
statistics available.

5 Conclusion

An applicant can have numerous options while applying for graduate school
programs. In the present scenario, applicants finalize potential graduate schools
based on their rankings and the advice of peers currently studying in those schools.
The total cost of completing the application procedure for different schools can be
very high. In this project a neural classification system is proposed to advise
applicants about the graduate schools where their chances of selection are good,
based on the following aspects of the application: the Grade Point Average (GPA)
of the student, GRE score, professor recommendation, number of publications,
parent institute rating and work experience. A new parameter
named Student Rating Index (SRI) is defined for a better representation of the
quality of the professor recommendation. The model is trained on data from
prominent sources and the results are verified using the mean square error,
the Receiver Operating Characteristic (ROC) curve and confusion matrices. The
verification indicates that the proposed system is accurate and reliable.
Hence it can be used by applicants to make more focused applications to
the graduate schools where their chances of selection are the best. The model can
also be used by an applicant as an advisor to reduce expenditure on the application
costs of those graduate schools where the chances of selection are low.

Acknowledgments The authors would like to thank Mr. Pushpal Mazumder of the Department
of Civil Engineering at the Indian Institute of Technology Kharagpur for his help in the data
extraction process.

References

1. Huang, M.H.: Opening the black box of QS World University Rankings. Res. Eval. 21(1),
71–78 (2012)
2. Raghunathan, K.: Demystifying the American graduate admissions process (Online). Available
http://nlp.stanford.edu/~rkarthik/DAGAP.pdf (2010). Accessed 5 June 2013
3. MathWorks: MATLAB 7.12 (R2011bSPM12), The Language of Technical Computing. The
MathWorks, Inc., Natick, Massachusetts (2011)
4. Meho, L.I., Rogers, Y.: Citation counting, citation ranking, and h-index of human-computer
interaction researchers: a comparison of Scopus and Web of Science. J. Am. Soc. Inf. Sci.
Technol. 59(11), 1711–1726 (2008)
5. Pratihar, D.K.: Soft Computing. Alpha Science International Ltd (2007)
6. Quacquarelli Symonds: QS Top Universities (Online). Available
http://www.topuniversities.com/ (2013). Accessed 1 June 2013
7. SCOPUS 2013 (Online). Available http://www.scopus.com/home.url (2013). Accessed 29
May 2013
