Bhonde2014 Chapter GraduateSchoolApplicationAdvis-1
All content following this page was uploaded by Devarsh Bhonde on 19 February 2020.
Abstract Neural classification systems are widely used in many fields for making
logical decisions. This paper proposes a neural classification system based on the
back-propagation algorithm as an advisory model for graduate school
admissions. It uses real and synthetically generated data to advise students
about the group of graduate schools where they have the maximum probability of
being selected. The system takes into consideration the important aspects of
the student’s application, namely the GPA, GRE score, number of publications,
professor recommendation, parent institute rating and work experience, in order
to suggest the group of potential schools. A new parameter named Student Rating
Index (SRI) is also defined for a better representation of the quality of the professor
recommendation. The system comprises a two-layer feed-forward network with
sigmoid hidden and output neurons to classify the data sets. The results are verified
using the mean square error, the Receiver Operator Characteristic (ROC) curve
and confusion matrices. The verification indicates that the proposed system is an
accurate and reliable representation. The proposed advisory system can thus be
used by students to make more focused applications to graduate schools.
1 Introduction
Across the globe, there may be numerous schools offering a particular graduate
program. The students generally shortlist the graduate schools based on their rank-
ings and on the recommendation of peers currently studying in those schools. There
M. Pant et al. (eds.), Proceedings of the Third International Conference on Soft 975
Computing for Problem Solving, Advances in Intelligent Systems and Computing 259,
DOI: 10.1007/978-81-322-1768-8_80, Springer India 2014
976 D. Bhonde et al.
are discussion forums and admission counselors available for admission advice but
no advisory model is present for the students to suggest the graduate schools where
their chances of getting selected are the best. The total cost of completing the
application procedure for different potential graduate schools can be very high.
In this project a neural classification system based on the back-propagation
algorithm is proposed to advise students about the group of graduate schools where
they have the maximum probability of being selected. The system takes into consideration
the important aspects of an application, namely the Grade Point Average
(GPA) of the student, GRE score, professor recommendation, number of
publications, parent institute rating and work experience, to suggest the result. A new
parameter named Student Rating Index (SRI) is defined for a better representation
of the quality of the professor recommendation. The system is trained on data
available from prominent sources and it predicts the suitability of the student for
selection into each defined group of graduate schools. The proposed system helps
students to make more focused applications to the graduate schools
where their chances of being selected are high. It may also be used to reduce
unnecessary expenditure on application costs for the groups of graduate schools
where their probability of selection is low.
The neural network classification system classifies cases into a set
of target categories based on the input parameters that represent each case.
It has a wide variety of applications in market forecasting, mortgage screening, loan
advising etc. Graduate applications consist of various parameters that represent
the applicants’ academic and research performance during their undergraduate study.
Because of the large number of input parameters available, a neural classification
system based on the back-propagation algorithm, using real and synthetic data, is
developed. Artificial neural networks analyze data sets one by one and learn by
comparing the predicted classification of each data set with its actual classification.
The errors calculated from the classification of the first set are fed back into
the network and are used to modify the network’s weights for the next pass;
this process is repeated for n iterations. This process of learning from the error
and updating the model is the basis of the back-propagation algorithm.
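The error feedback loop described above can be illustrated with a minimal sketch in Python. It assumes plain gradient descent on the squared error with sigmoid hidden and output neurons; the learning rate, the weight initialization, the input vector and the target are hypothetical, and the sketch is not necessarily the training procedure used by the MATLAB toolbox.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    # One row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    # zip stops at the end of `inputs`, so the bias (row[-1]) is added separately.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]

def train_step(hidden, output, x, target, lr=0.5):
    """One back-propagation update for a single sample; returns the error before the update."""
    h = forward(hidden, x)
    y = forward(output, h)
    # Output deltas: error times the derivative of the sigmoid.
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    # Hidden deltas: output deltas propagated back through the output weights.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * output[k][j] for k in range(len(output)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(output):
        for j in range(len(h)):
            row[j] -= lr * d_out[k] * h[j]
        row[-1] -= lr * d_out[k]
    for j, row in enumerate(hidden):
        for i in range(len(x)):
            row[i] -= lr * d_hid[j] * x[i]
        row[-1] -= lr * d_hid[j]
    return sum((yk - tk) ** 2 for yk, tk in zip(y, target)) / len(target)

rng = random.Random(0)
hidden = init_layer(6, 10, rng)       # 6 inputs -> 10 hidden neurons
output = init_layer(10, 3, rng)       # 10 hidden neurons -> 3 output classes
x = [0.8, 0.6, 0.7, 0.2, 0.9, 0.1]    # hypothetical normalized input parameters
t = [1.0, 0.0, 0.0]                   # hypothetical target: first class
errors = [train_step(hidden, output, x, t) for _ in range(200)]
```

Repeating the step on the same sample drives the error down, which is the "learning from the error" behaviour described in the text; a real training run would cycle over all training samples.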
Fig. 1 Schematic of neural network (Source MATLAB Neural Network Pattern Recognition
toolbox)
A total of 1,000 data sets with 6 input parameters each are used to create the
model, which has 10 neurons in the hidden layer and 3 neurons in the output layer, as
shown in Fig. 1. In the model 70 % of the data sets are used for training, 15 % of
the sets are used for validation and the remaining 15 % are used for testing of the model.
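The 70/15/15 division can be sketched as follows. The random shuffling and the fixed seed are assumptions made for illustration; the text does not state how the samples were assigned to each subset.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and divide them into training, validation and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # whatever remains after training and validation
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 700 150 150
```

For 1,000 data sets this gives 700, 150 and 150 samples, which matches the sample counts reported in Table 2.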
The input parameters used for each data set are:
1. Grade Point Average (GPA) of the student
2. Graduate Record Examination (GRE) score
3. Student Rating Index
4. Number of Publications
5. Parent Institute Rating
6. Work Experience.
The proposed neural classification system takes the input parameters for an
applicant into consideration and suggests the group of schools suitable for the
applicant’s profile. In order to group the schools, three grades of schools, namely Grade A,
Grade B and Grade C, are defined based on the ranking of schools available from the
ranking organization QS Rankings (http://www.topuniversities.com/).
Grade A schools are defined as the graduate schools ranked between 1
and 30, Grade B schools comprise the schools ranked between 31 and 70
and Grade C schools are defined as the schools ranked between 71 and 100.
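The grouping rule above can be written as a small lookup. The function name and the choice to return None for ranks outside 1 to 100 are illustrative assumptions; the text does not say how unranked schools are handled.

```python
def school_grade(rank):
    """Map a ranking position to the school grade defined in the text."""
    if 1 <= rank <= 30:
        return "A"
    if 31 <= rank <= 70:
        return "B"
    if 71 <= rank <= 100:
        return "C"
    return None  # outside the three defined groups (assumed behaviour)

print(school_grade(12), school_grade(45), school_grade(99))  # A B C
```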
The input data sets required for training and formulating the neural network are
collected from prominent sources or generated synthetically on the basis of
observed trends. A detailed explanation of the input data used to represent
the different input parameters is as follows.
[Figure: histogram of applicant GPA; x-axis: GPA of the applicant (6 to 10), y-axis: Frequency]
The GPA of a student is one of the most influential parameters in deciding the
outcome of the application and hence is employed in the formulation of the
proposed neural network model. The GPAs of students studying at various graduate
schools are collected from the online discussion forums Gradcafe
(http://forum.thegradcafe.com/) and Edulix (http://www.edulix.com/forum/index.php). From the
data collected, it is observed that the GPA varies over the range of 6–10 on a
10-point scale. A histogram representing the range of GPA used for data description is
shown in Fig. 2.
The GRE score is a prerequisite for many universities and hence forms an essential
part of every application. The GRE scores of applicants from previous years are also
collected from the online discussion forums Gradcafe
(http://forum.thegradcafe.com/) and Edulix (http://www.edulix.com/forum/index.php). The scores
observed in the data set range from 300 to 340, with most lying between 315 and 325.
[Figure: histogram of the number of publications; x-axis: No. of publications (1 to 6, More), y-axis: Frequency]

A new parameter called the Student Rating Index is defined to represent the
professor recommendation more accurately. This index depends on the rating of
the student given by the professor and also on the reputation of the professor in
their field of research. The h-index of a professor measures the productivity and
impact of their published work. The h-index values for various professors are
collected from the citation website Scopus (https://www.scopus.com/home.url).
The final student rating index, with values ranging from 0 to 1, is formulated
based on the following equation:

$$\text{Student rating index} = 0.7\,(\text{rating of student}) + 0.3\,(\text{normalized h-index}) \tag{1}$$

where the rating of the student is on a scale of 0–1 and the normalized h-index
values are also on a scale of 0–1. It is evident from Eq. 1 that more weight is
given to the rating given by the professor to the student.
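Eq. 1 can be evaluated directly, as in the sketch below. The function name and the range check are illustrative assumptions, and the h-index is expected to be normalized to 0–1 beforehand because the text does not specify the normalization method.

```python
def student_rating_index(student_rating, normalized_h_index):
    """Weighted sum from Eq. 1; both inputs are expected on a 0-1 scale."""
    for value in (student_rating, normalized_h_index):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must lie between 0 and 1")
    return 0.7 * student_rating + 0.3 * normalized_h_index

# Hypothetical applicant: professor rating 0.8, professor's normalized h-index 0.5.
sri = student_rating_index(0.8, 0.5)
print(round(sri, 2))  # 0.71
```

Because the weights sum to 1, the index stays within 0–1 whenever both inputs do.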
[Figure: histogram of work experience; x-axis: Work Experience (years, 0.5 to 3), y-axis: Frequency]
4 Example Application
improve his selection chances. In the fourth case, the overall profile of the
applicant is strong, which is in turn reflected in the professor recommendation index
too; hence the system recommends applying to Grade A schools. In the fifth
case, the applicant has an above-average profile, which is more appropriate for
Grade B and Grade C schools. The system predicts the suitability of the
student for admission into Grade B and Grade C schools as 0.624 and 0.376
respectively, so the student is advised to apply to both Grade B and Grade C
schools for best results. Similar results are observed for the sixth case, where the
applicant’s profile is a borderline case between Grade A and Grade B schools. The
model predicts the suitability for admission to Grade A and Grade B
schools as 0.57 and 0.43 respectively. Hence the results obtained are in agreement
with the statistics available from various universities regarding their graduate
admissions.
The performance of the system is verified by determining the mean square errors,
the Receiver Operator Characteristic (ROC) curve and by computing the confusion
matrix.
The Mean Square Error (MSE). The MSE is the average squared difference between
the targets and the outputs and indicates how accurate a model is; lower values
are better. The MSE values obtained during training, validation and testing are
4.28e-3, 6.65e-3 and 4.83e-3 respectively (as shown in Table 2), indicating that the
model fits the data closely. The percent error, which indicates the fraction of samples
misclassified, is 0 % on the training set and 6.67e-1 % on the validation and
testing sets, which corresponds to 1 misclassified sample out of 150 in each. This
implies that the system fails roughly 7 in 1,000 times (an accuracy of 99.33 %),
supporting the accuracy and the reliability of the results.
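Both measures can be computed as in the following sketch. It assumes that the MSE is averaged over every output element and that a sample is assigned to the class with the largest output; the sample data are hypothetical and only reproduce the case of 1 misclassified sample out of 150.

```python
def mean_square_error(outputs, targets):
    """Average squared difference over all output elements of all samples."""
    total, count = 0.0, 0
    for out_row, tgt_row in zip(outputs, targets):
        for o, t in zip(out_row, tgt_row):
            total += (o - t) ** 2
            count += 1
    return total / count

def percent_error(outputs, targets):
    """Percentage of samples whose largest output does not match the target class."""
    wrong = sum(1 for o, t in zip(outputs, targets)
                if o.index(max(o)) != t.index(max(t)))
    return 100.0 * wrong / len(outputs)

# Hypothetical validation set: 149 correct samples and 1 misclassified sample.
targets = [[1, 0, 0]] * 150
outputs = [[0.9, 0.05, 0.05]] * 149 + [[0.2, 0.7, 0.1]]
print(round(percent_error(outputs, targets), 3))  # 0.667
```

One error in 150 samples gives 100/150 = 0.667 %, the value listed for the validation and testing sets in Table 2.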
Receiver Operator Characteristic (ROC) curve. Another useful diagnostic
tool used to get an idea about the accuracy of the model is the Receiver Operator
Characteristic (ROC) curve. If threshold values are assigned to output in the range
of 0–1 for each class of the classifier, the ROC represents the curve of true positive
Table 2 The mean square error and the percentage error for the proposed system (Source
MATLAB Neural Network Pattern Recognition toolbox)
Process      Samples   MSE          Error (%)
Training     700       4.27709e-3   0
Validation   150       6.64836e-3   6.66666e-1
Testing      150       4.83072e-3   6.66666e-1
Fig. 5 Receiver operator characteristic curve for the proposed system (Source MATLAB Neural
Network Pattern Recognition toolbox)
rate against the false positive rate, where the false positive rate is the ratio of the
number of output values greater than or equal to the threshold, among samples whose
target is 0, to the number of targets having a value of 0, and the true positive rate
is the ratio of the number of output values greater than or equal to the threshold,
among samples whose target is 1, to the number of targets having
Graduate School Application Advisor Based on Neural Classification System 983
Fig. 6 Confusion matrices for the proposed system (Source MATLAB Neural Network Pattern
Recognition toolbox)
a value of 1. This curve represents the inherent capacity of the model to discriminate
between different classes of outputs. From Fig. 5 it can be seen that the curves pass
close to the upper left corner, where specificity is near 100 % (a false positive
rate near 0) and sensitivity (true positive rate) is almost 100 %. Hence it can be
concluded that the model can
accurately distinguish a particular class from the others.
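A single point of the ROC curve for one class can be computed as in the sketch below, using the standard definitions: the true positive rate counts outputs at or above the threshold among samples whose target is 1, and the false positive rate counts outputs at or above the threshold among samples whose target is 0. The scores, targets and thresholds are hypothetical.

```python
def roc_point(scores, targets, threshold):
    """Return (false positive rate, true positive rate) for one class at one threshold."""
    tp = sum(1 for s, t in zip(scores, targets) if t == 1 and s >= threshold)
    fp = sum(1 for s, t in zip(scores, targets) if t == 0 and s >= threshold)
    positives = sum(1 for t in targets if t == 1)
    negatives = len(targets) - positives
    return fp / negatives, tp / positives

def roc_curve(scores, targets, thresholds):
    """Sweep the thresholds to trace the curve point by point."""
    return [roc_point(scores, targets, th) for th in thresholds]

scores = [0.9, 0.8, 0.3, 0.1]   # hypothetical network outputs for one class
targets = [1, 1, 0, 0]          # 1 if the sample belongs to that class
print(roc_curve(scores, targets, [0.0, 0.5, 1.0]))
# [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

The point (0.0, 1.0) is the upper left corner of the plot, where every sample of the class is detected and no other sample is wrongly accepted.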
Confusion matrices. The confusion matrices for training, validation, testing
and the overall process can be seen in Fig. 6. The output is
accurate, as the number of correct responses, indicated in the green
squares [squares with indices (1, 1), (2, 2), (3, 3)], is high and the number of
incorrect responses, represented in the red squares [squares with indices (1, 2), (1,
3), (2, 1), (2, 3), (3, 1), (3, 2)], is low. The overall accuracies indicated in the
lower right blue squares [squares with index (4, 4)] are high, justifying the reli-
ability of the system.
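A confusion matrix of this kind can be built as in the following sketch. The layout (rows for the predicted class, columns for the actual class) and the zero-based class indices are assumptions made for illustration, and the sample labels are hypothetical.

```python
def confusion_matrix(predicted, actual, n_classes=3):
    """Count samples per (predicted class, actual class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, a in zip(predicted, actual):
        matrix[p][a] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

predicted = [0, 1, 2, 1]   # hypothetical predicted classes (0 = Grade A, 1 = B, 2 = C)
actual = [0, 1, 2, 2]      # hypothetical true classes
cm = confusion_matrix(predicted, actual)
print(cm)                    # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(overall_accuracy(cm))  # 0.75
```

The diagonal entries correspond to the correct responses and the off-diagonal entries to the incorrect ones.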
It can be inferred from the verification of the results that the graduate school
application advisory model based on neural classification system developed in this
project is an accurate and reliable model which closely resembles the statistics
available.
5 Conclusion
An applicant can have numerous options while applying for graduate school
programs. At present, applicants finalize potential graduate schools
based on their rankings and the advice of peers currently studying in those schools.
The total cost of completing the application procedure for different schools can be
very high. In this project a neural classification system is proposed to advise
applicants about the graduate schools where their chances of selection are good,
based on the following aspects of the application: the Grade Point Average (GPA)
of the student, GRE score, professor recommendation, number of publications,
parent institute rating and work experience. A new parameter
named Student Rating Index (SRI) is defined for a better representation of the
quality of the professor recommendation. The model is trained on data from
prominent sources and the results are verified using the mean square error, the
Receiver Operator Characteristic (ROC) curve and confusion matrices. The
verification indicates that the proposed system is an accurate and reliable
representation. Hence it can be used by applicants to make more focused applications to
the graduate schools where their chances of selection are the best. The model can
also be used by an applicant as an advisor to reduce expenditure on the application
costs of those graduate schools where the chances of selection are low.
Acknowledgments The authors would like to thank Mr. Pushpal Mazumder of the Department
of Civil Engineering at the Indian Institute of Technology Kharagpur for his help in the data
extraction process.
References
1. Huang, M.H.: Opening the black box of QS World University Rankings. Res. Eval. 21(1),
71–78 (2012)
2. Raghunathan, K.: Demystifying the American graduate admissions process (Online). Available
http://nlp.stanford.edu/~rkarthik/DAGAP.pdf (2010). Accessed 5 June 2013
3. MathWorks: MATLAB 7.12 (R2011bSPM12), The Language of Technical Computing. The
MathWorks, Inc., Natick, Massachusetts (2011)
4. Meho, L.I., Rogers, Y.: Citation counting, citation ranking, and h-index of human-computer
interaction researchers: a comparison of Scopus and Web of Science. J. Am. Soc. Inf. Sci.
Technol. 59(11), 1711–1726 (2008)