ANALYSIS OF ADMISSIONS DATASET

Project by:

• Kuldeep
• Vignesh
• Varun S
• Shivang
• Utkarsh

OVERVIEW:

• Business problem
• Data understanding
• Data visualization using Tableau
• Univariate and bivariate analysis
• Model building
• Evaluation metrics
• Key insights

BUSINESS PROBLEM:

Predicting whether the enrolled students will get placed, based on their profiles, and hence improving the caliber of the college.

DATA UNDERSTANDING:

• The given data has the shape (391, 19), i.e. 391 records and 19 columns.

• Based on the business problem, we take ‘Placement’ as the dependent variable and all the remaining columns as independent variables.

• The distribution of the dependent variable shows that the dataset is imbalanced.

• We found a few missing values in the feature Entrance_Test, which were replaced with the value ‘Direct_Adimission’.
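The two preprocessing checks above (filling the missing Entrance_Test values and looking at the class balance of Placement) can be sketched in plain Python. This is a minimal illustration, assuming each record is a dict keyed by column name and that a missing value is stored as `None`; the toy rows and the test names in them are hypothetical, and only the column names and the fill value come from the slides.

```python
from collections import Counter

FILL_VALUE = "Direct_Adimission"  # replacement value quoted on the slide


def impute_entrance_test(records):
    """Return copies of the records with missing Entrance_Test values filled in."""
    filled = []
    for row in records:
        new_row = dict(row)  # copy so the input records are not mutated
        if new_row.get("Entrance_Test") is None:  # None marks a missing value (assumption)
            new_row["Entrance_Test"] = FILL_VALUE
        filled.append(new_row)
    return filled


def class_distribution(records, target="Placement"):
    """Return the share of each class in the target column."""
    counts = Counter(row[target] for row in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy rows, for illustration only
rows = [
    {"Entrance_Test": "MAT", "Placement": "Placed"},
    {"Entrance_Test": None, "Placement": "Placed"},
    {"Entrance_Test": "CAT", "Placement": "Placed"},
    {"Entrance_Test": None, "Placement": "Not Placed"},
]

clean = impute_entrance_test(rows)
print([r["Entrance_Test"] for r in clean])
# ['MAT', 'Direct_Adimission', 'CAT', 'Direct_Adimission']
print(class_distribution(clean))
# {'Placed': 0.75, 'Not Placed': 0.25}
```

A class share far from 50/50, as in this toy output, is what "imbalanced" refers to.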

DATA VISUALIZATION USING TABLEAU:

[Chart: Marks distribution]

UNIVARIATE AND BIVARIATE ANALYSIS:

[Chart: Placed vs Not placed]
[Chart: Gender vs Placement]
[Chart: Specialization vs Placement]
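Bivariate comparisons such as gender vs placement or specialization vs placement come down to a placement rate per category. The sketch below computes that rate in plain Python; the column names `Gender` and `Specialization`, the category values, and the rows are hypothetical stand-ins, not figures from the dataset.

```python
from collections import defaultdict


def placement_rate_by(records, feature, target="Placement", positive="Placed"):
    """Share of records with target == positive, for each value of `feature`."""
    totals = defaultdict(int)
    placed = defaultdict(int)
    for row in records:
        key = row[feature]
        totals[key] += 1
        if row[target] == positive:
            placed[key] += 1
    return {key: placed[key] / totals[key] for key in totals}


# Hypothetical toy rows, for illustration only
rows = [
    {"Gender": "M", "Specialization": "Finance", "Placement": "Placed"},
    {"Gender": "F", "Specialization": "Marketing", "Placement": "Placed"},
    {"Gender": "M", "Specialization": "Marketing", "Placement": "Not Placed"},
    {"Gender": "F", "Specialization": "Finance", "Placement": "Placed"},
]

print(placement_rate_by(rows, "Gender"))          # {'M': 0.5, 'F': 1.0}
print(placement_rate_by(rows, "Specialization"))  # {'Finance': 1.0, 'Marketing': 0.5}
```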

MODEL BUILDING:

• We used the following models:

-Logistic regression

-Decision tree

-KNN classifier

-Naïve Bayes

-Support vector machine
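To illustrate how one of the listed models works, a k-nearest-neighbours (KNN) classifier fits in a few lines. This is a from-scratch sketch using Euclidean distance and a majority vote, not the implementation used in the project (the slides do not say which library or settings were used); the two-feature vectors, read here as two marks columns, and their labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point and keep the k closest
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (marks_1, marks_2) -> placement label
train_X = [(60.0, 55.0), (65.0, 70.0), (85.0, 80.0), (90.0, 88.0), (78.0, 82.0)]
train_y = ["Not Placed", "Not Placed", "Placed", "Placed", "Placed"]

print(knn_predict(train_X, train_y, (88.0, 85.0)))  # Placed
print(knn_predict(train_X, train_y, (62.0, 60.0)))  # Not Placed
```

Because KNN relies on distances, features on different scales would normally be standardized before fitting; the toy features here already share a scale.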


EVALUATION METRICS:

MODEL                     ACCURACY SCORE   AUC-ROC
Logistic regression       0.77             0.5
Decision tree             0.72             0.5
KNN classifier            0.76             0.5
Naïve Bayes               0.71             0.5
Support vector machine    0.81             0.5
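Both columns of this table can be computed as follows. This is a minimal from-scratch sketch, assuming labels are encoded as 1 (placed) and 0 (not placed) and that each model outputs a continuous score such as a predicted probability; the labels, scores, and the 0.5 threshold are made up for illustration. In practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only
y_true = [1, 1, 1, 0, 1, 0, 1, 1]
scores = [0.9, 0.8, 0.72, 0.7, 0.6, 0.3, 0.75, 0.65]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred))          # 0.875
print(round(auc_roc(y_true, scores), 3))  # 0.833
```

Accuracy depends on the threshold used to turn scores into labels, whereas AUC-ROC uses only the ranking of the scores, which makes it less sensitive to class imbalance.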


KEY INSIGHTS:

• Since the dataset is imbalanced, we use AUC-ROC as the evaluation metric.

• The AUC-ROC of all the models lies in the range 0.5 to 0.6, which implies that the models barely distinguish between the classes (an AUC-ROC of 0.5 is no better than random guessing).

• This may be due to an insufficient number of records or an improper selection of variables.

• Based on the accuracy score, we can infer that the support vector machine is the best of these models.

• The SVM is well suited to datasets with a small number of records, which also holds true in this case.
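The 0.5 baseline in these insights has a simple reading. AUC-ROC equals the probability that a randomly chosen placed student (score s⁺) is scored higher than a randomly chosen unplaced student (score s⁻), with ties counted as one half. A model that gives every student the same score therefore lands exactly at 0.5, however high its accuracy is on an imbalanced dataset:

```latex
\begin{aligned}
\mathrm{AUC} &= \Pr\left(s^{+} > s^{-}\right) + \tfrac{1}{2}\,\Pr\left(s^{+} = s^{-}\right) \\
\text{constant scores:}\quad \mathrm{AUC} &= 0 + \tfrac{1}{2}\cdot 1 = \tfrac{1}{2}
\end{aligned}
```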
