Alen George
2021-2023
MAR ATHANASIUS COLLEGE OF ENGINEERING, KOTHAMANGALAM
(Affiliated to APJ Abdul Kalam Technological University, TVM)
ACKNOWLEDGEMENT
First and foremost, I thank God Almighty for his divine grace and blessings in
making all this possible. May he continue to lead me in the years to come.
I would like to express my special gratitude and thanks to my mini project guide,
Prof. Beena Jacob, Assistant Professor, Department of Computer Applications, for her
guidance, constant supervision and support, as well as for providing the necessary
information regarding the mini project.
I profusely thank the other professors in the department and all other staff of
MACE for their guidance and inspiration throughout my course of study. No words can
express my humble gratitude to my beloved parents, who have guided me in all
walks of my journey. My thanks and appreciation also go to my friends and everyone
who willingly helped me with their abilities.
ABSTRACT
An enormous amount of digital data comes from social media, research, agriculture,
medical records and other sources. Universities and technical organizations face high
competition, and their challenge lies in analysing their students' performance. The
most important challenges are in admission, student placement and the curriculum. The
two most important processes during which data are collected and analysed are
admission and placement. A university's standing in the market depends heavily on the
academic performance and placement of its students. Apart from academic performance,
various other factors help in understanding a student's final performance. In this
project, data mining techniques are used to understand the performance of students
and to group them into categories, since a student must consistently improve to
compete in today's world. Almost every university has its own management system for
student records. The University of Malaysia Sarawak (UNIMAS), for example, has a
student management system, but lecturers are not permitted to access it: due to its
privacy settings, access is restricted to top management such as the Deans and Deputy
Deans of Undergraduate and Student Development. This project therefore proposes a
system named 'Academic Analytics Using Machine Learning' to keep track of students'
results. The proposed system predicts each student's performance, which in turn helps
lecturers identify students who are likely to perform poorly in their courses.
The proposed system predicts student performance through rules generated via data
mining; the data mining technique used in this project is classification. The dataset
consists of 6 features (Gender, S1 CGPA, S2 CGPA, S3 CGPA, Overall CGPA, Target
Class). Several algorithms were considered, and by comparing their accuracies we
chose the Support Vector Machine (SVM), which achieves 96% accuracy on the given
dataset.
TABLE OF CONTENTS
1 Introduction
2 Supporting Literature
  2.1 Literature Review
  2.2 Findings and Proposals
3 System Analysis
  3.1 Analysis of Dataset
    3.1.1 About the Dataset
    3.1.2 Explore the Dataset
  3.2 Data Pre-processing
    3.2.1 Data Cleaning
    3.2.2 Analysis of Feature Variables
    3.2.3 Analysis of Class Variables
  3.3 Data Visualization
  3.4 Analysis of Algorithm
    3.4.1 Accuracy Comparison
  3.5 Project Pipeline
  3.6 Feasibility Analysis
    3.6.1 Technical Feasibility
    3.6.2 Economic Feasibility
    3.6.3 Operational Feasibility
  3.7 System Environment
    3.7.1 Software Environment
    3.7.2 Hardware Environment
4 System Design
  4.1 Model Building
    4.1.1 Model Planning
    4.1.2 Training
    4.1.3 Testing
6 Model Deployment
7 Git History
8 Conclusions
9 Future Work
10 Appendix
  10.1 Minimum Software Requirements
  10.2 Minimum Hardware Requirements
11 References
Academic analytics using machine learning
1. INTRODUCTION
Machine learning is a specialization within the broad field of AI. Machine learning
works towards comprehending the complexity of various kinds of collected data and
identifying the right model for the data by trying several models. This can be
effectively systemized for easier interpretation and use by people. Machine learning
lies within computer science but differs from the basic computing algorithms used for
problem solving. In machine learning, algorithms are designed in a way that allows
the system or computer to process the input information, create training sets and
produce the desired output using statistical estimation. Students are the greatest
asset of any university. Universities and students play a very important role in
producing high-quality graduates through their academic performance. Academic
performance is the level of accomplishment of a student's educational goals, which
can be measured and tested through examinations, assessments and other forms of
measurement. However, academic performance varies: for different reasons, students
may reach different levels of achievement. Performance evaluation is one of the
fundamental parts of a student's personal and professional development. Performance
evaluations highlight students' strengths and identify areas that require improvement
as goals. By being able to analyse the performance of their students, teachers can
direct their attention to the necessary areas, advise and guide the students along
the right path, and acknowledge and reward their achievements.
2. SUPPORTING LITERATURE
2.1. Literature Review
Paper [1] argues that student performance prediction is very important for
understanding a student's progress rate. To predict student performance, the authors
begin by collecting data sets: students' class test, attendance, presentation,
assignment, midterm and final examination marks. For the best accuracy rate, they
propose using K-Nearest Neighbors and a Decision Tree Classifier. The proposed model
predicts student performance across three semesters, and the training and testing
sets give optimum results and accuracy. They obtain their best results with the
K-Nearest Neighbors and Decision Tree Classifier models, with 89.74 percent and
94.44 percent accuracy respectively.
From the above three papers, we learn that different approaches are used for student
performance analysis. The first paper uses KNN and a Decision Tree Classifier; the
proposed model predicts student performance across three semesters, and the training
and testing sets give optimum results and accuracy. In the second paper, a model was
developed to predict the grades of students taking the same course in the next term,
using logistic regression, linear discriminant analysis, K-nearest neighbours,
classification and regression trees, Gaussian Naive Bayes, and support vector
machines on historical grade data from one of the undergraduate courses. In the third
paper, WEKA was used to examine the feasibility of linear regression and a multilayer
perceptron in terms of accuracy, performance, and error rate. According to the
findings, the support vector machine has the highest accuracy, at 94.88%.
3. SYSTEM ANALYSIS
3.1. Analysis of Dataset
3.1.1. About the Dataset
The dataset I used was built by collecting information through a survey. It contains
details of various students, including marks from the different semester and
sessional exams, as well as each student's gender.
https://drive.google.com/file/d/1mwfn2PWHVPRQg3IqP2xlUTMlyVo6XdNb/view?
usp=share_link
This dataset contains about 600 records, each holding the academic details a student
has to answer. By analysing the values in each record, we can predict whether the
student will pass or fail in the upcoming semester. The dataset has 6 features, i.e.
the gender and the marks of the various semester and sessional exams, with pass or
fail as the class label.
The attributes in this dataset are the CGPA of S1, S2, S3, Session 1 and Session 2,
and age. The class label is whether the student will pass or fail in the upcoming
semester, which is found by analysing this dataset.
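The structure described above can be sketched with pandas. The column names and
values below are assumptions for illustration only; the real file comes from the
Drive link above (e.g. via `pd.read_csv`):

```python
import pandas as pd

# Hypothetical records mirroring the survey dataset's assumed columns;
# the real data would be loaded with, e.g., pd.read_csv("student_dataset.csv").
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "S1_CGPA": [7.2, 8.5, 5.1, 9.0],
    "S2_CGPA": [7.0, 8.8, 4.9, 9.1],
    "S3_CGPA": [7.4, 8.6, 5.0, 8.9],
    "Overall_CGPA": [7.2, 8.6, 5.0, 9.0],
    "Target": ["pass", "pass", "fail", "pass"],
})

print(df.shape)                      # (records, features + class label)
print(df["Target"].value_counts())   # class distribution of pass/fail
```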
Accuracy = (a + d) / (a + b + c + d), where a is the number of true positives, b the
false positives, c the false negatives, and d the true negatives.
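As a quick worked example of this formula (the counts below are made up, not from
the project's confusion matrix):

```python
# Confusion-matrix counts: a = true positives, b = false positives,
# c = false negatives, d = true negatives (illustrative values only).
a, b, c, d = 50, 3, 2, 45

# Accuracy = correct predictions / all predictions.
accuracy = (a + d) / (a + b + c + d)
print(accuracy)  # 0.95
```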
Algorithm Used
Support Vector Machine
Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a
hyperplane.
SVM chooses the extreme points or vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed a
Support Vector Machine. Consider two different categories separated by a decision
boundary or hyperplane: the SVM algorithm finds the points of each class that lie
closest to the boundary. These points are the support vectors. The distance between
the support vectors and the hyperplane is called the margin, and the goal of SVM is
to maximize this margin. The hyperplane with the maximum margin is called the
optimal hyperplane.
The dimensions of the hyperplane depend on the number of features in the dataset:
if there are 2 features, the hyperplane is a straight line; if there are 3 features,
the hyperplane is a 2-dimensional plane. We always create a hyperplane with the
maximum margin, i.e. the maximum distance to the nearest data points.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its
position are termed support vectors. Since these vectors support the hyperplane,
they are called support vectors.
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by a single straight line, it is termed linearly
separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be classified by a straight line, it is termed non-linear data, and
the classifier used is called a Non-linear SVM classifier.
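The two cases can be sketched with scikit-learn's `SVC` on synthetic data (a toy
illustration, not the project's dataset): well-separated blobs are handled by a
linear kernel, while interleaving "moons" need a non-linear (RBF) kernel.

```python
from sklearn import svm
from sklearn.datasets import make_blobs, make_moons

# Linearly separable case: two well-separated blobs, linear kernel suffices.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_clf = svm.SVC(kernel="linear").fit(X_lin, y_lin)

# Non-linear case: interleaving half-moons cannot be split by a straight
# line, so the RBF kernel maps them into a separable space.
X_non, y_non = make_moons(n_samples=100, noise=0.1, random_state=0)
rbf_clf = svm.SVC(kernel="rbf").fit(X_non, y_non)

# The fitted support vectors are the extreme points defining each margin.
print(linear_clf.support_vectors_.shape)
print(rbf_clf.score(X_non, y_non))
```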
Data collection: A dataset with appropriate parameters such as Gender, the marks of
the different semesters (S1, S2, S3) and sessions (Session 1, Session 2), Overall
CGPA, and a class variable taking the values pass and fail.
Data Pre-processing: Put the acquired dataset into an organized format. Data
cleaning is the pre-processing method we chose; data cleaning routines attempt to
fill in missing values, smooth out noisy data and correct inconsistencies.
Missing values can be handled by:
Ignoring the tuple: this is usually done when the class label is missing.
Using a global constant to fill the missing value: replace all missing attribute
values with the same constant, such as a label like "Unknown" or NA. This method is
simple.
Using the attribute mean, or the attribute mean of all samples belonging to the same
class as the given tuple.
Noisy data can be handled by:
Binning: Binning methods smooth a sorted data value by consulting its
“neighbourhood”. The sorted values are distributed into a number of buckets or bins.
Since binning methods consult the neighbourhood of values, they perform local
smoothing.
Regression: Data can be smoothed by fitting the data to a function such as with
regression. Linear regression and Multiple Linear Regression can be used.
Clustering: Outliers may be detected by clustering, where similar values are
organized into groups or clusters. Intuitively, values that fall outside the set of
clusters may be considered outliers.
The dataset taken is already pre-processed, so pre-processing techniques are not
needed for it. For assurance, however, pre-processing steps for handling missing
values and duplicated values are applied.
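Those assurance checks can be sketched in pandas, using the mean-imputation rule
described above (the toy frame and its column names are assumptions for
illustration):

```python
import pandas as pd

# Toy frame with one missing CGPA and one exact duplicate record.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "F"],
    "S1_CGPA": [7.2, None, 8.5, 8.5],
    "Target": ["pass", "fail", "pass", "pass"],
})

# Fill a missing numeric value with the attribute mean (see rule above).
df["S1_CGPA"] = df["S1_CGPA"].fillna(df["S1_CGPA"].mean())

# Drop exact duplicate records.
df = df.drop_duplicates().reset_index(drop=True)

print(df.isna().sum().sum())   # no missing values remain
print(len(df))                 # one duplicate removed
```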
Training and Testing: The model was trained and then saved in .pkl format using
pickle. Testing is done by loading the saved model and performing predictions
through Python code. Accuracy comparison is made by splitting the dataset into
training and test data. After testing, a user interface was developed for prediction
and connected to the model using the Flask framework.
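The save-and-reload step can be sketched as follows. The synthetic data stands in
for the survey dataset, and `model.pkl` is an assumed filename:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Train a stand-in SVM on synthetic data (the real model uses the survey dataset).
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
model = SVC(kernel="linear").fit(X, y)

# Save the trained model in .pkl format using pickle.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. inside the Flask app), load the saved model and predict.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict(X[:1]))
```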
Technical Feasibility
Economic Feasibility
Operational Feasibility
3.6.1. Technical Feasibility
The various software used for the development of this application are the
following:
Python
Numpy
Google Colab
Github
Git is an open-source version control system that was started
by Linus Torvalds. Git is similar to other version control systems
such as Subversion, CVS, and Mercurial, to name a few. Version control
systems keep these revisions straight, storing the modifications in a
central repository. This allows developers to easily collaborate, as they
can download a new version of the software, make changes, and upload
the newest revision. Every developer can see these new changes,
download them, and contribute. Git is the preferred version control
system of most developers, since it has multiple advantages over the
other systems available: it stores file changes more efficiently and
ensures file integrity better.
The social networking aspect of GitHub is probably its most
powerful feature, doing more than almost any other feature offered to
help projects grow. Project revisions can be discussed publicly,
so a mass of experts can contribute knowledge and collaborate to
advance a project forward.
4. SYSTEM DESIGN
4.1. Model Building
4.1.1. Model Planning
The model is generated using the SVM algorithm, which gave high accuracy,
and is used for prediction. The accuracy comparison is made by splitting the
dataset into training and testing data: one portion is used for training the
model and the other for testing it. 70% of the dataset is used as training data
and the remaining 30% as testing data.
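The 70/30 split and accuracy measurement can be sketched with scikit-learn. The
synthetic data below is a stand-in for the survey dataset, so the printed accuracy
is illustrative, not the project's 96% figure:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the survey dataset: 600 records, 6 features,
# binary pass/fail label.
X, y = make_classification(n_samples=600, n_features=6, random_state=0)

# 70% of the records train the model; the remaining 30% test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

model = SVC().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```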
4.1.3. Testing
6. MODEL DEPLOYMENT
This figure shows the user interface of the application. The
interface is very simple and easy to understand. There are 6 fields for
entering the user's details, and a drop-down list to select the gender.
A Predict button predicts the result. Validation of the numeric fields
is done in HTML, and to make all fields mandatory, validation is done
when values are taken from the form to the model.
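A minimal Flask sketch of this flow, under stated assumptions: the field names, the
`/predict` route, and the `predict_result` stub (a simple CGPA threshold standing in
for the pickled SVM) are all hypothetical, chosen only for illustration:

```python
from flask import Flask, request

app = Flask(__name__)

def predict_result(features):
    # Stand-in for model.predict on the pickled SVM: here a simple
    # threshold on overall CGPA plays the model's role (assumption).
    return "pass" if float(features["overall_cgpa"]) >= 5.0 else "fail"

@app.route("/predict", methods=["POST"])
def predict():
    # All six form fields are mandatory; reject the request otherwise.
    required = ["gender", "s1_cgpa", "s2_cgpa", "s3_cgpa",
                "session_marks", "overall_cgpa"]
    if any(field not in request.form for field in required):
        return "All fields are mandatory", 400
    return predict_result(request.form)

# app.run(debug=True)  # start the development server
```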
7. GIT HISTORY
8. CONCLUSIONS
9. FUTURE WORK
In this project, the prediction is not updated dynamically within the system's
source code. In future, a dynamic prediction model could be implemented by
retraining the prediction model whenever a new training set is fed into the system.
Moreover, the prediction could be offered for other courses as well. There is a
large number of institutions of higher education, and they operate in a very
complex and highly competitive environment. Predicting a student's academic
performance is one of the most important steps towards efficient education and a
university's profitability, especially for private universities fully funded by
tuition fees. It affects the modification of existing programs and the creation of
new ones. With accelerated IT development and lower prices, universities have
started to collect huge amounts of data about their students. These data can be
further analyzed with machine learning methods and techniques. A special application
of machine learning in the educational environment has emerged: an interdisciplinary
area that brings together techniques from statistics, artificial intelligence,
database systems, machine learning, pattern recognition, data visualization,
knowledge acquisition and information theory to find useful patterns and, thus, help
understand students' behavior and how they learn.
10. APPENDIX
10.1. Minimum Software Requirements
Software: Google Colab
11. REFERENCES