
Principles of Management

Semester-Project
12.01.2024

Submitted to: Ms. Saima Siddiqui

Muhammad Mobeen 200901097

Muhammad Ahsan Ali 200901072

Hamza Farooq 200901019

Danyal Shah 200901046

Ahmed Rohail Awan 200901124

Awais Afzal 200901037



Predictive Analysis of SGPA and CGPA of 5th Semester

Literature Review:
In this project, we were asked to perform a predictive analysis of SGPA and CGPA for
students in the fifth semester based on relevant features, treating it as a regression
problem. The initial phase involved careful data preprocessing: loading the dataset,
addressing missing values, removing duplicates, and transforming categorical
variables. Exploratory data analysis (EDA) was performed using the Seaborn and Sweetviz
libraries, generating visualizations of data distributions, outliers, and
relationships.

Moving on to predictive modeling, the project utilized a variety of artificial intelligence


models such as Linear Regression, Support Vector Regression (SVR), Neural Network,
XGBoost Regressor, and Random Forest Regressor. The models were systematically trained
and evaluated using metrics like Mean Squared Error (MSE) and R-squared. Visualizations,
including scatter plots, were employed for result interpretation. This comprehensive multi-
model approach ensures a robust analysis, aiding in the selection of an optimal model for
accurate SGPA predictions in the fifth semester.

Work-Breakdown Structure:

(WBS task list shown in the screenshot below.)
WBS Screenshot:

Activity on Node and CPM

Critical path duration = 26.5 days


Critical Path: Blue Path
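The 26.5-day figure comes from a forward pass over the activity-on-node network. As a sketch, this is how the Critical Path Method's forward pass can be computed in Python; the activity names, durations, and dependencies below are hypothetical stand-ins, not the project's actual WBS tasks:

```python
# Hypothetical activity network (the real tasks/durations are in the WBS screenshot).
# Forward pass of the Critical Path Method: the project duration is the
# longest path through the dependency DAG.
durations = {"A": 3.0, "B": 5.5, "C": 4.0, "D": 6.0, "E": 2.0}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

earliest_finish = {}

def ef(task):
    # Earliest finish = (latest earliest-finish among predecessors) + own duration
    if task not in earliest_finish:
        start = max((ef(p) for p in preds[task]), default=0.0)
        earliest_finish[task] = start + durations[task]
    return earliest_finish[task]

project_duration = max(ef(t) for t in durations)
print(project_duration)  # 16.5 for this toy network
```

The critical path itself is the chain of activities whose earliest finish times add up to this longest path; in the diagram above it is highlighted in blue.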

Methodology:

Data Preprocessing:
In data preprocessing, we first used the pandas library to read the given dataset into a DataFrame
and displayed its contents.

We then inspected the data types and computed the mean of the numeric columns.

Next, we filled all the missing values in the dataset. Alternatively, we could have dropped the
columns containing missing values, but we chose to fill them instead.

We selected features for training and testing by dropping the irrelevant ones, and split the
data into 70% training and 30% testing.
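A rough sketch of these steps; the column names below are hypothetical stand-ins for the actual dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in columns; the real dataset has many more features.
df = pd.DataFrame({
    "matric_pct": [85.0, 78.0, None, 90.0, 66.0, 72.0],
    "inter_pct":  [80.0, 75.0, 70.0, None, 64.0, 71.0],
    "sgpa_5":     [3.4, 3.0, 2.8, 3.7, 2.5, 2.9],
})

df = df.drop_duplicates()                    # remove duplicate rows
df = df.fillna(df.mean(numeric_only=True))   # fill missing values rather than dropping

X = df.drop(columns=["sgpa_5"])              # features
y = df["sgpa_5"]                             # target: 5th-semester SGPA

# 70% training / 30% testing split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```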

AI Prediction Models:
Since our target variable is continuous rather than discrete, this task cannot be handled by
classification models; it requires regression models. We therefore used a Linear Regression
model, Support Vector Regression (SVR), a Neural Network, an XGBoost Regressor, and a Random
Forest Regressor, all of which are regression models.

1. Linear Regression Model:

First, we imported LinearRegression from scikit-learn. We then trained the model by fitting it
on the training data (X_train and y_train) and used the trained model to make predictions on
the test set (X_test). After that, we evaluated the model by calculating and printing metrics
such as Mean Squared Error (MSE) and R-squared to assess its performance. Finally, we
visualized the predictions by plotting the actual values against the predicted values using
Matplotlib.
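A minimal sketch of this workflow, using synthetic stand-in data in place of the actual training split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the real features (e.g. GPAs of semesters 1-4)
rng = np.random.default_rng(0)
X = rng.uniform(2.0, 4.0, size=(100, 4))
y = X.mean(axis=1) + rng.normal(0.0, 0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

model = LinearRegression()
model.fit(X_train, y_train)          # train on the 70% split
y_pred = model.predict(X_test)       # predict on the 30% split

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```

The scatter plot of actual versus predicted values would follow with `plt.scatter(y_test, y_pred)`.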

Result of Linear Regression Model:

Mean Squared Error: 0.10212410915757575


R-squared: 0.6374532214800184

2. Support Vector Regression (SVR):


First, we imported SVR from scikit-learn. We initialized the model by creating an instance of
SVR with a linear kernel, then trained it by fitting on the training data (X_train and
y_train). We used the trained SVR model to make predictions on the test set (X_test),
evaluated it by calculating and printing metrics such as Mean Squared Error (MSE) and
R-squared, and visualized the results by plotting them.
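The same workflow with scikit-learn's SVR and a linear kernel, again on synthetic stand-in data rather than the project's actual split:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the real features (e.g. GPAs of semesters 1-4)
rng = np.random.default_rng(0)
X = rng.uniform(2.0, 4.0, size=(100, 4))
y = X.mean(axis=1) + rng.normal(0.0, 0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

model = SVR(kernel="linear")         # SVR instance with the linear kernel
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```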

Result of Support Vector Regression (SVR):


Mean Squared Error: 0.10626246098482978
R-squared: 0.622761821616366

3. Neural Network:

We instantiate a StandardScaler object from scikit-learn to standardize the features:
fit_transform fits the scaler on the training data (X_train) and transforms it, and the same
fitted scaler is then used to transform the test data (X_test).
After that, we instantiate a Multi-layer Perceptron (MLP) regressor from scikit-learn with two
hidden layers of 100 and 50 neurons and train it on the standardized training data. We set the
maximum number of training iterations (max_iter) and a seed for reproducibility
(random_state).
Finally, we calculated the mean squared error between the actual (y_test) and predicted
(y_pred) values and accessed the training loss over epochs through the regressor's loss curve.
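A sketch of the scaling and MLP training steps on synthetic stand-in data; the layer sizes match those described above, while the data itself is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the real features
rng = np.random.default_rng(0)
X = rng.uniform(2.0, 4.0, size=(100, 4))
y = X.mean(axis=1) + rng.normal(0.0, 0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # fit on training data only
X_test_s = scaler.transform(X_test)         # reuse the same fitted scaler

mlp = MLPRegressor(hidden_layer_sizes=(100, 50),  # two hidden layers: 100 and 50 neurons
                   max_iter=500, random_state=42)
mlp.fit(X_train_s, y_train)
y_pred = mlp.predict(X_test_s)

mse = mean_squared_error(y_test, y_pred)
loss_history = mlp.loss_curve_              # training loss per epoch
print(f"Mean Squared Error: {mse:.4f}, epochs run: {len(loss_history)}")
```

The `loss_curve_` attribute is what the loss-function figure below is plotted from.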

(Figure: loss function output over training epochs.)

Result of Neural Network:


TensorFlow Model Mean Squared Error: 0.2528950051069883
TensorFlow Model R-Squared: 0.03596719099374379

4. XGBoost Regressor:
In this model, we instantiate an XGBoost Regressor and fit it on the training set (X_train and
y_train) using the fit method. We then make predictions on the test set (X_test), calculate
the Mean Squared Error (MSE) and R-squared between the actual (y_test) and predicted (y_pred)
values, and plot the actual against the predicted values.

Result of XGBoost Regressor:


Mean Squared Error: 0.30599851476999607
R-squared: -0.08631327780381892

5. Random Forest Regressor:

We instantiate a Random Forest Regressor with 100 trees (n_estimators) and a fixed random seed
(random_state), then train it on the training set (X_train and y_train) using the fit method.
We use the trained model to make predictions on the test set (X_test), calculate the Mean
Squared Error (MSE) and R-squared between the actual (y_test) and predicted (y_pred) values,
and plot the actual against the predicted values.
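A sketch of this step with the same hyperparameters named above (100 trees, fixed seed), again on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the real features
rng = np.random.default_rng(0)
X = rng.uniform(2.0, 4.0, size=(100, 4))
y = X.mean(axis=1) + rng.normal(0.0, 0.1, size=100)
X_train, X_test, y_train, y_test = X[:70], X[70:], y[:70], y[70:]

model = RandomForestRegressor(n_estimators=100,   # 100 trees in the forest
                              random_state=42)    # fixed seed for reproducibility
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
```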

Result of Random Forest Regressor:


Mean Squared Error: 0.09653424363636351
R-squared: 0.6572975829514944

In conclusion, the Random Forest Regressor performed best, with the lowest Mean Squared Error
(MSE) and the highest R-squared among all the models.

Exploratory Data Analysis

The EDA process undertaken in this project involves leveraging Seaborn and Sweetviz
libraries to gain insights into the dataset's characteristics.

Seaborn's box plots were generated for distinct subsets of columns within the one-hot
encoded DataFrame. These visualizations, organized into four blocks, each analyzing 20
columns, serve to illuminate the distribution of data and highlight potential outliers.

Data Visualization with Seaborn:

- Utilized Seaborn to construct box plots for 20-column subsets.
- Each block of code focuses on a different range of columns, allowing for a systematic examination of the entire dataset.

Analyzing the first 19 columns of the student dataset using box plots revealed various characteristics.
The distribution of data was largely symmetrical, except for slight skewness in some opinion-based
columns. Central tendencies, represented by medians, differed significantly across variables,
indicating diverse typical values. The spread of data points varied similarly, with some columns
exhibiting wider ranges in responses than others. A few potential outliers existed, particularly in the
"surprise quizzes" category, hinting at extreme cases of stress and discouragement. These
observations suggested possible connections between family background and preferred learning
styles, diverse coping mechanisms for anxieties, and potentially strong reactions to surprise quizzes.

Specific Observations:

- Columns related to online learning preferences ("I am comfortable taking online quizzes/examinations" and "I prefer online lectures over classroom lectures") generally have higher medians compared to those for engagement ("I actively participate in online discussions" and "I ask questions during online lectures"). This might suggest a disconnect between positive attitudes towards online learning and active participation within it.
- Columns concerning student engagement ("I actively participate in online discussions" and "I ask questions during online lectures") show wider ranges in responses, indicating diverse levels of interaction and learning styles among students in the online environment.
- The presence of an outlier in the "online lecture concentration" boxplot highlights a potential case of someone finding online lectures significantly more focused than classroom settings.

Sweetviz Reports (Auto EDA Approach):


Sweetviz, an analytical tool, was employed to generate HTML reports providing in-depth analyses
of specific column subsets. This facilitated a comprehensive understanding of the dataset's
statistical properties, including distributions and inter-column relationships.

Some key visualisations and statistics obtained through EDA:



Visual Insights

Live Website deployed at: https://emproject.streamlit.app/

Graphical User Interface(GUI):

This is the graphical user interface of the hosted website. The user can input the
Matric and Intermediate percentages and all the semester GPAs from the first to the fourth
semester to get the predicted SGPA and CGPA for the fifth semester. All five models are
integrated with the GUI, and based on these inputs each model predicts its corresponding
output value.

When User first visits the website



After that, the user inputs the details of their previous education.

OUTPUT:

All the models predict the corresponding SGPA and CGPA respectively.

Conclusion:

In conclusion, the project successfully conducted a predictive analysis of SGPA and CGPA
for the fifth semester, employing various regression models such as Linear Regression,
Support Vector Regression, Neural Network, XGBoost Regressor, and Random Forest
Regressor. The comprehensive methodology, encompassing meticulous data
preprocessing, model training, and evaluation using metrics like Mean Squared Error and
R-squared, revealed that the Random Forest Regressor model outperformed others. The
inclusion of a user-friendly graphical interface enhances the project's practicality,
providing users with a seamless tool for predicting SGPA and CGPA based on their
educational details.

THE END
