Semester-Project
12.01.2024
Literature Review:
In this project we were requested to perform a predictive analysis of SGPA and CGPA for
students in the fifth semester based on relevant features, treating it as a regression
problem. The initial phase involved meticulous data preprocessing steps, including loading
the dataset, addressing missing values, removing duplicates, and transforming categorical
variables. Exploratory data analysis (EDA) was performed using Seaborn and Sweetviz
libraries, generating insightful visualizations for data distribution, outliers, and
relationships.
Work-Breakdown Structure:
WBS Screenshot:
Methodology:
Data Preprocessing:
In data preprocessing, we first used the pandas library to read the given dataset into a DataFrame and displayed its contents.
We then inspected the data types and the mean of each numeric column.
Next we filled all the empty cells in the dataset; alternatively we could have dropped the columns with missing values, but we chose to fill them instead.
We selected features for training and testing by dropping irrelevant ones, and split the data into 70% training and 30% testing.
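The steps above can be sketched as follows. This is a minimal, self-contained illustration: the tiny inline DataFrame and its column names (`matric_pct`, `inter_pct`, `SGPA`) are assumptions standing in for the project's real dataset, which would be loaded with `pd.read_csv(...)`.

```python
# Minimal sketch of the preprocessing steps described above.
# The columns here are illustrative; the real dataset has many more features.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "matric_pct": [80.0, None, 75.0, 90.0, 80.0, 85.0, 70.0, 88.0, 92.0, 78.0],
    "inter_pct":  [70.0, 65.0, None, 85.0, 70.0, 80.0, 60.0, 82.0, 90.0, 72.0],
    "SGPA":       [3.2, 2.9, 3.0, 3.8, 3.2, 3.5, 2.7, 3.6, 3.9, 3.1],
})

print(df.dtypes)                            # data types of each column
print(df.mean(numeric_only=True))           # mean of each numeric column

df = df.drop_duplicates()                   # remove duplicate rows
df = df.fillna(df.mean(numeric_only=True))  # fill empty cells instead of dropping

X = df.drop(columns=["SGPA"])               # features
y = df["SGPA"]                              # continuous target

# 70% training / 30% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))
```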
AI Prediction Models:
This prediction task cannot be handled by classification models; it requires regression models, because our target variable is continuous, not discrete. We therefore used a Linear Regression model, Support Vector Regression (SVR), a Neural Network, an XGBoost Regressor, and a Random Forest Regressor, all of which are regression models.
1. Linear Regression:
First we import LinearRegression from scikit-learn.
Then we trained the model by fitting it on the training data (X_train and y_train) and used the trained model to make predictions on the test set (X_test).
After that we evaluated the model by calculating and printing metrics such as Mean Squared Error (MSE) and R-squared to assess its performance. Finally, we visualized the predictions by plotting the actual values against the predicted values using Matplotlib.
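The linear-regression workflow described above looks roughly like this. The two-feature synthetic dataset is an illustrative stand-in for the project's real X_train/X_test produced in preprocessing.

```python
# Sketch of the linear-regression workflow, on synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(50, 100, size=(200, 2))   # e.g. matric %, intermediate %
y = 0.02 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)               # train on the training data
y_pred = model.predict(X_test)            # predict on the test set

mse = mean_squared_error(y_test, y_pred)  # evaluation metrics
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")

# The report then plots actual vs. predicted with Matplotlib, e.g.:
# import matplotlib.pyplot as plt
# plt.scatter(y_test, y_pred); plt.xlabel("Actual"); plt.ylabel("Predicted")
```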
2. Support Vector Regression (SVR):
We imported SVR from scikit-learn and trained the model by fitting it on the training data (X_train and y_train). Then we used the trained SVR model to make predictions on the test set (X_test). Next we evaluated the model by calculating and printing Mean Squared Error (MSE) and R-squared to assess its performance, and visualized the results in a plot.
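The SVR step follows the same pattern; only the model class changes. Again the synthetic data is an assumption standing in for the project's real split.

```python
# Sketch of the SVR workflow, on synthetic stand-in data.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(50, 100, size=(200, 2))
y = 0.02 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = SVR(kernel="rbf")                 # default RBF-kernel SVR
model.fit(X_train, y_train)               # fit on the training data
y_pred = model.predict(X_test)            # predict on the test set

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")
```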
3. Neural Network:
We instantiate a StandardScaler object from scikit-learn, which is used to standardize the features: fit_transform fits the scaler on the training data (X_train) and transforms it, and transform then scales the test data (X_test) with the same scaler.
After that we instantiate a Multi-layer Perceptron (MLP) regressor from scikit-learn, with two hidden layers of 100 and 50 neurons, and train it on the standardized training data.
We set the maximum number of training iterations (epochs) via max_iter and fix random_state for reproducibility.
Then we calculated the mean squared error between the actual (y_test) and predicted values (y_pred), and accessed the training loss over epochs through the regressor's loss curve.
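In code, the scaling and MLP steps described above look roughly like this; the synthetic dataset and the max_iter value of 500 are illustrative assumptions, while the (100, 50) hidden-layer sizes match the report.

```python
# Sketch of the MLP workflow: scale, train, evaluate, inspect loss curve.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(50, 100, size=(200, 2))
y = 0.02 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit on training data, then transform
X_test_s = scaler.transform(X_test)        # same scaler applied to the test data

mlp = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42)
mlp.fit(X_train_s, y_train)                # train on the standardized data
y_pred = mlp.predict(X_test_s)

mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)
print("epochs recorded in loss curve:", len(mlp.loss_curve_))
```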
4. XGBoost Regressor:
In this model, we instantiate an XGBoost Regressor and fit it to the training set (X_train and y_train) using the fit method. After that we made predictions on the test set (X_test). Next we calculated the Mean Squared Error (MSE) and R-squared between the actual (y_test) and predicted values (y_pred) and plotted the actual against the predicted values.
5. Random Forest Regressor:
In the random forest regressor, we instantiate a Random Forest Regressor with 100 trees (n_estimators) and a fixed random seed (random_state). We then used the fit method to train the model on the training set (X_train and y_train) and used the trained model to make predictions on the test set (X_test). After that we calculated the Mean Squared Error (MSE) and R-squared between the actual (y_test) and predicted values (y_pred) and plotted the actual against the predicted values.
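The random-forest step can be sketched as below; the 100 trees and fixed seed match the report, while the synthetic data is an illustrative stand-in.

```python
# Sketch of the Random Forest workflow, on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(50, 100, size=(200, 2))
y = 0.02 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)               # train the 100-tree forest
y_pred = model.predict(X_test)            # predict on the test set

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")
```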
In conclusion, the Random Forest Regressor performed best, with a lower Mean Squared Error (MSE) and a higher R-squared than all the other models.
Exploratory Data Analysis:
The EDA process undertaken in this project leverages the Seaborn and Sweetviz libraries to gain insights into the dataset's characteristics.
Seaborn's box plots were generated for distinct subsets of columns within the one-hot
encoded DataFrame. These visualizations, organized into four blocks, each analyzing 20
columns, serve to illuminate the distribution of data and highlight potential outliers.
Analyzing the first 19 columns of a student data set using box plots revealed various characteristics.
The distribution of data was largely symmetrical, except for slight skewness in some opinion-based
columns. Central tendencies, represented by medians, differed significantly across variables,
indicating diverse typical values. The spread of data points varied similarly, with some columns
exhibiting wider ranges in responses than others. A few potential outliers existed, particularly in the
"surprise quizzes" category, hinting at extreme cases of stress and discouragement. These
observations suggested possible connections between family background and preferred learning
styles, diverse coping mechanisms for anxieties, and potentially strong reactions to surprise quizzes.
Specific Observations:
Visual Insights
This is the graphical user interface of the hosted website. The user can input the Matric and Intermediate percentages and the semester GPAs from the first through the fourth semester to get the predicted SGPA and CGPA of the fifth semester. All five models are integrated with the GUI, and based on these inputs each model predicts its corresponding output value.
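The way the GUI maps form inputs to a prediction can be sketched as below. Everything here is an illustrative assumption: the six input fields, the synthetic training data, and the use of a single Random Forest stand in for the site's five trained models.

```python
# Sketch: turning the six GUI inputs into a model prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic training data with the six assumed input fields:
# matric %, intermediate %, and GPAs of semesters 1-4.
percentages = rng.uniform(50, 100, (200, 2))
gpas = rng.uniform(2.0, 4.0, (200, 4))
X = np.hstack([percentages, gpas])
# Stand-in target: fifth-semester SGPA correlated with the past GPAs.
y = gpas.mean(axis=1) + rng.normal(0, 0.05, 200)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# One user's form input: matric %, inter %, GPA of semesters 1-4.
user_input = np.array([[85.0, 78.0, 3.1, 3.3, 3.4, 3.2]])
predicted_sgpa = model.predict(user_input)[0]
print(f"Predicted fifth-semester SGPA: {predicted_sgpa:.2f}")
```

In the hosted site, the same input vector would be passed to each of the five trained models in turn, giving one prediction per model.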
OUTPUT:
All the models predict the corresponding SGPA and CGPA respectively.
Conclusion:
In conclusion, the project successfully conducted a predictive analysis of SGPA and CGPA
for the fifth semester, employing various regression models such as Linear Regression,
Support Vector Regression, Neural Network, XGBoost Regressor, and Random Forest
Regressor. The comprehensive methodology, encompassing meticulous data
preprocessing, model training, and evaluation using metrics like Mean Squared Error and
R-squared, revealed that the Random Forest Regressor model outperformed others. The
inclusion of a user-friendly graphical interface enhances the project's practicality,
providing users with a seamless tool for predicting SGPA and CGPA based on their
educational details.
THE END