PREMIER UNIVERSITY, CHATTOGRAM
Department of Computer Science & Engineering
SUBMITTED BY
Name: Oishe Dey
ID: 2003810202094
SUBMITTED TO
MD. Tamim Hossain
Lecturer
Department of Computer Science & Engineering
Premier University, Chattogram
Contents

1 Introduction
2 OBJECTIVE
  2.1 Objective
3 ABOUT DATASET
  3.1 Description About Dataset
4 Problem Statement
  4.1 Problem Statement
5 Dataset Collection
  5.1 Data Collection
6 DATA PREPROCESSING
  6.1 Data Preprocessing
7 MODEL ARCHITECTURE
  7.1 Model Architecture
    7.1.1 Neural Network
8 RESULTS
  8.1 Count Plot & Pie Chart
  8.2 Data Preprocessing
  8.3 Evaluate Logistic Regression
  8.4 Confusion Matrix
  8.5 Train Accuracy & Validation Accuracy
  8.6 Train Loss & Validation Loss
9 Conclusion
  9.1 Conclusion
10 References
  10.1 References
Chapter 1
Introduction
Chapter 2
OBJECTIVE
2.1 Objective
The provided code reads a CSV file named 'yeast.csv' from Google Drive into a
pandas DataFrame and then displays basic information about it, including the
number of rows and columns and the overall shape of the data. The dataset
contains the columns 'mcg', 'gvh', 'alm', 'mit', 'erl', 'pox', 'vac', 'nuc',
and 'name', although the data itself is not shown in the provided snippet. The
objective is to give a brief overview and summary statistics of the yeast
dataset.
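The loading step can be sketched as follows. Since the actual Google Drive path is not shown, a small in-memory stand-in with the same columns is used here so the sketch runs anywhere; with the real file, the `pd.read_csv` call would point at 'yeast.csv' instead.

```python
import io
import pandas as pd

# Stand-in for 'yeast.csv'; the real file would be read from Google Drive.
csv_text = """mcg,gvh,alm,mit,erl,pox,vac,nuc,name
0.58,0.61,0.47,0.13,0.50,0.00,0.48,0.22,MIT
0.43,0.67,0.48,0.27,0.50,0.00,0.53,0.22,CYT
"""
df = pd.read_csv(io.StringIO(csv_text))  # in practice: pd.read_csv('yeast.csv')

print('rows:', df.shape[0], 'columns:', df.shape[1])
print('shape:', df.shape)
print(list(df.columns))
```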
Chapter 3
ABOUT DATASET
This dataset contains the following features: mcg, gvh, alm, mit, erl, pox,
vac, nuc, and name.
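A quick summary of these features can be produced with pandas. The rows below are hypothetical sample values used only so the sketch is self-contained; with the real file, `df = pd.read_csv('yeast.csv')` would be used instead.

```python
import pandas as pd

# Hypothetical sample rows with the listed feature columns.
df = pd.DataFrame({
    'mcg': [0.58, 0.43, 0.64], 'gvh': [0.61, 0.67, 0.62],
    'alm': [0.47, 0.48, 0.49], 'mit': [0.13, 0.27, 0.15],
    'erl': [0.50, 0.50, 0.50], 'pox': [0.00, 0.00, 0.00],
    'vac': [0.48, 0.53, 0.53], 'nuc': [0.22, 0.22, 0.22],
    'name': ['MIT', 'CYT', 'MIT'],
})

print(df.describe())              # count, mean, std, min, quartiles, max
print(df['name'].value_counts())  # class distribution of the target column
```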
Chapter 4
Problem Statement
Chapter 5
Dataset Collection
Chapter 6
DATA PREPROCESSING
algorithm's sensitivity to the scale of input features. The scaled training
and testing sets are stored in X_train_scaled and X_test_scaled, respectively.
Information about the preprocessed dataset is displayed, including the shapes
of the training and testing sets. Optionally, if the scaled features need to
be inverse-transformed later, the code snippet provides comments on how to
perform the inverse transformation.
This data preprocessing pipeline prepares the dataset for machine learning
tasks, ensuring that it is free of missing values, properly encoded, and split
into training and testing sets. Feature scaling is applied for
standardization, and the optional inverse-scaling step is included for
scenarios where the original feature scale is needed.
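The pipeline described above can be sketched as follows. Synthetic arrays stand in for the yeast features and 'name' labels, and the split ratio and random seed are assumptions, since the actual values are not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for the 8 yeast features and the 'name' target column.
rng = np.random.default_rng(0)
X = rng.random((60, 8))
y = rng.choice(['CYT', 'MIT', 'NUC'], size=60)

# Encode the string labels as integers, then split into train/test sets.
y_enc = LabelEncoder().fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_enc, test_size=0.2, random_state=42)

# Standardize features: fit on the training set only, then transform both.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Optional: recover the original feature scale later if needed.
X_train_original = scaler.inverse_transform(X_train_scaled)

print(X_train_scaled.shape, X_test_scaled.shape)
```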
Chapter 7
MODEL ARCHITECTURE
In summary, this code defines a neural network with input, hidden, and output
layers for a multiclass classification task (assuming 6 classes). The 'relu'
activation function is used for the hidden layers, and 'softmax' is used for
the output layer to predict the class probabilities.
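A minimal sketch of such a network, assuming Keras; the hidden-layer widths (64 and 32 units) and the optimizer are hypothetical, since the actual values are not shown in the report.

```python
from tensorflow import keras

# Network sketch: 8 input features, two 'relu' hidden layers, and a
# 'softmax' output layer producing probabilities over 6 classes.
model = keras.Sequential([
    keras.layers.Input(shape=(8,)),               # 8 yeast features
    keras.layers.Dense(64, activation='relu'),    # hidden layer (assumed width)
    keras.layers.Dense(32, activation='relu'),    # hidden layer (assumed width)
    keras.layers.Dense(6, activation='softmax'),  # 6-class probabilities
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```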
Chapter 8
RESULTS
Chapter 9
Conclusion
9.1 Conclusion
The logistic regression and random forest models are trained and evaluated,
providing insight into classification performance on the yeast dataset. The
confusion matrix and accuracy scores give a detailed view of model
performance, and the visualizations aid in understanding the training process.
Combining Logistic Regression and Random Forest allows for a balanced
evaluation of the dataset, contributing to a more robust analysis. Additional
insights could be obtained by exploring hyperparameter tuning and feature
importance analysis, and by trying other machine learning algorithms for
comparison.
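The evaluation described above can be sketched as follows. Synthetic multiclass data stands in for the scaled yeast features, and the hyperparameters are assumptions, since the actual training code is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic multiclass data standing in for the preprocessed yeast features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train both models and report accuracy plus a confusion matrix for each.
results = {}
for name, model in [('Logistic Regression', LogisticRegression(max_iter=1000)),
                    ('Random Forest', RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, pred)
    print(name, 'accuracy:', results[name])
    print(confusion_matrix(y_test, pred))
```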
Chapter 10
References
10.1 References