Final Major Project Documentation Brain Tumour Detection
Thakkar Batch –
ML09B3
REPORT
TITLE
Imported pandas as
Imported matplotlib
Imported numpy as
Then we’ve used .info() function to check for null values and then we’ve used .dropna() to drop null values in t
ning we’ve used .value_counts() function for the gender column so as to remove the genders other than male and female as show
Once the data was clean, we separated the male and the female data by
using .loc[] to filter on the gender column.
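
A minimal sketch of the cleaning and splitting steps described above. The toy data and the column names "gender" and "text" are assumptions made for illustration; the real data set may use different names and labels.

```python
import pandas as pd

# Hypothetical toy data; column names and labels are assumptions.
df = pd.DataFrame({
    "gender": ["male", "female", "brand", None, "female", "unknown"],
    "text": ["the game was great", "coffee and books", "buy now",
             "hello", None, "hmm"],
})

df.info()                            # prints non-null counts per column
df = df.dropna()                     # drop rows that contain any null value
print(df["gender"].value_counts())   # inspect which gender labels remain

# Keep only male and female rows, then separate them with .loc[]
df = df.loc[df["gender"].isin(["male", "female"])]
male_df = df.loc[df["gender"] == "male"]
female_df = df.loc[df["gender"] == "female"]
```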
2. Questions asked on data set:
I. What are the most common emotions/words used by Males and
Females?
Ans. Most common word used by Males is ‘the’.
Most common word used by Females is ‘and’.
II. What gender makes more typos in their tweets?
Ans. The gender with more typos in their tweets is Male.
Converting the text column into List:
Tokenizing the converted string:
Using Counter to count max repeated words:
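
A minimal sketch of the three steps above (list conversion, tokenizing, counting), assuming a short list of hypothetical tweets in place of the text column. Plain whitespace splitting stands in for the tokenizer here; that is an assumption, and a tokenizer such as the one in nltk could be used instead.

```python
from collections import Counter

# Hypothetical tweets standing in for one gender's text column.
tweets = ["the cat and the hat", "the dog and me"]

text_list = list(tweets)                # column converted to a plain list
joined = " ".join(text_list).lower()    # one long lower-cased string
tokens = joined.split()                 # whitespace tokenization (assumption)

word_counts = Counter(tokens)           # word -> number of occurrences
most_common_word, count = word_counts.most_common(1)[0]
print(most_common_word, count)
```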
Installing pyspellchecker to pc:
Counting misspelt words:
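
A minimal sketch of how misspelt words could be counted per gender. The helper `count_misspelt`, the toy vocabulary and the sample tweets are hypothetical. The `unknown` argument is any function that takes a list of words and returns the set of words it does not recognise; with pyspellchecker installed, `SpellChecker().unknown` has that shape and could be passed in instead of the toy function.

```python
import re
from typing import Callable, Iterable, List, Set


def count_misspelt(tweets: Iterable[str],
                   unknown: Callable[[List[str]], Set[str]]) -> int:
    """Count word occurrences that the `unknown` function flags as misspelt."""
    total = 0
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        bad = unknown(words)                        # set of unrecognised words
        total += sum(1 for w in words if w in bad)
    return total


# Toy stand-in for a real dictionary (assumption, for illustration only).
TOY_VOCAB = {"i", "receive", "the", "mail", "good", "day"}


def toy_unknown(words: List[str]) -> Set[str]:
    return {w for w in words if w not in TOY_VOCAB}


male_typos = count_misspelt(["i recieve teh mail"], toy_unknown)
female_typos = count_misspelt(["good day", "teh mail"], toy_unknown)
print(male_typos, female_typos)
```

Comparing the two totals (ideally normalised by the number of words per gender) gives the answer to the typo question.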
Error while encoding:
Splitting data for training and testing purposes:
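
A minimal sketch of the split, using random placeholder features and labels. The 80/20 ratio and the `random_state` value are assumptions; the report does not state which values were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder encoded features and binary gender labels (assumption).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 4))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the rows for testing (ratio is an assumption).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```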
4. Ensemble Machine Learning Modelling:
1. Random Forest Algorithm –
We imported RandomForestClassifier from sklearn.ensemble.
We fitted the random forest classifier on X_train and Y_train and used
.predict() to predict the values for X_test.
Using accuracy_score from sklearn.metrics, we computed the accuracy of the
predicted values (y_pred) against the test values (Y_test), as shown in the
above figure.
2. Logistic Regression –
We imported LogisticRegression from sklearn.linear_model. We fitted the
logistic regression model on X_train and Y_train and used .predict() to predict
the values for X_test.
Using accuracy_score from sklearn.metrics, we computed the accuracy of the
predicted values (y_pred) against the test values (Y_test), as shown in the
figure.
3. SVM Algorithm –
We imported svm from sklearn. We fitted the SVM classifier on X_train and
Y_train and used .predict() to predict the values for X_test.
Using accuracy_score from sklearn.metrics, we computed the accuracy of the
predicted values (y_pred) against the test values (Y_test), as shown in the
figure.
4. KNN Algorithm –
We imported KNeighborsClassifier from sklearn.neighbors. We fitted the KNN
classifier on X_train and Y_train and used .predict() to predict the values
for X_test.
Using accuracy_score from sklearn.metrics, we computed the accuracy of the
predicted values (y_pred) against the test values (Y_test), as shown in the
figure.
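
The fit, predict and score pattern shared by the four models can be sketched as follows. The synthetic data from `make_classification`, the default hyperparameters and the `random_state` values are assumptions, and `SVC` from sklearn.svm is used here as the SVM classifier; the accuracies printed by this sketch are unrelated to the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the encoded tweet features (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, Y_train)                   # train on the training split
    y_pred = model.predict(X_test)                # predict the held-out rows
    accuracies[name] = accuracy_score(Y_test, y_pred)
    print(f"{name}: {accuracies[name]:.2%}")
```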
5. Accuracy Results:
Random Forest – 59.34%
Logistic Regression – 50.01%
SVM Algorithm – 50.80%
KNN Algorithm – 54.84%
For Random Forest, we calculated the accuracy for every number of estimators
from 1 to 100, plotted all the accuracies, and took the average of the 100
accuracies.
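
A minimal sketch of that sweep, assuming synthetic data and fixed `random_state` values; neither comes from the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded tweet features (assumption).
X, y = make_classification(n_samples=150, n_features=6, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

accuracies = []
for n in range(1, 101):                           # 1 to 100 estimators
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, Y_train)
    accuracies.append(accuracy_score(Y_test, clf.predict(X_test)))

average_accuracy = sum(accuracies) / len(accuracies)
print(f"average accuracy over {len(accuracies)} settings: {average_accuracy:.4f}")
```

The list can then be plotted against `range(1, 101)` with matplotlib. The same loop works for KNN by swapping in `KNeighborsClassifier(n_neighbors=n)`, provided the training set has at least as many rows as the largest `n`.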
For the KNN algorithm, we calculated the accuracy for every number of neighbours from 1 to 100, plotted the graph for all
6. Summary:
The project required us to perform many tasks. We started with EDA and data
cleaning. The first step was to remove null values and separate the male and
female data from the main data set.
The next part of the project involved performing various operations to answer the given
questions. We used the .str.split(expand=True).stack().value_counts() chain to count word frequencies.
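
A minimal sketch of that chain on a hypothetical two-tweet Series: `.str.split(expand=True)` puts each word in its own column, `.stack()` flattens the columns into one long Series, and `.value_counts()` counts each word.

```python
import pandas as pd

# Hypothetical tweets standing in for the text column.
tweets = pd.Series(["the cat and the hat", "the dog and me"])

word_counts = tweets.str.split(expand=True).stack().value_counts()
top_word = word_counts.index[0]     # most frequent word
print(word_counts.head(3))
```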
We also used the nltk toolkit for the second question. Using spellchecker, we also
removed the errors and typos from the tweets.
The dependent variable was “Gender”. The .fillna() function was used to fill the null
values.
We performed Feature Selection and Feature Engineering on the dataset to get the best
possible results.
Since the data required a lot of cleaning and the dataset was completely categorical, the
accuracy did not exceed 60%.
Team –
ML09B3
B. Lalith Kumar
Niti Goyal
Aswin CA
Himani Thakkar
Aishwary Krishna Singh
Sreeja Samanthapuri
Utkarsh Soni
Ruchita Singh
Santosh Reddy
Mohammad Abdul Saleem
Jay Chandra
Thank-You