
Machine Learning compared to Deep Learning - algorithms

Filote Cosmin
Tudor Alexandrescu
The Dataset

 IMDB Dataset of 50K Movie Reviews.
 It has 50K movie reviews for natural language processing or text analytics.
 This is a dataset for binary sentiment classification containing substantially more data
than previous benchmark datasets.
 It has nearly 25k reviews for positive sentiment and 25k for negative, so it is balanced.

   review                                              sentiment
0  One of the other reviewers has mentioned that ...   positive
1  A wonderful little production. <br /><br />The...   positive
2  I thought this was a wonderful way to spend ti...   positive
3  Basically there's a family where a little boy ...   negative
4  Petter Mattei's "Love in the Time of Money" is...   positive
5  Probably my all-time favorite movie, a story o...   positive
6  I sure would like to see a resurrection of a u...   positive
7  This show was an amazing, fresh & innovative i...   negative
8  Encouraged by the positive comments about this...   negative
9  If you like original gut wrenching laughter yo...   positive
The Dataset

 The problem is that we need to do some cleanup before testing our models. The text is
full of HTML tags, so we will eliminate them, eliminate the numbers, bring everything to
lower case and lemmatize the text (a sketch of this cleanup is shown below). After that
we can see some interesting graphs.
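The slides do not show the cleanup code itself; this is a minimal sketch of the steps they describe, assuming NLTK's WordNetLemmatizer (the lemmatizer actually used is not stated):

import re
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # lemmatizer data
lemmatizer = WordNetLemmatizer()

def clean_review(text):
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML tags such as <br />
    text = re.sub(r"\d+", " ", text)      # eliminate the numbers
    text = text.lower()                   # bring everything to lower case
    # lemmatize token by token
    return " ".join(lemmatizer.lemmatize(tok) for tok in text.split())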
Graphs – most used words

Positive Sentiment | Negative Sentiment
[charts of the most used words for each sentiment class]
Graphs – most used words - WordCloud

Positive Sentiment | Negative Sentiment
[word clouds of the most used words for each sentiment class]
Data preparation for training

 The problem is: how do we classify whether a review is positive or negative? But before
that we need to prepare our data for training. For the sentiment we will convert positive
to 1 and negative to 0. For the text (the review) we will use TF-IDF.

from sklearn.feature_extraction.text import TfidfVectorizer

# Unigram, bigram and trigram TF-IDF features
tfidf_vec = TfidfVectorizer(ngram_range=(1, 3))

# Fit on the training corpus only, then reuse the fitted vocabulary for the test corpus
tfidf_vec_train = tfidf_vec.fit_transform(corpus_train)
tfidf_vec_test = tfidf_vec.transform(corpus_test)
ML – Models

 Logistic Regression – we obtained a high accuracy from the start. We only set the
random_state parameter.
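A minimal sketch of how this could look with scikit-learn on the TF-IDF features from earlier; the label vectors y_train/y_test (the 0/1 sentiments) and the random_state value are assumptions:

from sklearn.linear_model import LogisticRegression

# random_state value is an assumption; the slides only say it was set
log_reg = LogisticRegression(random_state=0)
log_reg.fit(tfidf_vec_train, y_train)
print(log_reg.score(tfidf_vec_test, y_test))  # mean accuracy on the test split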
ML – Models

 With SVM we Obtained even a


higher accuracy : 0.9. We used
the value for Regularization
Parameter(C) : 0.5
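A sketch under the same assumptions; the slides do not say which SVM variant was used, so LinearSVC (a common choice for high-dimensional TF-IDF features) is itself an assumption:

from sklearn.svm import LinearSVC

# LinearSVC is an assumption; the slides only mention SVM with C = 0.5
svm = LinearSVC(C=0.5)
svm.fit(tfidf_vec_train, y_train)
print(svm.score(tfidf_vec_test, y_test))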
ML – Models

 Next one is the Random Forest Classifier. The score obtained was not so great, but we
used it with max_depth = 2.
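A sketch with the one parameter the slide states; everything else is left at scikit-learn's defaults:

from sklearn.ensemble import RandomForestClassifier

# Only max_depth = 2 is stated on the slide; other parameters are defaults
rf = RandomForestClassifier(max_depth=2)
rf.fit(tfidf_vec_train, y_train)
print(rf.score(tfidf_vec_test, y_test))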
ML – Models

 We also thought to try XGBoost. We obtained 0.85.
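A sketch assuming the xgboost scikit-learn wrapper, since the slide mentions no tuning:

from xgboost import XGBClassifier

# Defaults assumed; the slide does not mention any tuning
xgb = XGBClassifier()
xgb.fit(tfidf_vec_train, y_train)
print(xgb.score(tfidf_vec_test, y_test))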
ML – Models

 We obtained a great result for Multinomial Naïve Bayes.
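A sketch under the same assumptions; TF-IDF values are non-negative, which is exactly what MultinomialNB expects:

from sklearn.naive_bayes import MultinomialNB

# TF-IDF features are non-negative, so MultinomialNB applies directly
nb = MultinomialNB()
nb.fit(tfidf_vec_train, y_train)
print(nb.score(tfidf_vec_test, y_test))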
ML – Models

 The last one is K-NN, with an accuracy of 0.77. We had other algorithms that
performed way better.
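A sketch under the same assumptions; the slide does not state the number of neighbours, so scikit-learn's default is used:

from sklearn.neighbors import KNeighborsClassifier

# Default n_neighbors = 5; the slide does not state the value used
knn = KNeighborsClassifier()
knn.fit(tfidf_vec_train, y_train)
print(knn.score(tfidf_vec_test, y_test))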
ML – Models - Conclusions

 In general, all ML algorithms that we used performed well (only the Random Forest
Classifier was poor). The average accuracy was over 80%. In our case this is OK, but
there might be other cases (like medical ones) where an accuracy of 80% is a very poor
one.
DL - Models

 The first model that we tried was a combination with the layer from
https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1, which takes as input a
batch of sentences as a 1-D tensor of strings (see the sketch below).
 After that we have a Dense layer with 30 units and relu activation.
 The last one obviously has 2 units for our output, with softmax activation.
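A minimal sketch of this architecture, assuming tensorflow_hub's KerasLayer; the compile settings (optimizer and loss) are not stated on the slides:

import tensorflow as tf
import tensorflow_hub as hub

model = tf.keras.Sequential([
    # Pre-trained 20-dim gnews-swivel embedding over a batch of string sentences
    hub.KerasLayer("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
                   input_shape=[], dtype=tf.string),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
# Optimizer and loss are assumptions; softmax output pairs with a categorical loss
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])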
DL - Models

 After 20 epochs this is what we got.
[training results]
DL – Models - training vs testing loss & accuracy
[loss and accuracy plots]
DL – Models – v1 from scratch

 The second model that we tried was a combination with the Embedding layer
(https://keras.io/api/layers/core_layers/embedding/) that takes as input the vocabulary
size (8185 in our case); the output size is 16.
 After that we have GlobalAveragePooling1D and a Dense layer (see the sketch below).
 We trained it on batches of 1000.
 After 20 epochs we had an accuracy of 88.21%.
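A sketch of this architecture; the slides do not give the width of the final Dense layer, so the 2-unit softmax output used by the other models is an assumption, as are the compile settings:

import tensorflow as tf

vocab_size = 8185  # vocabulary size from the slides

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 16),  # 16-dim learned embedding
    tf.keras.layers.GlobalAveragePooling1D(),   # average over the sequence dimension
    tf.keras.layers.Dense(2, activation="softmax"),  # output width is an assumption
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Per the slides: model.fit(..., batch_size=1000, epochs=20)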
DL – Models – v1 from scratch - training vs testing loss & accuracy
[loss and accuracy plots]
DL – Models – v2 from scratch

 The third model that we tried was with a Tokenizer from Keras. We set 5000 features
and then fitted it to the reviews. We then preprocessed the sentiment.
 We used an Embedding layer with 5000 as input and 256 as output, followed by a
Bidirectional LSTM. After that a GlobalMaxPooling1D, a Dense layer with 128 units and
relu activation, a Dropout of 0.2 and another Dense with 2 units for the output, with
softmax activation (see the sketch below).
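A sketch of this architecture. The slide does not give the LSTM width (v4 uses 128, so 128 is assumed here), and return_sequences=True is needed so GlobalMaxPooling1D receives the full sequence:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(5000, 256),
    # Width of 128 is an assumption; return_sequences=True feeds the pooling layer
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])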
DL – Models – v2 from scratch

 After 10 epochs this is what we got.
[training results]
DL – Models – v2 from scratch - training vs testing loss & accuracy
[loss and accuracy plots]
DL – Models – v3 from scratch

 The 4th model that we tried uses the same Tokenizer from Keras. Only the model was
changed a bit, to include a convolutional layer with 64 filters, a kernel size of 3, "same"
padding and relu activation.
 We used an Embedding layer with 5000 as input and 256 as output, followed by the
convolutional layer. After that a GlobalMaxPooling1D, a Dropout of 0.2, a Dense layer
with 128 units and relu activation, a Dense layer with 64 units and relu activation, a
Dropout of 0.2 and another Dense with 2 units for the output, with softmax activation
(see the sketch below).
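A sketch of this architecture; every layer is stated on the slide, only the compile settings are assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(5000, 256),
    tf.keras.layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation="softmax"),
])
# Compile settings are assumptions; the layer stack matches the slide
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])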
DL – Models – v3 from scratch

 After training for 10 epochs with a batch_size of 32 this is what we got.
[training results]
DL – Models – v3 from scratch - training vs testing loss & accuracy
[loss and accuracy plots]
DL – Models – v4 from scratch

 The 5th model that we tried uses the same Tokenizer from Keras. The model is very
similar to v2.
 We used an Embedding layer with 5000 as input and 256 as output, followed by a
Bidirectional LSTM(128). After that a Dropout of 0.2, a Dense layer with 16 units and
relu activation, a Dropout of 0.2 and another Dense with 2 units for the output, with
softmax activation (see the sketch below).
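A sketch of this architecture; with no pooling layer after it, the Bidirectional LSTM returns only its final state. Compile settings are assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(5000, 256),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),  # final state only
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Per the slides: model.fit(..., batch_size=128, epochs=10)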
DL – Models – v4 from scratch

 After training for 10 epochs with a batch_size of 128 this is what we got.
[training results]
DL – Models – v4 from scratch - training vs testing loss & accuracy
[loss and accuracy plots]
DL – Models – v5 from scratch

 The last model that we tried uses the same Tokenizer from Keras. The model is very
similar to v2.
 We used an Embedding layer with 5000 as input and 256 as output, followed by a
GRU layer with 64 units. After that a Dropout of 0.25 and a Dense layer with 1 unit for
the output, with sigmoid activation (see the sketch below).
 After training for 10 epochs with a batch_size of 512 we obtained an accuracy of
86.04%.
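A sketch of this last architecture; the single sigmoid unit pairs with a binary cross-entropy loss rather than the categorical loss of the previous models (the compile settings are assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(5000, 256),
    tf.keras.layers.GRU(64),                          # final hidden state only
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # single-unit binary output
])
# Binary cross-entropy matches the sigmoid output; optimizer is an assumption
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Per the slides: model.fit(..., batch_size=512, epochs=10)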
Conclusions

 The shallow algorithms (machine learning algorithms) that we used generated, most
of the time, very good results in terms of global accuracy (on each class). This is an
interesting fact given the complexity of the algorithms used.
 As for the deep learning algorithms, the obtained results are very similar. Moreover,
in each case there is a tendency to overfit. While we used different embedding styles
and network architectures, we were not able to obtain a global accuracy above 90%
(on the test sample), as happened when using the machine learning algorithms.
