
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 193 (2021) 131–140

www.elsevier.com/locate/procedia

10th International Young Scientists Conference on Computational Science

LSTM-ANN & BiLSTM-ANN: Hybrid deep learning models for enhanced classification accuracy
Md. Kowsher a,*, Anik Tahabilder b, Md. Zahidul Islam Sanjid c, Nusrat Jahan Prottasha d,
Md. Shihab Uddin e, Md Arman Hossain f, Md. Abdul Kader Jilani g

a Stevens Institute of Technology, Hoboken, NJ 07030, USA
b Wayne State University, Detroit, MI 48202, USA
c BRAC University, Dhaka 1212, Bangladesh
d,e,f,g Daffodil International University, Dhaka 1207, Bangladesh

Abstract

Machine learning keeps advancing with the progression of state-of-the-art technologies. Since existing
algorithms often do not provide satisfactory learning performance, it is necessary to keep upgrading the
current algorithms. Hybridizing two or more algorithms can potentially increase the performance of the
resulting model. Although LSTM and BiLSTM are two widely used algorithms in natural language
processing, there is still room for improvement in accuracy through hybridization, which allows the advantages of
both RNN and ANN algorithms to be obtained simultaneously. This paper illustrates the deep integration of BiLSTM-ANN
(fully connected neural network) and LSTM-ANN and shows how these integrated models perform better than
single BiLSTM, LSTM, and ANN models. Bangla content classification is challenging because of the language's ambiguity,
intricacy, diversity, and shortage of relevant data; therefore, we evaluated the integrated models on a Bangla content
classification dataset built from newspaper articles. The proposed hybrid BiLSTM-ANN model beats all the implemented models with
the highest accuracy score of 93% for both validation and testing. Moreover, we have analyzed and compared the
performance of the models based on the most relevant parameters.
© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 10th International Young Scientists Conference on Computational Science
Keywords: BiLSTM-ANN; LSTM-ANN; Supervised machine learning; Hybrid ML model; Fusion of ML model; NLP

* Corresponding author.
E-mail address: ga.kowsher@gmail.com

1877-0509 © 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 10th International Young Scientists Conference on Computational Science
10.1016/j.procs.2021.10.013

1. Introduction

With the progression of this new era of connectivity, technology is becoming increasingly personalized.
Chatbots, automated response systems, and recommender systems are becoming genuinely conversational,
learning our habits and characters and even developing characters of their own. There are many profoundly
useful applications of artificial intelligence. This development is empowering education and revolutionizing
individualized learning support for growing numbers of teachers and students. Machine learning applications are
changing the world by advancing almost every sector, including medical care services, transportation,
automation, various industrial production systems, and so forth.
Content classification is the process of grouping contents and algorithmically assigning labels to common
literature sources from some predetermined set of categories [1]. The classification of hard-copy documents has
traditionally been the domain of library science, while the algorithmic arrangement of documents belongs mostly
to data science and software engineering. It has become one of the most significant and popular applications of
natural language processing. Assigning categories to documents, which can be online journals, newspapers, books,
magazines, and so forth, has numerous applications.
One of the serious issues with algorithms like LSTM is that they are vulnerable to overfitting, and it is hard to
apply the dropout technique to resolve the overfitting issue. Moreover, LSTM is slower than classification tasks
demand because of the sequential operations in its layers: LSTM requires consecutive inputs to compute the
hidden layer weights iteratively, which makes it somewhat sluggish compared with other neural networks.
Although BiLSTM performs better than LSTM, it has limitations too. In order to tackle those limitations,
hybridization can be an effective approach. The objective is to combine two algorithms so that the integrated
model performs better than either of the two algorithms would independently.
In this work, we utilized two major recurrent neural networks, LSTM and BiLSTM, and deeply integrated each
of them with an artificial neural network to build two hybrid models for classifying content from well-known
Bengali news sources. Recurrent neural networks are capable of learning order dependence in sequence
prediction problems and are used broadly in language processing applications. Both LSTM and BiLSTM are
assimilated with a deep ANN model in order to produce models that are better than individually developed
LSTM and BiLSTM models. Further, we analyze and show that the hybrid models yield better accuracy and
classification results than the single LSTM and BiLSTM models themselves. The contributions of this paper are
summarized as follows:
• We have established the deeply integrated models LSTM-ANN and BiLSTM-ANN for content classification.
• We have compared BiLSTM-ANN and LSTM-ANN with the single deep development of BiLSTM and
LSTM, respectively, with the help of different statistical measurements.
The rest of this paper is organized as follows: Section 2 consolidates related work, which addresses some related
research conducted in earlier years. Section 3 describes the whole procedure of our investigation. Section 4
covers the experiments and the performance analysis of our fused models. Finally, we conclude our research in
Section 5 by outlining some future prospects of this work.

2. Related Work

Hidayatullah A.F. et al. in [2] proposed a model for adult content classification that uses a Long Short-Term
Memory (LSTM) neural network to distinguish adult from non-adult content. In [3], Wang Y. et al. suggested a
model that incorporates an attention-based LSTM network for aspect-level sentiment classification; the network
can attend to different parts of a sentence when different aspects are given as input. Nowak J. et al. showed how
to classify text using the LSTM model and its variants, and further demonstrated the superiority of this method
over other text classification algorithms in [4]. Kumar A. et al. presented a multi-modal approach for identifying
disaster-related informative content from Twitter streams using text and images together. Their approach was
based on LSTM and VGG-16 networks and showed significant performance improvement, as evident from the
validation results on seven different disaster-related datasets [5]. Yildirim O. et al. implemented a convolutional
auto-encoder (CAE) based nonlinear compression structure to reduce the signal size of arrhythmic beats; LSTM
classifiers were then used to recognize arrhythmias from ECG features automatically [6]. Chen T. et al. applied a
neural network-based classification model to divide opinionated sentences into three classes according to the
number of targets appearing in a sentence; each sentence type was then fed into a one-dimensional convolutional
neural network separately for sentiment classification [7]. Sun Q. et al. treated spam job-posting identification as
the target problem and built a generic machine learning pipeline for multilingual spam detection; the main
components of their research are feature generation and knowledge transfer through transfer learning [8].
Dumais S. et al. in [9] investigated the use of a hierarchical design for classifying a large, heterogeneous
collection of web content. The authors used a support vector machine (SVM) classifier, which had been shown
to be efficient and effective for classification but had not previously been investigated across multiple
hierarchical categories. Lytvyn V. et al. in [10] explored the problem of automated development of a basic
ontology and further proposed an algorithm for extracting information from natural text. Karim F. et al. proposed
augmenting convolutional networks with LSTM sub-modules for time series classification; they substantially
improved the performance of convolutional networks with a nominal increase in model size and minimal
preprocessing of the dataset [11]. Kowsher, Md. et al. presented an information extraction system for human
names and reported the better performance of an LSTM-based model [12]. Unlike the aforementioned research,
we fuse the sequential BiLSTM model with an ANN to obtain better accuracy on classification tasks.
In contrast to this research, all of the papers discussed above share one common limitation in practice: the
authors did not seek a novel technique that incorporates two or more algorithms into a single hybrid model.
Their strategies followed more traditional practices of data classification or other machine learning applications.
Although some of the authors succeeded in improving performance through novel preprocessing strategies, they
focused only on the application and execution of the algorithm, such as classifying adult content, web
information, or ECG data. In this study, by contrast, we are not just targeting another AI application but also aim
to upgrade the current methods through hybridization and improve accuracy.

3. Methodology

In order to develop the hybrid, deeply integrated BiLSTM-ANN and LSTM-ANN models, we executed six parts
separately and connected them in sequence: data collection; preprocessing of the raw data; separation of training
and testing data so the algorithms can be trained with validation; representation of the prepared data as
300-dimensional vectors for each term using the pre-trained fastText word embedding system [13]; development
of the deeply integrated and single deep models; and, finally, comparison and analysis of all models. Fig. 1
shows the methodology of deep integration for content classification.
Fig. 1. The methodology of the proposed work (data collection, preprocessing, word embedding, BiLSTM/LSTM model training, flatten, ANN, and output/model performance)



3.1. Data Description

The dataset was constructed by gathering articles from the most well-known Bangla newspapers, 'Prothom
Alo' and 'BD News', and other prestigious Bangla news-related blogs and websites; it comprises 484,622 articles. The
dataset was diverse and inconsistently labeled at the beginning, until the categories were manually consolidated. For
example, 'BD News' labels Bangladesh-related news as 'Desh (country)', whereas 'Prothom Alo'
labels it simply 'Bangladesh'. Consequently, we merged them into one label, "Country". Thus, all the articles are
organized into 22 essential categories: Country, Friend, Job, Emigration, Economy, Education,
Entertainment, International, Children, Lifestyle, People's Conversation, Others, Opinion, Western, Contemplation,
Fun, Special-supplement, Sport, Technology, Trust, We-are, and Unknown. In this dataset, the class 'Trust' has the
minimum count of 4,393 samples, and the class 'Country' has the maximum of 129,560 samples. Table 1
gives an overview of the dataset used in this work.

Table 1. A brief overview of the samples in the dataset

Topic Number of Samples Topic Number of Samples


Country 129560 Others 5414
Friend 21137 Opinion 23813
Job 12100 Western 13005
Emigration 19932 Contemplation 9845
Economy 24335 Fun 15224
Education 28777 Special supplement 8751
Entertainment 36044 Sport 43894
International 25465 Technology 21251
Children 9455 Trust 4393
Lifestyle 9196 We-are 7462
People's Conversation 7296 Unknown 8273

3.2. Data Preprocessing

Data preprocessing is one of the significant steps in developing a machine learning model [14]. It helps to
increase accuracy as well as to reduce execution time [15]. In this work, we executed various data
preprocessing steps: tokenization, spelling correction, stop word removal, punctuation removal, digit removal,
non-Bangla character removal, emoticon removal, word normalization, lemmatization, and data splitting. Fig. 2
summarizes all the preprocessing steps of this work.

Fig. 2. The steps of preprocessing (tokenization, spelling correction, stop word removal, punctuation removal, numeric character removal, non-Bangla character removal, emoticon removal, word normalization, and lemmatization)

Tokenization: Tokenization refers to splitting a sentence, phrase, or word into multiple smaller linguistic
units named "tokens". These tokens help the NLP model to comprehend and decipher the meaning of the
content by investigating the arrangement of the words. In natural language processing, the language needs to be
analyzed and scrutinized under certain constraints and conditions, and tokenization breaks up a text to facilitate
the detailed analysis of a language.
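As a minimal sketch, plain whitespace splitting already yields tokens for a short Bangla sentence; the example sentence here is only illustrative, and a dedicated Bangla tokenizer may be preferable in practice.

```python
# Minimal whitespace tokenization of a Bangla sentence (illustrative only).
sentence = "আমি বাংলায় গান গাই"   # "I sing in Bangla"
tokens = sentence.split()
print(tokens)   # ['আমি', 'বাংলায়', 'গান', 'গাই']
```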
Spelling correction: The collected text data was full of spelling blunders, and hence the raw data had to undergo
careful scrutiny. In this dataset, most of the errors were typing errors, so we corrected these words. This process
was primarily done by a deep learning-based Bengali spell checker module trained on the "Bangla Academy"
word database; every article was passed through it to obtain text free of spelling errors.
Stop word removal: Stop words are words used in a language without conveying meaningful information. In
Bangla, there is a handful of stop words that do not denote any particular meaning but only help other phrases or
words make sense. In order to train a model, it is very important to remove the stop words, since they occur in
high quantities without providing any meaningful or unique sense.
Punctuation removal: Punctuation can play an important role in creating an emotional tone in an expression.
Apart from that, punctuation has no role when data is converted through a word embedding method. In our
work, a word embedding method was incorporated, and for that reason punctuation was removed beforehand.
Digit removal: The data used in this investigation was mostly based on news coverage or bulletins in written
form. There were plenty of numerical characters inscribed in many forms, such as the date and time of an event,
addresses, phone numbers, game scores, and so on. Generally, these numerical values do not provide any
contextual information that can be used to classify the contents. Therefore, all sorts of numerical characters were
discarded before training the models.
Non-Bangla character removal: The core purpose of our research was grounded in using a Bengali corpus.
Nevertheless, the corpus contained many non-Bangla characters that were insignificant to the objective of our
research and negligible for the scheme we propose. Therefore, any non-Bangla characters were handled either by
translating them into relevant Bengali characters or by discarding them, depending on their context and intent.
Emoticon removal: The Internet has opened the horizon of globalization through social media, where emoticons
play a vital role in expressing situational emotion during communication. Nonetheless, emoticons do not convey
a message here and do not contain any meaning themselves. Therefore, emoticons were removed from the corpus
before further investigation.
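A minimal sketch of the character-level cleaning steps (digit, punctuation, non-Bangla character, and emoticon removal) is shown below; the regular expressions and the helper name clean_text are illustrative assumptions rather than the exact implementation used in this work.

```python
import re

BANGLA = "\u0980-\u09FF"   # Unicode block covering Bangla letters and digits

def clean_text(text: str) -> str:
    """Drop digits, punctuation, non-Bangla characters, and emoticons."""
    text = re.sub(r"[0-9\u09E6-\u09EF]+", " ", text)   # Latin and Bangla digits
    text = re.sub(rf"[^{BANGLA}\s]", " ", text)        # punctuation, Latin text, emoji
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("খেলার স্কোর ৩-২! 🙂 (update)"))   # -> "খেলার স্কোর"
```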
Word normalization: Many words in our corpus were not used in their standard form; some spellings were
distorted, and some were informal. For example, the Bengali currency Taka is sometimes written as "tk". We
converted those words to their original spellings automatically using Python. This process is called word
normalization, and it is a prerequisite for a well-developed NLP model [16].
Lemmatization: Lemmatization is the method of changing a word to its base form [17]. The contrast between
stemming and lemmatization is that lemmatization considers the context and converts the word to its meaningful
base form, while stemming simply removes the last few characters, frequently producing incorrect meanings and
spelling errors. Lemmatization in linguistics is the process of grouping the inflected forms of a word so they can
be analyzed as a single item, identified by the word's lemma or dictionary form.
Data splitting: Training, validation, and testing are among the most important phases in machine learning. In
order to train the models, the corpus was organized so that it became compatible with the computation process
of the algorithms. Training allows the models to learn from trial and error. Our dataset was split into three
segments of 70%, 10%, and 20%: 70% of the dataset was used for training the models, 10% was kept for
validation purposes, and the remaining 20% was reserved for testing. Validation helps the model to evaluate
itself and smooths the path for repeated training. After the models were trained, the test portion was used to
measure the models' performance. A sketch of this split appears below.
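One way to realize the 70/10/20 split with scikit-learn is sketched below; X and y stand for the vectorized articles and their labels, and the fixed random seed and stratification are our assumptions, not details stated above.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% for testing first; then take 0.125 of the remaining 80%
# (i.e. 10% of the full corpus) as the validation set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.125, random_state=42, stratify=y_rest)
```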

3.3. Word embedding model

The word embedding model used here is fastText [18], a word embedding model developed by the Facebook
research team. In traditional word embedding models, a vector is formed by assigning the value 1 to one target
element and 0 to all the others; the resulting vector has to be huge and is exhausting to analyze. fastText is
essentially an extension of the word2vec model that represents a word by a vector summing up all of its
character N-grams. The advantage of fastText is that it produces a better embedding scheme for rare words and
words outside the dictionary. In this work, a pre-trained fastText model with 300 dimensions has been utilized as
part of the word embedding technique.
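Since Gensim is used in this work (see Section 4.1), loading a pre-trained fastText model might look like the sketch below; the file name is a placeholder, and querying an out-of-vocabulary word still works because fastText composes vectors from character N-grams.

```python
from gensim.models.fasttext import load_facebook_vectors

# Load 300-dimensional pre-trained Bangla fastText vectors from Facebook's
# .bin format (the file name here is only a placeholder).
wv = load_facebook_vectors("bangla_fasttext_300d.bin")

vec = wv["বাংলা"]    # 300-dimensional vector; also works for OOV words
print(vec.shape)      # (300,)
```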

3.4. Development of the proposed models

In this work, we built two neural network-based hybrid deep learning models, LSTM-ANN and BiLSTM-ANN,
for content classification from the newspaper data. We also built single LSTM and BiLSTM models so that the
benefit of the hybrid integration could be analyzed on the classification problem. The fusion of the two types of
models is visualized in Fig. 3.

Fig. 3. The layers description of the LSTM-ANN and BiLSTM-ANN models (embedding layer, dropout, two BiLSTM or LSTM layers with dropout, flatten layer, and ANN layers with dropout, followed by the output)

The embedding layer is generally used in natural language processing applications, but it can also be utilized in
other neural network tasks. In this investigation, the pre-trained fastText word embedding model was
incorporated. To work with text-based information, we need to convert it into numbers before implementing any
machine learning algorithm; moreover, all values, including the inputs, must be numerical for neural networks.
The embedding layer makes it possible to convert each word into a fixed-length vector of a defined size. The
fixed length of the word vectors enables words to be represented efficiently and helps to reduce dimensionality
and computational complexity.
In the embedding layer, the input dimension is the size of the vocabulary, and the output dimension is the length
of the vector for each vocabulary entry. Here the input dimension of the embedding layer is 50000. The output
dimension is 300, which is the dimension of the pre-trained fastText model. The input length refers to the length
of the input sequences when it is constant; the maximum input length is 200, which is the maximum sequence
length of a text in our research. The L2 regularization penalty is computed as loss = l2 · Σᵢ xᵢ², and the embedding
L2 regularization factor used here is 0.0005.
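A hedged Keras sketch of this embedding layer with the stated hyperparameters follows; the random embedding_matrix is a stand-in, since in practice each row would hold the fastText vector of one vocabulary word.

```python
import numpy as np
from tensorflow.keras.layers import Embedding
from tensorflow.keras.regularizers import l2

VOCAB, EMB_DIM, MAX_LEN = 50000, 300, 200

# In practice, row i would hold the fastText vector of vocabulary word i;
# random values stand in here to keep the sketch self-contained.
embedding_matrix = np.random.normal(size=(VOCAB, EMB_DIM)).astype("float32")

embedding_layer = Embedding(
    input_dim=VOCAB,                     # vocabulary size
    output_dim=EMB_DIM,                  # fastText vector dimension
    input_length=MAX_LEN,                # maximum sequence length
    weights=[embedding_matrix],
    embeddings_regularizer=l2(0.0005),   # penalty: 0.0005 * sum(x^2)
)
```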
The dropout layer is used to avoid overfitting. Since the outputs of a layer under dropout are randomly
subsampled, it reduces the capacity of the network during training. The dropout layer randomly sets input units
to 0 at the given rate at each step during training; inputs that are not set to 0 are scaled up by 1/(1 − rate) so that
the sum over all inputs is unchanged. After every LSTM and dense layer, we used a dropout layer with a rate of
0.04. Moreover, we used a dropout layer after the embedding layer.
After the embedding layer, we passed the data to the BiLSTM or LSTM layers. We used two BiLSTM or LSTM
layers in each model: the first layer contains 128 memory units and the second 64. The activation function used
here is tanh, the dropout is 0.02, and the recurrent dropout is 0.04. Both the kernel regularizer and the bias
regularizer are L2 regularizers with a factor of 0.005.

Flatten is the operation of converting the pooled features into a single column that is passed to a fully connected
layer; in effect, it discards all dimensions of a tensor except one. A flatten operation reshapes a tensor so that its
shape equals the number of elements it contains, not counting the batch dimension. If the inputs are shaped
(batch,) without a feature axis, flattening adds an extra channel dimension, and the output shape is (batch, 1).
After both LSTM layers, we used the flatten layer to pass the data on to the ANN.
The dense layer is a fully connected neural network layer, meaning that every neuron in the dense layer receives
input from all neurons in the previous layer; it is probably the most widely used layer in models. Here we used
three dense layers to form the ANN. The first layer has 128 units and the second 64. For the first two dense
layers, we applied a regularizer function to the kernel weights matrix (kernel_regularizer) and used the 'ReLU'
activation function; a regularizer function was also applied to the bias vector (bias_regularizer), an L2
regularizer with a factor of 0.001. Since we are solving a multiclass classification problem, we used the softmax
activation function in the last dense layer, with one unit per class. A sketch of the full BiLSTM-ANN stack is
given below.
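Putting the pieces together, a hedged Keras sketch of the BiLSTM-ANN stack described above follows; the kernel regularizer factor for the dense layers and the optimizer are assumptions, since the text above does not state them.

```python
from tensorflow.keras import Sequential, layers, regularizers

NUM_CLASSES = 22          # categories in the dataset (Table 1)
VOCAB, EMB_DIM, MAX_LEN = 50000, 300, 200

def bilstm_block(units):
    """One BiLSTM layer with the regularization stated above."""
    return layers.Bidirectional(layers.LSTM(
        units, return_sequences=True, activation="tanh",
        dropout=0.02, recurrent_dropout=0.04,
        kernel_regularizer=regularizers.l2(0.005),
        bias_regularizer=regularizers.l2(0.005)))

model = Sequential([
    layers.Embedding(VOCAB, EMB_DIM, input_length=MAX_LEN,
                     embeddings_regularizer=regularizers.l2(0.0005)),
    layers.Dropout(0.04),
    bilstm_block(128),
    layers.Dropout(0.04),
    bilstm_block(64),
    layers.Dropout(0.04),
    layers.Flatten(),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001),  # factor assumed
                 bias_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.04),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001),  # factor assumed
                 bias_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.04),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",               # optimizer assumed
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The LSTM-ANN variant is obtained by dropping the Bidirectional wrapper around the two LSTM layers.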

4. Experiment and Results

This section investigates how well the introduced hybrid LSTM-ANN and BiLSTM-ANN models perform in
processing the Bangla language to classify the contents of news articles. A robust experimental blueprint was
organized to get insightful results from the proposed method. We arranged our experiment by developing the
LSTM-ANN and BiLSTM-ANN models and feeding our collected data to those models as described in the
methodology section. We split the dataset into training (70%), validation (10%), and testing (20%) portions and
fed the corresponding sections to train and validate the models; afterward, the test set was used to measure the
models' performance. In this section, we study how traditional LSTM and BiLSTM models perform compared
to the proposed hybrid models in classifying contents from Bangla text. To measure and analyze the
performance, we utilized several statistical evaluation metrics. The important toolkit used to develop the models
and establish the workflow is also described.

4.1. Experimental setup

In order to perform the experiment, we used a handful of mathematical and statistical tools. After the
mathematical models were designed, we implemented them in the Python programming language; the Anaconda
distribution with Python 3.8 was our coding environment for this research. To manipulate and handle the corpus,
we used the Pandas library, which has numerous features for augmenting and processing data according to the
demands of a project. Scikit-learn was used to make the data compatible with the algorithms. Keras assisted in
building the LSTM and BiLSTM models and played the most crucial role in hybridizing LSTM and BiLSTM
with artificial neural networks. Also, Gensim [19], an open-source library that incorporates cutting-edge
statistical machine learning techniques for natural language processing, was used for generating the fastText
word embeddings. To plot the results and display the outcomes of the analysis, we utilized Matplotlib, an
excellent plotting tool for graphs and charts.

4.2. Results and analysis

This section assesses the adequacy of our proposed strategy using the training sets and the tuning of significant
hyperparameters. The test dataset was used for assessing the models. We chose the hyperparameters by manual
search and repeated the process until we reached an acceptable result; during this procedure, the hyperparameters
were tweaked according to our observations and intuition based on previous research. For every model, the
evaluation metrics considered were Accuracy, F1 Score (F1S), Recall (RS), Precision (PS), and Hamming Loss
(HL). In addition, we also show the ROC AUC curve to distinguish the performance of the models. Table 2
below depicts the performance of the classification algorithms, representing the core findings of our study.
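A hedged sketch of computing these metrics with scikit-learn is given below; y_prob and y_test stand for the model's predicted probabilities and the one-hot test labels, and macro averaging is our assumption, since the averaging scheme is not stated.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, hamming_loss)

y_pred = np.argmax(y_prob, axis=1)   # predicted class per article
y_true = np.argmax(y_test, axis=1)   # ground-truth class (from one-hot labels)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("HL       :", hamming_loss(y_true, y_pred))
```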

Table 2. Summary of training, testing, and validation performances of the models

Name of the models Accuracy F1 Score Recall Precision HL


BiLSTM + ANN (Training) 0.96064 0.94051 0.944634 0.954615 0.02936
BiLSTM + ANN (Validation) 0.931891 0.900297 0.893288 0.914304 0.058109
BiLSTM + ANN (Testing) 0.930101 0.901201 0.890081 0.909435 0.059112
LSTM + ANN (Training) 0.946924 0.918872 0.918764 0.929223 0.043076
LSTM + ANN (Validation) 0.915701 0.867182 0.876227 0.878175 0.074299
LSTM + ANN (Testing) 0.909702 0.859781 0.875202 0.871177 0.079289
BiLSTM (Training) 0.935001 0.900021 0.905775 0.906703 0.054999
BiLSTM (Validation) 0.856177 0.771076 0.764232 0.812058 0.133823
BiLSTM (Testing) 0.850101 0.781064 0.769231 0.814071 0.143213
LSTM (Training) 0.901019 0.835804 0.849088 0.842679 0.095981
LSTM (Validation) 0.800051 0.749745 0.695793 0.902505 0.157849
LSTM (Testing) 0.801157 0.787425 0.699353 0.922551 0.154091

The proposed strategy incorporates two hybrid models, LSTM-ANN and BiLSTM-ANN. The table above
presents the comparison between our proposed approach and conventional deep learning content classification
techniques on the Bangla news dataset. The deep integration upgraded the performance of content classification.
The BiLSTM-ANN model accomplishes accuracies of 96.06% (training), 93.18% (validation), and 93.01%
(testing), the best performance among all the models developed here. The LSTM-ANN hybrid achieves
accuracies of 94.69% (training) and 91.57% (validation). The single LSTM achieves the lowest validation
accuracy, at 80%. Fig. 4 below demonstrates the accuracy of the developed models over the existing models in
the classification of contents from Bangla text data.

Fig. 4. Accuracy comparison of all the models


It is quite clear that the hybrid integrated models show better performance on the training and validation data
than the traditional BiLSTM and LSTM models. If the validation loss is not much higher than the training loss in
an experiment, this is an indication that the model is probably not overfitting. To reduce overfitting, the neural
network size can be diminished or the dropout rate increased; if the validation loss and training loss are equal,
the model could be underfitting. The most successful procedure for obtaining effective models is to try diverse
dropout values between 0 and 1. This study experimented with different dropout values and successfully
maintained a fair balance between training loss and validation loss. The ROC curve is another significant
indicator of classifier performance; Fig. 5 below shows the comparison of the ROC measurements for our
proposed methods and similar existing methods.

Fig. 5. ROC AUC curve comparison of all the models
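For a multiclass problem like ours, one common way to obtain the ROC AUC values used in such a comparison is the one-vs-rest scheme sketched below; the macro averaging is our assumption, and y_true and y_prob are as in the metrics sketch above.

```python
from sklearn.metrics import roc_auc_score

# One-vs-rest ROC AUC over the 22 classes, averaged across classes.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print("Macro ROC AUC:", auc)
```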

5. Conclusion and Future work

This paper shows that the two deeply integrated models, BiLSTM-ANN and LSTM-ANN, can perform text
content classification better than single development models like BiLSTM, LSTM, and ANN. In order to
validate the models experimentally, we used text articles from Bangla newspapers and blogs. To determine the
performance of the proposed models, we considered the six most relevant measurements from the performance
metrics. Overall, the BiLSTM-ANN showed the best results in the training, validation, and testing steps among
all the models.
The BiLSTM-ANN model can be made more usable and convenient for developers and ML enthusiasts by
building an application programming interface, on which our team is already working. We aim to improve these
models further by blending them with convolutional neural networks (CNN) and other machine learning
algorithms. Following the rationale of this research, we furthermore expect to exploit the same technique in
various machine learning applications. We can investigate how this hybridization method works on small
amounts of data and combine appropriate machine learning strategies for more efficient results. Moreover, we
are considering combining three or more algorithms to leverage the advantages of each individual approach.
Besides, to make these models robust, we plan to apply them to other relevant datasets. To conclude, this model
can be explored further through multifarious experimentation and can be made more accurate and faster.

References

[1] Sebők M, Kacsuk Z. The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball
Approach. Polit Anal 2021;29:236–49. https://doi.org/10.1017/pan.2020.27.
[2] Hidayatullah AF, Hakim AM, Sembada AA. Adult Content Classification on Indonesian Tweets using LSTM Neural Network. 2019
Int. Conf. Adv. Comput. Sci. Inf. Syst., IEEE; 2019, p. 235–40.
[3] Wang Y, Huang M, Zhu X, Zhao L. Attention-based LSTM for aspect-level sentiment classification. Proc. 2016 Conf. Empir. methods
Nat. Lang. Process., 2016, p. 606–15.
[4] Nowak J, Taspinar A, Scherer R. LSTM recurrent neural networks for short text and sentiment classification. Int. Conf. Artif. Intell.
Soft Comput., Springer; 2017, p. 553–62.
[5] Kumar A, Singh JP, Dwivedi YK, Rana NP. A deep multi-modal neural network for informative Twitter content classification during
emergencies. Ann Oper Res 2020:1–32.
[6] Yildirim O, Baloglu UB, Tan R-S, Ciaccio EJ, Acharya UR. A new approach for arrhythmia classification using deep coded features
and LSTM networks. Comput Methods Programs Biomed 2019;176:121–33.
[7] Chen T, Xu R, He Y, Wang X. Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert
Syst Appl 2017;72:221–30.
[8] Sun Q, Amin M, Yan B, Martell C, Markman V, Bhasin A, et al. Transfer learning for bilingual content classification. Proc. 21th ACM
SIGKDD Int. Conf. Knowl. Discov. data Min., 2015, p. 2147–56.
[9] Dumais S, Chen H. Hierarchical classification of web content. Proc. 23rd Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr., 2000, p.
256–63.
[10] Lytvyn V, Burov Y, Kravets P, Vysotska V, Demchuk A, Berko A, et al. Methods and Models of Intellectual Processing of Texts for
Building Ontologies of Software for Medical Terms Identification in Content Classification. IDDM, 2019, p. 354–68.
[11] Karim F, Majumdar S, Darabi H, Chen S. LSTM fully convolutional networks for time series classification. IEEE Access 2017;6:1662–
9.
[12] Kowsher M, Islam Sanjid MZ, Das A, Ahmed M, Hossain Sarker MM. Machine Learning and Deep Learning based Information
Extraction from Bangla Names. Procedia Comput Sci 2020;178:224–33. https://doi.org/10.1016/j.procs.2020.11.024.
[13] Kowsher, Md., Sobuj, Md. Shohanur Islam, Shahriar, Md. Fahim, Prottasha, Nusrat Jahan. Development of FastText Pre-Trained
Model for Bangla NLP Research. Research on Computational Language, 2021.
[14] Nawi NM, Atomi WH, Rehman MZ. The Effect of Data Pre-processing on Optimized Training of Artificial Neural Networks. Procedia
Technol 2013;11:32–9. https://doi.org/10.1016/J.PROTCY.2013.12.159.
[15] Razavi AR, Gill H, Åhlfeldt H, Shahsavar N. A Data Pre-processing Method to Increase Efficiency and Accuracy in Data Mining. Lect
Notes Comput Sci (Including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 2005;3581 LNAI:434–43.
https://doi.org/10.1007/11527770_59.
[16] Toman M, Tesar R, Jezek K. Influence of Word Normalization on Text Classification n.d.
[17] Kowsher M, Tahabilder A, Hossain Sarker MM, Islam Sanjid MZ, Prottasha NJ. Lemmatization Algorithm Development for Bangla
Natural Language Processing. 2020 Jt. 9th Int. Conf. Informatics, Electron. Vis. 2020 4th Int. Conf. Imaging, Vis. Pattern Recognition,
ICIEV icIVPR 2020, 2020. https://doi.org/10.1109/ICIEVicIVPR48672.2020.9306652.
[18] Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching Word Vectors with Subword Information. Trans Assoc Comput Linguist
2017;5:135–46. https://doi.org/10.1162/tacl_a_00051.
[19] Rehurek R, Rehurek R, Sojka P. Software Framework for Topic Modelling with Large Corpora. Proc Lr 2010 Work NEW
CHALLENGES NLP Fram 2010:45--50.
