Professional Documents
Culture Documents
Research Article
News Text Classification Method Based on the GRU_CNN Model
Lujuan Deng,1 Qingxia Ge,1 Jiaxue Zhang,1 Zuhe Li ,1,2 Zeqi Yu,1 Tiantian Yin,1
and Hanxue Zhu1
1
School of Computer and Communication Engineering, Zhengzhou University of Light Industry,
Zhengzhou 450002, Henan, China
2
Henan Key Laboratory of Food Safety Data Intelligence, Zhengzhou University of Light Industry,
Zhengzhou 450002, Henan, China
Received 25 June 2022; Revised 25 July 2022; Accepted 16 August 2022; Published 31 August 2022
Copyright © 2022 Lujuan Deng et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The convolutional neural network can extract local features of text but cannot capture structure information or semantic re-
lationships between words, and a single CNN model’s classification performance is low, whereas GRU can effectively extract
semantic information and global structure relationships of text. To address this problem, this paper proposes a news text
classification method based on the GRU_CNN model, which combines the advantages of CNN and GRU. The model first trains
word vectors as the embedding layer with the Word2vec model and then extracts semantic information from text sentences with
the GRU model. Following that, this model employs the CNN model to extract crucial semantic information features and finally
completes the classification through the Softmax layer. The experimental results reveal that the GRU_CNN hybrid model
outperforms single CNN, LSTM, and GRU models in terms of classification effect and accuracy.
categorization in recent years. Deep neural network models requires a lot of calculations during training due to its
have superior feature extraction ability to typical machine numerous parameters and the complexity of the calculations
learning algorithms, making them ideally suited for appli- between each gate.
cations in the area of text categorization [10]. Chung et al. proposed the GRU model [17] on this basis,
The primary goal of this study is to address the problem which simplified the structure and training parameters of
of Chinese news text classification using deep learning LSTM and improved the efficiency of training. However,
models. As GRU can efficiently extract the semantic in- GRU is unable to compute in parallel and is still unable to
formation and global structural information of text, and as resolve the gradient vanishing problem entirely.
CNN cannot effectively extract the structural information of Graves and Schmidhuber [18] applied the bidirectional
text and the semantic relationship of sentences, the classi- LSTM model for the first time to solve the categorization
fication accuracy of a single CNN model is low. Therefore, problem and achieved better classification results than the
this paper combines the benefits of the CNN model and the unidirectional LSTM model. However, compared with
GRU model to propose the GRU_CNN model for the LSTM, BiLSTM has more parameters and is more difficult to
classification of news text. calculate. Cao et al. [19] utilized the BiGRU model to cat-
egorize Chinese text by synthesizing the context of the ar-
2. Related Work ticle. The model is simple, with a few parameters and fast
convergence.
Among deep learning classification models, there are two Li and Dong [20] employed CNN to extract text’s local
main types of common categories, namely, convolutional features and the BiLSTM to extract text’s global features in
neural network and recurrent neural network, which have order to fully capitalize on the benefits of the features ob-
many applications in text categorization. tained by the two models and enhance the model’s ability to
Kim [11] first proposed a text categorization model classify data.
based on CNN in 2014, which converts words into fixed- In conclusion, a single neural network model often has
length word vectors through Word2vec as the input of CNN the problem of low classification accuracy. This research
and then uses multisize convolution to check word vector provides a news text categorization strategy based on the
convolution. Finally, pooling and classification are per- GRU_CNN model in order to get a better classification
formed. The key advantage of CNN is that it has a quick performance. This model extracts more accurate text fea-
training speed and can efficiently extract local text features, tures, and its effectiveness has been demonstrated through
but the pooling layer will lose a lot of vital information and experiments.
overlook the association between the local and the whole.
After Kim proposed the model, Zhang and Wallace [12] 3. GRU_CNN Model
also proposed a text categorization method based on CNN
and performed numerous comparative experiments under The whole process of the GRU_CNN hybrid model is as
various hyperparameter settings. Besides, they also offered follows: Firstly, input the news text data, preprocess the news
parameter tuning advice and have some experience with text, and train the text using the Word2vec model to produce
hyperparameter configuration. a word vector comprising the text’s overall information.
Johnson and Tong proposed the deep pyramid con- Secondly, to extract contextual semantic information from
volutional neural network (DPCNN) [13], which has low the text, the vector representation of words obtained by
complexity and excellent categorization power. This model Word2vec is fed into the GRU model for semantic infor-
primarily investigates word-level CNN and improves ac- mation extraction. Then, the output results of the GRU
curacy by deepening the depth of the network. DPCNN can model are input into the CNN model for further semantic
extract long-distance text dependencies by constantly feature extraction. Finally, the final word vector is input to
deepening the network, but the computational complexity the Softmax layer for text classification. Figure 1 depicts the
will increase as well, which makes practical applications GRU_CNN model’s structure.
problematic. Yao et al. proposed a graph-based convolu-
tional neural network (GCN) [14], which is more effective
for small datasets categorization. GCN’s limitations of 3.1. Word2vec Embedding Layer. This paper uses the classic
flexibility and scalability are its main drawbacks. Word2vec [21] model for word embedding, and the
Mikolov et al. [15] employed the recurrent neural net- Word2vec of the Gensim toolkit is used to train word
work model to achieve text categorization. RNN can ac- vectors, which is simple and fast. The Word2vec model is
commodate inputs of arbitrary length, and the model size divided into three layers: input, hidden, and output. The
does not grow with the input length. However, the RNN model’s input is a one-hot vector; that is, the textual in-
model relies on learning features for a long time, which is formation is represented by x1 , x2 , . . . , xV . Assuming the
prone to the problem of gradient dispersion. size of the vocabulary is V and the dimension of each hidden
Hochreiter and Schmidhuber proposed a long short- layer is N, the input to the hidden layer is represented by a
term memory network (LSTM) [16] based on RNN, which matrix W of size V × N. Similarly, word vectors can be
makes up for the classic RNN’s poor learning performance obtained by connecting the hidden and output layers via a
for long-distance sentences as well as the problem of gra- N × V matrix W′ . Figure 2 depicts the Word2vec model’s
dient disappearance or gradient explosion. However, LSTM structure.
International Transactions on Electrical Energy Systems 3
GRU
x1
softmax
GRU
x2
xn-1 GRU
xn
GRU
W(t-2) W(t-2)
SUM
W(t-1) W(t-1)
W(t) W(t)
W(t-1) W(t-1)
W(t+2) W(t+2)
FeatureMap
Max-pooling
Text category
Convolution fully
kernels connected
layer
(1) LR: logistic regression is a machine learning cate- 4.4. Evaluation Index. The experiment employs the accuracy
gorization method for estimating the likelihood of of text categorization as its evaluation index in order to
something. examine the effectiveness of the model put forth in this
study. The accuracy rate indicates the proportion of correctly
(2) NB: Naive Bayes is a machine learning categorization
predicted samples to the total samples, and the calculation
algorithm that leverages Bayes’ theorem.
formula is shown in
(3) CNN: it is the most basic convolutional neural
network for text categorization. 1 K
accuracy � i|.
|yi � y (5)
(4) LSTM: it is the most basic long and short memory K i�1
network for text categorization.
In formula (5), K represents the sample capacity of the
(5) GRU: it is the most basic gated recurrent unit net- sample set, yi represents the label of the sample xi , and yi
work for text categorization. is used as the prediction result. When yi and y i are
(6) LSTM + CNN: firstly, the LSTM model is utilized to consistent, the value of the precision rate is 1; otherwise,
extract the contextual semantic information from the the value is 0.
input text, and then the CNN model is used to extract
the feature of the semantic information. Finally,
complete classification through the Softmax layer. 4.5. Analysis of Experimental Results. The error and accu-
racy of GRU_CNN model obtained by training on the
4.3. Parameters. The experiment presented in this paper is training set are shown in Figure 6. During the experiment,
based on the TensorFlow framework, and the model’s pa- the accuracy and loss values are output on the training set
rameter settings determine how well the model trains. Ta- and validation set every 100 batches and recorded and
ble 2 displays the GRU_CNN model’s specific parameter saved. Continuously train the model, saving the best-
settings. performing model for testing on the test set. We can see
6 International Transactions on Electrical Energy Systems
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
step_100
step_1000
step_1300
step_1500
step_1700
step_1900
step_2100
step_2300
step_2500
step_2700
step_2900
step_3100
step_3300
step_3500
step_3700
step_3900
step_4100
step_4300
step_4500
step_300
step_500
step_700
step_900
accuracy
loss
Figure 6: Error and accuracy results of model training.
that the model’s accuracy on the training set has achieved in this experiment is Adam. The Adam optimization algo-
good results. rithm has high computational efficiency and fast conver-
In the experiment, the number of epochs can affect the gence speed. In order to make the algorithm more efficient,
model’s performance. This study discusses the model clas- different learning rate values are selected for experiments.
sification effect under different epoch numbers through Table 5 and Figure 9 illustrate the experimental outcomes. As
experiments. Table 3 and Figure 7 illustrate the experimental shown by the experimental results, the model’s accuracy is
outcomes. The experimental results show that the number of highest when Adam’s corresponding learning rate is 0.001.
epochs required by different models to achieve the optimal Therefore, the learning rate of the Adam optimizer in this
effect varies. The GRU_CNN model reaches the highest case is 0.001.
accuracy rate when the epoch number is 6. The numbers of Word vector dimension is also an important parameter.
epochs at which CNN, LSTM, GRU, and LSTM + CNN The larger the dimension, the more feature information the
achieve the best results are 3, 2, 4, and 5, respectively. model can learn, and the greater the risk of the model
Figure 6 shows that the model’s performance first rises and overfitting. However, the lower the word vector dimension,
then falls with the increase of the number of epochs, which the greater the risk of the model underfitting. Therefore, the
means that, before reaching the best effect of the model, the appropriate word vector dimension is also important for the
ability of model learning characteristics will continue to model’s training effect. This paper selects different dimen-
increase as the number of epochs increases. After the best sions of the word vectors to train. Table 6 and Figure 10
results, the model will show an overfitting phenomenon so illustrate the experimental outcomes. As can be observed
that the model’s performance will decrease. from the experimental findings that all models’ accuracy
During the training process of the model, the dropout reaches the highest in dimension 100, this paper chooses 100
method is introduced into the model in this experiment. The dimensions to experiment.
value of dropout is also an important parameter, and an Figure 11 illustrates the experimental results of all
appropriate value can enable the model to converge better, models on the Cnews dataset. For proving the performance
avoid overfitting, and improve its performance. Therefore, of the GRU_CNN model, several classical models are se-
we choose different dropout values for training. The dropout lected for comparative experiments. In the classification
values set in this experiment are [0.2, 0.3, 0.4, 0.5, 0.6, 0.7], model based on machine learning, logistic regression (LR)
respectively. The optimal dropout values are selected based and Naive Bayes (NB) are selected for comparative exper-
on the training results of the model. Table 4 and Figure 8 iments. In the classification model based on deep learning,
illustrate the experimental outcomes. The findings show that this paper selects a single CNN, LSTM, and GRU model, as
only the LSTM model has the best effect when the dropout well as a hybrid model LSTM_CNN for comparative ex-
value is 0.7, and the other models have the best effect when periments. The experimental results are selected to compare
the dropout value is 0.5. At this time, the dropout value can the best effect of each model.
effectively prevent the model from overfitting on the premise As can be noticed from the outcomes of the experi-
of ensuring accuracy. Therefore, the dropout value of the ments, the GRU_CNN model has the best classification
model in this paper is set to 0.5. effect and the highest accuracy on the Cnews dataset.
During the course of updating parameters by gradient Among the traditional machine learning classification
backpropagation in the neural network, the optimizer used algorithm models, the classification accuracy of logistic
International Transactions on Electrical Energy Systems 7
99
98
97
accuracy (%)
96
95
94
93
92
1 2 3 4 5 6 9 10
epoch number
CNN LSTM+CNN
LSTM GRU+CNN
GRU
Figure 7: The accuracy curve of each model under different epoch numbers.
regression (LR) on the Cnews dataset is superior to that of account, the GRU_CNN model this study proposes per-
the Naive Bayes model (NB), and the classification ac- forms better.
curacy of all deep learning models is higher than that of
traditional machine learning classification models, which
is also the reason why deep learning models are popular 4.6. Ablation Experiment. In order to verify the effectiveness
nowadays. In the deep learning model, the hybrid neural of using Word2vec to train word vectors as the embedding
network model has a greater categorization accuracy than layer, this paper first selects CNN, LSTM, and GRU as the
the single deep learning model, and the CNN model has a comparative experimental model and compares the exper-
better categorization effect than the LSTM and GRU imental results of Word2vec_CNN, Word2vec_LSTM, and
models, as shown by the results. Although the classifica- Word2vec_GRU. The experimental results are shown in
tion accuracy of the GRU_CNN model and the Table 7.
LSTM_CNN model is not significantly different, the GRU It can be seen that using Word2vec to train the word vector
model has fewer parameters and is easier to calculate than as the embedding layer can effectively optimize the vector
the LSTM model. As a result, when everything is taken into representation of the input text, so as to obtain a better training
8 International Transactions on Electrical Energy Systems
99
98
97
accuracy (%) 96
95
94
93
92
0.2 0.3 0.4 0.5 0.6 0.7
dropout number
CNN LSTM+CNN
LSTM GRU+CNN
GRU
Figure 8: The accuracy curves of each model under different dropout values.
100
90
80
70
accuracy (%)
60
50
40
30
20
10
0.1 0.01 0.001 0.0001 0.00001
learning_rate numbers
CNN LSTM+CNN
LSTM GRU+CNN
GRU
Figure 9: The accuracy curves of each model under different learning rates.
effect. Compared with CNN, the accuracy of Word2vec_CNN Next, in order to verify the effectiveness of the
has increased by 1.23%. Compared with LSTM, the accuracy of Word2vec_GRU_CNN model proposed in this paper in the
Word2vec_LSTM has increased by 3.32%. Compared with text classification task, the Word2vec_CNN and Word2-
GRU, the accuracy of Word2vec_GRU has increased by 2.33%. vec_GRU models that performed relatively well in the above
International Transactions on Electrical Energy Systems 9
99
98
98
98
accuracy (%)
97
96
95
94
93
92
50 100 150 200 250 300
embedding_dim numbers
CNN LSTM+CNN
LSTM GRU+CNN
GRU
Figure 10: The accuracy curves of each model under different word vector dimensions.
99
97.78 97.86
98
96.92 96.78
97
95.92
96
95.18
accuracy (%)
95
94
93
92
92
91
90
LR NB CNN LSTM GRU LSTM+CNN GRU+CNN
model
Figure 11: Classification accuracy of each model.
experiments were selected for comparison. Experiment with From the experimental results in Table 8, it can be seen
the same parameter settings and the experimental results are that, compared with Word2vec word embedding followed
shown in Table 8. by only one layer of training network, the GRU_CNN model
10 International Transactions on Electrical Energy Systems