Journal of Intelligent & Fuzzy Systems xx (20xx) x–xx
DOI: 10.3233/JIFS-191690
IOS Press

Topic modeling in short-text using non-negative matrix factorization based on deep reinforcement learning

Zeinab Shahbazi*, Faisal Jamil and Yungcheol Byun
Department of Computer Engineering, Jeju National University, Jejusi, Jeju Special Self-Governing Province, Korea

Abstract. Topic modeling for short texts is a challenging and interesting problem in the machine learning and knowledge discovery domains. Nowadays, millions of documents are published on the internet from various sources. Internet websites are full of diverse topics and information, but there is a lot of similarity between topics, contents, and the overall quality of sources, which causes data repetition and gives the user the same information. Another issue is data sparsity and ambiguity: because the length of a short text is limited, results are often unsatisfactory and irrelevant to end-users. All of these issues make short texts an interesting subject for researchers applying machine learning and knowledge discovery techniques to discover the underlying topics in a massive amount of data. In this paper, we propose a combination of deep reinforcement learning (RL) and a semantics-assisted non-negative matrix factorization (SeaNMF) model to extract meaningful underlying topics from short document contents. The main objective of this work is to reduce the problems of repetitive information and data sparsity in short texts and thereby help users obtain meaningful and relevant content. Furthermore, our proposed model reviews an issue of the Seq2Seq approach from the reinforcement learning perspective and combines reinforcement learning with the SeaNMF formulation using the block coordinate descent algorithm. Moreover, we evaluate the model on different real-world datasets and compare it against several state-of-the-art models for short-text topic modeling. Based on the experimental results and comparative analysis, our proposed model outperforms the state-of-the-art techniques in short-document topic modeling.

Keywords: Topic modeling, knowledge discovery, short text, non-negative matrix factorization, machine learning

1. Introduction

In the past two decades, topic modeling for short text documents has become an active area in machine learning and natural language processing. Every day a large quantity of short text documents is distributed on social media and other sources, e.g., emails, search queries, image tags, advertisements, headlines, status messages, etc. These sources provide valuable data for extracting the user's topic interests and driving better decisions. Hence, the automatic identification and extraction of valuable topics from short text data is a fundamental problem in a wide range of applications, such as context analysis [1] and text mining and classification [2]. Nowadays, knowledge discovery has become a fundamental and challenging task that attracts a lot of attention from researchers all over the world [1–5]. The length of a short text is limited, which causes data sparsity and ambiguity. Sparsity and noise in short text documents are major problems in knowledge discovery from a large corpus and make the process of analyzing the underlying knowledge difficult.

*Corresponding author. Zeinab Shahbazi, Department of Computer Engineering, Jeju National University, Jejusi 63243, Jeju Special Self-Governing Province, Korea. E-mail: zeinab.sh@jejunu.ac.kr.

There are different topic identification techniques available to uncover and unearth content from short texts.

Topic identification techniques are used to discover semantically meaningful and valuable topics from a massive amount of data [6–8]. These techniques are capable of identifying underlying hidden patterns (topics) that explain and identify the similarity between documents. Another aspect of topic modeling is the purification of document clustering by an unsupervised machine learning strategy. This process is the opposite of document clustering: in the topic modeling procedure, many topics can occur in an individual document, but topics that are frequent across documents carry more weight in the training set. Topic modeling is frequently divided into two groups, i.e., the first group is known as non-negative matrix factorization (NMF) [9], and the second group is known as latent Dirichlet allocation (LDA) [6]. The NMF topic model is a matrix-based model that directly reads the topics by breaking down the term-document matrix, which represents the bag-of-words of a text corpus, into low-rank matrix factors. A low-rank matrix is a machine learning technique and a way to understand the recommendation system process, e.g., a given matrix Z where Z_{A,B} denotes the score that teacher A gives to student B. LDA is a statistical model for identifying and extracting hidden topics that occur in a set of documents.

To show the prominent performance of NMF-based models, clustering and dimension reduction techniques are used on high-dimensional datasets [10–12]. Regular documents primarily consist of a consistent document structure and compact data contents. This document class exhibits the XML view of relational data. The data range of those documents is within 40 and 60%. Irregular documents mainly consist of materials that have intense, complicated, and improper structures. This document class primarily focuses on evaluating the capability of compressing improperly structured data of XML documents. In the case of the regular type of documents, conventional topic models achieve great results, but this process does not work well for short text document collections. The reason for the high dimensionality of a short document set is that a short text does not carry enough information, so extracting meaningful keywords is very difficult [1, 3]. Various methods have been presented in the recent past to investigate topic modeling of short text messages and to overcome this challenge, and the current methods combine short text documents into a pseudo-type of the document and detect word co-occurrences [3, 4, 13, 14]. But the topics generated by using this strategy are biased by the pseudo-texts, which makes the results exploratory.

In particular, the probability of combining mostly incoherent short texts into pseudo-texts is high. Another type of challenge in short texts is the lack of words, or the limited length of the text. The internal semantic relationship between words has been proposed to extract information from short text documents; based on this strategy, the problem of the lack of words in the text is largely settled. The main reason to propose this strategy is the fact that the semantic information of words in a document can be efficiently obtained with a neural network system, which relies on word embedding techniques, e.g., word2vec [15] and GloVe [16].

Different word embedding techniques have been presented [17–19] to discover underlying hidden topics from short texts based on the usage of words from various sources, e.g., word embedding techniques based on Google News and Wikipedia. The semantic differences between words in Wikipedia and in short text documents cause noise in the topics. Nevertheless, word embedding can be an effective process in modeling short text topics because it captures a huge number of words with similar semantic properties, which is very efficient in achieving better performance for clustering topic models. Another way of raising the performance of topic models is skip-gram with negative sampling (SGNS). This methodology is useful to obtain the relevance between words using a small sliding window [15, 20]. SGNS is very useful because each window can represent an individual document. In that case, word-context semantic relationships are captured by SGNS in an efficient way. The relationship between words can also be seen in the form of word co-occurrence.

This system potentially dominates the difficulty of data sparsity. Existing studies [21, 22] show that the SGNS algorithm is equivalent to factorizing a word-context relationship matrix. From this observation, we raise some questions about this technique: 1) Is it possible to transform the matrix factorization problem into a non-negative matrix factorization problem? 2) Is there any way to combine these outputs with the conventional NMF of the term-document matrix? 3) Is this model suitable to detect underlying topics from short texts?
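As a concrete illustration of the word-context matrix view mentioned above (our own sketch, not part of the proposed system), the snippet below builds a positive pointwise mutual information (PPMI) word-context matrix from a toy corpus, treating each short text as one skip-gram window; this is the kind of non-negative matrix that SGNS is known to implicitly factorize and that an NMF-style solver can decompose. The helper name and the toy corpus are assumptions.

```python
# A minimal sketch (not the authors' implementation): build a PPMI word-context
# matrix in which each short text acts as one skip-gram window.
from collections import Counter
from itertools import permutations
import numpy as np

corpus = ["harvard university of engineering",
          "oxford university of science"]          # toy short texts

docs = [doc.split() for doc in corpus]
vocab = sorted({w for doc in docs for w in doc})
idx = {w: i for i, w in enumerate(vocab)}

pair_counts = Counter()
for doc in docs:
    # every ordered word pair inside one short text co-occurs once
    for w, c in permutations(doc, 2):
        pair_counts[(idx[w], idx[c])] += 1

M = len(vocab)
C = np.zeros((M, M))
for (i, j), n in pair_counts.items():
    C[i, j] = n

total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total        # word marginals
p_c = C.sum(axis=0, keepdims=True) / total        # context marginals
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (p_w @ p_c))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

print(np.round(ppmi, 2))   # non-negative word-context matrix ready for factorization
```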

In this paper, we propose a novel semantics-assisted model based on RL and SeaNMF for short-text topic modeling, which is used to extract topics from a large corpus. This work uses and analyzes four real-world categories of short text documents, namely article headlines, question answers, microblogs, and news. An NMF-based topic modeling technique is used to discover underlying topics from a large volume of data. We analyzed and investigated the semantic relationship between keywords and document information using the skip-gram model, which shows its efficiency in disclosing the semantic relation of words. The reinforcement learning system is applied to obtain the reward function from the algorithm on the training set for input-specific RL and to compare it on the test set with specific RL. Experimental results show the significance of the proposed model compared with existing methods, e.g., in document classification and topic coherence accuracy. Finally, we performed a comparative analysis of the proposed model with state-of-the-art techniques. By comparing the keywords of different topics and analyzing their network, the proposed system shows that the topics extracted from the benchmark datasets are meaningful, and the keywords extracted using this model are more semantically associated. Therefore, the proposed RL+SeaNMF is an effective topic model for short texts.

The remainder of this paper is organized into five sections as follows: Section 2 explains the related work on topic modeling in short texts based on the majority of state-of-the-art topic models. Section 3 presents the detailed system methodology, design architecture, and overall transaction process of the proposed system. Section 4 elaborates on the implementation and results of the proposed system, and Section 5 concludes the paper.

2. Literature review

In this section, we review some of the existing studies on topic modeling. This section is categorized into three parts, i.e., topic modeling for short text, topic modeling for short text using non-negative matrix factorization, and sequence to sequence (Seq2Seq) for topic modeling. At the end of this section, we discuss the novelty of our proposed model.

2.1. Topic modeling for short text

Traditional topic modeling algorithms, e.g., LDA and PLSA, are used to find hidden and meaningful topics in a set of documents using a pattern like word co-occurrence. These approaches have been used in various tasks and work well for normal text. Nevertheless, traditional topic model approaches have many shortcomings in terms of data sparsity, which occurs when the size of the document is too short. Many studies focus on improving the quality of topic discovery in short text to overcome the problem of data sparsity. For instance, in [23] the authors proposed a system that finds hidden topics with the help of a large external data source, e.g., Wikipedia. The aim of the presented framework is to find hidden patterns in sparse and short texts and web segments. Similarly, in [24], Jin et al. presented an approach to cluster short texts in order to find the topically related text from long auxiliary text data. The proposed Dual Latent Dirichlet Allocation (DLDA) approach combined the topic parameters from the long text and short text and removed the data inconsistency between the datasets. These studies use a large set of data in order to improve the quality of topic representation. Numerous ingenious approaches extract latent topics using a large set of data by aggregating short text documents. For example, the study [25] uses an LDA model on short texts produced by the same user. The main aim of this study is to find influential users on social networking and blogging sites, e.g., Twitter. Likewise, Hong and Davison [3] develop two separate aggregated short text approaches that contain each word in the vocabulary of a large set of data, including text authors. In [26], the authors design a scheme to produce pseudo-documents for LDA data preprocessing using tweet pooling. Another study [27] presented a word co-occurrence based approach named WNTM to improve the sparsity and imbalance issues by generating pseudo-documents in a word network. The developed scheme greatly improved the semantic compactness of the data without introducing extra space and time complexity. Zuo et al. [14] also developed a scheme to generate pseudo-documents using the topic model. The presented approach uses the short texts by leveraging fewer pseudo-documents to infer the latent pseudo-documents. Kou et al. [28] presented a multi-feature probabilistic graphical approach to combine short text into long text in order to improve the search task on social networking sites. The developed MFPGM is also used to overcome the problem of semantic sparsity. Also, Jin et al. [24] propose an approach that generates pseudo-documents from short texts that contain URLs. Afterward, the short text is combined with auxiliary information, e.g., named entities, hashtags, and
locations [1, 3, 25, 29]. Kou et al. [30] analyzed temporal and spatial characteristics, which led to the development of an approach named social network short text semantic modeling (STTM). STTM is used to resolve the problem of semantic sparsity for short text in social networks and to exploit the temporal and spatial characteristics.

Nowadays, many researchers and scientists use word co-occurrence information in order to improve topic modeling techniques for short text. For instance, Yan et al. [5] design an approach that generates word pair co-occurrences in the document. The developed technique, the biterm topic model (BTM), collectively uses the information of a large set of biterms, which is further used for topic distribution. Moreover, the biterms are also used to resolve the problem of document sparsity, but the order of words is not considered in the algorithm processing. Lin et al. [31] proposed a dual-sparse topic scheme to investigate the problem of data sparsity in words and topics within the document. The designed approach decouples the sparsity and improves the document in terms of topic-word and document-topic distributions. Similarly, Quan et al. [13] presented a self-aggregated approach for short text. The designed self-aggregation based topic model (SATM) generates pseudo-documents from the sampled short text snippets. The topic generation and the aggregation process are performed jointly so that the aggregation is built based on the topical affinity of texts. Nevertheless, the size of the data also increases the number of parameters for configuring a suitably extensive pseudo-document. Meanwhile, the inference procedure containing both the topic sampling and the text aggregation consumes a large amount of time during processing. Bicalho et al. [32] highlighted a general approach for topic modeling by generating a corpus of pseudo-documents from the actual documents. The proposed scheme used vector representations and word co-occurrence to create pseudo-documents. Afterward, different topic modeling techniques are applied on the pseudo-documents for further analysis.

2.2. Non-negative matrix factorization for topic modeling

During the past few years, researchers and data scientists have paid a lot of attention to the field of topic modeling. Different techniques like non-negative matrix factorization (NMF) are applied to topic modeling in order to improve the knowledge extraction process and to handle data sparsity and repetition [10, 33, 34]. However, only a few studies focus on topic modeling for short text using the NMF approach. For instance, Yan et al. [35] present an asymmetric term correlation matrix based on NMF to extract topics from short text documents. The proposed system uses a quartic non-convex loss function in its processing; however, the designed system fails to work in terms of stability and reliability. To dominate this problem, they present the Sym-NMF model in [15, 17], but this model does not supply a good insight into topic modeling, and furthermore, this model cannot be used to obtain document representations immediately. Similarly, in NMF the latent semantic space obtains the topics in a specific document cluster, and each text document is represented as a mixture of topics [9]. Automatic text segmentation plays an important role in this area. It helps to produce instructive information based on document summaries that capture the significant ideas of the original text documents. The main issue in automatic text summarization is how to calculate and extract the right information from the document. Tian et al. [36] proposed a SeaNMF approach integrated with local word-context correlation, which is used to find topics for short texts. The designed approach uses the block coordinate descent technique to compute the proposed SeaNMF model. Moreover, the developed system is also equipped with a sparse SeaNMF in order to achieve high interpretability.

2.3. Sequence to sequence for topic modeling

Seq2Seq is an approach which is widely used in natural language processing for different purposes like machine translation [37–41], headline generation [41–43], text summarization [44, 45], and speech recognition [46–48]. The Seq2Seq approach is divided into two parts, which take the input and produce the output of the document. The input section is a specified data sequence, and the output section is a sequence of the data module. Generally, Seq2Seq model training is based on the ground-truth sequence using the teacher-forcing technique, where the ground-truth acts as a teacher [49]. During the past few years, many studies have been published using the Seq2Seq approach in order to address the complexity of topic modeling for short texts [50]. Rush et al. [39] present a seq2seq model based on an attention encoder and a neural network language model (NNLM) decoder, building on neural machine translation (NMT), for text summarization. Specifying document topics is one of the important steps of document readability,
which usually comprises three main features, i.e., |total words|, |hiragana ratio|, and |word length|.

Table 1 shows a critical analysis of state-of-the-art techniques based on short text documents that develop probabilistic paradigms for modeling short documents.

Table 1
Critical analysis of sequence to sequence topic modeling

Authors | Proposal | Issues
Wei Li, Andrew McCallum [51] | Detecting word clusters which contain syntactic classes and semantic topics. | Task labeling due to word indeterminacy.
Ilya Sutskever, Oriol Vinyals, Quoc V. Le [52] | Manufacture a sequence learning approach to structure minimal hypotheses of the process. | Sentence optimization problem between sentence and data origin.
Martin Riedl, Chris Biemann [53] | LDA and TopicTiling optimization to improve the accuracy of segmentation. | Segment numbers are not investigated.
Martin Riedl, Chris Biemann [54] | Improving the semantic information of topic models to get better performance in word-based and TextTiling algorithms. | Sparsity issue, which requires different ways to smooth the data.
Dan Fisher, Mark Kozdoba [55] | Using Full Dependence Mixture (FDM) to combine the general moment and second moment of data. | Hyperparameter doesn't contain the topic arrangement.

As stated above, these existing models are not adequately developed for topic modeling from short text documents and also have some shortcomings in terms of lack of information, such as data sparsity and ambiguity. To the best of the authors' knowledge, there has so far been no semantics-based model to extract meaningful topics from short text documents using RL and SeaNMF techniques.

3. System architecture of the proposed RL+SeaNMF model

The proposed approach comprises three main modules, i.e., non-negative matrix factorization (NMF), reinforcement learning, and the sequence to sequence model (Seq2Seq). First, we explain the problem statement in sequence to sequence models and the solutions based on our study. Second, we describe the reinforcement learning steps in short text document topic generalization and give some information about the block coordinate descent methodology. Third, we present the NMF topic modeling system and the SeaNMF model based on a block-coordinate algorithm to display the terms of short text documents. Figure 1 represents the proposed system architecture.

The figure shows the input data (context, word, document) as (X_i), (J_i), (Y_i). (G), (R_c), and T form a vector representation of the proposed system, and each part of T represents a topic in the document. The presented SeaNMF model [36] is able to extract the semantics of short text data using the word-context and word-document relationships as a basis, and the proposed objective function integrates the advantages of both the NMF model for topic modeling and the skip-gram model for obtaining word-context semantic relationships. To solve the optimization problem, we use a block coordinate descent algorithm along with the sparse version of the SeaNMF model to achieve further interpretability. Table 2 summarizes the frequently used notation.

3.1. Sequence to sequence model (Seq2Seq)

The encoder-decoder architecture is applied in different types of natural language processing tasks, e.g., machine translation, dialogue generation, document summarization, and question answering. The Seq2Seq model has been presented with various strategies, e.g., attention, copying, and coverage. Most of the proposed approaches focus on repetitive words and build a fixed-size vocabulary based on words and tokens. Therefore, the efficiency of the model is bounded by vocabulary limitations. Algorithm 1 describes the steps of a simple Seq2Seq model.

Algorithm 1 consists of two phases, namely the training and testing phases. In the training phase, there are two groups of data, X and U.
Fig. 1. Overview of the RL+SeaNMF Model System architecture.

Table 2
Notation used in this paper

Name | Description
Seq2Seq model parameters
X | Input sequence of length Te, X = {x1, x2, ..., xTe}
U | Ground-truth output sequence of length T, U = {u1, u2, ..., uT}
Û | Generated output sequence of length T, Û = {û1, û2, ..., ûT}
Te | Number of encoder steps (input sequence length)
T | Number of decoder steps (output sequence length)
d | Input and output sequence representation size
B | Shared vocabulary of input and output
et | Hidden state of the encoder at time t
jt | Hidden state of the decoder at time t
Pθ | The Seq2Seq model with parameters θ
Reinforcement learning parameters
rt = r(jt, ut) | The reward that the agent receives by taking action ut when the state of the environment is jt
Û = {û1, û2, ..., ûT} | The set of actions that the agent takes over a period of time T; this is similar to the output that the Seq2Seq model generates
φ | The policy that the agent uses to take the action
Q(jt, ut) | The Q-value (under policy φ) that gives the estimated reward of taking action ut in state jt
SeaNMF parameters
B | Word-document matrix
J | Semantic correlation matrix
V | Latent factor matrix of words
Vc | Latent factor matrix of contexts
E | Latent factor matrix of documents
vj | Vector representation of word j
cj | Vector representation of context j
R+ | Non-negative real numbers
D | Number of documents in the corpus
M | Number of distinct words in the vocabulary
Algorithm 1 Simple training of the Seq2Seq model
Input: input sequence (X) and ground-truth (U)
Output: trained Seq2Seq model
Training step:
for <each group X and U> do
    Run encoding on X and get the last encoder state hTe.
    Run decoding by feeding hTe to the first decoder and obtain the sampled output sequence Û.
    Update the parameters of the model.
end for
Testing step:
for <each group X and U> do
    Sample the output Û based on the trained model.
    Estimate the model by applying performance measures.
end for

First, we run encoding on X to get the last encoded state hTe. After getting the last encoded state, we run the decoding process by feeding hTe to the first decoder in order to get a sampled output sequence Û. Finally, we update the parameters and execute the next step. In the testing phase, the input samples are tested based on the trained model to obtain Û. Finally, we evaluate the performance of the tested samples using different performance measures. The proposed Seq2Seq model in this paper contains a convolutional architecture, which comprises words and topics, the reinforcement learning system, and the SeaNMF model. The ground-truth in this algorithm is defined as U, and the model generates Û as its output. In this paper, we used the following measures to evaluate the performance of the proposed model: CIDEr [56], BLEU [57], ROUGE [58], and METEOR [59].

3.1.1. Sequence to sequence model problem statement

The main problem of the current Seq2Seq model is the "cross-entropy minimization output," which does not generate acceptable results when running the network. Hence, applying the cross-entropy (C) loss in the Seq2Seq procedure creates a mismatch between the operation process on the training and testing sets. Equation 1 is defined to evaluate the loss (error) of the output ût. Figure 2 describes the training process of the decoder, which has two inputs.

C_{IN} = -\sum_{t=1}^{T} \log \pi_\theta(\hat{u}_t \mid \hat{u}_1, \ldots, \hat{u}_{t-1}, j_t, j_{t-1}, X)    (1)

The previous output state is defined as j_{t-1}, the ground-truth input is defined as u_t, the current output state is defined as j_t, and the next predicted action is defined as û_t. In the testing set, however, the whole decoding process is based on actions generated from the model distribution in order to anticipate the next action.

As shown in Fig. 2, the left side of the figure presents the LSTM network that corresponds to the encoder e and shows the hidden states of the encoder; it has Te modules, where Te represents the length of the input sequence and the number of encoder steps. The right side of the LSTM network corresponds to the decoder and has T modules, where T represents the length of the output sequence and the number of decoder steps.

Fig. 2. Simple sequence to sequence model.
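To make the two training signals in Algorithm 1 and Equation 1 concrete, the following sketch (our illustration, not the authors' code) implements a tiny LSTM encoder-decoder in PyTorch and computes the teacher-forcing cross-entropy of Equation 1 for one batch; the toy vocabulary size, dimensions, and tensor names are assumptions.

```python
# Minimal encoder-decoder sketch (illustrative only): teacher-forcing loss of Eq. (1).
import torch
import torch.nn as nn

VOCAB, EMB, HID = 50, 32, 64          # assumed toy sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, x, u):
        # encode the input sequence X and keep the last encoder state h_Te
        _, (h, c) = self.encoder(self.embed(x))
        # teacher forcing: the ground-truth tokens u_1..u_{T-1} feed the decoder
        dec_in = self.embed(u[:, :-1])
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)          # logits predicting u_2..u_T

model = Seq2Seq()
x = torch.randint(0, VOCAB, (8, 10))      # batch of input sequences (length Te = 10)
u = torch.randint(0, VOCAB, (8, 7))       # ground-truth output sequences (length T = 7)

logits = model(x, u)                       # shape (8, 6, VOCAB)
# Eq. (1): negative log-likelihood of the ground-truth continuation
loss = nn.CrossEntropyLoss()(logits.reshape(-1, VOCAB), u[:, 1:].reshape(-1))
loss.backward()                            # one teacher-forcing update step
print(float(loss))
```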


3.2. Reinforcement learning

Machine learning is an advanced and robust area of artificial intelligence (AI), and reinforcement learning is one of its tasks. The reinforcement learning algorithm is evaluated based on Markov Decision Processes (MDPs). The agent-environment relationship in the reinforcement learning system is divided into three categories named "Agent," "Environment," and "State." The agent is described as the software which produces high-order intentions, and it is also the learner part of the RL system. The environment represents the problem to be solved; it can be a real-world or a simulated environment that communicates with the agent. The last category is the state, which is the position of the agent at a specific point of time in the environment. As a simple explanation, when an agent performs an action, the environment gives the agent a reward and a new state which the agent reaches by acting.

Fig. 3. Work flows of Reinforcement Learning model.

A Markov Decision Process is a tuple (J, B, P, R, T), where J is a set of states and B is a set of actions. The transition function is defined as P : J × B → J, where P(j, b) gives the next state after performing action b in state j. The reward function is defined as R : J × B → R with R(j, b), which gives the immediate reward for performing action b in state j. Terminal states are given by T ⊆ J and mark the end of an episode. Figure 3 shows the total RL process in document similarity.

As presented in Fig. 3, the ground-truth of this architecture can be supplied by humans or by automatic metrics, which generate a similarity measurement between the document output and the reference output. Practically, the Seq2Seq model is applied to generate the function operation and the task specification: in text summarization, the meaning of the function that denotes the next summarization step is extracted, while in the question answering task the start and end points of the action need to be defined in order to get the document answer.

Through reinforcement learning, two output sequences are generated from the input sequence x: u^j is drawn by sampling from the output probability distribution, while û is obtained from the same distribution and serves as a baseline. The rewards of both sequences are compared, and the reinforcement loss is minimized based on Equation 2.

L_{RL} = -(r(u^j) - r(\hat{u})) \log p_\theta(u^j)    (2)

Equation 2 weights the log-likelihood of u^j by the difference between the rewards r(u^j) and r(û) of the two output sequences generated for sequence x.
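A small numerical sketch of the loss in Equation 2 (our illustration, written under the assumption that û acts as a baseline sequence and u^j as the sampled sequence) can be expressed directly in PyTorch; the reward function here is a stand-in for an automatic metric such as ROUGE.

```python
# Sketch of the reinforcement loss of Eq. (2): the reward gap between a sampled
# sequence and a baseline sequence scales the log-likelihood of the sampled one.
import torch

def reward(seq, reference):
    """Stand-in for an automatic metric (e.g., ROUGE): token overlap ratio."""
    return len(set(seq) & set(reference)) / max(len(set(reference)), 1)

reference = [4, 7, 9, 2]                 # ground-truth token ids (toy example)
sampled   = [4, 7, 1, 2]                 # u^j: drawn from the model distribution
baseline  = [4, 3, 1, 5]                 # û: e.g., greedy output used as baseline

# log p_theta(u^j): sum of per-step log-probabilities of the sampled tokens,
# faked here with a differentiable tensor in place of real decoder outputs.
log_probs = torch.tensor([-0.2, -0.9, -1.5, -0.4], requires_grad=True)

advantage = reward(sampled, reference) - reward(baseline, reference)
loss_rl = -advantage * log_probs.sum()   # Eq. (2)
loss_rl.backward()                       # pushes probability mass toward higher-reward samples
print(float(loss_rl))
```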

3.2.1. Reinforcement learning: model-free solution technique

RL essentially deals with how to capture the optimal policy when there is no accessible model. For this issue, MDPs (Markov Decision Processes) focus on estimation under defective information, as proposed earlier; this is a crucial problem statement in short text documents. To overcome this issue, "model-free" reinforcement learning is presented, which does not depend on existing information and reward models. There is no model for this problem, so statistical knowledge is generated by sampling the MDP model. Algorithm 2 presents the general RL procedure based on online document resources.

Algorithm 2 General reinforcement learning algorithm for online short documents
for <each occurrence> do
    j ∈ J is marked as the start point.
    t := 0
    while an action b ∈ B_j is selected do
        Complete action b.
        Perceive the new state ĵ and the received reward r.
        Update T̂, R̂, Q̂ and V̂ using the experience <j, b, r, ĵ>.
        Update the parameters of the model.
        j := ĵ
    end while <ĵ is a goal state>
end for

Algorithm 2 represents the short document topic modeling issue by evaluating the minimum reward of the process.

3.3. SeaNMF model

The SeaNMF model is a novel semantics-assisted NMF model which is proposed to unearth hidden topics from short text documents. This methodology combines the semantic information of the document using a word embedding dictionary in the training model; this dictionary is able to recover word relationships, hidden parts of a document, etc.

3.3.1. Model evaluation

One of the challenges in this paper is how to feed words to the NMF system. Based on the latent matrix V ∈ R+^{M×F} (whose components are all non-negative), non-negative constraints are applied on the words and document vectors. Thus, V ∈ R+^{M×F} and V_c ∈ R+^{M×F} hold. A keyword v_i ∈ V is introduced to the system through its row V_{i,:} = v_i. To capture the semantic information between words and document contents, the matrix V_c with V_{c(j,:)} = c_j for each context c_j is defined. To detect the relationship between keywords and document content (word-context), the matrix approximation J ≈ V V_c^T is defined. These definitions show the relationship between words and contents: V is the latent factor matrix of words, V_c is the latent factor matrix of contexts, v_j is the vector representation of word j, and c_j is the vector representation of context j.

3.3.2. Optimization

Topic modeling on short text documents involves a different type of processing; the one used in this paper is that "each text is defined as a window in the skip-gram model":

- The skip-gram window is equal to the length of the text document.
- The number of short texts is equal to the number of windows.

The block coordinate descent algorithm is used on the term-document matrix B, which represents the words using bag-of-words, and in the next step the semantic correlation matrix J is evaluated for sampling the document contents. Equations 3 and 4 represent the update procedure of the SeaNMF model based on the above explanation.

V_{(F)} \leftarrow V_{(F)} + \frac{(BE)_{(F)} + \alpha (J V_c)_{(F)}}{(E^T E)_{(F,F)} + \alpha (V_c^T V_c)_{(F,F)}} - \frac{(V E^T E)_{(F)} + \alpha (V V_c^T V_c)_{(F)}}{(E^T E)_{(F,F)} + \alpha (V_c^T V_c)_{(F,F)}}    (3)

V_{c(F)} \leftarrow V_{c(F)} + \frac{(J V)_{(F)}}{(V^T V)_{(F,F)}} - \frac{(V_c V^T V)_{(F)}}{(V^T V)_{(F,F)}}    (4)

The block coordinate descent algorithm of the SeaNMF model is presented in Algorithm 3.

Algorithm 3 SeaNMF algorithm for short text documents
Input: term-document matrix B; semantic correlation matrix J; number of topics F; α
Output: V, V_c, E
Initialize: V ≥ 0, V_c ≥ 0, E ≥ 0 with random real numbers; t = 1
while <not converged> do
    for <k = 1, ..., F> do
        Compute V^t_{(:,k)} by Equation 3
        Compute V^t_{c(:,k)} by Equation 4
    end for
    t = t + 1
end while

3.3.3. Intuitive evaluation

To explain the SeaNMF model further, Equations 5 and 6 are derived from Equations 3 and 4 to represent the update procedure of topic modeling.

V^1_{(F)} \leftarrow V_{(F)} + \frac{(BE)_{(F)}}{(E^T E)_{(F,F)}} - \frac{(V E^T E)_{(F)}}{(E^T E)_{(F,F)}}    (5)

V^2_{(F)} \leftarrow V_{(F)} + \frac{(J V_c)_{(F)}}{(V_c^T V_c)_{(F,F)}} - \frac{(V V_c^T V_c)_{(F)}}{(V_c^T V_c)_{(F,F)}}    (6)

Equation 5 corresponds to standard NMF. It groups the terms of short contents into the same category using the term-document matrix. Furthermore, in Equation 6, words that have a similar meaning, or hidden words with the same significance, share common context keywords. Hence, using this model increases the connections between topics; e.g., in Fig. 1 the words (J1 and J4) are in separate documents, but they are connected through J2 based on the same keyword occurring in both document contents. A simple explanation of the SeaNMF model on short text is as follows:

"The skip-gram window is equal to the length of the text document. E.g., 'Harvard university of engineering' and 'Oxford university of science' are two samples of short text documents. 'Harvard' and 'Oxford' do not occur in the same position in the second sentence, and 'engineering' and 'science' do not occur in the first sentence, but the relationship between 'Harvard, university' and 'Oxford, university' is what the standard NMF model captures."

Based on Equation 6, both sentences share the word "university," which is the essence of the SeaNMF model. This model finds the similarity between words as well as hidden parts of documents and the relationships between vectors in short text documents, in order to extract meaningful and valuable hidden topics, which is the main issue for this type of content because of the lack of information.
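The update rules above can be prototyped in a few lines of NumPy. The sketch below is a simplified block-coordinate iteration in the spirit of Equations 3–6 and Algorithm 3, not the authors' exact implementation: it performs column-wise additive updates followed by projection onto the non-negative orthant, with random toy matrices standing in for the term-document matrix B and the semantic correlation matrix J.

```python
# Simplified SeaNMF-style block-coordinate sketch (illustrative, not the paper's exact code).
import numpy as np

rng = np.random.default_rng(0)
M, D, F, alpha = 30, 20, 5, 1.0            # words, documents, topics, weighting factor

B = rng.random((M, D))                      # term-document matrix (stand-in)
J = rng.random((M, M)); J = (J + J.T) / 2   # symmetric semantic correlation matrix (stand-in)

V  = rng.random((M, F))                     # latent word factors
Vc = rng.random((M, F))                     # latent context factors
E  = rng.random((D, F))                     # latent document factors

eps = 1e-9
for _ in range(50):
    BE, JVc = B @ E, J @ Vc
    EtE, VcTVc = E.T @ E, Vc.T @ Vc
    for k in range(F):
        # word factors: combine the document view (B, E) and context view (J, Vc), cf. Eq. (3)
        den = EtE[k, k] + alpha * VcTVc[k, k] + eps
        grad = BE[:, k] + alpha * JVc[:, k] - (V @ EtE)[:, k] - alpha * (V @ VcTVc)[:, k]
        V[:, k] = np.maximum(V[:, k] + grad / den, 0.0)
    JV, VtV = J @ V, V.T @ V
    for k in range(F):
        # context factors from the word-context approximation J ~ V Vc^T, cf. Eq. (4)
        den = VtV[k, k] + eps
        Vc[:, k] = np.maximum(Vc[:, k] + (JV[:, k] - (Vc @ VtV)[:, k]) / den, 0.0)
    BtV = B.T @ V
    for k in range(F):
        # document factors from the term-document approximation B ~ V E^T (simple projected step)
        den = VtV[k, k] + eps
        E[:, k] = np.maximum(E[:, k] + (BtV[:, k] - (E @ VtV)[:, k]) / den, 0.0)

top_words = np.argsort(-V, axis=0)[:5].T    # indices of the 5 strongest words per topic
print(top_words)
```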

3.4. Similarity measurement

Similarity measurement is evaluated based on the word probability distribution and word relationships. Topic similarity can be calculated based on word occurrence and the average PMI. Similarity measurement is divided into two categories named "distribution of topic words" and "semantic space of topic models."

Based on the distribution of topic words, every topic is generated from a Dirichlet distribution of V dimensions, where V is the size of the vocabulary. For every document, a distribution over topics is generated from a Dirichlet distribution of T dimensions, where T is the number of topics in the corpus; for every word in the document, a topic is chosen according to the generated distribution, and a word is chosen according to the distribution corresponding to the chosen topic. So each topic is a probability distribution over the words of the vocabulary.

The semantic space is represented by the topic model, which is applied to propose topics and words. Moreover, each topic is a probability distribution over the words in the training set. Each corpus, as a combination of words and documents, shows their relationship in the topic modeling procedure.
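The generative view described above can be written out in a few lines; the following sketch (ours, with toy sizes) samples topic-word and document-topic distributions from Dirichlet priors and then draws the words of one short document.

```python
# Toy generative process for the "distribution of topic words" view (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
V, T, doc_len = 12, 3, 6                 # vocabulary size, number of topics, words per document

beta = rng.dirichlet(alpha=[0.1] * V, size=T)   # each topic: a distribution over the vocabulary
theta = rng.dirichlet(alpha=[0.5] * T)          # one document: a distribution over topics

doc = []
for _ in range(doc_len):
    z = rng.choice(T, p=theta)                  # choose a topic for this word position
    w = rng.choice(V, p=beta[z])                # choose a word from that topic's distribution
    doc.append(w)

print("sampled word ids:", doc)
```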

4. Implementation and experimental results

4.1. Experimental setting

In this section, we evaluate the performance of our model through substantial experiments on different short document datasets. We describe the dataset information, metrics, and baselines, and present various sets of results. A comparison between the proposed model and latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), the pseudo-document-based topic model (PTM), and the Dirichlet multinomial mixture model (GPUDMM) is presented.

4.2. Development and simulation environment

The development environment of the proposed system is summarized in Table 3. All experiments and results of the system are carried out using an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz processor with 32 GB memory. Moreover, WinPython (v3.6.2), based on the Jupyter notebook, is a prerequisite to develop the topic modeling environment, configure the model, and obtain detailed result information. Topic modeling on short text documents is evaluated based on a combination of three different models in this area, named "Sequence to Sequence," "Reinforcement Learning," and "SeaNMF." The sequence to sequence model is evaluated based on the encoder-decoder architecture to solve the problem of cross-entropy in short texts. The reinforcement learning model is evaluated based on a model-free technique to capture the optimal policy on short content, and finally the matrix-based SeaNMF model evaluates the topics extracted from contents with limited information.

Table 3
Development environment of the proposed short text document topic modeling

Component | Description
Programming language | WinPython 3.6.2
Operating system | Windows 10 64bit
Browser | Google Chrome, Opera
Library and framework | Jupyter notebook
IDE | Anaconda 2019 Release 3
CPU | Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz
Memory | 32 GB
Topic modeling algorithms | Seq2Seq, Reinforcement Learning, SeaNMF
Distribution modeling algorithm | SeaNMF
Reinforcement learning | Model-free
Seq2Seq | Encoder and decoder
SeaNMF | Novel semantic-assisted NMF matrix

4.3. Dataset

The proposed system uses real-world short text document datasets correlated with four related applications, i.e., article headlines, question answers, microblogs, and news. These categories are divided into seven datasets, which are explained in detail below:

Tweet: This type of data is a good resource for short text document topic modeling; it was accumulated and labeled by [60]. Out of a large number of tweet categories, 15 categories were selected for the proposed process, i.e., World, Regional, Art, Science, Health, News, Shopping, Reference, Society, Recreation, Computers, Business, Sport, Game, and Home. Each category contains almost 3000 separate tweets with a minimum of two keywords. Second is the GoogleNews dataset, collected from the GitHub website. This type of short document contains 3 million English words with 300-dimensional latent embeddings obtained by applying the word2vec model. This data type is used for the comparison with GPUDMM [17]. Third is Yahoo.Ans, a dataset extracted from Yahoo Answers Manner Questions, version 2. The collected data comes from 10 various categories related to question subjects, and it also contains Financial Service, Diet, Fitness, etc. Fourth is Tag.News; the selected dataset contains part of the TagMyNews dataset, which includes tweets, news, and snippets. It is divided into seven categories named "Entertainment", "Health", "Sci and Tech", "Sport", "Business", "World" and "US". Each category contains a minimum of 25 keywords. Fifth is DBLP, the computer science bibliography website. The dataset collected from this website is divided into four categories named "Database," "Machine Learning," "Information Retrieval," and "Data Mining." Each category contains the titles of conference papers. Sixth is Yahoo.CA, a dataset extracted from Yahoo Answers Manner Questions, version 2; it contains questions and the most suitable answer to each question. And finally ACM.IS, data collected from the Scientific Literature Dataset Dataverse of Harvard University; it contains published papers' information. The dataset information is summarized in Table 4.

4.4. Metrics estimation

To validate the proposed methodology, topic coherence and document classification procedures are applied in the topic modeling system. More detailed information about each part is explained below.

4.4.1. Topic coherence

To evaluate the metrics in this process, the "pointwise mutual information (PMI)" score is used, which is grounded in information theory and accounts for all possible events occurring in the text data. Equation 7 is used to calculate topic coherence.

C_K = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \log \frac{p(v_i, v_j)}{p(v_i)\, p(v_j)}    (7)

Each topic is defined as K, and the number of top words per topic is defined as M. p(v_i, v_j) represents the probability of the two words occurring together in the same document, and p(v_i) the probability of a single word. To measure the topic quality, the PMI score is applied to the topic modeling process. PMI works well on regular-sized text contents, but there are still issues on short types of documents, which means that a gold-standard topic may be assigned a low PMI score.
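A direct way to compute the coherence score of Equation 7 for one topic is sketched below (our code, with a toy document collection and probability estimates based on document co-occurrence counts).

```python
# Sketch of the PMI topic coherence of Eq. (7) over a toy document collection.
import math
from itertools import combinations

docs = [{"jeju", "island", "travel", "beach"},
        {"travel", "flight", "hotel"},
        {"island", "beach", "hotel"},
        {"machine", "learning", "topic", "model"}]

def pmi_coherence(top_words, docs):
    n = len(docs)
    def p(*words):
        # fraction of documents containing all the given words
        return sum(all(w in d for w in words) for d in docs) / n
    m = len(top_words)
    score, eps = 0.0, 1e-12
    for wi, wj in combinations(top_words, 2):
        score += math.log((p(wi, wj) + eps) / (p(wi) * p(wj) + eps))
    return 2.0 * score / (m * (m - 1))

print(round(pmi_coherence(["island", "beach", "travel"], docs), 3))
```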

To overcome this issue, we first evaluate the PMI on four categories of short text data. Next, the PMI is calculated on two other datasets (Yahoo.Ans and DBLP), which are arranged documents, based on an external corpus. The comparison of the outputs on these datasets presents the effectiveness of the proposed methodology.

Table 4
Dataset statistics

Data Type | Documents | Terms | Aggregation (A) | Aggregation (B) | Document Length | Categories
Tweet | 54524 | 21381 | 1.3855% | 1.1824% | 8.84 | 15
Yahoo.Ans | 51865 | 5445 | 1.2118% | 1.1184% | 5.41 | 10
Tag.News | 39769 | 22636 | 2.3972% | 1.2471% | 29.25 | 7
DBLP | 26112 | 3558 | 1.8714% | 1.3788% | 7.75 | 4
Yahoo.CA | 41797 | 5445 | 6.1642% | 1.8865% | 53.72 | –
ACM.IS | 47413 | 3558 | 5.3778% | 2.1515% | 88.51 | –

4.4.2. Document classification

Document classification is a further efficient topic modeling process used in the proposed system. In this process, classification is only applied to labeled datasets. To calculate the classification performance, five-fold validation is applied to the process and, finally, the LIBLINEAR package is used to determine the quality of the classification. The F-measure ("precision" and "recall") is used to qualify the classification performance.

4.5. Analysis methods

To show the effectiveness of the presented method, we compare the efficiency of our method with other state-of-the-art techniques; a minimal sketch of the two matrix/probabilistic baselines follows this list.

- Latent Dirichlet Allocation (LDA): LDA is a probabilistic topic modeling technique used to extract topics from rich sources of information. LDA extracts topics from contents based on document keywords. The extracted words are the most relevant keywords that occur in a document related to the particular topics mentioned in the content. There are different open-source libraries available in Python, such as Gensim [61], but it does not include NMF. LDA uses a probabilistic mechanism, while NMF relies on linear algebra.
- Non-negative Matrix Factorization (NMF): NMF is an unsupervised learning method used for dimension reduction and clustering. In this paper, NMF is used to implement the block coordinate algorithm. NMF is a linear-algebraic model that is used to reduce high-dimensional vectors to a low dimension.
- Pseudo-document-based Topic Model (PTM): PTM introduces pseudo-documents into topic models for short text documents to extract the document topics.
- GPUDMM: Applied to extract topics based on the Dirichlet Multinomial Mixture model.
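As a sketch of how the LDA and NMF baselines can be fitted in practice (our illustration with scikit-learn, not the exact experimental pipeline of this paper), the snippet below factorizes a small bag-of-words matrix with both models and prints the top keywords per topic; the toy documents and topic count are assumptions.

```python
# Minimal LDA vs. NMF baseline sketch with scikit-learn (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

docs = ["jeju island travel beach hotel",
        "cheap flight hotel travel deal",
        "machine learning topic model training",
        "neural network model training data",
        "island beach weather forecast jeju"]

n_topics = 2

# LDA works on raw term counts (probabilistic mechanism)
counts = CountVectorizer().fit(docs)
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(counts.transform(docs))

# NMF works well on TF-IDF weights (linear-algebraic mechanism)
tfidf = TfidfVectorizer().fit(docs)
nmf = NMF(n_components=n_topics, init="nndsvd", random_state=0)
nmf.fit(tfidf.transform(docs))

def top_words(model, feature_names, k=4):
    # strongest k words per row of the topic-term matrix
    return [[feature_names[i] for i in comp.argsort()[::-1][:k]]
            for comp in model.components_]

print("LDA:", top_words(lda, counts.get_feature_names_out()))
print("NMF:", top_words(nmf, tfidf.get_feature_names_out()))
```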

4.6. Experimental results

4.6.1. Topic coherence output results

Table 5 represents the topic coherence outputs via a comparison with various methods and shows the effectiveness of the SeaNMF model over standard NMF in topic modeling of short document contents. The comparison with LDA and PTM also provides considerable proof that the SeaNMF model can extract more coherent topics from a document.

Table 5
Topic coherence outputs based on PMI

Method | Tweet | Tag.News | DBLP | Yahoo.ANS
Latent Dirichlet Allocation | 1.3168 | 1.6159 | 0.1457 | 1.3168
Non-negative Matrix Factorization | 1.9156 | 1.7525 | 0.1295 | 1.2415
Pseudo-document-based Topic Model | 1.4856 | 1.7739 | 0.9616 | 1.2422
GPUDMM | 0.1324 | 0.1862 | 0.3926 | 0.6819
SeaNMF | 4.2588 | 3.7429 | 1.7248 | 1.8664
Sparse SeaNMF | 4.2181 | 3.7164 | 1.7341 | 1.7192

Table 6 represents the topic coherence outputs via a comparison on the Yahoo.CA and ACM.IS datasets. Based on the reasons mentioned for PMI in the topic coherence section, we evaluate topic coherence on an external corpus because it gives better performance on short text documents. The output results of the coherent topics are presented in Table 6. Going deeper into the word correlation process, SeaNMF obtains more suitable topics from short text documents.

Table 6
Topic coherence outputs based on Yahoo.CA and ACM.IS

Method | Yahoo.ANS/Yahoo.CA | DBLP/ACM.IS
Latent Dirichlet Allocation | 0.7651 | 0.5393
Non-negative Matrix Factorization | 0.6372 | 0.4737
Pseudo-document-based Topic Model | 0.7615 | 0.5542
GPUDMM | 0.4413 | -0.1261
SeaNMF | 1.2115 | 0.7752
Sparse SeaNMF | 1.1299 | 0.7558

As explained in Tables 5 and 6, by visualizing and comparing the word relationships, the main information of the contents affects the extraction of topics using the Sparse SeaNMF model.

4.6.2. Document classification output results

Furthermore, the metrics estimation is presented in two parts, of which document classification is one part of the proposed system. Table 7 shows the best results of our approach on the short document datasets ("Tweet", "Yahoo.ANS", "Tag.News"). These outputs display that the proposed system is efficient on short content. The comparison with LDA and NMF shows that the presented SeaNMF model has better classification and performance results than the other systems. The presented model output is summarized in Fig. 4.

Fig. 4. Topic coherence and classification performance of SeaNMF model.

Table 7
Comparison of methods on document classification

Method | Tweet (P / R / F) | Tag.News (P / R / F) | Yahoo.ANS (P / R / F) | DBLP (P / R / F)
LDA | 0.4938 / 0.4978 / 0.4957 | 0.8434 / 0.8295 / 0.8363 | 0.6131 / 0.6849 / 0.6469 | 0.7192 / 0.6184 / 0.6648
NMF | 0.4788 / 0.4628 / 0.4704 | 0.7874 / 0.7482 / 0.7672 | 0.7414 / 0.6581 / 0.6972 | 0.7414 / 0.7337 / 0.7374
PTM | 0.4152 / 0.4949 / 0.4524 | 0.8636 / 0.8417 / 0.8524 | 0.7411 / 0.7149 / 0.7277 | 0.7535 / 0.7478 / 0.7505
GPUDMM | 0.4194 / 0.4177 / 0.4183 | 0.8954 / 0.8823 / 0.8887 | 0.6165 / 0.7419 / 0.6732 | 0.7780 / 0.7684 / 0.7731
SeaNMF | 0.5759 / 0.5666 / 0.5712 | 0.8979 / 0.8897 / 0.8937 | 0.7677 / 0.7449 / 0.7560 | 0.7759 / 0.7663 / 0.7709
Sparse SeaNMF | 0.5613 / 0.5679 / 0.5644 | 0.8915 / 0.8912 / 0.8913 | 0.7714 / 0.7471 / 0.7590 | 0.7811 / 0.7724 / 0.7766

A comparison between these systems shows that the skip-gram plays an effective role in the high performance on semantic documents, and also that the NMF model is not good enough compared with LDA. Another effective model in this process is GPUDMM, which performs better than the other baselines. The number of tweets in the various categories is almost the same, which makes it more dependable, although the Tweet dataset suffers from the "imbalanced class" issue. Table 7 shows the improvement of the SeaNMF model, which on average achieves 14 percent higher performance over all the baselines, as presented by precision (P), recall (R), and F-measure (F).

It is observed that a better topic coherence performance does not indicate a better document classification performance. Figure 4 presents the topic coherence and document classification performance of the SeaNMF model. The parameter α indicates the weighting factor used for factorizing the correlation matrix of semantic words, and γ is used as a smoothing factor for the sampling probability. It is evident in Fig. 4 that the F-score decreases as α increases, and it is also noticed that there is no significant change in the F-score with respect to the α value. Therefore, our proposed SeaNMF-based model is stable for modeling topics of short text documents. The parameter f also plays a vital role in constructing the correlation matrix of semantic words. The parameter f affects the data sparsity of the correlation matrix, which causes the semantic words to be less correlated. It is also observed from the graph that the value of the F-score is reduced when the parameter f increases. It is found that the F-score slightly increases and improves when the smoothing factor γ is increased. Hence, in a collection of short documents, there is a difference between a highly coherent topic and a high-quality topic.

4.7. Combination of Seq2Seq and reinforcement learning model

This section presents the combination of the Seq2Seq model with reinforcement learning in every possible technique. The main goal of using these models is to overcome the mismatch problem between document keywords and topics. Generally, this system works based on the reward function for running the objective function, which was explained earlier in the reinforcement learning section. The reinforcement learning algorithm is an effective system for improving state-of-the-art outputs in the Seq2Seq system, although there are many more advanced techniques, e.g., DQN, Actor-Critic models, and DDQN, which have not been applied to this task before. The main issue in this setting is the problem of using Q-learning and formulating it within the Seq2Seq model; text summarization can be an example of this problem, as the system should evaluate the words of the whole vocabulary even at a low stage of training. For this reason, the approach most focused on in this area is the REINFORCE algorithm, which is a suitable method to train the Seq2Seq model. Table 8 presents the "policy", "action", and "reward" functions in various Seq2Seq models.
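As background for the Q-learning difficulty mentioned above, the following toy sketch (ours; a generic tabular example, not tied to text generation) shows the basic Q-value update on a small chain environment. With a text decoder, the "actions" would be every word in the vocabulary, which is what makes value-based methods hard to apply directly and why REINFORCE-style training is preferred here.

```python
# Tabular Q-learning toy sketch on a 5-state chain (illustrative background only).
import random

n_states, actions = 5, [0, 1]           # action 0: move left, action 1: move right
alpha, gamma, eps = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(n_states)]

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0      # reward only at the terminal state
    return s2, r, s2 == n_states - 1

random.seed(0)
for _ in range(200):                             # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(actions) if random.random() < eps else max(actions, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([[round(v, 2) for v in row] for row in Q])
```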

4.8. Topic semantic analysis

Generally, topic semantic analysis represents the effectiveness of the SeaNMF model in extracting meaningful keywords. To discover topics from short text documents, the specified top keywords of the standard NMF model are compared with the SeaNMF system output. The represented datasets extract words depending on the word embedding system and describe the problem statement of topic extraction in short contents. The output covers the set of contents based on word similarity. The topic semantic analysis procedure forms groups of sentences by using the coherent topics allocated to individual sentences, and by applying cross-entropy over the document terms the system is able to extract meaningful topics.

For a start, the "Yahoo.Ans" and "DBLP" data are used in the training process, and the topics extracted from the process are selected based on the PMI score; the higher scores are selected. By analyzing the top keywords, the most similar topics captured with the SeaNMF system are selected.

To display the semantically associated discovered topics of the SeaNMF model, we build a network based on the top keywords of each topic, which is represented in Fig. 5. Several words with great modulation in the corpus have a lower degree or, in other words, are less associated with other keywords. Figure 5 shows the topics extracted using the SeaNMF model, which have minimum noise in the words, and the extracted words are correlated with each other.
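A network such as the one in Fig. 5 can be assembled directly from the top keywords of each topic; the sketch below (our illustration with made-up keywords, using networkx) connects words that appear in the same topic and reports each word's degree, so weakly connected words stand out.

```python
# Sketch: build a keyword co-occurrence network from per-topic top words (toy data).
import networkx as nx

topics = {                         # hypothetical top keywords per extracted topic
    "topic_0": ["island", "beach", "travel", "hotel"],
    "topic_1": ["flight", "travel", "hotel", "deal"],
    "topic_2": ["model", "learning", "topic", "data"],
}

G = nx.Graph()
for words in topics.values():
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            # words sharing a topic are linked; repeated links keep a single edge
            G.add_edge(w, v)

# degree tells how strongly a keyword is associated with the rest of the network
for word, deg in sorted(G.degree, key=lambda x: -x[1]):
    print(f"{word}: degree {deg}")
```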

f
roo
rP
tho
d Au
Fig. 5. Topic Semantic Analysis in SeaNMF model.
cte

Table 8
Policy, action, and reward in various Seq2Seq models

Seq2Seq task                     Policy (P)                      Action (A)                                                  Reward (R)
T.Summarization, H.Generation,   Attention-based models,         Summarize, translate, and generate a headline based         ROUGE, BLEU
M.Translation                    pointer-generators, etc.        on the next token
Q.Generation, Q&A                Seq2Seq                         Vocabulary-based answer selection, or selecting the         F-measure
                                                                 start and end index of the answer in the input document
I.Captioning, V.Captioning       Seq2Seq                         Next token chosen as caption                                CIDEr, SPICE, METEOR
S.Recognition                    Seq2Seq                         Next token chosen as speech                                 Connectionist Temporal Classification (CTC)
D.Generation                     Seq2Seq                         Generate dialogue speech                                    BLEU, dialogue length, dialogue variety
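Table 8 pairs each Seq2Seq task with a policy, an action, and a reward. In a policy-gradient view, the sequence-level reward (for example a ROUGE or F-measure score) simply scales the log-likelihood of the sampled actions. The snippet below is a minimal REINFORCE-style sketch in PyTorch on made-up numbers, not the training code used in this work.

```python
import torch

def reinforce_loss(log_probs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    """REINFORCE-style loss: -(reward - baseline) * sum_t log pi(a_t | s_t)."""
    return -(reward - baseline) * log_probs.sum()

# Toy probabilities of three sampled tokens and a sequence-level reward
# (e.g., a ROUGE or F-measure value from the last column of Table 8).
token_probs = torch.tensor([0.40, 0.30, 0.50], requires_grad=True)
log_probs = torch.log(token_probs)
loss = reinforce_loss(log_probs, reward=0.62, baseline=0.50)
loss.backward()  # positive advantage: gradients increase the sampled-token probabilities
print(loss.item(), token_probs.grad)
```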

Further explanation shows the probability of connection between words across the dataset. As the color bar in the figure indicates, light colors mark word pairs that have a low chance of occurring together in the same sentence, while dark colors represent the highest probability of words occurring together.

Figure 6 represents the topic visualization between words in the selected dataset. The number of topics is manually set to eight, and Fig. 6 visualizes the NMF topic modeling output with respect to the short-text content matrix. The topics include the detailed information of the input document, and the word embedding process is applied to find the relationships between individual terms. Each color shows the importance of a topic in the dataset and the similarity of its related contents, in which light colors represent the minimum probability of a word relationship and dark colors show the maximum probability between topic terms.
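A visualization in the spirit of Fig. 6 can be produced by plotting a normalized topic-term weight matrix as a heat map, so that light cells mark low-probability term-topic associations and dark cells mark high-probability ones. The following matplotlib sketch uses random toy weights for eight topics and is only an illustration of the idea.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
terms = ["nmf", "topic", "tweet", "matrix", "short", "model"]
W = rng.random((8, len(terms)))            # toy topic-term weights for 8 topics
W = W / W.sum(axis=1, keepdims=True)       # normalize each topic to a probability row

fig, ax = plt.subplots()
im = ax.imshow(W, cmap="Greys")            # light cells = low probability, dark = high
ax.set_xticks(range(len(terms)))
ax.set_xticklabels(terms, rotation=45)
ax.set_yticks(range(8))
ax.set_yticklabels([f"topic {i}" for i in range(8)])
fig.colorbar(im, ax=ax, label="term weight")
plt.tight_layout()
plt.show()
```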
Fig. 6. Topic distribution in word corpus.

5. Conclusion

In this paper, we present a combination of a reinforcement learning Seq2Seq system and the SeaNMF model to improve the performance of topic modeling on short text documents. The proposed model leverages word-context semantic correlations during training, which is effective for overcoming the issues of short text documents, such as the lack of information for extracting meaningful topics. The semantic relationship between keywords and document information is obtained from the skip-gram model, which efficiently discloses the semantic relationships among words. The reinforcement learning system is applied to obtain the reward function from the algorithm on the training set for input-specific RL and to compare it with the specific RL on the test set, which avoids the difficulty of the document extraction part.

This work used a block coordinate descent algorithm to solve the SeaNMF model and compared the proposed model with other state-of-the-art models on real-world data. We utilized accuracy as a standard measure to evaluate the performance of topic coherence and document classification. The obtained measurements show that the proposed model performs better in terms of the accuracy metric compared to the other models, while the topic coherence and classification performance show the consistency and strength of the model. The results demonstrate that the hidden topics extracted from short text data are meaningful and that the terms are consistent with the topics.
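For reference, the sketch below shows plain two-block coordinate descent for standard NMF, alternating nonnegative least-squares updates of the two factors. It is a simplified stand-in rather than the solver used for SeaNMF, whose objective also includes the word-context factor and the α term omitted here.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_bcd(A, k, n_iter=30, seed=0):
    """Two-block coordinate descent for min ||A - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(n_iter):
        for j in range(n):                      # update H column-wise with W fixed
            H[:, j], _ = nnls(W, A[:, j])
        for i in range(m):                      # update W row-wise with H fixed
            W[i, :], _ = nnls(H.T, A[i, :])
    return W, H

A = np.random.default_rng(1).random((6, 5))     # toy nonnegative data matrix
W, H = nmf_bcd(A, k=2)
print(round(np.linalg.norm(A - W @ H), 4))      # reconstruction error after the updates
```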

The performance of the proposed model can be further enhanced by using a graph-based word embedding technique, since graph embedding overcomes the limitations of the sequential input method used in the proposed system. In future work, we will use a graph-based word embedding technique to improve the performance of topic modeling for short text documents using machine learning techniques.

Acknowledgments

This work was supported by the 'Jeju Industry-University Convergence Foundation' funded by the Ministry of Trade, Industry, and Energy (MOTIE, Korea) [Project Name: 'Jeju Industry-University Convergence Foundation'; Project Number: N0002327].

References

[1] W.X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan and X. Li, Comparing twitter and traditional media using topic models. In European conference on information retrieval, (2011), pp. 338–349. Springer.
[2] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu and M. Demirbas, Short text classification in twitter to improve information filtering. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, (2010), pp. 841–842. ACM.
[3] L. Hong and B.D. Davison, Empirical study of topic modeling in twitter. In Proceedings of the first workshop on social media analytics, (2010), pp. 80–88. ACM.
[4] Z. Wang and H. Wang, Understanding short texts. 2016.
[5] X. Yan, J. Guo, Y. Lan and X. Cheng, A biterm topic model for short texts. In Proceedings of the 22nd international conference on World Wide Web, (2013), pp. 1445–1456. ACM.
[6] D.M. Blei, A.Y. Ng and M.I. Jordan, Latent dirichlet allocation, Journal of Machine Learning Research 3(Jan) (2003), 993–1022.
[7] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science 41(6) (1990), 391–407.
[8] T. Hofmann, Probabilistic latent semantic indexing. In ACM SIGIR Forum 51 (2017), pp. 211–218. ACM.
[9] D.D. Lee and H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401(6755) (1999), 788.
[10] J. Choo, C. Lee, C.K. Reddy and H. Park, Weakly supervised nonnegative matrix factorization for user-driven clustering, Data Mining and Knowledge Discovery 29(6) (2015), 1598–1621.
[11] D. Kuang, J. Choo and H. Park, Nonnegative matrix factorization for interactive topic modeling and document clustering. In Partitional Clustering Algorithms (2015), pp. 215–243. Springer.
[12] D. Kuang, S. Yun and H. Park, Symnmf: nonnegative low-rank approximation of a similarity matrix for graph clustering, Journal of Global Optimization 62(3) (2015), 545–574.
[13] X. Quan, C. Kit, Y. Ge and S.J. Pan, Short and sparse text topic modeling via self-aggregation. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[14] Y. Zuo, J. Wu, H. Zhang, H. Lin, F. Wang, K. Xu and H. Xiong, Topic modeling of short texts: A pseudo-document view. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, (2016), pp. 2105–2114. ACM.
[15] T. Mikolov, K. Chen, G. Corrado and J. Dean, Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[16] J. Pennington, R. Socher and C. Manning, Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), (2014), pp. 1532–1543.
[17] C. Li, H. Wang, Z. Zhang, A. Sun and Z. Ma, Topic modeling for short texts with auxiliary word embeddings. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, (2016), pp. 165–174. ACM.
[18] V.K.R. Sridhar, Unsupervised topic modeling for short texts using distributed representations of words. In Proceedings of the 1st workshop on vector space modeling for natural language processing, (2015), pp. 192–200.
[19] G. Xun, V. Gopalakrishnan, F. Ma, Y. Li, J. Gao and A. Zhang, Topic discovery for short texts using word embeddings. In 2016 IEEE 16th international conference on data mining (ICDM), (2016), pp. 1299–1304. IEEE.
[20] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado and J. Dean, Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (2013), pp. 3111–3119.
[21] O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems, (2014), pp. 2177–2185.
[22] O. Levy, Y. Goldberg and I. Dagan, Improving distributional similarity with lessons learned from word embeddings, Transactions of the Association for Computational Linguistics 3 (2015), 211–225.
[23] X.-H. Phan, L.-M. Nguyen and S. Horiguchi, Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In Proceedings of the 17th international conference on World Wide Web, (2008), pp. 91–100.
[24] O. Jin, N.N. Liu, K. Zhao, Y. Yu and Q. Yang, Transferring topical knowledge from auxiliary long texts for short text clustering. In Proceedings of the 20th ACM international conference on Information and knowledge management, (2011), pp. 775–784.
[25] J. Weng, E.-P. Lim, J. Jiang and Q. He, Twitterrank: finding topic-sensitive influential twitterers. In Proceedings of the third ACM international conference on Web search and data mining, (2010), pp. 261–270.
[26] R. Mehrotra, S. Sanner, W. Buntine and L. Xie, Improving lda topic models for microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, (2013), pp. 889–892.
[27] Y. Zuo, J. Zhao and K. Xu, Word network topic model: a simple but general solution for short and imbalanced texts, Knowledge and Information Systems 48(2) (2016), 379–398.
[28] F. Kou, J. Du, C. Yang, Y. Shi, M. Liang, Z. Xue and H. Li, A multi-feature probabilistic graphical model for social network semantic search, Neurocomputing 336 (2019), 67–78.
[29] D. Ramage, D. Hall, R. Nallapati and C.D. Manning, Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, (2009), pp. 248–256. Association for Computational Linguistics.
[30] F. Kou, J. Du, Z. Lin, M. Liang, H. Li, L. Shi and C. Yang, A semantic modeling method for social network short text based on spatial and temporal characteristics, Journal of Computational Science 28 (2018), 281–293.
[31] T. Lin, W. Tian, Q. Mei and H. Cheng, The dual-sparse topic model: mining focused topics and focused terms in short text. In Proceedings of the 23rd international conference on World wide web, (2014), pp. 539–550.
[32] P. Bicalho, M. Pita, G. Pedrosa, A. Lacerda and G.L. Pappa, A general framework to expand short text for topic modeling, Information Sciences 393 (2017), 66–81.
[33] J. Choo, C. Lee, C.K. Reddy and H. Park, Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization, IEEE Transactions on Visualization and Computer Graphics 19(12) (2013), 1992–2001.
[34] H. Kim, J. Choo, J. Kim, C.K. Reddy and H. Park, Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2015), pp. 567–576. ACM.
[35] X. Yan, J. Guo, S. Liu, X. Cheng and Y. Wang, Learning topics in short texts by non-negative matrix factorization on term correlation matrix. In Proceedings of the 2013 SIAM International Conference on Data Mining, (2013), pp. 749–757. SIAM.
[36] T. Shi, K. Kang, J. Choo and C.K. Reddy, Short-text topic modeling via non-negative matrix factorization enriched with local word-context correlations. In Proceedings of the 2018 World Wide Web Conference, (2018), pp. 1105–1114.
[37] M.-T. Luong, H. Pham and C.D. Manning, Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
[38] Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[39] D. Bahdanau, K. Cho and Y. Bengio, Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[40] S. Shen, Y. Cheng, Z. He, W. He, H. Wu, M. Sun and Y. Liu, Minimum risk training for neural machine translation. arXiv preprint arXiv:1512.02433, 2015.
[41] A.M. Rush, S. Chopra and J. Weston, A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685, 2015.
[42] S. Chopra, M. Auli and A.M. Rush, Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (2016), pp. 93–98.
[43] S.-Q. Shen, Y.-K. Lin, C.-C. Tu, Y. Zhao, Z.-Y. Liu, M.-S. Sun, et al., Recent advances on neural headline generation, Journal of Computer Science and Technology 32(4) (2017), 768–784.
[44] A. See, P.J. Liu and C.D. Manning, Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368, 2017.
[45] R. Nallapati, B. Zhou, C. Gulcehre, B. Xiang, et al., Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023, 2016.
[46] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel and Y. Bengio, End-to-end attention-based large vocabulary speech recognition. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), (2016), pp. 4945–4949. IEEE.
[47] A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks. In International conference on machine learning, (2014), pp. 1764–1772.
[48] Y. Miao, M. Gowayyed and F. Metze, Eesen: End-to-end speech recognition using deep rnn models and wfst-based decoding. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), (2015), pp. 167–174. IEEE.
[49] S. Bengio, O. Vinyals, N. Jaitly and N. Shazeer, Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems (2015), pp. 1171–1179.
[50] M. Zamanian and P. Heydari, Readability of texts: State of the art, Theory & Practice in Language Studies 2(1) (2012).
[51] W. Li and A. McCallum, Semi-supervised sequence modeling with syntactic topic models. In AAAI 5 (2005), pp. 813–818.
[52] I. Sutskever, O. Vinyals and Q.V. Le, Sequence to sequence learning with neural networks. In Advances in neural information processing systems, (2014), pp. 3104–3112.
[53] M. Riedl and C. Biemann, Text segmentation with topic models, Journal for Language Technology and Computational Linguistics 27(1) (2012), 47–69.
[54] M. Riedl and C. Biemann, How text segmentation algorithms gain from topic models. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (2012), pp. 553–557. Association for Computational Linguistics.
[55] D. Fisher, M. Kozdoba and S. Mannor, Topic modeling via full dependence mixtures. arXiv preprint arXiv:1906.06181, 2019.
[56] R. Vedantam, C.L. Zitnick and D. Parikh, Cider: Consensus-based image description evaluation. In Proceedings of the IEEE conference on computer vision and pattern recognition, (2015), pp. 4566–4575.
[57] K. Papineni, S. Roukos, T. Ward and W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, (2002), pp. 311–318. Association for Computational Linguistics.
[58] C.-Y. Lin, Looking for a few good metrics: Automatic summarization evaluation-how many samples are enough? In NTCIR, 2004.
[59] S. Banerjee and A. Lavie, Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, (2005), pp. 65–72.
[60] A. Zubiaga and H. Ji, Harnessing web page directories for large-scale classification of tweets. In Proceedings of the 22nd international conference on world wide web, (2013), pp. 225–226. ACM.
[61] R. Řehůřek and P. Sojka, Gensim: statistical semantics in Python. Retrieved from gensim.org, 2011.