
Semantic Convolutional Neural Network Model for Safe Business Investment by Using BERT


2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) | 978-0-7381-1180-3/20/$31.00 ©2020 IEEE | DOI: 10.1109/SNAMS52053.2020.9336575

1st Maryam Heidari
George Mason University
Fairfax, US
mheidari@gmu.edu

2nd Setareh Rafatirad
George Mason University
Fairfax, US
srfatir@gmu.edu

Abstract—The real estate market is one of the most significant business domains for investors, and a wise investment in real estate matters most to low-income people whose only home is their lifetime investment. During an economic recession, many homeowners lose their homes because of two major factors: first, they cannot pay their mortgages, and second, the house cannot be rented to cover the mortgage. Real estate investment can therefore be secured if the house can serve as an excellent rental property. Information from real estate websites can be a rich source of the knowledge needed to detect a house's potential as a rental property. This work uses a natural language processing approach to propose a new real estate investment model based on online textual information. For the first time, we apply a transfer learning approach based on multiple online resources to recognize a house as a valuable rental property in the real estate market. Bidirectional attention models based on transformers (BERT) are used to extract features for semantic convolutional neural network models to secure real estate investment. This research has three main points: (1) using transformers to implement a semantic CNN based on Airbnb and Zillow data; (2) a performance evaluation of traditional machine learning models against our new transfer learning model for rent prediction; and (3) a new public data set of more than 5 million houses in the U.S., based on semantic information, that can be used for real estate market research. This research offers a new model for safe investment in the real estate market based on the transfer learning approach.

Index Terms—Natural language processing, Machine learning, Transformers, Semantic Neural Network.

I. INTRODUCTION AND RELATED WORKS

Rate of return shows how profitable an investment is over a certain time, and Net Present Value (NPV) is an essential factor [1]. The NPV equation is [2]:

$$\mathrm{NetPresentValue} = \sum_{k=0}^{K} \frac{CF_k}{(1 + re)^k} \tag{1}$$

In equation 1, the cash flow $CF_k$ is calculated for a house in each period $k$, and $re$ is a satisfactory rate of return. One of the essential factors in measuring the Net Present Value of an investment is cash flow. Rent, house mortgage, annual house tax, and annual house expenses can affect cash flow. House rent positively affects cash flow, but a higher vacancy rate can create a negative cash flow. Several machine learning techniques are used for rent prediction [3]. Lambert and Greenland [4] investigate eager methods such as bagging REP trees to predict rent prices for residents near a university campus, and bagging REP trees provides the best prediction accuracy. However, the coverage area of the training set is limited to zip codes near the university campus, so their solution can generate a biased model since all the data has the same geolocation characteristics.

The previous predictive models for rent price did not consider the house type and zip code at the same time [5], [6]. Rent variance can be very high for properties, even in the same city. For instance, the zip codes 20143 and 20105 are in the same neighborhood, but their average rent prices are not similar. The rent price is also based on area space, bedrooms, bathrooms, community factors, etc. The average rent price for a zip code depends on internal and external factors such as the walk score, transit score, and crime rate of the zip code. The transit score indicates the connectivity or access to essential services. The crime score reflects the violent incidents in a specific zip code. Lazy learning models find locally optimal solutions for each test instance by storing the training instances and postponing generalization until a new instance arrives [7].

This research uses multiple online resources for real estate knowledge extraction by using different natural language processing approaches. Deep learning architectures can be powerful tools in text processing of online reviews [8]. People share their ideas about different business topics on social media platforms [9] [10]. Data from public and personal blogs can be valuable for improving the online data collection phase [11] [12], and this data can be categorized based on different topics in real estate. Homeowners and homebuyers express their ideas in both positive and negative ways on online platforms, so for each house, a sentiment score can be calculated by sentiment classification of homeowners' reviews [13].

Also, if an online review has a limited number of words, advanced sentiment classification techniques can still assign a sentiment score to it [14]. The polarization of ideas in the real estate market, which promotes buying real estate properties in specific cities, can be recognized by polarization detection methods in social media

Authorized licensed use limited to: Robert Gordon University. Downloaded on May 28,2021 at 04:12:20 UTC from IEEE Xplore. Restrictions apply.
platforms [15] [16]. It is also essential to detect discrimination or social relations [17] [18] in the text of online comments about rent prices in different areas of a city. Real estate companies can use homebuyers' emotions to make specific real estate deals, so it is vital to perform emotion classification [19] on homeowners' comments to feed the classifiers with more reliable data.

Real estate websites show the houses in each zip code based on a price map; pictures with similar price tags can be recognized by image caption generation techniques [20]. Also, comparing house images with expert image processing techniques [21] [22] can detect pictures of the homes that receive more views from home buyers. These images can show customer interest in specific house types in the real estate market. Chatbots can be used to analyze more verbal and textual data [23] shared on social media platforms about the real estate market. In this work, to compensate for the computational time of NLP algorithms, we optimize hardware and power usage by using memory-efficient methods [24] [25] [26]. We optimized memory usage by using [27] [28] for data-intensive workloads. The advanced design of hierarchical convolutional neural networks [29] [30] is used in this research to accelerate processing time.

Convolutional neural network models are examined in different scientific domains such as cyber security [31] [32] [33] [34], health [35] [36], and business applications [37]. The business deal recommender system in this research is based on convolutional neural network models.

II. DATA SET

The Airbnb data set provides a rich source of online reviews about properties and traveler ideas from different locations. Housing reviews based on different geographical areas are extracted from Airbnb websites. Also, by using the Zillow API, we collect a data set of residential buildings [38] based on house types and zip codes. Five million housing property records are collected, based on three house types: townhouse, single-family, and condo. The walk score, which shows the accessibility of the house to essential services, is collected from the public transit website. Crime rates show the rate of violent crime incidents in different zip codes and are collected from the crime rate website. Housing features are based on the Zillow real estate website information.

III. METHODS

For word embedding of online reviews, it is possible to reduce the size of the architecture for inference by incorporating methods like knowledge distillation and matrix factorization [39]. However, we use GloVe [40] for word embedding of home reviews in different geographical locations. This work uses GloVe instead of the word2vec model for word embedding since GloVe is more scalable and fast for converting 5 million housing reviews to word vectors. Also, GloVe includes a huge global word corpus, which is essential when dealing with a variety of online reviews about different characteristics of housing properties.

At first, we compare lazy and eager learning performance for rent prediction based on R-squared and mean absolute error (MAE), and then we introduce our new approach, which can predict safe business investment by using transformers.

Fig. 1. Rent price correlation with external house attributes

PCA is used to select housing features for rent prediction. The number of bedrooms and bathrooms, area space, year, and the sale price directly affect rent. Figure 1 shows the correlation of rent price with walk score and crime rate. The data set attributes are: ZipID (unique id for each house), number of bedrooms, floor size, geographical location of each house, year built, house type, zip code, estimated rent, and home association fee. The school score is assigned to a house based on the location of the house relative to the school. The PCA results in Figure 2 show that features such as home association fee and walk score are less important for single-family houses but more critical for townhouses and condos. The number of bedrooms is found to be an essential feature in all house types. Average school rating is discovered to be an important feature for both single-family houses and condos.

The task is to train a model for each cluster (house type) $s_j$: $s_j \to \mathrm{Learner} \to f_j$, such that $f_j$ is a rent model that shows the house's potential to become a profitable rental property. Each cluster has its own rent prediction model. We build rent prediction models for each house type and a subset of zip codes. In this research, we use a grid search to define all possible combinations of hyper-parameters, including the depth of the tree and the number of kernels, for each machine learning model. The test data set covers 30% of the housing data set.

Table II shows a comparison of lazy and eager machine learning algorithms in rent prediction. As can be seen, the FFNN model as an eager learner and KStar as a lazy learner ranked first and second, respectively, for rent prediction accuracy. The neural network model has the best performance

for single-family and condo, respectively. However, the KStar model outperformed the FFNN model in townhouse rent prediction by achieving a lower prediction error. The rent prediction results of random forest (RF) and KNN are very close; however, KNN outperformed RF in townhouse rent prediction. Based on Table II, lazy methods are not superior to all the eager methods studied in this work, but the best accuracy is achieved by FFNN (eager) and KStar (lazy). Our new model is explained in the next section.

IV. OUR NEW MODEL FOR SAFE BUSINESS INVESTMENT

Figure 3 shows the new model for rent prediction. In the first phase, GloVe converts homeowner reviews, Airbnb reviews, and Zillow home reviews to word vectors. In Figure 3, based on Zillow homeowner reviews and Airbnb reviews for each zip code, two different sentiment scores are assigned to each real estate property. For example, more positive reviews about a specific city or zip code assign a higher score to the houses in that city or zip code. The new sentiment scores are added to our original feature set, so the sentiment scores are one part of the input space for the final classifiers.

For sentiment classification of online reviews, one approach is to use a feature-based solution such as ELMO [41], which is a deep contextual word representation. However, its results for sentiment classification of tweets are not satisfactory in comparison with BERT [42]. The main reason is that ELMO is useful in creating language representations, but BERT is a bidirectional transformer, a powerful model for language processing of massive unlabeled data. For generating sentiment scores for each house, sentence representation is not sufficient without using transformers, which provide language understanding of each online review.

This work uses the BERT model in Figure 4 for assigning a sentiment score to each real estate property based on online reviews. The bidirectional encoder consists of multiple transformer layers that give the model the ability to extract deep features of the online user's text at both the token and sentence levels. Each transformer contains self-attention layers that can learn multiple attention features for each word in a text [43]. The self-attention mechanism can be explained as follows: for the embedding vector of each input token, the query ($Q$), key ($K$), and value ($V$) are created from three parameter matrices, where $W^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W^K \in \mathbb{R}^{d_{model} \times d_k}$, and $W^V \in \mathbb{R}^{d_{model} \times d_v}$. This mechanism can be shown as [44]:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{2}$$

$d_k = d_v = d_{model} = 768$ in the BERT base version, which is used in this research.

Because our sentiment classification task [44] is a multi-label classification with three different house types (Townhouse, Single Family, Condo), we choose the first embedding vector, which corresponds to the special token [CLS], from the transformer's final hidden state. In this vector, the class label's occurrence in the online comment is represented by $C \in \mathbb{R}^H$, where $H$ is the embedding size of the online user's comment. A dense neural layer is treated as a classification layer, which consists of parameters $W \in \mathbb{R}^{K \times H}$, where $K$ is the number of class labels in our data set. The sentiment prediction probabilities $P \in \mathbb{R}^K$ are calculated by a softmax activation function as:

$$P = \mathrm{softmax}(CW^T + b) \tag{3}$$
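Equations (2) and (3) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the helper names (`attention`, `classify`), the toy 2-dimensional inputs, and the head weights `w_head` and `b_head` are assumptions chosen only to make the arithmetic easy to follow, whereas BERT base works with far larger matrices and several attention heads.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def attention(q, k, v):
    """Scaled dot-product attention, equation (2)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def classify(c, w, b):
    """Classification head, equation (3): P = softmax(C W^T + b)."""
    logits = [sum(ci * wi for ci, wi in zip(c, row)) + bk for row, bk in zip(w, b)]
    return softmax(logits)

# Toy inputs (assumed): 2 tokens with d_k = d_v = 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
context = attention(q, k, v)

# Hypothetical head with H = 2 features and K = 3 class labels.
c_vec = context[0]  # stands in for the [CLS] vector C
w_head = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]]
b_head = [0.0, 0.0, 0.0]
probs = classify(c_vec, w_head, b_head)
```

Each row of `context` is a weighted average of the rows of `v`, and `probs` holds one probability per class label, summing to one.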

In this research, all the parameters are fine-tuned together to minimize the negative log-likelihood (NLL) [44]; the loss function is based on the emotion label $c$ in the training data:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_c^{(i)} \tag{4}$$
To solve the problem of an imbalanced number of house types in each zip code, we apply weighted balance warming to the loss function of equation (4) in the fine-tuning process:
$$L = -\frac{1}{\sum_{i=1}^{N} w_c^{(i)}} \sum_{i=1}^{N} w_c^{(i)} \log \hat{p}_c^{(i)} \tag{5}$$

Here $c$ is the home type label, and its weight is calculated from its frequency (the number of houses of that type in the related zip code) as:

$$w_c = \frac{\min(\mathrm{freq}(c))}{\mathrm{freq}(c)}, \quad \forall c \in c \tag{6}$$
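The weighting scheme of equations (5) and (6) can be sketched as follows. This is a minimal illustration under two assumptions: the minimum in equation (6) is taken over all class labels, and equation (5) is read as a weight multiplying each log-probability, normalized by the sum of the weights. The function names, labels, and probabilities are made up for the example.

```python
import math
from collections import Counter

def class_weights(labels):
    """Equation (6): weight of class c = smallest class frequency / freq(c)."""
    freq = Counter(labels)
    least = min(freq.values())
    return {c: least / n for c, n in freq.items()}

def weighted_nll(true_probs, labels, weights):
    """Equation (5): weighted negative log-likelihood.

    true_probs[i] is the predicted probability of the true label of example i.
    """
    total_w = sum(weights[c] for c in labels)
    loss = sum(weights[c] * math.log(p) for p, c in zip(true_probs, labels))
    return -loss / total_w

# Made-up house-type labels for one zip code: 1 townhouse, 2 condos, 3 single-family.
labels = ["condo", "single-family", "single-family",
          "townhouse", "single-family", "condo"]
weights = class_weights(labels)

# Made-up predicted probabilities of the true label for each example.
true_probs = [0.9, 0.6, 0.7, 0.4, 0.8, 0.5]
loss = weighted_nll(true_probs, labels, weights)
```

The rarest class receives weight 1 and more frequent classes receive proportionally smaller weights, so errors on rare house types count more. With equal weights the function reduces to the unweighted loss of equation (4).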
We apply the weighted balance in the first epoch of fine-tuning to improve performance and normalize the sentiment labels.

Fig. 2. PCA for three different house types

Fig. 3. Our new model for safe real estate investment

TABLE I
PERFORMANCE EVALUATION OF NEW MODEL

Algorithm        F1 score    MCC
K-Star           0.81        0.76
FFNN             0.82        0.83
Our new model    0.92        0.88
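The F1 score and MCC reported in Table I are both derived from confusion-matrix counts. The sketch below shows the standard definitions for a binary 'Rent' versus 'not Rent' classifier; the counts are hypothetical and are not taken from the paper's experiments.

```python
import math

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; uses all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for 100 houses.
tp, tn, fp, fn = 45, 40, 5, 10
f1 = f1_score(tp, fp, fn)
m = mcc(tp, tn, fp, fn)
```

Unlike F1, which ignores true negatives, MCC ranges from -1 to 1 and only approaches 1 when both classes are predicted well.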

Fig. 4. BERT, bidirectional attention model [42]

V. SEMANTIC CONVOLUTIONAL NEURAL NETWORK

The original house features, such as the number of bedrooms, number of bathrooms, and year built, together with the BERT sentiment scores, the GloVe word embedding results of all online reviews, and the transit, crime, and school scores, are the input space for the CNN model in Figure 3.

The new model classifies each house as 'Rent' or 'not Rent'. 'Rent' means the house is a good rental property and a secure business investment, and 'not Rent' means the house has a low chance of being a rental property. The hyperparameters and pooling size for the CNN model are based on the Microsoft CNN design [45] [46] for semantic data extraction of online comments. Our convolutional neural network model is a sequential model that contains a two-dimensional convolution layer with a flattening layer and two dense layers. We use the CLSM model for the CNN in this work since it is one of the best models for extracting the text semantics of online users' reviews. ReLU is used as the activation function of the first dense layer and sigmoid for the final layer.

F1 score and MCC are calculated for each classifier for rent prediction. The MCC is a valuable metric since it reflects all the parameters of the confusion matrix (TP, TN, FP, FN). Table I shows the results for our new model, which uses a transfer learning method for sentiment classification of online reviews to implement a convolutional neural network model, and the new method outperformed previous models. The new model predicts the house's potential to become a good rental property and can guarantee a safe investment with high accuracy compared to using just one machine learning model, either an eager or a lazy learner, for the prediction task. Also, considering multiple sources of online textual information improves the prediction accuracy significantly. As a result of this research, we create a new public data set of 5 million houses in the United States with the semantic information assigned to each house. More than 3 million housing reviews are added to the data set based on each zip code.

VI. CONCLUSION

In this research, the classification of real estate properties into profitable and not profitable rental properties provides investors with a model to make a safe business investment. The contribution of this work can be summarized in three main points. First, using the BERT model as a bidirectional transformer for sentiment classification of online reviews based on different online resources to detect profitable rental properties. Second, using a semantic CNN model to extract the deep context of

online housing reviews. The new CNN model improves prediction accuracy very significantly. Third, a new public data set of more than five million houses with their sentiment scores and semantic information. The new data set will be available for machine learning research, especially in real estate market research.

TABLE II
PREDICTING PROFITABLE RENTAL PROPERTY BY EAGER AND LAZY LEARNERS

             Single-Family      Town-House         Condo
Algorithm    R2      MAE        R2      MAE        R2      MAE
FFNN         0.88    118.7      0.83    119.3      0.87    114.2
RF           0.84    122.7      0.79    123.3      0.86    112.2
LR           0.79    139.1      0.78    139.37     0.82    120.7
J48          0.72    132.1      0.78    140.37     0.80    121.7
SVM          0.74    130.1      0.81    137.37     0.83    118.7
SMO          0.73    131.1      0.80    138.37     0.81    121.7
LWL          0.79    128.1      0.86    128.37     0.80    123.7
KStar        0.87    119.7      0.85    116.3      0.84    121.2
Lazy-DT      0.78    129.1      0.85    127.37     0.80    124.7
ML KNN       0.83    120.7      0.79    123.3      0.85    110.2
KNN          0.84    118.7      0.82    121.3      0.86    110.2

REFERENCES

[1] “Investopedia.” https://www.investopedia.com/.
[2] “propertymetric.” https://www.propertymetrics.com/blog/2015/06/11/what-is-npv/.
[3] M. Kuntz and M. Helbich, “Geostatistical mapping of real estate prices:
an empirical comparison of kriging and cokriging. international journal
of geographical information science,” 2014.
[4] J. Lambert and J. Greenland, “Is the price right? prediction of monthly
rental prices in provo, utah,” 2015.
[5] V. Limsombunchai, “House price prediction: hedonic price model vs.
artificial neural network. new zealand agricultural and resource economics
society conference,” 2004.
[6] H. Yu and J. Wu, “Real estate price prediction with regression and
classification,” 2016.
[7] J. H. Friedman, R. Kohavi, and Y. Yun, “Lazy decision trees. aaai/iaai.,”
1996.
[8] S. Saeedi, A. Panahi, S. Saeedi, and A. C. Fong, “Cs-nlp team
at semeval-2020 task 4: Evaluation of state-of-the-art nlp deep learning
architectures on commonsense reasoning task,” arXiv preprint
arXiv:2006.01205, 2020.
[9] T. A. Oghaz, E. c. Mutlu, J. Jasser, N. Yousefi, and I. Garibay,
“Probabilistic model of narratives over topical trends in social media:
A discrete time model,” in Proceedings of the 31st ACM Conference
on Hypertext and Social Media, HT ’20, p. 281–290, Association for
Computing Machinery, 2020.
[10] I. Garibay, T. A. Oghaz, N. Yousefi, E. C. Mutlu, M. Schiappa,
S. Scheinert, G. C. Anagnostopoulos, C. Bouwens, S. M. Fiore,
A. Mantzaris, et al., “Deep agent: Studying the dynamics of information
spread and evolution in social networks,” arXiv preprint
arXiv:2003.11611, 2020.
[11] F. Jafariakinabad and K. A. Hua, “Maximal sequence mining approach
for topic detection from microblog streams,” in 2016 IEEE Symposium
Series on Computational Intelligence (SSCI), pp. 1–8, IEEE, 2016.
[12] F. Jafariakinabad and K. A. Hua, “A self-supervised representation
learning of sentence structure for authorship attribution,” arXiv preprint
arXiv:2010.06786, 2020.
[13] M. Hajiali, “Big data and sentiment analysis: A comprehensive and
systematic literature review,” Concurrency and Computation: Practice
and Experience, vol. 32, no. 14, p. e5671, 2020.
[14] M. Bastan, M. Koupaee, Y. Son, R. Sicoli, and N. Balasubramanian,
“Author’s sentiment prediction,” 2020.
[15] I. Garibay, A. V. Mantzaris, A. Rajabi, and C. E. Taylor, “Polarization
in social media assists influencers to become more influential: analysis
and two inoculation strategies,” Scientific reports, vol. 9, no. 1, pp. 1–9,
2019.
[16] A. Rajabi, C. Gunaratne, A. V. Mantzaris, and I. Garibay, “On countering
disinformation with caution: Effective inoculation strategies and others
that backfire into community hyper-polarization,” in Social, Cultural,
and Behavioral Modeling (R. Thomson, H. Bisgin, C. Dancy, A. Hyder,
and M. Hussain, eds.), (Cham), pp. 130–139, Springer International
Publishing, 2020.
[17] F. Ebrahimi and S. A. H. Golpayegani, “Personalized recommender
system based on social relations,” in Iranian Conference on Electrical
Engineering, pp. 218–223, IEEE, 2016.
[18] G. Williams, M. Tushev, F. Ebrahimi, and A. Mahmoud, “Modeling user
concerns in sharing economy: the case of food delivery apps,” Automated
Software Engineering, pp. 1–35, 2020.
[19] S. Zad and M. Finlayson, “Systematic evaluation of a framework for
unsupervised emotion recognition for narrative text,” in Proceedings of
the First Joint Workshop on Narrative Understanding, Storylines, and
Events, (Online), pp. 26–37, Association for Computational Linguistics,
July 2020.
[20] S. Amirian, K. Rasheed, T. R. Taha, and H. R. Arabnia, “A short
review on image caption generation with deep learning,” in The 23rd
International Conference on Image Processing, Computer Vision and
Pattern Recognition (IPCV’19), World Congress in Computer Science,
Computer Engineering and Applied Computing (CSCE’19), pp. 10–18,
IEEE, 2019.
[21] S. Amirian, Z. Wang, T. R. Taha, and H. R. Arabnia, “Dissection of
deep learning with applications in image recognition,” in Computational
Science and Computational Intelligence; “Artificial Intelligence” (CSCI-
ISAI); 2018 International Conference on. IEEE, pp. 1132–1138, 2018.
[22] N. Soans, E. Asali, Y. Hong, and P. Doshi, “Sa-net: Robust state-action
recognition for learning from observations,” in 2020 IEEE International
Conference on Robotics and Automation (ICRA), pp. 2153–2159, IEEE,
2020.
[23] E. Asali, F. Shenavarmasouleh, F. G. Mohammadi, P. S. Suresh, and
H. R. Arabnia, “Deepmsrf: A novel deep multimodal speaker recognition
framework with feature selection,” arXiv preprint arXiv:2007.06809,
2020.
[24] M. R. Jokar, L. Zhang, and F. T. Chong, “Cooperative nv-numa:
prolonging non-volatile memory lifetime through bandwidth sharing,”
in Proceedings of the International Symposium on Memory Systems,
pp. 67–78, 2018.
[25] A. Holmes, M. R. Jokar, G. Pasandi, Y. Ding, M. Pedram, and F. T.
Chong, “Nisq+: Boosting quantum computing power by approximating
quantum error correction,” in Proceedings of the 47th International
Symposium on Computer Architecture, 2020.
[26] M. R. Jokar, J. Qiu, F. T. Chong, L. L. Goddard, J. M. Dallesasse,
M. Feng, and Y. Li, “Baldur: A power-efficient and scalable network
using all-optical switches,” in 2020 IEEE International Symposium on
High Performance Computer Architecture (HPCA), pp. 153–166, IEEE,
2020.
[27] H. M. Makrani and H. Homayoun, “Mena: A memory navigator for
modern hardware in a scale-out environment,” in 2017 IEEE International
Symposium on Workload Characterization (IISWC), pp. 2–11,
IEEE, 2017.
[28] H. M. Makrani, H. Sayadi, S. M. P. Dinakarra, S. Rafatirad, and
H. Homayoun, “A comprehensive memory analysis of data intensive
workloads on server class architecture,” in Proceedings of the International
Symposium on Memory Systems, pp. 19–30, 2018.
[29] M. Farhadi, M. Ghasemi, and Y. Yang, “A novel design of adaptive
and hierarchical convolutional neural networks using partial reconfiguration
on fpga,” in 2019 IEEE High Performance Extreme Computing
Conference (HPEC), pp. 1–7, IEEE, 2019.
[30] M. Farhadi and Y. Yang, “Tkd: Temporal knowledge distillation for
active perception,” in 2020 IEEE Winter Conference on Applications of
Computer Vision (WACV), pp. 942–951, IEEE, 2020.
[31] M. Heidari, J. H. J. Jones, and O. Uzuner, “Deep contextualized word
embedding for text-based online user profiling to detect social bots
on twitter,” in IEEE 2020 International Conference on Data Mining
Workshops (ICDMW), ICDMW 2020, 2020.
[32] M. Heidari and J. H. J. Jones, “Using bert to extract topic-independent
sentiment features for social media bot detection,” in IEEE 2020 11th
Annual Ubiquitous Computing, Electronics Mobile Communication
Conference, UEMCON 2020, 2020.
[33] A. Vakil, F. Behnia, A. Mirzaeian, H. Homayoun, N. Karimi, and
A. Sasan, “Lasca: Learning assisted side channel delay analysis for

hardware trojan detection,” in 2020 21st International Symposium on
Quality Electronic Design (ISQED), pp. 40–45, 2020.
[34] F. Behnia, A. Mirzaeian, M. Sabokrou, S. Manoj, T. Mohsenin, K. N.
Khasawneh, L. Zhao, H. Homayoun, and A. Sasan, “Code-bridged clas-
sifier (cbc): A low or negative overhead defense for making a cnn classi-
fier robust against adversarial attacks,” arXiv preprint arXiv:2001.06099,
2020.
[35] M. Saadati, J. Nelson, and H. Ayaz, “Mental workload classification
from spatial representation of fnirs recordings using convolutional neural
networks,” in 2019 IEEE 29th International Workshop on Machine
Learning for Signal Processing (MLSP), pp. 1–6, IEEE, 2019.
[36] N. Nazari, S. A. Mirsalari, S. Sinaei, M. E. Salehi, and M. Daneshtalab,
“Multi-level binarized lstm in eeg classification for wearable devices,” in
2020 28th Euromicro International Conference on Parallel, Distributed
and Network-Based Processing (PDP), pp. 175–181, IEEE, 2020.
[37] M. Heidari and S. Rafatirad, “Using transfer learning approach to
implement convolutional neural network to recommend airline tickets
by using online reviews,” in IEEE 2020 15th International Workshop
on Semantic and Social Media Adaptation and Personalization, SMAP
2020, 2020.
[38] “zillow.” https://www.zillow.com/.
[39] A. Panahi, S. Saeedi, and T. Arodz, “word2ket: Space-efficient word
embeddings inspired by quantum entanglement,” in International Con-
ference on Learning Representations, 2020.
[40] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors
for word representation,” in Proceedings of the 2014 Conference on
Empirical Methods in Natural Language Processing, EMNLP 2014,
October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special
Interest Group of the ACL (A. Moschitti, B. Pang, and W. Daelemans,
eds.), pp. 1532–1543, ACL, 2014.
[41] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee,
and L. Zettlemoyer, “Deep contextualized word representations,” 2018.
[42] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training
of deep bidirectional transformers for language understanding,” in
Proceedings of the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language
Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7,
2019, Volume 1 (Long and Short Papers) (J. Burstein, C. Doran,
and T. Solorio, eds.), pp. 4171–4186, Association for Computational
Linguistics, 2019.
[43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.
Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” CoRR,
vol. abs/1706.03762, 2017.
[44] Y. Huang, S. Lee, M. Ma, Y. Chen, Y. Yu, and Y. Chen, “Emotionx-
idea: Emotion BERT - an affectional model for conversation,” CoRR,
vol. abs/1908.06264, 2019.
[45] J. Gao, P. Pantel, M. Gamon, X. He, and L. Deng, “Modeling inter-
estingness with deep neural networks,” Tech. Rep. MSR-TR-2014-56,
October 2014.
[46] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil, “A latent semantic
model with convolutional-pooling structure for information retrieval,” in
CIKM, November 2014.

