Review
Article history:
Received 31 August 2017
Revised 15 March 2018
Accepted 30 May 2018
Available online 9 June 2018

Keywords:
Mobile application review
App review
App review mining, app development
Intelligent app review mining tools
Intelligent app review mining techniques

Abstract

Mobile application (app) websites such as Google Play and the App Store allow users to review their downloaded apps. Such reviews can be useful for app users, as they may help users make an informed decision; such reviews can also be potentially useful for app developers, if they contain valuable information concerning user needs and requirements. However, in order to unleash the value of app reviews for mobile app development, intelligent mining tools that can help discern relevant reviews from irrelevant ones must be provided. This paper surveys the state of the art in the development of such tools and the techniques behind them. To gain insight into the maturity of the current mining tools, the paper also examines what app development information these tools have discovered and what challenges they face. The results of this survey can inform the development of more effective and intelligent app review mining techniques and tools.

Crown Copyright © 2018 Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license.
(http://creativecommons.org/licenses/by-nc-nd/4.0/)
1. Introduction

Web-based software application distribution platforms have become increasingly popular among Internet users. According to statista.com, the number of mobile application (app) downloads from Google Play and the App Store increased from 17 billion in 2013 to 80 billion in 2016. These platforms also allow users to share their opinions on their downloaded applications (apps). From the user perspective, such opinions can influence the decision to purchase or choose a particular app (Heydari, ali Tavakoli, Salim, & Heydari, 2015). From the app provider perspective, positive reviews can attract more customers and bring financial gains, while negative reviews often cause sales losses. In addition, user reviews or opinions may also contain information relevant to app development and improvement, such as bug reports, user experience, and user requirements.

However, not all user reviews are relevant and useful for app development. In order to unleash the value of app reviews for app development, intelligent mining tools that can help discern relevant reviews from irrelevant ones must be provided. In recent years, a variety of such techniques have been proposed, ranging from sentiment analysis (Fernández-Gavilanes, Álvarez-López, Juncal-Martínez, Costa-Montenegro, & González-Castaño, 2016) and spam detection (Heydari, Tavakoli, & Salim, 2016; Savage, Zhang, Yu, Chou, & Wang, 2015) to more general mining techniques (Castelli, Manzoni, Vanneschi, & Popovič, 2017). Yet, there is a lack of systematic understanding of these techniques in the context of mining mobile app reviews and their support tools. So far, we have found only two surveys that assess App Store mining techniques. The first survey (Genc-Nayebi & Abran, 2016) provides an analysis of general data-mining techniques for spam detection, opinion mining, review evaluation, and feature extraction. The second survey (Martin, Sarro, Jia, Zhang, & Harman, 2016) is concerned with App Store analysis, such as API analysis, feature analysis, app review analysis, and so on. Neither of these surveys covers the work on specific mining techniques or tools for app reviews.

To address this gap, this paper surveys the state of the art in the development of mobile app review mining techniques and tools. To

∗ Corresponding author at: School of Computer Science, University of Manchester, Manchester, M13 9PL, United Kingdom.
E-mail addresses: mohammadali.tavakoli@manchester.ac.uk (M. Tavakoli), liping.zhao@manchester.ac.uk (L. Zhao), atefeh.heydari@manchester.ac.uk (A. Heydari), gnenadic@manchester.ac.uk (G. Nenadić).
https://doi.org/10.1016/j.eswa.2018.05.037
M. Tavakoli et al. / Expert Systems With Applications 113 (2018) 186–199 187
gain insight into the maturity of the current mining tools, the paper also examines what app development information these tools have discovered and what challenges they face. The results of this survey can inform the development of more effective and intelligent app review mining techniques and tools.

The rest of the paper is organized as follows: Section 2 describes our survey methodology. Section 3 presents and analyses the survey results. Section 4 discusses some open issues and challenges facing the development of app review mining tools. Finally, Section 5 discusses the validity threats to this review and Section 6 concludes the review.
2. Survey methodology
R1: What mobile app review mining techniques have been re-
ported in the literature?
R2: What software tools have been developed to support these
techniques?
R3: What mobile app development topics are used to illustrate
the reported techniques and tools? Which topics are used
most?
The definition of a robust strategy for performing an SLR is essential, as it enables researchers to efficiently retrieve the majority of relevant studies. In this section, we discuss our search strategy in detail.

Fig. 1. Phases in the selection process. N denotes the number of papers.

We developed our search string by selecting keywords from the studies that we had already reviewed in the domain. Then, we identified and applied alternatives and synonyms for each term and linked them all with AND/OR Boolean operators to cover more search results. In order to perform a widespread search, formulating a comprehensive query is indispensable. Thus, we optimized and refined our preliminary search string over multiple iterations, as mulling over the returned results and skimming the retrieved relevant studies helped us tune our search string with more appropriate keywords. We excluded keywords whose inclusion was not advantageous or replaced them with more apt ones. Our search terms are presented in Table 1 and the finalized search string is ‘(mobile OR android OR IOS) AND (app OR application) AND (feedback OR review OR opinion OR comment) AND (bug report OR feature request OR complain OR requirement OR issue OR expectation) AND (analysis OR process OR mining OR extract OR discover) AND (developer OR development team OR requirements team OR software vendor) OR (requirement engineering OR RE OR requirements elicitation OR requirements analysis)’.

After constructing the search string, we used it to search the following databases: ScienceDirect, IEEE Xplore, ACM, Google Scholar, Scopus, and SpringerLink. The search returned a total of 46,413 results. We found that the first related paper was published in 2011 and the last one in 2017. Table 2 summarizes the search results.

2.3. Selecting relevant studies

Based on the search results, we used the following inclusion and exclusion criteria to select the relevant primary studies. The selection phases are shown in Fig. 1.

Inclusion criteria:
Table 1
Types and components of the search strings.
Table 2
Search results (2011–2017).
Exclusion criteria:

E1. White and grey literature (i.e. research outputs that are not peer reviewed, such as reports, online documents, and working papers) was excluded.
E2. Abstract papers with no full-text available were excluded.
E3. Short papers with less than four pages were excluded.

These inclusion and exclusion criteria were applied in the following three steps:

1. E1, E2 and E3 were applied in turn to the search results to exclude irrelevant studies.
2. I1 and I2 were applied to each remaining study to include the studies that met these criteria.
3. I3 was applied to duplicate studies to include the fullest studies.

Fig. 2. Distribution of the selected 34 primary studies from 2011 to 2017.

Table 3
The data extraction form.

Data item  | Description
Paper ID   | The unique ID assigned to each paper
Year       | In which year was the study published?
Author(s)  | The author(s) of the paper
Title      | The title of the paper
Venue      | Publication venue of the study
Techniques | Mining techniques used in the study
Tools      | Support tools for extracting software development information
Topics     | Software development topics discovered in the study
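The stepwise filtering above can be sketched as a small pipeline of predicates. This is purely illustrative (the metadata fields and helper names are assumptions, and the inclusion criteria I1 and I2 are omitted because their definitions are not reproduced here):

```python
# Hypothetical sketch of the selection steps: apply E1-E3, then keep the
# fullest version of any duplicated study (I3). Field names are invented.

def apply_exclusion_criteria(papers):
    """Drop papers that match any exclusion criterion (E1-E3)."""
    kept = []
    for p in papers:
        if p.get("grey_literature"):   # E1: not peer reviewed
            continue
        if not p.get("full_text"):     # E2: abstract only, no full text
            continue
        if p.get("pages", 0) < 4:      # E3: short paper
            continue
        kept.append(p)
    return kept

def deduplicate(papers):
    """I3: among duplicate titles, keep the fullest (longest) version."""
    by_title = {}
    for p in papers:
        key = p["title"].lower()
        if key not in by_title or p.get("pages", 0) > by_title[key].get("pages", 0):
            by_title[key] = p
    return list(by_title.values())

candidates = [
    {"title": "AR-Miner", "pages": 10, "full_text": True, "grey_literature": False},
    {"title": "AR-Miner", "pages": 4, "full_text": True, "grey_literature": False},
    {"title": "Tech report", "pages": 12, "full_text": True, "grey_literature": True},
    {"title": "Short note", "pages": 2, "full_text": True, "grey_literature": False},
]
selected = deduplicate(apply_exclusion_criteria(candidates))
print([p["title"] for p in selected])  # only the 10-page "AR-Miner" survives
```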
3. Survey results

This section analyses the review results and answers our research questions.

3.1. Mobile app review mining techniques (RQ1)

RQ1: What mobile app review mining techniques have been reported in the literature?

The mining techniques identified by our survey are of three types: supervised machine learning, natural language processing (NLP) and feature extraction. This section presents each type of technique in detail.

3.1.1. Supervised machine learning techniques

Supervised machine learning techniques were found in 14 (40%) of the selected studies. These techniques have been used to classify mobile app reviews.

Platzer (2011) used motivational models to find usage motives that are addressed in the text of user reviews. To attach these motives to the reviews, they applied SVM and NB algorithms. Oh, Kim, Lee, Lee, and Song (2013) believed that uninformative reviews should be filtered out to help developers overcome the overload of feedback. First, they defined three categories for reviews (i.e. functional bug, functional demand, and non-functional request) and manually classified 2800 reviews into the categories to train their filtering model. Identification of categories was done by app developers who analysed the review contents manually, though details of the process, such as the number of experts, the number of analysed reviews, and the selection criteria, are not provided. Then, they extracted keywords from each review using Latent Dirichlet Allocation (LDA) and heuristic methods and applied SVM algorithms to classify reviews into informative and uninformative. The authors argued that the performance of LDA in identifying keywords is weak due to the shortness of reviews.

In order to identify the reasons why users dislike apps, Fu et al. (2013) developed a system, named WisCom, analysing reviews at three different levels. Firstly, at the review level, they discovered reviews with inconsistent ratings by applying a regularized regression model. This was to understand individual reviews and the words used in them. Secondly, at the app level, they applied topic modelling (LDA) techniques to discover the specific problems of each app behind the users' complaints about it. Finally, at the entire app-market level, they analysed the top 10 complaints of each app to find similar complaints and the most critical aspects of apps, leading to the identification of global trends in the market. Their focus was only on negative reviews, as they hypothesized that those could be directly used to enhance the quality of applications. However, researchers have observed that feature requests, which are important feedback for developers, mostly appear in positive reviews (Iacob & Harrison, 2013; Iacob, Veerappa, & Harrison, 2013).

Chen, Lin, Hoi, Xiao, and Zhang (2014) developed a computational framework for App Review Mining, namely AR-Miner, which filters useless reviews, uses topic modelling techniques to group informative reviews based on the topics discussed in them, and applies a review ranking scheme to prioritize them with respect to the developers' needs. AR-Miner uses Expectation Maximization for Naive Bayes (EMNB), a semi-supervised machine learning algorithm, to classify reviews into informative and uninformative. In the next phase, it uses LDA for grouping, and creates a ranked group of reviews based on their rating, fluctuation in time of reviews, and the volume of reviews reporting a similar issue/request. The authors compared the performance of two topic modelling algorithms, i.e., Latent Dirichlet Allocation (LDA) and the Aspect and Sentiment Unification Model (ASUM), for grouping informative reviews based on their contents. In classifying user feedback into informative and non-informative, they did not justify the superiority of Expectation Maximization for Naive Bayes (EMNB) over other existing algorithms. Testing the performance of other classifiers and comparing the outcomes would have been a good way to justify the superiority of their approach. Moreover, several important aspects are neglected in their approach. Firstly, the sentiment of a review relates to its content and could be used to improve the performance of the tool (Guzman & Maalej, 2014). Secondly, although they employed the time stamp, text, and rating of a review, important features such as the title of a review and meta-data features are neglected. Finally, they removed stop-words, stemmed, and applied other pre-processing techniques to raw crawled reviews. State-of-the-art approaches (McIlroy, Ali, Khalid, & Hassan, 2016) show that these pre-processing tasks increase the chance of losing helpful content. Additionally, they discriminated informative and uninformative reviews based on their own understanding of "informative" and a review of a small number of online forums, while to identify what really is important for mobile app developers to extract from user reviews, their needs and requirements should be studied comprehensively.

AR-Miner was used further by Palomba et al. (2015) to investigate how addressing user feedback by developers influences the overall rank of the app. Their proposed tool, named CRISTAL (Crowdsourcing RevIews to SupporT App evoLution), collects reviews posted for the previous release of a certain application and tracks its rating. Then it extracts informative reviews using AR-Miner and checks whether the comments are applied in the next release. Finally, it checks their effect on the rating after the last release.

Maalej and Nabil (2015) designed and applied different probabilistic techniques and heuristics to automatically classify reviews into four basic types: bug reports, feature requests, user experiences, and ratings. They generated a set of classification features from string matching, bag of words, sentiment scores, NLP pre-processed text, and review rating and length. Then, they applied Naive Bayes, Decision Trees, and MaxEnt to compare the performance of binary and multiclass classifiers in the classification of user feedback into the predefined basic types. The authors extended their approach in Maalej, Kurtanović, Nabil, and Stanik (2016) by adding bigrams and their combinations to the classification techniques used, and by improving the preprocessing phases and classification scripts. They argued that by the use of meta-data combined with text classification and natural language pre-processing of the text, the classification precision rises significantly.

Guzman, El-Haliby, and Bruegge (2015) relied on categories found in Pagano and Maalej (2013) to form their taxonomy with 7 categories relevant for software evolution. The authors then used the taxonomy to investigate the performance of various machine learning techniques (i.e. Naive Bayes, Support Vector Machines (SVMs), Logistic Regression and Neural Networks) in the classification of reviews. The set of features they used includes the number of upper/lower case characters, length, positive/negative sentiment, and rating. However, many other features could be found in user feedback to be used for classification purposes in order to enhance the throughput of the model (Tavakoli, Heydari, Ismail, & Salim, 2015).

Panichella et al. (2015) argued that among the 17 topics identified by Pagano and Maalej (2013), only 8 were relevant to software maintenance and evolution tasks. They proposed a method to identify constructive feedback for software maintenance and evolution tasks. The authors hypothesized that understanding the intention in a review has an important role in extracting useful information for developers. To understand the intention of a review, they used sentence structure, the sentiment of a review, and text features contained in a review. Thus, they formed a taxonomy (i.e. Information Giving, Information Seeking, Feature
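Many of the studies above reduce to the same core step: training a bag-of-words classifier over labelled reviews (e.g. the bug-report/feature-request types of Maalej and Nabil, 2015). As a rough illustration of that step — not any author's actual code; the tiny training set and labels are invented — a multinomial Naive Bayes with Laplace smoothing can be written in plain Python:

```python
# Minimal bag-of-words Naive Bayes sketch of the review classification
# step used in several surveyed studies. Training data is illustrative only.
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(samples):
    """samples: list of (review_text, label). Returns a model dict."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return {"wc": word_counts, "lc": label_counts, "vocab": vocab}

def classify(model, text):
    """Argmax over labels of Laplace-smoothed log-probabilities."""
    best, best_score = None, float("-inf")
    total = sum(model["lc"].values())
    v = len(model["vocab"])
    for label, n in model["lc"].items():
        score = math.log(n / total)                      # log prior
        denom = sum(model["wc"][label].values()) + v     # smoothed denominator
        for w in tokenize(text):
            score += math.log((model["wc"][label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train([
    ("app crashes on startup", "bug report"),
    ("crashes when I open a photo", "bug report"),
    ("please add dark mode", "feature request"),
    ("would love an option to export data", "feature request"),
])
print(classify(model, "it crashes every time"))        # -> bug report
print(classify(model, "please add an export option"))  # -> feature request
```

The surveyed tools differ mainly in what they feed such a classifier (meta-data, sentiment scores, bigrams) and in whether training is fully supervised or, as with EMNB in AR-Miner, semi-supervised.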
Table 8
Support tools for mining mobile app reviews.

Tool | Description | Techniques | Granularity | Study

MARA | Based on the frequency and length of the terms in a sentence, the tool uses 273 syntactical rules to identify, extract and summarize feature requests. | Syntactical rules, LDA | Sentences | S11
WisCom | The tool identifies inconsistent feedback using a regression model, uncovers the main causes of complaints for each app using LDA, and aggregates the top complaints of apps in a category to identify market trends. | Regression models, LDA, Aspect and Sentiment Unification Model | Multi-level | S5
AR-Miner | The tool filters out useless reviews, uses topic modelling techniques to group informative reviews, and applies a review ranking scheme to prioritize them. | Expectation Maximization for Naive Bayes, LDA | Sentences | S13
CRISTAL | The tool detects traceability links between incoming app reviews and source code changes likely addressing them. It uses such links to analyse the impact of crowd reviews on the development process. | AR-Miner, Information Retrieval heuristics | Whole documents | S23
DIVERSE | For the detection of conflicting opinions, the tool responds to a developer's query by grouping reviews by their mentioned features and sentiment. | Collocation finding algorithm, lexical sentiment analysis tool, greedy algorithm | Whole documents | S17
SURMiner | To summarize reviews, the tool classifies reviews and extracts aspects using a pattern-based parser. | Sentiment patterns, Max Entropy | Sentences | S16
MARK | The semi-automated tool assists developers in searching reviews for certain features. | Keyword-based approaches, NLP techniques, Vector Space Model | Whole documents | S24
SURF | Combining topic extraction and intention classification techniques, the tool categorizes and summarizes user reviews. | NLP classifier, WordNet synonyms | Sentences | S27
ARdoc | The tool automatically classifies useful sentences in user reviews according to a taxonomy. | NLP, sentiment analysis, text analysis, J48 | Sentences | S32
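Among the techniques listed in Table 8, collocation finding (used by DIVERSE to extract app features) is easy to sketch. One common variant scores adjacent word pairs by pointwise mutual information (PMI); the scorer and the sample reviews below are assumptions for illustration, not the tool's actual implementation:

```python
# Illustrative collocation finder: score adjacent word pairs by PMI so
# that recurring two-word feature names (e.g. "photo editor") rank highly.
import math
from collections import Counter

def collocations(reviews, min_count=2):
    """Return (word pair, PMI score) tuples, highest PMI first."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for review in reviews:
        words = review.lower().split()
        unigrams.update(words)
        total += len(words)
        bigrams.update(zip(words, words[1:]))
    scored = []
    for (w1, w2), n in bigrams.items():
        if n < min_count:          # ignore rare, noisy pairs
            continue
        pmi = math.log((n / total) /
                       ((unigrams[w1] / total) * (unigrams[w2] / total)))
        scored.append(((w1, w2), pmi))
    return sorted(scored, key=lambda x: -x[1])

reviews = [
    "the photo editor is great",
    "photo editor crashes a lot",
    "i use the photo editor daily",
    "great app but the editor is slow",
]
top = collocations(reviews)
print(top[0][0])  # most strongly associated pair
```

In a DIVERSE-style pipeline, the highest-scoring pairs would become candidate feature names, against which review sentiment is then aggregated.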
of the feedback. Secondly, the feedback comments of individual apps are aggregated and the LDA algorithm is applied to discover why users dislike these apps. The algorithm was trained using words that receive negative weight in the regression model. Finally, to identify outstanding complaints in each category of apps, the most common complaints from the negative feedback comments of each app identified in the previous step are aggregated by category, summarized and displayed to the user.

AR-Miner. This tool analyses user feedback comments. It filters, aggregates, prioritizes, and visualizes informative reviews (information that can directly help developers improve their apps). Non-informative reviews (noise and irrelevant text) are filtered out by applying a pre-trained classifier (i.e. Expectation Maximization for Naive Bayes). The remaining informative reviews are then put into several groups using topic modelling techniques (i.e. LDA and the Aspect and Sentiment Unification Model) and prioritized by the application of a ranking model. Finally, the ranking results are visualized in a radar chart to help app developers spot the key feedback users have.

CRISTAL. This tool is used for tracing informative reviews (those providing any insight into specific problems experienced or features demanded by users) onto source code changes, and for monitoring how these changes impact user satisfaction as measured by follow-up ratings. AR-Miner is used first to discard non-informative reviews. A set of heuristics is then used to extract the issues and commits driven by each review. IR techniques are then used to identify possible links between each review and the issues/commits. The set of links retrieved for each informative review is stored in a database, grouping together all links related to a certain release. This information is exploited by the monitoring component, which creates reports for managers/developers and shows statistics on the reviews that have been implemented.

DIVERSE. This is a feature- and sentiment-centric retrieval tool for generating diverse samples of user reviews that are representative of the different opinions and experiences mentioned in the whole set of reviews. When a developer queries the reviews that mention a certain app feature, the tool will retrieve reviews which represent the diverse user opinions concerning the app features. The tool applies the collocation finding algorithm to extract the app features mentioned in the reviews, uses lexical sentiment analysis to extract the sentiments associated with the extracted features, uses a greedy algorithm to retrieve a set of diverse reviews in terms of the mentioned features, and groups reviews whose content and sentiment are similar.

SURMiner. This tool summarizes users' sentiments and opinions on certain software aspects. Using the Max Entropy algorithm, the tool classifies each sentence in user reviews into five categories (i.e. aspect evaluation, praise, feature requests, bug reports, and others) and filters only aspect evaluation sentences for the extraction of aspects and the corresponding opinions and sentiments. The tool uses a pattern-based parsing method to extract aspects and sentiments. The method analyses the syntax and semantics of review sentences. The resulting aspect-opinion-sentiment triplets are then clustered by mining frequent opinionated words for all aspects and clustering aspect-opinion pairs with common frequent words. Finally, the results are visualized on graphs.

MARK. This tool is a semi-automated review analysis framework that takes relevant keywords from developers as input and then retrieves a list of reviews that match the keywords for further analysis. The tool has a keyword extraction component which extracts a set of keywords from raw reviews. These keywords can be clustered based on Word2Vec using the K-means algorithm, or expanded based on the vector-based similarity of the keywords. Once the analyst specifies a set of keywords (via clustering or expanding), the tool will query its database and return the most relevant results. MARK employs the popular tf-idf (term frequency - inverse document frequency), a standard term weighting scheme, to compute the element values for those vectors, and the Vector Space Model for this task.

SURF. This tool summarizes user reviews to assist developers in managing a huge amount of user reviews. The tool relies on a conceptual model for capturing user needs useful for developers
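MARK's retrieval step — tf-idf weighting plus cosine similarity in the Vector Space Model — can be sketched compactly. This is an illustrative reconstruction, not MARK's code, and the corpus and query are made up:

```python
# Compact tf-idf / Vector Space Model sketch: score stored reviews
# against a developer's keyword query and rank them by cosine similarity.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter()
    tokenized = [doc.lower().split() for doc in docs]
    for words in tokenized:
        df.update(set(words))            # document frequency per term
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        vectors.append({w: (c / len(words)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(u.get(w, 0.0) * x for w, x in v.items())
    norm = lambda vec: math.sqrt(sum(x * x for x in vec.values()))
    if norm(u) == 0 or norm(v) == 0:
        return 0.0
    return dot / (norm(u) * norm(v))

reviews = [
    "login fails after the update",
    "great design and smooth animations",
    "cannot login with my google account",
]
query = "login problem"
vecs = tfidf_vectors(reviews + [query])  # weight query in the same space
query_vec, review_vecs = vecs[-1], vecs[:-1]
ranked = sorted(range(len(reviews)),
                key=lambda i: -cosine(review_vecs[i], query_vec))
print(reviews[ranked[0]])  # best match for the query
```

A production system such as MARK would add the keyword clustering/expansion step (e.g. Word2Vec similarity) before this retrieval, but the vector scoring itself follows this pattern.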
Table 9
Mobile app development topics discovered in the 34 selected studies.
an example of these tasks. Various proposed methods include these tasks to perform more efficiently in terms of time and computation. However, a group of approaches have argued that using some of the preprocessing tasks causes the loss of valuable information in user feedback (Jha & Mahmoud, 2017b). Therefore, more comprehensive and analytical research is required to investigate the impact of applying each pre-processing task on the performance and accuracy of the techniques.

5) Data interpretation: While various approaches and techniques for the extraction of actionable information from user feedback have been proposed by researchers, their interpretation of actionable information is, to a certain extent, different from developers'. Previous research (Begel & Zimmermann, 2014) has tried to discover what developers are highly interested in, and what the designers' viewpoint is when mulling the reviews over (Liu, Jin, Ji, Harding, & Fung, 2013; Qi, Zhang, Jeon, & Zhou, 2016). Although some of these factors (i.e. functionality and app features) are used by Guzman, El-Haliby et al. (2015), and the informativity of reviews from the developers' perspective was investigated by reading some relevant forum discussions by Chen et al. (2014), many proposed approaches have performed the data analysis from the authors' point of view. Inconsistency between researchers' and developers' interpretations results in the development of inefficient methods and techniques. A more precise study of what exactly needs to be extracted from user reviews is required to make the proposed approaches more efficient.

5. Validity threats to this SLR

The primary threats to the validity of results from this SLR are concerned with the comprehensiveness and coverage of the relevant studies. Firstly, the search only covers publications that were published before the end of February 2017. We conducted the review in 2017. We, therefore, used 2017 as an upper bound on the search term. Various other relevant studies will have been published since March 2017 that we have not included in this review.

A further search-related limitation of the review is that we might have missed some papers in the identification of relevant studies. The completeness of the search is dependent upon the search criteria used and the scope of the search, and is also influenced by the limitations of the search engines used (Brereton, Kitchenham, Budgen, Turner, & Khalil, 2007). We used a set of well-known references to validate the search string and made necessary amendments before undertaking the review. However, four papers were identified by snowballing that were indexed by the digital libraries but were not found with the search terms used in the review.

Aiming to cope with construct validity, which is related to the generalization of the result to the concept or theory behind the study execution (Claes et al., 2000), we defined and applied several synonyms for the main constructs in this SLR: "requirements engineering", "feedback analysis", and "software evolution".

The other validity threat could be related to the selection, analysis and synthesis of the extracted data being biased by the interpretation of the researcher. Inclusion/exclusion of studies has passed through careful selection, on-going internal discussion and cross-checking between the authors of the SLR. We tried to mitigate this threat by conducting the selection process iteratively. Furthermore, collecting data by two extractors who are PhD candidates in the field was also helpful in minimizing any risk of the researchers' own bias.

If the identified literature is not externally valid, neither is the synthesis of its content (Gasparic & Janes, 2016). To alleviate this threat, we formed our search process after multiple trial searches and agreement among the authors. Unqualified papers were also excluded by the application of our exclusion criteria.

6. Conclusion

supporting tools for feedback mining. After that, we described the topics discovered by the selected studies and identified the most popular one. Finally, we discussed some challenges and open problems in feedback mining which require further research. The findings from this review provide several implications for researchers, requirements engineers and tool developers to gain a better understanding of the available studies and tools and their suitability for different contexts, resulting in significant improvements in the development of intelligent app review mining techniques and tools.

Our results showed that strict quality control on the topics discovered in user feedback did not seem sufficient to ensure that the feedback is a rich source of useful RE-related information. This fact affects the investments of application vendors in user feedback mining. Moreover, we discovered that the accuracy and effectiveness of the proposed techniques vary when applied to user reviews of different types of applications, indicating that the terminology and app features used in user reviews are influenced by the domain of the app. Developers and requirements engineers, therefore, should take domain dependency issues into account when selecting the target techniques for mining the reviews of their apps.

The results of this study also show that recently published papers have studied extraction techniques, compared to earlier ones focusing on discovering discussed topics. By considering this shift, the domain dependency of approaches, and probable newly emerged topics, researchers and practitioners might want to be prepared to encounter new topics in the reviews.

There are several future research directions derived from the insights provided by this SLR. Firstly, although many researchers have proposed approaches to user feedback mining, the usefulness and value of user feedback for RE experts is still under question, as there is, to our knowledge, no solid research investigating the quality and usefulness of user feedback from the RE experts' viewpoint. So, in-depth exploratory studies are needed to address this shortcoming. Secondly, we discovered that there is no user feedback mining tool addressing all the challenges listed in Section 4. The development of such a tool could assist RE experts in mining user feedback efficiently. Thirdly, there are other sources of user feedback on mobile apps available to users and developers. Experimenting with current techniques on them would enrich the data and may result in more interesting findings. Finally, proposing techniques for automatically transforming extracted user needs into requirement specifications and models would be a great complementary component of current requirements extraction tools.

Acknowledgements

We wish to thank the reviewers for their constructive comments, which have helped improve the paper. Ali and Atefeh are grateful for the President Doctoral Scholarship of the University of Manchester, which enables them to carry out their PhD study.
S1. Platzer (2011). Opportunities of automated motive-based user review analysis in the context of mobile app acceptance. Central European Conference on Information and Intelligent Systems.
S2. Harman et al. (2012). App store mining and analysis: MSR for app stores. Proceedings of the 9th IEEE Working Conference on Mining Software Repositories.
S3. Hoon et al. (2012). A preliminary analysis of vocabulary in mobile app user reviews. Proceedings of the 24th Australian Computer-Human Interaction Conference.
S4. Vasa et al. (2012). A preliminary analysis of mobile app user reviews. Proceedings of the 24th Australian Computer-Human Interaction Conference.
S5. Fu et al. (2013). Why people hate your app: Making sense of user feedback in a mobile app store. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
S6. Ha and Wagner (2013). Do Android users write about electric sheep? Examining consumer reviews in Google Play. Consumer Communications and Networking Conference.
S7. Iacob et al. (2013). What are you complaining about?: A study of online reviews of mobile applications. Proceedings of the 27th International BCS Human Computer Interaction Conference.
S8. Khalid (2013). On identifying user complaints of iOS apps. 35th International Conference on Software Engineering.
S9. Oh et al. (2013). Facilitating developer-user interactions with mobile app review digests. CHI'13 Extended Abstracts on Human Factors in Computing Systems.
S10. Galvis Carreno and Winbladh (2013). Analysis of user comments: An approach for software requirements evolution. Proceedings of the International Conference on Software Engineering.
S11. Iacob and Harrison (2013). Retrieving and analyzing mobile apps feature requests from online reviews. 10th IEEE Working Conference on Mining Software Repositories.
S12. Pagano and Maalej (2013). User feedback in the appstore: An empirical study. 21st IEEE International Requirements Engineering Conference.
S13. Chen et al. (2014). AR-miner: Mining informative reviews for developers from mobile app marketplace. Proceedings of the 36th ACM International Conference on Software Engineering.
S14. Guzman and Maalej (2014). How do users like this feature? A fine grained sentiment analysis of app reviews. 22nd IEEE International Requirements Engineering Conference.
S15. Khalid et al. (2014). What do mobile app users complain about? IEEE Software.
S16. Gu and Kim (2015). What parts of your apps are loved by users? 30th IEEE/ACM International Conference on Automated Software Engineering.
S17. Guzman, Aly et al. (2015). Retrieving diverse opinions from app reviews. 2015 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).
S18. Guzman, El-Haliby et al. (2015). Ensemble methods for app review classification: An approach for software evolution. 30th IEEE/ACM International Conference on Automated Software Engineering.
S19. Khalid et al. (2015). Towards improving the quality of mobile app reviews. International Journal of Information Technology and Computer Science.
S20. Lee et al. (2015). They'll know it when they see it: Analyzing post-release feedback from the Android community. Proceedings of the 21st Americas Conference on Information Systems (AMCIS), Puerto Rico.
S21. Maalej and Nabil (2015). Bug report, feature request, or simply praise? On automatically classifying app reviews. 23rd IEEE International Requirements Engineering Conference.
S22. Moghaddam (2015). Beyond sentiment analysis: Mining defects and improvements from customer feedback. European Conference on Information Retrieval.
S23. Palomba et al. (2015). User reviews matter! Tracking crowdsourced reviews to support evolution of successful apps. IEEE International Conference on Software Maintenance and Evolution.
S24. Vu et al. (2015). Mining user opinions in mobile app reviews: A keyword-based approach. 30th IEEE/ACM International Conference on Automated Software Engineering.
S25. Yang and Liang (2015). Identification and classification of requirements from app user reviews. ACM International Conference on Software Engineering and Knowledge Engineering.
S26. Panichella et al. (2015). How can I improve my app? Classifying user reviews for software maintenance and evolution. IEEE International Conference on Software Maintenance and Evolution.
198 M. Tavakoli et al. / Expert Systems With Applications 113 (2018) 186–199
S27. Di Sorbo et al. (2016). What would users change in my app? Summarizing app reviews for recommending software changes. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.
S28. Genc-Nayebi and Abran (2016). A systematic literature review: Opinion mining studies from mobile app store user reviews. Journal of Systems and Software.
S29. Maalej et al. (2016). On the automatic classification of app reviews. Requirements Engineering.
S30. Martin et al. (2016). A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering.
S31. McIlroy et al. (2016). Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering.
S32. Panichella et al. (2016). ARdoc: App reviews development oriented classifier. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.
S33. Jha and Mahmoud (2017b). Mining user requirements from application store reviews using frame semantics. International Working Conference on Requirements Engineering: Foundation for Software Quality.
S34. Jha and Mahmoud (2017a). MARC: A mobile application review classifier. International Working Conference on Requirements Engineering: Foundation for Software Quality.
References
Begel, A., & Zimmermann, T. (2014). Analyze this! 145 questions for data scientists in software engineering. In Proceedings of the 36th international conference on software engineering (pp. 12–23). ACM.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Boeije, H. (2002). A purposeful approach to the constant comparative method in the analysis of qualitative interviews. Quality & Quantity, 36, 391–409.
Brereton, P., Kitchenham, B. A., Budgen, D., Turner, M., & Khalil, M. (2007). Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80, 571–583.
Castelli, M., Manzoni, L., Vanneschi, L., & Popovič, A. (2017). An expert system for extracting knowledge from customers' reviews: The case of Amazon.com, Inc. Expert Systems with Applications, 84, 117–126.
Chen, N., Lin, J., Hoi, S. C. H., Xiao, X., & Zhang, B. (2014). AR-miner: Mining informative reviews for developers from mobile app marketplace. In Proceedings of the 36th international conference on software engineering (pp. 767–778). ACM.
Claes, W., Per, R., Martin, H., Magnus, C., Björn, R., & Wesslén, A. (2000). Experimentation in software engineering: An introduction. Online. Available: http://books.google.com/books.
Di Sorbo, A., Panichella, S., Alexandru, C. V., Shimagaki, J., Visaggio, C. A., Canfora, G., et al. (2016). What would users change in my app? Summarizing app reviews for recommending software changes. In Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering (pp. 499–510). ACM.
Fernández-Gavilanes, M., Álvarez-López, T., Juncal-Martínez, J., Costa-Montenegro, E., & González-Castaño, F. J. (2016). Unsupervised method for sentiment analysis in online texts. Expert Systems with Applications, 58, 57–75.
Fu, B., Lin, J., Li, L., Faloutsos, C., Hong, J., & Sadeh, N. (2013). Why people hate your app: Making sense of user feedback in a mobile app store. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1276–1284). ACM.
Galvis Carreno, L. V., & Winbladh, K. (2013). Analysis of user comments: An approach for software requirements evolution. In Proceedings of the international conference on software engineering (pp. 582–591). IEEE Press.
Gasparic, M., & Janes, A. (2016). What recommendation systems for software engineering recommend: A systematic literature review. Journal of Systems and Software, 113, 101–113.
Genc-Nayebi, N., & Abran, A. (2016). A systematic literature review: Opinion mining studies from mobile app store user reviews. Journal of Systems and Software, 125, 207–219.
Gu, X., & Kim, S. (2015). What parts of your apps are loved by users? In 2015 30th IEEE/ACM international conference on automated software engineering (ASE) (pp. 760–770). IEEE.
Guzman, E., Aly, O., & Bruegge, B. (2015). Retrieving diverse opinions from app reviews. In 2015 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM) (pp. 1–10). IEEE.
Guzman, E., El-Haliby, M., & Bruegge, B. (2015). Ensemble methods for app review classification: An approach for software evolution. In 2015 30th IEEE/ACM international conference on automated software engineering (ASE) (pp. 771–776). IEEE.
Guzman, E., & Maalej, W. (2014). How do users like this feature? A fine grained sentiment analysis of app reviews. In 2014 IEEE 22nd international requirements engineering conference (RE) (pp. 153–162). IEEE.
Ha, E., & Wagner, D. (2013). Do Android users write about electric sheep? Examining consumer reviews in Google Play. In 2013 IEEE consumer communications and networking conference (CCNC) (pp. 149–157). IEEE.
Harman, M., Jia, Y., & Zhang, Y. (2012). App store mining and analysis: MSR for app stores. In Proceedings of the 9th IEEE working conference on mining software repositories (pp. 108–111). IEEE Press.
Heydari, A., ali Tavakoli, M., Salim, N., & Heydari, Z. (2015). Detection of review spam: A survey. Expert Systems with Applications, 42, 3634–3642.
Heydari, A., Tavakoli, M., & Salim, N. (2016). Detection of fake opinions using time series. Expert Systems with Applications, 58, 83–92.
Hoon, L., Vasa, R., Schneider, J.-G., & Mouzakis, K. (2012). A preliminary analysis of vocabulary in mobile app user reviews. In Proceedings of the 24th Australian computer-human interaction conference (pp. 245–248). ACM.
Iacob, C., & Harrison, R. (2013). Retrieving and analyzing mobile apps feature requests from online reviews. In 10th IEEE working conference on mining software repositories (MSR) (pp. 41–44). IEEE.
Iacob, C., Veerappa, V., & Harrison, R. (2013). What are you complaining about?: A study of online reviews of mobile applications. In Proceedings of the 27th international BCS human computer interaction conference (p. 29). British Computer Society.
Jha, N., & Mahmoud, A. (2017a). MARC: A mobile application review classifier. In International working conference on requirements engineering: Foundation for software quality.
Jha, N., & Mahmoud, A. (2017b). Mining user requirements from application store reviews using frame semantics. In International working conference on requirements engineering: Foundation for software quality (pp. 273–287). Springer.
Khalid, H. (2013). On identifying user complaints of iOS apps. In 2013 35th international conference on software engineering (ICSE) (pp. 1474–1476). IEEE.
Khalid, H., Shihab, E., Nagappan, M., & Hassan, A. E. (2014). What do mobile app users complain about? IEEE Software, 32, 70–77.
Khalid, M., Asif, M., & Shehzaib, U. (2015). Towards improving the quality of mobile app reviews. International Journal of Information Technology and Computer Science (IJITCS), 7, 35.
Kitchenham, B., Charters, S., Budgen, D., Brereton, P., & Turner, M. (2007). Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report, Ver. 2.3. EBSE.
Lee, C. W., Licorish, S., MacDonell, S., Patel, P., & Savarimuthu, T. (2015). They'll know it when they see it: Analyzing post-release feedback from the Android community. In Proceedings of the 21st Americas conference on information systems (AMCIS): Vol. 11 (p. 1). Puerto Rico: AISeL.
Liu, Y., Jin, J., Ji, P., Harding, J. A., & Fung, R. Y. (2013). Identifying helpful online reviews: A product designer's perspective. Computer-Aided Design, 45, 180–194.
Maalej, W., Kurtanović, Z., Nabil, H., & Stanik, C. (2016). On the automatic classification of app reviews. Requirements Engineering, 21, 311–331.
Maalej, W., & Nabil, H. (2015). Bug report, feature request, or simply praise? On automatically classifying app reviews. In 2015 IEEE 23rd international requirements engineering conference (RE) (pp. 116–125). IEEE.
Martin, W., Sarro, F., Jia, Y., Zhang, Y., & Harman, M. (2016). A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43, 817–847.
McIlroy, S., Ali, N., Khalid, H., & Hassan, A. E. (2016). Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering, 21, 1067–1106.
Moghaddam, S. (2015). Beyond sentiment analysis: Mining defects and improvements from customer feedback. In European conference on information retrieval (pp. 400–410). Springer.
Oh, J., Kim, D., Lee, U., Lee, J.-G., & Song, J. (2013). Facilitating developer-user interactions with mobile app review digests. In CHI'13 extended abstracts on human factors in computing systems (pp. 1809–1814). ACM.
Pagano, D., & Maalej, W. (2013). User feedback in the appstore: An empirical study. In 21st IEEE international requirements engineering conference (RE) (pp. 125–134). IEEE.
Palomba, F., Linares-Vásquez, M., Bavota, G., Oliveto, R., Di Penta, M., Poshyvanyk, D., et al. (2015). User reviews matter! Tracking crowdsourced reviews to support evolution of successful apps. In 2015 IEEE international conference on software maintenance and evolution (ICSME) (pp. 291–300). IEEE.
Panichella, S., Di Sorbo, A., Guzman, E., Visaggio, C. A., Canfora, G., & Gall, H. C. (2015). How can I improve my app? Classifying user reviews for software maintenance and evolution. In IEEE international conference on software maintenance and evolution (ICSME) (pp. 281–290). IEEE.
Panichella, S., Di Sorbo, A., Guzman, E., Visaggio, C. A., Canfora, G., & Gall, H. C. (2016). ARdoc: App reviews development oriented classifier. In Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering (pp. 1023–1027). ACM.
Platzer, E. (2011). Opportunities of automated motive-based user review analysis in the context of mobile app acceptance. CECIIS-2011.
Qi, J., Zhang, Z., Jeon, S., & Zhou, Y. (2016). Mining customer requirements from online reviews: A product improvement perspective. Information & Management, 53, 951–963.
Savage, D., Zhang, X., Yu, X., Chou, P., & Wang, Q. (2015). Detection of opinion spam based on anomalous rating deviation. Expert Systems with Applications, 42, 8650–8657.
Sohail, S. S., Siddiqui, J., & Ali, R. (2016). Feature extraction and analysis of online reviews for the recommendation of books using opinion mining technique. Perspectives in Science, 8, 754–756.
Tavakoli, M., Heydari, A., Ismail, Z., & Salim, N. (2015). A framework for review spam detection research. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 10, 67–71.
Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social web. Journal of the Association for Information Science and Technology, 63, 163–173.
Vasa, R., Hoon, L., Mouzakis, K., & Noguchi, A. (2012). A preliminary analysis of mobile app user reviews. In Proceedings of the 24th Australian computer-human interaction conference (pp. 241–244). ACM.
Vu, P. M., Nguyen, T. T., Pham, H. V., & Nguyen, T. T. (2015). Mining user opinions in mobile app reviews: A keyword-based approach. In 2015 30th IEEE/ACM international conference on automated software engineering (ASE) (pp. 749–759). IEEE.
Yang, H., & Liang, P. (2015). Identification and classification of requirements from app user reviews. In ACM international conference on software engineering and knowledge engineering (SEKE) (pp. 7–12).