Taming Fragmentation-Induced Compatibility Issues in
Android Applications
by
Lili Wei
A Thesis Submitted to
The Hong Kong University of Science and Technology
in Partial Fulfillment of the Requirements for
the Degree of Doctor of Philosophy
in the Department of Computer Science and Engineering
Acknowledgments
The pursuit of a Ph.D. degree is a long journey that I could never have accomplished by myself alone. I would like to express my gratitude to all the lovely people who have generously offered me help and support during my Ph.D. study.
Second, I would like to thank Dr. Yepang Liu. I have learned a great deal from Dr. Yepang Liu, especially when I was a junior student. He has given me many practical suggestions and helped me get through many obstacles in my research projects. Yepang is a very careful, helpful, and responsible co-author. I will always remember rushing to meet a deadline shortly after his baby was born. A new father can be very busy, but Yepang still spent a lot of time discussing the work with me and helping me revise our paper. Yepang is also a good mentor and friend. I really appreciate the experience of working with Yepang.
Third, I would like to thank Prof. Yuanyuan Zhou, with whom I spent a memorable half year. I have learned a great deal from Prof. Zhou's philosophy of doing research. Prof. Zhou has always taught us to be good researchers and has been willing to share her rich experience. I saw a different research environment at UCSD, from which I benefited a lot. In addition, Prof. Zhou also gave me many suggestions from the perspective of a female researcher. These are also valuable experiences to me.
Fourth, I would like to thank my thesis defense committee: Prof. Mary Lou Soffa, Prof. Rhea Liem, Prof. Raymond Wong, Prof. Shuai Wang, and Prof. Kai Liu. They carefully reviewed my thesis, spent their valuable time attending my defense (at the very special moment when the COVID-19 outbreak began), and gave me many valuable suggestions to further improve the manuscript of this thesis. In particular, Prof. Soffa joined the defense remotely across a twelve-hour time difference. I especially appreciate that she could attend the defense and give me suggestions.
There are so many people to whom I would like to express my gratitude. I would like to thank Prof. Chang Xu. Without his recommendation, I would never have had the chance to pursue my Ph.D. in the CASTLE group. I would also like to thank many of my friends who made my Ph.D. life very enjoyable. I would like to thank Dr. Rongxin Wu, Dr. Valerio Terragni, Dr. Ming Wen, as well as the other members of my group, for all the joy they brought to me as friends and colleagues. I will always remember the time we spent together catching deadlines, discussing research problems, and chatting or making jokes. I would like to thank Xiaonan Huang (Doctor-to-be). We were roommates for over three years. We shared all of our secrets and enjoyable experiences, as well as all kinds of troubles and obstacles we encountered. Her support and encouragement have helped me through many worrying and upsetting moments. I would also like to thank my friends with whom I played Werewolf. The joy you brought me lightened my daily life in the last year of my Ph.D. study. Lastly, I would like to thank my family and my partner for their support, as always.
Contents
Title Page i
Authorization Page ii
Acknowledgments iv
Table of Contents vi
List of Figures x
Abstract xv
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Background & Related Work 9
4.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.1.4 Subroutine Selector: Guiding Exploration of System Functionalities . . . . . 86
6.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7 Future Work 95
8 Conclusion 98
References 102
List of Figures
2.2 Save For Offline Issue #49: App crashes on API 16 while working well
on API 19. (a) MainActivity of Save For Offline, (b) ViewActivity
works well on an emulator with API level 19, (c) App crashes when
starting ViewActivity on an emulator with API level 16. . . . . . . . 11
5.1 Patch for Camera Preview Frame Rate Issue on Nexus 4 . . . . . . . 63
6.2 An example of API Graph, its corresponding control flow graph and
code snippet. For presentation simplicity, we simplified the example
and omitted the parameters of the APIs in API Graph. . . . . . . . . 84
List of Tables
List of Algorithms
List of Abbreviations
Taming Fragmentation-Induced Compatibility Issues in
Android Applications
by Lili Wei
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Abstract
An important input required by FicFinder is the FIC issue patterns that capture
specific Android APIs as well as their associated context by which compatibility
issues can be triggered. I denote such FIC issue patterns as API-context pairs. In
the initial version of FicFinder, the API-context pairs were manually extracted
from the empirical study dataset. Manually extracting FIC issue patterns can be
expensive. In addition, API-context pairs can eventually get outdated since FIC
issues are evolving as new Android versions and devices are released. To address this
problem, I developed a novel framework, Pivot, that combines program analysis and
data mining techniques to automatically learn API-context pairs from large corpora
of popular Android apps. Specifically, Pivot takes an Android app corpus as input
and outputs a list of API-context pairs ranked by their likelihood of capturing real
FIC issues. With the learned API-context pairs, we can further transfer knowledge
learned from existing Android apps to automatically detect potential FIC issues using
FicFinder. This can significantly reduce the search space for FIC issues and benefit the Android development community. To evaluate Pivot, I measured the precision of the learned API-context pairs and leveraged them to detect previously-unknown compatibility issues in open-source Android apps.
One fundamental limitation of Pivot is that it can only learn FIC issue patterns that have already been identified and fixed by some existing apps. Pivot can never identify a FIC issue if it is not fixed by any app in the input app corpora.
To address this problem, I proposed Pixor to leverage existing Android apps to
proactively expose inconsistent behaviors of different Android versions. Pixor selects
and exercises subroutines of Android apps on different Android versions to expose
inconsistent Android system behaviors. Such inconsistent behaviors can be useful to
both Android system and app developers. On the one hand, some of the inconsistent
behaviors may indicate bugs in the Android systems. Reporting them to the Android
system developers may help improve the reliability of Android systems. On the other
hand, some of the inconsistent behaviors can be intended customization of specific
Android versions. Identifying such inconsistent behaviors can help app developers
avoid FIC issues. In the evaluation, Pixor identified both bugs and intended system behavior changes in Android Q (Android 10, the latest version of the Android system at the time of this study). Three of the identified bugs have been confirmed and fixed by the Android development team.
Chapter 1
Introduction
1.1 Motivation
Android is the most popular mobile operating system with over 80% market
share [74]. Due to its open nature, many manufacturers (e.g., Samsung and LG)
develop their mobile devices by adapting the original Android systems. While this
has led to the wide adoption of Android smartphones and tablets, it has also induced
the heavy fragmentation of the Android ecosystem. The fragmentation causes
unprecedented challenges to app developers: there are more than 10 major versions
of Android OS running on 24,000+ distinct device models [12]. It is impractical
for developers to fully test the compatibility of their apps on such a large number
of devices. In practice, developers often receive user complaints on their apps’
poor compatibility on various device models [14, 122]. However, resolving these
fragmentation-induced compatibility issues is difficult.
The search space for fragmentation-induced compatibility issues is huge and evolv-
ing. First, their search space is large, consisting of three dimensions: device models,
API levels and Android APIs. There are 24,000+ distinct device models running
10+ different API levels in the market [12, 24]. Each API level supports thousands
of APIs. Detecting such compatibility issues by checking if each combination of
device model, API level and invoked API can induce inconsistent app behaviors is
practically infeasible. Second, the search space is dynamic. It continually evolves
with the release of new device models and API levels. For example, the number of
distinct device models in 2015 was six times that in 2012 [82]. In addition,
there have been two or more API level upgrades each year, which are commonly
customized by Android device manufacturers and shipped with new device models.
• RQ1: (Issue type and root cause): What are the common types of FIC
issues in Android apps? What are their root causes?
• RQ4: (Issue testing): Is FIC issue testing an important step in the Android app development process? What are the common practices for FIC issue testing?
Are there any effective tools or frameworks to facilitate the testing tasks?
• RQ5: (Issue diagnosis and fixing): What are the common challenges when
diagnosing FIC issues? How do Android developers fix FIC issues in practice?
Are there any common patterns?
with one of the interviewees, I found and reported one valid FIC issue to his team.
The reported issue was quickly confirmed and fixed. These results demonstrate the
usefulness of the empirical findings and FicFinder.
In the preliminary version of FicFinder [163, 164], the API-context pairs were
manually extracted from the empirical study. However, this manual approach is
impractical because the FIC issue search space evolves quickly. It requires tremendous
efforts to manually keep track of FIC issues that continually arise from new device
models. This motivates us to study how to automatically extract the knowledge of
FIC issues, which can then be leveraged to derive API-context pairs and effectively
reduce the search space for FIC issue detection. In the empirical study, I observed
that FIC issues can be categorized into two types: non-device-specific ones and
device-specific ones [163]. FIC issues are non-device-specific if they can be triggered
on any device model running a particular Android version, which is denoted by an
integer (e.g., 27 for Android 8.1) known as an API level. FIC issues are device-
specific if they can only be triggered on certain device models running particular
API levels. Compared with non-device-specific ones, the search space for device-specific FIC issues is greatly enlarged by the additional dimension of device models.
Hence, device-specific FIC issues are more difficult to detect. As a result, I focus
on learning knowledge for device-specific FIC issues. These issues are common
but challenging to resolve [163, 164]. While resources provided by Google (e.g., API
Guide [11] and emulators) can help developers locate and patch non-device-specific
FIC issues, no similar resources are provided for device-specific FIC issues. The
root causes of device-specific FIC issues often reside in the proprietary systems
customized by the device vendors. These systems are typically closed-source with
little documentation available to the public.
them. Such learned knowledge can then be leveraged to facilitate FIC issue testing or to infer rules/patterns, such as the API-context pairs used by FicFinder, to pinpoint FIC issues in Android apps via static analysis.
Pivot extracts API-device correlations of FIC issues from a given Android app
corpus by combining program analysis and code repository mining techniques. It also prioritizes the extracted correlations based on their likelihood of capturing real FIC issues. An outstanding challenge is to find valid API-device correlations amid massive noise. In the evaluation, noise accounts for 97% of the sampled API-device
correlations extracted from popular Android app corpora. Existing techniques in API
precondition mining [151, 157, 168, 174] remove noises based on the assumption that
most API usages in code corpora are correct (i.e., mistakes are rare). However, this
assumption does not hold for APIs that can induce FIC issues due to two phenomena.
First, since FIC issues continually evolve, many FIC issues, especially new ones, are likely left undetected in Android apps. Second, common code clones and library usages across Android apps can confuse frequency-based mining of FIC issues, because the cloned code snippets are duplicated in many apps [104, 135]. These two phenomena introduce noise that greatly affects the learning accuracy
of FIC issues. To address this challenge, I devise a novel ranking strategy to prioritize
extracted API-device correlations based on the likelihood that the analyzed apps have
handled the corresponding FIC issues and the diversity of the originating occurrences
of API-device correlations.
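To make the ranking idea concrete, here is a purely illustrative sketch; the field names and the way the two signals are combined are assumptions, not Pivot's actual scoring formula:

    import java.util.Comparator;
    import java.util.List;

    // Illustrative only: each extracted correlation carries two hypothetical
    // signals, and correlations are ranked by a combined score.
    class ApiDeviceCorrelation {
        String api;               // the Android API involved
        String deviceContext;     // e.g., a device model or manufacturer
        double handledLikelihood; // likelihood that apps deliberately handled the issue
        double originDiversity;   // diversity of the originating occurrences

        double score() {
            return handledLikelihood * originDiversity; // assumed combination
        }

        static void rank(List<ApiDeviceCorrelation> correlations) {
            // Highest-scoring correlations first: those most likely to capture
            // real FIC issues are presented first.
            correlations.sort(
                Comparator.comparingDouble(ApiDeviceCorrelation::score).reversed());
        }
    }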
To show the usefulness of the learned API-device correlations and the issue
archive, I conducted a case study and encoded the knowledge of the FIC issues
learned by Pivot in to API-context pairs of FicFinder to locate undetected issues
in open-source Android apps. I successfully found 10 apps that suffer from these FIC
issues. I further reproduced and reported the detected issues to the app developers.
So far, seven reported issues have been acknowledged, among which four issues have
been quickly fixed. This demonstrates that the API-device correlations learned by
Pivot can be used to facilitate FIC issue detection for Android apps.
Essentially, FIC issues are caused by system behaviors that manifest only on specific device models or Android versions. For ease of presentation, we refer to
such behaviors specific to particular Android versions as platform-specific behaviors.
Pixor performs differential testing to actively expose platform-specific behaviors
of different Android system versions by leveraging code in existing Android apps
to exercise diverse functionalities. Existing Android apps intensively use Android
system functionalities via API calls. While such prolific Android system functionality
use cases are good test cases to expose platform-specific behaviors, many of them
can be equivalent in exposing platform-specific behaviors. Blindly selecting code in
Android apps to exercise Android systems can induce redundant efforts in repeatedly
exercising the same system functionalities. This is the key challenge of Pixor: how to effectively select code in existing apps to guide the exploration of system functionalities so that diversified system behaviors are exercised while redundant testing effort is minimized.
Pixor resolves this challenge by breaking down apps into subroutines and estimating the exercisable system functionalities of each subroutine by constructing API Graphs. Specifically, an API Graph models the Android API use cases of app subroutines, capturing the invoked APIs and their invocation orders. Pixor guides the exploration of Android system functionalities by selecting the app subroutines that cover the most still-uncovered Android system API use cases.
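As a rough illustration of this selection step, the sketch below greedily picks the subroutine whose API use cases add the most not-yet-covered invocation-order pairs; all names are hypothetical, and Pixor's actual data structures are not reproduced here:

    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch of greedy subroutine selection. An "edge" stands for
    // an (API -> API) invocation-order pair modeled by the API Graph.
    class SubroutineSelectorSketch {
        record Subroutine(String name, Set<String> apiEdges) {}

        // Returns the subroutine that newly covers the most edges, updating the
        // covered set; returns null when no candidate adds coverage.
        static Subroutine selectNext(List<Subroutine> candidates, Set<String> covered) {
            Subroutine best = null;
            int bestGain = 0;
            for (Subroutine s : candidates) {
                int gain = 0;
                for (String edge : s.apiEdges()) {
                    if (!covered.contains(edge)) gain++;
                }
                if (gain > bestGain) {
                    bestGain = gain;
                    best = s;
                }
            }
            if (best != null) covered.addAll(best.apiEdges());
            return best;
        }
    }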
Identifying platform-specific behaviors can benefit both Android app and system
developers. On the one hand, platform-specific behaviors are the causes of FIC issues.
Identifying platform-specific behaviors can help Android app developers detect FIC issues in their own apps. On the other hand, some platform-specific behaviors may
indicate bugs in the Android systems. Detecting such platform-specific behaviors
can help Android system developers improve the reliability of the Android system.
1.2 Contributions
• The details of the study as well as the delivered tools of this thesis are publicly
available at the project homepages [30, 63].
Chapter 2
Background & Related Work
[Figure: the layered Android system architecture — Android apps; application framework; inter-process communication proxies; Android system services and managers (higher-level system); hardware abstraction layer; Linux kernel with hardware drivers (lower-level system).]
Device manufacturers typically customize Android in two ways: (1) they provide driver implementations that are specific to device models; (2) they modify the higher-level system to meet device models' special requirements (e.g., UI style). Such customization induces many variations across device models (see examples in Section 3.2.1).
Android fragmentation creates a burden for app developers, who need to ensure that the apps they develop offer compatible behavior on diverse device models supporting multiple OS versions. This requires tremendous testing and diagnosis effort. In reality, it is impractical for developers to exhaustively test their apps to cover all combinations of device models and OS versions. Hence, FIC issues often escape testing and are frequently encountered and reported by app users [119, 125, 146].
The API level and device model variations can make Android apps exhibit different
behavior and induce FIC issues. Figure 2.2 shows an example of a FIC issue that my tools detected in an open-source Android app, Save For Offline [70]. Save For Offline saves webpages to local storage so that the saved pages can be viewed even when the phone is offline. This app invokes an Android API,
setMediaPlaybackRequiresUserGesture, which was introduced in API 17, in its
ViewActivity without checking the API level at runtime. The ViewActivity works
well on devices of which the API level is not lower than 17 but crashes on devices
with lower API levels when the API is not available. Figure 2.2 (a) shows the
MainActivity of Save For Offline. Clicking any saved page will lead the app to start
ViewActivity and load the saved webpage. Figure 2.2 (b) shows a screenshot of
ViewActivity that correctly displays the saved webpage on an emulator with API
level 19. On the other hand, Figure 2.2 (c) shows a screenshot of the app crashing on
an emulator with API level 16 when ViewActivity is started by clicking the same
saved page. Such inconsistent behavior is a typical example of FIC issues.
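A minimal sketch of the standard fix for this kind of issue follows; the wrapper class, method name, and argument value are hypothetical, but the guard itself is the fixing pattern discussed in Section 3.2:

    import android.os.Build;
    import android.webkit.WebView;

    class ViewActivityGuardSketch {
        // setMediaPlaybackRequiresUserGesture was introduced in API level 17,
        // so the call is skipped on older devices instead of crashing the app.
        static void configureMediaPlayback(WebView webView) {
            if (Build.VERSION.SDK_INT >= 17) {
                webView.getSettings().setMediaPlaybackRequiresUserGesture(false);
            }
        }
    }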
Figure 2.2: Save For Offline Issue #49: App crashes on API 16 while
working well on API 19. (a) MainActivity of Save For Offline, (b)
ViewActivity works well on an emulator with API level 19, (c) App
crashes when starting ViewActivity on an emulator with API level 16.
Android fragmentation has been a major challenge for Android app develop-
ers [108, 122]. Recent studies have explored the problem largely from three aspects.
Version         Codename             API level   Release Time   Market share
2.3.3 - 2.3.7   Gingerbread          10          2011           0.6%
4.0.3 - 4.0.4   Ice Cream Sandwich   15          2011           0.6%
4.1.x           Jelly Bean           16          2012           2.3%
4.2.x           Jelly Bean           17          2012           3.3%
4.3             Jelly Bean           18          2013           1.0%
4.4             KitKat               19          2013           14.5%
5.0             Lollipop             21          2014           6.7%
5.1             Lollipop             22          2015           21.0%
6.0             Marshmallow          23          2015           32.0%
7.0             Nougat               24          2016           15.8%
7.1             Nougat               25          2016           2.0%
8.0             Oreo                 26          2017           0.2%
are sensitive to device models. Fan et al. [111] performed an empirical study on
framework crashing issues in Android apps and found that the compatibility issues
induced by fragmentation commonly cause framework crashes. Many studies focus on
security problems induced by Android fragmentation. Zhou et al. [175] leveraged problematic file permissions of driver files to hijack sensitive functionalities of specific
device models. Zhang et al. [172] studied vulnerabilities of the memory management
system on different customized device models. Aafer et al. [99] compared security
configurations of ROMs of different device models and identified vulnerabilities.
These studies documented various functional and non-functional issues caused by Android fragmentation, but, unlike this thesis, did not aim to understand the root causes of such issues.
In addition, the HCI community also found that different resolutions of device
displays have brought unique challenges in Android app design and implementa-
tion [150,171]. Holzinger et al. reported their experience on building a business appli-
cation for different devices considering divergent display sizes and resolutions [121].
In the study dataset, I also found that these issues were part of FIC issues on Android
platforms and they can seriously affect user experience.
al. proposed a testing framework, DiffDroid, that can automatically identify GUI
inconsistencies of Android apps across different device models [112]. DiffDroid
compares GUI models of an app when running on different device models, but such
comparisons do not help reduce the search space of FIC issues. While these papers
proposed general frameworks to test Android apps across devices, they did not
address the problem of huge search space for FIC issue testing (Section 3.2.6).
Several recent studies aimed to reduce the search space in Android app testing by
prioritizing devices. For example, Vilkomir et al. proposed a device selection strategy
based on combinatorial features of different devices [161]. Khalid et al. conducted
a case study on Android games to prioritize device models to test by mining user
reviews on Google Play store [125]. The most recent work by Lu et al. proposed an
automatic approach to prioritizing devices by analyzing user data [146]. It is true
that by prioritizing devices, testing resources can be focused on those devices with
larger user impact. However, these studies alone are not sufficient to significantly
reduce the search space when testing FIC issues. The reason is that the combinations
of different platform versions and issue-inducing APIs are still too many to be fully
explored. In this thesis, I addressed the challenge from a different angle by studying
real-world FIC issues to gain insights at the source code level (e.g., root causes and fixing patterns). Through this study, I observed common issue-inducing APIs and the corresponding software and hardware environments that can trigger FIC issues.
Such findings can further help reduce the search space during app testing and
facilitate automated issue detection and repair.
Kyle et al. [129] proposed DexFuzz, a tool that detects defects in the Android virtual machine by fuzzing Android apps. This work tests Android virtual machines to identify their bugs and mitigate the problem of Android fragmentation. The
effectiveness of DexFuzz heavily depends on the seed apps and the mutation
operators. However, while DexFuzz provides the ability to fuzz existing Android
apps, it does not provide any solutions on how to effectively select Android apps
for testing. Pixor, on the contrary, focuses on resolving the problem of test app
selection. In this sense, Pixor is complementary to DexFuzz. In the future, I
plan to combine Pixor with DexFuzz to produce more effective test suites for
platform-specific behavior detection.
in Android apps [147]. Linares-Vásquez et al. studied the impact of evolving Android
APIs on user ratings and StackOverflow discussions [139, 140]. Bavota et al. further
conducted a survey to investigate developers’ experience with change-prone and fault-
prone APIs [102]. They observed that apps using change-prone and fault-prone APIs
tend to receive lower user ratings and that Android API changes have motivated app developers to open discussion threads on Stack Overflow. These studies focused on
understanding the influence of using fast-evolving Android APIs on the development process and app quality, but did not study the concrete characteristics or fixes
of FIC issues induced by such API changes. In comparison, this thesis not only
uncovered the root causes of FIC issues but also studied concrete issues at the
application source code level so that I can provide new insights (e.g., fixing patterns)
to facilitate FIC issue detection and repair. Lastly, a recent work by Li et al. [136]
proposed a technique CiD to automatically learn patterns of compatibility issues
caused by Android framework API evolution. However, their scope is restricted to
compatibility issues induced by API accessibility inconsistencies at different API
levels, while the scope of this thesis is broader (e.g., I also studied device-specific
FIC issues). I provide a detailed comparison between CiD and FicFinder in
Section 4.3.2.
Various techniques have been proposed to mine API usage patterns from code
repositories. The majority of them aim to mine API co-occurrence relationships. PR-
Miner [138] and DynaMine [144] were proposed to mine sets of APIs that frequently
co-occur in code repositories. MAPO [168, 174] was among the first techniques
to mine frequently-used API sequences. Follow-up studies further improved the
pioneering techniques [100, 114, 162]. Most of them rely on mining techniques and adopt lightweight static analysis without analyzing control dependencies in code.
Several techniques were proposed to infer API calling preconditions via static
analysis and mining code repositories. Ramanathan et al. [155] inferred preconditions
that must hold before API invocations from code revision histories. Nguyen et
al. [151] mined API preconditions with regard to the API arguments and receivers.
These techniques target general API preconditions that may not be relevant
to FIC issues. In addition, these techniques leverage traditional filtering metrics
such as confidence to identify valid API preconditions. As shown in the evaluation,
such metrics cannot effectively prioritize valid API-device correlations. In contrast,
Pivot focuses on learning API-device correlations and features a new and effective
ranking strategy to prioritize the correlations.
Selector component of Pixor. While the idea of selecting subroutines is similar to that of selecting test cases from an available test suite, none of the existing work abstracts the overall characteristics of the input test cases. In comparison, Pixor abstracts the
API use cases in the input Android apps as an API Graph. With API Graph, Pixor
can effectively select app subroutines that can exercise diverse system functionalities.
Chapter 3
In order to understand FIC issues and answer our five research questions, we
performed a large-scale study, which consists of two parts. First, we collected real-
world FIC issues that affected popular open-source Android apps and investigated them down to the source code level to answer RQ1–3 and RQ5. Second, we communicated
with Android practitioners of different roles and experiences via interviews and online
questionnaires to understand FIC issues from their perspective. The findings of
this part will answer RQ4 and additionally help cross validate and supplement the
findings of the first part. In the following, we present the design and setup of our
empirical study of FIC issues, practitioner interviews and survey in detail.
Table 3.1: Android Apps Used in Our Empirical Study (1M = 1,000,000)
App Name               Project Start Time   Category        Description           Rating    Downloads   KLOC    # FIC Issues
CSipSimple [22]        04/2010              Communication   Internet call client  4.3/5.0   1M - 5M     59.2    83
AnkiDroid [16]         07/2009              Education       Flashcard app         4.5/5.0   1M - 5M     50.8    44
K-9 Mail [42]          10/2008              Communication   Email client          4.2/5.0   5M - 10M    93.7    37
VLC [88]               11/2010              Media & Video   Media player          4.4/5.0   10M - 50M   151.1   36
AnySoftKeyboard [18]   05/2009              Tools           3rd-party keyboard    4.4/5.0   1M - 5M     38.9    20
Dataset Collection
Step 2: identifying FIC issues. To locate the FIC issues that affected the
27 candidate apps, we searched their source code repositories for two types of code
revisions:
The keyword search returned many results, suggesting that all of our 27 candidate apps might have suffered from various FIC issues. We selected the five apps with the largest number of code revisions containing our search keywords and manually examined these revisions. Table 3.1 shows the information of the five apps. In total,
2,108 code revisions were found from the selected five apps. We carefully checked
these revisions and understood the code changes. After such checking, we collected
a set of 303 revisions, concerning 220 FIC issues. The number of revisions is larger
than the number of issues because some issues were fixed by multiple revisions.
Note that it is generally impossible to guarantee that all our collected issues have
been completely fixed. There are chances that the issues, although deemed fixed by
developers, could still occur on certain untested device models (e.g., those with very
small user bases). However, we made our best effort to locate all the patches for
each of our collected issues from the corresponding code repository. Table 3.1 (last
column) reports the number of found issues for each app subject. These 220 issues
form the dataset for our empirical study.
Data Analysis
To understand FIC issues and answer our research questions, we performed the
following tasks. First, for each issue, we (1) identified the issue-inducing APIs,
(2) recovered the links between the issue-fixing revisions and the bug reports, and
(3) collected and studied related discussions from online forums such as Stack
Overflow [76]. Second, we followed the process of open coding in the grounded theory
method, a widely-used approach for qualitative research [109], to analyze our findings
and classify each issue by its type, root cause, consequence, and fixing pattern.
Interviewees
Table 3.3 shows the basic information of our interviewees, including their compa-
nies, positions, and work experience. As we can see from the table, our interviewees
are experienced in Android development: all of them have at least two years of
experience and three of them (i.e., P1, P4, and P5) have over five years of experience.
In particular, the interviewee P4 is experienced in developing Android system frame-
works. The interviewees also have diversified background. They work for companies
of different scales, from small startups to large leading IT companies. For example,
P4 and P5 are from Alibaba and Tencent, two of the largest IT companies in China. P4 is a senior developer working on the Android version of Alibaba T-Mall, the largest online shopping platform in China. P5 is a test manager for the quality assurance of WeChat, a Tencent app that is one of the leading mobile messaging apps worldwide. On the other hand, P2 is the leader of
the Android development team in a startup company, Chelaile, which provides pub-
lic transportation tracking services in China. Interviewing these practitioners with
different backgrounds allowed us to understand FIC issues from different perspectives.
Interview Design
real issues, challenges, and solutions that are not covered by our research questions.
At the end of each interview, we also shared with the interviewee our vision of
developing automated tools to help developers combat FIC issues, aiming to seek
feedback to guide the tool’s design and development.
For all interviews, we sought the interviewees’ consent to make audio recordings for subsequent analysis. On average, each interview took around 55
minutes. The interview questions are publicly available at our project website [30].
Questionnaire Design
The questionnaire consists of two parts: background information and FIC issue
experiences. Specifically, the questionnaire collects the following information:
We collected this background information to (1) filter respondents (e.g., the survey
proceeds only when the respondents have encountered FIC issues), (2) investigate
the awareness of FIC issues among respondents, and (3) analyze survey results of
different participant groups (e.g., groups of respondents with different experience
levels, etc.).
• Challenges of FIC issue diagnosis and fix: Huge search space / Difficulties in
resetting issue-triggering device contexts / Difficulties in generating test cases
/ Difficulties in locating issue root causes / Other (fillable)
• Importance of FIC issue testing before app release: Very important / Important
/ Normal / Not important / Totally ignorable
• Platforms used to test FIC issues: Real devices / Online platforms / Emulators
/ Other (fillable)
• Used and expected tools to effectively reveal FIC issues (short answer)
We designed these questions to learn the common practices of the surveyed respon-
dents in solving FIC issues. Specifically, we investigated the common experiences
regarding the major root causes of the encountered FIC issues (RQ1). We also
collected information on practices that cannot be learned from the empirical study
materials (e.g., challenges in FIC issue diagnosis and fixing, practices in FIC issue
testing) and surveyed on their experiences and opinions of tools that can effectively
facilitate FIC issue diagnosis.
Participants
[Figure: Android development experience of survey respondents — 0-1 year: 14.8%; 1-3 years: 24.1%; 3-5 years: 24.9%; over 5 years: 36.3%.]
We further distilled and categorized the common root causes of FIC issues. We identified the root causes of each FIC issue by studying the related discussions or descriptions in issue reports, forum posts, and API documentation. Specifically, we identified three common root causes for device-specific issues and two for non-device-specific issues. In the following, we discuss the issue types and issue root causes in detail.
Table 3.4: Issue types and root causes for their occurrences

Type                  Root Cause                          Issue Example                 # of FIC Issues
Device-Specific       Problematic driver implementation   CSipSimple bug #353           57
                      Non-compliant OS customization      Android bug #78377            39
                      Peculiar hardware configuration     VLC bug #7943                 24
Non-Device-Specific   Android platform API evolution      AnkiDroid pull request #130   88
                      Original Android system bugs        Android #62319                12
Device-specific FIC issues occur when invoking the same Android API exhibits
inconsistent behavior on different device models. These issues can cause serious
consequences such as crashes and lead to poor user ratings [125]. We found 120 such
issues in our dataset and identified three primary reasons for the prevalence of these
issues in Android apps: (1) problematic driver implementation, (2) non-compliant
OS customization, and (3) peculiar hardware configuration.
Typical examples we observed are the issues caused by the proximity sensor
APIs. According to the Android API guide, proximity sensors can sense the distance
between a device and its surrounding objects (e.g., users’ cheek) and the invocation
of the API getMaximumRange() will return the maximum distance the sensor can
detect [64]. In spite of such clear specifications, FIC issues often arise from the use
of proximity sensors in reality. For instance, on devices such as Samsung Galaxy S5
mini or Motorola Defy, the return value of getMaximumRange() may not indicate the
actual maximum distance the proximity sensor can detect. As a result, the proximity
between the device and other objects often gets calculated incorrectly, leading to
unexpected app behaviors [10]. CSipSimple developers filed another example in
their issue tracker (issue 353) [23]. The issue occurred on Samsung SPH-M900
whose proximity sensor API reports a value inversely proportional to the distance.
This causes the app to compute a distance in reverse to the actual distance. The
consequence is that when users hold their phones near their ears to make phone calls using CSipSimple, the screen remains touch sensitive, so users often accidentally hang up their calls. More annoyingly, the screen becomes touch insensitive when users move their phones away from their ears, making it difficult to hang up a call. CSipSimple developers later tracked down the problem and fixed the issue by
adapting their code to specially deal with the Samsung SPH-M900 devices, as shown
in Figure 3.3.
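Figure 3.3 is not reproduced here; the following is a minimal sketch, with hypothetical names and simplified logic, of a device-specific workaround of this kind, keyed on the device's PRODUCT identifier:

    import android.os.Build;

    class ProximityWorkaroundSketch {
        // Hypothetical sketch: on Samsung SPH-M900 the proximity sensor reports
        // a value inversely proportional to the distance, so the interpretation
        // of the reading is flipped for that device model.
        static boolean isNearEar(float reading, float maxRange) {
            if ("SPH-M900".equals(Build.PRODUCT)) {
                return reading >= maxRange; // inverted semantics on this model
            }
            return reading < maxRange;      // usual convention: small reading = near
        }
    }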
(‘+’ marks the lines added by the patch.)

  Intent startSettings =
      new Intent(android.provider.Settings.ACTION_INPUT_METHOD_SETTINGS);
  startSettings.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
+ try {
      mAppContext.startActivity(startSettings);
+ } catch (ActivityNotFoundException notFoundEx) {
+     showErrorMessage("Does not support input method settings");
+ }
of hardware configuration can easily lead to FIC issues, if app developers do not
carefully deal with all hardware variants. For example, SD cards have caused much trouble in real-world apps. Android devices with no SD card, a single SD card, or multiple SD cards are all available on the market. Even worse, the mount points for
large internal storage and external SD cards may vary across devices. Under such
circumstances, app developers have to spend tremendous effort to ensure correct
storage management on different device models. This is a tedious job and developers
can often make mistakes. For instance, issue 7943 of VLC [89] reported a situation
where the removable SD card was invisible on ASUS TF700T, because the SD card
was mounted to an unusual point on this device model. Besides ASUS devices, VLC
also suffered from similar issues on some Samsung and Sony devices (e.g., revisions
50c3e09 and 65e9881 [88]), whose mount points of SD cards were uncommon. To
fix these issues, VLC developers hardcoded the directory path in order to correctly
recognize and read data from SD cards on these devices. Besides SD cards, different
screen sizes and resolutions also frequently caused compatibility issues (e.g., revision
b0c9ae0 of K-9 Mail [42], revision 6b60085 of AnkiDroid [16]).
We observed 100 non-device-specific issues in our dataset. They arose mostly due to Android platform evolution or bugs in the original Android systems. These
issues are independent from device models and can affect a wide range of devices
running specific Android OS versions.
Android platform API evolution. Android platform and its APIs evolve
fast [137, 147]. Since its first release in 2008, Android has released 8 versions
comprising 27 different API levels [24]. The APIs are updated along with these frequent system evolutions [137]. These updates may introduce new APIs, deprecate old
ones, or modify API behavior. 88 non-device-specific issues in our dataset were
caused by such API evolutions. For instance, Figure 3.5 gives a typical example
extracted from AnkiDroid revision 7a17d3c [16], which fixed an issue caused by
API evolution. The issue occurred because the app mistakenly invoked an API
SQLiteDatabase.disableWriteAheadLogging(), which was not introduced until
API level 16. Invoking this API on devices with an API level lower than 16 could
crash the app. To fix the issue, developers put an API level check as a guarding
condition to skip the API invocation on devices with lower level APIs (Line 1).
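Figure 3.5 is not reproduced here, but the guard it shows follows this shape (the wrapper class and method name are hypothetical):

    import android.database.sqlite.SQLiteDatabase;
    import android.os.Build;

    class WalGuardSketch {
        // disableWriteAheadLogging() was introduced in API level 16, so the fix
        // skips the invocation entirely on devices running lower API levels.
        static void disableWalIfAvailable(SQLiteDatabase db) {
            if (Build.VERSION.SDK_INT >= 16) {
                db.disableWriteAheadLogging();
            }
        }
    }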
[Figure 3.6: 98.2% of respondents selected at least one root cause identified in our study; only 1.8% selected the “Other” option alone. A bar chart shows how often each root cause was selected: non-compliant OS customization, Android platform API evolution, original Android system bugs, peculiar hardware configuration, problematic driver implementation, and Other.]

First, overall, our observed root causes are supported by the survey results. 11 respondents selected the “Other” option, while only 3 of them selected this option without selecting any root cause identified by our empirical study. This shows that 98.2% of our respondents agreed that at least one of our identified root causes induced the FIC issues they encountered (Figure 3.6). Eight of the 11 respondents who selected “Other” elaborated on their answers. We processed their elaborations and re-categorized them where appropriate. As a result, the root causes specified by six of these eight respondents can be categorized into one of the existing options. For example, one respondent filled in “Required API (is) not available on older devices”. This answer is a typical case of “API evolution” where the respondent
intends to use an API that is unavailable on old devices with low API levels. After
such re-categorization, only two respondents specified possible root causes for FIC
issues other than the root causes we have observed (i.e., interactions with other
apps installed on the device and different versions of Android development tools).
The survey results confirm that our five categories of root causes for FIC issues are
commonly encountered by developers in practice.
considered critical by them. For example, a respondent put down “Most issues are
device-specific (e.g. Samsungs love of breaking APIs or UI), which Lint can’t know
about”. Besides the online survey, our interviewees also shared similar opinions.
On commenting the two types of FIC issues, interviewees P1 and P2 found that
non-device-specific issues are easier to address while device-specific issues are common
and challenging. P1 explained “We can use Lint and Android support libraries to
locate and help fix the non-device-specific issues but for device-specific issues, we
don’t have such supports. Most of the time, we can only guess and try (to search for
valid solutions)”. This observation is in line with our empirical study result that both device-specific and non-device-specific issues are pervasive, while device-specific issues occur relatively more often in practice due to the difficulty of locating and fixing them.
Answer to RQ1: We observed five major root causes of FIC issues in Android
apps, which can be categorized into two general types: device-specific and non-
device-specific. The results of our survey and interviews showed that the observed
root causes are common in practice. It is challenging to ensure app compatibility
on various device models with diverse software and hardware environments.
FIC issues’ triggering contexts (e.g., the names of device models or API levels that
can trigger FIC issues) are commonly discussed in the issue reports and leveraged in
the patches for FIC issues. In our study, we investigated the issue reports and issue
patches to identify the key issue-triggering contexts. We observed that the issue-
triggering contexts can be categorized into device contexts and API-level contexts.
This is consistent with the two major sources of FIC issues: diversified device models
and API levels. We now discuss these two types of FIC issue-triggering context.
identifiers of the device models. The majority of explicit device contexts are specific
identifiers of the device models, including DEVICE (i.e., the name of the industrial
design), MODEL (i.e., the end-user-visible name for the device product), or PRODUCT
(i.e., the name of the overall device product). Other explicit device contexts are more
general like the MANUFACTURER (i.e., the manufacturer of the device/hardware) or
BRAND (i.e., the consumer-visible brand with which the device/hardware is associated)
of the devices. These identifiers are encapsulated in the android.os.Build class and
can be retrieved at runtime. Many developers leverage such context information to
patch FIC issues. For example, Figure 3.3 shows an issue where the device context is
explicitly applied in the issue patch. The triggering context of this issue is a typical
explicit device context: device models of which the PRODUCT identifier is “SPH-
M900”. Apart from explicit device contexts, implicit device contexts are indicators
for specific device properties such as special software or hardware configurations.
AnySoftKeyboard issue 289 [18] is an example where an implicit device context applies. The issue describes a crash when the app starts the system input method settings on a device without this functionality, as discussed in Section 3.2. This is a typical example of an implicit device context that can trigger FIC issues.
Device and API-level contexts are orthogonal. The triggering contexts of real-
world FIC issues can be combinations of these two types of contexts, e.g., some issues
can only be triggered on specific device models with specific API levels. 20 issues
in our dataset require both types of contexts to trigger. On the other hand, the
manifestation of FIC issues could also depend on specific program states such as the
argument values flowing into API calls. In this aspect, FIC issues do not differ from other types of software defects, and we do not discuss them further here.
In our dataset, device contexts and API-level contexts are widely adopted in FIC
issue patches. The patches check such contexts either explicitly or implicitly to fix
FIC issues accordingly. Our interviewees shared similar experiences. P1, P2, P3,
and P4 stated that they would check the two types of contexts to fix FIC issues
but sometimes it was not easy to explicitly isolate the exact contexts that could
trigger FIC issues (especially for device contexts). For example, P3 said “Yes, we
used platform-related context checkings to fix FIC issues. Sometimes, it is difficult
to enumerate all the device models that are affected by certain FIC issues so we
may use some implicit method to check the behavioral features to identify the
problematic devices.” With such observations, we further leverage the information of
issue-triggering context to model FIC issues and design detection tools. More details
are discussed in Section 4.1.
Answer to RQ2: Device and API-level contexts are two key types of triggering
contexts of FIC issues. They are orthogonal and the triggering contexts of real-
world issues can be combinations of the two types of contexts. Such context information is widely used in FIC issue patching and can provide guidance for
FIC issue modeling and detection.
FIC issues can cause inconsistent app behavior across devices. Such behavioral
inconsistency can be functional or non-functional. We studied the inconsistent
behaviors discussed in the code comments and issue reports to investigate the
consequences of FIC issues. We discuss some common consequences below.
Figure 3.8 shows the distribution of the common consequences induced by FIC
issues. 185 (84.1%) of the 220 FIC issues are functional and performance issues.
Their consequences resemble those of conventional functional and performance issues.
The vast majority (176 / 220 = 80.0%) of them are functional issues. Performance
issues only account for a minority (9 / 220 = 4.1%) of the 220 issues found. As
aforementioned in Section 3.2.1, the functional issues can cause an app to crash (e.g.,
AnkiDroid issue 289 in Figure 3.5) or not function as expected (e.g., CSipSimple
issue 353 in Figure 3.3). The performance issues can slow down an app. For instance,
AnkiDroid introduced a simple interface specifically for low-end devices in revision
35d5275 [16] to deal with an issue that slows down the app. In addition to slowing
down an app, some performance issues can cause an app to consume an excessive
amount of memory resources. For example, K-9 Mail fixed an issue in revision
1afff1e [42] caused by an old API delete() of the SQLiteDatabase class, which does
not delete database journal files, resulting in a significant waste of storage space.
The remaining 35 (15.9%) of the 220 issues do not cause functional or performance
problems, but still seriously affect user experience. These issues are not bugs per se.
Some of them lead to UI variations on devices with different screen sizes. For example,
AnySoftKeyboard added ScrollView to enhance the UI of the app on small-screen
devices in revision 8b911f2 [18]. Other issues require support for device-specific
features. For example, in revision e3a0067, AnkiDroid developers revised the app to
use Amazon market instead of Google Play store to support app store features on
Amazon devices [16]. If Google Play store is used on Amazon devices, the app store
features cannot function properly. These issues present a major challenge to Android
app developers [12,122] and are a primary reason for bad user reviews of an app [125].
App developers often need to fix such issues by enhancing or optimizing their code to
interoperate with the diversified features of different devices to guarantee satisfactory
user experience.
In our interviews, the participants also shared with us their experiences regarding
consequences of FIC issues. All of them have encountered FIC issues arising from UI
display variations. Despite the commonness of such issues, they are mostly “tolerable”
and “subjective”. For example, P1 stated “The determination of UI problems is
subjective. For some devices, it is not possible to implement exactly the same user
interface as the designs. These interface differences are not necessarily FIC issues
if they do not cause serious problems”. P4 held similar opinions: “If UI display
problems do not affect the major functionalities, we will consider it tolerable and may
not fix it immediately”. Functional problems are also common and the interviewees
especially care about crashing issues. P1, P2 and P3 stated that they use crash
reporting systems to collect crash information (including device model information
and API levels) from end users for crash diagnosis. They observed crashes that only
occurred on certain device models. P2 stated “We will fix these crashes if they affect
Answer to RQ3: FIC issues can cause both functional and non-functional consequences, such as app crashes, malfunctions, performance degradation, and user experience degradation. In practice, Android app developers pay the most attention
to FIC issues causing functional problems, especially app crashes.
[Figure: perceived importance of FIC issue testing before app release — very important: 33.1%; important: 41.7%; normal: 22.1%; not important: 3.1%.]
Test platforms. Physical Android devices and emulators are widely used for
Android app testing. Existing commercial online platforms also provide cloud-
based services for app developers to test their apps with remote Android devices.
We surveyed what testing platforms Android developers use to test FIC issues.
Figure 3.10 shows the distribution of commonly used testing platforms. The vast
majority of the respondents (152 out of 163 respondents) use “real devices” to test
FIC issues and a considerable number (78) of the respondents use emulators to test
FIC issues. Although testing on real devices is preferred, acquiring an adequate
coverage of such devices for testing is expensive. Android device emulators, which
support configurable API levels and screen sizes, provide a nice alternative for testing
non-device-specific and UI display issues. A respondent commented that emulators
“cannot reveal problems caused by interactions between hardware and software”. They
cannot emulate the actual Android systems on real devices to test device-specific
issues. 32 (19.6%) of 163 respondents have used online testing platforms. Although
online testing platforms provide a good alternative to access various device models,
they have not yet been widely adopted. We designed a follow-up short answer
question for those respondents who have used online testing platforms to name the
specific online platforms they used and to identify the advantages and disadvantages
of these platforms. In their answers, they criticized online platforms mainly from three aspects: (1) high testing costs, (2) a limited number of supported device models, and (3) a limited number of supported testing tools (e.g., the Android debug bridge). This information could guide online testing providers to
further improve their products and services. The remaining six respondents selected
the “other” choice and specified that they also use their own built test farms or
crowd-based testing platforms to test FIC issues.
[Figure 3.10: testing platforms used by respondents — real devices (152), emulators (78), online testing platforms (32), other (6).]
Testing frameworks and tools. Various testing frameworks and tools have
been designed to facilitate app testing. For example, Monkey is a tool maintained by Google that randomly generates user event sequences and automatically exercises Android apps [87]. We are interested in learning if there are (1) tools that the
respondents commonly used for testing (general purpose), (2) tools that they found
effective for FIC issue testing, and (3) other non-testing tools that they used for
finding FIC issues.
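For reference, a typical Monkey invocation sends a fixed number of pseudo-random events to a single app; the package name below is a placeholder:

    adb shell monkey -p com.example.app -s 42 --throttle 300 -v 500

Here -p names the target package, -s fixes the random seed for reproducibility, --throttle inserts a delay (in milliseconds) between events, -v increases verbosity, and 500 is the number of events to inject.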
Figure 3.11 shows the common frameworks/tools used by our respondents for
app testing. The most commonly used tools are Monkey and AndroidJUnitRunner.
In our survey and interviews, none of our respondents or interviewees named an
effective tool for FIC issue testing and one respondent indicated that he or she
manually tested FIC issues. Existing tools are primarily designed for general testing
purposes of Android apps and cannot resolve unique challenges of FIC issue testing
such as the huge search space discussed in Section 3.2.5. In our interviews, all of our
interviewees claimed that their apps are mainly manually tested on different devices
owned by the company for finding FIC issues. P4 commented that his company
has a dedicated team for improving app compatibility by manually testing the apps
on diversified device models. Among our survey respondents, some respondents
mentioned that Lint can help expose some non-device-specific issues, but it often generates a large number of false positives and is not helpful in finding device-specific
issues as discussed in Section 3.2.1. This suggests a lack of effective testing support
for FIC issues that can resolve the unique challenges of FIC issue testing. Future
studies can focus on designing specific testing techniques that facilitate FIC issue
testing by resolving the challenges discussed in the next section.
[Figure 3.11: frameworks/tools used by respondents for app testing — Monkey, AndroidJUnitRunner, Espresso, UI Automator, Appium, Robotium, and other; Monkey and AndroidJUnitRunner were the most common.]
To answer RQ5, we further performed the following tasks: (1) we studied the
challenges developers encountered when diagnosing and fixing FIC issues, (2) we
interviewed developers on how they resolve FIC issues, (3) we analyzed the patches
applied to the 220 issues in our dataset, and (4) we analyzed the discussions in the
issue reports that we have successfully associated with our collected FIC issues. We
aim to identify (1) the major challenges encountered by developers when diagnosing
and fixing FIC issues, (2) how app developers debug and fix FIC issues, (3) whether
FIC issue patches exhibit common patterns, and (4) how complicated the FIC issue
patches are.
Challenges. Figure 3.12 shows the challenges that our survey respondents encountered when resolving FIC issues. A respondent may put down more than one challenge during the online survey. Here, we discuss the major challenges that received votes from over half of our respondents.
The top-ranked challenge is the huge search space: 104 of our respondents considered the huge search space a challenge that complicates FIC issue diagnosis and fixing. The search space for Android FIC issues not only involves the numerous components and APIs used by Android apps but also includes runtime device platforms, as FIC issues may only be triggered on particular device platforms. The search space for device platforms can be a combination of device models and API levels. Android developers need to make tremendous efforts to expose, debug, and patch FIC issues within such a huge search space.
Another prominent challenge is to locate the root causes of FIC issues, which often
reside in the system or device APIs rather than in the application logic. Thus, locating
such root causes likely requires close examination of the software and hardware
environments of the concerned Android devices. However, device firmware and
implementation details of hardware drivers are mostly inaccessible to developers. In
our empirical study, we also observed cases in which app developers could not find a
valid patch and gave up fixing the issues. For example, issue 1491 of CSipSimple [23]
was caused by the buggy call log app on an HTC device. However, since the
implementation details are not accessible, CSipSimple developers failed to figure out
the root cause of the issue and hence left the issue unfixed. Below is their comment
after multiple users reported similar issues (issue 2436 [23]):
“HTC has closed source this contact/call log app which makes things almost impossible
to debug for me.”
Identifying valid patches. When the root causes occur at the operating system
or device driver levels, developers tend to call their patches “workarounds” rather
than “fixes”. This is because they mostly resolved the issues by finding a way to work
around the problems, making their apps compatible with the problematic devices.
[Figure 3.12: Challenges encountered when resolving FIC issues (number of respondents, 0–120): Root Cause, Test Case, Other.]
In the situations where the root causes cannot be located, app developers derived
workarounds by trial and error. In many cases, they leveraged information collected
by crash reporting systems and asked volunteers to help test different workarounds
or sought advice from other developers whose apps suffered from similar issues.
Our interviewees indicated that they usually fix FIC issues based on guesses and
experience. For crashing issues, they leveraged the stack traces collected from crash
reporting systems for diagnosis. Patches of such issues are often validated by checking
if the patch deployment leads to a reduction of crashes. Another information source
for FIC issue diagnosis and fixing is end users’ feedback. All of our interviewees
agreed that user feedback can help them diagnose and fix FIC issues. P3 said “User
feedback is a very important information source for problems of our app. We have a
forum where users can provide all kinds of feedback for products in our company.
Many real problems are actually identified by our users”. P1 also agreed that user
feedback can facilitate FIC issue diagnosis and fixing, but he also indicated that the
quality or usefulness of the feedback strongly depends on the background of the users.
He gave us an example: “One of our users who has reported an issue is a developer
himself so he can provide very important debugging information for us. However,
for normal users, they may just complain on the problems but cannot provide any
useful information”.
• The most common pattern (154 out of 220 patches) is to check explicit device
context or API level (e.g., manufacturer, SDK version) before invoking the
APIs that can cause FIC issues on certain devices. One typical way to avoid
the issues is to skip the API invocations (see Figure 3.5 for an example) or
to replace them with an alternative implementation on problematic devices.
This can help fix issues caused by problematic driver implementations, Android
platform API evolution, as well as Android system bugs.
• Another common strategy (14 out of 220 issues) is to check the availability
of certain software or hardware components on devices (i.e., implicit device
context) before invoking related APIs and methods. This is a common patching
strategy for FIC issues caused by OS customizations and peculiar hardware
compositions on some device models.
The patches are usually simple and small in size (comprising several lines of
code). Many patches only require adding a condition that checks device information
before invoking issue-inducing APIs. Similarly, checking the presence of certain
software/hardware components usually only requires adding a try-catch block or a
null pointer checking statement. Two-thirds of the 220 issues in our dataset were
patched by these two kinds of patches.
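For illustration, below is a minimal sketch of these two patterns; the vendor name, API level, and method names are hypothetical and are not taken from any studied app.

import android.os.Build;

public class FicPatchPatterns {
    // Pattern 1: check explicit device context or API level before the call.
    void invokeWithDeviceCheck() {
        if (Build.VERSION.SDK_INT < 21
                && "SomeVendor".equalsIgnoreCase(Build.MANUFACTURER)) {
            return; // skip the issue-inducing API on problematic devices
        }
        invokeIssueInducingApi();
    }

    // Pattern 2: probe implicit device context (component availability)
    // with a try-catch block instead of an explicit device check.
    void invokeWithAvailabilityCheck() {
        try {
            invokeIssueInducingApi();
        } catch (UnsupportedOperationException e) {
            // component missing on this device model; fall back gracefully
        }
    }

    private void invokeIssueInducingApi() {
        // placeholder for an API that misbehaves on certain devices
    }
}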
Answer to RQ5: FIC issue diagnosis and fixing are challenged by the huge
search space, complications in setting up specific software/hardware environments
for reproducing issues as well as the difficulties of locating the root causes. The
majority of FIC issue fixes follow a simple pattern: performing careful checking
of device contexts or API levels before invoking issue-inducing APIs.
In this section, we discuss two possible applications of our study findings and
illustrate how we can help developers combat FIC issues.
Automated detection and repair of FIC issues. By studying the root causes
and patches of the FIC issues, we observed that FIC issues recur (e.g., Android
issue 78377 discussed in Section 3.2.1) in multiple apps and share similar fixes. This
suggests the possibility of generalizing the observations made by studying the 220 FIC
issues to automatically detect and repair FIC issues in other apps. As discussed in
Section 3.2.5, some app developers fixed the FIC issues in their apps by referencing
the patches of similar issues in other apps. This indicates the transferability of such
knowledge. Furthermore, we invited our respondents to share their own ideas on
possible tools that can be useful for FIC issues in our survey. Some respondents
proposed similar ideas on FIC issue documentation and detection. For example,
two respondents hoped to have public platforms or databases to archive known
FIC issues from the field and one respondent further suggested that a static code
analysis tool built based on the database would also be helpful. This motivates us
to design an automated FIC issue detection tool that generalizes the knowledge obtained
in our empirical study. To demonstrate the feasibility, we manually extracted 20
concrete issue patterns from our dataset and designed a static analysis tool to check
for potential FIC issues in Android apps by pattern matching. In Section 4.1, we will
present how to encode the issue patterns and the algorithm for issue detection. We
will also show by experiments that such a tool can help detect real and unreported
FIC issues in many real-world Android apps in Section 4.2.
Chapter 4
With our empirical findings, we propose a static code analysis technique, FicFinder,
to detect FIC issues in Android apps automatically. The detection is based on a
model that captures issue-inducing APIs and the issue-triggering context of each
pattern of FIC issues. FicFinder takes an Android app’s Java bytecode or .apk
file as input, performs static analysis, and outputs its detected compatibility issues.
In the following, we first explain the modeling of compatibility issue patterns and
then describe the detection algorithm of FicFinder.
From the root causes and fixes of FIC issues, we observed that many issues are
triggered by the improper use of an Android API, which we call issue-inducing API, in
a problematic software/hardware environment, which we call issue-triggering context.
This motivates us to encode compatibility issue patterns into pairs of issue-inducing
API and issue-triggering context, or API-context pairs for short. We observed that
most issue-triggering contexts can be expressed in the following context-free grammar.
Let us illustrate this using an example extracted from two issues that affected
AnkiDroid (discussed in pull requests 128 and 130 [16]).
API: SQLiteDatabase.disableWriteAheadLogging(),
Context: API level < 16 ∧ Dev model != “Nook HD”
This simple example models the issue that invoking the API
SQLiteDatabase.disableWriteAheadLogging causes the app to crash on devices other than “Nook HD”
running Android systems with an API level lower than 16 [16]. In this example,
API level < 16 is an API-level condition and Dev model != “Nook HD” is a device
condition. Such an API-context pair can be extracted from apps that have fixed the
compatibility issue, and used for detecting its occurrences in other apps.
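As a hedged sketch (assuming the fix takes the common guarding form, not AnkiDroid's exact patch; the "Nook HD" model-string check is an assumption), a compatible call site could look like:

import android.database.sqlite.SQLiteDatabase;
import android.os.Build;

class WriteAheadLoggingGuard {
    // Per the API-context pair: the call crashes on devices other than
    // "Nook HD" when the API level is lower than 16, so only invoke it
    // when the context is known to be safe. (Exact device check assumed.)
    static void disableWalSafely(SQLiteDatabase db) {
        if (Build.VERSION.SDK_INT >= 16 || "Nook HD".equals(Build.MODEL)) {
            db.disableWriteAheadLogging();
        }
    }
}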
The algorithm works as follows. For each API-context pair acPair, it first searches
for the call sites of the issue-inducing API of concern (Line 2). For each call site
callsite, it then performs two checks (Line 4): (1) it checks whether the usage of the
API matches the problematic usage defined in acPair, if there is a condition regarding
API usage; (2) it checks whether the supported SDK version configuration of the
app matches the problematic configuration defined in acPair, if there is a condition
regarding API level. The checks for API usage are specific to the issue-triggering
context CTX defined in each acPair and need to be specifically implemented. The SDK
version configuration is checked against the minimum SDK version (or API level)
supported by the app of concern. For instance, for an API requiring a minimum
SDK version, the algorithm will check the minimum SDK version specified in the
app's AndroidManifest.xml file, a mandatory configuration file of an Android app.
If it is newer than the required SDK version, invocations of this API will not be
regarded as issues. With these two checks, FicFinder can filter out call sites that
do not match the issue-triggering context at an early stage. If both checks pass, the
algorithm proceeds to check the device or API-level related conditions defined in acPair
in the API calling contexts. These checks require analyzing the statements that the
issue-inducing API invocation statement transitively depends on. For this purpose,
FIC issues. Fortunately, this limitation does not significantly affect FicFinder's
performance as it is uncommon for an invocation of an issue-inducing API and its
context checks to be implemented in different event handlers. Our evaluation results
show that FicFinder reports few false warnings.
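To make the detection flow concrete, here is a compact sketch of the core loop; all type and method names are our own illustrative stand-ins, not FicFinder's actual implementation.

import java.util.List;

class FicFinderSketch {
    interface CallSite { }
    interface ApiContextPair {
        String issueInducingApi();
        boolean matchesApiUsage(CallSite cs);     // API-usage condition, if any
        boolean matchesSdkConfig(int minSdk);     // API-level condition, if any
        boolean contextCheckPresent(CallSite cs); // guarded by a device/API-level check?
    }
    interface App {
        List<CallSite> findCallSites(String api);
        int minSdkVersion(); // read from AndroidManifest.xml
    }

    void detect(App app, List<ApiContextPair> pairs) {
        for (ApiContextPair acPair : pairs) {
            for (CallSite cs : app.findCallSites(acPair.issueInducingApi())) {
                // Early filtering by API usage and SDK version configuration.
                if (!acPair.matchesApiUsage(cs)) continue;
                if (!acPair.matchesSdkConfig(app.minSdkVersion())) continue;
                // Warn only if no matching context check guards the call site
                // among the statements it transitively depends on.
                if (!acPair.contextCheckPresent(cs)) {
                    System.out.println("Potential FIC issue at " + cs);
                }
            }
        }
    }
}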
4.2 Evaluation
• RQ6: (Issue detection effectiveness): Can FicFinder, which is built with the
20 API-context pairs extracted from our collected FIC issues, help detect unknown
FIC issues in real-world Android apps?
• RQ7: (Usefulness of FicFinder): How do app developers view the FIC issues
detected by FicFinder? Can FicFinder provide useful information for app
developers to facilitate the FIC issue diagnosis and fixing process?
1. We first crawled from F-Droid all the 1,258 apps whose source code repositories
are hosted on GitHub.
2. Among the 1,258 apps, we then selected all the 750 apps that invoke at least
one of the issue-inducing APIs in our API-context pair list by scanning the
.apk files crawled from F-Droid. We performed this step as FicFinder can
only report FIC issues with regard to our known API-context pairs. Inclusion
of subjects that do not even invoke the APIs of concern will not return any
results and is not meaningful for our FicFinder evaluation.
4. As we need to manually verify the experiment results for each app, which
is labor-intensive, we further filtered the apps by their categories. We first
identified the Google Play store page for each app and collected the app’s
category information to classify the 96 apps into different categories. If a
category contains no more than five apps, we selected all the apps in the
category as our experimental subjects. If a category contains more than five
apps, we only selected five of them as experimental subjects based on the
number of stars their projects received on GitHub. To select representative
subjects, we sorted the apps in the category according to the number of stars
and selected the first app (i.e., the one that received the largest number of
stars), the first quartile, the median, the third quartile and the last app in the
sorted list. For apps for which we failed to identify a corresponding Google Play
store page, we grouped them into one special category for subject selection.
induce issues on device models running API levels lower than 15. According to the
statistics released by Google [24], such devices rarely exist in practice now. As a
result, these API-context pairs were removed and replaced by new ones derived from
newly collected FIC issues.
issues can be easily identified, it is more likely that they have already been fixed
by the developers, and thus fewer issues could be detected by the corresponding
API-context pairs.
Besides the warnings, FicFinder also found 78 good practices in 27 out of the
53 apps analyzed, covering 10 distinct API-context pairs. This confirms that FIC
issues are common in Android apps and developers often fix such issues to improve
the compatibility of their apps.
One may also observe from Table 4.1 that there are several apps for which
FicFinder did not report any TPs, FPs, or GPs. As discussed earlier, in the experiments,
we considered the APIs in the 20 API-context pairs as issue-inducing and scanned
the .apk files to make sure that the concerned APIs are used in our app subjects.
Therefore, it is expected that FicFinder would report either warnings or good
practices. However, it is not the case for several apps. We looked into this unexpected
phenomenon and found out the reasons. First, the .apk files on F-Droid are release
versions of our app subjects, which sometimes are old snapshots of the apps, but our
experiments were conducted on the latest development versions of the apps available
We applied Lint to the 53 app subjects and manually analyzed the warnings
generated by the checkers that are relevant to the APIs in our 20 API-context
pairs. We categorized the warnings as true positives (TPs) and false positives
(FPs), leveraging the same criteria that we adopted when inspecting those warnings
generated by FicFinder. Column “Lint Results” in Table 4.1 shows the evaluation
results of Lint. In total, Lint reported 11 warnings, four of which are true positives,
i.e., the precision is 36.4%. In comparison, FicFinder reported 129 true positives
and achieved 91.5% precision. All the four true positives reported by Lint were also
reported by FicFinder, but Lint missed 125 true positives reported by FicFinder,
among which, 100 have already been acknowledged or fixed by the app developers.
This shows that FicFinder is more effective at detecting FIC issues than Lint.
We further inspected the true positives missed by Lint and figured out the major
reason. While Lint supports detecting FIC issues caused by the use of deprecated
and removed APIs, it does not support detecting subtle issues that are induced by the
improper use of those APIs whose behaviors have significantly evolved. For example,
Forecastie issue #246 was induced by the API AsyncTask.execute, whose behavior
changed at API level 11 (the tasks scheduled via this API will run sequentially on a
single background thread rather than a pool of threads on devices with API level 11
or higher). Lint could not detect such common issues.
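A hedged sketch of the usual accommodation (assuming the app relies on the pre-11 parallel behavior) is to request the thread pool explicitly:

import android.os.AsyncTask;

class ParallelExecutionFix {
    // Since API level 11, AsyncTask.execute() schedules tasks serially on a
    // single background thread; executeOnExecutor() restores parallelism.
    static void runInParallel(AsyncTask<Void, Void, Void> task) {
        task.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR);
    }
}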
We also studied the false positives reported by Lint and found the main reason for
Lint's low precision. (To the best of our knowledge, Lint is the only widely-used
static analysis tool that supports compatibility checking for Android apps.) The
false positives are reported at the call sites of issue-inducing APIs that have already
been properly guarded by conditional statements preventing the APIs from being
invoked at certain API levels. In all these cases, the guarding condition and
the API call sites are located in different methods. Identifying the dependencies
between the call sites and the context checks requires inter-procedural analysis that
is capable of handling long chains of method calls. Lint, which aims to provide
real-time feedback to developers, could not capture such complex inter-procedural
dependencies and therefore reported false positives. These findings may explain
why some of our survey respondents mentioned that Lint can only help expose
certain non-device-specific issues and often generates a large number of false positives
(see the “Testing frameworks and tools” part of Section 3.2).
While FicFinder detects more FIC issues induced by the APIs in our API-
context pair list and achieves high precision, it could miss valid FIC issues that
can be detected by Lint since its current version only supports a small number
of API-context pairs manually extracted from our empirical study. This can be
improved by automating the API-context pair extraction process and incorporating
Lint’s checking rules, which we plan to explore in our future work.
Answer to RQ6: The API-context pairs extracted from existing FIC issues can
help FicFinder effectively detect unknown FIC issues in real-world Android
apps. FicFinder achieved high precision and acceptable recall, and outperforms
the widely-used tool Lint.
reports were quickly fixed by app developers (marked with “b ” in Table 4.1) and (2)
the warnings reported in the remaining ten reports will be fixed according to the
developers’ feedback or have already been assigned to certain developers (marked with
“a ” in Table 4.1). Other than the 22 acknowledged ones, eight of our 39 submitted
issue reports are still pending. There are also nine issue reports that were closed or
labeled as “won’t fix” by developers because the reported issues do not cause serious
consequences to the apps (marked with “c ” in Table 4.1).
Besides generating warnings, our tool can also provide information including
issue-related discussions, API guides, and possible fixes to help developers diagnose
and fix FIC issues. Such information was all collected from our empirical study.
In its output, FicFinder provides links to online discussions or API guides.
FicFinder can also suggest possible fixes because the patches for FIC issues are
small and demonstrate common patterns. As discussed in Section 3.2, the majority
of our studied issues were patched by explicitly checking device context or API
levels before invoking issue-inducing APIs. The patches suggested by FicFinder
also follow this pattern. To study whether such provided information is helpful, we
included them in our submitted issue reports. We learnt developers’ opinions on FIC
issues by communicating with them and made several interesting findings.
First, for the warnings mentioned in 12 issue reports, developers adopted our
suggested fixes. Developers of ownCloud [61] and Stocks Widget [78] asked for pull
requests and have merged our submitted ones after careful code reviews. Other
developers fixed the issues as we suggested. For example, in Twidere #974, we
reported an issue caused by invoking an API defined in the class CookieSyncManager,
which was designed to synchronize the browser cookie store between RAM and
permanent storage. The class has been deprecated since API level 21 and another
class, CookieManager, can provide full functionality to manage the cookies since
then. We suggested to the Twidere developers that they avoid invoking this API
from API level 21 onward. Figure 4.1 shows the patch adopted by the app developers for
this issue. The issue was fixed by adding a guarding condition to check the API level
of the running device and only invoking the API when the API level is lower than 21.
Such results show that the issue diagnosis information and fix suggestions provided
by FicFinder are useful to developers. They also indicate that the knowledge and
developers’ practices learned from our studied FIC issues can be transferable across
apps.
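A guard of this shape (a hedged sketch rather than the app's exact diff shown in Figure 4.1) looks like:

import android.os.Build;
import android.webkit.CookieSyncManager;

class CookieSyncCompat {
    // CookieSyncManager is deprecated since API level 21, where CookieManager
    // handles cookie persistence itself; sync manually only on older devices.
    // Assumes CookieSyncManager.createInstance() was called at initialization.
    static void syncCookiesIfNeeded() {
        if (Build.VERSION.SDK_INT < 21) {
            CookieSyncManager.getInstance().sync();
        }
    }
}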
Second, while our issue reports facilitate developers to identify potential FIC
issues in their apps, whether to fix such issues depends on the apps’ functionality
and purpose. For issues mentioned in nine of our issue reports, developers chose not
to fix them. For instance, we reported to K-9 Mail developers that the behavior of
the AlarmManager.set() API has changed since API level 19: using this API on
devices with a higher API level does not guarantee the scheduled task to be executed
exactly at the specified time. However, the developers considered that such API
behavior change would not significantly affect their app’s main functionality (i.e.,
their scheduled tasks do not necessarily need to be executed precisely at a specific
time) and did not proceed to fix the issues, but linked our issue report to two existing
unresolved bugs. Other issues that are closed without fixes are due to similar reasons.
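For apps that do need precise delivery, a hedged sketch of the usual guard is:

import android.app.AlarmManager;
import android.app.PendingIntent;
import android.os.Build;

class ExactAlarmCompat {
    // Since API level 19, AlarmManager.set() is treated as inexact; apps that
    // require exact delivery must call setExact() on such devices.
    static void scheduleExact(AlarmManager am, long triggerAtMillis, PendingIntent pi) {
        if (Build.VERSION.SDK_INT >= 19) {
            am.setExact(AlarmManager.RTC_WAKEUP, triggerAtMillis, pi);
        } else {
            am.set(AlarmManager.RTC_WAKEUP, triggerAtMillis, pi);
        }
    }
}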
On the other hand, there are also apps whose functionality could be affected by this
issue. The app Stocks Widget, which is a home widget that displays stock price
quotes, is a typical example. AlarmManager is used to schedule timely updates for
stock prices. The developers confirmed our issue report and quickly fixed the reported
issue. From these examples and the comparison, we can see that it is difficult to determine
whether an FIC issue affects an app's functionality by static analysis alone.
To address this problem, we plan to leverage dynamic analysis to further validate
the impact of FIC issues in our future study.
4.3 Discussions
of our findings may not generalize to commercial apps. We mitigated this threat
by conducting surveys and interviews with developers of commercial Android apps.
As discussed in Section 3.2, the survey and interview findings are consistent with
our empirical study findings. In addition, our proposed tool FicFinder can also
be applied to commercial apps. Our interviewees and survey respondents showed
interest in the tool. The tool also helped one of our interviewees find a real issue in
the product of his team during our interview.
Subject selection. The validity of our empirical study results may be subject
to the threat that we only selected five open-source Android apps for the study.
However, these five apps were selected from 27 candidate apps as they contain the most
FIC issues. The apps are also diverse, covering four different categories. More
importantly, we observed that the findings obtained from studying the 220 real FIC
issues in them are useful and helped locate previously-unknown FIC issues in many
other apps.
FIC issue selection. Our FIC issue selection strategy may also pose threats to
the validity of our empirical study results. We understand that using our current
strategy, we could miss some FIC issues in dataset preparation. For example, our
keywords do not contain less popular Android device brands. To reduce the threat,
we made an effort to find the code revisions whose diff contains code that checks the
device information encapsulated in the android.os.Build class. This indeed helped
us find issues affecting many small-brand devices (e.g., Wiko and Nook).
On the other hand, some existing empirical studies also selected issues by keyword
searching in the issue tracking system of open-source apps [119, 143]. However, such
strategies are inapplicable to this work due to two reasons. First, FIC issues usually
do not have specific labels, and thus are mixed with various other issues. Second,
as app developers often ask users to provide device information when reporting
bugs, simply searching device related keywords can return many results that are
irrelevant to FIC issues.
In the future, we plan to automate the API-context pair extraction process and improve
the detection capability of FicFinder. With more API-context pairs, we can then
conduct larger-scale studies to evaluate FicFinder’s recall.
Errors in manual checking. The last threat to the validity of our findings is
the errors in our manual checking of FIC issues. We understand that such a process
is subject to mistakes. To reduce the threat, we followed the widely-adopted open
coding approach [109] and cross-validated all results for consistency. We also release
our dataset and results for public access [30].
CiD does not support detecting device-specific FIC issues such as those caused
by problematic driver implementation or non-compliant OS customization. This is
because such issues often arise from closed-source system components customized
by device manufacturers while CiD requires Android framework code as input for
learning FIC issue patterns. However, as we have pointed out in our study, device-
specific FIC issues widely exist in real-world Android apps. To the best of our
knowledge, our work is the first systematic study of such device-specific issues.
In our empirical study, we observed that FIC issues can be triggered on a variety
of Android device models. All kinds of device models can exhibit their own specific
behaviors and trigger compatibility issues, regardless of their brand or popularity. We
also observed that Android app developers tend to complain more about FIC issues
on popular Android device models. On the one hand, such device models may be
deeply customized. On the other hand, such popular device models are used by more
users. FIC issues that can be triggered on such device models are more likely to be
identified and complained about by app users.
Chapter 5
The FicFinder work [163] found that FIC issues demonstrate patterns: they
often arise when certain APIs are invoked on certain device models. We observed
that although FicFinder successfully detected previously-unknown FIC issues, its
major limitation is that the patterns required for FIC issue detection were manually
extracted from an empirical study. As the Android ecosystem evolves quickly, the
applicability of these once effective issue patterns gradually diminishes. For example,
among the 25 issue patterns used by FicFinder, 12 are now outdated because the
device models that can trigger the issues have faded out from the market.
provided by the Android SDK. Such conditional statements characterize the existence
of code snippets that handle FIC issues. For illustration, we show a code snippet
that handles a low frame rate issue in camera preview on Nexus 4 in Figure 5.1.
When invoking the camera API on Nexus 4 to take photos, the default frame rate
for photo preview is low (6–10 frames per second), making the preview choppy. The
patch applies a workaround specifically for Nexus 4 by setting the recording mode
hint to true. At Line 4, the code checks the app’s runtime environment. If the device
model is Nexus 4, the workaround is applied (Lines 4–6). From this example, we
observe that the conditional statement (Line 4) encapsulates the information of the
device model (Nexus 4) that can trigger the FIC issue. The API call (Line 5) that
is dependent on the conditional statement demonstrates a strong correlation with
the FIC issue. By identifying such conditional statements and API calls that are
dependent on the conditions, we can recover API-device correlations to characterize
FIC issues.
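A hedged reconstruction of such a snippet (following the shape described above; the surrounding code is illustrative, not the exact patch):

import android.hardware.Camera;
import android.os.Build;

class CameraPreviewWorkaround {
    // On Nexus 4, the default camera preview frame rate is low (6-10 fps);
    // hinting recording mode restores a smooth preview.
    static void configurePreview(Camera camera) {
        Camera.Parameters params = camera.getParameters();
        if ("Nexus 4".equals(Build.MODEL)) {
            params.setRecordingHint(true);
        }
        camera.setParameters(params);
    }
}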
Currently, Pivot does not include API levels in the extracted API-device corre-
lations. This is because we found that it is uncommon (16.7%) to check both device
models and API levels in the code snippets that handle FIC issues according to the
dataset published by Wei et al. [163]. The underlying reason may be that many
devices’ operating systems rarely get major updates after the devices are shipped. As
a result, most FIC issues can be patched without checking API levels. Therefore, we
include only the information of APIs and device models in our model of API-device
correlations.
API-device correlations can help reduce the search space of FIC issues and save
testing efforts. For example, the API-device correlation extracted from Figure 5.1
suggests the existence of a FIC issue when using the camera of a Nexus 4. With such
information, developers can focus on testing the app components that use cameras on
Nexus 4. This can help trigger the FIC issue quickly. Otherwise, triggering the issue
would require developers to extensively test their apps on a huge number of different
devices. Let us analyze the reduction of testing efforts. Assume that developers
carefully test their apps using Amazon Device Farm [25], a widely-used online testing
platform providing over 200 device models including Nexus 4. According to an
existing study [145], a popular Android app contains 50 entry methods on average.
Then, app developers may need to test 10,000 (200 × 50) combinations of entry
methods and device models to trigger the issue. Such testing efforts are unaffordable
for most development teams. As a result, FIC issues would likely be left undetected in
the released apps. Comparatively, knowing the correlation between Android camera
APIs and Nexus 4, developers can focus on testing a few entry methods that involve
camera APIs on Nexus 4 to expose potential FIC issues before releasing their apps.
API-device correlations can also be analyzed to derive rules to automate FIC issue
detection and patching. Let us consider the example in Figure 5.1 again. From the
API-device correlation ⟨Parameters.setRecordingHint(boolean), “Nexus 4”⟩, we can
infer that Android camera APIs may induce FIC issues on “Nexus 4”. We can also
infer that a possible patch is to invoke setRecordingHint and set the flag to true.
We can further validate the observations by checking relevant resources online. For
example, by a search on Google, we can find concrete evidence (e.g., [21,40]) showing
that camera APIs indeed can cause FIC issues on Nexus 4. Via such analyses, we
can build a knowledge base of FIC issues containing their patterns and high-quality
patches. The issue patterns can be used as inputs to compatibility analysis tools such
as FicFinder [163]. The issue patches can serve as templates to help developers
fix FIC issues and support future research on automatically repairing compatibility
issues. To show the feasibility of this application scenario, in our evaluation, we
derived five FIC issue patterns based on learned API-device correlations. With these
issue patterns, FicFinder detected 39 previously-unknown FIC issues in popular
open-source Android apps. For instance, with the Nexus 4 camera issue pattern,
FicFinder detected three new issues. The developers also quickly fixed these three
issues according to our suggested patch (i.e., invoking setRecordingHint). More
details are discussed in Section 5.3.2.
Options menu crash issue on LG devices. Figure 5.2 shows a code snippet
handling an infamous crashing FIC issue triggered by pressing the physical menu
button on some LG devices if the corresponding options menu is customized. LG's
solution (Lines 2–7) helps avoid app crashes by explicitly opening the options
menu (Line 5) rather than calling the onKeyUp() method of the super class. With
the example, we now illustrate the major steps and challenges in learning valid
API-device correlations from an Android app corpus.
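Reconstructed from the control-flow sketch in Figure 6.2c (a hedged sketch, not the exact app code), the workaround has roughly this shape:

import android.app.Activity;
import android.os.Build;
import android.view.KeyEvent;

public class MenuActivity extends Activity {
    @Override
    public boolean onKeyUp(int keyCode, KeyEvent event) {
        // LG's solution: on LG devices, open the customized options menu
        // explicitly instead of delegating to the super class, which crashes.
        if (keyCode == KeyEvent.KEYCODE_MENU && isLGE()) {
            openOptionsMenu();
            return true;
        }
        return super.onKeyUp(keyCode, event);
    }

    private boolean isLGE() {
        return Build.MANUFACTURER.compareTo("LGE") == 0;
    }
}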
Huge and evolving FIC issue search space. The existing API usage mining
techniques assume that the majority of API usages are correct and adopt conventional
statistical metrics (i.e., confidence or support). However, this assumption may not
hold for API-device correlation learning. Since the search space of FIC issues is huge
and evolving, in practice, FIC issues are commonly left unhandled in released apps.
As such, many valid API-device correlations, especially those related to new FIC
issues, do not exhibit high confidence or support.
[Figure 6.2c: Inter-procedural control flow graph of the LG workaround in Figure 5.2, with numbered blocks: (1) keyCode == KEYCODE_MENU; (2) call isLGE(); (3) return MANUFACTURER.compareTo("LGE") == 0; (4) b1 = isLGE(); (5) b1 == true; (6) openOptionsMenu(); return true; (7) return super.onKeyUp(keyCode, event).]
First, the correlation extractor performs static analysis to extract raw API-device
correlations from the input apps. Then the correlation prioritizer ranks the extracted
API-device correlations.
Building inter-procedural control flow graphs. For each input app, the
correlation extractor builds an inter-procedural control flow graph by combining
the app’s call graph and each method’s control flow graph. Figure 6.2c shows the
inter-procedural control flow graph for the code snippet in Figure 5.2. For ease
of presentation, we label each block in the graph with a unique number. Note
that Android apps are event-driven. When building inter-procedural control flow
graphs, Pivot does not consider the implicit control flow between different event
handlers. As reported in a prior study, it is unlikely that FIC issue patches would
cross event handlers [163]. This means that the pieces of code by which an API-device
correlation is learned likely reside in the same event handler or its callees. As a
result, eliminating control flows between event handlers may not significantly affect
API-device correlation extraction.
In this step, the API-device correlations extracted in the first step are prioritized
based on their likelihood of capturing real FIC issues. As discussed in Section 5.1.3,
existing techniques cannot effectively filter invalid API-device correlations. To
prioritize API-device correlations, we propose a new approach that leverages two
metrics: in-app confidence and occurrence diversity. The two metrics are inspired by
our observations discussed in Section 5.1.3.
In-App Confidence
New FIC issues continually arise with the release of new device models and the
evolution of Android platforms. These issues may not be commonly and quickly
fixed in real-world Android apps. Therefore, the conventional metric confidence,
which is based on an API-device correlation’s popularity over the whole app corpus,
cannot effectively prioritize API-device correlations. On the other hand, although
FIC issues may not be commonly fixed, we observed that once developers of an app
identify a real FIC issue, they tend to modify all callsites of the issue-inducing API
within the app to fix the issue. The callsites of APIs irrelevant to FIC issues are
mostly not guarded by any device constraints. As such, we propose to use in-app
confidence (IAC ) to prioritize API-device correlations. The IAC of an API-device
correlation c in an app A is defined as:
\[ IAC(A, c) = \frac{\#\text{ occurrences of } c \text{ in } A}{\#\text{ callsites of } c.api \text{ in } A} \tag{5.1} \]
where c.api is the API of c. Intuitively, IAC computes how often the invocation of
c.api is guarded by a device-checking statement that checks the app’s runtime device
model against the device model captured in c.
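As an illustrative example with hypothetical numbers: if an app A contains four
callsites of c.api and three of them are guarded by checks of the device model
captured in c, then IAC(A, c) = 3/4 = 0.75.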
The total in-app confidence metric wIAC (c) for an API-device correlation c is the
sum of IACs of all apps in the corpus that contain the API-device correlation:
\[ wIAC(c) = \sum_{A \in \text{corpus}} IAC(A, c) \tag{5.2} \]
Occurrence Diversity
widely applied in ecology to measure the diversity of species [103, 159]. The Shannon
index is computed based on the distribution of different groups of entities in a
population and is defined as follows:
\[ H = -\sum_{i=1}^{n} p_i \log p_i \tag{5.3} \]
App-level diversity measures the diversity of the apps that contain the instances
of c. We use app names and app company names, which can be extracted from app
markets, as identifiers to distinguish different groups of apps.
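For illustration with hypothetical numbers (taking the natural logarithm): if the
instances of c are spread evenly across apps from four different companies, each
p_i = 0.25 and H = ln 4 ≈ 1.39, whereas if all instances come from a single company's
apps, H = 0; that is, a higher H indicates more diverse occurrences.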
Ranking Score
The extracted API-device correlations are ranked based on their ranking scores. If
the ranking score of an API-device correlation is higher, it is more likely to capture
FIC issues.
5.3 Evaluation
Pivot from the perspective of Android app developers, we reported all of our
detected FIC issues to their original app developers and sought their feedback.
Sections 5.3.1 and 5.3.2 report the details of the two experiments and answer RQ1–2.
The experiments were conducted on a Linux server running CentOS 7.4 with two
Intel Xeon E5-2450 octa-core CPUs @2.1 GHz and 192 GB RAM.
We analyzed the APK files of the apps in the two corpora and collected the apps’
metadata from Google Play [38]. Table 5.1 shows the statistics. As shown in the
table, the two app corpora both have tens of millions of classes and over one hundred
million methods. The apps are also of high quality and popular. The average rating
is 4.1 out of 5. On average, each app received over 16 and 14 million downloads in
Corpus 2017 and Corpus 2018, respectively.
Extractor (Section 5.2.1) with the technique’s filtering and ranking components.
Specifically, the API-device correlations were first filtered by their support. Those
API-device correlations that only occurred once in a corpus were filtered out. The
API-device correlations were then ranked by the multiplication of method-level
confidence and project-level (i.e., app-level) confidence. App-level confidence of an
API-device correlation c is computed as the ratio of the apps that contain c’s instances
to the apps that contain callsites of c’s API. Similarly, method-level confidence of an
API-device correlation c is the ratio of the methods containing callsites of c’s API
that are guarded by statements checking the device identifier modeled in c to the
methods that contain the callsites of c’s API.
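In symbols (our own notation, not necessarily the baseline paper's), the two ratios can be written as:

\[ conf_{\mathit{app}}(c) = \frac{|\{A : A \text{ contains an instance of } c\}|}{|\{A : A \text{ contains a callsite of } c.api\}|}, \qquad conf_{\mathit{method}}(c) = \frac{|\{m : m \text{ contains a guarded callsite of } c.api\}|}{|\{m : m \text{ contains a callsite of } c.api\}|} \]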
[Figure 5.5: Precision@N (%) for the top N (1–50) API-device correlations: Pivot 2017, Pivot 2018, Baseline 2017, Baseline 2018.]
in Figure 5.2, the patch suggested by LG developers was to explicitly invoke an API
to open the options menu (Line 5) [45, 77]. As a result, the API-device correlation
⟨Activity.openOptionsMenu(), “LGE”⟩ is considered valid. We published the valid
API-device correlations and the corresponding online discussions at our project
website [63].
Results of Experiment I
Pivot extracted 32,047 and 24,355 API-device correlations from Corpus 2017 and
Corpus 2018, respectively. To evaluate the effectiveness of Pivot, we (1) randomly
sampled 50 API-device correlations extracted from each app corpus to show the
pervasiveness of noise, (2) evaluated Precision@N (N = 1, 2, 3, ..., 50) for the top 50
API-device correlations in each ranked list, (3) compared the results of Pivot for
the two different app corpora, and (4) compared the results of Pivot with those of
the baseline approach.
code snippets. This shows that noise is pervasive among the API-device correlations
extracted from the app corpora (approximately 97%). Identifying valid API-device
correlations is a significant challenge.
Precision@N of Pivot. Figure 5.5 plots the results of Experiment I. The two
solid lines show the results of Pivot for the two app corpora. In both corpora,
Pivot achieves 100% precision among the top five API-device correlations and over
90% precision among the top ten. The precision gradually drops as N grows. This
shows that Pivot can effectively prioritize valid API-device correlations. Pivot
identified 29 valid API-device correlations among the top 50 (58% precision@50) for
Corpus 2017 and 28 valid ones among the top 50 for Corpus 2018 (56% precision@50).
Among these 29 and 28 valid correlations, eight overlap. This means that
Pivot identified a total of 49 distinct valid API-device correlations.
Comparison with baseline. The two dashed lines in Figure 5.5 show the results
of the baseline approach for the two app corpora. Pivot significantly outperforms
the baseline approach. The baseline approach failed to rank any valid API-device
correlations to top 50 for Corpus 2017. For Corpus 2018, the baseline approach
only identified one valid API-device correlation at the 14th position with a small
We further studied why the baseline approach performed poorly. We found that
most of its top-ranked API-device correlations involve rarely-used APIs. For example,
13 API-device correlations received confidence of 1.0 and were ranked top by the
baseline approach for Corpus 2018. However, none of them are valid. This shows
that learning API-device correlations differs from mining API preconditions. General
API precondition mining techniques cannot be applied to identify valid API-device
correlations.
Experiment II Setup
and reproduce undiscovered instances of our archived FIC issues. In the following,
we discuss the details of this study.
Issue selection. From our archive, we selected five FIC issues to carry out the
case study. We provide videos to demonstrate the inconsistent app behaviors caused
by these issues on our project website [63]. We selected these five issues because:
(1) the device models that can trigger the issues are available on Amazon Device
Farm [8] or WeTest [94] and (2) the inconsistent app behaviors caused by the issues
are observable on the online testing platforms. The first criterion enables us to
reproduce the later detected FIC issues on online testing platforms. The second
criterion allows us to have a clear oracle to determine the occurrence of FIC issues.
App selection. To find undiscovered instances of the five FIC issues in real-world
Android apps, we collected the latest versions of 44 apps indexed by F-Droid [28].
All these 44 apps (1) have received at least 50 stars on GitHub [33], (2) have over
500 commits, (3) have at least one push during a five-month period before the case
study, and (4) contain at least one callsite of any API related to the five FIC issues.
These criteria ensure that our selected apps (1) are popular and well-maintained and
(2) could be susceptible to the selected FIC issues.
Issue detection and reproduction. We encoded each of the five selected issues as a
rule in FicFinder's API-context pair format. We then ran
FicFinder using these rules as input to analyze the 44 Android app subjects.
FicFinder reported warnings for 35 of these subjects. We manually inspected these
35 apps and excluded those that require special hardware/software environments to
run (e.g., a bluetooth watch or specific server setups) from this study. As a result, we
focused on reproducing 19 detected issues in 19 apps using Amazon Device Farm [8]
and WeTest [94].
Among the 19 detected issues, we successfully reproduced ten in ten different apps.
We failed to reproduce the remaining nine due to three major reasons. First, some
apps did not exhibit inconsistent behavior as they only use the minor functionalities
of the issue-inducing APIs. For example, some apps use the Camera API to turn
on the flash light to use the phone as a torch. In such cases, the camera preview
was not displayed and the issues could not be observed. Second, FicFinder generated
false positives because it failed to recover issue workarounds that already exist in
the apps. Third, we failed to reach the API callsites of several apps because they
crashed prematurely.
Table 5.2 gives the information of the ten apps whose FIC issues were successfully
reproduced. These apps (1) contain thousands of lines of code (large-scale), (2)
cover different app categories (diversified), (3) receive over 50 stars on GitHub and
thousands of downloads on Google Play (popular), and (4) are rated at least 3.8 on
Google Play (decent quality). We reported the ten reproducible issues to the original
app developers to seek their feedback. The issue IDs of our reports are provided in
the last column of Table 5.2.
Our reported FIC issues received due attention from app developers. Among
the ten reported issues, seven have been acknowledged and four were immediately
fixed. For example, issue 92 of Calendula would crash the app when invoking
the built-in DatePicker API on some popular Samsung devices running Android 5.0
(e.g., Galaxy Note 5). Upon receiving our report, the app developer quickly fixed
the issue with a workaround documented in our issue archive [63].
These results show that API-device correlations can effectively help detect po-
tential FIC issues in Android apps and reproduce the issues on the corresponding
issue-triggering device models. The information provided by our FIC issue archive
(Section 5.3.2) is actionable and can facilitate the detection and diagnosis of FIC is-
sues. Specifically, the device identifiers and APIs specified in API-device correlations
can help reduce the search space of FIC issues. With the information, developers
can design tests to cover the app components that call FIC issue-inducing APIs and
execute these tests on the device models that can trigger the FIC issues to expose
potential issues. Compared with an unguided testing process, such a guided process
saves much testing effort (Section 5.1.2).
5.4 Discussions
Quality of input app corpora. Pivot can only identify valid API-device
correlations from apps that contain code snippets handling FIC issues. It is unlikely
that such code snippets would exist in apps that are not well-maintained. As such,
the quality of app corpora can affect the performance of Pivot. It is suggested to
build an app corpus with popular apps on app stores as we did in our experiments.
Pivot can then extract valid and useful API-device correlations.
We did not apply Pivot to the apps used by the empirical study in FicFinder [163,
164] to learn API-device correlations because the empirical study involved only five
apps, making it difficult for Pivot to distinguish valid API-device correlations from
noises. Alternatively, we compared the FIC issues in FicFinder’s dataset with
those learnt by Pivot. Ten of the 49 distinct valid API-device correlations learned
by Pivot are related to FIC issues in FicFinder’s dataset. Among the other 39
valid API-device correlations that were not included in FicFinder's dataset, 14
concern FIC issues that emerged in 2017 (FicFinder's dataset was published
in 2016). This confirms that Pivot can identify new valid API-device correlations.
In Section 5.3.2, we have also shown that such automatically learned knowledge can
help detect undiscovered FIC issues in real-world Android apps.
Chapter 6
6.1 Approach
6.1.1 Overview
[Figure 6.1: Overview of Pixor — the pipeline comprises the Subroutine Extractor, API Graph Constructor, Subroutine Selector, Test App Generator, Test App Runner, and Platform-Specific Behavior Analyzer, taking APK files as input and producing test reports.]

We propose Pixor based on the key observation that the large number of existing
Android apps, which contain a great variety of Android platform API use cases, can
be viewed as test cases to exercise diverse system functionalities and expose platform-
specific behaviors. This motivates us to leverage the large number of existing Android
apps to generate test cases and expose platform-specific behaviors by comparing the
behavior of the same test cases on different Android versions. Android apps contain
real and diverse use cases of Android APIs, which may not be easily covered by test
cases prepared by system testers. As a result, test cases generated from apps in
the wild are likely to expose platform-specific behaviors that escaped the Android
system testing procedure.
While the existing Android apps contain prolific Android API use cases, many of
them can be equivalent in exercising system functionalities and exposing platform-
specific behaviors. Blindly testing platform-specific behaviors can waste testing
effort by repeatedly exercising the same system behaviors. This poses the
key challenge in this work: how to guide the exploration of system functionalities
so that diverse system behaviors are exercised while redundant testing efforts are
minimized?
• Apps can be large, containing a large amount of code. Directly selecting apps
to conduct system testing can be too coarse-grained. The first challenge faced
by Pixor is: How to break down the input apps into subroutines so that the
subroutines can be easily executed while preserving their ability to trigger
UI-related platform-specific behaviors?
• How to estimate the system functionalities that can be exercised by the app
subroutines to identify equivalent subroutines and avoid exercising the same
system functionalities?
• Provided with the extracted subroutines and their exercised system func-
tionalities, the core challenge faced by Pixor is: how to leverage them to
effectively select subroutines to exercise diversified system functionalities to
expose UI-related platform-specific behaviors?
Subroutine Extractor first breaks down the input apps into subroutines, which
later serve as units for test selection. Desirable subroutines should be easy to
execute and their ability to trigger UI-related platform-specific behaviors should be
preserved.
To address this challenge, we made a key observation that Android apps use
system functionalities via Android API calls. We thereby propose to model API
use cases to estimate their exercisable system functionalities. Pixor models API
use cases as directed graphs, denoted as API Graphs. API Graph Constructor first
constructs an API Graph for each of the extracted subroutines and then merges
all the individual to construct a universal API Graph. The individual API Graphs
and universal API Graph abstracts the API use cases of individual subroutines and
the whole input app corpus respectively. In this section, we will (1) introduce our
definition of API Graph, (2) discuss the reasons why we model API use cases as
API Graphs, (3) introduce how API Graphs can be constructed from the input app
corpus, and (4) briefly discuss the relationship between individual API Graphs and
the universal API Graph.
[Figure 6.2: (a) an example API Graph with weighted edges involving findViewById(); (b) the corresponding code snippet, which checks viewPager != null, calls setupViewPager(), creates an Adapter from getSupportFragmentManager(), and invokes viewPager.setAdapter(adapter).]
Figure 6.2a shows an example of an API Graph, and Figure 6.2b shows the corresponding code snippet from which the
API Graph is built. A node in an API Graph represents an Android API. Each
distinct Android API will have only one node in the API Graph. An edge in an
API Graph represents invocation order between the APIs. Specifically, an edge from
AP I1 to AP I2 indicates that AP I1 is invoked before AP I2 and there is not other
Android API invocations between them. For example, the edge from “findViewById”
to “ViewPager.setAdapter” indicates that there exist paths in the program where
“findViewById” is invoked before “ViewPager.setAdapter” and no other APIs of
interest are invoked between them.
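A minimal sketch of such a graph structure (illustrative types of our own, not Pixor's implementation; the edge weights shown in Figure 6.2a are omitted for brevity):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ApiGraph {
    // Nodes are distinct Android API signatures; a directed edge a1 -> a2
    // records that a1 is invoked before a2 with no other modeled API call in
    // between, e.g., findViewById -> ViewPager.setAdapter.
    private final Map<String, Set<String>> successors = new HashMap<>();

    void addEdge(String api1, String api2) {
        successors.computeIfAbsent(api1, k -> new HashSet<>()).add(api2);
        successors.computeIfAbsent(api2, k -> new HashSet<>());
    }

    Set<String> nodes() {
        return successors.keySet();
    }

    // Merging individual subroutine graphs Gi yields the universal graph Gu.
    void merge(ApiGraph other) {
        for (Map.Entry<String, Set<String>> e : other.successors.entrySet()) {
            successors.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                      .addAll(e.getValue());
        }
    }
}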
Why API Graph? There can be many alternative ways to model API use
cases. We chose to leverage such a directed graph to model API use cases due to the
flexibility it brings: API use case representations at different granularities can be
further derived from the graph-based model. For example, the simplest representation
leverages only the node set in the graph (i.e., the set of invoked APIs) to represent
API use cases. Conversely, we can also derive complicated paths from the
graph to capture interdependencies between the APIs. Since different API use case
representations can be suitable for different application scenarios, our API Graph
provides a great flexibility to fit in different testing scenarios.
To sum up, Gu abstracts the API use cases in all the apps in the input app
corpus, capturing the invoked APIs and their invocation order relationships. If the
app corpus is large enough, Gu can estimate the real-life API use cases in the wild.
In contrast, Gi is a subgraph of Gu and abstracts the API use cases that can be
covered by subroutine Si . This information can be leveraged to guide the selection
of subroutines to expose platform-specific behaviors.
In our context, Gu can be viewed as the universal set of available API use cases.
API Graphs of individual subroutines can be viewed as its subsets. Our subroutine
selection problem can thereby be formalized as selecting a set of subroutines, whose
corresponding API Graphs can maximize the coverage of the universal graph Gu .
We can thereby apply the greedy algorithm for the set cover problem to design
our subroutine selection strategies. Algorithm 6.1 shows the greedy algorithm for
typical set cover problems. Intuitively, in each iteration, the greedy algorithm picks
the subset Si that covers the largest number of uncovered elements. The algorithm
stops when all the elements in U are covered by the selected subsets.
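A compact sketch of this greedy procedure over generic element sets (illustrative code mirroring the textbook algorithm, not Pixor's exact implementation):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GreedySetCover {
    // Greedy set cover: repeatedly pick the subset covering the most
    // still-uncovered elements until the universe U is covered.
    static <T> List<Set<T>> cover(Set<T> universe, List<Set<T>> subsets) {
        Set<T> uncovered = new HashSet<>(universe);
        List<Set<T>> selected = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            Set<T> best = null;
            int bestGain = 0;
            for (Set<T> s : subsets) {
                Set<T> gain = new HashSet<>(s);
                gain.retainAll(uncovered);
                if (gain.size() > bestGain) {
                    bestGain = gain.size();
                    best = s;
                }
            }
            if (best == null) break; // remaining elements cannot be covered
            selected.add(best);
            uncovered.removeAll(best);
        }
        return selected;
    }
}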
Nodes in API Graphs: invoked APIs. One simple and intuitive representation
is to treat the nodes in API Graphs (i.e., the invoked APIs) as elements
in sets. The universal set will be defined as the set of all the APIs in the universal
API Graph constructed from the whole input Android app corpus. Each subset,
Si , of a subroutine will be defined as the set of APIs invoked in the subroutine.
This representation only models single API invocations but cannot capture any
dependencies between APIs.
The Test App Generator finally transforms the selected subroutines into runnable test
apps. In general, the Test App Generator modifies the configuration file of an app so
that the selected subroutine can be directly executed once the app is launched.
Note that when transforming the apps, Pixor does not carve out or change the code of
the input apk files. The generated test apps are fed into the Test App Runner.
Test App Runner performs differential testing. It executes the generated apps
on both the reference Android version and the Android version under test. The Test
App Runner executes the test apps on each Android version three times and
produces a screenshot for each run of the test app. The generated screenshots are
later analyzed to identify the triggered platform-specific behaviors. We run the test
apps on each version three times to mitigate the noise induced by non-deterministic
our analysis to identify platform-specific behaviors from the generated screenshots.
6.2 Evaluation
Data Collection
To evaluate the effectiveness of Pixor, we collected 3,243 apk files from Google
Play [38]. These apps are all top free apps of each category on Google Play collected
in April 2019. The number of apk files falls short of the complete top-app lists since
we failed to download some of the apps due to server errors. We further filtered out
those apk files that cannot be successfully executed on the emulators. We also filtered
out those apk files whose target SDK versions are lower than 23 since their behavior is changed on
Android Q [2]. As a result, 2,479 apk files are left as the evaluation subjects and
were input to Pixor.
subroutines to exercise Android systems. We ran the baseline method three times
and it selected 5,000 subroutines each time. We reported the average results of the
three runs as the final result of this baseline method.
Ground Truth
Identified Bugs
In our experiments, Pixor has identified 44 distinct subroutines that can trigger
platform-specific behaviors. We manually inspected all the instances and identified
seven distinct platform-specific behaviors in Android Q Beta version.
We filed bug reports for all of these identified platform-specific behaviors in the
Android Issue Tracker to seek feedback from Android system developers. Of the
seven bugs, three are confirmed and have been fixed during the development of the
beta versions. One bug was identified as an intended behavior change. One of our
reports was changed to private, so we cannot follow its updates anymore. The other
two bugs were still pending at the time of the submission of this thesis.
In addition, while both Greedy Edge and Greedy Node subroutine selection
strategies identified six distinct platform-specific behaviors, the Greedy Edge subroutine
selection strategy identified a larger number of platform-specific behavior instances.
This shows that this finer-grained subroutine selection strategy may induce more
redundancy. This is reasonable since some platform-specific behaviors may not
require a specific API invocation order to be triggered. When leveraging edges in API
Graphs to guide subroutine selection, considering the invocation order between APIs may
lead Pixor to select multiple subroutines that can trigger such platform-specific behaviors
and result in redundancy.
6.3 Discussions
6.3.1 Limitations
Not able to locate the specific APIs that can trigger the identified platform-specific
behaviors. Currently, Pixor can only leverage existing Android apps to trigger
platform-specific behaviors of different Android versions. It cannot locate the
specific APIs that trigger the platform-specific behaviors, and manually identifying
such APIs is also non-trivial. As a result, we cannot derive API-context pairs for
detecting FIC issues in Android apps from the platform-specific behaviors triggered
by Pixor. In the future, we plan to study how to locate the specific APIs that
trigger the platform-specific behaviors and automatically derive API-context pairs
from the subroutines. Specifically, we may cluster the subroutines that trigger the
same platform-specific behavior and identify the APIs they invoke in common, as
sketched below.
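
As a first cut, this analysis could simply intersect the invoked-API sets of the
subroutines in a cluster. The following sketch is illustrative only and assumes
each subroutine has already been summarized as its set of invoked API signatures.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    /** Sketch of the planned analysis: for subroutines triggering the same
     *  platform-specific behavior, intersect their invoked-API sets to
     *  surface candidate triggering APIs. */
    public class CommonApiFinder {

        public static Set<String> commonApis(List<Set<String>> subroutineApis) {
            if (subroutineApis.isEmpty()) return Collections.emptySet();
            Set<String> common = new HashSet<>(subroutineApis.get(0));
            for (Set<String> apis : subroutineApis.subList(1, subroutineApis.size())) {
                common.retainAll(apis); // keep APIs invoked by every subroutine
            }
            return common;
        }
    }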
API parameters are not considered in API Graph. Our API Graph does not
contain any information about the values of API parameters, which may also be
triggering conditions of platform-specific behaviors. We do not consider parameter
values at this stage since (1) it is non-trivial to precisely infer parameter values
via static analysis and (2) taking parameter values into account would further
complicate the representation and greatly enlarge the search space.
Chapter 7
Future Work
correlations may not be included in any clusters. We will also investigate how to
distinguish issue-inducing and issue-fixing API-device correlations in the clusters,
and how to derive API-context pairs and issue-fixing patterns from the clusters.
Guide subroutine selection with other information. Pixor currently leverages API
invocations and their invocation orders to guide subroutine selection. Other
information may also be used to guide the selection, such as the arguments passed
to the APIs and the user interface layout files. Leveraging such information can
help model Android API use cases at finer granularities and may improve the
effectiveness of Pixor in selecting subroutines.
Trigger user events after launching apps. Some platform-specific behaviors can
only be exposed after triggering user events. For example, some platform-specific
behaviors break the functionality of specific widgets: a ScrollView, for instance,
may not be scrollable on some specific Android versions. Pixor currently does not
trigger any user events after launching the selected Activities. In the future, we
plan to integrate user-event-triggering components into Pixor to expose such
platform-specific behaviors, as sketched below.
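
As a simple illustration of such a component, the sketch below injects a swipe
gesture with the standard `adb shell input` command and takes before/after
screenshots; if the two images are identical, the probed view may not be scrollable
on that Android version. The coordinates and timing are illustrative assumptions,
not part of Pixor.

    import java.io.IOException;

    /** Sketch of a user-event probe: swipe, then compare screenshots. */
    public class ScrollProbe {

        static void adb(String... args) throws IOException, InterruptedException {
            String[] cmd = new String[args.length + 1];
            cmd[0] = "adb";
            System.arraycopy(args, 0, cmd, 1, args.length);
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }

        public static void probe() throws Exception {
            adb("shell", "screencap", "-p", "/sdcard/before.png");
            // Swipe upwards in the middle of the screen: x1 y1 x2 y2 duration(ms).
            adb("shell", "input", "swipe", "500", "1500", "500", "300", "300");
            adb("shell", "screencap", "-p", "/sdcard/after.png");
            // Pull both images and compare them off-device (comparison elided).
        }
    }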
Chapter 8
Conclusion
Based on the findings of our study, we proposed to use API-context pairs that
capture issue-inducing APIs and issue-triggering contexts to model FIC issues.
With this model, we further designed and implemented a static analysis technique,
FicFinder, to automatically detect FIC issues in Android apps. Our evaluation
of FicFinder on 53 large-scale subjects showed that FicFinder can detect many
previously unknown FIC issues in the apps and provide useful information to help
developers diagnose and fix the issues.
strategy to prioritize them. The evaluation results show that our ranking strategy
can effectively identify valid API-device correlations and significantly outperform
an existing technique. Based on the learned API-device correlations, we built an
archive of FIC issues and further conducted a case study to show the usefulness of
API-device correlations.
Our latest effort aimed to actively expose platform-specific behaviors that can
induce FIC issues. We proposed Pixor, which breaks existing Android apps down
into subroutines and leverages them to perform differential testing across Android
versions. Pixor effectively selects a minimized number of subroutines that
maximizes the exercised Android system functionalities. The evaluation results
show that Pixor outperforms the random baseline methods and effectively exposes
platform-specific behaviors.
Our future work will focus on improving both Pivot and Pixor. Currently,
there is still noise in the top-ranked API-device correlations produced by Pivot.
In the future, we plan to improve Pivot's precision and reduce duplication in its
results by leveraging clustering techniques. We may also study how to automate
the validation of API-device correlations and reduce the human effort required by
Pivot. As for Pixor, we may explore the possibility of locating the exact APIs
that induce platform-specific behaviors and further deriving API-context pairs
from the subroutines.
Ph.D. Publications
Thesis-related publications
• Lili Wei, Yepang Liu, S.C. Cheung, Huaxun Huang, Xuan Lu, and Xuanzhe
Liu. Understanding and Detecting Fragmentation-Induced Compatibility Issues
for Android Apps. In TSE 2018: IEEE Transactions on Software Engineering.
24 pages. Accepted, to appear.
• Lili Wei, Ming Wen, Yepang Liu, Shing-Chi Cheung, and Yuanyuan Zhou. Pixor:
Exposing Platform-Specific Bugs in Android Systems via Differential App
Executions. Under submission.
Other publications
• Cong Li, Chang Xu, Lili Wei, Jue Wang, Jun Ma, and Jian Lu. ELEGANT:
Towards Effective Location of Fragmentation-Induced Compatibility Issues for
Android Apps. In APSEC 2018: Proceedings of the 25th Asia-Pacific Software
Engineering Conference, pp. 278–287, Nara, Japan, Dec 2018.
• Huaxun Huang, Lili Wei, Yepang Liu, and S.C. Cheung. Understanding and
Detecting Callback Compatibility Issues for Android Applications. In ASE
2018: The 33rd IEEE/ACM International Conference on Automated Software
Engineering, pp. 702–713, Montpellier, France, Sep 2018.
• Jiajun Hu, Lili Wei, Yepang Liu, S.C. Cheung, and Huaxun Huang. A
Tale of Two Cities: How WebView Induces Bugs to Android Applications.
In ASE 2018: The 33rd IEEE/ACM International Conference on Automated
Software Engineering, pp. 702–713, Montpellier, France, Sep 2018.
• Lili Wei, Yepang Liu, and S.C. Cheung. OASIS: Prioritizing Static Analysis
Warnings for Android Apps Based on App User Reviews. In ESEC/FSE 2017:
The 11th Joint Meeting of the European Software Engineering Conference and
the ACM SIGSOFT Symposium on the Foundations of Software Engineering,
pp. 672–682, Paderborn, Germany, Sep 2017.
References
[14] Android’s Fragmentation Problem. http://www.greyheller.com/Blog/androids-fragmentation-problem/, 2019-01-17.
[33] GitHub. https://github.com/, 2019-01-17.
[53] OctoDroid. https://github.com/slapperwan/gh4a, 2019-01-17.
[74] Smartphone OS Market Share, 2015 Q2. http://www.idc.com/prodserv/smartphone-os-market-share.jsp, 2019-01-17.
[82] There are now more than 24,000 different Android devices. https://qz.com/472767/there-are-now-more-than-24000-different-android-devices/, 2019-01-17.
[93] Weechat Android. https://github.com/ubergeek42/weechat-android, 2019-01-17.
[99] Yousra Aafer, Xiao Zhang, and Wenliang Du. Harvesting inconsistent security
configurations in custom Android ROMs via differential analysis. In 25th
USENIX Security Symposium (USENIX Security 16), pages 1153–1168, 2016.
[100] Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API Patterns as
Partial Orders from Source Code: From Usage Scenarios to Specifications. In
ESEC/FSE, pages 25–34. ACM, 2007.
[101] Frances E Allen. Control flow analysis. In ACM SIGPLAN Notices, volume 5,
pages 1–19. ACM, 1970.
[104] Kai Chen, Peng Liu, and Yingjun Zhang. Achieving Accuracy and Scalability
Simultaneously in Detecting Application Clones on Android Markets. In ICSE,
pages 175–186, 2014.
[105] Yang Chen, Alex Groce, Chaoqiang Zhang, Weng-Keen Wong, Xiaoli Fern,
Eric Eide, and John Regehr. Taming compiler fuzzers. In ACM SIGPLAN
Notices, volume 48, pages 197–208. ACM, 2013.
[106] Yuting Chen, Ting Su, and Zhendong Su. Deep differential testing of JVM
implementations. In Proceedings of the 41st International Conference on
Software Engineering, pages 1257–1268. IEEE Press, 2019.
[107] Yuting Chen, Ting Su, Chengnian Sun, Zhendong Su, and Jianjun Zhao.
Coverage-directed differential testing of JVM implementations. In ACM SIG-
PLAN Notices, volume 51, pages 85–99. ACM, 2016.
[108] Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. Automated
Test Input Generation for Android: Are We There Yet? In ASE, pages 429–440,
2015.
[109] John W. Creswell. Qualitative Inquiry and Research Design: Choosing Among
Five Approaches (3rd Edition). SAGE Publications, Inc., 2013.
[110] Emilio Cruciani, Breno Miranda, Roberto Verdecchia, and Antonia Bertolino.
Scalable approaches for test suite reduction. In Proceedings of the 41st In-
ternational Conference on Software Engineering, pages 419–429. IEEE Press,
2019.
[111] Lingling Fan, Ting Su, Sen Chen, Guozhu Meng, Yang Liu, Lihua Xu, Geguang
Pu, and Zhendong Su. Large-Scale Analysis of Framework-Specific Exceptions
in Android Apps. In ICSE, pages 459–472, 2018.
[113] Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. The program depen-
dence graph and its use in optimization. TOPLAS, 9(3):319–349, 1987.
[114] Jaroslav Fowkes and Charles Sutton. Parameter-Free Probabilistic API Mining
across GitHub. In FSE, pages 254–265, 2016.
[115] John C Gower and Gavin JS Ross. Minimum Spanning Trees and Single
Linkage Cluster Analysis. Applied statistics, pages 54–64, 1969.
[116] David Grove, Greg DeFouw, Jeffrey Dean, and Craig Chambers. Call Graph
Construction in Object-Oriented Languages. In OOPSLA, pages 108–124,
1997.
[117] Matthew Halpern, Yuhao Zhu, Ramesh Peri, and Vijay Janapa Reddi. Mo-
saic: Cross-Platform User-Interaction Record and Replay for the Fragmented
Android Ecosystem. In ISPASS, pages 215–224, 2015.
[118] Hyung Kil Ham and Young Bom Park. Designing Knowledge Base Mobile
Application Compatibility Test System for Android Fragmentation. IJSEIA,
8(1):303–314, 2014.
[119] Dan Han, Chenlei Zhang, Xiaochao Fan, Abram Hindle, Kenny Wong, and
Eleni Stroulia. Understanding Android Fragmentation with Topic Analysis of
Vendor-Specific Bugs. In WCRE, pages 83–92, 2012.
[120] Edith Hemaspaandra and Gerd Wechsung. The Minimization Problem for
Boolean Formulas. In Foundations of Computer Science, 1997. Proceedings.,
38th Annual Symposium on, pages 575–584, 1997.
[121] Andreas Holzinger, Peter Treitler, and Wolfgang Slany. Making Apps Useable
on Multiple Different Mobile Platforms: On Interoperability for Business
Application Development on Smartphones. In CD-ARES, pages 176–189. 2012.
[122] Mona Erfani Joorabchi, Ali Mesbah, and Philippe Kruchten. Real Challenges
in Mobile App Development. In ESEM, pages 15–24, 2013.
[123] Jouko Kaasila, Denzil Ferreira, Vassilis Kostakos, and Timo Ojala. Testdroid:
Automated Remote UI Testing on Android. In MUM, number 28, 2012.
[125] Hammad Khalid, Meiyappan Nagappan, Emad Shihab, and Ahmed E Hassan.
Prioritizing the Devices to Test Your App on: A Case Study of Android Game
Apps. In FSE, pages 610–620, 2014.
[126] Taeyeon Ki, Chang Min Park, Karthik Dantu, Steven Y Ko, and Lukasz Ziarek.
Mimic: Ui compatibility testing system for android apps. In Proceedings of the
41st International Conference on Software Engineering, pages 246–256. IEEE
Press, 2019.
[127] Yunho Kim, Youil Kim, Taeksu Kim, Gunwoo Lee, Yoonkyu Jang, and Moonzoo
Kim. Automated Unit Testing of Large Industrial Embedded Software Using
Concolic Testing. In ASE, pages 519–528, 2013.
[129] Stephen Kyle, Hugh Leather, Björn Franke, Dave Butcher, and Stuart Monteith.
Application of domain-aware binary fuzzing to aid android virtual machine
testing. In ACM SIGPLAN Notices - VEE ’15, volume 50, pages 121–132.
ACM, 2015.
[130] Patrick Lam, Eric Bodden, Ondrej Lhoták, and Laurie Hendren. The Soot
Framework for Java Program Analysis: A Retrospective. In CETUS, 2011.
[131] Vu Le, Mehrdad Afshari, and Zhendong Su. Compiler validation via equivalence
modulo inputs. In ACM SIGPLAN Notices, volume 49, pages 216–226. ACM,
2014.
[132] Owolabi Legunsen, Farah Hariri, August Shi, Yafeng Lu, Lingming Zhang,
and Darko Marinov. An extensive study of static regression test selection in
modern software evolution. In Proceedings of the 2016 24th ACM SIGSOFT
International Symposium on Foundations of Software Engineering, pages 583–
594. ACM, 2016.
[133] Huoran Li, Xuan Lu, Xuanzhe Liu, Tao Xie, Kaigui Bian, Felix Xiaozhu Lin,
Qiaozhu Mei, and Feng Feng. Characterizing Smartphone Usage Patterns from
Millions of Android Users. In IMC, pages 459–472, 2015.
[135] Li Li, Tegawendé F Bissyandé, Jacques Klein, and Yves Le Traon. An In-
vestigation into the Use of Common Libraries in Android Apps. In SANER,
volume 1, pages 403–414, 2016.
[136] Li Li, Tegawendé F Bissyandé, Haoyu Wang, and Jacques Klein. CiD: Au-
tomating the Detection of API-Related Compatibility Issues in Android Apps.
In ISSTA, pages 153–163, 2018.
[137] Li Li, Jun Gao, Tegawendé F Bissyandé, Lei Ma, Xin Xia, and Jacques Klein.
Characterising deprecated android apis. In MSR 2018, 2018.
[139] Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimil-
iano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. API Change and Fault
Proneness: A Threat to the Success of Android Apps. In FSE, pages 477–487,
2013.
[143] Yepang Liu, Chang Xu, and Shing-Chi Cheung. Characterizing and Detecting
Performance Bugs for Smartphone Applications. In ICSE, pages 1013–1024,
2014.
[145] Long Lu, Zhichun Li, Zhenyu Wu, Wenke Lee, and Guofei Jiang. Chex:
Statically Vetting Android Apps for Component Hijacking Vulnerabilities. In
CCS, pages 229–240. ACM, 2012.
[146] Xuan Lu, Xuanzhe Liu, Huoran Li, Tao Xie, Qiaozhu Mei, Dan Hao, Gang
Huang, and Feng Feng. PRADA: Prioritizing Android Devices for Apps by
Mining Large-Scale Usage Data. In ICSE, pages 3–13, 2016.
[147] Tyler McDonnell, Bonnie Ray, and Miryung Kim. An Empirical Study of API
Stability and Adoption in the Android Ecosystem. In ICSM, pages 70–79,
2013.
[149] Atif Memon, Zebao Gao, Bao Nguyen, Sanjeev Dhanda, Eric Nickell, Rob
Siemborski, and John Micco. Taming Google-scale continuous testing. In
Proceedings of the 39th International Conference on Software Engineering:
Software Engineering in Practice Track, pages 233–242. IEEE Press, 2017.
[150] Fatih Nayebi, Jean-Marc Desharnais, and Alain Abran. The State of the Art
of Mobile Application Usability Evaluation. In CCECE, pages 1–4, 2012.
[151] Hoan Anh Nguyen, Robert Dyer, Tien N Nguyen, and Hridesh Rajan. Mining
Preconditions of APIs in Large-Scale Code Corpus. In FSE, pages 166–177,
2014.
[152] Abhinav Pathak, Y Charlie Hu, and Ming Zhang. Bootstrapping Energy
Debugging on Smartphones: A First Look at Energy Bugs in Mobile Devices.
In HotNets, number 5. ACM, 2011.
[153] Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. Deepxplore: Auto-
mated whitebox testing of deep learning systems. In Proceedings of the 26th
Symposium on Operating Systems Principles, pages 1–18. ACM, 2017.
[154] Charles Prud’homme, Jean-Guillaume Fages, and Xavier Lorca. Choco Docu-
mentation. TASC - LS2N CNRS UMR 6241, COSLING S.A.S., 2017.
[155] Murali Krishna Ramanathan, Ananth Grama, and Suresh Jagannathan. Static
Specification Inference Using Predicate Mining. In PLDI, pages 123–134, 2007.
[156] John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun
Yang. Test-case reduction for c compiler bugs. In ACM SIGPLAN Notices,
volume 47, pages 335–346. ACM, 2012.
[157] Martin P Robillard, Eric Bodden, David Kawrykow, Mira Mezini, and Tristan
Ratchford. Automated API Property Inference Techniques. TSE, 39(5):613–637,
2013.
[159] Ian F Spellerberg and Peter J Fedor. A Tribute to Claude Shannon (1916–2001)
and a Plea for More Rigorous Use of Species Richness, Species Diversity and
the “Shannon–Wiener” Index. Global Ecology and Biogeography, 12(3):177–179,
2003.
[160] Yuan Tian, Meiyappan Nagappan, David Lo, and Ahmed E Hassan. What
are the Characteristics of High-Rated Apps? A Case Study on Free Android
Applications. In ICSME, pages 301–310, 2015.
[161] Sergiy Vilkomir and Brandi Amstutz. Using Combinatorial Approaches for
Testing Mobile Applications. In ICSTW, pages 78–83, 2014.
[162] Jue Wang, Yingnong Dang, Hongyu Zhang, Kai Chen, Tao Xie, and Dongmei
Zhang. Mining Succinct and High-Coverage API Usage Patterns from Source
Code. In MSR, pages 319–328. IEEE Press, 2013.
[163] Lili Wei, Yepang Liu, and Shing-Chi Cheung. Taming Android Fragmentation:
Characterizing and Detecting Compatibility Issues for Android Apps. In ASE,
pages 226–237, 2016.
[164] Lili Wei, Yepang Liu, and Shing-Chi Cheung. Understanding and Detecting
Fragmentation-Induced Compatibility Issues in Android Apps. TSE, 2018.
[165] Lili Wei, Yepang Liu, and Shing-Chi Cheung. PIVOT: Learning API-Device
Correlations to Facilitate Android Compatibility Issue Detection. In Proceedings
of the 41st International Conference on Software Engineering, ICSE 2019, 2019.
[167] Lei Wu, Michael Grace, Yajin Zhou, Chiachih Wu, and Xuxian Jiang. The
Impact of Vendor Customizations on Android Security. In CCS, pages 623–634,
2013.
[168] Tao Xie and Jian Pei. MAPO: Mining API Usages from Open Source Reposi-
tories. In MSR, pages 54–57. ACM, 2006.
[169] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and under-
standing bugs in c compilers. In ACM SIGPLAN Notices, volume 46, pages
283–294. ACM, 2011.
[170] Shin Yoo and Mark Harman. Regression testing minimization, selection and
prioritization: a survey. Software Testing, Verification and Reliability, 22(2):67–
120, 2012.
[171] Dongsong Zhang and Boonlit Adipat. Challenges, Methodologies, and Issues
in the Usability Testing of Mobile Applications. IJHCI, 18(3):293–308, 2005.
[172] Hang Zhang, Dongdong She, and Zhiyun Qian. Android ION hazard: The
curse of customizable memory management system. In Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security, pages
1663–1674. ACM, 2016.
[173] Lingming Zhang. Hybrid regression test selection. In 2018 IEEE/ACM 40th
International Conference on Software Engineering (ICSE), pages 199–209.
IEEE, 2018.
[174] Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei. MAPO: Mining and
Recommending API Usage Patterns. In ECOOP, pages 318–343. Springer,
2009.
[175] Xiaoyong Zhou, Yeonjoon Lee, Nan Zhang, Muhammad Naveed, and XiaoFeng
Wang. The Peril of Fragmentation: Security Hazards in Android Device Driver
Customizations. In SP, pages 409–423, 2014.