
2017 Computer Science and Software Engineering (CSSE)

A Quantitative Evaluation of Usability in Mobile Applications: An Empirical Study

Fatemeh Zahra Ghazizadeh
M.Sc. Student
Engineering Department
Golestan University
Gorgan, Golestan, Iran
Mhs.ghazizade@gmail.com

Shiva Vafadar
Assistant Professor
Engineering Department
Golestan University
Gorgan, Golestan, Iran
Sh.vafadar@gu.ac.ir

Abstract— Developing high-quality applications in adequate time and at low cost is the goal of using software engineering methods and processes. Over the years, a focus on usability has shown itself to be one of the cheapest and easiest ways to improve a system's quality (or, more precisely, the user's perception of quality). This quality factor is an important issue in the user experience of mobile applications.

In this paper, our research question is whether using various types of user guide affects the usability of mobile applications. Our contribution is a quantitative evaluation based on automatically logging the user's interactions. From this point of view, the research question is defined in more detail: whether using an animation or a text user guide results in a significant difference in the timing factors of usability, including task completion time, time spent using help, and time used to search for a button to perform a specific function.

To answer this question, we design an empirical study. In this study, a mobile application has been selected and two types of user guide (animation and text) are generated. To run the study, 68 participants took part and were divided into two independent groups. They use two versions of the mobile application: one contains an animated user guide for a functionality, while the other contains a text user guide that describes the same functionality. The activities of the users are logged and used to measure the selected time factors.

Applying the Mann-Whitney U test to the results shows that the animation user guide does not significantly improve the time used to search for a button to perform a specific function, but it yields a 76.9% improvement in time spent using help and a 32.4% improvement in task completion time. In other words, the result of this study shows that the animation user guide decreases the total task completion time and consequently increases the usability of the function in our case study.

Keywords: usability; mobile; user guide; empirical evaluation

I. INTRODUCTION

Software quality is always a challenge in developing software applications. Improving quality in adequate time and at low cost is an important issue in software engineering, and mobile applications are no exception. In past decades, the use of mobile devices has increased enormously, and mobile applications are popular these days. Presenting a usable mobile application that serves various users is an open issue.

Over the years, a focus on usability has shown itself to be one of the cheapest and easiest ways to improve a system's quality (or, more precisely, the user's perception of quality). This quality factor is an important issue in the user experience of mobile applications [1].

According to the ISO definition, usability is the "extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [2]. Software usability is evaluated via different methods. Various studies evaluate it by using questionnaires and qualitative approaches, but these methods are not usually precise and may suffer from inconsistency between the users' answers and their real usage of the application.

In this research, we concentrate on the quantitative evaluation of usability based on the users' practice in the application. From this point of view, we consider usability as a general quality attribute that should be divided into more detailed and measurable factors. We select time and measure it based on the duration a user needs to complete a task. In this paper, three types of time are measured: task completion time [3, 4, 5, 6, 7], time spent using help [3, 5, 7], and time used to search for a button to perform a specific function [6].

Usability of software is achieved by applying various techniques, and improving software learnability is one of them. Providing user manuals in a software system is a method to help learning and, consequently, the usability of the software [8]. In traditional software, manuals took the form of text documents, while new generations usually utilize graphical user guides.

In this paper, we address one research question: does using various types of user guide affect the usability of mobile applications? Based on our quantitative approach, the research question is specified more precisely: whether using an animation or a text user guide results in a significant difference in the timing factors of usability, including task completion time, time spent using help, and time used to search for a button to perform a specific function.


To answer this question, we design an empirical study. In this study, we select an Android calendar application. A popular functionality of the application is chosen, and the interactions of each user with the application are logged automatically (by adding the required source code to the application). Two types of user manual are produced for the application: the animated user guide demonstrates how the user can find the target functionality, while the text user guide describes the same content in words.

To run the study, 68 participants took part. They are divided into two independent groups and use two versions of the mobile application: one contains an animated user guide for a functionality, while the other has a text user guide that describes the same functionality. The activities of the users are logged, and the information is used to measure the selected time factors. The results of the study show that users with the animated guide spend less time in the help (to learn the functionality) than users with the text guide, but both groups need approximately the same time to find the functionality and execute it. This means that there is no significant difference in running the application after they have learned it. Finally, the results show that the total run times of the two groups are different, because the total time is affected by the time spent in the user guide.

The remainder of this paper is organized as follows: In Section 2, related research is described, covering usability models and evaluation in mobile applications. In Section 3, the details of the empirical study are specified, including the variables, participants, tools and process of the study. In Section 4, the results of running the empirical study are presented; we describe the tests, our method to analyze them, and the results of the study in more detail. Finally, in Section 5, we conclude and present our agenda for pursuing the quantitative evaluation of the usability of mobile applications.

II. RELATED STUDY

So far, several models have been developed to evaluate the usability of mobile applications. For example, the usability evaluation framework presented in [11] models the usability factors in a hierarchical manner and evaluates usability through a set of checklists and a rating approach. An empirical evaluation in [12] is performed on 10 applications that were selected based on their popularity; in this qualitative evaluation, a questionnaire is given to a number of users and their scores are then reviewed. The evaluation performed in [13] is only for iOS applications and is based on identifying the sequence of interactions and their pattern. Also, the Compound Evaluation Model [14] qualitatively evaluated three types of Android mobile applications, using aspect-oriented programming to inject code into the application and extract the required interactions. The tools presented in [13], [15], [16], [20] likewise record events associated with user interactions by injecting code into the application.

In this paper, by referring to the factors set out in [4], [5], [6], [3], [7], three types of time are selected: task completion time, time spent using help, and time used to search for a button to perform a specific function.

[8] presented a technique called ToolClips that complements traditional tooltips. This technique provides users with quick access to video guidance in addition to the text guide.

In this paper, following the comparison of heuristic evaluation and user testing in [17], the evaluation is conducted with a real scenario, a real environment and real users, which makes the data more realistic and avoids the problems of laboratory-controlled environments. According to the comparison in [17], heuristic evaluation is more cost-effective because it finds a wide range of usability issues at a lower cost, but user testing can collect more realistic data due to the use of real users.

III. EVALUATION METHOD

In this research, an empirical study was designed and implemented. The purpose of this empirical study is to evaluate the effectiveness of the user guide on usability. In more detail, we compare the influence of animation and text guides on the timing factors of usability. In this study, three types of time are measured: task completion time, time spent using help, and time used to search for a button to perform a specific function. Our contribution in this paper is a quantitative evaluation based on automatically logging the user's interactions. Data are automatically collected and then analyzed using the Mann-Whitney U test. This empirical study is based on the guidelines and rules outlined in [18] and [19].

To ensure that confounding or intrusive variables, which may distort the test results, do not affect the test, we have tried to control the test conditions; the measures taken include:

• All users perform the same task with the same details, so that factors such as the user's manner do not affect the test result.
• The users selected for this experiment did not have previous experience of working with our application.
• Because each person feels more comfortable with their own phone, the application is installed on the user's phone and the tasks are performed on that phone.
• Written explanations about the task are given to users, to ensure that the explanations are the same for all users.
• Data are collected automatically to reduce the amount of data errors.

A. Participants

In this experiment, 68 participants took part, all of whom were students aged 20 to 24 years. Participants were divided into two independent groups of 34. A similar task is given to each group. Each group must first observe the user guide provided in the application before starting to use the application. The first group performed the task with the animation guide and the second group performed the same task using the text guide. The research hypothesis is that the mean total run times of these two groups are different.
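In the notation used for the statistical analysis in Section IV, this hypothesis can be written as follows (our restatement for clarity; the symbols are not taken from the original study design):

\[ H_0: \mu_{\mathrm{animation}} = \mu_{\mathrm{text}} \qquad H_1: \mu_{\mathrm{animation}} \neq \mu_{\mathrm{text}} \]

where \( \mu_{\mathrm{animation}} \) and \( \mu_{\mathrm{text}} \) denote the mean total run time of the animation-guide group and the text-guide group, respectively.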


B. Variables

In this experiment, the impact of two types of user guide on three usability factors is evaluated. The factors are:

• Task completion time
• Time spent using help
• Time used to search for a button to perform a specific function

These time factors are measured and calculated in seconds.

The task completion time factor is the sum of the other two factors plus the time of the other user actions along the task completion path.

In the remainder of this paper, for convenience, task completion time is called the first factor, time spent using help is called the second factor, and time used to search for a button to perform a specific function is called the third factor.

C. Instrumentation

In this study, an Android calendar application has been used for the evaluation. A popular functionality of the application is chosen: the task, as described to the users, is to register a reminder in the application. This task, along with its steps and details, is the same for all users.

The manual shows the user how, and via which path, they can access the reminder registration page. This content is identical for both manuals: in the text guide it is provided as plain text, and in the animation guide as a short animation.

Data collection in this study is done automatically. For this purpose, we have written code and added it to the application to automatically log every user interaction with the application together with the date and time of the event, including the buttons that were pressed, the pages that were visited, and the functions that were called. These data are stored in a text file.

D. Procedure

The following steps have been taken to run the study:

• Initially, a form was given to the users in which they stated personal characteristics, such as their age, level of education and field of study, in order to determine whether the users have the same profile.
• The application was installed on the users' own phones to avoid intervening variables that could have a negative effect on the study.
• The process explained to all users was: first, go to the main menu of the application, find the guide and view it; then, with the help of the observations in the guide, try to register a reminder with the predefined title and specifications given to them. These specifications are the same for all users.
• The users perform the task, and every interaction the user has with the application is automatically recorded and stored in a text file.
• After the task is completed, the text file is taken from the user's phone and the required time factors are extracted from it.

IV. METHOD OF EVALUATION OF RESULTS

To analyze the results, it is necessary to first determine the statistical test for the study. In this study, the users are in two independent groups and the two samples do not have any dependencies; the factors are quantitative and continuous; and the result of the normality test on the observations is that the data are not normal (explained in the following section). Since the Mann-Whitney test is suitable for quantitative non-normal data, we chose the Mann-Whitney test for the data analysis. In this test, the mean of the factors is compared between the two samples. There are two hypotheses in this test:

H0: There is no difference between the two groups
H1: There is a difference between the two groups

After the test, if the significance value (sig.) of the test is greater than 0.05, the hypothesis H0 is accepted, which means that the two groups do not differ; if the significance value is less than 0.05, the hypothesis H0 is rejected. Also, because the number of samples in this study is high, another value called Z should be considered in addition to the sig. value. With a confidence level of 95%, referring to the Z distribution table, if the absolute value of Z obtained from the Mann-Whitney test is greater than 1.64, the hypothesis H0 is rejected. This comparison is done using the SPSS software and determines whether there is a significant difference between the means of the factors in the two groups.

A. Statistical analysis of evaluation results

In this section, the statistical analysis of the study results is described. First, we examine the normality of the data in order to select the appropriate analysis test. Table I shows the results of the normality test for the first factor for both groups, Table II shows the results for the second factor, and Table III shows the results for the third factor. For normal data, the sig. value should be greater than 0.05.

To keep the presentation short, we show the normality histograms only for the third factor, but we give the tables for all factors. Figure 1 and Figure 2 show the normality test histograms of the third factor (time used to search for a button to perform a specific function) in each group. As the figures show, the distribution of the data is not normal.
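The normality check described above was carried out in SPSS. As a minimal sketch of how the same Shapiro-Wilk check could be reproduced outside SPSS, assuming the per-user times (in seconds) have already been extracted from the logged text files, the following Python fragment can be used; the variable names and values are hypothetical and are not the data of this study.

```python
# Minimal sketch of the normality check (Shapiro-Wilk), assuming the
# per-user times (in seconds) for each group have been extracted from
# the interaction logs. The values below are placeholders, not the
# data collected in the study.
from scipy.stats import shapiro

animation_search_time = [4.1, 6.0, 3.2, 9.8, 5.5]   # hypothetical values
text_search_time      = [7.3, 5.1, 12.4, 6.6, 8.0]  # hypothetical values

for name, sample in [("AnimationGroup", animation_search_time),
                     ("TextGroup", text_search_time)]:
    stat, p_value = shapiro(sample)
    # As in Tables I-III: sig. (p-value) > 0.05 suggests the sample is
    # consistent with a normal distribution; otherwise it is not.
    print(f"{name}: W = {stat:.3f}, sig. = {p_value:.3f}, "
          f"{'normal' if p_value > 0.05 else 'not normal'}")
```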


Figure 1. Histogram of the normality test result for the third factor in the group which used the text guide.

Figure 2. Histogram of the normality test result for the third factor in the group which used the animation guide.

TABLE I. RESULT OF NORMALITY TEST FOR FIRST FACTOR

Shapiro-Wilk (Sig.)
Factor       Group            Sig.
Total time   AnimationGroup   0.276
Total time   TextGroup        0.005

TABLE II. RESULT OF NORMALITY TEST FOR SECOND FACTOR

Shapiro-Wilk (Sig.)
Factor       Group            Sig.
Help time    AnimationGroup   0.00
Help time    TextGroup        0.001

TABLE III. RESULT OF NORMALITY TEST FOR THIRD FACTOR

Shapiro-Wilk (Sig.)
Factor       Group            Sig.
Search time  AnimationGroup   0.00
Search time  TextGroup        0.00

According to these results, the distribution of the data in the groups is not normal, so we compare the mean of the factors in the groups by applying the Mann-Whitney test.

The next step is to apply the Mann-Whitney test to the collected information. In Table IV, the mean rank and the sum of ranks for the first factor, broken down by group, are shown.

TABLE IV. MEAN RANK AND SUM OF RANKS FOR FIRST FACTOR

Ranks
Group            N    Mean Rank   Sum of Ranks
AnimationGroup   34   29.68       1009.00
TextGroup        34   39.32       1337.00
Total            68

Table V also shows the Z value and the significance value (sig.) of the test for the first factor. According to the results, |Z| = 2.012, which is greater than 1.64, and the significance value is less than 0.05, so the hypothesis H0 is rejected; that is, there is a significant difference in task completion time between the two groups. Referring to Table IV, we can see that the mean rank of task completion time in the group using the text guide is greater than in the group that used the animation guide.

TABLE V. SIGNIFICANT VALUE OF COMPARISON OF MEANS IN TWO GROUPS FOR FIRST FACTOR

Test Statistics (Total time)
Mann-Whitney U            414.000
Wilcoxon W                1009.000
Z                         -2.012
Asymp. Sig. (2-tailed)    0.044

TABLE VI. MEAN RANK AND SUM OF RANKS FOR SECOND FACTOR

Ranks
Group            N    Mean Rank   Sum of Ranks
AnimationGroup   34   24.91       847.00
TextGroup        34   44.09       1499.00
Total            68

TABLE VII. SIGNIFICANT VALUE OF COMPARISON OF MEANS IN TWO GROUPS FOR SECOND FACTOR

Test Statistics (Help time)
Mann-Whitney U            252.000
Wilcoxon W                847.000
Z                         -4.102
Asymp. Sig. (2-tailed)    0.000
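The comparisons in Tables IV to VII were produced with SPSS. As an illustrative sketch only, the same kind of comparison can be reproduced in Python: the scipy call below assumes access to the raw per-user times (the arrays shown are placeholders), and the Z values are then cross-checked from the reported U statistics using the standard normal approximation without the tie correction that SPSS applies, which is why they differ slightly from the reported figures.

```python
# Minimal sketch: Mann-Whitney U comparison of two independent groups
# and the normal approximation for Z. The raw per-user times are not
# available here, so group_a/group_b are placeholders; the U values
# checked at the end are the ones reported in Tables V, VII and IX.
from math import sqrt
from scipy.stats import mannwhitneyu

def z_from_u(u, n1, n2):
    """Normal approximation of the Mann-Whitney Z value (no tie correction)."""
    mu_u = n1 * n2 / 2.0
    sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mu_u) / sigma_u

# With raw data (placeholder values), scipy gives U and the two-tailed sig.
group_a = [31.0, 42.5, 28.0, 55.1]   # hypothetical animation-group times (s)
group_b = [48.2, 60.3, 39.9, 71.4]   # hypothetical text-group times (s)
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, sig. = {p_value:.3f}")

# Cross-checking the reported U values (n1 = n2 = 34):
for factor, u in [("first (total time)", 414.0),
                  ("second (help time)", 252.0),
                  ("third (search time)", 555.0)]:
    print(f"{factor}: U = {u:.0f}, Z = {z_from_u(u, 34, 34):.3f}")
# This prints approximately -2.011, -3.999 and -0.282, close to the
# SPSS values -2.012, -4.102 and -0.284 reported in the tables.
```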


Table VII shows the result of the test for the second factor. The Z value is a large negative value, |Z| = 4.102, which is greater than 1.64, and the significance value is 0.000, which is less than 0.05; therefore, for this factor, the hypothesis H0 is rejected, meaning that there is a significant difference in the time spent using help between the two groups. Referring to the mean ranks of this factor in Table VI, we can see that the mean time spent using help in the group using the text guide is greater than in the group that used the animation guide.

TABLE VIII. MEAN RANK AND SUM OF RANKS FOR THIRD FACTOR

Ranks
Group            N    Mean Rank   Sum of Ranks
AnimationGroup   34   33.82       1150.00
TextGroup        34   35.18       1196.00
Total            68

TABLE IX. SIGNIFICANT VALUE OF COMPARISON OF MEANS IN TWO GROUPS FOR THIRD FACTOR

Test Statistics (Search time)
Mann-Whitney U            555.000
Wilcoxon W                1150.000
Z                         -0.284
Asymp. Sig. (2-tailed)    0.77

Table IX shows the result of the test for the third factor. The Z value is |Z| = 0.284, which is smaller than 1.64, and the sig. value is 0.77, which is greater than 0.05, so there is no reason to reject the hypothesis H0; that is, in the mean time used to search for a button to perform a specific function, there is no meaningful difference between the two groups.

To illustrate the differences between the evaluated factors, Figure 3 shows the mean ranks of the three timing factors in the two groups. The data for this chart have been extracted from the results of the Mann-Whitney tests above. As can be seen, the task completion time (total time) and the time spent using help (learning time) in the group using the text guide differ significantly from the group that used the animation guide.

Figure 3. Comparison of the mean ranks of the three timing factors (task completion time, time spent using help, time used to search for a button to perform a specific function) in the animation-guide and text-guide groups.

V. CONCLUSION AND FUTURE WORK

In this paper, a quantitative evaluation of the usability attribute was performed. Three time factors were selected as usability factors, and the evaluation focused on the impact of user guides on these factors. In this study, an Android calendar application was used as our case study. Two types of user guide, namely animation and text user guides, were produced for a selected functionality of the application. 68 users participated and were divided into two independent groups: one group performed the task using the animation guide, while the other group performed the same task using the text guide. All interactions were recorded automatically by the application, together with the exact time of each event generated by the user. This empirical study was designed and executed to help us answer our research question, which was whether using an animation or a text user guide results in a significant difference in the timing factors of usability, including task completion time, time spent using help, and time used to search for a button to perform a specific function. Based on our study, our answer is that users with the animated guide spend less time in the user guide (to learn the functionality) in comparison with users with the text guide, but both groups needed approximately the same time to find the functionality and execute it. Finally, the total run times of the groups are different, and users with the animated guide have a lower total time.

This paper reports our preliminary results on quantitatively evaluating the usability of mobile applications. It is important to consider the scope and limitations of the study; they clearly show that our results are only valid within the scope of this study, and more studies are needed for more general results. Our future work focuses on these limitations and tries to overcome them as follows:

1- Our study was designed and executed for one Android application. In order to make the results more reliable, we need to run empirical studies considering different applications with different levels of complexity in a variety of application domains.

2- In this paper, the experiment was performed on one function of the selected application. As the functionality was a rather easy task, this may affect our results. In future work, we are going to extend the empirical study to cover all functionality of the application, or functions that have a particular level of complexity.

3- In this study, we compared animation and text user guides. Other types of user guides can be the subject of evaluation, such as videos that use voice and graphics simultaneously, tooltip help, etc.

REFERENCES

[1] Hartmut Hoehle, Ruba Aljafari, Viswanath Venkatesh, "Leveraging Microsoft's mobile usability guidelines: Conceptualizing and developing scales for mobile application usability", International Journal of Human-Computer Studies, 2016
[2] ISO 9241: Ergonomics Requirements for Office Work with Visual Display Terminals (VDTs), International Standards Organisation, Geneva, 1997


[3] Kasper Hornbæk, "Current practice in measuring usability: Challenges to usability studies and research", International Journal of Human-Computer Studies, 2006
[4] Azham Hussain, Maria Kutar, "Usability Metric Framework for Mobile Phone Applications", The 10th Annual PostGraduate Symposium on the Convergence of Telecommunications, 2009
[5] Chun Wah JO, "Usability metrics for application on mobile phone", Dissertation, 2005
[6] D. Zhang and B. Adipat, "Challenges, Methodologies, and Issues in the Usability Testing of Mobile Applications", International Journal of Human-Computer Interaction, 2005
[7] Ashraf Saleh, Roesnita Binti Isamil, Norasikin Binti Fabil, "Extension of PACMAD Model for Usability Evaluation Metrics Using Goal Question Metrics (GQM) Approach", Journal of Theoretical and Applied Information Technology, 2005
[8] Tovi Grossman, George Fitzmaurice, "ToolClips: An investigation of contextual video assistance for functionality understanding", 28th International Conference on Human Factors in Computing Systems, 2010
[9] K. Knabe, "Apple guide: a case study in user-aided design of online help", ACM CHI, pp. 286-287, 1995
[10] C. Plaisant and B. Shneiderman, "Show Me! Guidelines for Producing Recorded Demonstrations", Visual Languages and Human-Centric Computing, pp. 171-178, 2005
[11] Jeongyun Heo, Dong-Han Ham, Sanghyun Park, Chiwon Song, Wan Chul Yoon, "A framework for evaluating the usability of mobile phones based on multi-level, hierarchical model of usability factors", Interacting with Computers, 2009
[12] Philip Kortum, Mary Sorber, "Measuring the Usability of Mobile Applications for Phones and Tablets", International Journal of Human-Computer Interaction, 2015
[13] Wolfgang Kluth, Karl-Heinz Krempels, Christian Samsel, "Automated Usability Testing for Mobile Applications", 10th International Conference on Web Information Systems and Technologies, 2014
[14] Artur H. Kronbauer, Celso A. S. Santos, Vaninha Vieira, "Smartphone Applications Usability Evaluation: A Hybrid Model and its Implementation", 4th International Conference on Human-Centered Software Engineering, 2013
[15] Florian Lettner, Clemens Holzmann, "Usability Evaluation Framework: Automated Interface Analysis for Android Applications", International Conference on Computer Aided Systems Theory, 2011
[16] Babita Shivade, Meena Sharma, "Usability Analyzer Tool: A Usability Evaluation Tool for Android Based Mobile Application", International Journal of Emerging Trends & Technology in Computer Science, 2014
[17] Enlie Wang, Barrett Caldwell, "An empirical study of usability testing: heuristic evaluation vs. user testing", 2002
[18] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, Jarrett Rosenberg, "Preliminary Guidelines for Empirical Research in Software Engineering", IEEE Transactions on Software Engineering, 2002
[19] Steve Easterbrook, "Empirical Research Methods in Requirements Engineering", Tutorial T1, 15th IEEE International Requirements Engineering Conference, India, 15-19 October 2007
[20] Xiaoxiao Ma, Bo Yan, Guanling Chen, Chunhui Zhang, Ke Huang, Jill Drury, "A Toolkit for Usability Testing of Mobile Applications", Third International Conference on Mobile Computing, Applications, and Services (MobiCASE), 2011
[21] Gholamreza Jandaghi, "Which statistical test should we choose?", Culture of Management, No. 6, Summer and Fall 2004 (in Persian)
