
Automated Logging of Mobile Phones Failures Data

Paolo Ascione
Laboratorio ITEM, Consorzio Interuniversitario Nazionale per L’Informatica
Via Diocleziano 328, 80124 Naples, Italy
pascione@napoli.consorzio-cini.it

Marcello Cinque, Domenico Cotroneo
Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II
Via Claudio 21, 80125 Naples, Italy
{macinque, cotroneo}@unina.it

Proceedings of the Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing
0-7695-2561-X/06 $20.00 © 2006 IEEE

Abstract

The increasing complexity of mobile phones directly affects their reliability, while user tolerance for failures is decreasing, especially when the phone is used for business- or mission-critical applications. Despite these concerns, there is still little understanding of how and why these devices fail, and no techniques have been defined to gather useful information about failure manifestations from the phone. This paper presents the design of a logger application to collect failure-related information from mobile phones. Preliminary failure data collected from real-world mobile phones confirm that the proposed logger is a useful instrument to gain knowledge about the dynamics and causes of mobile phone failures.

1 Introduction

Today’s market demand for enhanced mobile and embedded devices, such as smart phones and PDAs, is causing these devices to become increasingly complex. While innovative features are attractive and meet customer demands, the race toward innovation increases the risk of delivering less reliable devices, since new mobile phones are often put on the market without comprehensive testing. As a result, more and more user complaints about mobile phone failures can be read on several web forums dedicated to mobile phones. These failures range from simple value failures to freezes and self-reboots.

Although user complaints do not represent a major issue, the dependability of mobile phones becomes an important concern as novel critical-data-driven applications are being introduced on the mobile phone market, e.g., phone-based banking, ticket booking, and e-trading. Dependability will become even more critical as the use of mobile phones is hypothesized in critical application scenarios, e.g., robot control [12, 9], traffic control [1], and telemedicine [3]. Finally, the possibility of exchanging data increases the smart phone’s sensitivity to malicious attacks. One example is the recently reported virus for mobile phones, called Cabir, which affects smart phones running the Symbian OS.

Despite these concerns, there is still little understanding of how and why mobile phones fail, or of the methods and techniques needed to gain such understanding. A well established methodology to evaluate the dependability of operational systems and to identify their dependability bottlenecks is field failure data analysis [8] (see section 2.2). However, today’s smart phones do not offer any means to detect failure manifestations and collect failure-related information.

This paper offers a solution to this problem by proposing an automated failure data logger for smart phones. In order to understand which failures are to be detected, we first define, in section 3, a high level failure model for mobile phones based on everyday users’ experiences. Then, in section 4, we detail the design of a heartbeat technique to detect the most important failure occurrences. Upon failure detection, the logger gathers useful information, such as the phone’s activity, the list of running applications, and error conditions signaled by system/application modules. The technique has been implemented on Symbian OS smart phones. We chose the Symbian OS because of i) its open programmability with the C++ and Java programming languages, and ii) its widespread use at the time of writing. The needed Symbian OS background is given in section 2.1. The logger has been deployed on 16 phones from
September 2005. Since phone failures are not frequent events, the data collected so far do not have the statistical significance needed to draw conclusions. Nevertheless, by looking into portions of the log files produced by the logger, in section 5 we show the logger’s capability to detect failure occurrences and to relate them to the state of the device. The logged data also make it possible to pinpoint dependability bottlenecks and failure root causes, to rebuild failure dynamics, and to measure their temporal characteristics.

2 Background and Related Research

2.1 Symbian OS fundamentals

Symbian [7] is a light-weight operating system designed for mobile phones and adopted by several leading mobile phone manufacturers. It is based on a hard real-time, multithreaded kernel that is designed according to the microkernel approach. Specifically, the microkernel provides simple, supervisor-mode threads, along with their scheduling and synchronization operations. Moreover, the kernel offers basic abstractions, i.e., address spaces and message passing interprocess communication. All system services are provided by server applications. Clients access servers using the kernel’s message passing mechanisms. Examples of servers are the File Server, for file management, the Window Server, for user interface drawing, and the Message Server, for Short Message Service (SMS) management.

The Symbian OS defines two levels of multitasking: threads and Active Objects (AOs). Threads are scheduled by the OS thread scheduler, which is a time-sharing, preemptive, priority-based scheduler. Moving up a level, multiple AOs run within a thread (see figure 1). They are scheduled by a non-preemptive, event-driven scheduler, called active scheduler. In other terms, AOs cooperatively multitask using an event-driven model: when an AO requests a service, it leaves the execution to another AO. When the requested service completes, it generates an event that is detected by the active scheduler, which in turn inserts the requesting AO in the queue of the AOs to be activated. Non-preemption was chosen to meet light-weight constraints, avoiding synchronization primitives such as mutexes or semaphores. Moreover, AOs belonging to the same thread all run in the same address space, so that a switch between AOs incurs a lower overhead than a thread context switch. The non-preemptive nature of AOs makes them unsuitable for real-time tasks. On Symbian OS, real-time tasks should rather be implemented using threads directly. The whole design constitutes a good compromise between real-time and light-weight design requirements.

[Figure 1: within a Symbian application (or server), Active Objects run inside a thread and are scheduled by the Active Scheduler (non-preemptive scheduling, event-driven synchronization mechanisms); in kernel space, threads are scheduled by the Thread Scheduler (time-sharing, preemptive, priority-based).]
Figure 1. Symbian multitasking model

A crucial aspect of interest for our activity is represented by panics. In the Symbian OS world, a panic represents a non-recoverable error condition notified to the kernel by either user or system applications. The panic information associated with the event is a record composed of its category and type. Once this event has been notified, the application is killed by the kernel. As for panics notified by system servers, the kernel might decide to reboot the phone to recover from them, based on the panic’s severity.

2.2 Related Research

The field failure data analysis of operating systems is a well established research area. Examples are analyses of Windows NT [8, 16], Windows 2000 [14], and Linux [6, 13]. Other studies characterized failures of networked systems, networks of workstations [15] and, more recently, large-scale heterogeneous server environments [11]. Less effort has been devoted to the field of mobile distributed systems. An architecture for gathering and analyzing failure data for Bluetooth distributed systems has been proposed in [5], whereas in [10], data collection and processing for a cellular telecommunication system have been addressed. All these works exploit failure information stored in system event logs, automated reports, or failure reports provided by specialized maintenance staff. In the case of smart phone devices, logging facilities are limited and not fully exploited yet. In particular, the Symbian OS offers a particular server (the flogger) allowing an application to log its information. Yet, to access the logged data of a generic X system/application module it is necessary to create a particular directory, with
a well defined name (e.g. Xdir). The problem is that the names of such directories are not made publicly available to developers, and are used by manufacturers during development. Recently, a tool called D_EXC (a Symbian project, available at http://www.symbian.com/developer/downloads/tools.html) has been proposed to register all panic events generated on a phone. However, the tool does not relate panic events to failure manifestations, running applications, and the phone’s activity at the time of the failure.

3 Smart Phones’ Failure Model

We propose a high level classification of failures based on the only freely available data source on mobile phone failures: the users. To this end, we examined several web forums (www.howardforums.com, cellphoneforums.net, www.phonescoop.com, and www.mobiledia.com) where mobile phone users post information on their experiences using different devices. The posted information has been carefully studied in order to consider only those posts that signal a failure of the device. In the following, we categorize such failures into five classes.

• Freeze: the device delivers a constant output, and it does not respond to the user’s input. The only way to restore proper operation is to pull out the battery from the phone. This kind of failure is also known as halting failure [2].

• Self-reboot: the phone reboots itself, and no output is delivered at all during the reboot. It is also called silent failure [2]. As already mentioned in section 2.1, on Symbian OS smart phones this might be caused by unrecoverable and highly severe system panics.

• Unstable behavior, or erratic failure [2]: the device exhibits erratic behavior without any input inserted by the user. Examples are continuous self-reboots, and self-activation of applications or modes of operation.

• Output failure: the device delivers an output in response to a certain input that deviates from the expected one. This is also called value failure [4]. Examples reported by users include inaccuracy in the charge indicator, ring or music volume different from the set one, and event reminders going off at wrong times.

• Input failure, or omission value failure [4]: user inputs have no effect on device behavior, e.g. soft keys do not work.

Although this classification is defined at a high level of abstraction, it represents an important initial step towards the definition of the logger. It is indeed necessary to know the nature of the failures in order to detect them and to relate their occurrences to other information available on the phone. More insight into failure dynamics and causes can be gained once a significant amount of failure data is available. As shown later, in section 5, the logger represents a valid instrument to collect such failure data.

4 Automated Failure Data Logging

4.1 Assumptions and Goals

We concentrate on freeze and self-reboot failures, since they are easy to detect yet severe failure manifestations. Some unstable behavior failures, such as repeated self-reboots, can be captured as well. As for input and output failures, we do not address them, both because they are less severe and because the automatic detection of value failures would require the implementation of a perfect observer which has a complete knowledge of the system specification [4].

The main objective of the logger is to detect and record the occurrences of freezes and self-reboots. Beyond this, it is important to capture the status of the phone during the failure. For example, let us assume that a phone freezes when a text message is received. Detecting the freeze alone is probably not enough; it is also desirable to answer questions like: 1) are we able to know that a text message was being received? 2) Do we know whether some user/system module failed? And 3) do we know which other applications were running during the failure (which may cause interference)?

4.2 Logger High Level Architecture

Based on the previous considerations, we designed a logger application as a set of AOs, each one responsible for a particular task. The logger architecture is shown in figure 2. Each AO interacts with a particular OS server to perform its task, and all AOs use File Server facilities to store their data. The logger is conceived as a daemon application that starts at phone start-up and executes in the background. The AOs building the logger are detailed in the following.

• Heartbeat: it implements the technique to detect both freezes and self-reboots. More details about the heartbeat technique can be found in section 4.3.

• Running Applications Detector: this AO periodically stores on the runapp file the list of IDs of all the applications running on the phone. The list is obtained by requesting it from the Application Architecture Server.

• Log Engine: it is responsible for collecting the smart phone activity (e.g. calls, messages, and browsing). The information is gathered from the Database Log Server, and it is stored in the activity file.

• Power Manager: it provides information about the battery status, in order to distinguish self-reboots due to failures from those due to low battery. The battery status is gathered from the System Agent Server, and it is stored in the power file.

• Panic Detector: collecting panics as soon as they are raised is one of the main objectives of the logger. In order to gather panics and related information (e.g. panic category and type), the Panic Detector exploits the services provided by the RDebug object, offered by the Symbian OS Kernel Server.

[Figure 2: the Logger Application consists of five AOs running under the Active Scheduler (Panic Detector, Log Engine, HeartBeat, Running App Detector, Power Manager); they interact with the system servers (Kernel, Db Log, File, Appl. Arch., System Agent) and use the files beats, activity, runapp, power, and the Log File.]
Figure 2. Overall architecture

Other than collecting panics, the Panic Detector is also responsible for putting all the information produced by the other components together into one Log File. This operation is performed either when a panic is detected or when the logger application starts (i.e., when the phone starts). A drawback of the logger is that it cannot store data about File Server failures. This cannot be avoided, since there is no way to permanently store any information when the File Server fails.

It is worth noting that the proposed architecture answers the questions posed in section 4.1. Question 1 is answered by asking the Log Engine for the phone’s activity. The panic notifications captured by the Panic Detector make it possible to identify the user/system modules responsible for a self-reboot or freeze, thus answering question 2. Finally, question 3 is answered by means of the Running Applications Detector.

4.3 Detecting Freezes and Self-reboots

Freeze and self-reboot detection is accomplished by means of the heartbeat technique. This is a well known approach for crash detection. The Heartbeat AO periodically writes a heartbeat item on the beats file. The item is composed of a timestamp and a status info, i.e. ALIVE, REBOOT, MAOFF, or LOWBT. During normal execution, the Heartbeat writes an ALIVE item. When a shutdown is performed, either by the user or automatically by the kernel, the Heartbeat writes a REBOOT item, since it is able to capture the phone shutdown event. It is worth mentioning that when the phone is rebooted the OS leaves a certain time to applications to complete their tasks. This time is sufficient for the Heartbeat to write the REBOOT item. When the user deliberately turns off the logger application, a MAOFF (Manual OFF) item is written. This helps to distinguish manual shutdowns of the logger from freezes. Finally, if a shutdown is due to low battery (the battery status is requested from the Power Manager), a LOWBT (LOW BaTtery) item is written.

When the phone is turned on and the logger starts, the Panic Detector checks the last item written by the Heartbeat. When an ALIVE item is found, the phone must have been shut down by pulling out the battery, since in all other cases (i.e., a shutdown due to low battery, user, or kernel) the Heartbeat would have written REBOOT or LOWBT. This means that the phone was frozen, consistent with the fact that pulling out the battery is the only reasonable user-initiated recovery action for a freeze. Therefore, a freeze is registered by the Panic Detector, along with the information gathered by the Log Engine and the Running Applications Detector.

On the other hand, a REBOOT can be found for three reasons. First, the phone rebooted itself. Second, it was rebooted by the user to recover from a failure (e.g., an output failure). Third, it was regularly shut down. Hence, the problem of distinguishing these three cases arises. Unfortunately, we are not able to systematically distinguish phone-induced reboots from manual ones, because the generated event, i.e., the one captured by the Heartbeat AO, is the same in both cases. However, they can be distinguished by looking at the off time of the phone, or reboot duration. It is reasonable to state:

$T_{SR} \le T_{MR} < T_{SH}$

where $T_{SR}$ is the duration of a self-reboot, $T_{MR}$ is the duration of a manual reboot, and $T_{SH}$ is the duration of a regular shutdown. In other terms, the duration of a shutdown (e.g., when the phone is shut down over the night) is greater than the duration of a user-initiated reboot, which is in turn greater than or equal to the duration of a self-reboot. A manual reboot requires the user to press the on button, which generally requires more time than a self-reboot. The Panic Detector registers a self-reboot event and its duration. This way, the reboot duration can be analyzed a posteriori. It remains hard to distinguish manual reboots from self-reboots, but, in case of a self-reboot, the Panic Detector also registers the panic which caused the reboot. This is a further way to distinguish between the two cases.

[Figure 3 plots: three qualitative uncertainty zones (high, medium, low) are marked along the $f_h$ axis.]
Figure 3. $\Delta_w$ as a function of $f_h$, with different application workloads (upper plot), and $I$ as a function of $f_h$ (lower plot)
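
The start-up check of section 4.3 can be sketched in a few lines. The Python fragment below is only an illustration under stated assumptions, not the logger's Symbian C++ code: the function name `classify_last_beat`, the way a beat item is passed in, and the returned labels for the low-battery and manual-off cases are hypothetical. Only the status values (ALIVE, REBOOT, MAOFF, LOWBT) and the FREEZE and PWOFF event names come from the paper.

```python
from datetime import datetime

# Status values written by the Heartbeat AO (section 4.3).
ALIVE, REBOOT, MAOFF, LOWBT = "ALIVE", "REBOOT", "MAOFF", "LOWBT"


def classify_last_beat(last_status, last_time, start_time):
    """Classify the previous shutdown from the last heartbeat item.

    Returns (event, off_seconds). The labels used for LOWBT and MAOFF
    are hypothetical names chosen for this sketch.
    """
    off_seconds = (start_time - last_time).total_seconds()
    if last_status == ALIVE:
        # No shutdown item was written: the battery was pulled out,
        # which is the user's recovery action for a freeze.
        event = "FREEZE"
    elif last_status == REBOOT:
        # Self-reboot, manual reboot, or regular shutdown: the captured
        # event is the same, so only the off time (T_SR <= T_MR < T_SH)
        # and any logged panic can tell them apart a posteriori.
        event = "PWOFF"
    elif last_status == LOWBT:
        event = "LOW_BATTERY_SHUTDOWN"    # hypothetical label
    elif last_status == MAOFF:
        event = "LOGGER_STOPPED_BY_USER"  # hypothetical label
    else:
        raise ValueError(f"unknown heartbeat status: {last_status!r}")
    return event, off_seconds


event, off = classify_last_beat(
    ALIVE,
    datetime(2005, 10, 19, 12, 12, 37),  # last heartbeat item
    datetime(2005, 10, 19, 12, 15, 52),  # logger start after power-on
)
print(event, off)
```

The off time is returned together with the event so that the ordering of reboot durations can be applied later, during analysis, rather than being decided on the phone.
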

4.4 Heartbeat Frequency

The heartbeat frequency $f_h$ is a crucial parameter of the logger, since it determines the time granularity, or uncertainty, at which the above mentioned durations, i.e., $T_{SR}$, $T_{MR}$, and $T_{SH}$, can be measured. It would be desirable to choose an arbitrarily high $f_h$ to decrease the uncertainty of the measurements. However, this is not possible for two practical reasons: i) the battery consumption induced by the logger increases as $f_h$ increases, and ii) the heartbeat precision decreases as $f_h$ increases, as will be shown later by our experiment. The heartbeat precision can be defined as:

$\mathrm{precision} = \frac{T_h}{T_h + \Delta_w}, \qquad T_h = \frac{1}{f_h}$

where $\Delta_w$ is the write delay induced by the File Server. In other terms, once a heartbeat frequency $f_h$, and thus a heartbeat period $T_h$, has been fixed, the heartbeat items will not be written exactly every $T_h$, but every $T_h$ plus the time $\Delta_w$ needed to invoke the File Server, transfer to it the information to write, access the file, and actually write it. The greater $f_h$ is, the smaller the precision, because as $T_h$ decreases, it becomes comparable with $\Delta_w$. Moreover, as $T_h$ decreases, $\Delta_w$ increases because the File Server starts to be overloaded with requests. As for the battery consumption, the greater $f_h$ is, the greater the absorbed current $I$. This is confirmed by our experiment.

We evaluated the average $\Delta_w$ and the average $I$ as a function of $f_h$. The average $\Delta_w$ is evaluated with respect to different application workloads running on the phone: SMS, phone call, video call, listening to an audio clip, and Bluetooth file transfer. We also performed measurements with an “idle workload”, i.e., when the phone is in stand-by mode. The measurements were performed on two different Symbian smart phones: Nokia 6630 and Motorola A1000.

As far as $\Delta_w$ measurements are concerned, we ran the logger concurrently with one of the mentioned workloads, for each fixed $f_h$. As an effect of the writing delay, the timestamps on the beats file are written with a period $T_h + \Delta_w$. Hence, from the timestamps we can evaluate the average $\Delta_w$.

As for $I$ measurements, we connected the phones’ power supply pins to a stabilized voltage generator (set to 3.7 V, i.e., the nominal voltage provided by the phones’ batteries). We also connected an ammeter in series with the supply circuit in order to measure the average $I$ with different $f_h$. For each $f_h$, $I$ has been measured for ten minutes. Since the current absorbed by the logger is low compared with that of the application workloads, we adopted the idle workload to perform the current measurements.

Figure 3 shows the results of the experiment. As expected, both the average $\Delta_w$ and the average $I$ are increasing functions of $f_h$, independently of the workload. With the idle workload, we found that $f_h = 2$ Hz ($T_h = 0.5$ s) is a physical upper bound after which $\Delta_w$ starts to increase almost exponentially. For this reason, the experiments shown in figure 3 have been run with $f_h$ ranging from 0.1 Hz to 2 Hz (for the current, $f_h$ starts from 0.05 Hz). The figure shows that the most critical application is the video call. This could be expected, as the video call uses a wide range of the phone’s resources. From figure 3 one could conclude that the best choice to meet precision and battery consumption requirements would be $f_h = 0.1$ Hz or even lower. On the other hand, low frequencies increase the uncertainty. For example, $f_h = 0.1$ Hz introduces an uncertainty of 10 seconds on the measured $T_{SR}$, $T_{MR}$, and $T_{SH}$, which makes it hard to distinguish one from the other. For this reason, in figure 3 we draw three qualitative uncertainty zones: high uncertainty ($f_h \le 0.1$ Hz), medium uncertainty (0.1 Hz $< f_h \le$ 0.33 Hz), and low uncertainty ($f_h > 0.33$ Hz). For the logger we deployed so far on actual phones, we chose $f_h = 0.33$ Hz, in the medium uncertainty zone, since it represents an acceptable trade-off between uncertainty (3 s), precision (in the worst case, the average $\Delta_w$ is 28 ms, hence the precision is 0.99), and battery consumption (the average $I$ is 6.6 mA; for example, given the 900 mAh battery capacity of the Nokia 6630, the stand-by time with the logger running on the phone would be 136 hours, that is, almost 6 days). This is acceptable if we consider that the manufacturer declares a stand-by time of 6 to 11 days for the Nokia 6630.

4.5 Distributed Data Collection Architecture

The logger has been deployed on actual phones used by students, faculty and staff of our University. In order to allow them to easily transfer their Log Files to us, without requiring them to spend money (e.g., by avoiding the use of SMS services or data connections), we developed a data collection architecture. The architecture is structured according to a 3-tier model (see figure 4).

[Figure 4: the smartphone under observation (logger, midlet), the Gateway Workstation (GW, running the GW software), and the Database Server (database).]
Figure 4. Distributed Data Collection Architecture

The first tier is the phone. In particular, we developed a Java midlet for the phone using the Java 2 Micro Edition technology. The logger asks the user to send the Log File when it reaches a certain size (20 KB in the current implementation). When the user is ready, the midlet can be used to send the Log File to tier 2, the Gateway Workstation (GW), via a Bluetooth connection. However, if the user’s phone or GW does not provide Bluetooth connection facilities, he or she can skip the midlet and transfer the file via the serial cable usually used to synchronize the phone with a computer.

The GW (tier 2) is a user’s computer connected to the Internet. It runs our software to receive the Log File via Bluetooth, and to send it to our Database node (tier 3) over the Internet. To this aim, the user must authenticate himself/herself to the Database node. Again, if Bluetooth connections are not available, the GW allows the user to select the Log File to send from his/her computer’s file system.

Finally, tier 3 stores the received files in a centralized database, after checking the Log File format. The data collected in the database can then be used to perform the field failure data analysis.

5 Preliminary Results: Logger Capabilities

The logger has been implemented for several Symbian OS phones with different user interfaces and APIs. It can be downloaded from the project’s web site: http://www.mobilab.unina.it/symdep.htm, along with the Java midlet and the GW software for data collection. The logger has a low memory footprint (16.1 KB), and the files it produces occupy at most 30 KB of the phone’s internal memory.

At the time of writing, the logger has been running on 16 phones since September 2005. However, the data collected so far (i.e., about 100 failure points) are not enough to achieve the statistical significance needed to perform dependability measurements. More time and more phones are needed to achieve this goal. Therefore, in this section we prefer to focus on some significant portions of the collected Log Files in order to show the logger’s capabilities.

The first Log File portion shown in figure 5 comes from a Nokia 6680 device. Entries in the log are not in chronological order. This is due to the fact that the Panic Detector starts to log from either a panic indication or a FREEZE or PWOFF event, and then gathers and writes on the Log File all the related information from the other files (i.e., activity and runapp), which could have been registered before the panic (or FREEZE or PWOFF) event.

Portion 1
19/10/2005 12:12:17
Category: KERN-EXEC
Panic Type: 3
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:15 Short message
19/10/2005 12:15:52 FREEZE Days 0 Hours 0 Mins 3 Secs 15
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:22 SysAp ScreenSaver SymDep

Portion 2
01/10/2002 15:36:51
Category: irSec
Panic Type: 666
01/10/2002 15:36:49 Total irRemote SysAp ScreenSaver Telefono Menu Autolock
SymDep
01/10/2005 15:41:03
Category: irSec
Panic Type: 666
Category: KERN-EXEC
Panic Type: 3
01/10/2002 15:40:52 SysAp ScreenSaver Telefono Menu Orologio Autolock SymDep
01/10/2005 15:42:24 PWOFF Days 0 Hours 0 Mins 1 Secs 16
01/10/2002 15:40:52 SysAp ScreenSaver Telefono Menu Orologio Autolock SymDep
01/10/2005 15:41:04 SysAp ScreenSaver SymDep

Figure 5. Two sample portions of a Log File
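
Since the entries of a Log File portion are not in chronological order, a first analysis step is to rebuild the timeline. The following Python sketch parses Portion 1 and sorts it by timestamp. It is not part of the logger or of the collection software: the record layout is inferred only from the lines shown in figure 5 (a `dd/mm/yyyy hh:mm:ss` timestamp, optionally followed by a payload, with `Category:` and `Panic Type:` lines attached to the preceding timestamp), and the names `STAMP` and `parse_portion` are hypothetical.

```python
import re
from datetime import datetime

# A record starts with a timestamp, optionally followed by a payload.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s*(.*)$")


def parse_portion(text):
    """Return a list of (timestamp, payload) entries from a log portion."""
    entries = []
    for line in text.strip().splitlines():
        line = line.strip()
        match = STAMP.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
            entries.append([when, match.group(2)])
        elif entries and line:
            # Continuation line, e.g. "Category: ..." or "Panic Type: ...".
            entries[-1][1] = (entries[-1][1] + " " + line).strip()
    return [(when, payload) for when, payload in entries]


PORTION_1 = """
19/10/2005 12:12:17
Category: KERN-EXEC
Panic Type: 3
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:15 Short message
19/10/2005 12:15:52 FREEZE Days 0 Hours 0 Mins 3 Secs 15
19/10/2005 12:12:09 SysAp ScreenSaver Telefono Menu Messaggi Autolock SymDep
19/10/2005 12:12:22 SysAp ScreenSaver SymDep
"""

# sorted() is stable, so entries with equal timestamps keep their log order.
timeline = sorted(parse_portion(PORTION_1), key=lambda entry: entry[0])
for when, payload in timeline:
    print(when.strftime("%H:%M:%S"), payload)
```

After sorting, the running-applications snapshot and the short message precede the panic, and the FREEZE entry comes last.
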

The first portion reported in the figure is an example of a freeze. The log allows us to pinpoint the Messages service (Messaggi in Italian) as responsible for the panic, and thus for the failure. At 12:12:17 on October 19, 2005, the logger captured a type 3, Kern-Exec panic. Browsing the list of panic types and categories on the Symbian OS web site, we can obtain more details about the panic: “This panic is raised when an unhandled exception occurs. Exceptions have many causes, but the most common are access violations caused, for example, by dereferencing NULL. Among other possible causes are: general protection faults, executing an invalid instruction, alignment checks, etc.”. A few seconds before the panic, the Running Applications Detector registered seven active applications (SymDep is our logger). One of those applications signaled the panic, which in turn caused the phone to freeze. A FREEZE entry is logged by the Panic Detector when the phone finished the reboot, at 12:15:52; the reboot took several minutes, specifically 3 minutes and 15 seconds ($T_{SR} = 195$ s), probably due to the fact that the user had to pull out the battery. By subtracting $T_{SR}$ from 12:15:52, we infer that the last ALIVE item was written at 12:12:37, i.e., 20 seconds after the panic. This could be the time needed by the user to realize the phone was frozen and to decide to pull out the battery. Moreover, the Log Engine registered that 2 seconds before the panic, a short message was sent or received. Another interesting piece of information is given by the last line, which shows that 5 seconds after the panic only 3 applications were active. We can thus argue that one of the absent applications provoked the panic. Aided by the information provided by the Log Engine, we can conclude that the responsible application was the Messages service.

We can rebuild the failure dynamics as follows: at 12:12:15 a short message is sent or received. This causes the Messages service to fail and to signal a type 3, Kern-Exec panic. The OS recovery mechanism killed the failing service and caused a chain of events that brought the phone to freeze.

The second portion in figure 5 is a further example of the type of information that can be captured. It shows a self-reboot caused by a third-party application, called irRemote, used to turn the phone into a universal infra-red remote control. The data come from a Nokia 6600 phone. At 15:36:51, a type 666, irSec panic is detected. This panic is specific to the irRemote application, which is thus killed by the kernel. After a few minutes, at 15:41:03, another irSec panic is signaled. This indicates that, after the first failure, the irRemote application has been launched again by the user. The second time, however, the failure is more severe, and also causes a type 3, Kern-Exec panic, perhaps signaled by a service used by irRemote. 5 seconds after the panic, at 15:41:08, the phone reboots itself (the reboot took 1 minute and 16 seconds), as can be observed by looking at the PWOFF entry. In other terms, this time the OS recovery mechanisms killed irRemote and successfully performed a self-reboot in response to the unrecoverable Kern-Exec panic.

6 Conclusions and Future Work

This paper presented the design of a logger application to automatically
gather failure-related information from Symbian-OS-based smart phones.
Even if tailored for the Symbian OS, the described technique can also be
adopted on other platforms. Although the volume of data collected so far is
not enough to achieve statistical significance, the examples discussed in
section 5 allow us to draw the following conclusions about the logger
capabilities:

• The logger enables the definition of a detailed failure model for mobile
  phones, in that it makes it possible to pinpoint the causes, i.e., panics,
  that lead to failure manifestations, and to rebuild failure dynamics.

• Since all failure manifestations come with a timestamp, the logger makes
  it possible to quantify the dependability of current smart phones, in
  terms of Mean Time Between Failures (MTBF) and Mean Time To Recover
  (MTTR). In more detail, parameters such as the mean time between freezes
  or reboots, and the propagation time between causes (panics) and
  failures, could also be measured.

• The logger makes it possible to pinpoint the applications or servers
  responsible for failures. In other terms, it allows dependability
  bottlenecks to be identified.

Future work will be devoted to the deployment of the logger over more
terminals and to the analysis of the collected failure data, in order to
fully exploit the logger capabilities and define a detailed failure model
for smart phones.

Acknowledgments

This work has been partially supported by the fund for mobility of
researchers, sponsored by the University of Naples Federico II - Ufficio
Programmi Internazionali, and by the Italian Ministry for Education,
University, and Research (MIUR) in the framework of the FIRB Project
“Middleware for advanced services over large-scale, wired-wireless
distributed systems (WEB-MINDS)”.

References

[1] V. Astarita and M. Florian. The use of Mobile Phones in Traffic Management and Control. Proc. of the 2001 IEEE Intelligent Transportation Systems Conference, August 2001.
[2] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr. Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.
[3] A. A. Aziz and R. Besar. Application of Mobile Phone in Medical Image Transmission. Proc. of the 4th National Conference on Telecommunication Technology, January 2003.
[4] A. Bondavalli and L. Simoncini. Failures Classification with Respect to Detection. Proc. of the 2nd IEEE Workshop on Future Trends in Distributed Computing Systems, 1990.
[5] M. Cinque, F. Cornevilli, D. Cotroneo, and S. Russo. An Automated Distributed Infrastructure for Collecting Bluetooth Field Failure Data. To appear in Proc. of the 8th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC’05), May 2005.
[6] W. Gu, Z. Kalbarczyk, R. K. Iyer, and Z. Yang. Characterization of Linux Kernel Behavior under Errors. Proc. of the 2003 International Conference on Dependable Systems and Networks (DSN’03), June 2003.
[7] R. Harrison. Symbian OS C++ for Mobile Phones Volume 2. Symbian Press, 2004.
[8] R. K. Iyer, Z. Kalbarczyk, and M. Kalyanakrishnam. Measurement-Based Analysis of Networked System Availability. Performance Evaluation Origins and Directions, Ed. G. Haring, Ch. Lindemann, M. Reiser, Lecture Notes in Computer Science 1769, Springer Verlag, 2000.
[9] T. Kubik and M. Sugisaka. Use of a Cellular Phone in mobile robot voice control. Proc. of the 40th SICE Annual Conference, July 2001.
[10] S. M. Matz, L. G. Votta, and M. Malkawi. Analysis of Failure Recovery Rates in a Wireless Telecommunication System. Proc. of the 2002 International Conference on Dependable Systems and Networks (DSN’02), June 2002.
[11] R. K. Sahoo, A. Sivasubramaniam, M. S. Squillante, and Y. Zhang. Failure Data Analysis of a Large-Scale Heterogeneous Server Environment. Proc. of the 2004 International Conference on Dependable Systems and Networks (DSN’04), June 2004.
[12] A. Sekman, A. B. Koku, and S. Z. Sabatto. Human Robot Interaction via Cellular Phones. Proc. of the 2003 IEEE Int. Conf. on Systems, Man and Cybernetics, October 2003.
[13] C. Simache and M. Kaâniche. Measurement-Based Availability Analysis of Unix Systems in a Distributed Environment. Proc. of the 12th International Symposium on Software Reliability Engineering (ISSRE’01), November 2001.
[14] C. Simache, M. Kaâniche, and A. Saidane. Event Log based Dependability Analysis of Windows NT and 2K Systems. Proc. of the 2002 Pacific Rim International Symposium on Dependable Computing (PRDC’02), December 2002.
[15] A. Thakur and R. K. Iyer. Analyze-NOW - An Environment for Collection and Analysis of Failures in a Network of Workstations. IEEE Transactions on Reliability, 45(4):560–570, 1996.
[16] J. Xu, Z. Kalbarczyk, and R. K. Iyer. Networked Windows NT System Field Data Analysis. Proc. of the 1999 Pacific Rim International Symposium on Dependable Computing (PRDC’99), December 1999.
