
Towards Secure E-Services:

Risk Analysis of a Home Automation Service

Almut Herzog, Nahid Shahmehri


Department of Computer and Information Science, Linköpings universitet
{almhe, nahsh}@ida.liu.se

Abstract

This paper deals with the assessment of threats and vulnerabilities of service software that targets the home automation market. Specifically, the investigated service is used as a low-cost alarm system that can notify its end users of alarms by way of Internet technology. The service uses a given e-home infrastructure, an Ericsson-developed commercial system that builds on the OSGi platform for electronic services for the home market.

We use the methodology of fault-tree analysis to explore causes of events that could damage the user's trust in the service. The purpose of this work is to raise the security awareness of the software engineers developing this service and to identify the amount of trust this particular service has to place in the underlying infrastructure that it is obliged to use. This work is the starting point for improving the security of this infrastructure.

Keywords: E-Service; Security; Home Automation; Residential Gateway; Risk Analysis; Risk Analysis Methodologies; Fault-Tree Analysis.

1. Introduction

Internet connections for private users are becoming cheaper and faster. With such connections for private homes or apartments, the Internet has more to offer than web surfing, e-mailing, news groups or exchange of MP3 files.

E-services (electronic services) are the next frontier. An e-service can be any kind of software that offers a service to its end users by using Internet connectivity. It relies on a permanent Internet connection and uses network-enabled smart devices as its endpoints and points of user interaction.

In this paper, we focus on e-services for the home market and start by giving a few examples of home-oriented e-services. An e-service allows, e.g., home owners or tenants to remotely control their house or apartment by checking the alarm system or peeking into the refrigerator through a Web interface. More advanced services offer data communication between one or more devices in the house and a service provider company. An e-service could, e.g., be used by a health care provider to remotely check on network-enabled medical equipment of patients at home. An energy company could use an e-service for automated meter reading.

From the application areas mentioned here it is easily seen that security is an issue:

• No one but authorised people should be able to read or reset the medical equipment mentioned above; or, more generally speaking, only authorised users shall be allowed to configure and use a service.
• If services shall be usable, they must be available. A service that has crashed and is down is of no use to the user and stands only a slight chance on the market.
• A service must not, accidentally or intentionally, open up the private home net for intruders. It must be designed to guard the privacy of the end users.

The e-service Monitor and Control that we subject to a risk analysis allows residents to check on their home. It offers the functionality of remotely switching electrical devices on or off. However, the main idea is to act as a low-cost alarm system that will notify residents if an alarm occurs in the house.

As this is an especially security-sensitive area, there was a need to do an in-depth risk analysis that would uncover the weak points of this particular service, sensitise the developers and make clear the security assumptions that this service makes about its underlying infrastructure(s). This last point makes our risk analysis relevant even for other e-home implementations. Our findings on the dependence of e-services on, e.g., electricity, bandwidth and the user's security awareness are directly applicable to other home market e-services.

In the following sections, we describe in more depth the investigated e-service and the infrastructure it builds on. We also give an introduction to existing risk analysis methodologies and a motivation of our choice of fault-tree analysis. In the main part, we will describe the threats and vulnerabilities we found, especially those of general
interest, explore countermeasures and finally relate the overall findings to our future work.

2. Background

In this section, we give more details on the network and software infrastructure that is used by our e-service. We start by describing the residential gateway architecture and network connectivity and continue with a short introduction of the OSGi framework that enables e-services from different vendors to communicate. We also describe the functionality of the analysed e-service Monitor and Control.

2.1 Residential Gateway

The e-service Monitor and Control, which we are investigating, runs on a so-called residential gateway (cf. Figure 1) that connects a private home to the Internet. The residential gateway is basically a computer with two network cards that connects devices on the home net to the Internet. In addition, it runs a Java Embedded Server (JES) [1], which implements the OSGi framework [2] for e-service software. JES is a small server, quite similar to a web server. It allows the connection of http clients, i.e. Web browsers. These clients can download servlet-generated html pages that represent the user interface of the available e-services.

[Figure 1: Network Architecture of the Monitor and Control Service. Labelled elements: End User; System Service Provider with E-Service Center; Private Home with Residential Gateway, Serial Port, Base Station, Detector1 to DetectorN and Switch1 to SwitchM.]

With e-services targeting the home market it is especially important to unify all developers under a common API so as to make services comparable and of uniform interface for customers but also for allowing services to interact with each other. The specification and reference implementation of such an API, based on Java interfaces and classes, is the goal of the OSGi (Open Service Gateway Initiative) [2], an organisation of software and telecom companies.

JES implements the OSGi API and thus allows for version management and resolution of interdependencies of e-services. An e-service consists of one or more bundles. A bundle is a jar-file with servlets, their libraries, html pages, image files, and a description file of its dependencies. Through the OSGi framework, bundles can notify other bundles of services they advertise. A number of security permissions are supported for giving bundles access to the administrative routines within the OSGi framework. In the next release, there will be support for logging, configuration and user management as well as the storage of user preferences.

All home net nodes that participate in an e-service are assumed to run OSGi compliant software. However, JES with the OSGi framework on the residential gateway is the access point for any e-service in the home.

In Ericsson's implementation, the residential gateway and nodes on the home net are accessible by e-service clients from the Internet only after authorisation at the system service provider site (the so-called e-service centre). All users wishing to access a service on the residential gateway have thus to connect to the system service provider first to authenticate. After a successful authentication, they are then granted access to the desired service. Consequently, as shown in Figure 1, it is not possible for the end users to directly access a service on the residential gateway. The residential gateways are configured to reject all other service traffic than that from the system service provider, but they will allow Web browsing and ftp download originating from the home net without system service provider interference.

2.2 E-Service Monitor and Control

Monitor and Control is an e-service that runs its software on the residential gateway only and not on other network connected devices on the home net. Monitor and Control relies on an external base station that connects to the serial port of the residential gateway (cf. Figure 1). This base station receives signals from detectors (motion, fire, etc.) and can send signals to switches for turning electrical devices on or off. The base station uses a radio network for communication with its detector or switch clients. The fact that the base station is connected to the serial port is an implementation artefact. To properly fit into the overall concept of residential gateway the base station should rather be connected to the home network with a network interface and run JES. This would have been the case, if such a base station had existed at the time of development.

As mentioned earlier, Monitor and Control has two functions: (1) notify the user of alarm situations in the house, and (2) switch electrical devices in the house according to a schedule or remote user commands.

When an alarm situation is noticed by the detectors in the house, the detectors notify their base station which will
then communicate this alarm via the serial port to the Monitor and Control e-service on the residential gateway. Monitor and Control will then prepare to send messages to pre-configured telephone numbers. The actual dialling is not done by the residential gateway (as it is not connected to telephone lines); it will pass messages for the end users on to the e-service centre where interfaces for sending SMS to mobile phones or mail to mail addresses exist.

When the user wants to turn on a device at home, she must first authenticate at the system service provider side to get access to JES on the residential gateway. Her web browser will then display the start page on the residential gateway (much like the index.html page of any web server) which contains links to the pages of all installed services. After browsing to the Monitor and Control page, she can then turn on or off configured switches, set a timer for switches or make them turn on in a semi-random fashion.

Configuration of Monitor and Control is also done through a web interface but requires the end user to authenticate as administrator (not as plain user). It is then possible to modify the configuration of existing switches, add or delete switches, and to modify the behaviour in alarm situations (e.g. which telephone numbers or mail addresses to notify).

A small service component on the system service provider side makes sure that the residential gateway and the Monitor and Control service are alive. If the connection is broken this is reported to the end user as an alarm situation. This is one example that shows that e-services are distributed by nature: they run pieces of communicating code on various platforms and locations.

The Monitor and Control service is closely interwoven with its underlying software and hardware infrastructure. It must assume that the base station is on the serial port; it must be a Java bundle; it must communicate with other network nodes using certain classes; it must communicate through the system service provider site. We will therefore briefly mention some properties of the software platform, especially of the OSGi platform.

3. Risk Analysis

A confidential security analysis for the complete Ericsson e-service system has been done previously [3]. The applied methodology was a proprietary method called “process oriented security analysis” developed by AerotechTelub AB. Through interviews with project managers and developers, a number of use cases, covering all parts of the infrastructure, were explored. To begin with, threats for each use case were investigated. The threats were then ranked by impact and probability and finally countermeasures were suggested. The focus was on the vulnerabilities of the infrastructure, not on the impact of a specific service.

The risk analysis we aimed at doing was to focus on one e-service and not on the whole system. The goal was to explore causes for making this particular e-service fail and to investigate how much the reliability of Monitor and Control depends on other system components. In addition, we wanted to examine if Monitor and Control could compromise the e-service system as a whole by letting hackers take advantage of a security hole in the software of Monitor and Control.

3.1 Risk Analysis Methodologies

For our analysis, we considered the following methodologies.

Hazard and Operability Analysis (HAZOP) is a component-based methodology from the chemical industry. HAZOP is greatly concerned with flows and tries to find causes for e.g. no flow, too much or too little flow, partial or reverse flow between components of a system. These “chemical” criteria are directly translated into computer terms [4, 5] as flows of data.

Another component-based method is Failure Modes and Effects Analysis (FMEA). It looks at individual components or functions of the system and investigates their possible modes of failure. It then considers possible causes for each failure mode and estimates their likely consequences. The effects of the failure are determined for this component or function and for the complete system. Countermeasures can be suggested [4].

Event Tree Analysis (ETA) [6] assumes the failure of one single component and explores which major events can result from this. This method is suggested for complex systems (typically nuclear power plants) where it is too vast a task to identify a complete fault tree for the failure “accidental release of radioactivity”. It is handier to try to identify all consequences of the failure of one valve.

Component-based methods turned out to be too specific for our purpose. Such methods would not give us the overall picture but keep us busy identifying small components and the impact of their possible failure. Using such methods in a software project leads to an extensive code inspection which would have been too time- and resource-consuming for our project.

Cause-Consequence Analysis (CCA) starts with a critical event and determines the causes of the event (using top-down or backward search) and the consequences that could result from it (forward search, as in ETA) [6]. CCA results in a complex diagram that is not easily understood without a deep introduction into CCA.

The more general way of doing a risk analysis is to identify assets and their values, threats to these assets, their vulnerabilities and then to suggest the most economic protection [7, 8, 9]. This procedure aims more at the risk assessment for a whole company where employees, offices, equipment, knowledge, reputation etc. are typical assets that are threatened by floods, fires, sickness, burglars, hackers, power failures, software crashes and so
on. This procedure is too broad for our purposes and does not focus well on software.

3.1.1 Fault Tree Analysis

After studying risk analysis methodologies it became clear that Fault Tree Analysis (FTA) [6] was the methodology that best served our purpose. FTA is concerned with identifying the causes of named hazards. The first step in FTA is to find these named hazards, called top events, one wants to explore. In our case, we decided to take a close look at the top events that would lead to customers not wanting to use Monitor and Control any longer. The next step in FTA is then to identify and explore causes for the top event. Causes are refined and split in sub-events until a basic event is reached that cannot be refined further. Countermeasures shall be suggested for the basic events, so as to prevent the start of the causal chain that leads to the top event.

Even though we rejected Event Tree Analysis, we always tried to make a simple forward (ETA) analysis of the basic events we identified through FTA, just to be sure that we had identified all the top events.

In Figure 2, a simple fault tree is depicted for the top event that a flashlight does not shine. The fault tree should be read from the top to get to know the causes for this top event. In this figure, only OR-gates are shown. All other Boolean operators are also available as well as a gate that outputs if the input events occur in sequential order from left to right.

Events can be depicted in four basic ways. A rectangle describes an event that is not fully described and can be split up in more causes. In Figure 2 “Bulb fault” is such an event. Basic events are denoted by circles such as “Not switched on”. It is assumed that basic events cannot be split in more causes. Countermeasures should only be considered for basic events. Usually countermeasures are not depicted in the fault tree. If wanted they can be shown as a comment as in “Insert new bulb”. The diamond “Internal error” means that this event is undeveloped. It can be split in more causes but it does not belong to the scope of this fault tree to do it. We use this to mark the scope of our risk analysis. The triangle-shaped event “Battery fault” means that the tree under this event is too big to fit here but that it is developed elsewhere.

[Figure 2: Example Fault Tree. Top event “Flash light does not shine”, with an OR-gate over “Battery fault”, “Bulb fault”, “Internal error” and “Not switched on”. “Bulb fault” has an OR-gate over “Bulb out of order” (countermeasure comment: “Insert new bulb”), “Bulb missing” and “Bulb loose”.]

In deployed systems, each branch of a tree can be assigned a probability value thus leading to probabilities for top events and allowing for a quantitative evaluation. As our application is not deployed but under development, this step was not interesting due to the “emotional” estimation of the assigned probabilities (there were simply no reliable facts to base probability estimates on). Software tools, e.g. [10], are available that help with the layout of the tree and especially with the calculation of probabilities. For our ends, a drawing tool was sufficient.

Even without probability calculations, FTA leads to a good understanding of which basic events can lead to a serious failure. The basic events can then be explored and protective measures can be designed to prevent the occurrence of the basic event and consequently the occurrence of the top event.

One advantage of FTA is that it is easy to grasp. No lengthy introduction is needed. The basic symbols of the fault tree are easily explained and not numerous. Thus, the diagram is quite self-explanatory, also to people not previously acquainted with FTA. By the simplicity of the method, work focuses on the actual risk analysis and not on the method. In fact, the method guides the risk analysis and ensures that one stays focused on the top event that is currently being explored.

3.2 Scope of the Analysis

We chose to conduct a risk analysis focusing on the software and the functionality of the service. We have thus identified a number of worst cases and tried to find out as many causes for them as possible. We aimed at finding out what could lead to the most disastrous events in the life cycle of our service and what can be done to prevent these events. As a by-product we also identified dependencies of Monitor and Control on other components and examined possible threats of Monitor and Control to the e-service system.

4. Analysis

The risk analysis began with the identification and ordering of the worst events that could happen within Monitor and Control. The ordering should reflect the severity of the situation with the most hazardous situation first. Severity was judged by the impact the event would have on the trust of the user in the e-service. We concentrated on top events that would lead to customers
removing Monitor and Control from their systems. The following six top events have been investigated.

1. There is an alarm situation in the house, but the users are not reached by any notification.
2. There is an alarm situation in the house, but the users are notified only after an unacceptable delay.
3. The alarm system is, unintentionally or maliciously, turned off without users noticing it.
4. A hacker exploits a weakness in Monitor and Control and gains access to the home net.
5. There is no alarm situation in the house but users are notified of a false positive alarm.
6. The switching of electrical home devices does not work reliably.

As can be seen from this list, five of the six events are related to the alarm functionality of Monitor and Control. This is not surprising, as an alarm system is by definition a critical system that the end user must be able to trust. Any malfunction of the alarm system is a potential trigger for the user not to use this service any longer.

The most critical point, an alarm in the home does not reach the end user, will now be investigated in depth by showing the complete fault tree. Only confidential events are not shown in detail.

[Figure 3: Top event tree for “Alarm does not reach user”. Top event T1 “Alarm does not reach user”, with an OR-gate over “Alarm system does not generate alarm”, “Alarm does not reach e-sc” (E1), “E-sc fails to contact user” and “Alarm reaches device that user is not checking”. “E-sc fails to contact user” has an OR-gate over “M&C supplies wrong contact message” and “E-sc failure”. “M&C supplies wrong contact message” has an OR-gate over “User entered wrong information” (countermeasure comment: “Allow user to test contact information when adding or changing the text”) and “M&C message is not correctly formatted”.]

The first level below the top event (cf. Figure 3) makes clear on which platforms the alarm may be lost. It could either be a failure of the detectors and base station (in the figure called “Alarm system”). It could also be a failure at the residential gateway (“gw”) or at the system service provider (“e-sc”). It could even be lost at the user side when the alarm reaches a device but the user is not checking this device any longer or at least not at the moment of alarm.

The diamond shape around “Alarm system does not generate alarm” denotes that this issue is not further investigated in this study as the alarm system is outside the software control of the Monitor and Control project. However, it was noticed that the dependence of Monitor and Control on the correct functioning of the detector system is high and that the alarm system company should be encouraged to do a similar risk analysis.

[Figure 4: Sub-tree for “Alarm does not reach user”. Events in grey can be detected by the is-alive checker. Event E1 “Alarm does not reach e-service center”, with an OR-gate over “Alarm does not reach gw”, “Alarm does not leave gw” and “E-sc failure”. Events under “Alarm does not reach gw”: “gw not ready to receive message”, “gw turned off”, “Serial cable not attached”, “Serial cable severed”. Events under “Alarm does not leave gw”: “M&C has not received alarm (lost between network card and M&C)”, “Network cable severed”, “Network cable not attached”, “M&C does not send alarm message”. Lowest-level events shown (their parent events are not recoverable from the extracted text): “gw in non-responding system state” (twice), “Serial port driver is unavailable”, “General power failure”, “User turned off gw”, “Failure in communication system”, “User uninstalls M&C”, “M&C crashes”, “gw crashes”, “Any errors detected by a code inspection from ‘receive’ to ‘send’”.]

The triangle “Alarm does not reach e-sc” denotes that a sub-tree is available for this event. This sub-tree is shown
in Figure 4 and explained later. The one developed event in Figure 3 is “e-service centre fails to contact user”. This event can either result from a failure at the site of the system service provider (and is therefore not of primary interest for this risk analysis) or it can result from Monitor and Control supplying a wrong contact message to the system service provider. The message can be “wrong” in two ways. Firstly, it may be incompatible with the format expected at the receiving end (due to version mix-up between the gateway software and the receiving software at the system service provider). Secondly, and more probably, incorrect contact information was entered when the user configured the system. This can be a simple typing error, but it can also mean that a telephone number was supplied that no one is ever checking. It can also denote the problem of information ageing: at the time of configuration, the contact information was correct, but at the time of alarm, the supplied mobile telephone number is no longer in use.

Typing errors in the configuration data can be effectively detected by allowing the user to test the supplied information (as suggested by the countermeasure comment in Figure 3).

However, the main part of the top event tree develops under the triangle “Alarm does not reach e-service centre”, elaborated in Figure 4. The causes for an alarm not reaching the e-service centre can arise at different locations (as shown on the level below the event E1 in Figure 4). An alarm may not even reach the gateway; an alarm may not leave the gateway; or an alarm may not reach the e-service centre due to a failure at the e-service centre. The latter event is not further developed in this risk analysis but is covered in [3]. The other events are further split up in sub-events until basic events are reached.

Many of the basic events have to do with physical problems of cables not being properly attached or devices being switched off or restarted. However, due to the methodology of FTA it was straightforward to identify the sections of the code that are critical for the functioning of Monitor and Control with regard to the top event. It was then possible to do a more targeted code inspection on that code.

The tree of Figure 4 makes it clear that platform and software stability is of great importance. The e-service software itself must not crash, and the platform must not suddenly reboot or be switched off. The is-alive checker running on the system service provider site is a great help in detecting components that are down. All grey events in Figure 4 can be detected (but not remedied) with the is-alive checker.

Of the other five top events we investigated, two are especially worth mentioning because of their general interest.

In the top event that an alarm reaches the user with an unacceptable delay, the residential gateway was identified as a bottleneck for traffic. The gateway can cause a delay because it is busy (cf. Figure 5). It can be busy either with activity from the Monitor and Control service or because other processes (e-services, protocols or users) are using cycles. The fault tree makes clear that one e-service is not only dependent on the reliable and timely functioning of itself and its hard- and software platform but also on the correct and resource-considerate behaviour of other e-services. If one e-service, maliciously or accidentally, misbehaves with respect to scarce resources (bandwidth, disk, CPU, etc.), this has consequences for other e-services.

[Figure 5: Fault tree for “gateway busy”. Top event “gw busy”, with an OR-gate over “M&C too busy” and “M&C must wait for system resources”. “M&C too busy” has an OR-gate over “Busy sending other message”, “Busy in configuration mode” and “Busy receiving other message”. “M&C must wait for system resources” has an OR-gate over “Other e-service uses them”, “File up/download in home net” and “Hacker steals cycles”.]

Another top event is that a hacker gains access to the home net by violating the Monitor and Control service. It turned out that the vulnerability of Monitor and Control was low; other ways of attacking the system seemed much more likely. The easiest and probably most successful way of attack is through social engineering. In an environment where end users are not supposed to have deep knowledge of the network infrastructure or even of the running software, a telephone attacker stands excellent chances of succeeding at almost no cost. She would simply claim to call from the system service provider and routinely ask for administrator identification and password. Chances are also reasonable that an attack from the layer below [11], in this case the network layer, would succeed although the gateway is only open on a few ports and not all residential gateways have public IP addresses. Another general threat is that a malicious e-service gets installed on the home net or on the gateway. Today, only proprietary e-services are installed on the gateway and only by approval and intervention of the system service provider. In the future, this has to change, or end users will change it to make the residential infrastructure more like the Internet. When this time comes, the platform must support firewalling between e-services so that one malicious e-service cannot eavesdrop on or modify another possibly safety-critical e-service.
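As an illustration of how the fault trees of this section could be handled in software, the following minimal Python sketch encodes the “gateway busy” tree of Figure 5 and evaluates it. It is not the tooling used in this study (a drawing tool was sufficient for our ends). The class name Event, its methods and the probability value of 0.01 per basic event are hypothetical and chosen only for illustration. The probability calculation additionally assumes that basic events are independent and that each appears only once in the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Event:
    """A fault-tree event (hypothetical helper); no children means basic event."""
    name: str
    gate: str = "OR"  # "OR" or "AND"; ignored for basic events
    children: List["Event"] = field(default_factory=list)

    def is_basic(self) -> bool:
        return not self.children

    def basic_events(self) -> List[str]:
        """Names of all basic events below this event, left to right."""
        if self.is_basic():
            return [self.name]
        return [n for child in self.children for n in child.basic_events()]

    def occurs(self, happened: Set[str]) -> bool:
        """True if this event occurs, given the basic events that happened."""
        if self.is_basic():
            return self.name in happened
        results = [child.occurs(happened) for child in self.children]
        return any(results) if self.gate == "OR" else all(results)

    def probability(self, p: Dict[str, float]) -> float:
        """Probability of this event, assuming independent basic events."""
        if self.is_basic():
            return p[self.name]
        probs = [child.probability(p) for child in self.children]
        if self.gate == "AND":
            result = 1.0
            for q in probs:
                result *= q
            return result
        # OR gate: the event occurs unless none of its inputs occurs.
        none_occurs = 1.0
        for q in probs:
            none_occurs *= 1.0 - q
        return 1.0 - none_occurs


# The tree of Figure 5; all gates in that figure are OR-gates.
gw_busy = Event("gw busy", "OR", [
    Event("M&C too busy", "OR", [
        Event("Busy sending other message"),
        Event("Busy in configuration mode"),
        Event("Busy receiving other message"),
    ]),
    Event("M&C must wait for system resources", "OR", [
        Event("Other e-service uses them"),
        Event("File up/download in home net"),
        Event("Hacker steals cycles"),
    ]),
])

# Placeholder probabilities, NOT measured values from the study.
hypothetical_p = {name: 0.01 for name in gw_busy.basic_events()}

print(gw_busy.basic_events())
print(gw_busy.occurs({"Hacker steals cycles"}))
print(gw_busy.probability(hypothetical_p))
```

Because the tree contains only OR-gates, every basic event on its own is sufficient to trigger the top event, which is why countermeasures are attached to each basic event individually.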
4.1 Conclusion

FTA gave us a good understanding of the critical parts of the e-service. It pointed us directly to the pieces of code that needed a critical inspection and also gave us ideas about what to look for in the code inspection. As top events were well defined prior to the actual analysis, it was easy to stay on track and not get lost in details when analysing pieces of code or details of the architecture.

Software developers felt more confident after this analysis because it made them see the weak points in their work, but also the limitations of their responsibilities. Some security problems simply cannot be solved within the scope of the Monitor and Control service but must be addressed by the gateway operating system, the OSGi or Ericsson libraries developed outside the project of Monitor and Control. This realisation was good for the developers and encouraged them to put down their requirements for hard- and software they depend on. However, we are aware of the fact that a tiny bug somewhere or an unlucky combination of circumstances can lead to a serious event that we have not foreseen in our scenarios.

We also wanted to explore how much a single e-service depends on the underlying hard- and software. Not surprisingly, this dependence is high. Other e-services can profit from this study and make analogous calculations based on our findings: an e-service depends on electricity, hardware, network and Internet connection, the availability of the authentication service at the system service provider site but also on the reasonable behaviour of other active services on the residential gateway. Especially this latter point is not much discussed at the moment and leads to our intended future work.

5. Future Work

According to the internal risk analysis for the whole system [3], third party applications are the most evident threat to the e-service infrastructure (consisting of system service provider and residential gateways). This threat should be addressed. Partly this can be done by requiring developing companies to subscribe to guidelines. However, if an e-service is accidentally deployed although it does not conform to certain guidelines, customer anger will not be directed at the developers of the malicious e-service but at the system service provider that runs “such an insecure system”.

The residential gateway of today ensures simple security policies by implementing a packet filter. In the future, it shall also be used for more advanced security features that are controlled by a policy engine, our targeted area of research. The policy engine shall have the additional task of a resource manager. It must ensure that no e-service starves another e-service by using too much CPU, RAM, disk space or bandwidth. In addition, it must enforce access control policies on the application level (access between different e-service applications) and user level (access to a service by a user). All these tasks must be executed without impairing the platform they run on. In the case of the residential gateway, the platform must even be highly available, allowing only for very little down time. Having a stable and reliable policy engine is a must for the gateway and other e-service network nodes so as to guarantee that all active e-services reliably serve the end users.

6. Acknowledgements

We would like to thank Ericsson Radio Systems AB, Center for Wireless Internet Integration for their co-operation and especially Håkan Klevebrant, Stefan Alsterlid and Peter Halvarsson for their administrative and technical support.

7. References

[1] SUN MICROSYSTEMS, INC. Java Embedded Server. http://www.sun.com/software/embeddedserver/index.html
[2] OSGI. OSGi Service Gateway Specification. http://www.osgi.org/about/spec1.html
[3] TELUB AB. Security Analysis of the Ericsson e-service system 1.0. Ericsson Radio Systems, Center for Wireless Internet Integration. Feb. 15, 2001. Confidential report.
[4] STOREY, NEIL. Safety-Critical Computer Systems. Addison-Wesley. 1996.
[5] MCDERMID J.A. ET AL. Experience with the application of HAZOP to computer-based systems. COMPASS '95. Proceedings of the 10th Annual Conference on Computer Assurance, 1995, Page(s): 37-48.
[6] LEVESON, NANCY. Safeware – System Safety and Computers. Addison Wesley. 1995.
[7] FINK, DIETER. Information Technology Security – Managing Challenges and Creating Opportunities. CCH Australian Limited. 1997.
[8] FORCHT, KAREN A. Computer Security Management. Course Technology. 1994.
[9] PAUL, BROOKE. Risk-Assessment Strategies. Network Computing, Vol. 11, Issue 21. Oct. 2000.
[10] ISOGRAPH RELIABILITY SOFTWARE. FaultTree+. http://www.isograph.com/faulttree.htm
[11] GOLLMANN, DIETER. Computer Security. Wiley&Sons. 1999.