plans are eventually identified by an analysis and design process with human intervention of the system, then we can say that the system has a set of the best plans for the environment. This can be formulated as:

    P : S → C                                                        (1)

    V(s_i, P(s_i)) = max_{c_j ∈ C} V(s_i, c_j),   ∀ s_i ∈ S          (2)

where P in equation (1) is a planner which maps each situation (s ∈ S) to an appropriate configuration (c ∈ C) during the system's execution. V in equation (2) is a value function which calculates the value of a configuration for the given situation s_i ∈ S. Equation (2) states that if the configuration chosen by the planner P is always the best configuration, then P is the best planner for the environment. One of the primary goals of research in self-management is to find an optimal planner which can autonomously make plans even in dynamically changing environments. The rest of this section discusses two planning approaches for finding an optimal planner in architecture-based self-management.
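As an illustrative sketch only, equations (1) and (2) can be read as choosing, for each situation, the configuration with the highest value from a value table; the class and method names below are hypothetical and are not part of the approach itself.

    import java.util.*;

    // Hypothetical sketch of equations (1) and (2): the planner P maps a
    // situation to the configuration c that maximizes the value V(s, c).
    public class GreedyPlanner {
        private final Map<String, Map<String, Double>> value = new HashMap<>(); // V(s, c)

        public void setValue(String situation, String configuration, double v) {
            value.computeIfAbsent(situation, k -> new HashMap<>()).put(configuration, v);
        }

        // P(s): return the best-known configuration for the given situation.
        public String plan(String situation) {
            Map<String, Double> row = value.getOrDefault(situation, Map.of());
            return row.entrySet().stream()
                      .max(Map.Entry.comparingByValue())
                      .map(Map.Entry::getKey)
                      .orElse(null); // no configuration known for this situation
        }
    }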
2.1. Off-line Planning

In architecture-based self-management, off-line planning means that the decisions which define the relationship between situations and plans (i.e. configurations) are made prior to run-time. In other words, whenever a system encounters a specific situation s_i from the environment, the system selects and executes exactly one identical configuration c_i. Solutions to self-managed systems have so far focused on off-line planning[11]. For example, plans are made by a system administrator through a console or hard-coded by a developer[14], architectural adaptation is described by mapping invariants and strategies in an ADL description[5], or architectural changes are triggered by utility functions[4]. These off-line approaches can be effective if developers can identify well-defined goals, states, configurations, rewards, and their relationships, along with test environments that exactly illustrate the characteristics of the actual operating environments, before deployment time. However, it is very difficult to identify them due to the nature of planning[9]. On-line planning, which gives more effective autonomy in architecture-based self-management, presents an alternative that overcomes this limitation of off-line planning.

2.2. On-line Planning

On-line planning in architecture-based self-management means that a software system can autonomously choose a configuration with respect to the current situation that the system encounters. Generally, an on-line planning process has three major steps: selection, evaluation, and accumulation[9, 10]. In the selection step, the system autonomously chooses a configuration which is suitable for the current situation. Generally, the configuration is chosen by a greedy selection strategy in which the best-so-far configuration is chosen. However, this strategy may lead to the problem of local optima. Hence, the system must adjust its strategy between exploitation and exploration. Exploitation means that the system chooses a greedy configuration, while exploration means that the system intentionally chooses a suboptimal or random configuration to find a better solution. It is important to adjust the ratio between the two strategies because it determines the planning performance. In the evaluation step, the system must estimate the effectiveness of the configuration taken in the selection step. The key issue in the evaluation step is to define the way to determine the numerical values which represent the reward of the configuration, because a numerical representation enables the accumulation and comparison of rewards. In the accumulation step, the system stores the numerical values identified in the evaluation step. The system must adjust the accumulation ratio between already accumulated knowledge and newly incoming experience. If the system relies too much on accumulated knowledge in the accumulation step, it may slow down the accumulation speed. On the other hand, if the system relies too much on new experience, it may cause ineffective accumulation. With accumulated knowledge, the on-line planner can select an appropriate action in the next selection step.
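As an illustration only, the accumulation-ratio trade-off described above is commonly realized as an exponential blend between the stored value and the newly observed reward; the class, method, and parameter names below are hypothetical.

    // Hypothetical sketch: blend accumulated knowledge with new experience.
    class Accumulator {
        // A ratio near 0 trusts stored knowledge; a ratio near 1 trusts the new reward.
        static double accumulate(double storedValue, double newReward, double ratio) {
            return (1.0 - ratio) * storedValue + ratio * newReward;
        }
    }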
With these steps, self-managed systems can take advantage of on-line planning in the presence of dynamically changing environments. The next section describes in detail how on-line planning can be applied to actual systems.

3. Q-learning-based Self-Management

This section presents an approach to designing architecture-based self-managed software by applying on-line planning based on Q-learning. Generally, we need to consider three elements to apply metaheuristics such as reinforcement learning: 'representation', 'fitness', and 'operators'[7]. In reinforcement learning, the representations of states and actions (i.e. configurations) are critical to shaping the nature of the search problem. The fitness function is used to determine which solution is better. The operators enable a system to determine neighbor solutions, which can accelerate the learning process. Hence, the proposed approach provides state and action representations that the system can exploit, a fitness function design with which the system can determine better solutions, and operators to manipulate solutions. In addition to these three elements, the approach provides an on-line evolution process which describes the control loop of the system at run-time.
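As a hedged sketch only (not the approach's actual API), these three elements and the run-time control loop can be pictured as a hypothetical Java interface that a self-managed system could implement.

    import java.util.Set;

    // Hypothetical interface collecting the elements discussed in this section:
    // representation (Section 3.1), fitness (Section 3.2), operators (Section 3.3),
    // and the on-line evolution control loop (Section 3.4).
    interface SelfManagementElements<S, A> {
        S representCurrentState();          // symbolic state representation
        double fitness(long time);          // reward signal observed at run-time
        Set<A> admissibleActions(S state);  // operator A(s): reduce the action space
        void onlineEvolutionStep();         // one pass of the run-time control loop
    }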
3.1. Representation

The representations of states and actions are crucial for designing self-managed software using on-line planning because they define the problem space and the solution space of the system. States and actions are usually represented either numerically or as symbolic codes. The former is more suitable for neural networks[16] and support vector machines[3, 22]. A numerical representation is not appropriate for shaping the problem and solution space of reinforcement learning, which uses discrete information, because it may cause a state explosion that increases the training time of the system. Hence, the approach uses symbolic codes to represent the problem (state) and solution (action) space.

Our approach provides a goal- and scenario-based discovery process for more systematic state and action discovery. Goal-based[20, 13] and scenario-based approaches[21, 17] are widely used for the elicitation of software requirements, and combined goal and scenario discovery has also been studied[15]. Goals denote the objectives that a system must achieve, for example, 'maintain high throughput in a communication system' or 'minimize response time'. Scenarios represent a sequence of events to achieve a specific goal, for example, 'the system is given user input and stores it into databases'. Goals are structured hierarchically to represent the relationship between supergoals and subgoals. A subgoal represents more specific objectives to achieve its supergoal, and eventually it supports the root goal of the goal hierarchy. A goal has scenarios, and they are used for discovering new subgoals by describing the related goal in detail.

The proposed approach exploits goals and scenarios to discover states and actions. Once the goal and scenario structure is organized, it can be mapped to states and actions by reforming it. First, scenarios must be reformed into pairs of 'condition → behavior' or 'stimulus → reaction', e.g. 'when the battery of the system is low (condition or stimulus), turn off the additional backup storage (behavior or reaction)'. This reforming is depicted in Table 1. In this table, the discovered goals are listed in sequence (goals will be used to discover fitness functions as described in Section 3.2). The scenarios of a specific goal are listed next to the goal. Each scenario is reformed into two additional columns: Condition (stimulus) and Behavior (reaction).

States are identified from the scenarios. Conditions in the scenarios can be candidates for states. A condition represents a possible state of the system, e.g. 'the system's battery is low' implies 'low-battery' and 'the system's battery is full' implies 'full-battery'. These conditions depict physical states of the system, i.e. what the environment has or shows. Thus, a group of conditions represents a dimension (type) of states; for example, 'battery-level' is a dimension of state information and it can take the value 'low-battery' or 'full-battery'. While this information represents a long-term state of the environment, a situation represents a transient change of the environment, e.g. 'hit by wall' in an autonomous mobile robot system. Situations are triggers to begin the adaptation of the system. Situations can also be identified from condition information: a condition which describes a transient event, such as 'when the system is hit by a bullet', can be transformed into a situation.

One more element which must be considered when talking about states is the current software architecture of the system. Although the architecture is not part of the environment, it can influence the planning of the system; for example, it can be used to find an admissible action set. An architecture can be denoted by a symbolic notation which consists of a vector of components and connectors, e.g. (component:A, component:B, component:C, connector:A-B, connector:B-C). To simplify this, it can be transformed into a unique architecture identifier, e.g. arch:1a.

These three pieces of information (situation, long-term state, architecture) compose the state information. Examples of state information elements are depicted in Table 2. The state information of the system can be denoted in a vector form such as (situation, long-term state, architecture information), for example, (hit-by-wall, near, low, arch:1a) and (hit-by-wall, far, full, arch:1a), which are combinations of elements from Table 2.

Actions can be identified by extracting behaviors or reactions from scenarios. As depicted in Table 1, a set of behaviors (reactions) is identified, each paired with a condition (or stimulus). If these pairs between conditions and reactions are fixed before deployment time, this can be considered off-line planning, i.e. static plans. Since the goal of this approach is on-line planning in self-management, the set of actions should be discovered separately. Similar to state information, actions can be identified by discovering action elements and grouping the elements into types. For example, first identify action elements such as 'stop moving' or 'enhance precision'; then group the elements which have the same type.

Examples of action elements and their types are shown in Table 2. With these pieces of information, each action can be represented in a vector form such as (precise maneuver, rich) or (stop, text-only), where the first dimension is 'Movement' and the second one is 'GUI'. An action element such as 'rich' in GUI or 'stop' in Movement implies architectural changes, which include adding, removing, and replacing a component, and changing the topology of an architecture. Hence, each action must be mapped to a set of architectural changes. For example, the action '(precise maneuver, rich)' can be mapped to the sequence '[add:(visionbased localizer), connect:(visionbased localizer)-(pathplanner), replace:(normal gui)-by-(rich gui)]', where (...) indicates a component name.
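As a hedged illustration of this representation only (not the actual implementation), the state and action vectors and the action-to-architectural-change mapping could be encoded as follows; all type and component names are hypothetical and follow the examples above.

    import java.util.*;

    // Hypothetical sketch of the symbolic representation in Section 3.1:
    // a state vector (situation, long-term state dimensions, architecture id)
    // and an action vector mapped to a sequence of architectural change steps.
    public class Representation {
        record State(String situation, String distance, String battery, String archId) {}
        record Action(String movement, String gui) {}

        static final Map<Action, List<String>> ACTION_TO_CHANGES = Map.of(
            new Action("precise maneuver", "rich"),
            List.of("add:(visionbased localizer)",
                    "connect:(visionbased localizer)-(pathplanner)",
                    "replace:(normal gui)-by-(rich gui)"));

        public static void main(String[] args) {
            State s = new State("hit-by-wall", "near", "low", "arch:1a");
            Action a = new Action("precise maneuver", "rich");
            System.out.println(s + " -> " + a + " -> " + ACTION_TO_CHANGES.get(a));
        }
    }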
Table 1. Reformed goals and scenarios

Goal                  | Scenario                                  | Condition (stimulus)           | Behavior (reaction)
----------------------+-------------------------------------------+--------------------------------+---------------------------------
Goal 1: Maximize      | Sc. 1-1: when the battery of the system   | Cond. 1-1: the battery of the  | Beh. 1-1: turn off the
system availability   | is low, turn off the additional backup    | system is low                  | additional backup storage
                      | storage                                   |                                |
                      | Sc. 1-2: ...                              | Cond. 1-2: ...                 | Beh. 1-2: ...
Goal 2: ...           | Sc. 2-1: ...                              | Cond. 2-1: ...                 | Beh. 2-1: ...
...                   | ...                                       | ...                            | ...
Goal n.m: ...         | Sc. n.m-1: ...                            | Cond. n.m-1: ...               | Beh. n.m-1: ...
The discovery process shown in this section identifies states and actions by extracting data elements from goals and scenarios. Because goals and scenarios directly represent the objectives and user experiences, and also show how the system influences the environment, the number of states and actions can be limited. In other words, it helps the system prevent a state explosion.

3.2. Fitness

The fitness function of the system is crucial because it represents the reward of the action that the system chooses. The function provides a criterion to evaluate the system's behavior. Specifically, in autonomous systems, it is necessary to know which of two actions is the better one in the current state. Hence, the rewards that the fitness function generates are usually represented by numerical values which can be compared to each other. The fitness function is usually application specific, so it is important to discover the function from the system's requirements. Our approach exploits the goal and scenario structure discovered in Section 3.1; in particular, goals are the source of fitness discovery.

Generally, goals, especially higher ones including the root goal, are too abstract to define numerical functions. Thus, it is necessary to find an appropriate goal level at which to define the fitness function. It is difficult to define universal rules for choosing appropriate goals which describe numerical functions of the system, but it is possible to propose a systematic process to identify the functions. The following shows the process to define the fitness function. This process must be conducted manually by system developers.

1. From the root goal, search for goals which can be numerically evaluated, using a top-down search strategy.

2. If an appropriate goal is found, define a function that represents the goal and mark all subgoals of the goal (i.e. its subtree). Then, stop the search of that subtree.

3. Repeat the search until all leaf nodes are marked.

More than one fitness function can be identified by the discovery process. In this case, it is necessary to integrate the functions into one fitness function. The following equation depicts the integrated fitness function:

    r_t = f(t) = Σ_i w_i f_i(t)                                      (3)

where r_t is the reward at time t, f(t) is the (integrated) fitness function, w_i is the weight of the i-th function, f_i(t) is the i-th function of t, and Σ_i |w_i| = 1. Equation (3) assumes that the objective of the system is to maximize the values of all functions f_i(t). To minimize a certain value, multiply the weight of the corresponding function f_i(t) by −1. Every function f_i(t) corresponds to data observed from the system at run-time. For example, an observed network latency of 100 ms at time t is transformed into 10 by the function f_i(t) and multiplied by −1 because it should be minimized.
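A minimal sketch of equation (3) follows; it is illustrative only, and the weights, observation values, and names are hypothetical (the latency example mirrors the text: the observation is scaled by its f_i and given a negative weight because latency should be minimized).

    // Hypothetical sketch of the integrated fitness function of equation (3):
    // r_t = sum_i w_i * f_i(t), with sum_i |w_i| = 1.
    public class IntegratedFitness {
        static double reward(double[] weights, double[] scaledObservations) {
            double r = 0.0;
            for (int i = 0; i < weights.length; i++) {
                r += weights[i] * scaledObservations[i]; // negative weight => minimize
            }
            return r;
        }

        public static void main(String[] args) {
            // f_1: throughput (maximize, w = +0.5); f_2: latency (minimize, w = -0.5).
            // A 100 ms latency observation has already been scaled by f_2 to 10.
            double[] w = {0.5, -0.5};
            double[] f = {8.0, 10.0};
            System.out.println("r_t = " + reward(w, f)); // 0.5*8.0 - 0.5*10.0 = -1.0
        }
    }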
3.3. Operators

Different metaheuristics use different operators; for example, genetic algorithms use crossover and mutation. The reason why algorithms use operators is to accelerate the search for better solutions. In reinforcement learning, operators are used to reduce the search space, i.e. the number of actions. The operator A(s) specifies an admissible action set for the observed state s. For example, when a mobile robot collides with the wall, actions which are related to motion control are more admissible than those related to arm manipulation. This operator is crucial because it can reduce the training time of the system.
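The following is an illustrative sketch only (identifiers are hypothetical): the operator A(s) is realized as a filter that keeps only the actions admissible in the observed situation, following the mobile-robot example above.

    import java.util.*;

    // Hypothetical sketch of the operator A(s): restrict the full action set to
    // the actions admissible in the observed state, reducing the search space.
    public class Operators {
        static Set<String> admissible(String situation, Set<String> allActions) {
            if ("hit-by-wall".equals(situation)) {
                Set<String> motionOnly = new HashSet<>();
                for (String action : allActions) {
                    if (action.startsWith("move:")) {   // keep motion-control actions
                        motionOnly.add(action);
                    }
                }
                return motionOnly;                      // arm-manipulation actions filtered out
            }
            return allActions;                          // no restriction known for this situation
        }
    }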
3.4. On-line Evolution Process

With the three elements discussed in Sections 3.1∼3.3, the system can apply on-line planning based on Q-learning. Q-learning[23] is a kind of temporal-difference (TD) learning, which is a combination of Monte Carlo ideas and dynamic programming ideas[18]. Specifically, Q-learning is an off-policy TD control algorithm. The reason why we chose Q-learning is its model-free characteristics: in contrast to model-based planners[2, 19], model-free TD methods can learn directly from raw experience without a model of the environment. Also, TD learning updates observed data based in part on prior knowledge, without waiting for a final outcome, and TD methods assume that events of the environment are recurring. These properties satisfy the requirements of on-line planning described in Section 2. This section presents the way to exploit those elements in the on-line evolution process. The process consists of five phases: detection, planning, execution, evaluation, and learning.

3.4.1 Detection Phase

In the detection phase, the system monitors the current state of the environment where the system operates. When detecting states, the system uses the notation presented in Section 3.1. Continual detection may cause performance degradation; thus, it is crucial to monitor only the changes which actually trigger the adaptation of the system. Architecture information is not suitable for this purpose because it does not reflect environmental changes and is changed only by the system itself. Long-term states are also not feasible triggers because they are only the result of environmental changes rather than the cause. Situations are appropriate triggers because they describe the moments at which the system needs adaptation. If a situation is detected, the system observes the long-term states and the current architecture of the system and denotes them in the representation presented in Section 3.1. These data are passed to the next phase: the planning phase.

3.4.2 Planning Phase

Using the state identified by the detection phase, the system chooses an action to adapt itself to the state. At this time, the system uses an action selection strategy. In general, Q-learning uses the 'ε-greedy' selection strategy as an off-policy strategy to choose an action from the admissible action set A(s) of the current state s. The strategy is a stochastic process in which the system either exploits prior knowledge or explores a new action. This is controlled by a value ε determined by the developer, where 0 ≤ ε < 1. In this phase, the system generates a random number r. If r < ε, the system chooses an action randomly from the admissible action set. Otherwise (r ≥ ε), it chooses the best-so-far action by comparing the value of each action accumulated in the learning phase (see Section 3.4.5). In this manner, the stochastic strategy prevents the system from falling into local optima. The chosen action is used by the execution phase.
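The sketch below is illustrative only: it shows an ε-greedy choice over the admissible action set backed by a Q-table, together with the standard Q-learning update from [23] that a learning phase of this kind typically applies; the approach's own learning phase is described separately and may differ in detail, and all identifiers here are hypothetical.

    import java.util.*;

    // Hypothetical sketch of the planning phase (epsilon-greedy selection) and a
    // standard Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max Q(s',a') - Q(s,a)).
    public class QPlanner {
        private final Map<String, Map<String, Double>> q = new HashMap<>(); // Q(s, a)
        private final Random random = new Random();
        private final double epsilon, alpha, gamma;

        QPlanner(double epsilon, double alpha, double gamma) {
            this.epsilon = epsilon; this.alpha = alpha; this.gamma = gamma;
        }

        // Planning phase: explore with probability epsilon, otherwise exploit best-so-far.
        String choose(String state, List<String> admissibleActions) {
            if (random.nextDouble() < epsilon) {
                return admissibleActions.get(random.nextInt(admissibleActions.size()));
            }
            String best = admissibleActions.get(0);
            for (String a : admissibleActions) {
                if (qValue(state, a) > qValue(state, best)) best = a;
            }
            return best;
        }

        // Learning: update Q(s, a) from the observed reward and the next state.
        void update(String s, String a, double reward, String next, List<String> nextActions) {
            double maxNext = 0.0;
            for (String an : nextActions) maxNext = Math.max(maxNext, qValue(next, an));
            double updated = qValue(s, a) + alpha * (reward + gamma * maxNext - qValue(s, a));
            q.computeIfAbsent(s, k -> new HashMap<>()).put(a, updated);
        }

        private double qValue(String s, String a) {
            return q.getOrDefault(s, Map.of()).getOrDefault(a, 0.0);
        }
    }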
3.4.3 Execution Phase

This phase applies the action chosen in the previous phase (the planning phase) to the system. As the action describes architectural changes such as adding, removing, and replacing components and reconfiguring the architectural topology, the system must have architecture manipulation facilities, i.e. a dynamic architecture. It must also support a way to reach a quiescent state[6]. A quiescent state² of an architecture means that the system (or the part of the system relevant to the changes described by the action) is safe for executing architectural reconfiguration. If the action is not executed in a quiescent state, it may cause unsafe changes which can lead to a malfunction.

Once reconfiguration is done, the system carries out its own functionality using its reconfigured architecture. The system keeps executing until it encounters a new situation or it terminates. Although this may overlap with the detection phase, detection in this phase indicates a phase transfer from this phase to the next one (the detection phase, of course, uses the detected situation).

3.4.4 Evaluation Phase

After the execution phase, the system must evaluate its previous execution by observing a reward from the environment. As mentioned in Section 3.2, the system continuously observes the values previously defined by the fitness function. These values are used for calculating the reward of the action taken. The values must be retrieved immediately; otherwise, in other words, if the system is delayed in retrieving reward values, it may influence the performance or

² The term 'state' here is different from the term used in Section 3.1. In this context, the state indicates the execution state of the software architecture.
[Figure 3. Two cases of the on-line evolution process. i) Case 1: learning a new situation, e.g. <hit-by-bullet> with state (distance=near, angle=behind, energy=mid, enemy's energy=low): with no learning data (arg max_a Q(s, a) = N/A), the planner chooses an arbitrary reconfiguration action a_i, e.g. replace:(rammer)-by-(avoider); after execution, the evaluated reward is used to update Q(s, a_i). ii) Case 2: exploiting learning data: using prior experience, the planner chooses the best-so-far action arg max_a Q(s, a); after execution, Q(s, a) is updated with the reward, Q(s, a) ← Q(s, a) ⊕ Reward.]

Examples of a system using the above process are shown in Figure 3. i): When a system encounters a new situation, the planner in the system chooses an arbitrary reconfiguration action because there is no learning data about the new situation. Then, the system reconfigures its architecture based on the chosen action. While the system executes its functionality, the evaluator evaluates the execution based on the predefined fitness function and produces rewards. The learner accumulates the rewards with respect to the prior situation (planning model construction). ii): When the system encounters an already experienced situation, the planner exploits the best-so-far action based on the learning data.
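Purely as an illustration of how these two cases fit into the five-phase control loop (this is not the approach's actual implementation; QPlanner is the sketch from Section 3.4.2, and the two interfaces below are assumptions):

    import java.util.List;

    // Hypothetical sketch of one pass of the five-phase control loop in Section 3.4.
    interface Environment { String detectState(); double observeReward(); }
    interface Architecture { List<String> admissibleActions(String state); void reconfigure(String action); }

    class ControlLoop {
        static void step(QPlanner planner, Environment env, Architecture arch) {
            String state = env.detectState();                          // detection phase
            List<String> admissible = arch.admissibleActions(state);   // operator A(s)
            String action = planner.choose(state, admissible);         // planning phase
            arch.reconfigure(action);                                  // execution phase (in a quiescent state)
            double reward = env.observeReward();                       // evaluation phase
            String next = env.detectState();
            planner.update(state, action, reward, next, arch.admissibleActions(next)); // learning phase
        }
    }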
[Figure: Score per round with α = 0.3, γ = 0.7, ε = 1.0; trend line y = −19.0x + 2721.1.]

metaheuristics[7] and to verify whether our planning approach can generate plans without an initial model of the environment. We chose a robot named 'AntiGravity 1.0' as the enemy. This robot has an anti-gravity algorithm which enables it to move as if it had a reverse-gravity engine. In other words, the algorithm supposes that the wall of the
[Figure: Score per round (Round × 10) with parameters α = 0.3, γ = 0.7 and ε = 0.5 or ε = 0.0. Panels: (a) A Robot vs. B Robot, (b) A Robot vs. C Robot, (c) A Robot vs. C Robot; trend lines y = 74.9x + 2361.5, y = −4.7x + 2813.3, and y = 4.8x + 2234.7.]

This indicates that the robot uses off-line planning and it
5. Conclusions

Planning in architecture-based self-management represents how a system can find appropriate relationships between the situations that the system encounters and the possible configurations that the system can take in dynamically changing environments. The paper has discussed two types of planning: off-line and on-line planning. On-line planning must be considered to deal with dynamically changing environments because off-line planning has a limitation: it uses fixed relationships between situations and configurations for adaptation. On the other hand, on-line planning enables a system to autonomously find better relationships between them in dynamic environments. This paper has described an approach to designing architecture-based self-managed systems with on-line planning based on Q-learning. The approach provides a discovery process for representations, fitness functions, and operators to support on-line planning. The discovered elements are organized by an on-line evolution process. A case study has been conducted to evaluate the approach. The result of the case study shows that our approach is effective for architecture-based self-management.

A. Robot Implementation

Due to space limitations, we provide additional material about the robot implementation at this URL: http://seapp.sogang.ac.kr/robotimpl.pdf

References

[1] T. Bäck. Adaptive business intelligence based on evolution strategies: some application examples of self-adaptive software. Inf. Sci., 148(1-4):113–121, 2002.
[2] P. Bertoli, A. Cimatti, M. Pistore, M. Roveri, and P. Traverso. MBP: a model based planner. In IJCAI'01 Workshop on Planning Under Uncertainty and Incomplete Information, 2001.
[3] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov., 2(2):121–167, 1998.
[4] J. Floch, S. O. Hallsteinsen, E. Stav, F. Eliassen, K. Lund, and E. Gjørven. Using architecture models for runtime adaptability. IEEE Software, 23(2):62–70, 2006.
[5] D. Garlan, S.-W. Cheng, A.-C. Huang, B. R. Schmerl, and P. Steenkiste. Rainbow: Architecture-based self-adaptation with reusable infrastructure. IEEE Computer, 37(10):46–54, 2004.
[6] H. Gomaa and M. Hussein. Software reconfiguration patterns for dynamic evolution of software architectures. In 4th Working IEEE/IFIP Conference on Software Architecture (WICSA 2004), 12-15 June 2004, Oslo, Norway, pages 79–88, 2004.
[7] M. Harman and B. F. Jones. Search-based software engineering. Information & Software Technology, 43(14):833–839, 2001.
[8] IBM. Robocode, http://robocode.alphaworks.ibm.com/home/home.html, 2004.
[9] G. Klein. Flexecution as a paradigm for replanning, part 1. IEEE Intelligent Systems, 22(5):79–83, 2007.
[10] G. Klein. Flexecution, part 2: Understanding and supporting flexible execution. IEEE Intelligent Systems, 22(6):108–112, 2007.
[11] J. Kramer and J. Magee. Self-managed systems: an architectural challenge. In FOSE '07: 2007 Future of Software Engineering, pages 259–268, Washington, DC, USA, 2007. IEEE Computer Society.
[12] R. Laddaga. Active software. In P. Robertson, H. E. Shrobe, and R. Laddaga, editors, Proceedings of the First International Workshop on Self-Adaptive Software, IWSAS 2000, volume 1936 of Lecture Notes in Computer Science, pages 11–26. Springer, 2000.
[13] J. Mylopoulos. Goal-oriented requirements engineering. In 12th Asia-Pacific Software Engineering Conference (APSEC 2005), 15-17 December 2005, Taipei, Taiwan, page 3, 2005.
[14] P. Oreizy, N. Medvidovic, and R. N. Taylor. Architecture-based runtime software evolution. In ICSE '98: Proceedings of the 20th International Conference on Software Engineering, pages 177–186, Washington, DC, USA, 1998. IEEE Computer Society.
[15] C. Rolland, C. Souveyet, and C. B. Achour. Guiding goal modeling using scenarios. IEEE Trans. Software Eng., 24(12):1055–1071, 1998.
[16] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Neurocomputing: foundations of research, pages 89–114, 1988.
[17] A. G. Sutcliffe, N. A. M. Maiden, S. Minocha, and D. Manuel. Supporting scenario-based requirements engineering. IEEE Trans. Software Eng., 24(12):1072–1088, 1998.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, 1998.
[19] D. Sykes, W. Heaven, J. Magee, and J. Kramer. From goals to components: a combined approach to self-management. In SEAMS '08: Proceedings of the 2008 International Workshop on Software Engineering for Adaptive and Self-Managing Systems, pages 1–8, New York, NY, USA, 2008. ACM.
[20] A. van Lamsweerde. Goal-oriented requirements engineering: A guided tour. In 5th IEEE International Symposium on Requirements Engineering (RE 2001), 27-31 August 2001, Toronto, Canada, page 249, 2001.
[21] A. van Lamsweerde and L. Willemet. Inferring declarative requirements specifications from operational scenarios. IEEE Trans. Software Eng., 24(12):1089–1114, 1998.
[22] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[23] C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.