ELSE 2223 07 Self Adapting
Engineering
Self-Adapting Systems
© Miguel Matos
Roadmap
▪ Key idea
Overview
[Figure: a software system receives input from, and has an effect on, its environment: non-controllable software, hardware, network, physical context]
Overview
[Figure: a self-adaptive software system interacting with its environment: non-controllable software, hardware, network, physical context]
Large Scale Systems Engineering 8
Self-Adapting Systems
Overview
• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• Example: disk throughput is lower than usual
• The environment
• Example: user requests have increased 10x
• Goals
• Example: the request-latency goal changes from 20ms to 10ms
Overview
[Figure: the software system exchanges input and effect with the environment (non-controllable software, hardware, network, physical context); probes are added to monitor the system and its environment, closing a monitor/adapt feedback loop around the system]
Overview
• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• The environment
• Goals
• Internal factors
• A self-adaptive system comprises two distinct parts
• First interacts with the environment and has domain concerns
• Second interacts with the first part and has adaptation concerns
Overview
• Domain concerns
• Concerned with the goals for which the system is built
• Example: database should serve 99% of requests under
50ms
• Adaptation concerns
• Concerned about how system realizes its goals under changing conditions
• Example: database has configuration parameters to optimize performance
for different workloads
[Figure: the managed system (controllable software, guided by domain goals) exchanges input and effect with the environment (non-controllable software, hardware, network, physical context); a managing part monitors and adapts the managed system]
Autonomic Computing
▪ Autonomic computing
– in the human body, the autonomic nervous system takes care of
unconscious reflexes, that is, bodily functions that do not require our
attention
– Term first used by IBM to describe computing systems that are self-managing
▪ Elasticity
– Ability to adapt to workload changes by de/provisioning resources autonomously
– At each instant, available resources match current demand as closely as possible
• New systems
• Separate domain concerns from adaptation concerns
• More flexibility to handle future changes
Parallel between current cars and fully autonomous self-driving cars Source: https://is.gd/7DI8LD
Automating Tasks
▪ Goal
▪ System manages itself based on high-level objectives
▪ Basis of the Autonomic Computing vision
▪ Prevent complexity and errors of human-based system management
Autonomic Manager
▪ MAPE-K reference model
– Monitor
– Analyze
– Plan
– Execute
– Knowledge
[Figure: the autonomic manager runs a Monitor → Analyze → Plan → Execute loop over shared Knowledge, observing the managed element through Sensors and acting on it through Effectors]
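The MAPE-K loop can be sketched as a minimal control loop. This is a sketch under assumed names: the class, the latency metric, and the `add_replica` action are illustrative, not part of any real framework.

```python
class AutonomicManager:
    """Minimal MAPE-K sketch: Monitor, Analyze, Plan, Execute over shared Knowledge."""

    def __init__(self, sensor, effector, target_latency_ms=50.0):
        self.sensor = sensor        # callable returning current metrics (the Sensors)
        self.effector = effector    # callable applying one action (the Effectors)
        self.knowledge = {"target_latency_ms": target_latency_ms, "history": []}

    def monitor(self):
        # Collect data from the managed element and record it in the Knowledge
        metrics = self.sensor()
        self.knowledge["history"].append(metrics)
        return metrics

    def analyze(self, metrics):
        # Adaptation is needed when the latency goal is violated
        return metrics["latency_ms"] > self.knowledge["target_latency_ms"]

    def plan(self, metrics):
        # Trivial plan: a single mitigation action per loop iteration
        return [("add_replica", 1)]

    def execute(self, actions):
        # Apply the plan through the effectors
        for action in actions:
            self.effector(action)

    def step(self):
        metrics = self.monitor()
        if self.analyze(metrics):
            self.execute(self.plan(metrics))
```

In a real deployment `step()` would run periodically; the sensor and effector would wrap the managed system's probes and management API.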
Autonomic Manager
▪ Sensors
– Collect and measure metrics about
▪ The system
▪ The environment
▪ The user of the system
Autonomic Manager
▪ Effectors (or actuators)
– From biology: an organ or cell that acts in response to a stimulus
– Acts upon the system changing one or more of its parameters
▪ Example:
– Modify a system configuration
– Add more replicas
– Ideally, action should not require interruption of the provided service
Autonomic Manager
▪ Monitor
– Collects data from managed element and its execution context to update the
Knowledge
– Relies on the sensors for the different aspects of the system, environment and
user
Autonomic Manager
▪ Analyze
– Determines whether adaptation actions are required
– Decision is based on the monitor data and administrator goals
respTime = ...
targetTime = ...
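The comparison sketched in the fragment above might look like the following; the variable names follow the slide, while the tolerance margin is an illustrative assumption.

```python
def needs_adaptation(resp_time, target_time, tolerance=0.1):
    """Analyze step: flag adaptation when the measured response time
    exceeds the target by more than a tolerance margin, so that small
    fluctuations around the target do not trigger adaptation."""
    return resp_time > target_time * (1 + tolerance)
```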
Autonomic Manager
▪ Plan
– Plans mitigation actions to adapt the managed element when needed
– Adapting the system impacts the system itself
– Executing the adaptation actions right away might not be a good idea
– Example:
▪ Distributed database with 10 replicas
▪ System load drops by half
▪ Shall we change the number of replicas to 5?
Autonomic Manager
▪ Plan
– In the simplest approach, the plan places restrictions on how fast the system
can adapt
– Prevents disruptions in the system caused by the adaptation itself
– For instance, adding and removing replicas to a system is not free:
▪ Time taken
▪ Resource usage of existing replicas due to
– State transfer when adding new replicas
– Increased load on the remaining replicas when removing replicas
▪ Example
– The addition/removal of replicas to a Cassandra cluster should be separated by a few minutes
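The rate-limiting idea can be sketched as a cooldown guard in the plan step. The class name and default period are illustrative assumptions; the slide's Cassandra example suggests a cooldown of a few minutes.

```python
import time

class CooldownPlanner:
    """Sketch of a plan step that rate-limits scaling actions:
    consecutive replica additions/removals must be separated by a
    cooldown period, preventing the adaptation itself from
    disrupting the system."""

    def __init__(self, cooldown_s=300, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock              # injectable for testing
        self.last_action_at = None

    def allow(self):
        now = self.clock()
        if self.last_action_at is not None and now - self.last_action_at < self.cooldown_s:
            return False  # still cooling down: suppress the action
        self.last_action_at = now
        return True
```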
Autonomic Manager
▪ Plan
– Another simple approach is to apply hysteresis to the monitored data in the
analysis phase
– Hysteresis
▪ Analyzed data takes into consideration not only the latest measurements but also some
of the history
▪ Delays taking action immediately upon sudden changes
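One common way to realize this is exponential smoothing, where the analyzed value blends the latest measurement with the history; this is a sketch, and the smoothing factor is an illustrative choice.

```python
class SmoothedMetric:
    """Hysteresis via exponential smoothing: decisions use a weighted
    history of measurements, so a single spike does not immediately
    trigger adaptation."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight of the newest sample
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample
        else:
            # New value = alpha * latest sample + (1 - alpha) * history
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value
```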
Autonomic Manager
▪ Execute
– Executes the adaptation actions of the generated plan, adapting the managed
element
– Leverages the effectors
– Can be done at different levels
▪ Change system internal configuration parameters
▪ Throttle client requests
▪ Instruct external entity to modify system resources
Autonomic Manager
▪ Knowledge
– Abstraction of relevant aspects of:
▪ The managed element
▪ The environment
▪ The administrator’s goals
– Encodes the desired high-level policies and the target Service Level Objectives
and Service Level Agreements
Autonomic Manager
▪ Knowledge
– Service Level Objectives
▪ Specific target for a metric
▪ Example: latency of request X should be below 50ms
– Service Level Agreement
▪ Formal (contractual) commitment made to a customer
▪ Example: if SLO X is violated for more than 1s per day then reimburse customer
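The SLO/SLA distinction above can be made concrete with a small sketch; the function names, the periodic-sampling model, and the budget are illustrative assumptions.

```python
def slo_violation_seconds(latencies_ms, slo_ms, sample_period_s=1.0):
    """Estimate how long an SLO (e.g. latency below 50ms) was violated,
    given latency samples taken at a fixed period."""
    return sum(sample_period_s for l in latencies_ms if l >= slo_ms)

def sla_breached(latencies_ms, slo_ms, budget_s=1.0):
    """Evaluate an SLA clause such as 'reimburse the customer if the
    SLO is violated for more than 1s per day'."""
    return slo_violation_seconds(latencies_ms, slo_ms) > budget_s
```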
[Figure: example of a managed system equipped with probes and effectors; domain-specific instances feed a manager that aggregates information and applies adaptations when needed]
Example
▪ Distributed database
– Capability to add/remove replicas at runtime
– Web clients make stateless requests to server groups (set of replicas)
▪ Adaptation goal
– Response time of each client should stay below predefined maximum
– If servers are overloaded add more servers
– If bandwidth between client and server is low, move client to another
server group
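The two adaptation rules above can be sketched as a simple decision function; the thresholds and action names are illustrative assumptions, not from the original example.

```python
def choose_action(server_load, client_bandwidth_mbps,
                  load_threshold=0.8, bandwidth_floor_mbps=10.0):
    """Rule-based sketch of the adaptation goal: add servers when the
    group is overloaded; move the client to another server group when
    its bandwidth to the current group is too low."""
    if server_load > load_threshold:
        return "add_server"
    if client_bandwidth_mbps < bandwidth_floor_mbps:
        return "move_client_to_other_group"
    return "no_action"
```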
[Figure: the manager maintains a model of the client-server system and executes the actions of the selected adaptation strategy]
Autonomic Manager
▪ Each component will self-manage
– Its internal behavior
– Its relationship with the other components
▪ Managed elements
– Hardware/software resources
▪ Monitor managed elements and external environment
Autonomic Manager
▪ Autonomic elements will function at many levels
– At the lower levels
▪ Limited range of internal behaviors
▪ Hard-coded behaviors
– At the higher levels
▪ Increased dynamism and flexibility
▪ Goal-oriented behaviors
Architectural Principles
▪ Automating tasks lays out the basic building blocks for self-adaptation
▪ Engineering perspective
– Separate mechanism from policies
Architectural Principles
▪ Need for a systematic engineering approach
▪ Separation of concerns (mechanisms vs policies)
▪ Identify the foundations of engineering self-adaptive systems
Architectural Principles
▪ Component Control
– Set of interconnected components
– Accomplishes the functionality of the system
– Reports on the status of the system (OK, Not OK)
– Provides facilities to perform adaptations
– Corresponds to the managed system of MAPE-K
[Figure 1 – Three Layer Architecture Model for Self-Management: Goal Management (G, G', G'') responds to Plan Requests with Change Plans; Change Management (P1, P2) issues Change Actions; Component Control (C1, C2) reports Status upwards]
Architectural Principles
▪ Change Management
– Set of pre-specified plans, activated in response to status changes reported from the layer below
– If a situation is reported for which no plan exists, invokes the services of the Goal Management layer
▪ Goal Management
– Responsible for planning and for introducing new goals into the system
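The interplay between the two upper layers can be sketched as follows. This is a sketch under assumed names; real systems represent plans and goals far more richly.

```python
class ChangeManagement:
    """Middle layer of the three-layer model: pre-specified plans are
    activated on status changes; when no plan matches, the request is
    escalated to Goal Management, which deliberates and produces a new
    plan that is then introduced into this layer."""

    def __init__(self, plans, goal_management):
        self.plans = plans                       # status -> list of actions
        self.goal_management = goal_management   # callable: status -> plan

    def on_status_change(self, status):
        plan = self.plans.get(status)
        if plan is None:
            # No pre-computed plan: deliberate at the layer above
            plan = self.goal_management(status)
            self.plans[status] = plan            # cache the new plan
        return plan
```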
Architectural Principles
▪ Focus on architecture and change management
▪ Focus on higher-level concerns and component interactions
– Rather than micro-management of each component
– Architecture perspective provides:
▪ Generality
▪ Appropriate abstraction
▪ Leverages existing work
Runtime models
▪ Concrete realisation of architectural principles is complex
▪ How is behavior at runtime modeled?
▪ Model-driven approach
– Adaptation mechanisms should leverage software models at runtime to
reason about the system and its goals
– Extend traditional model-driven engineering approaches to encompass
runtime
Runtime models
▪ Runtime model:
– a causally connected self-representation of the system
▪ Structure
▪ Behavior
▪ Goals
Runtime models
▪ Dimensions
– Structural versus behavioral
▪ Organization of the system or parts of it versus
▪ Execution and observable activities of the system
Runtime models
▪ Dimensions
– Functional versus non-functional
▪ Models reflecting functions of the system versus
▪ Non-functional/quality properties of the system
Goal-driven Approach
▪ Focus so far on feedback loops and models
▪ What about requirements?
Goal-driven Approach
▪ Specify the requirements of a system that is exposed to
uncertainties
Control Principles
▪ MAPE-K engineering well understood
▪ Concrete solutions are quite complex
Control Principles
▪ Control theory
– Control of continuous dynamical systems
– Achieve system control without
▪ Delay
▪ Overshoot
[Figure: controlled variable over time, showing a transient state followed by a steady state; the settling time is the time taken to reach the steady state]
Control Principles
[Figure: feedback control loop — the setpoint (the adaptation goal) is compared with the measured output; the error drives a controller acting on the target system, whose output is subject to disturbances (the uncertainties). The controller keeps the output at the setpoint regardless of disturbances; in case of invasive changes, a new system model is required]
[Figure: step response showing overshoot, steady-state error, and a transient state followed by a steady state over time]
Control Principles
▪ Control principles provide analytical guarantees for
– Stability
– Absence of overshoot
– Settling time
– Robustness
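A discrete proportional-integral (PI) controller illustrates how these properties arise; the gains below are illustrative choices, and tuning them trades off overshoot, settling time, and robustness.

```python
class PIController:
    """Discrete PI controller sketch: drives a system output toward a
    setpoint. The proportional term reacts to the current error; the
    integral term accumulates past error and removes steady-state offset."""

    def __init__(self, kp=0.5, ki=0.1, setpoint=0.0):
        self.kp, self.ki = kp, ki
        self.setpoint = setpoint
        self.integral = 0.0

    def control(self, measured):
        error = self.setpoint - measured
        self.integral += error
        # Control signal applied to the target system
        return self.kp * error + self.ki * self.integral
```

In a self-adaptive system the "measured" value could be, e.g., request latency, and the control signal the number of replicas to add or remove.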
Open challenges
▪ Decentralized self-adaptation
▪ Dealing with changing goals
▪ Dealing with unanticipated change
▪ Exploiting AI
HBase
▪ NoSQL key-value store
▪ Multi-dimensional map (HTable) with an unbounded number of
attributes
▪ Row range horizontally partitioned into Regions
HBase
▪ Hierarchical architecture with a Master and RegionServers
▪ Assignment of Regions to RegionServers
▪ Data is stored in the Hadoop file system (HDFS)
HBase
▪ HBase RegionServers (RS) are usually collocated with Hadoop
DataNodes (DNs)
– Hundreds of configuration parameters (factors)
▪ block size; handler count; behavior of the GC; write buffer size; etc
HBase workload
▪ Basic operations
– Put, Get, Scan over (key,value) pairs
– User-defined key
HBase workload
▪ Random assignment of Regions to RegionServers
▪ Red regions -> Hotspots
HBase workload
▪ Manual assignment of Regions to RegionServers
– Done by the Administrator
HBase workload
▪ Not all hotspots have the same nature
– Some regions are read-intensive
– Some regions are write-intensive
– Some regions are mixed
HBase workload
▪ Determine Region access type
▪ Assign them to optimized RegionServers
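Determining a Region's access type can be sketched as a simple ratio test; the threshold and labels are illustrative assumptions, not MeT's actual algorithm.

```python
def classify_region(reads, writes, threshold=0.7):
    """Classify a Region by its access pattern so it can be assigned
    to a read- or write-optimized RegionServer."""
    total = reads + writes
    if total == 0:
        return "mixed"
    read_ratio = reads / total
    if read_ratio >= threshold:
        return "read-intensive"
    if read_ratio <= 1 - threshold:
        return "write-intensive"
    return "mixed"
```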
[Figure: read- and write-intensive Regions assigned to correspondingly optimized RegionServers]
Workload Aware Elasticity for HBase
[Figure: 5th-percentile throughput under Random, Manual Homogeneous, and Manual Heterogeneous Region assignments]
HBase self-adapt
▪ Monitor
▪ Periodically gathers information about
– Resource usage metrics (from the IaaS)
– NoSQL specific metrics (from HBase)
HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
– Region distribution
– Plan to reach the target distribution
HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
▪ Analyses overall resource usage
– If above given threshold add nodes
– If below given threshold remove nodes
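The threshold rule can be sketched as follows; the threshold values and the use of average CPU usage are illustrative assumptions.

```python
def scaling_decision(cpu_usages, add_threshold=0.8, remove_threshold=0.3):
    """Sketch of the Decision Maker's add/remove rule: compare the
    cluster's average resource usage against two thresholds."""
    avg = sum(cpu_usages) / len(cpu_usages)
    if avg > add_threshold:
        return "add_node"
    if avg < remove_threshold:
        return "remove_node"
    return "keep"
```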
HBase self-adapt
▪ Decision Maker
– Region distribution
▪ Three steps
HBase self-adapt
▪ Decision Maker
– Plan to reach target distribution
▪ Compare current assignment with target
assignment
▪ Determine set of actions that minimize Region
movement
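Comparing the current against the target assignment reduces to a set difference; a sketch follows, where the dict-based representation is an assumption, not HBase's API.

```python
def region_moves(current, target):
    """Compute the minimal set of Region moves to reach the target
    assignment: only Regions whose assigned server differs are moved.
    Assignments map region -> server; the result maps each moved
    region to its (source, destination) pair."""
    return {r: (current[r], target[r])
            for r in target
            if current.get(r) != target[r]}
```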
HBase experiment
▪ Does this work?
▪ Same experimental design as before
– 5 RegionServers/DataNodes
– 6 different YCSB workload generators
– Total read/write ratio - 65/35
– Hotspot distribution (50% of requests accessing 40% of the key space)
HBase experiment
▪ Does this work?
▪ With different workloads
– 6 RegionServers/DataNodes
– TPC-C benchmark
▪ Models a wholesale supplier's order-processing (OLTP) workload
– 30 warehouses; 300 clients
– Each replication runs for 45 minutes
HBase experiment
▪ How does it behave with increasing load?
▪ Initial deployment with 6 RegionServers/DataNodes
HBase experiment
▪ Start with more load than the system can handle
– More nodes are progressively added
[Figure: MeT throughput (op/s ×10³, left axis) and number of nodes (right axis) over 60 minutes; nodes are progressively added as the system absorbs the load]
HBase experiment
▪ At minute 33 start shutting down clients until only one is left
[Figure: MeT throughput and number of nodes over time as clients are shut down; the node count adapts accordingly]
Literature and Acknowledgements
▪ Case study: MeT: Workload Aware Elasticity for NoSQL. F. Cruz et al.,
EuroSys 2013