
Large Scale Systems Engineering
Self-Adapting Systems

© Miguel Matos
Roadmap

▪ What is a self-adapting system?


▪ Autonomic Computing principles
▪ How to engineer a self-adapting system


Why do we need adaptation?

▪ Software is becoming too complex


▪ A modern system is composed of dozens to thousands of components
– From monolithic to microservice architectures
▪ Environment is heterogeneous

▪ Hard to anticipate interactions among components


Why do we need adaptation?

• Systems are often subject to various types of uncertainties


• Changing conditions in the environment (resources, services,…)
• User behavior that is difficult to predict
• Goals may change dynamically

• Uncertainties affect qualities (performance, reliability, efficiency, …)


• Uncertainties difficult to anticipate before deployment
• Business continuity (24/7) requires handling uncertainties at runtime


Why do we need adaptation?

• Systems and uncertainties have become too complex for human operators
• Example: A modern database has hundreds of configuration parameters
• Global changes in demand happen quickly and are hard to predict

• Infeasible to have humans manually adapt the system at runtime


Why do we need adaptation?

▪ Key idea

– Let the system gather new knowledge at runtime to resolve uncertainties, reason about itself, its context and goals, and adapt to realize those goals


Overview

[Figure: a software system receives input from, and has an effect on, its environment (non-controllable software, hardware, network, physical context)]


Overview

[Figure: a self-adaptive software system receives input from, and has an effect on, its environment (non-controllable software, hardware, network, physical context); the figure is annotated "includes uncertainties"]

Overview

• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• Example: disk throughput is lower than usual
• The environment
• Example: user requests have increased 10x
• Goals
• Example: request latency should now be below 10ms instead of 20ms


Overview

[Figure: the software system is extended with instrumentation to monitor and adapt it, and with probes into the environment (non-controllable software, hardware, network, physical context)]


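Overview

[Figure: a managing system monitors the environment (through probes) and monitors and adapts the managed system, which is the software system plus its instrumentation; the managed system receives input from, and has an effect on, the environment (non-controllable software, hardware, network, physical context)]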


Overview

• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• The environment
• Goals
• Internal factors
• A self-adaptive system comprises two distinct parts
• The first part interacts with the environment and handles domain concerns
• The second part interacts with the first part and handles adaptation concerns


Overview

• Domain concerns
• Concerned with the goals for which the system is built
• Example: database should serve 99% of requests under 50ms

• Adaptation concerns
• Concerned with how the system realizes its goals under changing conditions
• Example: database has configuration parameters to optimize performance for different workloads

Overview

[Figure: self-adaptive software system. The managing system (adaptation goals) monitors the environment and monitors and adapts the managed system (domain goals, controllable software). The managed system receives input from, and has an effect on, the environment (non-controllable software, hardware, network, physical context).]


Autonomic Computing

– Named after the human autonomic nervous system
– Systems can manage themselves according to an administrator’s goals
– Self-governing operation of the entire system, not just parts of it
– New components integrate as effortlessly as a new cell establishes itself in the body


▪ Ideally systems should be able to run autonomously


▪ Reduced human involvement
▪ Systems should follow high-level policies
– Without the administrator being concerned about the low-level details
– Or how to implement the mechanisms to follow such policies

▪ Autonomic computing
– In the human body, the autonomic nervous system takes care of unconscious reflexes, that is, bodily functions that do not require our attention
– Term first used by IBM to describe computing systems that are self-managing


Autonomic Computing Principles


▪ Self-configuring
▪ Self-healing
▪ Self-optimizing
▪ Self-protecting

▪ Known as self-* (self-star) properties


Autonomic Computing Principles


▪ Self-configuring
– Automatic integration and distribution of resources by the system
– High-level policies (what is desired, not how)

– Increasingly important in cloud environments where resources can be acquired/freed at will
– Key property of elastic systems
▪ Different from scalability


Autonomic Computing Principles


▪ Scalability
– Ability to handle a growing amount of work
– Potential to be enlarged in order to accommodate that growth

▪ Elasticity
– Ability to adapt to workload changes by de/provisioning resources autonomously
– At each instant, available resources match current demand as closely as possible


Autonomic Computing Principles


▪ A system can be scalable but not elastic
– Moving from configuration A to configuration B (e.g., different system sizes) requires manual intervention and possibly downtime

▪ A system can be elastic but not scalable
– System can adjust to demand without downtime, but only within a narrow margin
– For instance, it can adjust autonomously but has a large serial section or crosstalk factor

▪ A system can be scalable and elastic
– Example: small serial section, no crosstalk, and the ability to be adjusted autonomously without service disruptions
– Ideal
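
The terms "serial section" and "crosstalk factor" match the two parameters of the Universal Scalability Law, so the sketch below assumes that this is the model the slide has in mind. It computes the relative capacity of n nodes, C(n) = n / (1 + σ(n − 1) + κ n(n − 1)). The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def usl_capacity(n: int, sigma: float, kappa: float) -> float:
    """Relative capacity of n nodes under the Universal Scalability Law.

    sigma: serial (contention) fraction; kappa: crosstalk (coherency) factor.
    """
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Hypothetical parameter values, chosen only for illustration.
low_overhead = [usl_capacity(n, sigma=0.01, kappa=0.0) for n in (1, 8, 64)]
high_overhead = [usl_capacity(n, sigma=0.30, kappa=0.01) for n in (1, 8, 64)]
```

With the second parameter set, capacity at 64 nodes is lower than at 8 nodes. Such a system could still add and remove nodes autonomously (elastic), yet adding nodes stops helping (not scalable).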


Autonomic Computing Principles


▪ Self-healing
– Detection, diagnosis and reaction to system disruptions
– Faults in components of the system should not lead to failures
– Continuously analyze information from monitors and log files
– Key properties of dependable systems
▪ Availability
▪ Reliability


Autonomic Computing Principles


▪ Self-optimizing
– Automatically measure and tune resources to improve performance and usage
– Systems have hundreds of tunable parameters


Autonomic Computing Principles


▪ Self-protecting
– Anticipate, detect, identify and protect against attacks
– Known and unknown threats


How to make a system self-adapting?

• Existing legacy systems


• Add feedback loops to deal with adaptation concerns

• New systems
• Separate domain concerns from adaptation concerns
• More flexibility to handle future changes


How to engineer a self-adapting system?


▪ Several possible approaches (or steps)
▪ Large continuum between non-adapting systems and fully
autonomic systems (not yet a reality)



▪ Each of these steps represents a set of adapting capabilities
▪ Each is an evolution of the state of the art
▪ It might not be possible to have all the steps at the same time
▪ Each one complements the other(s)

▪ Parallel between current cars and fully autonomous self-driving cars


Parallel between current cars and fully autonomous self-driving cars
Source: https://is.gd/7DI8LD

▪ Level 1: Driver assistance
– At least one Advanced Driver Assistance System (ADAS). Example: adaptive cruise control

▪ Level 2: Partial automation
– At least two ADAS that must coordinate with each other. Example: active lane-keep assist and automatic emergency braking

▪ Level 3: Conditional automation
– Capable of taking full control during select parts of a journey under certain operating conditions

▪ Level 4: High automation
– Capable of completing an entire journey, but with constraints such as speed limits or geographic areas

▪ Level 5: Full automation
– Driverless operation under all circumstances, with no provisions for human control: no steering wheel, no pedals

How to engineer a self-adapting system?


▪ Most important for the course
– Automating Tasks
▪ Basis of the other approaches
▪ Still widely used today
– Control principles
▪ Most advanced
▪ Can leverage Machine Learning models for fine-grained adaptation


Automating Tasks

▪ Goal
▪ System manages itself based on high-level objectives
▪ Basis of the Autonomic Computing vision
▪ Prevent complexity and errors of human-based system management

– Achieved through an Autonomic Manager


Autonomic Manager
▪ MAPE-K reference model
– Monitor
– Analyze
– Plan
– Execute
– Knowledge

[Figure: autonomic manager (Monitor, Analyze, Plan and Execute around a shared Knowledge) on top of the managed element]
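
One iteration of the MAPE-K loop can be sketched as follows. This is a toy illustration, not a reference implementation: the `ManagedElement` class, its response-time formula, and all numeric values are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    target_resp_time_ms: float            # administrator goal
    replicas: int = 1                     # model of the managed element
    history: list = field(default_factory=list)


class ManagedElement:
    """Toy managed element: response time shrinks as replicas are added."""

    def __init__(self, base_ms: float):
        self.base_ms = base_ms
        self.replicas = 1

    def sense_resp_time(self) -> float:   # sensor
        return self.base_ms / self.replicas

    def add_replica(self) -> None:        # effector
        self.replicas += 1


def mape_k_iteration(element: ManagedElement, k: Knowledge) -> None:
    # Monitor: read the sensors and update the Knowledge.
    resp = element.sense_resp_time()
    k.history.append(resp)
    k.replicas = element.replicas
    # Analyze: is an adaptation required?
    violated = resp > k.target_resp_time_ms
    # Plan: choose the actions (here at most one step per iteration).
    plan = ["add_replica"] if violated else []
    # Execute: apply the plan through the effectors.
    for action in plan:
        getattr(element, action)()


element = ManagedElement(base_ms=120.0)
knowledge = Knowledge(target_resp_time_ms=50.0)
for _ in range(5):
    mape_k_iteration(element, knowledge)
```

In this run the loop adds replicas until the sensed response time drops below the 50 ms target, then stops adapting.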



Autonomic Manager
▪ Autonomic manager: relieves administrators of the responsibility to directly manage the managed element
▪ Managed element: element to be monitored and controlled to realize the administrator’s goals

[Figure 2. Structure of an autonomic element: an autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) on top of a managed element. Elements interact with other elements and with human programmers via their autonomic managers.]


Autonomic Manager

[Figure: the autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) is connected to the system through the system's sensors and effectors, and exposes sensors and effectors of its own]

Autonomic Manager
▪ Sensors
– Collect and measure metrics about
▪ The system
▪ The environment
▪ The user of the system


Autonomic Manager
▪ Effectors (or actuators)
– From biology: an organ or cell that acts in response to a stimulus
– Acts upon the system changing one or more of its parameters
▪ Example:
– Modify a system configuration
– Add more replicas
– Ideally, action should not require interruption of the provided service


Autonomic Manager
▪ Monitor
– Collects data from the managed element and its execution context to update the Knowledge
– Relies on the sensors for the different aspects of the system, environment and user


Autonomic Manager
▪ Analyze
– Determines whether adaptation actions are required
– Decision is based on the monitor data and administrator goals

respTime = ...      # latest response time reported by the Monitor
targetTime = ...    # target taken from the administrator goals

if respTime > targetTime:
    addReplica(...)

Autonomic Manager
▪ Plan
– Plans mitigation actions to adapt the managed element when needed
– Adapting the system impacts the system itself
– Executing the adaptation actions right away might not be a good idea
– Example:
▪ Distributed database with 10 replicas
▪ System load drops by half
▪ Shall we change the number of replicas to 5?


Autonomic Manager
▪ Plan
– In the simplest approach, the plan restricts how fast the system can adapt
– Prevents disruptions in the system caused by the adaptation itself
– For instance, adding and removing replicas to a system is not free:
▪ Time taken
▪ Resource usage of existing replicas due to
– State transfer when adding new replicas
– Increased load when removing replicas
▪ Example
– The addition/removal of replicas to a Cassandra cluster should be separated by a few minutes
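
A rate-limited planner of this kind can be sketched as below. The class name, the 180-second cooldown and the one-replica step are assumptions picked for illustration; they are not values recommended by the slides or by Cassandra.

```python
class CooldownPlanner:
    """Limits how fast the replica count may change."""

    def __init__(self, cooldown_s: float, max_step: int = 1):
        self.cooldown_s = cooldown_s      # minimum time between two changes
        self.max_step = max_step          # max replicas added/removed at once
        self.last_change_s = None         # time of the last applied change

    def plan(self, now_s: float, current: int, desired: int) -> int:
        """Return the replica count to apply now (possibly unchanged)."""
        if desired == current:
            return current
        in_cooldown = (self.last_change_s is not None
                       and now_s - self.last_change_s < self.cooldown_s)
        if in_cooldown:
            return current                # still settling: do nothing yet
        step = max(-self.max_step, min(self.max_step, desired - current))
        self.last_change_s = now_s
        return current + step


# Load halves: the analysis suggests going from 10 replicas to 5.
planner = CooldownPlanner(cooldown_s=180.0)
replicas, trace = 10, []
for now in range(0, 901, 60):             # one decision per minute
    replicas = planner.plan(float(now), replicas, desired=5)
    trace.append(replicas)
```

Instead of dropping from 10 to 5 replicas at once, the planner removes one replica every three minutes, giving the remaining replicas time to absorb the extra load.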


Autonomic Manager
▪ Plan
– Another simple approach, applied in the analysis phase, is to perform hysteresis on the monitored data
– Hysteresis
▪ Analyzed data takes into consideration not only the latest measurements but also some of the history
▪ Avoids taking action immediately upon sudden changes
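
One simple way to take history into account is to analyze a windowed mean instead of the latest sample. The sketch below assumes that interpretation; the class name, window size, target and readings are hypothetical.

```python
from collections import deque


class SmoothedAnalyzer:
    """Flags a violation only when the windowed mean exceeds the target."""

    def __init__(self, target: float, window: int):
        self.target = target
        self.samples = deque(maxlen=window)   # keeps only the recent history

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        mean = sum(self.samples) / len(self.samples)
        return mean > self.target


analyzer = SmoothedAnalyzer(target=50.0, window=4)
# A single spike (70 ms) followed by a sustained increase (80 ms).
readings = [40, 40, 40, 70, 40, 40, 40, 40, 80, 80, 80, 80]
decisions = [analyzer.observe(r) for r in readings]
```

The isolated 70 ms spike does not trigger an adaptation, while the sustained increase does once enough of the window reflects it. The price is a delayed reaction to genuine changes.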


Autonomic Manager
▪ Execute
– Executes the adaptation actions of the generated plan, adapting the managed element
– Leverages the effectors
– Can be done at different levels
▪ Change system internal configuration parameters
▪ Throttle client requests
▪ Instruct external entity to modify system resources


Autonomic Manager
▪ Knowledge
– Abstraction of relevant aspects of:
▪ The managed element
▪ The environment
▪ The administrator’s goals

– Encodes the desired high-level policies and the target Service Level Objectives (SLOs) and Service Level Agreements (SLAs)


Autonomic Manager
▪ Knowledge
– Service Level Objectives
▪ Specific target for a metric
▪ Example: latency of request X should be below 50ms
– Service Level Agreement
▪ Formal (contractual) commitment made to a customer
▪ Example: if SLO X is violated for more than 1s per day then reimburse customer
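
The two examples above can be checked mechanically. The sketch below assumes one latency sample per second; the function names and the sample data are hypothetical.

```python
def slo_violation_seconds(latencies_ms, slo_ms: float) -> int:
    """Count the seconds in which the latency was not below the SLO target."""
    return sum(1 for latency in latencies_ms if latency >= slo_ms)


def sla_breached(latencies_ms, slo_ms: float, budget_s: int) -> bool:
    """SLA: the SLO may be violated for at most budget_s seconds per day."""
    return slo_violation_seconds(latencies_ms, slo_ms) > budget_s


# Hypothetical day of per-second latency samples (86,400 values).
day = [20.0] * 86_400
day[1000] = 75.0
one_bad_second = sla_breached(day, slo_ms=50.0, budget_s=1)    # within budget
day[2000] = 90.0
two_bad_seconds = sla_breached(day, slo_ms=50.0, budget_s=1)   # over budget
```

The SLO is a property of a metric, while the SLA adds the contractual consequence: only the second case would require reimbursing the customer.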


Autonomic Manager concrete implementation and example


▪ Rainbow framework
– Reusable infrastructure to support self-adaptation of software systems
– Behavioral constraints establishing an envelope of allowed changes
– External control mechanisms



The Rainbow framework
[Figure: Rainbow architecture, annotated with: reusable core elements; domain-specific instances; a component that aggregates information and applies adaptations when needed; an example of a managed system equipped with probes and effectors]



The Rainbow framework

Example
▪ Distributed database
– Capability to add/remove replicas at runtime
– Web clients make stateless requests to server groups (set of replicas)

▪ Adaptation goal
– Response time of each client should stay below predefined maximum
– If servers are overloaded add more servers
– If bandwidth between client and server is low, move client to another server group
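
The adaptation goal above can be sketched as a decision function. This is plain Python, not Rainbow's own strategy language or API; the data classes, thresholds and sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupModel:
    load: float                 # utilisation of the server group, 0..1
    servers: int


@dataclass
class ClientModel:
    resp_time_ms: float
    bandwidth_mbps: float
    group: str


def select_adaptation(client, groups, max_resp_ms, max_load, min_bw_mbps):
    """Return an (action, argument) tuple, or None if no adaptation is needed."""
    if client.resp_time_ms <= max_resp_ms:
        return None                                   # goal satisfied
    if groups[client.group].load > max_load:
        return ("add_server", client.group)           # servers overloaded
    if client.bandwidth_mbps < min_bw_mbps:
        # Move the client to the least loaded other group.
        others = {name: g for name, g in groups.items() if name != client.group}
        if others:
            target = min(others, key=lambda name: others[name].load)
            return ("move_client", target)
    return None


groups = {"g1": GroupModel(load=0.95, servers=3),
          "g2": GroupModel(load=0.40, servers=3)}
slow_client = ClientModel(resp_time_ms=900.0, bandwidth_mbps=50.0, group="g1")
far_client = ClientModel(resp_time_ms=900.0, bandwidth_mbps=0.5, group="g2")
```

For example, `select_adaptation(slow_client, groups, 500.0, 0.8, 1.0)` selects adding a server to the overloaded group, while the same call for `far_client` selects moving the client.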



The Rainbow framework
[Figure: Rainbow applied to the example, annotated with:]
▪ Checks periodically whether the maximum response time is violated; if so, triggers the adaptation engine
▪ Applies the adaptation strategy:
(i) load of group too high -> add server;
(ii) available bandwidth of client too low -> move client to other group
▪ Maintains a model of the client-server system
▪ Executes the actions of the selected adaptation strategy


Autonomic Manager
▪ Each component will self-manage
– Internal behavior
▪ Managed elements
– Hardware/software resources
▪ Monitor managed elements and external environment
– Relationship with the other components

▪ Relationships can be arbitrarily complex


Autonomic Manager
▪ Autonomic elements will function at many levels
– At the lower levels
▪ Limited range of internal behaviors
▪ Hard-coded behaviors
– At the higher levels
▪ Increased dynamism and flexibility
▪ Goal-oriented behaviors

▪ Fully autonomic computing
– Evolves as designers gradually add increasingly sophisticated autonomic managers to existing managed elements
▪ Hard-wired relationships will evolve into flexible relationships that are established via negotiation
– E.g.: static number of replicas vs dynamic number of replicas accessed through a load balancer


Architectural Principles
▪ Automating tasks lays out the basic building blocks for self-adaptation

▪ But concepts are still mixed


– Dealing with the changes
– Dealing with the adaptation objectives
– Mechanisms vs Policies

▪ MAPE-K functions are intuitive but concrete implementation is challenging

▪ Engineering perspective
– Separate mechanism from policies


Architectural Principles
▪ Need for a systematic engineering approach
▪ Separation of concerns (mechanisms vs policies)
▪ Identify the foundations of engineering self-adaptive systems

▪ Focus on the architecture of a self-adapting system

Architectural Principles

[Figure 1 – Three Layer Architecture Model for Self-Management. Goal Management (goals G, G’, G”) sends Change Plans down to Change Management (plans P1, P2) and receives Plan Requests from it; Change Management sends Change Actions down to Component Control (components C1, C2) and receives Status from it.]

Architectural Principles
▪ Component Control
– Accomplishes the system functionality
– Set of interconnected components
– Reports on the status of the system (OK, Not OK)
– Facilities to perform adaptations
– Corresponds to the managed system of MAPE-K

(Bottom layer of Figure 1 – Three Layer Architecture Model for Self-Management)

Architectural Principles
▪ Change Management
– Set of pre-specified plans
– Reacts to status changes
– Executes actions based on
▪ Status changes
▪ Goal changes
– If a condition is reported that cannot be handled by the available plans, invokes the top layer

(Middle layer of Figure 1 – Three Layer Architecture Model for Self-Management)

Architectural Principles
▪ Goal Management
– Responsible for planning and introducing new goals
– Takes the state and a high-level goal to produce a plan to achieve the goal

(Top layer of Figure 1 – Three Layer Architecture Model for Self-Management)

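
The interaction between the three layers can be sketched as follows. The class names mirror the layers, but the plan format, condition names and the placeholder planner are assumptions made up for this example.

```python
class GoalManagement:
    """Top layer: time-consuming planning from state and high-level goal."""

    def plan_for(self, condition: str, goal: str) -> list:
        # Placeholder planner: a real one would search for a suitable plan.
        return [f"reconfigure-for:{goal}", f"recover-from:{condition}"]


class ChangeManagement:
    """Middle layer: pre-specified plans, reacts to status changes."""

    def __init__(self, goal_layer: GoalManagement, goal: str):
        self.goal_layer = goal_layer
        self.goal = goal
        self.plans = {"replica-down": ["start-replica"]}   # pre-computed plans

    def on_status(self, condition: str) -> list:
        if condition not in self.plans:
            # No plan available: request one from the layer above and keep it.
            self.plans[condition] = self.goal_layer.plan_for(condition, self.goal)
        return self.plans[condition]


class ComponentControl:
    """Bottom layer: runs the components and applies change actions."""

    def __init__(self):
        self.applied = []

    def apply(self, actions: list) -> None:
        self.applied.extend(actions)


control = ComponentControl()
change = ChangeManagement(GoalManagement(), goal="latency<50ms")
control.apply(change.on_status("replica-down"))     # handled by an existing plan
control.apply(change.on_status("disk-slow"))        # escalated to the goal layer
```

Known conditions are handled quickly by the middle layer alone; only unknown ones pay the cost of planning in the top layer, and the resulting plan is then available for reuse.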

Architectural Principles
▪ Focus on architecture and change management
▪ Focus on higher-level concerns and component interactions
– Rather than micro-management of each component
– Architecture perspective provides:
▪ Generality
▪ Appropriate abstraction
▪ Leverages existing work


Runtime models
▪ Concrete realisation of architectural principles is complex
▪ How is behavior at runtime modeled?

▪ Model-driven approach
– Adaptation mechanisms should leverage software models at runtime to reason about the system and its goals
– Extend traditional model-driven engineering approaches to encompass runtime


Runtime models
▪ Runtime model:
– a causally connected self-representation of the system
▪ Structure
▪ Behavior
▪ Goals


Runtime models
▪ Dimensions
– Structural versus behavioral
▪ Organization of the system or parts of it versus
▪ Execution and observable activities of the system

– Procedural versus declarative


▪ How the system is organized/executes versus
▪ What is the purpose of adaptation


Runtime models
▪ Dimensions
– Functional versus non-functional
▪ Models reflecting functions of the system versus
▪ Non-functional/quality properties of the system

– Formal versus non-formal


▪ Models specified with a mathematical language versus
▪ Informal models reflect the system or parts of it


Goal-driven Approach
▪ Focus so far on feedback loops and models
▪ What about requirements?

▪ Languages to specify requirements for self-adaptive systems


▪ Requirements of self-adaptive systems as first-class citizens
▪ Emphasis on how requirements drive the design of the managing system


Goal-driven Approach
▪ Specify the requirements of a system that is exposed to uncertainties

▪ Define constraints on how requirements may be relaxed at runtime
– Enabler to handle uncertainties

▪ Languages (e.g., RELAX) to specify flexible requirements

▪ A goal-driven model allows identifying different alternatives to satisfy the overall system objectives


Guarantees under uncertainty


▪ Goals and runtime properly modeled
▪ What about uncertainties?

▪ Role of uncertainty in self-adaptive systems and how to handle it
▪ Formal runtime techniques to guarantee adaptation goals under uncertainty


Guarantees under uncertainty


▪ Uncertainties as first-class concerns of self-adaptive systems
– Lack of complete knowledge of the system
– Lack of complete knowledge about deployment conditions
– Lack of complete knowledge about running environment

▪ How to resolve these uncertainties at runtime
– Codify high-level requirements into probabilistic temporal logic formulae
– Formulae are used at runtime to validate the system and enforce optimal configurations


Control Principles
▪ MAPE-K engineering well understood
▪ Concrete solutions are quite complex

▪ Apply principles from control theory to realize self-adaptation


Control Principles
▪ Control theory
– Control of continuous dynamical systems
– Achieve system control without
▪ Delay
▪ Overshoot

▪ Stability in the steady-state

[Figure: controlled variable over time, showing the setpoint, overshoot, steady-state error, settling time, and the transient and steady states]
Control Principles
▪ Controller keeps the output at the setpoint, regardless of disturbances
▪ The setpoint corresponds to the adaptation goal; disturbances correspond to uncertainties
▪ System model is learned at system startup through online experiments
▪ In case of invasive changes, a new learning phase is used
▪ System provides guarantees “by design”

[Figure: feedback control loop. The controlling system (manager), with model building and control, compares the output of the target system (managed system) with the setpoint and acts on the target system, which is subject to disturbances. An inset repeats the controlled-variable plot with setpoint, overshoot, steady-state error, settling time, and transient and steady states.]


Control Principles
▪ Control principles provide analytical guarantees for
– Stability
– Absence of overshoot
– Settling time
– Robustness

▪ Linear models work well for a variety of self-adaptive systems
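
As a small illustration of these notions, the sketch below applies an integral controller to a first-order linear model y[k+1] = a·y[k] + b·u[k]. The coefficients and the gain are hypothetical and fixed by hand, whereas a real controller would identify the model through experiments.

```python
def simulate(setpoint: float, steps: int, a: float = 0.5, b: float = 0.5,
             ki: float = 0.5) -> list:
    """Integral control of the linear model y[k+1] = a*y[k] + b*u[k]."""
    y, u, outputs = 0.0, 0.0, []
    for _ in range(steps):
        error = setpoint - y          # compare the output with the setpoint
        u = u + ki * error            # integral action removes steady-state error
        y = a * y + b * u             # managed system reacts to the control input
        outputs.append(y)
    return outputs


outputs = simulate(setpoint=10.0, steps=200)
overshoot = max(outputs) - 10.0       # how far the output exceeded the setpoint
steady_state_error = abs(10.0 - outputs[-1])
```

With these values the loop is stable and the steady-state error vanishes, but the output temporarily exceeds the setpoint. Tuning the gain trades overshoot against settling time, which is exactly the kind of property control theory lets one analyze ahead of deployment.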


Open challenges
▪ Decentralized self-adaptation
▪ Dealing with changing goals
▪ Dealing with unanticipated change

▪ Exploiting AI



Case Study
Workload Aware Elasticity for HBase
Cruz et al., EuroSys’13


HBase
▪ NoSQL key-value store
▪ Multi-dimensional map (HTable) with an unbounded number of attributes
▪ Row range horizontally partitioned into Regions


HBase
▪ Hierarchical architecture with a Master and RegionServers
▪ Assignment of Regions to RegionServers
▪ Data is stored in the Hadoop file system


HBase
▪ HBase RegionServers (RS) are usually collocated with Hadoop DataNodes (DNs)
– Hundreds of configuration parameters (factors)
▪ block size; handler count; behavior of the GC; write buffer size; etc.

▪ 2 factors significantly affect performance


▪ block cache size – cache dedicated to read requests
▪ memstore size – buffer dedicated to accommodate write requests
– Both use memory
▪ Increasing one implies decreasing the other
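
The trade-off can be sketched as a split of a fixed memory budget. The budget of 0.8 of the heap, the per-side floor and the function name are assumptions chosen for illustration, not values from the slides or from HBase.

```python
# Hypothetical budget: fraction of the RegionServer heap shared by the
# block cache (reads) and the memstore (writes). 0.8 is an assumed value.
HEAP_BUDGET = 0.8


def split_heap(read_fraction, budget=HEAP_BUDGET, floor=0.1):
    """Split the budget between block cache and memstore by read share.

    Each side keeps at least `floor` of the heap so neither path starves.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be in [0, 1]")
    block_cache = floor + (budget - 2 * floor) * read_fraction
    memstore = budget - block_cache
    return round(block_cache, 3), round(memstore, 3)
```

A read-only workload gets the largest block cache the budget allows, a write-only workload gets the largest memstore, and the two fractions always sum to the budget.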

Large Scale Systems Engineering 82


Workload Aware Elasticity for HBase

HBase workload
▪ Basic operations
– Put, Get, Scan over (key,value) pairs
– User-defined key

▪ Different applications -> different data access patterns


– Access hotspots

▪ Data co-location is not a requirement


▪ no clear relationship between different entities
▪ data de-normalized
▪ no support of atomic multi-item operations

Large Scale Systems Engineering 83


Workload Aware Elasticity for HBase

HBase workload
▪ Random assignment of Regions to RegionServers
▪ Red regions -> Hotspots

Large Scale Systems Engineering 84


Workload Aware Elasticity for HBase

HBase workload
▪ Manual assignment of Regions to RegionServers
– Done by the Administrator

▪ Distribute hotspot Regions across different RegionServers

Large Scale Systems Engineering 85


Workload Aware Elasticity for HBase

HBase workload
▪ Not all hotspots have the same nature
– Some regions are read-intensive
– Some regions are write-intensive
– Some regions are mixed

Large Scale Systems Engineering 86


Workload Aware Elasticity for HBase

HBase workload
▪ Determine Region access type
▪ Assign them to RegionServers optimized for that access type

[Figure: Regions labeled Read or Write according to their access type]

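Determining the access type can be sketched as a threshold on the share of reads per Region. The 0.7 cut-off, the extra "idle" label and the function names are assumptions of this sketch; the slides do not state how the classification is done.

```python
def classify_region(reads, writes, threshold=0.7):
    """Label a Region by the share of reads among its requests.

    `threshold` is an assumed cut-off, not a value from the slides.
    """
    total = reads + writes
    if total == 0:
        return "idle"
    read_share = reads / total
    if read_share >= threshold:
        return "read"
    if read_share <= 1.0 - threshold:
        return "write"
    return "mixed"


def group_by_type(region_counts):
    """Group Region names by access type: {type: [region, ...]}."""
    groups = {}
    for name, (reads, writes) in sorted(region_counts.items()):
        groups.setdefault(classify_region(reads, writes), []).append(name)
    return groups
```

Each group can then be placed on RegionServers configured for that access type.
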
Large Scale Systems Engineering 87
Workload Aware Elasticity for HBase

HBase heterogeneity experiment


▪ How well does this work?
▪ Experiment
– 5 RegionServers/DataNodes
– 6 different YCSB workload generators
– Total read/write ratio - 65/35
– Hotspot distribution (50% of requests accessing 40% of the key space)

Large Scale Systems Engineering 88


Workload Aware Elasticity for HBase

HBase heterogeneity experiment


[Figure: throughput (Op/s × 10^3), shown as 5th, 25th, 50th, 75th and 90th percentiles, for three configurations: Random Homogeneous, Manual Homogeneous, Manual Heterogeneous]

Large Scale Systems Engineering 89
Workload Aware Elasticity for HBase

HBase heterogeneity experiment


▪ Exploiting heterogeneity improves performance
▪ How to self-adapt?

Large Scale Systems Engineering 90


Workload Aware Elasticity for HBase

HBase self-adapt
▪ Monitor
▪ Periodically gathers information about
– Resource usage metrics (from the IaaS)
– NoSQL specific metrics (from HBase)

▪ Exponential smoothing of measurements
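
A minimal sketch of exponential smoothing applied to a series of measurements. The smoothing factor `alpha` and the choice to seed the series with the first sample are assumptions of this example.

```python
def exponential_smoothing(samples, alpha=0.5):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in samples:
        previous = smoothed[-1] if smoothed else x  # seed with first sample
        smoothed.append(alpha * x + (1.0 - alpha) * previous)
    return smoothed
```

Smoothing damps short spikes, so a single noisy sample is less likely to trigger an adaptation: with `alpha=0.5`, the series `[50, 50, 100, 50]` becomes `[50.0, 50.0, 75.0, 62.5]`.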

Large Scale Systems Engineering 91


Workload Aware Elasticity for HBase

HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
– Region distribution
– Plan to reach the target distribution

Large Scale Systems Engineering 93


Workload Aware Elasticity for HBase

HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
▪ Analyses overall resource usage
– If above a given threshold, add nodes
– If below a given threshold, remove nodes
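
The threshold rule can be sketched as follows. The threshold values, the minimum cluster size and the function name are assumed for illustration; the slides do not give concrete numbers.

```python
def scaling_decision(usage, nodes, high=0.8, low=0.3, min_nodes=1):
    """Return the change in node count: +1 to add, -1 to remove, 0 to hold.

    `usage` is the (smoothed) overall resource utilization in [0, 1]; the
    thresholds are assumed values chosen for illustration.
    """
    if usage > high:
        return 1
    if usage < low and nodes > min_nodes:
        return -1
    return 0
```

Keeping a gap between the two thresholds avoids adding and removing nodes in quick alternation when usage hovers around a single cut-off.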

Large Scale Systems Engineering 94


Workload Aware Elasticity for HBase

HBase self-adapt
▪ Decision Maker
– Region distribution
▪ Three steps

Large Scale Systems Engineering 95


Workload Aware Elasticity for HBase

HBase self-adapt
▪ Decision Maker
– Plan to reach target distribution
▪ Compare current assignment with target assignment
▪ Determine the set of actions that minimizes Region movement
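
One way to sketch this planning step: treat the target distribution as groups of Regions that are not yet bound to specific servers, then pick the binding that leaves the most Regions where they already are. This brute-force search is an assumption made for illustration, not the algorithm from the paper, and it is only sensible for a handful of servers.

```python
from itertools import permutations


def plan_moves(current, target_groups):
    """Map target groups onto servers so that the fewest Regions move.

    current: {server: set of regions hosted now}
    target_groups: list of region sets, one per server, not yet bound
                   to a specific server.
    Returns a list of (region, destination_server) moves.
    """
    servers = sorted(current)
    if len(target_groups) != len(servers):
        raise ValueError("need exactly one target group per server")
    best_moves = None
    for order in permutations(range(len(target_groups))):
        # Regions a server must receive under this binding of groups.
        moves = [
            (region, server)
            for server, g in zip(servers, order)
            for region in sorted(target_groups[g] - current[server])
        ]
        if best_moves is None or len(moves) < len(best_moves):
            best_moves = moves
    return best_moves
```

If the target groups match the current placement up to a relabeling of servers, the plan is empty and no Region moves.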

Large Scale Systems Engineering 96


Workload Aware Elasticity for HBase

HBase experiment
▪ Does this work?
▪ Same experimental design as before
– 5 RegionServers/DataNodes
– 6 different YCSB workload generators
– Total read/write ratio - 65/35
– Hotspot distribution (50% of requests accessing 40% of the key space)

▪ Start with random homogeneous configuration

Large Scale Systems Engineering 97


Workload Aware Elasticity for HBase

HBase experiment

Large Scale Systems Engineering 98


Workload Aware Elasticity for HBase

HBase experiment
▪ Does this work?
▪ With different workloads
– 6 RegionServers/DataNodes
– TPC-C benchmark
▪ Models the order-entry workload of a wholesale supplier
– 30 warehouses; 300 clients
– 45-minute replications

Setting                              Throughput (tpmC)
Manual and Homogeneous (baseline)    25380
MeT                                  33720

Large Scale Systems Engineering 99
Workload Aware Elasticity for HBase

HBase experiment
▪ How does it behave with increasing load?
▪ Initial deployment with 6 RegionServers/DataNodes

Large Scale Systems Engineering 100


Workload Aware Elasticity for HBase

HBase experiment
▪ Start with more load than the system can handle
– More nodes are progressively added

[Figure: MeT throughput (Op/s × 10^3, left axis, 0 to 25) and number of nodes (right axis, 5 to 12) over time (0 to 60 min)]

Large Scale Systems Engineering 101

Workload Aware Elasticity for HBase

HBase experiment
▪ At minute 33, start shutting down clients until only one is left

[Figure: MeT throughput (Op/s × 10^3, left axis, 0 to 25) and number of nodes (right axis, 5 to 12) over time (0 to 60 min)]

Large Scale Systems Engineering 102

Workload Aware Elasticity for HBase

HBase experiment
▪ At minute 33, start shutting down clients until only one is left
▪ Impact of adaptation is non-negligible. Without a proper plan, the system might never stabilize

[Figure: MeT throughput (Op/s × 10^3, left axis, 0 to 25) and number of nodes (right axis, 5 to 12) over time (0 to 60 min)]

Large Scale Systems Engineering 103

Literature and Acknowledgements

▪ The Vision of Autonomic Computing. J. Kephart and D. Chess. IBM

▪ Software Engineering of Self-Adaptive Systems: An Organised Tour and Future Challenges. D. Weyns. ICAC 2018

▪ Case Study: MeT: Workload Aware Elasticity for NoSQL. F. Cruz et al. EuroSys 2013

▪ Slides partially based on Engineering Self-Adaptive Systems - An Organized Tour. D. Weyns. ICAC 2018

Large Scale Systems Engineering 104
