ELSE 2223 07 Self Adapting
Engineering
Self-Adapting Systems
© Miguel Matos
Roadmap
▪ Key idea
Overview
[Figure: a software system receives input from, and has an effect on, its environment: non-controllable software, hardware, network, physical context]
Overview
[Figure: a self-adaptive software system interacting with its environment: non-controllable software, hardware, network, physical context]
Large Scale Systems Engineering 8
Self-Adapting Systems
Overview
• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• Example: disk throughput is lower than usual
• The environment
• Example: user requests have increased 10x
• Goals
• Example: the request-latency goal changes from 20ms to 10ms
Overview
[Figure: the software system exchanges input and effect with the environment (non-controllable software, hardware, network, physical context); probes are added to monitor the system and its environment, closing a monitor/adapt feedback loop around the system]
Overview
• External factors
• A self-adaptive system autonomously handles uncertainty in:
• Its own status
• The environment
• Goals
• Internal factors
• A self-adaptive system comprises two distinct parts
• First interacts with the environment and has domain concerns
• Second interacts with the first part and has adaptation concerns
Overview
• Domain concerns
• Concerned with the goals for which the system is built
• Example: database should serve 99% of requests under
50ms
• Adaptation concerns
• Concerned about how system realizes its goals under changing conditions
• Example: database has configuration parameters to optimize performance
for different workloads
[Figure: the managed system (controllable software, guided by domain goals) exchanges input and effect with the environment (non-controllable software, hardware, network, physical context); a managing part monitors and adapts the managed system]
Autonomic Computing
▪ Autonomic computing
– in the human body, the autonomic nervous system takes care of
unconscious reflexes, that is, bodily functions that do not require our
attention
– Term first used by IBM to describe computing systems that are self-managing
▪ Elasticity
– Ability to adapt to workload changes by de/provisioning resources autonomously
– At each instant, available resources match current demand as closely as possible
• New systems
• Separate domain concerns from adaptation concerns
• More flexibility to handle future changes
Parallel between current cars and fully autonomous self-driving cars Source: https://is.gd/7DI8LD
Automating Tasks
▪ Goal
▪ System manages itself based on high-level objectives
▪ Basis of the Autonomic Computing vision
▪ Prevent complexity and errors of human-based system management
Autonomic Manager
▪ MAPE-K reference model
– Monitor
– Analyze
– Plan
– Execute
– Knowledge
[Figure: the autonomic manager runs a Monitor → Analyze → Plan → Execute loop over shared Knowledge, observing the managed element through Sensors and acting on it through Effectors]
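The MAPE-K loop can be sketched as a minimal control loop. This is a sketch under assumed names: the class, the latency metric, and the `add_replica` action are illustrative, not part of any real framework.

```python
class AutonomicManager:
    """Minimal MAPE-K sketch: Monitor, Analyze, Plan, Execute over shared Knowledge."""

    def __init__(self, sensor, effector, target_latency_ms=50.0):
        self.sensor = sensor        # callable returning current metrics (the Sensors)
        self.effector = effector    # callable applying one action (the Effectors)
        self.knowledge = {"target_latency_ms": target_latency_ms, "history": []}

    def monitor(self):
        # Collect data from the managed element and record it in the Knowledge
        metrics = self.sensor()
        self.knowledge["history"].append(metrics)
        return metrics

    def analyze(self, metrics):
        # Adaptation is needed when the latency goal is violated
        return metrics["latency_ms"] > self.knowledge["target_latency_ms"]

    def plan(self, metrics):
        # Trivial plan: a single mitigation action per loop iteration
        return [("add_replica", 1)]

    def execute(self, actions):
        # Apply the plan through the effectors
        for action in actions:
            self.effector(action)

    def step(self):
        metrics = self.monitor()
        if self.analyze(metrics):
            self.execute(self.plan(metrics))
```

In a real deployment `step()` would run periodically; the sensor and effector would wrap the managed system's probes and management API.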
Autonomic Manager
▪ Sensors
– Collect and measure metrics about
▪ The system
▪ The environment
▪ The user of the system
Autonomic Manager
▪ Effectors (or actuators)
– From biology: an organ or cell that acts in response to a stimulus
– Acts upon the system changing one or more of its parameters
▪ Example:
– Modify a system configuration
– Add more replicas
– Ideally, action should not require interruption of the provided service
Autonomic Manager
▪ Monitor
– Collects data from managed element and its execution context to update the
Knowledge
– Relies on the sensors for the different aspects of the system, environment and
user
Autonomic Manager
▪ Analyze
– Determines whether adaptation actions are required
– Decision is based on the monitor data and administrator goals
respTime = ...
targetTime = ...
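The comparison sketched in the fragment above might look like the following; the variable names follow the slide, while the tolerance margin is an illustrative assumption.

```python
def needs_adaptation(resp_time, target_time, tolerance=0.1):
    """Analyze step: flag adaptation when the measured response time
    exceeds the target by more than a tolerance margin, so that small
    fluctuations around the target do not trigger adaptation."""
    return resp_time > target_time * (1 + tolerance)
```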
Autonomic Manager
▪ Plan
– Plans mitigation actions to adapt the managed element when needed
– Adapting the system impacts the system itself
– Executing the adaptation actions right away might not be a good idea
– Example:
▪ Distributed database with 10 replicas
▪ System load drops by half
▪ Shall we change the number of replicas to 5?
Autonomic Manager
▪ Plan
– In the simplest approach, the plan places restrictions on how fast the system
can adapt
– Prevents disruptions in the system caused by the adaptation itself
– For instance, adding and removing replicas to a system is not free:
▪ Time taken
▪ Resource usage of existing replicas due to
– State transfer when adding new replicas
– Increased load on the remaining replicas when removing replicas
▪ Example
– The addition/removal of replicas to a Cassandra cluster should be separated by a few minutes
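The rate-limiting idea can be sketched as a cooldown guard in the plan step. The class name and default period are illustrative assumptions; the slide's Cassandra example suggests a cooldown of a few minutes.

```python
import time

class CooldownPlanner:
    """Sketch of a plan step that rate-limits scaling actions:
    consecutive replica additions/removals must be separated by a
    cooldown period, preventing the adaptation itself from
    disrupting the system."""

    def __init__(self, cooldown_s=300, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock              # injectable for testing
        self.last_action_at = None

    def allow(self):
        now = self.clock()
        if self.last_action_at is not None and now - self.last_action_at < self.cooldown_s:
            return False  # still cooling down: suppress the action
        self.last_action_at = now
        return True
```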
Autonomic Manager
▪ Plan
– Another simple approach is to apply hysteresis to the monitored data in the
analysis phase
– Hysteresis
▪ Analyzed data takes into consideration not only the latest measurements but also some
of the history
▪ Delays taking action immediately upon sudden changes
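One common way to realize this is exponential smoothing, where the analyzed value blends the latest measurement with the history; this is a sketch, and the smoothing factor is an illustrative choice.

```python
class SmoothedMetric:
    """Hysteresis via exponential smoothing: decisions use a weighted
    history of measurements, so a single spike does not immediately
    trigger adaptation."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight of the newest sample
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample
        else:
            # New value = alpha * latest sample + (1 - alpha) * history
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value
```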
Autonomic Manager
▪ Execute
– Executes the adaptation actions of the generated plan, adapting the managed
element
– Leverages the effectors
– Can be done at different levels
▪ Change system internal configuration parameters
▪ Throttle client requests
▪ Instruct external entity to modify system resources
Autonomic Manager
▪ Knowledge
– Abstraction of relevant aspects of:
▪ The managed element
▪ The environment
▪ The administrator’s goals
– Encodes the desired high-level policies and the target Service Level Objectives
and Service Level Agreements
Autonomic Manager
▪ Knowledge
– Service Level Objectives
▪ Specific target for a metric
▪ Example: latency of request X should be below 50ms
– Service Level Agreement
▪ Formal (contractual) commitment made to a customer
▪ Example: if SLO X is violated for more than 1s per day then reimburse customer
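The SLO/SLA distinction above can be made concrete with a small sketch; the function names, the periodic-sampling model, and the budget are illustrative assumptions.

```python
def slo_violation_seconds(latencies_ms, slo_ms, sample_period_s=1.0):
    """Estimate how long an SLO (e.g. latency below 50ms) was violated,
    given latency samples taken at a fixed period."""
    return sum(sample_period_s for l in latencies_ms if l >= slo_ms)

def sla_breached(latencies_ms, slo_ms, budget_s=1.0):
    """Evaluate an SLA clause such as 'reimburse the customer if the
    SLO is violated for more than 1s per day'."""
    return slo_violation_seconds(latencies_ms, slo_ms) > budget_s
```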
[Figure: example of a managed system equipped with probes and effectors; domain-specific instances feed a manager that aggregates information and applies adaptations when needed]
Example
▪ Distributed database
– Capability to add/remove replicas at runtime
– Web clients make stateless requests to server groups (set of replicas)
▪ Adaptation goal
– Response time of each client should stay below predefined maximum
– If servers are overloaded add more servers
– If bandwidth between client and server is low, move client to another
server group
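The two adaptation rules above can be sketched as a simple decision function; the thresholds and action names are illustrative assumptions, not from the original example.

```python
def choose_action(server_load, client_bandwidth_mbps,
                  load_threshold=0.8, bandwidth_floor_mbps=10.0):
    """Rule-based sketch of the adaptation goal: add servers when the
    group is overloaded; move the client to another server group when
    its bandwidth to the current group is too low."""
    if server_load > load_threshold:
        return "add_server"
    if client_bandwidth_mbps < bandwidth_floor_mbps:
        return "move_client_to_other_group"
    return "no_action"
```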
[Figure: the manager maintains a model of the client-server system and executes the actions of the selected adaptation strategy]
Autonomic Manager
▪ Each component will self-manage
– Its internal behavior
– Its relationship with the other components
▪ Managed elements
– Hardware/software resources
▪ Monitor managed elements and external environment
Autonomic Manager
▪ Autonomic elements will function at many levels
– At the lower levels
▪ Limited range of internal behaviors
▪ Hard-coded behaviors
– At the higher levels
▪ Increased dynamism and flexibility
▪ Goal-oriented behaviors
Architectural Principles
▪ Automating tasks lays out the basic building blocks for self-adaptation
▪ Engineering perspective
– Separate mechanism from policies
Architectural Principles
▪ Need for a systematic engineering approach
▪ Separation of concerns (mechanisms vs policies)
▪ Identify the foundations of engineering self-adaptive systems
Architectural Principles
▪ Component Control
– Set of interconnected components
– Accomplishes the functionality of the system
– Reports on the status of the system (OK, Not OK)
– Provides facilities to perform adaptations
– Corresponds to the managed system of MAPE-K
[Figure 1 – Three Layer Architecture Model for Self-Management: Goal Management (G, G', G'') responds to Plan Requests with Change Plans; Change Management (P1, P2) issues Change Actions; Component Control (C1, C2) reports Status upwards]
Architectural Principles
▪ Change Management
– Set of pre-specified plans, activated in response to status changes reported from the layer below
– If a situation is reported for which no plan exists, invokes the services of the Goal Management layer
▪ Goal Management
– Responsible for planning and for introducing new goals into the system
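The interplay between the two upper layers can be sketched as follows. This is a sketch under assumed names; real systems represent plans and goals far more richly.

```python
class ChangeManagement:
    """Middle layer of the three-layer model: pre-specified plans are
    activated on status changes; when no plan matches, the request is
    escalated to Goal Management, which deliberates and produces a new
    plan that is then introduced into this layer."""

    def __init__(self, plans, goal_management):
        self.plans = plans                       # status -> list of actions
        self.goal_management = goal_management   # callable: status -> plan

    def on_status_change(self, status):
        plan = self.plans.get(status)
        if plan is None:
            # No pre-computed plan: deliberate at the layer above
            plan = self.goal_management(status)
            self.plans[status] = plan            # cache the new plan
        return plan
```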
Architectural Principles
▪ Focus on architecture and change management
▪ Focus on higher-level concerns and component interactions
– Rather than micro-management of each component
– Architecture perspective provides:
▪ Generality
▪ Appropriate abstraction
▪ Leverages existing work
Runtime models
▪ Concrete realisation of architectural principles is complex
▪ How is behavior at runtime modeled?
▪ Model-driven approach
– Adaptation mechanisms should leverage software models at runtime to
reason about the system and its goals
– Extend traditional model-driven engineering approaches to encompass
runtime
Runtime models
▪ Runtime model:
– a causally connected self-representation of the system
▪ Structure
▪ Behavior
▪ Goals
Runtime models
▪ Dimensions
– Structural versus behavioral
▪ Organization of the system or parts of it versus
▪ Execution and observable activities of the system
Runtime models
▪ Dimensions
– Functional versus non-functional
▪ Models reflecting functions of the system versus
▪ Non-functional/quality properties of the system
Goal-driven Approach
▪ Focus so far on feedback loops and models
▪ What about requirements?
Goal-driven Approach
▪ Specify the requirements of a system that is exposed to
uncertainties
Control Principles
▪ MAPE-K engineering well understood
▪ Concrete solutions are quite complex
Control Principles
▪ Control theory
– Control of continuous dynamical systems
– Achieve system control without
▪ Delay
▪ Overshoot
[Figure: controlled variable over time, showing a transient state followed by a steady state; the settling time is the time taken to reach the steady state]
Control Principles
[Figure: feedback control loop — the setpoint (the adaptation goal) is compared with the measured output; the error drives a controller acting on the target system, whose output is subject to disturbances (the uncertainties). The controller keeps the output at the setpoint regardless of disturbances; in case of invasive changes, a new system model is required]
[Figure: step response showing overshoot, steady-state error, and a transient state followed by a steady state over time]
Control Principles
▪ Control principles provide analytical guarantees for
– Stability
– Absence of overshoot
– Settling time
– Robustness
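A discrete proportional-integral (PI) controller illustrates how these properties arise; the gains below are illustrative choices, and tuning them trades off overshoot, settling time, and robustness.

```python
class PIController:
    """Discrete PI controller sketch: drives a system output toward a
    setpoint. The proportional term reacts to the current error; the
    integral term accumulates past error and removes steady-state offset."""

    def __init__(self, kp=0.5, ki=0.1, setpoint=0.0):
        self.kp, self.ki = kp, ki
        self.setpoint = setpoint
        self.integral = 0.0

    def control(self, measured):
        error = self.setpoint - measured
        self.integral += error
        # Control signal applied to the target system
        return self.kp * error + self.ki * self.integral
```

In a self-adaptive system the "measured" value could be, e.g., request latency, and the control signal the number of replicas to add or remove.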
Open challenges
▪ Decentralized self-adaptation
▪ Dealing with changing goals
▪ Dealing with unanticipated change
▪ Exploiting AI
HBase
▪ NoSQL key-value store
▪ Multi-dimensional map (HTable) with an unbounded number of
attributes
▪ Row range horizontally partitioned into Regions
HBase
▪ Hierarchical architecture with a Master and RegionServers
▪ Assignment of Regions to RegionServers
▪ Data is stored in the Hadoop file system (HDFS)
HBase
▪ HBase RegionServers (RS) are usually collocated with Hadoop
DataNodes (DNs)
– Hundreds of configuration parameters (factors)
▪ block size; handler count; behavior of the GC; write buffer size; etc
HBase workload
▪ Basic operations
– Put, Get, Scan over (key,value) pairs
– User-defined key
HBase workload
▪ Random assignment of Regions to RegionServers
▪ Red regions -> Hotspots
HBase workload
▪ Manual assignment of Regions to RegionServers
– Done by the Administrator
HBase workload
▪ Not all hotspots have the same nature
– Some regions are read-intensive
– Some regions are write-intensive
– Some regions are mixed
HBase workload
▪ Determine Region access type
▪ Assign them to optimized RegionServers
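Determining a Region's access type can be sketched as a simple ratio test; the threshold and labels are illustrative assumptions, not MeT's actual algorithm.

```python
def classify_region(reads, writes, threshold=0.7):
    """Classify a Region by its access pattern so it can be assigned
    to a read- or write-optimized RegionServer."""
    total = reads + writes
    if total == 0:
        return "mixed"
    read_ratio = reads / total
    if read_ratio >= threshold:
        return "read-intensive"
    if read_ratio <= 1 - threshold:
        return "write-intensive"
    return "mixed"
```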
[Figure: read- and write-intensive Regions assigned to correspondingly optimized RegionServers]
Workload Aware Elasticity for HBase
[Figure: 5th-percentile throughput under Random, Manual Homogeneous, and Manual Heterogeneous Region assignments]
HBase self-adapt
▪ Monitor
▪ Periodically gathers information about
– Resource usage metrics (from the IaaS)
– NoSQL specific metrics (from HBase)
HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
– Region distribution
– Plan to reach the target distribution
HBase self-adapt
▪ Decision Maker
– Addition or removal of nodes
▪ Analyses overall resource usage
– If above given threshold add nodes
– If below given threshold remove nodes
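The threshold rule can be sketched as follows; the threshold values and the use of average CPU usage are illustrative assumptions.

```python
def scaling_decision(cpu_usages, add_threshold=0.8, remove_threshold=0.3):
    """Sketch of the Decision Maker's add/remove rule: compare the
    cluster's average resource usage against two thresholds."""
    avg = sum(cpu_usages) / len(cpu_usages)
    if avg > add_threshold:
        return "add_node"
    if avg < remove_threshold:
        return "remove_node"
    return "keep"
```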
HBase self-adapt
▪ Decision Maker
– Region distribution
▪ Three steps
HBase self-adapt
▪ Decision Maker
– Plan to reach target distribution
▪ Compare current assignment with target
assignment
▪ Determine set of actions that minimize Region
movement
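Comparing the current against the target assignment reduces to a set difference; a sketch follows, where the dict-based representation is an assumption, not HBase's API.

```python
def region_moves(current, target):
    """Compute the minimal set of Region moves to reach the target
    assignment: only Regions whose assigned server differs are moved.
    Assignments map region -> server; the result maps each moved
    region to its (source, destination) pair."""
    return {r: (current[r], target[r])
            for r in target
            if current.get(r) != target[r]}
```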
HBase experiment
▪ Does this work?
▪ Same experimental design as before
– 5 RegionServers/DataNodes
– 6 different YCSB workload generators
– Total read/write ratio - 65/35
– Hotspot distribution (50% of requests accessing 40% of the key space)
HBase experiment
▪ Does this work?
▪ With different workloads
– 6 RegionServers/DataNodes
– TPC-C benchmark
▪ Models a wholesale supplier's order-processing (OLTP) workload
– 30 warehouses; 300 clients
– Each replication runs for 45 minutes
HBase experiment
▪ How does it behave with increasing load?
▪ Initial deployment with 6 RegionServers/DataNodes
HBase experiment
▪ Start with more load than the system can handle
– More nodes are progressively added
[Figure: MeT throughput (op/s ×10³, left axis) and number of nodes (right axis) over 60 minutes; nodes are progressively added as the system absorbs the load]
HBase experiment
▪ At minute 33 start shutting down clients until only one is left
[Figure: MeT throughput and number of nodes over time as clients are shut down; the node count adapts accordingly]
Literature and Acknowledgements
▪ Case study: MeT: Workload Aware Elasticity for NoSQL. F. Cruz et al.,
EuroSys 2013