http://helix.incubator.apache.org
Apache Incubation, Oct 2012
@apachehelix
Examples of distributed data systems
Lifecycle
• Single node: partitioning, discovery, co-location
• Multi node: re-distribution, replication
• Fault tolerance: fault detection, recovery
• Cluster expansion: throttle data movement
Typical architecture
(diagram: nodes connected over the network, coordinated by a cluster manager)
Distributed search service
(diagram: index shards P.1–P.6 with replicas spread across Node 1, Node 2, Node 3)
• Partition management: even distribution of partitions and replicas; rack-aware placement
• Fault tolerance: multiple replicas; fault detection; auto-create replicas; controlled creation of replicas
• Elasticity: re-distribute partitions; minimize data movement; throttle data movement
Distributed data store
(diagram: partitions P.1–P.11 with master and slave replicas spread across nodes)
• Partition management: 1 designated master per partition; even distribution; no SPOF
• Fault tolerance: multiple replicas; fault detection; promote slave to master; minimize downtime
• Elasticity: minimize data movement; even distribution; throttle data movement
Message consumer group
• Similar to Message Groups in ActiveMQ
  – guaranteed ordering of the processing of related messages across a single queue
  – load balancing of the processing of messages across multiple consumers
  – high availability / auto-failover to other consumers if a JVM goes down
• Applicable to many messaging/pub-sub systems like Kafka, RabbitMQ, etc.
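The three properties above can be sketched in a few lines. This is an illustration, not Helix or ActiveMQ code: a deterministic assignment of queue partitions to live consumers gives one owner per partition (ordering), spreads partitions evenly (load balancing), and re-running it after a consumer dies moves its partitions to the survivors (auto-failover). All names are hypothetical.

```java
import java.util.*;

// Sketch of consumer-group semantics: one owner per partition, even spread,
// failover by recomputing the assignment over the surviving consumers.
public class ConsumerGroupSketch {
    // Deterministic round-robin assignment of partitions to live consumers.
    public static Map<Integer, String> assign(int numPartitions, List<String> liveConsumers) {
        Map<Integer, String> owner = new TreeMap<>();
        List<String> sorted = new ArrayList<>(liveConsumers);
        Collections.sort(sorted);               // stable order => deterministic ownership
        for (int p = 0; p < numPartitions; p++) {
            owner.put(p, sorted.get(p % sorted.size()));
        }
        return owner;
    }
}
```

On failure, calling assign again with the reduced consumer list is the failover step: every partition still has exactly one owner.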
Message consumer group (diagram)
Zookeeper provides low-level primitives. We need high-level primitives.
• Zookeeper: file system, lock, ephemeral
• Application: node, partition, replica, state, transition
(stack: Application on top of a Framework on top of a Consensus System / Zookeeper)
Outline
• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Terminologies
• Node — a single machine
• Transition — an action that lets a replica change its state, e.g. Slave -> Master
Core concept
• State machine: the states a replica can be in and the legal transitions between them (e.g. O, S, M)
• Constraints: e.g. COUNT(M) = 1 — exactly one master per partition
• Objectives: e.g. minimize(max_{nj ∈ N} M(nj)) — minimize the largest number of masters M(nj) hosted on any node nj
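The objective above is just a number you can compute for any placement. A minimal sketch (illustrative names, not Helix API) that scores a placement by the largest number of masters any single node carries:

```java
import java.util.*;

// Sketch: evaluate minimize(max_{n in N} M(n)) for a given master placement.
public class MasterSkew {
    // partitionToMasterNode: partition id -> node holding its MASTER replica
    public static int maxMastersPerNode(Map<String, String> partitionToMasterNode) {
        Map<String, Integer> count = new HashMap<>();
        for (String node : partitionToMasterNode.values()) {
            count.merge(node, 1, Integer::sum);      // masters hosted per node
        }
        int max = 0;
        for (int c : count.values()) max = Math.max(max, c);
        return max;
    }
}
```

An even placement (one master per node) scores 1, the optimum; piling two masters on one node scores 2.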
Helix solution
• Message consumer group: Offline -> Online (Start consumption), Online -> Offline (Stop consumption); constraints: MAX=1, MAX per node=5
• Distributed search: constraint: MAX=3 (number of replicas)
IDEALSTATE
Derived from configuration and constraints; specifies replica placement and replica state per partition, e.g.:
P1 -> N2:S
P2 -> N3:S
P3 -> N1:S
CURRENT STATE
N1: P1:OFFLINE, P3:OFFLINE
N2: P2:MASTER, P1:MASTER
N3: P3:MASTER, P2:SLAVE
EXTERNAL VIEW
P1 -> N1:O, P2 -> N2:M, P3 -> N3:M
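The external view is just the per-partition aggregation of every node's current state. A self-contained sketch of that aggregation (simplified types, not the Helix API):

```java
import java.util.*;

// Sketch: build the EXTERNAL VIEW (partition -> node -> state) from each
// participant's CURRENT STATE (node -> partition -> state).
public class ExternalViewSketch {
    public static Map<String, Map<String, String>> aggregate(
            Map<String, Map<String, String>> currentStates) {
        Map<String, Map<String, String>> view = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> node : currentStates.entrySet()) {
            for (Map.Entry<String, String> ps : node.getValue().entrySet()) {
                view.computeIfAbsent(ps.getKey(), k -> new TreeMap<>())
                    .put(node.getKey(), ps.getValue());   // pivot node view to partition view
            }
        }
        return view;
    }
}
```

Feeding it the current state from the previous slide yields, e.g., P1 -> {N1:OFFLINE, N2:MASTER}.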
Helix-based system: roles
• CONTROLLER — compares the IDEAL STATE with the CURRENT STATE and sends COMMANDs to participants
• PARTICIPANT — executes the commanded state transitions and reports its CURRENT STATE (RESPONSE)
• SPECTATOR — observes the external view, e.g. for partition routing logic
Logical deployment (diagram)
Outline
• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Helix-based solution
1. Define
2. Configure
3. Run
Define: state model definition
• States (e.g. MasterSlave)
  – All possible states
  – Priority
• Transitions
  – Legal transitions
  – Priority
• Applicable to each partition of a resource
(diagram: O, S, M state machine)
Define: state model

StateModelDefinition.Builder builder = new StateModelDefinition.Builder("MASTERSLAVE");
// Add states and their rank to indicate priority.
builder.addState(MASTER, 1);
builder.addState(SLAVE, 2);
builder.addState(OFFLINE);

// Set the initial state when the node starts
builder.initialState(OFFLINE);
Define: constraints
Constraints can be applied to states and transitions, at several scopes:

Scope     | State | Transition
Partition |  Y    |  Y
Resource  |  -    |  Y
Node      |  Y    |  Y
Cluster   |  -    |  Y

Examples (MasterSlave, O -> S -> M): per-partition state constraint M=1, S=2; transition count constraints, e.g. COUNT=2 (node scope), COUNT=1 (cluster scope)
Define: constraints

// static constraint
builder.upperBound(MASTER, 1);

// dynamic constraint
builder.dynamicUpperBound(SLAVE, "R");

// unconstrained
builder.upperBound(OFFLINE, -1);
Define: participant plug-in code
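The participant plug-in code did not survive extraction. In Helix, a participant supplies a state-model class whose callback methods are invoked for each transition (the real API uses StateModel subclasses with methods named like onBecomeSlaveFromOffline). A self-contained sketch of that callback pattern, with a stand-in dispatcher instead of the Helix runtime — all names here are hypothetical:

```java
import java.util.*;

// Sketch of the participant plug-in pattern: register a handler per legal
// transition; the framework fires it and records the new state.
public class ParticipantSketch {
    public interface TransitionHandler { void run(String partition); }

    private final Map<String, TransitionHandler> handlers = new HashMap<>();
    private final Map<String, String> partitionState = new HashMap<>();

    public void onTransition(String from, String to, TransitionHandler h) {
        handlers.put(from + "->" + to, h);
    }

    // Stand-in for the framework invoking the plug-in on a commanded transition.
    public void fire(String partition, String from, String to) {
        TransitionHandler h = handlers.get(from + "->" + to);
        if (h == null) throw new IllegalStateException("illegal transition " + from + "->" + to);
        h.run(partition);                       // application logic (open DB, start consumer, ...)
        partitionState.put(partition, to);      // framework records the resulting state
    }

    public String stateOf(String partition) {
        return partitionState.getOrDefault(partition, "OFFLINE");
    }
}
```

The application only writes the per-transition bodies; ordering, scheduling, and constraint enforcement stay in the controller.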
Step 2: configure
helix-admin --zkSvr <zkAddress>
• CREATE CLUSTER: --addCluster <clusterName>
• ADD NODE
• CONFIGURE RESOURCE
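Put together, a plausible end-to-end sequence looks like the following. Flag names and argument order are from the Helix CLI of this era; treat them as an illustration and check helix-admin --help against your version. Cluster, instance, and resource names are examples.

```shell
ZK=localhost:2181

# CREATE CLUSTER
helix-admin --zkSvr $ZK --addCluster MyCluster

# ADD NODE (one per participant instance)
helix-admin --zkSvr $ZK --addNode MyCluster localhost_12913

# CONFIGURE RESOURCE: 6 partitions using the MasterSlave state model
helix-admin --zkSvr $ZK --addResource MyCluster myDB 6 MasterSlave

# Compute the initial ideal state for 3 replicas
helix-admin --zkSvr $ZK --rebalance MyCluster myDB 3
```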
Step 3: run
• START CONTROLLER: run-helix-controller --zkSvr localhost:2181 --cluster MyCluster
• START PARTICIPANT
Zookeeper view (diagram)
Znode content: CURRENT STATE, EXTERNAL VIEW (examples)
Spectator plug-in code
Helix execution modes
IDEALSTATE
Configuration: • 3 nodes • 3 partitions • 2 replicas • StateMachine
Constraints: • 1 Master • 1 Slave • Even distribution
Replica placement and state:
P1 -> N1:M, N2:S
P2 -> N2:M, N3:S
P3 -> N3:M, N1:S
Execution modes: who controls what
• AUTO REBALANCE
• AUTO
• CUSTOM

Auto rebalance vs Auto
In action
MasterSlave state model, p=3 partitions, r=2 replicas, N=3 nodes

Initial placement (identical in both modes):
Node 1: P1:M, P2:S
Node 2: P2:M, P3:S
Node 3: P3:M, P1:S

On failure of a node:
• Auto rebalance: auto-creates a replacement replica on a surviving node and assigns it a state
• Auto: only changes the states of existing replicas to satisfy the constraint (e.g. a surviving P1:S is promoted to P1:M)
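The two failure policies above can be contrasted in a simplified model (one replica map per partition, node -> state). This is an illustrative sketch, not the Helix rebalancer:

```java
import java.util.*;

// Sketch of the two failure-handling policies: AUTO keeps placement fixed and
// only re-assigns states; AUTO_REBALANCE additionally re-creates lost replicas.
public class FailoverSketch {
    // AUTO: re-assign states among the surviving replicas only.
    public static Map<String, String> auto(Map<String, String> replicas, Set<String> liveNodes) {
        Map<String, String> out = new TreeMap<>();
        boolean haveMaster = false;
        for (Map.Entry<String, String> e : replicas.entrySet()) {
            if (!liveNodes.contains(e.getKey())) continue;   // dead replica drops out
            if (!haveMaster) { out.put(e.getKey(), "MASTER"); haveMaster = true; }
            else out.put(e.getKey(), "SLAVE");
        }
        return out;
    }

    // AUTO_REBALANCE: also create a replacement replica on a live node that
    // does not yet host one, restoring the replica count.
    public static Map<String, String> autoRebalance(Map<String, String> replicas,
                                                    Set<String> liveNodes, int replicaCount) {
        Map<String, String> out = auto(replicas, liveNodes);
        for (String n : new TreeSet<>(liveNodes)) {
            if (out.size() >= replicaCount) break;
            if (!out.containsKey(n)) out.put(n, "SLAVE");    // newly created replica
        }
        return out;
    }
}
```

With P1 placed as {N1:MASTER, N3:SLAVE} and N1 dead, AUTO leaves one promoted replica on N3, while AUTO_REBALANCE also spins up a fresh slave on N2.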
Custom mode: example (diagram)
Custom mode: handling failure
• Custom code invoker
  – Code that lives on all nodes, but is active in only one place
  – Invoked when a node joins/leaves the cluster
  – Computes the new idealstate
• The Helix controller fires the transitions without violating constraints
Example: moving P1's master from N1 to N2 takes two transitions: (1) N1: M -> S and (2) N2: S -> M. Firing 1 & 2 in parallel can violate the single-master constraint, so the controller fires 2 only after 1 completes.
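The ordering rule in the example can be sketched directly: fire demotions (-> SLAVE) before promotions (-> MASTER), and the master count never exceeds 1 at any intermediate step. A simplified model, not the Helix controller:

```java
import java.util.*;

// Sketch: schedule a batch of transitions so mastership is released before it
// is acquired, preserving the single-master constraint throughout.
public class TransitionOrdering {
    public static class Transition {
        public final String node, from, to;
        public Transition(String node, String from, String to) {
            this.node = node; this.from = from; this.to = to;
        }
    }

    public static List<Transition> schedule(List<Transition> batch) {
        List<Transition> ordered = new ArrayList<>(batch);
        // Promotions to MASTER sort last; everything else fires first.
        ordered.sort(Comparator.comparingInt(t -> t.to.equals("MASTER") ? 1 : 0));
        return ordered;
    }

    // Replay a sequence and verify masters <= 1 after every step.
    public static boolean respectsSingleMaster(Map<String, String> state, List<Transition> seq) {
        Map<String, String> s = new HashMap<>(state);
        for (Transition t : seq) {
            s.put(t.node, t.to);
            long masters = s.values().stream().filter("MASTER"::equals).count();
            if (masters > 1) return false;
        }
        return true;
    }
}
```

Running the slide's example (N1: M -> S, N2: S -> M) unordered briefly yields two masters; the scheduled order never does.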
Tools
• Chaos monkey
• Data-driven testing and debugging
• Rolling upgrade
• On-demand task scheduling and intra-cluster messaging
• Health monitoring and alerts
Data-driven testing
• Instrument — Zookeeper, controller, participant logs
• Simulate — Chaos monkey
• Analyze — invariants:
  – Respect state transition constraints
  – Respect state count constraints
  – And so on
• Debugging made easy
  – Reproduce the exact sequence of events
Structured log file — sample
(columns: timestamp, partition, instanceName, sessionId, state)
MASTER  -> SLAVE    55
OFFLINE -> DROPPED   0
OFFLINE -> SLAVE   298
SLAVE   -> MASTER  155
SLAVE   -> OFFLINE   0
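In the spirit of the data-driven testing described above, such a log can be replayed and checked against an invariant. A sketch checking "at most one MASTER per partition" over a simplified event format (partition, instance, fromState, toState) — not Helix's actual log schema:

```java
import java.util.*;

// Sketch: replay transition events and verify the state-count invariant
// (never more than one instance in MASTER for the same partition).
public class InvariantChecker {
    public static boolean atMostOneMaster(List<String[]> events) {
        Map<String, Set<String>> masters = new HashMap<>();  // partition -> instances in MASTER
        for (String[] e : events) {
            String partition = e[0], instance = e[1], to = e[3];
            Set<String> m = masters.computeIfAbsent(partition, k -> new HashSet<>());
            if (to.equals("MASTER")) m.add(instance); else m.remove(instance);
            if (m.size() > 1) return false;                  // invariant violated
        }
        return true;
    }
}
```

A clean handoff (demote old master, then promote) passes; two concurrent promotions fail the check.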
Outline
• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Helix usage at LinkedIn
• Espresso
In flight
• Apache S4
  – Partitioning, co-location
  – Dynamic cluster expansion
• Archiva
  – Partitioned replicated file store
  – Rsync-based replication
• Others in evaluation
  – Bigtop
Auto-scaling software deployment tool
• States: Offline (< 100)
Summary
• Helix: a generic framework for building distributed systems
• Modifying/enhancing system behavior is easy
  – Abstraction and modularity is key
• Simple programming model: declarative state machine
Roadmap: features
• Span multiple data centers
• Automatic load balancing
• Distributed health monitoring
• YARN: generic application master for real-time apps
• Standalone Helix agent
website: http://helix.incubator.apache.org
user: user@helix.incubator.apache.org
dev: dev@helix.incubator.apache.org
twitter: @apachehelix, @kishoreg1980