
Introduction to Apache ZooKeeper:

Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly
reliable distributed coordination. It is a high-performance coordination server for distributed
applications.

What is Apache ZooKeeper?

Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing
distributed synchronization, and providing group services. All of these kinds of services are used in some
form or another by distributed applications. Each time they are implemented there is a lot of work that
goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of
implementing these kinds of services, applications initially usually skimp on them, which makes them
brittle in the presence of change and difficult to manage. Even when done correctly, different
implementations of these services lead to management complexity when the applications are deployed.

ZooKeeper aims at distilling the essence of these different services into a very simple interface to a
centralized coordination service. The service itself is distributed and highly reliable. Consensus, group
management, and presence protocols will be implemented by the service so that the applications do not
need to implement them on their own. Application-specific uses of these will consist of a mixture of
specific components of ZooKeeper and application-specific conventions.

Apache ZooKeeper is a high-performance coordination server for distributed applications. It exposes
common services -- such as naming and configuration management, synchronization, and group services
-- in a simple interface, relieving the user from the need to program them from scratch. It comes with
off-the-shelf support for implementing consensus, group management, leader election, and presence
protocols.

Apache ZooKeeper Overview:

Apache ZooKeeper allows distributed processes to coordinate with each other through a shared
hierarchical name space of data registers (we call these registers znodes), much like a file system. Unlike
normal file systems, ZooKeeper provides its clients with high-throughput, low-latency, highly available,
strictly ordered access to the znodes. The performance aspects of ZooKeeper allow it to be used in large
distributed systems. The reliability aspects prevent it from becoming the single point of failure in big
systems. Its strict ordering allows sophisticated synchronization primitives to be implemented at the
client.

The name space provided by ZooKeeper is much like that of a standard file system. A name is a
sequence of path elements separated by a slash ("/"). Every znode in ZooKeeper's name space is
identified by a path, and every znode has a parent whose path is a prefix of the znode's path with one
fewer path element; the exception to this rule is the root ("/"), which has no parent. Also, exactly as in
standard file systems, a znode cannot be deleted if it has any children.
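The parent rule above can be sketched as a small helper (an illustrative sketch, not part of the ZooKeeper API):

```python
from typing import Optional

def parent_path(path: str) -> Optional[str]:
    """Return the parent of a znode path, or None for the root ("/").

    A parent's path is a prefix of the child's path with one fewer
    path element, mirroring the rule described above.
    """
    if path == "/":
        return None  # the root is the only znode with no parent
    parent = path.rsplit("/", 1)[0]
    return parent or "/"  # direct children of the root have parent "/"
```

For example, the parent of "/app1/p_1" is "/app1", and the parent of "/app1" is the root "/".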

The main differences between ZooKeeper and standard file systems are that every znode can have data
associated with it (every file can also be a directory and vice versa) and that znodes are limited in the
amount of data they can hold. ZooKeeper was designed to store coordination data: status information,
configuration, location information, and so on. This kind of meta-information is usually measured in
kilobytes, if not bytes. ZooKeeper has a built-in sanity check of 1 MB to prevent it from being used as a
large data store, but in general it is used to store much smaller pieces of data.

The service itself is replicated over a set of machines that comprise the service. These machines
maintain an in-memory image of the data tree, along with transaction logs and snapshots in a
persistent store. Because the data is kept in memory, ZooKeeper is able to achieve very high throughput and
low latency numbers. The downside to an in-memory database is that the size of the database that
ZooKeeper can manage is limited by memory. This limitation is further reason to keep the amount of
data stored in znodes small.

The servers that make up the ZooKeeper service must all know about each other. As long as a majority
of the servers are available, the ZooKeeper service will be available. Clients must also know the list of
servers. The clients create a handle to the ZooKeeper service using this list of servers.
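How a client might turn its list of servers into a single connection can be sketched as a toy model (the host names and the connect-string format used here are illustrative; the real client libraries handle this internally when you create a handle):

```python
import random

def pick_server(connect_string: str) -> tuple:
    """Pick one server at random from a comma-separated connect string
    such as "zk1:2181,zk2:2181,zk3:2181" (hypothetical hosts).

    A toy model of the client choosing which ensemble member to dial;
    not the actual selection logic of any ZooKeeper client library.
    """
    hosts = connect_string.split(",")
    host, port = random.choice(hosts).rsplit(":", 1)
    return host, int(port)
```

If the chosen server becomes unreachable, a client would simply pick another entry from the same list, which is why every client must know the full server list.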

Clients only connect to a single ZooKeeper server. The client maintains a TCP connection, through which
it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the
server breaks, the client will connect to a different server. When a client first connects to the ZooKeeper
service, the first ZooKeeper server will set up a session for the client. If the client needs to connect to
another server, this session will be reestablished with the new server.

Read requests sent by a ZooKeeper client are processed locally at the ZooKeeper server to which the
client is connected. If the read request registers a watch on a znode, that watch is also tracked locally at
the ZooKeeper server. Write requests are forwarded to other ZooKeeper servers and go through
consensus before a response is generated. Sync requests are also forwarded to another server, but do
not actually go through consensus. Thus, the throughput of read requests scales with the number of
servers and the throughput of write requests decreases with the number of servers.

Order is very important to ZooKeeper; almost bordering on obsessive–compulsive disorder. All updates
are totally ordered. ZooKeeper actually stamps each update with a number that reflects this order. We
call this number the zxid (ZooKeeper Transaction Id). Each update will have a unique zxid. Reads (and
watches) are ordered with respect to updates. Read responses will be stamped with the last zxid
processed by the server that services the read.
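The zxid ordering described above can be modeled with a simple counter (a toy restatement of the guarantee, not the real implementation, which derives zxids from the consensus protocol):

```python
import itertools

class ZxidStamper:
    """Toy model of zxid ordering: each update is stamped with the next
    transaction id, and a read response carries the last zxid processed
    by the server that serviced it."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.last_zxid = 0

    def apply_update(self, data):
        """Apply an update and return its unique zxid."""
        self.last_zxid = next(self._counter)
        return self.last_zxid

    def read(self, data):
        """Serve a read, stamped with the last zxid this server processed."""
        return data, self.last_zxid
```

Because every update gets a strictly larger zxid, a client can compare the zxid on a read response against earlier responses to reason about how up to date its view is.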

How Does Apache ZooKeeper Work?

ZooKeeper runs on a cluster of servers, called an ensemble, that share the state of the data. These may
be the same machines that run other Hadoop services, a separate cluster, or another application server.
Whenever a change is made, it is not considered successful until it has been written to a quorum (a
majority) of the servers in the ensemble. A leader is elected within the ensemble, and if two conflicting
changes are made at the same time, the one that is processed by the leader first will succeed and the
other will fail. ZooKeeper guarantees that writes from the same client will be processed in the order they
were sent by that client. This guarantee, along with other features discussed below, allows the system
to be used to implement locks, queues, and other important coordination primitives for distributed
systems. The outcome of a write operation allows a node to be certain that an identical write has not
succeeded for any other node.
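The quorum rule above reduces to a one-line check (a toy restatement of the rule, not how the server actually tallies acknowledgments):

```python
def write_committed(acks: int, ensemble_size: int) -> bool:
    """A change counts as successful only once a majority (quorum) of
    the ensemble has acknowledged it."""
    return acks > ensemble_size // 2
```

So in a five-server ensemble, three acknowledgments commit a write, while two do not; in a six-server ensemble, four are required, which is part of why even ensemble sizes buy little.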

A consequence of the way ZooKeeper works is that a server will disconnect all client sessions any time it
has not been able to connect to the quorum for longer than a configurable timeout. The server has no
way to tell if the other servers are actually down or if it has just been separated from them due to a
network partition, and can therefore no longer guarantee consistency with the rest of the ensemble. As
long as more than half of the ensemble is up, the cluster can continue service despite individual server
failures. When a failed server is brought back online it is synchronized with the rest of the ensemble and
can resume service.

It is best to run a ZooKeeper ensemble with an odd number of servers; typical ensemble sizes are three,
five, or seven. For instance, if you run five servers and three are down, the cluster will be unavailable
(so you can have one server down for maintenance and still survive an unexpected failure). If you run
six servers, the cluster is likewise unavailable after three failures, but the chance of three
simultaneous failures is now slightly higher. Also remember that as you add more servers, you may be
able to tolerate more failures, but you may also begin to see lower write throughput.
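The sizing advice above follows directly from the majority rule: an ensemble of n servers tolerates the failure of any minority. As a sketch:

```python
def failures_tolerated(ensemble_size: int) -> int:
    """How many servers can fail while a majority stays available."""
    return (ensemble_size - 1) // 2
```

Five servers tolerate two failures, but six servers still tolerate only two, which is why odd ensemble sizes are preferred.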

Application Areas & Usages:

- Apache ZooKeeper is already used by Apache HBase, HDFS, and other Apache Hadoop projects to
provide highly available services and, in general, to make distributed programming easier.

- Group membership and name services.

- Distributed mutexes and master election.

- Asynchronous message passing and event broadcasting.

- Centralized configuration management -- this feature is leveraged by Red Hat JBoss Fuse for
maintaining the Fuse Fabric Registry. In general, a fabric is a set of centrally controlled
nodes/containers that represent JBoss Fuse instances. Here Apache ZooKeeper is used to store the
cluster configuration and node registration.

- You need to have Java 6 installed in order to run Apache ZooKeeper; however, client bindings
are available in several different languages. Refer to
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings for details.

- Who is using Apache ZooKeeper -- a list of current users/supporters and their usage areas is
briefly described at http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy and
https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy.

Other Important Links & references:

- http://en.wikipedia.org/wiki/Apache_ZooKeeper
- http://zookeeper.apache.org/
- https://cwiki.apache.org/confluence/display/ZOOKEEPER/Index
- http://www.ibm.com/developerworks/library/bd-zookeeper/
- https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Fuse/6.1/html/Fabric_Guide/files/PartBasic.html
