Introduction to ZooKeeper

ZooKeeper is an open source Apache project that provides a centralized service for
configuration information, naming, synchronization, and group services across large clusters
in distributed systems. The goal is to make these systems easier to manage through improved,
more reliable propagation of changes.

If you had a Hadoop cluster spanning 500 or more commodity servers, you would need
centralized management of the entire cluster: naming, group and synchronization services,
configuration management, and more. Other open source projects that run on Hadoop clusters
also require cross-cluster services. Embedding ZooKeeper means you don’t have to build
synchronization services from scratch. Applications interact with ZooKeeper through a Java™
or C interface.

For applications, ZooKeeper provides an infrastructure for cross-node synchronization by
maintaining status information in memory on the ZooKeeper servers. Each ZooKeeper server
keeps a copy of the state of the entire system and persists this information in local log files.
Large Hadoop clusters are supported by multiple ZooKeeper servers, with a master server
synchronizing the top-level servers.
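
The idea of keeping state in memory while persisting it to a local log can be illustrated with a small sketch. This is a toy model only, not ZooKeeper's actual storage code or file format: the class name `ToyLoggedState`, the JSON-lines log layout, and the file name `toy-txn.log` are all assumptions made for illustration. Each update is appended to the log before it is applied in memory, so a restarted server can rebuild its state by replaying the log.

```python
import json
import os
import tempfile


class ToyLoggedState:
    """Toy model: in-memory state backed by an append-only local log.

    Hypothetical illustration; not ZooKeeper's real storage format.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        # On startup, rebuild the in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.state[entry["path"]] = entry["value"]

    def set(self, path, value):
        # Persist the change first, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"path": path, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state[path] = value


# Usage: write two entries, then simulate a restart from the same log file.
log_file = os.path.join(tempfile.mkdtemp(), "toy-txn.log")
server = ToyLoggedState(log_file)
server.set("/cluster/name", "hadoop-prod")
server.set("/cluster/size", 500)

restarted = ToyLoggedState(log_file)
```

After the simulated restart, `restarted.state` holds the same two entries that were written before it, which is the property the local log is there to provide.
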

Within ZooKeeper, an application can create what is called a znode, which is a file that persists
in memory on the ZooKeeper servers. The znode can be updated by any node in the cluster, and
any node in the cluster can register to be notified of changes to that znode.
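
The create, update, and notify cycle described here can be sketched with a small in-memory stand-in. This is not the ZooKeeper client API: `ToyZnodeStore`, its `create`, `get`, and `set` methods, the `on_change` callback, and the path `/app/config` are hypothetical names chosen for illustration. The sketch makes each registered watch fire only once, which mirrors the one-time-trigger behavior of standard ZooKeeper watches.

```python
from typing import Callable, Dict, List, Optional


class ToyZnodeStore:
    """In-memory stand-in for a ZooKeeper server (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}

    def create(self, path: str, data: bytes = b"") -> None:
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        self._data[path] = data

    def get(self, path: str, watch: Optional[Callable[[str], None]] = None) -> bytes:
        value = self._data[path]
        if watch is not None:
            # Register interest in the next change to this znode.
            self._watchers.setdefault(path, []).append(watch)
        return value

    def set(self, path: str, data: bytes) -> None:
        if path not in self._data:
            raise KeyError(f"no such znode: {path}")
        self._data[path] = data
        # Watches are one-shot: remove them before notifying.
        for callback in self._watchers.pop(path, []):
            callback(path)


store = ToyZnodeStore()
store.create("/app/config", b"v1")

seen = []


def on_change(path: str) -> None:
    # The notification carries only the path; the watcher re-reads the data.
    seen.append((path, store.get(path)))


store.get("/app/config", watch=on_change)  # read and register a watch
store.set("/app/config", b"v2")            # triggers on_change once
store.set("/app/config", b"v3")            # no watch registered any more
```

In this run `seen` records only the first change, because the watcher did not register again after being notified. A client that wants continuous notifications has to set a new watch each time it reads.
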

Put simply, applications can synchronize their tasks across the distributed cluster by updating
their status in a ZooKeeper znode. The znode then informs the rest of the cluster of a specific
node’s status change. This cluster-wide status centralization service is critical for management
and serialization tasks across a large distributed set of servers.
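
As a rough illustration of status-based synchronization, the sketch below has several workers publish their status to a shared board while a coordinator waits until all of them report `done`. It runs in a single process with threads and uses no ZooKeeper calls: `ToyStatusBoard`, the `/status/<node>` path layout, and the worker names are assumptions made for this example, and a real deployment would hold these entries in znodes on the ZooKeeper servers.

```python
import threading


class ToyStatusBoard:
    """Single-process stand-in for status znodes (hypothetical names throughout)."""

    def __init__(self):
        self._status = {}
        self._cond = threading.Condition()

    def publish(self, node, status):
        # A worker updates its own status entry and wakes any waiters.
        with self._cond:
            self._status[f"/status/{node}"] = status
            self._cond.notify_all()

    def wait_for_all(self, nodes, status, timeout=5.0):
        # Block until every listed node reports the given status, or time out.
        paths = [f"/status/{n}" for n in nodes]
        with self._cond:
            return self._cond.wait_for(
                lambda: all(self._status.get(p) == status for p in paths),
                timeout=timeout,
            )

    def snapshot(self):
        with self._cond:
            return dict(self._status)


workers = ["node-1", "node-2", "node-3"]
board = ToyStatusBoard()


def run_task(name):
    board.publish(name, "running")
    # ... the worker's actual task would run here ...
    board.publish(name, "done")


threads = [threading.Thread(target=run_task, args=(w,)) for w in workers]
for t in threads:
    t.start()

# The coordinator proceeds only after every worker has reported "done".
all_done = board.wait_for_all(workers, "done")

for t in threads:
    t.join()
```

The coordinator never polls individual workers. It reacts to status changes as they are published, which is the pattern the znode notification mechanism supports across a cluster.
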
