
Kafka Cluster

A Kafka cluster is a distributed system that consists of multiple Kafka brokers working
together to provide fault tolerance, scalability, and high availability for data streaming.
Here is a description of key concepts in Kafka clustering:

Broker:
● A Kafka broker is a single instance of a Kafka server that stores and manages topic partitions.
● Brokers are responsible for receiving, storing, and serving messages to producers and consumers.
● A Kafka cluster typically comprises multiple brokers.
Topic:
● A topic is a category or feed name to which messages are published by producers and from which messages are consumed by consumers.
● Topics allow for the logical organization and categorization of messages in Kafka.
Partition:
● A partition is the basic unit of parallelism and scalability in Kafka.
● Each topic is divided into one or more partitions, and each partition can be hosted on a different broker.
● Partitions allow Kafka to distribute and parallelize the processing of messages.
Replication:
● Kafka uses replication to provide fault tolerance and high availability.
● Each partition has one leader and multiple followers (replicas).
● Replicas ensure that if a broker or partition leader fails, another replica can take over.
ZooKeeper:
● ZooKeeper is a distributed coordination service used by Kafka for managing and maintaining metadata and cluster state.
● It helps in leader election, broker discovery, and synchronization among Kafka brokers.
● Kafka relies on ZooKeeper for tasks such as maintaining broker liveness and managing topic partitions.
Producer:
● A Kafka producer is a client application that publishes messages to Kafka topics.
● Producers determine to which partition a message is sent based on partitioning strategies.
Consumer:
● A Kafka consumer is a client application that subscribes to topics and processes the messages produced to those topics.
● Consumers can be part of a consumer group for parallel processing and load balancing.
Consumer Group:
● A consumer group is a set of consumers that cooperate to consume messages from one or more topics.
● Each consumer in a group processes a subset of the partitions for parallelism.
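
As a quick illustration of these concepts, here is a minimal sketch using the standard Kafka CLI tools, assuming an unsecured broker reachable at localhost:9092 (the secured cluster built below additionally needs the client settings shown at the end of this article); demo-events and demo-group are made-up names:

# Create a topic with 6 partitions, each replicated to 3 brokers
kafka-topics --bootstrap-server localhost:9092 --create \
  --topic demo-events --partitions 6 --replication-factor 3

# Publish messages interactively (one message per line)
kafka-console-producer --bootstrap-server localhost:9092 --topic demo-events

# Consume as part of a consumer group; starting this command in two
# terminals splits the 6 partitions between the two group members
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic demo-events --group demo-group --from-beginning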
Kafka Cluster Architecture
How To Cluster Kafka With Docker Compose

In our case we deployed Kafka on 6 nodes, but at least 3 nodes are needed to deploy a Kafka cluster. Below is a sample docker-compose.yml file that must be placed on each node; the nodes are then started one by one. Note that Docker and Docker Compose must be installed on each node.

Sample docker-compose.yml file for Node 1:

version: '3.6'

volumes:
  zookeeper-data:
    driver: local
  zookeeper-log:
    driver: local
  kafka-data:
    driver: local

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    restart: on-failure
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_AUTOPURGE_SNAPRETAINCOUNT: 3
      ZOOKEEPER_AUTOPURGE_PURGEINTERVAL: 24
      ZOOKEEPER_MAX_CLIENT_CNXNS: 0
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_SERVERS: "0.0.0.0:2888:3888;kafka-2:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"
    extra_hosts:
      - "kafka-1:{kafka-node-1-ip}"
      - "kafka-2:{kafka-node-2-ip}"
      - "kafka-3:{kafka-node-3-ip}"
      - "kafka-4:{kafka-node-4-ip}"
      - "kafka-5:{kafka-node-5-ip}"
      - "kafka-6:{kafka-node-6-ip}"
    volumes:
      - zookeeper-data:/var/lib/zookeeper/data:Z
      - zookeeper-log:/var/lib/zookeeper/log:Z

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    restart: on-failure
    ports:
      - "{kafka-port}:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      ZOOKEEPER_SASL_ENABLED: "false"
      KAFKA_ZOOKEEPER_CONNECT: "kafka-1:2181,kafka-2:2181,kafka-3:2181,kafka-4:2181,kafka-5:2181,kafka-6:2181"
      KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
      KAFKA_LISTENERS: 'SASL_PLAINTEXT://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-1:{kafka-port}'
      KAFKA_INTER_BROKER_LISTENER_NAME: SASL_PLAINTEXT
      KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
      KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
      KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/server-jaas.conf"
      KAFKA_AUTHORIZER_CLASS_NAME: "kafka.security.authorizer.AclAuthorizer"
      KAFKA_SUPER_USERS: "User:admin"
      KAFKA_MESSAGE_MAX_BYTES: 100000000
      KAFKA_REPLICA_FETCH_MAX_BYTES: 10485760
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_LOG_RETENTION_HOURS: 168
      KAFKA_LOG_SEGMENT_BYTES: 1073741824
      KAFKA_SSL_ENABLED_PROTOCOLS: "TLSv1.2,TLSv1.1,TLSv1"
    volumes:
      - kafka-data:/var/lib/kafka/data:Z
      - ./config/server-jaas.conf:/etc/kafka/server-jaas.conf
    extra_hosts:
      - "kafka-1:{kafka-node-1-ip}"
      - "kafka-2:{kafka-node-2-ip}"
      - "kafka-3:{kafka-node-3-ip}"
      - "kafka-4:{kafka-node-4-ip}"
      - "kafka-5:{kafka-node-5-ip}"
      - "kafka-6:{kafka-node-6-ip}"

Docker Compose File Description

First, we have three data volumes that must be persisted:

zookeeper-data
zookeeper-log
kafka-data

So we have created a named Docker volume for each one.

ZooKeeper Service:

- ZOOKEEPER_SERVERS: "0.0.0.0:2888:3888;kafka-2:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"

For this variable, the entry in the position matching the current node's number must be set to "0.0.0.0:2888:3888". For example, for Node 2 it looks like this:

- ZOOKEEPER_SERVERS: "kafka-1:2888:3888;0.0.0.0:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"

We set "0.0.0.0" so that the server listens on all available network interfaces.
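
Putting it together, the only per-node values that change are the ZooKeeper server ID, the 0.0.0.0 slot in ZOOKEEPER_SERVERS, the Kafka broker ID, and the advertised listener (the latter two are described in the Kafka service section below). A sketch of the Node 2 overrides, with everything else identical to the Node 1 file:

# zookeeper service, Node 2
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_SERVERS: "kafka-1:2888:3888;0.0.0.0:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"

# kafka service, Node 2
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-2:{kafka-port}'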

The other variables are defined as follows:

ZOOKEEPER_CLIENT_PORT (set to 2181):

● The port on which ZooKeeper clients (applications interacting with ZooKeeper) connect to the ZooKeeper ensemble. The default is 2181.

ZOOKEEPER_TICK_TIME (set to 2000):

● The basic time unit in milliseconds used by ZooKeeper for heartbeats and timeouts. It defines the length of a single tick. The default is 2000 milliseconds (2 seconds).

ZOOKEEPER_INIT_LIMIT (set to 5):

● The amount of time (in ticks) that a ZooKeeper follower is allowed to take to initialize (sync with the leader). It is multiplied by the tick time to calculate the total time allowed for initialization: with the values above, 5 × 2000 ms = 10 seconds.

ZOOKEEPER_SYNC_LIMIT (set to 2):

● The maximum amount of time (in ticks) that a follower is allowed to be out of sync with the leader. Like ZOOKEEPER_INIT_LIMIT, it is multiplied by the tick time: here 2 × 2000 ms = 4 seconds.

ZOOKEEPER_AUTOPURGE_SNAPRETAINCOUNT (set to 3):

● The number of snapshots to retain in the dataDir and txnLogDir. ZooKeeper maintains snapshots of its data for backup purposes; this setting determines how many of them are kept.

ZOOKEEPER_AUTOPURGE_PURGEINTERVAL (set to 24):

● The time interval (in hours) between each purge of old snapshots. ZooKeeper will automatically purge old snapshots to free up disk space.

ZOOKEEPER_MAX_CLIENT_CNXNS (set to 0):

● The maximum number of client connections that a ZooKeeper server will allow simultaneously. A value of 0 means unlimited connections.

ZOOKEEPER_SERVER_ID (set to 1):

● A unique identifier for a ZooKeeper server in the ensemble. Each server in the ensemble must have a unique ID.

Then we set these extra hosts to map our node hostnames to their IP addresses inside the Docker containers:

extra_hosts:
  - "kafka-1:{kafka-node-1-ip}"
  - "kafka-2:{kafka-node-2-ip}"
  - "kafka-3:{kafka-node-3-ip}"
  - "kafka-4:{kafka-node-4-ip}"
  - "kafka-5:{kafka-node-5-ip}"
  - "kafka-6:{kafka-node-6-ip}"
Kafka Service:

KAFKA_BROKER_ID: 1
● The unique identifier for the Kafka broker within the Kafka cluster. Each broker in the cluster should have a distinct broker ID.
ZOOKEEPER_SASL_ENABLED: "false"
● Indicates whether SASL (Simple Authentication and Security Layer) is enabled for communication with ZooKeeper. In this case, it is set to false, meaning SASL is not enabled.
KAFKA_ZOOKEEPER_CONNECT: "kafka-1:2181,kafka-2:2181,kafka-3:2181,kafka-4:2181,kafka-5:2181,kafka-6:2181"
● Specifies the connection string for ZooKeeper, listing the hostnames and ports of the ZooKeeper servers.
KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
● The timeout (in milliseconds) for connecting to ZooKeeper.
KAFKA_LISTENERS: 'SASL_PLAINTEXT://:9092'
● Defines the listener and port configuration for Kafka. In this case, it's set to SASL_PLAINTEXT on port 9092.
KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-1:{kafka-port}'
● Specifies the advertised listener, which is the address or hostname given to producers and consumers for connecting to this broker.
KAFKA_INTER_BROKER_LISTENER_NAME: SASL_PLAINTEXT
● The listener name used for communication between Kafka brokers (inter-broker communication).
KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
● Specifies the SASL mechanism used for authentication. In this case, it's set to PLAIN.
KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
● Specifies the SASL mechanism used for inter-broker communication.
KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/server-jaas.conf"
● A Java system property that points the broker at the JAAS login configuration used for SASL authentication.
KAFKA_AUTHORIZER_CLASS_NAME: "kafka.security.authorizer.AclAuthorizer"
● Specifies the class implementing the authorizer interface for access control.
KAFKA_SUPER_USERS: "User:admin"
● Defines super users who have full access to all resources, specified as a semicolon-separated list of principals, e.g. "User:admin;User:bob".
KAFKA_MESSAGE_MAX_BYTES: 100000000
● The maximum size in bytes for a Kafka message.
KAFKA_REPLICA_FETCH_MAX_BYTES: 10485760
● The maximum number of bytes to attempt to fetch for each partition from the leader during replication. Note that this value (10 MB) is smaller than KAFKA_MESSAGE_MAX_BYTES (100 MB) above; to guarantee that the largest allowed message can be replicated, it generally needs to be at least as large as the maximum message size.
KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
● Specifies whether automatic topic creation is enabled. If set to false, topics must be created explicitly before they can be used.
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
● The replication factor for the internal Kafka topic used to store consumer offsets.
KAFKA_LOG_RETENTION_HOURS: 168
● The number of hours to retain log segments for a topic (168 hours = 7 days).
KAFKA_LOG_SEGMENT_BYTES: 1073741824
● The maximum size of a log segment file for a Kafka topic (1 GiB).
KAFKA_SSL_ENABLED_PROTOCOLS: "TLSv1.2,TLSv1.1,TLSv1"
● The list of SSL/TLS protocols enabled for secure communication. TLSv1 and TLSv1.1 are deprecated, so restricting this to TLSv1.2 or newer is generally advisable.

These configurations control various aspects of Kafka's behavior, security, and resource usage within a Kafka broker. Adjustments to these values should be made based on your specific requirements and security policies.

And our server-jaas.conf file contains:

KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="test"
    user_admin="test"
    user_test="test";
};
KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="test"
    password="test";
};

In the KafkaServer section, username and password are the credentials the broker itself uses for inter-broker connections, and they must match one of the user_<name>="<password>" entries; since AclAuthorizer is enabled with User:admin as the only super user, the broker should authenticate as admin. The user_test entry defines a regular client user, which the KafkaClient section (and external clients) can log in as. These are sample credentials and should be replaced in any real deployment.
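
To connect a client to this cluster, the SASL/PLAIN credentials defined above have to be supplied. A minimal sketch, assuming a client.properties file like the following (the file name, the demo-events topic, and the admin.properties counterpart holding the admin credentials are illustrative, not part of the files above):

# client.properties -- SASL/PLAIN settings matching server-jaas.conf
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test";

It can then be passed to the standard Kafka CLI tools, for example:

# Create a topic explicitly (auto topic creation is disabled above);
# this is done as the admin super user
kafka-topics --bootstrap-server kafka-1:{kafka-port} \
  --command-config admin.properties \
  --create --topic demo-events --partitions 6 --replication-factor 3

# Allow the test user to read and write the topic
kafka-acls --bootstrap-server kafka-1:{kafka-port} \
  --command-config admin.properties \
  --add --allow-principal User:test \
  --operation Read --operation Write --topic demo-events

Note that a consumer that joins a consumer group additionally needs a Read ACL on that group.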
