A Kafka cluster is a distributed system that consists of multiple Kafka brokers working
together to provide fault tolerance, scalability, and high availability for data streaming.
Here is a description of key concepts in Kafka clustering:
Broker:
● A Kafka broker is a single instance of a Kafka server that stores and
manages topic partitions.
● Brokers are responsible for receiving, storing, and serving messages to
producers and consumers.
● A Kafka cluster typically comprises multiple brokers.
Topic:
● A topic is a category or feed name to which messages are published by
producers and from which messages are consumed by consumers.
● Topics allow for the logical organization and categorization of messages
in Kafka.
Partition:
● A partition is a basic unit of parallelism and scalability in Kafka.
● Each topic is divided into one or more partitions, and each partition can be
hosted on a different broker.
● Partitions allow Kafka to distribute and parallelize the processing of
messages.
Replication:
● Kafka uses replication to provide fault tolerance and high availability.
● Each partition has one leader and multiple followers (replicas).
● Replicas ensure that if a broker or partition leader fails, another replica
can take over.
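The failover behavior described above can be sketched in a few lines. This is a simplified illustration, not Kafka's actual controller logic; the function name `pick_new_leader` is invented for this sketch:

```python
def pick_new_leader(current_leader: int, isr: list[int]) -> int:
    """Pick a replacement leader from the in-sync replica set (ISR).

    Kafka's controller does something similar: when the partition leader
    fails, it promotes one of the remaining in-sync replicas so reads
    and writes can continue.
    """
    candidates = [broker for broker in isr if broker != current_leader]
    if not candidates:
        raise RuntimeError("no in-sync replica available; partition is offline")
    return candidates[0]

# Partition replicated on brokers 1, 2, 3 with broker 1 as leader;
# if broker 1 fails, broker 2 takes over.
new_leader = pick_new_leader(current_leader=1, isr=[1, 2, 3])
```

The key point is that only replicas in the ISR are eligible, which is why losing all in-sync replicas makes the partition unavailable.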
ZooKeeper:
● ZooKeeper is a distributed coordination service used by Kafka for
managing and maintaining metadata and cluster state.
● It helps in leader election, broker discovery, and synchronization among
Kafka brokers.
● Kafka relies on ZooKeeper for tasks such as maintaining broker liveness
and managing topic partitions.
Producer:
● A Kafka producer is a client application that publishes messages to Kafka
topics.
● Producers determine to which partition a message is sent based on
partitioning strategies.
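The default partitioning strategy hashes the message key modulo the partition count, so all messages with the same key land on the same partition (preserving per-key ordering). Kafka actually uses murmur2 for this; the sketch below substitutes CRC32 to stay dependency-free, but the property it demonstrates is the same:

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition, as a key-based partitioner does.

    Equal keys always hash to the same value, so they always map to the
    same partition. (Kafka's default partitioner uses murmur2, not CRC32.)
    """
    return zlib.crc32(key) % num_partitions

# The same key deterministically maps to the same partition:
p1 = choose_partition(b"order-42", 6)
p2 = choose_partition(b"order-42", 6)
```

Messages without a key are instead spread across partitions (round-robin or sticky batching, depending on the Kafka version).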
Consumer:
● A Kafka consumer is a client application that subscribes to topics and
processes the messages produced to those topics.
● Consumers can be part of a consumer group for parallel processing and
load balancing.
Consumer Group:
● A consumer group is a set of consumers that cooperate to consume
messages from one or more topics.
● Each consumer in a group processes a subset of the partitions for
parallelism.
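The partition-to-consumer distribution can be sketched as follows. This is a simplified round-robin assignment for illustration; Kafka ships several real strategies (range, roundrobin, sticky, cooperative-sticky) selected via the `partition.assignment.strategy` consumer setting:

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Spread a topic's partitions over the consumers in a group.

    Each partition is owned by exactly one consumer in the group, so
    adding consumers (up to the partition count) increases parallelism.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# Six partitions over two consumers -> three partitions each.
result = assign_partitions([0, 1, 2, 3, 4, 5], ["consumer-a", "consumer-b"])
```

Note that consumers beyond the partition count would sit idle, which is why partition count caps a group's parallelism.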
Kafka Cluster Architecture
How To Cluster Kafka With Docker Compose
version: '3.6'

volumes:
  zookeeper-data:
    driver: local
  zookeeper-log:
    driver: local
  kafka-data:
    driver: local

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    restart: on-failure
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_AUTOPURGE_SNAPRETAINCOUNT: 3
      ZOOKEEPER_AUTOPURGE_PURGEINTERVAL: 24
      ZOOKEEPER_MAX_CLIENT_CNXNS: 0
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_SERVERS: "0.0.0.0:2888:3888;kafka-2:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"
    extra_hosts:
      - "kafka-1:{kafka-node-1-ip}"
      - "kafka-2:{kafka-node-2-ip}"
      - "kafka-3:{kafka-node-3-ip}"
      - "kafka-4:{kafka-node-4-ip}"
      - "kafka-5:{kafka-node-5-ip}"
      - "kafka-6:{kafka-node-6-ip}"
    volumes:
      - zookeeper-data:/var/lib/zookeeper/data:Z
      - zookeeper-log:/var/lib/zookeeper/log:Z

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    restart: on-failure
    ports:
      - "{kafka-port}:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      ZOOKEEPER_SASL_ENABLED: "false"
      KAFKA_ZOOKEEPER_CONNECT: "kafka-1:2181,kafka-2:2181,kafka-3:2181,kafka-4:2181,kafka-5:2181,kafka-6:2181"
      KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
      KAFKA_LISTENERS: 'SASL_PLAINTEXT://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-1:{kafka-port}'
      KAFKA_INTER_BROKER_LISTENER_NAME: SASL_PLAINTEXT
      KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
      KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
      KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/server-jaas.conf"
      KAFKA_AUTHORIZER_CLASS_NAME: "kafka.security.authorizer.AclAuthorizer"
      KAFKA_SUPER_USERS: "User:admin"
      KAFKA_MESSAGE_MAX_BYTES: 100000000
      KAFKA_REPLICA_FETCH_MAX_BYTES: 10485760
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
      KAFKA_LOG_RETENTION_HOURS: 168
      KAFKA_LOG_SEGMENT_BYTES: 1073741824
      KAFKA_SSL_ENABLED_PROTOCOLS: "TLSv1.2,TLSv1.1,TLSv1"
    volumes:
      - kafka-data:/var/lib/kafka/data:Z
      - ./config/server-jaas.conf:/etc/kafka/server-jaas.conf
    extra_hosts:
      - "kafka-1:{kafka-node-1-ip}"
      - "kafka-2:{kafka-node-2-ip}"
      - "kafka-3:{kafka-node-3-ip}"
      - "kafka-4:{kafka-node-4-ip}"
      - "kafka-5:{kafka-node-5-ip}"
      - "kafka-6:{kafka-node-6-ip}"
ZooKeeper Service:
- ZOOKEEPER_SERVERS:
"0.0.0.0:2888:3888;kafka-2:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"
For this variable, the entry for the current node must be set to "0.0.0.0:2888:3888" (host:peer-port:election-port), while the remaining entries use the hostnames of the other nodes. For example, on Node 2 it looks like this:
- ZOOKEEPER_SERVERS:
"kafka-1:2888:3888;0.0.0.0:2888:3888;kafka-3:2888:3888;kafka-4:2888:3888;kafka-5:2888:3888;kafka-6:2888:3888"
We use "0.0.0.0" for the current node so that it listens on all available network interfaces.
● ZOOKEEPER_TICK_TIME: the basic time unit in milliseconds used by ZooKeeper for heartbeats and timeouts; it defines the length of a single tick. The default is 2000 milliseconds (2 seconds).
● ZOOKEEPER_AUTOPURGE_PURGEINTERVAL: the time interval (in hours) between each purge of old snapshots. ZooKeeper automatically purges old snapshots to free up disk space.
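Since ZOOKEEPER_INIT_LIMIT and ZOOKEEPER_SYNC_LIMIT are measured in ticks, the effective timeouts for the values used above work out as follows (a quick sanity-check calculation, nothing more):

```python
tick_time_ms = 2000      # ZOOKEEPER_TICK_TIME
init_limit_ticks = 5     # ZOOKEEPER_INIT_LIMIT
sync_limit_ticks = 2     # ZOOKEEPER_SYNC_LIMIT

# Time a follower has to connect to the leader and sync its state:
init_timeout_ms = init_limit_ticks * tick_time_ms   # 5 * 2000 = 10,000 ms

# Maximum lag allowed before a follower is dropped from the quorum:
sync_timeout_ms = sync_limit_ticks * tick_time_ms   # 2 * 2000 = 4,000 ms
```

In other words, this configuration gives followers 10 seconds to catch up on startup and tolerates up to 4 seconds of sync lag.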
Then we set these extra hosts to map our node hostnames to IP addresses inside the Docker container:
extra_hosts:
- "kafka-1:{kafka-node-1-ip}"
- "kafka-2:{kafka-node-2-ip}"
- "kafka-3:{kafka-node-3-ip}"
- "kafka-4:{kafka-node-4-ip}"
- "kafka-5:{kafka-node-5-ip}"
- "kafka-6:{kafka-node-6-ip}"
Kafka Service :
KAFKA_BROKER_ID: 1
● The unique identifier for the Kafka broker within the Kafka cluster. Each
broker in the cluster should have a distinct broker ID.
ZOOKEEPER_SASL_ENABLED: "false"
● Indicates whether SASL (Simple Authentication and Security Layer) is
enabled for communication with ZooKeeper. In this case, it is set to false,
meaning SASL is not enabled.
KAFKA_ZOOKEEPER_CONNECT:
"kafka-1:2181,kafka-2:2181,kafka-3:2181,kafka-4:2181,kafka-5:2181,kafka-6:2181"
● Specifies the connection string for ZooKeeper, listing the hostnames and
ports of the ZooKeeper servers.
KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 6000
● The timeout (in milliseconds) for connecting to ZooKeeper.
KAFKA_LISTENERS: 'SASL_PLAINTEXT://:9092'
● Defines the listener and port configuration for Kafka. In this case, it's set
to SASL_PLAINTEXT on port 9092.
KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-1:{kafka-port}'
● Specifies the advertised listener, which is the address or hostname to be
given to producers and consumers for connecting to this broker.
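Because KAFKA_BROKER_ID and KAFKA_ADVERTISED_LISTENERS are per-broker values, each node's compose file differs slightly. For example, on Node 2 the broker-specific settings would look like this (keeping the same {kafka-port} placeholder convention as above):

```yaml
# Node 2 overrides; all other environment values stay the same:
KAFKA_BROKER_ID: 2
KAFKA_ADVERTISED_LISTENERS: 'SASL_PLAINTEXT://kafka-2:{kafka-port}'
```

Every broker in the cluster must advertise its own reachable hostname, since clients use the advertised address for all traffic after the initial bootstrap connection.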
KAFKA_INTER_BROKER_LISTENER_NAME: SASL_PLAINTEXT
● The listener name used for communication between Kafka brokers
(inter-broker communication).
KAFKA_SASL_ENABLED_MECHANISMS: PLAIN
● Specifies the SASL mechanism used for authentication. In this case, it's
set to PLAIN.
KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
● Specifies the SASL mechanism used for inter-broker communication.
KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/server-jaas.conf"
● Java system property setting to configure the login configuration for SASL
authentication.
KAFKA_AUTHORIZER_CLASS_NAME: "kafka.security.authorizer.AclAuthorizer"
● Specifies the class implementing the authorizer interface for access control.
KAFKA_SUPER_USERS: "User:admin"
● Defines super users who have full access to all resources. Entries use the format User:<name>, with multiple super users separated by semicolons, e.g. "User:admin;User:ops".
KAFKA_MESSAGE_MAX_BYTES: 100000000
● The maximum size in bytes for a Kafka message.
KAFKA_REPLICA_FETCH_MAX_BYTES: 10485760
● The maximum number of bytes of messages to attempt to fetch for each
partition from the leader during replication.
KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
● Specifies whether automatic topic creation is enabled. If set to false, topics must
be created explicitly before they can be used.
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
● The replication factor for the internal Kafka topic used to store consumer offsets.
KAFKA_LOG_RETENTION_HOURS: 168
● The number of hours to retain log segments for a topic.
KAFKA_LOG_SEGMENT_BYTES: 1073741824
● The maximum size of a log segment file for a Kafka topic.
KAFKA_SSL_ENABLED_PROTOCOLS: "TLSv1.2,TLSv1.1,TLSv1"
● The list of SSL/TLS protocols enabled for secure communication.
The server-jaas.conf file mounted into the container defines the broker's own SASL credentials and the users it accepts. The broker's own username/password (used for inter-broker authentication) must match a corresponding user_<name> entry, and each block's option list must end with a semicolon:

KafkaServer {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="test"
  password="test"
  user_admin="test"
  user_test="test";
};
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="test"
  password="test";
};
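A client connecting to this cluster needs matching SASL settings on its side. A minimal client.properties sketch is shown below; the test/test credentials are the assumed demo values from the JAAS example above and must correspond to a user_<name> entry in the broker's JAAS file:

```properties
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="test" \
  password="test";
bootstrap.servers=kafka-1:9092,kafka-2:9092,kafka-3:9092
```

Listing several brokers in bootstrap.servers lets the client survive the failure of any single bootstrap node; after the first connection it discovers the full cluster from the brokers' advertised listeners.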