
Apache Kafka

Setup on Mac with Brew


• Install Java 8 and verify
• java -version
• Install Kafka
• brew install kafka
• Configure Kafka
• vi /usr/local/etc/kafka/server.properties
• listeners=PLAINTEXT://localhost:9092
• Start Zookeeper
• zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
• Start Kafka Server
• kafka-server-start /usr/local/etc/kafka/server.properties
Setup on Mac with Brew
• Create Topic
• kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
• List Topics
• kafka-topics --list --zookeeper localhost:2181
• Run Producer
• kafka-console-producer --broker-list localhost:9092 --topic test
• Run Consumer
• kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
Setup on Ubuntu
• Connect to an Ubuntu AWS EC2 instance
• ssh -i .ssh/chef.pem ubuntu@35.175.179.37
• Install java
• sudo apt-get update
• sudo apt-get install openjdk-8-jdk
• Install Kafka
• sudo su
• wget https://archive.apache.org/dist/kafka/2.0.0/kafka_2.12-2.0.0.tgz
• tar -xvf kafka_2.12-2.0.0.tgz
• cd kafka_2.12-2.0.0
• Start Zookeeper
• ./bin/zookeeper-server-start.sh ./config/zookeeper.properties
• Configure Kafka
• vi ./config/server.properties
• listeners=PLAINTEXT://localhost:9092
• Start Kafka Server
• ./bin/kafka-server-start.sh ./config/server.properties
Setup on Ubuntu
• Create Topic
• ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
• List Topics
• ./bin/kafka-topics.sh --list --zookeeper localhost:2181
• Run Producer
• ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• Run Consumer
• ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Confluent Cloud
• Sign up with Confluent
• https://www.confluent.io/confluent-cloud/#sign-up
• Login to Confluent Cloud Console
• Create Cluster in default environment
• name: glarimy
• Install Confluent Cloud Client
• curl -L https://cnfl.io/ccloud-cli | sh -s -- -b /usr/local/bin
• Login to Confluent Cloud from terminal
• ccloud login
• Use the cluster
• ccloud kafka cluster list
• ccloud kafka cluster use <cluster-id>
• Create an API key, make a note of it, and use it
• ccloud api-key create
• ccloud api-key use <API-KEY>
• Create and use test topic
• ccloud kafka topic create test
• ccloud kafka topic produce test
• ccloud kafka topic consume -b test
Single Node - Multiple Brokers
• Move to Kafka Configuration
• cd /usr/local/etc/kafka (Brew) or cd kafka_2.12-2.0.0/config (Ubuntu tarball)
• Create configurations for two more brokers
• touch server-1.properties
• broker.id=1
• listeners=PLAINTEXT://localhost:9093
• log.dirs=/usr/local/var/lib/kafka-logs-1
• touch server-2.properties
• broker.id=2
• listeners=PLAINTEXT://localhost:9094
• log.dirs=/usr/local/var/lib/kafka-logs-2
• Start Zookeeper and the servers
• zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
• kafka-server-start /usr/local/etc/kafka/server.properties
• kafka-server-start /usr/local/etc/kafka/server-1.properties
• kafka-server-start /usr/local/etc/kafka/server-2.properties
Single Node - Multiple Brokers
• Create Topic
• kafka-topics --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic multi-broker-test
• List Topics
• kafka-topics --list --zookeeper localhost:2181
• Run Producer
• kafka-console-producer --broker-list localhost:9092,localhost:9093,localhost:9094 --topic multi-broker-test
• Run Consumer
• kafka-console-consumer --bootstrap-server localhost:9092 --topic multi-broker-test --from-beginning
• kafka-console-consumer --bootstrap-server localhost:9093 --topic multi-broker-test --from-beginning
• kafka-console-consumer --bootstrap-server localhost:9094 --topic multi-broker-test --from-beginning
Topic Operations
• kafka-topics.sh --zookeeper zk --create --topic my-topic --replication-factor 1 --partitions 1
• kafka-topics.sh --zookeeper zk --alter --topic my-topic --partitions 16
• kafka-topics.sh --zookeeper zk --delete --topic my-topic
• kafka-topics.sh --zookeeper zk --list
• kafka-topics.sh --zookeeper zk --describe
• kafka-topics.sh --zookeeper zk --describe --under-replicated-partitions
Consumer Operations
• kafka-consumer-groups.sh --new-consumer --bootstrap-server br --list
• kafka-consumer-groups.sh --zookeeper zk --describe --group testgroup
• kafka-consumer-groups.sh --zookeeper zk --delete --group testgroup
Config Operations
• kafka-configs.sh --zookeeper zk --alter --entity-type topics --entity-name my-topic --add-config <key>=<value>[,<key>=<value>…]
• kafka-configs.sh --zookeeper zk --describe --entity-type topics --entity-name my-topic
• kafka-configs.sh --zookeeper zk --alter --entity-type topics --entity-name my-topic --delete-config retention.ms
Other Operations
• kafka-run-class.sh kafka.tools.DumpLogSegments --files abc.log
• kafka-replica-verification.sh --broker-list br1,br2 --topic-white-list 'my-.*'
Topics, Partitions and offsets
● Topics: a particular stream of data
− Similar to a table in a database (without all the constraints)
− You can have as many topics as you want
− A topic is identified by its name
● Topics are split into partitions
− Each partition is ordered
− Each message within a partition gets an incremental ID, called an offset
Topic example
● Say you have a fleet of trucks; each truck reports its GPS position to Kafka.
● You can have a topic "trucks_gps" that contains the positions of all trucks.
● Each truck will send a message to Kafka every 20 seconds; each message will contain the truck ID and the truck position (latitude and longitude).
● We choose to create that topic with 10 partitions (an arbitrary number).
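A topic like this could be created with the same tooling used in the setup sections; a minimal sketch, assuming the local single-broker setup (hence replication factor 1):
• kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 10 --topic trucks_gps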
Topics, Partitions and offsets
● Offsets only have a meaning for a specific partition.
− E.g. offset 3 in partition 0 doesn't represent the same data as offset 3 in partition 1
● Order is guaranteed only within a partition (not across partitions)
● Data is kept only for a limited time (default is one week)
● Once the data is written to a partition, it can't be changed (immutability)
● Data is assigned randomly to a partition unless a key is provided (more on this later)
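To see that offsets are per-partition, the console consumer can be pointed at an explicit partition and offset; a sketch against the local test topic from the setup:
• kafka-console-consumer --bootstrap-server localhost:9092 --topic test --partition 0 --offset 3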
Brokers
● A Kafka cluster is composed of multiple brokers (servers)
● Each broker is identified by its ID (an integer)
● Each broker contains certain topic partitions
● After connecting to any broker (called a bootstrap broker), you will be connected to the entire cluster
● A good number to get started is 3 brokers, but big clusters have over 100 brokers
● In these examples we choose to number brokers starting at 100 (arbitrary)
Topic replication factor
● Topics should have a replication factor > 1 (usually between 2 and 3)
● This way, if a broker is down, another broker can serve the data
● Example: Topic-A with 2 partitions and a replication factor of 2
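A topic like Topic-A can be created against the single-node, multi-broker setup from earlier (a sketch; it needs at least two running brokers):
• kafka-topics --create --zookeeper localhost:2181 --replication-factor 2 --partitions 2 --topic Topic-A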
Topic replication factor
● Example: we lost Broker 102
● Result: Brokers 101 & 103 can still serve the data
Concept of Leader for a Partition
● At any time, only one broker can be the leader for a given partition
● Only that leader can receive and serve data for the partition
● The other brokers will synchronize the data
● Therefore each partition has one leader and multiple ISRs (in-sync replicas)
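The leader and in-sync replicas of each partition can be inspected with --describe (the output includes Leader, Replicas and Isr columns); a sketch against the multi-broker topic created earlier:
• kafka-topics --describe --zookeeper localhost:2181 --topic multi-broker-test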
Producers
● Producers write data to topics (which are made of partitions)
● Producers automatically know which broker and partition to write to
● In case of broker failures, producers will automatically recover
Producers
● Producers can choose to receive acknowledgment of data writes:
− acks=0: producer won't wait for acknowledgment (possible data loss)
− acks=1: producer will wait for the leader's acknowledgment (limited data loss)
− acks=all: leader + replicas acknowledgment (no data loss)
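The acks setting can be tried from the console producer by passing it as a producer property; a sketch against the local test topic from the setup:
• kafka-console-producer --broker-list localhost:9092 --topic test --producer-property acks=all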
Producers: Message Keys
● Producers can choose to send a key with the message (string, number, etc.)
● If key=null, data is sent round robin (broker 101, then 102, then 103...)
● If a key is sent, then all messages for that key will always go to the same partition
● A key is basically sent if you need message ordering for a specific field (e.g. truck_id)
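Keyed messages can be sent from the console producer via the parse.key and key.separator properties; a sketch using the trucks_gps example (after it starts, type key:value pairs such as truck_123:40.7,-74.0 — the key and value names are illustrative):
• kafka-console-producer --broker-list localhost:9092 --topic trucks_gps --property parse.key=true --property key.separator=: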
Consumers
● Consumers read data from a topic (identified by name)
● Consumers know which broker to read from
● In case of broker failures, consumers know how to recover
● Data is read in order within each partition
Consumer Groups
● Consumers can read data in consumer groups
● Each consumer within a group reads from
exclusive partitions
● If you have more consumers than partitions, some consumers will be inactive
Consumer Groups
What if too many consumers?
● If you have more consumers than partitions,
some consumers will be inactive
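Consumer groups can be tried with the console consumer's --group flag; starting the command below in two terminals places both consumers in one group (the group name my-group is arbitrary). With the single-partition test topic from the setup, the second consumer stays idle, illustrating the point above:
• kafka-console-consumer --bootstrap-server localhost:9092 --topic test --group my-group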
Consumer Offsets
● Kafka stores the offsets at which a consumer group has been reading
● The committed offsets live in a Kafka topic named __consumer_offsets
● When a consumer in a group has processed data received from Kafka, it should be committing the offsets
● If a consumer dies, it will be able to read back from where it left off, thanks to the committed consumer offsets!
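Committed offsets and lag can be inspected per partition (the output includes CURRENT-OFFSET, LOG-END-OFFSET and LAG columns); a sketch, reusing the my-group group from above:
• kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-group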
Delivery Semantics for consumers
● Consumers choose when to commit offsets
● There are 3 delivery semantics:
● At most once:
− Offsets are committed as soon as the message is received
− If the processing goes wrong, the message will be lost (it won't be read again)
● At least once (usually preferred):
− Offsets are committed after the message is processed
− If the processing goes wrong, the message will be read again
− This can result in duplicate processing of messages. Make sure your processing is idempotent (i.e. processing the messages again won't impact your systems)
● Exactly once:
− Can be achieved for Kafka => Kafka workflows using the Kafka Streams API
− For Kafka => external system workflows, use an idempotent consumer
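A rough sketch of the at-most-once end of the spectrum with the console consumer: offsets auto-commit on a timer, independent of processing (real clients control this with enable.auto.commit and explicit commit calls; my-group is reused from above):
• kafka-console-consumer --bootstrap-server localhost:9092 --topic test --group my-group --consumer-property enable.auto.commit=true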
Kafka Broker Discovery
● Every Kafka broker is also called a "bootstrap
server"
● That means that you only need to connect to one
broker, and you will be connected to the entire
cluster.
● Each broker knows about all brokers, topics and
partitions (metadata)
Zookeeper
● Zookeeper manages brokers (keeps a list of them)
● Zookeeper helps in performing leader election for partitions
● Zookeeper sends notifications to Kafka in case of changes (e.g. new topic, broker dies, broker comes up, delete topics, etc.)
● Kafka can't work without Zookeeper
● Zookeeper by design operates with an odd number of servers (3, 5, 7)
● Zookeeper has a leader (handles writes); the rest of the servers are followers (handle reads)
● (Zookeeper does NOT store consumer offsets with Kafka > v0.10)
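The broker list that Zookeeper maintains can be inspected with the zookeeper-shell tool that ships with Kafka; a sketch against the local setup (it prints the registered broker IDs, e.g. [0, 1, 2]):
• zookeeper-shell localhost:2181 ls /brokers/ids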
Kafka Guarantees
● Messages are appended to a topic-partition in the order they are
sent
● Consumers read messages in the order stored in a topic-partition
● With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down
● This is why a replication factor of 3 is a good idea:
− Allows for one broker to be taken down for maintenance
− Allows for another broker to be taken down unexpectedly
● As long as the number of partitions remains constant for a topic
(no new partitions), the same key will always go to the same
partition
Security
• Man-in-the-middle Attacks
• Encrypt
• Authentication
• Identity
• Authorization
• Role
• JAAS
• Principal and Role
• SSL
• Certificates and Keys
• SASL
• Simple Authentication and Security Layer
• SASL PLAIN: username and password on Kafka brokers
• SASL SCRAM: hashes stored in Zookeeper (no dependency on the broker)
• SASL GSSAPI: AD-based Kerberos
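A minimal JAAS file sketch for SASL PLAIN on a broker (the admin username/password and the file name kafka_server_jaas.conf are assumptions; KafkaServer is the section name the broker looks for):
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="admin"
password="admin-secret"
user_admin="admin-secret";
};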
ACL
• ACL to allow
• bin/kafka-acls.sh
• --authorizer kafka.security.auth.SimpleAclAuthorizer
• --authorizer-properties zookeeper.connect=localhost:2181
• --add
• --allow-principal User:Bob
• --allow-principal User:Alice
• --allow-host Host1,Host2
• --operation Read
• --topic Test-topic
ACL
• ACL to deny
• bin/kafka-acls.sh
• --authorizer kafka.security.auth.SimpleAclAuthorizer
• --authorizer-properties zookeeper.connect=localhost:2181
• --add
• --allow-principal User:*
• --allow-host *
• --deny-principal User:BadBob
• --deny-host bad-host
• --operation Read
• --topic Test-topic
ACL
• Removing ACL
• bin/kafka-acls.sh
• --authorizer kafka.security.auth.SimpleAclAuthorizer
• --authorizer-properties zookeeper.connect=localhost:2181
• --remove
• --allow-principal User:Bob
• --allow-principal User:Alice
• --allow-host Host1,Host2
• --operation Read
• --topic Test-topic
Security
• Listing the ACL
• bin/kafka-acls.sh
• --authorizer kafka.security.auth.SimpleAclAuthorizer
• --authorizer-properties zookeeper.connect=localhost:2181
• --list
• --topic Test-topic
Security
• Convenient Methods
• bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --producer --topic Test-topic
• bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --consumer --topic Test-topic --group Group-1
JMX Metrics
• Download JMXTERM
• https://docs.cyclopsgroup.org/jmxterm
• Start the interactive shell
• java -jar jmxterm-1.0.0-uber.jar

• List the domains
• domains
• Use a domain
• domain <domain-name>
• List the beans
• beans
• Use a bean
• bean <bean-name>
• List the attributes
• info
• Collect the metric
• get <attribute>
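A typical session against a broker started with JMX enabled (a sketch; the JMX port 9999 and the MessagesInPerSec bean are illustrative choices):
• JMX_PORT=9999 kafka-server-start /usr/local/etc/kafka/server.properties
• java -jar jmxterm-1.0.0-uber.jar -l localhost:9999
• domains
• domain kafka.server
• beans
• bean kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
• info
• get Count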
Kafka Metrics
• kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
• kafka.controller:type=KafkaController,name=ActiveControllerCount
• kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
• kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
• kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
• kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
• kafka.server:type=ReplicaManager,name=PartitionCount
• kafka.server:type=ReplicaManager,name=LeaderCount
• kafka.controller:type=KafkaController,name=OfflinePartitionsCount
• kafka.network:type=RequestMetrics
• kafka.log:type=Log
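Any of these can be read through jmxterm; for example, a gauge such as UnderReplicatedPartitions exposes its reading through a Value attribute (an assumption worth confirming with info first):
• get -b kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions Value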
