
Kafka

Kafka is a modern messaging tool, widely adopted in event-driven systems. It allows
producers and consumers to communicate through durable topics. It is highly scalable and
performant, being designed to run in redundant, fault-tolerant clusters.

Key concepts
Cluster
A group of Kafka Brokers.

Broker
A Kafka instance. It is intentionally very simple, responsible only for handling partitions and
consumption/production requests.

Event
An Event is something that happened in the past. In Kafka, an event is a key/value pair. The value is
the state (the content/payload) and the key can be anything, although it's usually a simple
primitive type (such as an ID). Internally, Kafka handles events as byte sequences.

Kafka's events are immutable.

Topic
Kafka's fundamental unit. A topic is a stream of events represented as a log. Unlike
queues in traditional messaging systems, logs are durable structures: messages live for as long as the
retention configuration states, regardless of whether they have been read. This allows things such as
replays or multiple consumers reading the same message.
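The difference from a queue can be sketched with a toy append-only log in which each consumer tracks its own offset. The names here are illustrative, not Kafka's actual API:

```python
# Toy append-only log illustrating Kafka's topic model (not Kafka's API).
# Messages remain in the log after being read; each consumer keeps its own offset.

class Log:
    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)

    def read(self, offset):
        """Return messages from `offset` onward; the log is never drained."""
        return self.messages[offset:]

log = Log()
log.append("order-created")
log.append("order-paid")

# Two independent consumers read the same messages.
consumer_a_offset = 0
consumer_b_offset = 0
assert log.read(consumer_a_offset) == ["order-created", "order-paid"]
assert log.read(consumer_b_offset) == ["order-created", "order-paid"]

# A "replay" is just reading again from an earlier offset.
assert log.read(0) == ["order-created", "order-paid"]
```

Because reading never removes anything, retention is a time/size policy on the log itself, not a side effect of consumption.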

Partition
Kafka is designed for scalable distributed systems, so topics are split across multiple partitions.

Producers may or may not specify which partition a message goes to. If the event key is
null, Kafka distributes messages evenly across all partitions. If it is not, the destination is
determined by hashing the key and taking it modulo the number of partitions.

The latter approach is essential for preserving order: Kafka only guarantees ordering within a
single partition, so routing all messages with the same key to the same partition keeps those
messages in order.
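A minimal sketch of that routing rule, using Python's built-in hash rather than Kafka's actual murmur2 partitioner:

```python
import random

def choose_partition(key, num_partitions):
    """Keyed messages hash to a stable partition; keyless ones are spread out."""
    if key is None:
        # Simplification: real Kafka clients use round-robin / sticky batching here.
        return random.randrange(num_partitions)
    # Simplification: real Kafka hashes the key bytes with murmur2, not hash().
    return hash(key) % num_partitions

# The same key always lands on the same partition, preserving per-key order.
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
assert p1 == p2
```

Note that this stability is also why changing the partition count of a topic breaks key-to-partition mapping for existing keys.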

Partitions are replicated across the brokers in the cluster (up to the configured replication
factor) for fault tolerance.

This concept is important for Scaling > Horizontal Scaling, since Kafka distributes
partitions across the available consumer instances. This distribution is done automatically: each
consumer identifies itself with a groupId property, allowing Kafka to track group membership and
rebalance partitions when consumers join or leave.
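The rebalancing idea can be sketched as a simple round-robin assignment of partitions to a group's consumers. This is a simplification of Kafka's pluggable assignment strategies:

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each consumer gets roughly len(partitions)/len(consumers) partitions."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# 6 partitions spread over 2 consumers in the same group:
print(assign_partitions(range(6), ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

This is also why a consumer group gains nothing from having more consumers than partitions: the extra consumers are assigned no partitions and sit idle.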

Producer
A producer application has code that writes into a topic. The producer API is quite simple,
although it handles a lot of complexity under the hood: acknowledgement handling, partition
selection, and connection pooling are all decided by the producer.

Consumer
A consumer application reads from a topic's partitions. Consumers are organized into groups (as
detailed in the Partition section, Kafka automatically distributes partitions among a group's
members).

Consumers read one message at a time. Only after acknowledging a message or batch will
the consumer pull the next one.
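That acknowledge-then-advance behavior can be sketched with a loop that only commits its offset after processing succeeds. This is illustrative, not the real consumer API:

```python
def consume(log, committed_offset, process):
    """Process messages one at a time; advance the offset only after success."""
    offset = committed_offset
    for message in log[offset:]:
        process(message)   # if this raises, the offset is NOT advanced
        offset += 1        # the "ack": only now does the consumer move on
    return offset

log = ["a", "b", "c"]
seen = []
new_offset = consume(log, 0, seen.append)
assert new_offset == 3 and seen == ["a", "b", "c"]

# After a restart, the consumer resumes from the committed offset.
more = []
consume(log, new_offset, more.append)
assert more == []
```

Committing after processing gives at-least-once delivery: if the consumer crashes mid-message, that message is re-read on restart.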

Aggregate tools
Confluent, Apache and the community develop many tools for the Kafka ecosystem.

Kafka Streams: A library for stateful stream processing; state is kept off-heap (see
Understand Kafka Streams)
ksqlDB: A service that exposes topics as queryable, relational-style databases
Schema Registry: A service to maintain a catalog of schemas and enforce constraints on
Kafka topics to avoid runtime issues
Kafka Connect: A service that bridges Kafka to other systems, avoiding the need for
repetitive, generic consumer and producer code

Resources and links


Kafka 101 course by Confluent on YouTube
Comparison with its most relevant competitor, RabbitMQ: Use cases for Kafka and
RabbitMQ
