Introduction to Kafka

- Apache Kafka is a distributed streaming platform.
- Apache Kafka is a publish–subscribe messaging system. It is a horizontally scalable, fault-tolerant system.
- Kafka is used for these purposes:
  1. To build real-time streaming pipelines to get data between systems or applications
  2. To build real-time streaming applications to transform or react to the streams of data

- Kafka Core Concepts
  1. Kafka is run as a cluster on one or more servers
  2. The Kafka cluster stores streams of records in categories called topics
  3. Each record consists of a key, a value, and a timestamp

- Kafka APIs
  1. Producer API: the Producer API enables an application to publish a stream of records to one or more Kafka topics
  2. Consumer API: the Consumer API enables an application to subscribe to one or more topics and process the stream of records produced to them
  3. Streams API: the Streams API allows an application to act as a stream processor; that is, this API converts the input streams into output streams
  4. Connector API: the Connector API allows building and running reusable producers or consumers, which can be used to connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table

Kafka Fundamental Concepts

- Producer (1) – the producer is an application that publishes a stream of records to one or more Kafka topics
- Consumer (2) – the consumer is an application that consumes a stream of records from one or more topics and processes the published streams of records
- Consumer group (3) – consumers label themselves with a consumer group name; one consumer instance within the group will get the message when the message is published to a topic
- Broker (4) – the broker is a server where the published stream of records is stored; a Kafka cluster can contain one or more servers
- Topics (5) – a topic is the name given to a feed of messages
- Zookeeper (6) – Kafka uses ZooKeeper to maintain and coordinate Kafka brokers; Kafka is bundled with a version of Apache ZooKeeper
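The record structure and the consumer-group rule above can be sketched with a toy in-memory model. This is not the real Kafka client API; the `Topic` and `ConsumerGroup` classes, and the round-robin assignment inside `deliver`, are simplified illustrations of the concepts only (real Kafka assigns partitions, not individual messages, to group members):

```python
import time


class Topic:
    """Toy topic: an append-only log of (key, value, timestamp) records."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def publish(self, key, value):
        # Each record consists of a key, a value, and a timestamp.
        record = (key, value, time.time())
        self.log.append(record)
        return record


class ConsumerGroup:
    """Consumers label themselves with a group name; each published
    message is delivered to exactly one consumer instance in the group."""

    def __init__(self, name, consumers):
        self.name = name
        self.consumers = consumers  # list of consumer ids
        self._next = 0

    def deliver(self, record):
        # Round-robin here is a stand-in for Kafka's partition assignment:
        # the point is that only ONE member of the group gets the message.
        consumer = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        return consumer, record


topic = Topic("orders")
group = ConsumerGroup("billing", ["c1", "c2"])

r1 = topic.publish("user-1", "order placed")
r2 = topic.publish("user-2", "order shipped")
print(group.deliver(r1)[0])  # c1
print(group.deliver(r2)[0])  # c2
```

Note that a second consumer group with a different name would receive its own copy of every message; the "one instance" rule applies only within a single group.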
Kafka architecture

- Kafka Topics
  a. We now discuss the core abstraction of Kafka. In Kafka, topics are always multi-subscriber entities.
  b. A topic can have zero, one, or more consumers.
  c. For each topic, a Kafka cluster maintains a partitioned log

Setting up the Kafka cluster
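The partitioned log in point (c) can be illustrated with a small sketch: keyed records are assigned to a partition by hashing the key, so all records for one key land in the same partition and keep their order there. This is a simplified stand-in for Kafka's default partitioner (Kafka itself hashes with murmur2; `crc32` is used here only to keep the demo deterministic, and `partition_for` is an illustrative name):

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Hash the record key and take it modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# A topic's log is stored as several append-only logs, one per partition.
partitions = {p: [] for p in range(NUM_PARTITIONS)}

for key, value in [("user-1", "a"), ("user-2", "b"), ("user-1", "c")]:
    partitions[partition_for(key)].append((key, value))

# Records sharing a key stay ordered within their single partition.
p = partition_for("user-1")
print([v for k, v in partitions[p] if k == "user-1"])  # ['a', 'c']
```

Partitioning is what makes the log horizontally scalable: each partition can live on a different broker, while per-key ordering is still preserved inside its partition.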