
KAFKA

OVERVIEW
• Kafka introduction
• Kafka architecture
• Kafka replication
• Schema Registry
• Kafka Streams
• KTable
• KStream
KAFKA INTRODUCTION
• Apache Kafka was originally developed by LinkedIn, and was
subsequently open-sourced in early 2011. 
• Apache Kafka is a distributed data store optimized for ingesting
and processing streaming data in real-time. Kafka provides
three main functions to its users:
• Publish and subscribe to streams of records
• Effectively store streams of records in the order in which records were
generated
• Process streams of records in real time
WHAT CAN I USE EVENT STREAMING FOR?

• To process payments and financial transactions in real-time,
such as in stock exchanges, banks, and insurance companies.
• To track and monitor cars, trucks, fleets, and shipments in real-
time, such as in logistics and the automotive industry.
• To continuously capture and analyze sensor data from IoT
devices or other equipment, such as in factories and wind
parks.
• To collect and immediately react to customer interactions and
orders, such as in retail, the hotel and travel industry, and
mobile applications.
UBER SYSTEM ARCHITECTURE
KAFKA ARCHITECTURE
• Broker nodes: responsible for the bulk of I/O operations and durable
persistence within the cluster. A Kafka broker receives messages from
producers and stores them on disk, organized by partition and keyed by a
unique offset within that partition. The broker allows consumers to fetch
messages by topic, partition, and offset. Brokers form a Kafka cluster by
sharing information with each other directly, or indirectly through ZooKeeper.
• ZooKeeper nodes: primarily used to track the status of nodes in the
Kafka cluster and to maintain the list of Kafka topics and their metadata.
• Producers: client applications that send messages to brokers, addressed by topic.
• Consumers: client applications that read messages from topics.
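The broker's storage model described above (messages appended per partition, addressed by a unique offset) can be illustrated with a minimal sketch. This is a conceptual simulation in plain Python; the class and method names are invented for illustration and are not the real Kafka broker internals.

```python
# Conceptual sketch of a broker's per-partition log: each (topic, partition)
# pair holds an append-only list, and a message's offset is simply its index.
from collections import defaultdict

class MiniBroker:
    def __init__(self):
        self.logs = defaultdict(list)  # (topic, partition) -> list of messages

    def produce(self, topic, partition, message):
        log = self.logs[(topic, partition)]
        log.append(message)
        return len(log) - 1  # the offset assigned to this message

    def fetch(self, topic, partition, offset):
        # Consumers address messages by topic, partition, and offset.
        return self.logs[(topic, partition)][offset:]

broker = MiniBroker()
broker.produce("orders", 0, "order-1")
broker.produce("orders", 0, "order-2")
print(broker.fetch("orders", 0, 1))  # messages from offset 1 onward
```

Because the offset is just a position in an append-only log, consuming is a cheap sequential read, which is central to Kafka's throughput.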
KAFKA ARCHITECTURE (cont)
KAFKA ARCHITECTURE (cont)
KAFKA REPLICATION
• For fault tolerance, Kafka can replicate each partition across a
configurable number of brokers.
• Each partition has one leader and zero or more followers.
• The leader handles all read and write requests for the partition.
• Followers replicate the leader and take over if the leader dies.
• A follower that is fully caught up with the leader is called an
in-sync replica (ISR).
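The leader/follower relationship and the ISR set can be sketched as follows. This is a deliberately simplified stdlib simulation with invented names; real Kafka replication involves fetch requests, high watermarks, and lag thresholds not shown here.

```python
# Conceptual sketch of leader/follower replication and the in-sync replica set.
class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = {f: [] for f in followers}  # replica -> copied log
        self.log = []  # the leader's log

    def write(self, record):
        # The leader handles all writes.
        self.log.append(record)

    def replicate(self, follower):
        # A follower copies the leader's log.
        self.followers[follower] = list(self.log)

    def isr(self):
        # A follower fully caught up with the leader counts as in-sync.
        return [f for f, log in self.followers.items() if log == self.log]

p = Partition(leader="broker-1", followers=["broker-2", "broker-3"])
p.write("event-1")
p.replicate("broker-2")
print(p.isr())  # only broker-2 has caught up so far
```

If the leader fails, only a replica from the ISR can be elected leader without losing acknowledged records, which is why the in-sync distinction matters.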
KAFKA ARCHITECTURE (cont)
SCHEMA REGISTRY
• Kafka, at its core, only transfers data in byte format; no data
verification is performed at the Kafka cluster level.
SCHEMA REGISTRY
• Schema Registry is an application that resides outside of your
Kafka cluster and handles the distribution of schemas to
producers and consumers by storing a copy of each schema in its
local cache.
• Apache Avro is an open-source binary data serialization format
that comes from the Hadoop world and has many use cases. It
offers rich data structures and code generation for statically
typed programming languages such as C# and Java.
• Example code:
https://itnext.io/howto-produce-avro-messages-to-kafka-ec0b770e1f54
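The registry's role can be sketched in a few lines: a schema is registered once and assigned an id, producers send that id alongside each message, and consumers resolve the id back to the schema from a local cache. The API below is hypothetical and much simpler than Confluent Schema Registry's real REST interface.

```python
# Conceptual sketch of a schema registry: register a schema once, get an id,
# and look the schema up by id on the consumer side. Invented API, for
# illustration only.
class MiniSchemaRegistry:
    def __init__(self):
        self._by_id = {}   # schema id -> (subject, schema text)
        self._next_id = 1

    def register(self, subject, schema):
        schema_id = self._next_id
        self._by_id[schema_id] = (subject, schema)
        self._next_id += 1
        return schema_id

    def get(self, schema_id):
        # A consumer uses the id carried with the message to fetch the schema.
        return self._by_id[schema_id][1]

registry = MiniSchemaRegistry()
sid = registry.register("user-value", '{"type": "record", "name": "User"}')
print(sid, registry.get(sid))
```

Sending a small integer id instead of the full schema with every message keeps the per-message overhead low while still letting consumers validate and decode the bytes.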
SCHEMA REGISTRY
KAFKA STREAMS
Kafka Streams is a client library for building applications and
microservices, where the input and output data are stored in
Kafka clusters.
KAFKA STREAMS FEATURES
STREAM TOPOLOGY
• Source processor: a special type of stream processor that does not
have any upstream processors. It produces an input stream for its
topology by consuming records from one or more Kafka topics and
forwarding them to its downstream processors.
• Sink processor: a special type of stream processor that does not
have downstream processors. It sends any records received from its
upstream processors to a specified Kafka topic.
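The source → processors → sink flow can be sketched as a plain function: records are consumed from an input "topic", passed through a chain of processing steps, and appended to an output "topic". This is a conceptual simulation, not the real Processor API.

```python
# Conceptual sketch of a stream topology: a source processor feeds records
# through intermediate processors to a sink. Invented names, plain Python.
def run_topology(input_topic, processors):
    output_topic = []
    for record in input_topic:          # source processor: consume records
        for process in processors:      # intermediate stream processors
            record = process(record)
        output_topic.append(record)     # sink processor: write to the topic
    return output_topic

result = run_topology(["hello", "world"], [str.upper, lambda s: s + "!"])
print(result)  # ['HELLO!', 'WORLD!']
```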
KAFKA STREAMS DSL
• The Kafka Streams DSL (Domain Specific Language) is built on
top of the Streams Processor API. Most data processing
operations can be expressed in just a few lines of DSL code.
• DSL supports:
• Built-in abstractions for streams and tables in the form of KStream, 
KTable, and GlobalKTable. 
• Declarative, functional programming style with stateless
transformations (e.g. map and filter) as well as stateful transformations
such as aggregations (e.g. count and reduce) and joins (e.g. leftJoin).
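The distinction between stateless and stateful DSL operations can be illustrated on a list of (key, value) records. This is plain Python mimicking the shape of `filter`/`map`/`count`, not the Kafka Streams DSL itself.

```python
# Conceptual sketch: stateless filter/map followed by a stateful per-key
# count, roughly analogous to filter().mapValues().groupByKey().count().
from collections import Counter

records = [("user-1", 3), ("user-2", -1), ("user-1", 5), ("user-2", 2)]

# Stateless: drop negative values, then double the remaining amounts.
mapped = [(k, v * 2) for k, v in records if v >= 0]

# Stateful: count records per key (needs state across records).
counts = Counter(k for k, _ in mapped)
print(mapped, dict(counts))
```

Stateless steps look at one record at a time; stateful steps like the count must keep a running store keyed by record key, which is what Kafka Streams backs with local state stores and changelog topics.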
KTABLE
• A KTable is an abstraction of a changelog stream, where each
data record represents an update. More precisely, the value in a
data record is interpreted as an "UPDATE" of the last value for
the same record key; if the key does not exist yet, the update is
treated as an "INSERT".
• A KTable also provides the ability to look up the current value of a
data record by key. This table-lookup functionality is available
through join operations.
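The UPDATE/INSERT semantics reduce to upserting into a key-value map: applying a changelog in order leaves only the latest value per key. A plain-dict sketch (not the real KTable implementation):

```python
# Conceptual sketch of KTable semantics: each changelog record UPDATEs the
# last value for its key, or INSERTs if the key is new.
changelog = [("alice", "Paris"), ("bob", "Rome"), ("alice", "Berlin")]

table = {}
for key, value in changelog:
    table[key] = value  # INSERT if the key is new, otherwise UPDATE

# Table lookup by key, as used by join operations.
print(table["alice"])  # the latest value wins
```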
KSTREAM
• A KStream is an abstraction of a record stream, where each data
record represents a self-contained event.
• Unlike a KTable, a new record does not replace an earlier record
with the same key: records in a KStream are interpreted as
"INSERTs" (appends).
SAMPLE KAFKA STREAMS APPLICATION
SAMPLE KAFKA STREAMS APPLICATION
// Input text:
Welcome to Edureka Kafka Training.
This article is about Kafka Streams.
// Output:
Welcome(1)
to(1)
Edureka(1)
Kafka(2)
Training(1)
This(1)
article(1)
is(1)
about(1)
Streams(1)
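The word-count output above can be reproduced with a plain-Python sketch of the same logic; a real Kafka Streams application would express this with DSL operations such as flatMapValues, groupBy, and count.

```python
# Plain-Python sketch of the word-count logic behind the sample above.
from collections import Counter

lines = [
    "Welcome to Edureka Kafka Training.",
    "This article is about Kafka Streams.",
]

# Split each line into words and strip trailing punctuation.
words = [w.strip(".") for line in lines for w in line.split()]
counts = Counter(words)
for word, n in counts.items():
    print(f"{word}({n})")
```

"Kafka" appears in both lines, so it counts 2; every other word counts 1, matching the sample output.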
THANK YOU