
Apache Kafka

CMSC 491
Hadoop-Based Distributed Computing
Spring 2016
Adam Shook
Overview
• Kafka is a “publish-subscribe messaging
rethought as a distributed commit log”
• Fast
• Scalable
• Durable
• Distributed
Kafka adoption and use cases
• LinkedIn: activity streams, operational metrics, data bus
– 400 nodes, 18k topics, 220B msg/day (peak 3.2M msg/s), May 2014
• Netflix: real-time monitoring and event processing
• Twitter: as part of their Storm real-time data pipelines
• Spotify: log delivery (from 4h down to 10s), Hadoop
• Loggly: log collection and processing
• Mozilla: telemetry data
• Airbnb, Cisco, Gnip, InfoChimps, Ooyala, Square, Uber, …

How fast is Kafka?
• “Up to 2 million writes/sec on 3 cheap machines”
– Using 3 producers on 3 different machines, 3x async replication
• Only 1 producer per machine because the NIC was already saturated
• Sustained throughput as stored data grows
– Slightly different test config than 2M writes/sec above.

Why is Kafka so fast?
• Fast writes:
– While Kafka persists all data to disk, essentially all writes go to the
OS’s page cache first, i.e. RAM.

• Fast reads:
– Very efficient to transfer data from the page cache to a network socket
– Linux: sendfile() system call

• Combination of the two = fast Kafka!


– Example (operations): on a Kafka cluster where the consumers are mostly
caught up, you will see no read activity on the disks, as the brokers serve
data entirely from the page cache.
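
To illustrate the fast read path: Kafka’s broker hands log data to the network via Java’s FileChannel.transferTo(), which maps to the sendfile() system call on Linux. A minimal standalone sketch of the same kernel-level copy (the file name, host, and port are made-up placeholders, not actual Kafka code):

  import java.io.FileInputStream;
  import java.net.InetSocketAddress;
  import java.nio.channels.FileChannel;
  import java.nio.channels.SocketChannel;

  public class ZeroCopySend {
      public static void main(String[] args) throws Exception {
          // Hypothetical log segment file and peer address, for illustration only
          FileChannel segment = new FileInputStream("00000000000000000000.log").getChannel();
          SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999));

          // transferTo() copies file bytes to the socket inside the kernel
          // (sendfile() on Linux), so the payload never enters user space
          long position = 0;
          long remaining = segment.size();
          while (remaining > 0) {
              long sent = segment.transferTo(position, remaining, socket);
              position += sent;
              remaining -= sent;
          }

          socket.close();
          segment.close();
      }
  }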

A first look
• Who’s who
– Producers write data to
brokers.
– Consumers read data from
brokers.
– All this is distributed.

• The data
– Data is stored in topics.
– Topics are split into partitions,
which are replicated.

A first look

[Diagram: producers write to brokers, consumers read from brokers; topics are split into replicated partitions spread across the brokers]
Topics
• Topic: feed name to which messages are published
– Example: “zerg.hydra”
– Producers always append to the “tail” of a topic (think: appending to a file)
– Kafka prunes the “head” based on age, max size, or message “key”

[Diagram: producers A1, A2, …, An append new messages to the tail of a Kafka topic on the broker(s); older messages sit toward the head]

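As a concrete sketch of this append-only write path, using Kafka’s Java producer API (the broker address, key, and value below are placeholders):

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class TopicAppend {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
          props.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");

          KafkaProducer<String, String> producer = new KafkaProducer<>(props);
          // Each send() appends the message to the tail of one of the topic's partitions
          producer.send(new ProducerRecord<>("zerg.hydra", "some-key", "some-value"));
          producer.close(); // flushes pending messages before exiting
      }
  }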
Topics
• Consumers use an “offset pointer” to track/control their read progress
(and decide the pace of consumption)

[Diagram: consumer groups C1 and C2 each keep their own offset pointer into the same topic; producers A1, A2, …, An keep appending to the tail on the broker(s), think: appending to a file]

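A matching consumer-side sketch, assuming the Java consumer API introduced in Kafka 0.9 (broker address and group id are placeholders). Every consumer started with group.id=C1 splits the topic’s partitions with the rest of group C1, while a consumer with group.id=C2 would independently receive all messages again:

  import java.util.Arrays;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class GroupConsumer {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
          props.put("group.id", "C1"); // consumers sharing this id split the partitions
          props.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");

          KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
          consumer.subscribe(Arrays.asList("zerg.hydra"));
          while (true) {
              // poll() fetches from the group's current offsets and advances them
              ConsumerRecords<String, String> records = consumer.poll(100);
              for (ConsumerRecord<String, String> r : records)
                  System.out.printf("partition=%d offset=%d value=%s%n",
                      r.partition(), r.offset(), r.value());
          }
      }
  }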
Partitions
• A topic consists of partitions.
• Partition: ordered + immutable sequence of messages
that is continually appended to

Partitions
• #partitions of a topic is configurable
• #partitions determines max consumer (group) parallelism
– cf. parallelism of Storm’s KafkaSpout via builder.setSpout(,,N)

– [Diagram] Consumer group A, with 2 consumers, reads from a 4-partition topic
– [Diagram] Consumer group B, with 4 consumers, reads from the same topic

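At the time of this deck, the partition count was set when creating a topic with the kafka-topics.sh script; newer Kafka releases (0.11+) also expose topic creation through the Java AdminClient. A sketch under that assumption (topic name, partition count, replication factor, and broker address are illustrative):

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.NewTopic;

  public class CreateTopic {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
          AdminClient admin = AdminClient.create(props);

          // 4 partitions -> at most 4 consumers in one group can read in parallel;
          // replication factor 2 -> survives 1 dead broker
          NewTopic topic = new NewTopic("zerg.hydra", 4, (short) 2);
          admin.createTopics(Collections.singletonList(topic)).all().get();
          admin.close();
      }
  }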
Partition offsets
• Offset: each message in a partition is assigned a unique (per-partition),
sequential id called the offset
– Consumers track their read position via (offset, partition, topic) tuples

[Diagram: consumer group C1’s offset positions within each partition]

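To make the (offset, partition, topic) bookkeeping concrete, here is a sketch of reading and moving the pointer with the Java consumer (the topic, partition number, and target offset are illustrative, not from the deck):

  import java.util.Arrays;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;

  public class OffsetControl {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
          props.put("group.id", "C1");
          props.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");

          KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
          // assign() pins this consumer to partition 0 of the topic explicitly
          TopicPartition p0 = new TopicPartition("zerg.hydra", 0);
          consumer.assign(Arrays.asList(p0));

          consumer.seek(p0, 42L);            // move the pointer: next poll() starts at offset 42
          long next = consumer.position(p0); // offset of the next record to be fetched (42 here)
          System.out.println("next offset: " + next);
          consumer.commitSync();             // persist this position for the group
          consumer.close();
      }
  }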
Replicas of a partition
• Replicas: “backups” of a partition
– They exist solely to prevent data loss.
– Clients never read from or write to replicas directly; they only talk to
the partition’s leader.
• Replicas do NOT help to increase producer or consumer
parallelism!
– Kafka tolerates (numReplicas - 1) dead brokers
before losing data
• LinkedIn: numReplicas == 2 → 1 broker can die

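On the producer side, how long a send waits on those replicas is governed by the acks setting. A sketch of the relevant config lines, added to the producer Properties from the earlier sketch (the values shown are one common durability-oriented choice, not the deck’s recommendation):

  // In the producer configuration (see the earlier producer sketch):
  props.put("acks", "all");  // leader replies only after all in-sync replicas have the write
  props.put("retries", "3"); // retry transient send failures instead of dropping data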
Kafka Quickstart
• Steps for downloading Kafka, starting a server,
and creating a console-based consumer/producer
• Requires ZooKeeper to be installed and running
• https://kafka.apache.org/documentation.html#quickstart
• https://github.com/adamjshook/hadoop-demos/tree/master/kafka
