
Experiment 07

Big Data Analysis

Harsh Suryanath Nag


6th April, 2024
Aim
Set up and install Apache Kafka and stream real-time data from a social media website such as
Twitter, Facebook, or Instagram.

Introduction
Kafka is an open-source event stream processing platform based on an abstraction of a
distributed commit log. The aim of Kafka is to provide a unified, high-throughput, low-latency
platform for managing these logs, which Kafka calls “topics”. Kafka combines three main
capabilities to support the entire process of implementing event streaming use cases:

1. Publish (write) and subscribe to (read) streams of events.
2. Store streams of events durably and reliably for as long as you want.
3. Process streams of events as they occur or retrospectively.
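These three capabilities map directly onto the command-line tools shipped with the standard Apache Kafka distribution. A minimal sketch, assuming a broker on `localhost:9092`; the topic name `demo-events` is a placeholder:

```shell
# Create a topic -- the durable, replicated log that stores events.
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic demo-events --partitions 3 --replication-factor 1

# Publish (write) an event to the topic.
echo "hello kafka" | bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic demo-events

# Subscribe to (read) the stream, replaying it from the beginning.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic demo-events --from-beginning
```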

The founders of Confluent are the original developers of Apache Kafka, and they offer an
alternative, significantly more complete distribution of Kafka in the Confluent Platform. Many of
the additional features in Confluent’s distribution of Kafka are free to use, and are what Confluent
refers to as “community components”. For our example application, we will use one of these
community components, but rely on the standard Apache Kafka distribution for our underlying
Kafka cluster, as it is more than capable of supporting our system.

Advantages

Kafka has numerous advantages. Today, Kafka is used by over 80% of the Fortune 100 across
virtually every industry, for countless use cases big and small. It is the de facto technology
developers and architects use to build the newest generation of scalable, real-time data
streaming applications. While these capabilities can be achieved with a range of technologies on
the market, below are the main reasons Kafka is so popular.

1. High Throughput

Capable of handling high-velocity and high-volume data, Kafka can process millions of
messages per second.

2. High Scalability

Scale Kafka clusters up to a thousand brokers, trillions of messages per day, petabytes of
data, and hundreds of thousands of partitions. Elastically expand and contract storage and
processing.

3. Low Latency

Kafka can deliver these high volumes of messages using a cluster of machines with latencies
as low as 2 ms.

4. Permanent Storage

Safely and securely store streams of data in a distributed, durable, reliable, fault-tolerant
cluster.

5. High Availability

Extend clusters efficiently over availability zones or connect clusters across geographic
regions, making Kafka highly available and fault tolerant with no risk of data loss.

How Kafka Works

Apache Kafka consists of a storage layer and a compute layer that combines efficient, real-time
data ingestion, streaming data pipelines, and storage across distributed systems. In short, this
enables simplified data streaming between Kafka and external systems, so you can easily
manage real-time data and scale within any type of infrastructure.

Procedure
Start Cassandra network
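The screenshots for this step are not reproduced here; the step can be sketched with Docker. The network and container names (`kafka-net`, `cassandra`) are placeholders, not necessarily those used in the original setup:

```shell
# Create a shared Docker network so the Kafka, Kafka Connect, and
# Cassandra containers can resolve each other by name.
docker network create kafka-net

# Start a single Cassandra node attached to that network.
docker run -d --name cassandra --network kafka-net \
  -p 9042:9042 cassandra:4.1

# Check that the node reports itself as Up/Normal (UN).
docker exec cassandra nodetool status
```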

Start Kafka cluster
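A single-broker sketch of this step, again using Docker on the same network; the image tags and listener settings are assumptions for a local setup, not the exact configuration used in the experiment:

```shell
# Start ZooKeeper, which the broker uses for cluster coordination.
docker run -d --name zookeeper --network kafka-net \
  -e ZOOKEEPER_CLIENT_PORT=2181 \
  confluentinc/cp-zookeeper:7.4.0

# Start a single Kafka broker, advertised on localhost:9092 so that
# clients on the host machine can reach it.
docker run -d --name kafka --network kafka-net -p 9092:9092 \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka:7.4.0
```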

Kafka-Connect REST
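Kafka Connect exposes a REST API, by default on port 8083, which is how the connector in the later steps is registered and inspected. A quick sanity check against a local worker:

```shell
# Confirm the Connect worker is up and see its version info.
curl -s http://localhost:8083/

# List the connector plugins installed on the worker.
curl -s http://localhost:8083/connector-plugins

# List currently configured connectors (empty on a fresh worker).
curl -s http://localhost:8083/connectors
```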

Set up authentication using credentials from Docker (Kafka front end)

Adding a cluster

Confluent’s community Kafka-Cassandra connector is installed in a Kafka Connect instance
running in a container, which polls both the Twitter and OpenWeatherMap topics and saves the
data into Cassandra
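Registering such a sink connector is done through the Connect REST API. The sketch below is illustrative only: the connector class and its `connect.cassandra.*` properties follow the community Stream Reactor Cassandra sink, and the topic names, keyspace, and table mapping are placeholders rather than the exact values from the experiment:

```shell
# Register a Cassandra sink that drains the two topics into tables.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cassandra-sink",
    "config": {
      "connector.class": "com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector",
      "topics": "twitter,weather",
      "connect.cassandra.contact.points": "cassandra",
      "connect.cassandra.port": "9042",
      "connect.cassandra.key.space": "streams",
      "connect.cassandra.kcql": "INSERT INTO tweets SELECT * FROM twitter; INSERT INTO weather_data SELECT * FROM weather"
    }
  }'
```

The KCQL string maps each Kafka topic to a Cassandra table; the keyspace and tables must already exist before the connector starts.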

Activating producers
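The report’s own producer code is not shown, so the following is only a hypothetical sketch of a weather producer: it polls the OpenWeatherMap current-weather endpoint and pipes each response into the `weather` topic. `OWM_API_KEY`, the city, the polling interval, and the topic name are all placeholders:

```shell
# Poll OpenWeatherMap once a minute and publish each JSON response
# to the "weather" topic via the console producer.
while true; do
  curl -s "https://api.openweathermap.org/data/2.5/weather?q=Mumbai&appid=${OWM_API_KEY}" |
    bin/kafka-console-producer.sh \
      --bootstrap-server localhost:9092 --topic weather
  sleep 60
done
```

A Twitter producer would follow the same pattern, writing each incoming tweet to the `twitter` topic.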

Conclusion
After successfully installing Apache Kafka, we leveraged its robust framework to stream real-time
data not only from Twitter but also from weather sources. This setup enabled efficient processing
and analysis of diverse data streams, enhancing our ability to derive insights and make informed
decisions.
