Apache Kafka
TABLE OF CONTENTS
Preface
Introduction
Installing and Running Apache Kafka
Apache Kafka Components
The Publish-Subscribe System
The Kafka Connect Framework
Kafka Streams
    KStream
    KGroupedStream
    KTable and KGroupedTable
Notable Apache Kafka Alternatives
Additional Resources
systems and databases, such as relational databases (e.g., MySQL, PostgreSQL), cloud storage (e.g., Amazon S3), message queues (e.g., RabbitMQ), and more. These connectors simplify integration with these systems without the need for custom coding.

• Configuration-Driven: Connectors are configured using simple configuration files or via REST APIs. The configuration defines how data should be extracted, transformed, and loaded between Kafka and the external system (a configuration sketch follows this list).

• Schema Support: Kafka connectors often support data serialization and deserialization using schemas. This allows connectors to handle structured data formats like Avro, JSON, or others, maintaining compatibility and data quality.

• Transformation and Enrichment: You can apply transformations and enrichments to data as it flows through Kafka Connect. These transformations can include filtering, mapping, and more, depending on your requirements.

• Real-time Data Processing: Kafka connectors facilitate real-time data streaming, enabling you to process and analyze data as it flows through your pipeline. This is crucial for applications that require timely insights and actions based on the data.

• Monitoring and Error Handling: Kafka Connect provides monitoring capabilities, allowing you to track the performance and status of connectors. Furthermore, connectors often come with built-in error handling mechanisms, allowing them to handle various issues that may arise during data transfer, such as network interruptions, data format/schema mismatches, and more. This ensures data reliability and minimizes data loss.
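For example, registering a connector through the Connect REST API is purely a matter of submitting such a configuration. Below is a minimal sketch, assuming a Connect worker on its default port (8083); the connector name, file path, and topic are illustrative, and the converter and transform entries hint at the schema support and transformation features described above:

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example-file-source",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
      "tasks.max": "1",
      "file": "/tmp/input.txt",
      "topic": "example-topic",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter.schemas.enable": "false",
      "transforms": "AddSource",
      "transforms.AddSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
      "transforms.AddSource.static.field": "source",
      "transforms.AddSource.static.value": "file-import"
    }
  }'

The same key-value pairs can equally live in a .properties file passed to a standalone worker; nothing about the pipeline requires custom code.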
Below is a step-by-step guide on how to stream data from a source file to a target file using the Kafka Connect framework with the Confluent File Source and File Sink connectors. The File Source connector reads data from the source file and publishes it to a "test-topic." The File Sink connector subscribes to "test-topic" and writes the data to the target file as specified in its configuration. Sample configurations for both connectors are sketched after the prerequisites below.
Prerequisites

• Make sure you have Kafka and Kafka Connect set up and running.

• Download and install the Confluent File Source and File Sink connectors.
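As a sketch of what the two configurations might look like, here are minimal property files in the style of the standalone-mode examples shipped with Apache Kafka; the connector names, file paths, and the FileStream connector aliases are assumptions for illustration:

# connect-file-source.properties (sketch): read lines from a file into test-topic
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/path/to/source.txt
topic=test-topic

# connect-file-sink.properties (sketch): write records from test-topic to a file
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/path/to/target.txt
topics=test-topic

With a standalone worker, both files can be launched together, e.g. bin/connect-standalone.sh config/connect-standalone.properties connect-file-source.properties connect-file-sink.properties; you can also confirm the plugins are installed by querying the worker's REST endpoint (GET /connector-plugins).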
Create a Kafka topic (Optional)

First, create a Kafka topic to which you want to publish the data from the source file. You can use the Kafka command-line tools for this:
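A typical invocation looks like the following; the broker address, partition count, and replication factor are assumptions to adapt to your cluster:

bin/kafka-topics.sh --create \
  --topic test-topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1

This step is optional because brokers with automatic topic creation enabled will create "test-topic" on first use.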